Hacker News

There should be a way to turn the questions we ask LLMs into benchmarks.

That way, we can have a benchmark that is always up to date.




There are a few “updating” benchmarks out there. I periodically take a look at these two:

https://swe-rebench.com/

https://livebench.ai/



