467 points mraniki | 1 comment
1. stared No.43535624
At this level, results are highly contextual: they depend on your tools, prompts, language, libraries, and the whole codebase. For example, in one project where I generate ggplot2 code in R, Claude 3.5 gives far better results than the newer Claude 3.7.

Compare and contrast https://aider.chat/docs/leaderboards/, https://web.lmarena.ai/leaderboard, https://livebench.ai/#/.