←back to thread

555 points maheshrijal | 1 comments | | HN request time: 0s | source
Show context
rsanheim ◴[] No.43708890[source]
`ETOOMANYMODELS`

Is there a reputable, non-blogspam site that offers a 'cheat sheet' of sorts for what models to use, in particular for development? Not just openAI, but across the main cloud offerings and feasible local models?

I know there are the benchmarks, and directories like huggingface, and you can get a 'feel' for things by scanning threads here or other forums.

I'm thinking more of something that provides use-case tailored "top 3" choices by collecting and summarizing different data points. For example:

* agent & tool based dev (cloud) - [top 3 models] * agent & tool based dev (local) - m1, m2, m,3 * code review / high level analysis - ... * general tech questions - ... * technical writing (ADRs, needs assessments, etc) - ...

Part of the problem is how quickly the landscape changes everyday, and also just relying on benchmarks isn't enough: it ignores cost, and more importantly ignores actual user experience (which I realize is incredibly hard to aggregate & quantify).

replies(3): >>43711451 #>>43713256 #>>43714126 #
1. departed ◴[] No.43713256[source]
LMArena might have some of the information you are looking for. It offers rankings of LLM models across main cloud offerings, and I feel that its evaluation method, human prompting and voting, is closer to real-world use case and less prone to data contamination than benchmarks.

https://lmarena.ai/

In the "Leaderboard">"Language" tab, it lists the top models in various categories such as overall, coding, math, and creative writing.

In the "Leaderboard">"Price Analysis" tab, it shows a chart comparing models by cost per million tokens.

In the "Prompt-to-Leaderboard" tab, there is even an LLM to help you find LLMs -- you enter a prompt, and it will find the top models for your particular prompt.