
544 points tosh | 2 comments
simonw No.43464243
32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough that you can run them on a single GPU or a reasonably well-specced Mac laptop (32GB of RAM or more).
replies(9): >>43464289 #>>43464380 #>>43464443 #>>43464588 #>>43464688 #>>43467991 #>>43468940 #>>43469099 #>>43470619 #
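As a rough sanity check on the "fits on a 32GB machine" claim above, here is a back-of-the-envelope sketch of weight-storage requirements for a 32B-parameter model at a few common quantization levels (the bit-widths are illustrative assumptions, not measurements of any specific model, and activation/KV-cache overhead is ignored):

```python
# Approximate weight storage for a 32B-parameter model at different
# precisions. Real quantized formats (e.g. Q4_K_M) carry some extra
# metadata, so treat these as ballpark figures only.
PARAMS = 32e9  # 32 billion parameters

def weight_gib(bits_per_param: float) -> float:
    """GiB needed to hold the weights alone at the given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

for label, bits in [("fp16", 16), ("8-bit", 8), ("~4.5-bit", 4.5)]:
    print(f"{label}: ~{weight_gib(bits):.0f} GiB")
```

The fp16 weights alone (~60 GiB) exceed a 32GB laptop, which is why 4-to-5-bit quantization is what makes this size practical on consumer hardware.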
YetAnotherNick No.43464443
I don't think these models are GPT-4 level. They appear to be on benchmarks, but it is well known that models increasingly use A/B testing in dataset curation and synthesis (using GPT-4-level models) to optimize not just the benchmarks themselves but anything that could be benchmarked, like academic tasks.
replies(2): >>43464533 #>>43468989 #
simonw No.43464533
I'm not talking about GPT-4o here - every benchmark I've seen has the new models from the past ~12 months outperforming the March 2023 GPT-4 model.

To pick just the most popular one, https://lmarena.ai/?leaderboard= has GPT-4-0314 ranked 83rd now.

replies(1): >>43465368 #
th0ma5 No.43465368
How have you been able to tie benchmark results to better real-world results?
replies(1): >>43465877 #
simonw No.43465877
Vibes and intuition. Not much more than that.
replies(1): >>43474204 #
th0ma5 No.43474204
Don't you think that presenting this as learning or knowledge is unethical?