(www.anthropic.com)

2127 points bakugo | 1 comment | 24 Feb 25 18:28 UTC


azinman2 ◴[24 Feb 25 18:56 UTC] No.43163378[source]▶

To me the biggest surprise was seeing Grok dominate in all of their published benchmarks. I haven’t seen any independent benchmarks of it yet (and I take the published ones with a giant heap of salt), but it’s still interesting.

I’m rooting for Anthropic.


1. koakuma-chan ◴[24 Feb 25 19:39 UTC] No.43163938[source]▶

>>43163378 #

Grok does the most thinking of all the models I've tried (it can think for 2+ minutes), and that's why it is so good, though I haven't tried Claude 3.7 yet.


Claude 3.7 Sonnet and Claude Code