Claude 3.7 Sonnet and Claude Code

(www.anthropic.com)

anotherpaulg ◴[24 Feb 25 20:40 UTC] No.43164684[source]▶

>>43163011 (OP) #

Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html#aider-v0750

replies(18): >>43164827 #>>43165382 #>>43165504 #>>43165555 #>>43165786 #>>43166186 #>>43166253 #>>43166387 #>>43166478 #>>43166688 #>>43166754 #>>43166976 #>>43167970 #>>43170020 #>>43172076 #>>43173004 #>>43173088 #>>43176914 #

anotherpaulg ◴[25 Feb 25 00:46 UTC] No.43166754[source]▶

>>43164684 #

Using up to 32k thinking tokens, Sonnet 3.7 set SOTA with a 64.9% score.

  65% Sonnet 3.7, 32k thinking
  64% R1+Sonnet 3.5
  62% o1 high
  60% Sonnet 3.7, no thinking
  60% o3-mini high
  57% R1
  52% Sonnet 3.5

replies(4): >>43167134 #>>43168719 #>>43168852 #>>43169016 #

mikae1 ◴[25 Feb 25 06:31 UTC] No.43168852[source]▶

>>43166754 #

It's clear that progress is incremental at this point. At the same time Anthropic and OpenAI are bleeding money.

It's unclear to me how they'll shift to making money while providing almost no enhanced value.

replies(1): >>43168989 #

1. khafra ◴[25 Feb 25 06:52 UTC] No.43168989[source]▶

>>43168852 #

Yudkowsky just mentioned that even if LLM progress stopped right here, right now, there are enough fundamental economic changes to provide us a really weird decade. Even with no moat, if the labs are in any way placed to capture a little of the value they've created, they could make high multiples of their investors' money.

replies(5): >>43169795 #>>43169803 #>>43170002 #>>43171064 #>>43175528 #

2. jonplackett ◴[25 Feb 25 09:18 UTC] No.43169795[source]▶

>>43168989 (TP) #

Yep totally agree. It will also depend who captures the most eyeballs.

ChatGPT is already my default first place to check something, where it was Google for the previous 20+ years.

replies(2): >>43171092 #>>43174752 #

3. Amekedl ◴[25 Feb 25 09:19 UTC] No.43169803[source]▶

>>43168989 (TP) #

Oh really? What are these changes supposed to look like? Who will pay up, essentially? I don't really see it, aside from the m$ business case of offering AI as a guise for violating privacy much more aggressively to better sell ads.

4. dragonwriter ◴[25 Feb 25 09:54 UTC] No.43170002[source]▶

>>43168989 (TP) #

With no moat, they aren't placed to capture much value; moats are what stop market competition from driving prices to the zero-economic-profit level. And that's even without further competition from free products produced by people who aren’t even trying to support themselves in the market you are selling into, which can make even the zero-economic-profit price untenable.

replies(1): >>43171172 #

5. weatherlite ◴[25 Feb 25 12:33 UTC] No.43171064[source]▶

>>43168989 (TP) #

Like what economic changes? You can make a case people are 10% more productive in very specific fields (programming, perhaps consultancy etc). That's not really an earthquake, the internet/web was probably way more significant.

replies(3): >>43173649 #>>43173863 #>>43180029 #

6. sarchertech ◴[25 Feb 25 12:37 UTC] No.43171092[source]▶

>>43169795 #

Eyeballs aren’t enough though. Unlike Google, ChatGPT is very expensive to run. It’s unlikely they can just slap ads on it like Google did.

replies(1): >>43172802 #

7. TeMPOraL ◴[25 Feb 25 12:46 UTC] No.43171172[source]▶

>>43170002 #

Market competition doesn't work in an instant; even without a moat, there's plenty of money they can capture before it evaporates.

Think of pouring water from the faucet into a sink with an open drain: if you have a high enough flow rate, you can fill the sink faster than it drains. Then, when you turn the faucet off, as the sink is draining, you can still collect plenty of water from it with a cup or a bucket before the sink fully drains.

replies(2): >>43172946 #>>43172969 #

8. AJ007 ◴[25 Feb 25 15:06 UTC] No.43172802{3}[source]▶

>>43171092 #

Inference costs will keep dropping. The stuff the average consumer does will be trivially cheap. More stuff will move on device. The edge capabilities of these models are already far beyond what the average person can use or comprehend.

The point I wonder about is the sustainability of every query being 30+ requests. Site owners aren't ready to have 98% of their requests be non-monetizable bot traffic. However, sites that have something to sell are...

9. dragonwriter ◴[25 Feb 25 15:15 UTC] No.43172946{3}[source]▶

>>43171172 #

> Market competition doesn't work in an instant; even without a moat, there's plenty of money they can capture before it evaporates.

Sure, in a hypothetical market where, before they try to extract profits, most participants aren't losing money at below-profitable prices trying to keep mindshare. But to get there in the LLM market, even temporarily, you’d need a breakthrough around which a participant had some kind of a moat.

10. AJ007 ◴[25 Feb 25 15:17 UTC] No.43172969{3}[source]▶

>>43171172 #

The startups that are using API credits seem like the most likely to be able to achieve a good return on capital. There is a pretty clear cost structure and it's much more straightforward whether you are making money or not.

The infrastructure side of things, with tens of billions and probably hundreds of billions going in, may not be fantastic for investors. The return on capital should approach the cost of capital if someone does their job correctly. Add in government investment and subsidies (in China, the EU, the United States) and it becomes extremely difficult to make those calculations. In the long term, I don't think the AI infrastructure will be overbuilt (datacenters, fabs), but like the telecom bubble, it is easy to end up in a position where there is a lot of excess capacity and the way you made your bet means getting wiped out.

Of course if you aren't the investor and it isn't your capital, then there is a tremendous amount of money to be made because you have nothing to lose. I've been around a long time, and this is the closest thing I've felt to that inflection point where the web took off.

11. arisAlexis ◴[25 Feb 25 16:06 UTC] No.43173649[source]▶

>>43171064 #

Very limited thinking AI is a tool

replies(1): >>43176140 #

12. Seanambers ◴[25 Feb 25 16:22 UTC] No.43173863[source]▶

>>43171064 #

LLMs are fundamentally a new paradigm; it just isn't distributed yet.

It's not like the web was suddenly just there. It came slowly at first, then everywhere at once, and the money came even later.

replies(2): >>43174832 #>>43187422 #

13. ssl-3 ◴[25 Feb 25 17:24 UTC] No.43174752[source]▶

>>43169795 #

I use it for all kinds of unique things, but ChatGPT is the last place I look for facts.

14. weatherlite ◴[25 Feb 25 17:30 UTC] No.43174832{3}[source]▶

>>43173863 #

The LLMs are quite widely distributed already, they're just not that impactful. My wife is an accountant at a big 4 and they're all using them (everyone on Microsoft Office is probably using them, which is a lot of people). It's just not the earth-shattering tech change CEOs make it out to be, at least not yet. We need order-of-magnitude improvements in things like reliability, factuality and memory for the real economic efficiencies to come, and it's unclear to me when that's gonna happen.

replies(1): >>43175714 #

15. zeroq ◴[25 Feb 25 18:30 UTC] No.43175528[source]▶

>>43168989 (TP) #

It's an echo chamber.

It is - what? - a fifth anniversary of "the world will be a completely different place in 6 months due to AI advancement"?

"Sam Altman believes AI will change the world" - of course he does, what else is he supposed to say?

replies(1): >>43176101 #

16. KoolKat23 ◴[25 Feb 25 18:47 UTC] No.43175714{4}[source]▶

>>43174832 #

Not necessarily, workflows just need to be adapted to work with it rather than it working in existing workflows. It's something that happens during each industrial revolution.

Originally, electric generators merely replaced steam generators and brought no additional productivity gains; this only changed when the rest of the processes around them were changed as well.

replies(1): >>43181666 #

17. CamperBob2 ◴[25 Feb 25 19:20 UTC] No.43176101[source]▶

>>43175528 #

It is a different place. You just haven't noticed yet.

At some point fairly recently, we passed the point at which things that took longer than anyone thought they would take are happening faster than anyone thought they would happen.

18. ◴[25 Feb 25 19:23 UTC] No.43176140{3}[source]▶

>>43173649 #

19. harshreality ◴[26 Feb 25 02:41 UTC] No.43180029[source]▶

>>43171064 #

It's a force multiplier.

Think of having a secretary, or ten. These secretaries are not as good as an average human at most tasks, but they're good enough for tasks that are easy to double check. You can give them an immense amount of drudgery that would burn out a human.

replies(1): >>43181852 #

20. weatherlite ◴[26 Feb 25 07:53 UTC] No.43181666{5}[source]▶

>>43175714 #

I don't get this. What workflow can have occasional catastrophic lapses of reasoning, non factuality, no memory and hallucinations etc? Even in things like customer support this is a no go imo. As long as these very major problems aren't improved (by a lot) the tools will remain very limited.

replies(3): >>43183442 #>>43189359 #>>43189402 #

21. habinero ◴[26 Feb 25 08:26 UTC] No.43181852{3}[source]▶

>>43180029 #

What drudgery, though? Secretaries don't do a lot of drudgery. And a good one will see tasks that need doing that you didn't specify.

If you're generating immense amounts of really basic make work, that seems like you're managing your time poorly.

replies(1): >>43197683 #

22. jacob019 ◴[26 Feb 25 13:29 UTC] No.43183442{6}[source]▶

>>43181666 #

We are at the precipice of a new era. LLMs are only part of the story. Neural net architecture and tooling have matured to the point where building things like LLMs is possible. LLMs are important and will forever change "the interface" for both developers and users, but it's only the beginning. The Internet changed everything slowly, then quickly, then slowly. I expect that to repeat.

replies(1): >>43188587 #

23. genewitch ◴[26 Feb 25 19:49 UTC] No.43187422{3}[source]▶

>>43173863 #

Government and healthcare workers have been using AI for notes for over a year in Louisiana; an additional anecdote to sibling.

24. parodysbird ◴[26 Feb 25 21:45 UTC] No.43188587{7}[source]▶

>>43183442 #

So you're just doing Delphic oracle prophecy. Mysticism is not actually that helpful or useful in most discussions, even if some mystical prediction accidentally ends up correct.

replies(1): >>43189826 #

25. andreasmetsala ◴[26 Feb 25 23:12 UTC] No.43189359{6}[source]▶

>>43181666 #

> What workflow can have occasional catastrophic lapses of reasoning, non factuality, no memory and hallucinations etc?

LLMs might enable some completely new things to be automated that made no sense to automate before, even if it’s necessary to error correct with humans / computers.

26. KoolKat23 ◴[26 Feb 25 23:17 UTC] No.43189402{6}[source]▶

>>43181666 #

There are a lot of productivity gains to be had in things like customer support. It can draft a response and the human merely validates it. Hallucination rates are falling, and even minor savings add up in large-scale areas with productivity targets and strict SLAs, such as call centres. It's not a reach to say it could already do a lot of business process outsourcing type work.

replies(1): >>43192397 #

27. jacob019 ◴[27 Feb 25 00:10 UTC] No.43189826{8}[source]▶

>>43188587 #

Observations and expectations are not prophecy, but thanks for replying to dismiss my thoughts. I've been working on an ML project outside of the LLM domain, and I am blown away by the power of the tooling compared to a few years ago.

28. abhpro ◴[27 Feb 25 08:27 UTC] No.43192397{7}[source]▶

>>43189402 #

Source on hallucination rates falling?

I use LLMs 20-30 times a day, and while they feel invaluable for personal use where I can interpret the responses at my own discretion, they still hallucinate enough and have enough lapses in logic that I would never feel confident incorporating them into some critical system.

replies(1): >>43192592 #

29. KoolKat23 ◴[27 Feb 25 09:09 UTC] No.43192592{8}[source]▶

>>43192397 #

My own experience, but if you insist

https://www.visualcapitalist.com/ranked-ai-models-with-the-l...

99% of systems aren't critical and human validation is sufficient. In my own use case, it is enough to replace plenty of hours of human labour.

30. harshreality ◴[27 Feb 25 19:40 UTC] No.43197683{4}[source]▶

>>43181852 #

As one example, LLMs are great at summarizing, or writing or brainstorming outlines of things. They won't display world-class creativity, but as long as they're not hallucinating, their output is quite usable.

Using them to replace core competencies will probably remain forbidden by professional ethics (writing court documents, diagnosing patients, building bridges). However, there are ways for LLMs to assist people without doing their jobs for them.

Law firms are already using LLMs to deal with large amounts of discovery materials. Doctors and researchers probably use them to summarize papers they want to be familiar with but don't have the energy to read themselves. Engineers might eventually be able to use AI to do a rough design, then do all the regulatory and finite element analysis necessary to prove that it's up to code, just like they'd have to do anyway.

I don't have a high-level LLM subscription, but I think with the right tooling, even existing LLMs might already be pretty good at managing schedules and providing reminders.
