Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.
Aider 0.75.0 is out with support for 3.7 Sonnet [1].
Thinking support and thinking benchmark results coming soon.
Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.
Aider 0.75.0 is out with support for 3.7 Sonnet [1].
Thinking support and thinking benchmark results coming soon.
65% Sonnet 3.7, 32k thinking
64% R1+Sonnet 3.5
62% o1 high
60% Sonnet 3.7, no thinking
60% o3-mini high
57% R1
52% Sonnet 3.5
ChatGPT is already my default first place to check something, where it was Google for the previous 20+ years.
Think pouring water from the faucet into a sink with open drain - if you have high enough flow rate, you can fill the sink faster than it drains. Then, when you turn the faucet off, as the sink is draining, you can still collect plenty of water from it with a cup or a bucket, before the sink fully drains.
The point I wonder about is the sustainability of every query being 30+ requests. Site owners aren't ready to have 98% of their requests be non-monetizable bot traffic. However, sites that have something to sell are..
Sure, in a hypothetical market where before they try to extract profits most participants aren't losing money with below-profitable prices trying to keep mindshare. But you’d need a breakthrough around which a participant had some kind lf a moat to get, even temporarily, there in the LLM market.
The infrastructure side of things, tens of billions and probably hundreds of billions going in, may not be fantastic for investors. The return on capital should approach cost of capital if someone does their job correctly. Add in government investment and subsidies (in China, the EU, the United States) and it become extremely difficult to make those calculations. In the long term, I don't think the AI infrastructure will be overbuilt (datacenters, fabs), but like the telecom bubble, it is easy to end up in a position where there is a lot of excess capacity and the way you made your bet means getting wiped out.
Of course if you aren't the investor and it isn't your capital, then there is a tremendous amount of money to be made because you have nothing to lose. I've been around a long time, and this is the closest thing I've felt to that inflection point where the web took off.
It's not like the web suddenly was just there, it came slow at first, then everywhere at once, the money came even later.
It is - what? - a fifth anniversary of "the world will be a completely different place in 6 months due to AI advancement"?
"Sam Altman believes AI will change the world" - of course he does, what else is he supposed to say?
Originally electric generators merely replaced steam generators but had no additional productivity gains, this only changed when they changed the rest of the processes around it.
At some point fairly recently, we passed the point at which things that took longer than anyone thought they would take are happening faster than anyone thought they would happen.
Think of having a secretary, or ten. These secretaries are not as good as an average human at most tasks, but they're good enough for tasks that are easy to double check. You can give them an immense amount of drudgery that would burn out a human.
If you're generating immense amounts of really basic make work, that seems like you're managing your time poorly.
LLMs might enable some completely new things to be automated that made no sense to automate before, even if it’s necessary to error correct with humans / computers.
I use LLMs 20-30 times a day and while it feels invaluable for personal use where I can interpret the responses at my own discretion, they still hallucinate enough and have enough lapses in logic where I would never feel confident incorporating them into some critical system.
https://www.visualcapitalist.com/ranked-ai-models-with-the-l...
99% of systems aren't critical and human validation is sufficient. My own use case, it is enough to replace plenty of hours of human labour.
Using them to replace core competencies will probably remain forbidden by professional ethics (writing court documents, diagnosing patients, building bridges). However, there are ways for LLMs to assist people without doing their jobs for them.
Law firms are already using LLMs to deal with large amounts of discovery materials. Doctors and researchers probably use it to summarize papers they want to be familiar with but don't have the energy to read themselves. Engineers might eventually be able to use AI to do a rough design, then do all the regulatory and finite element analysis necessary to prove that it's up to code, just like they'd have to do anyway.
I don't have a high-level LLM subscription, but I think with the right tooling, even existing LLMs might already be pretty good at managing schedules and providing reminders.