
625 points lukebennett | 3 comments
1. aresant No.42139647
Taking a holistic view, informed by a disruptive OpenAI / AI / LLM Twitter habit, I would say this is AI's "what gets measured gets managed" moment, and the narrative will change.

This is supported both by general observations and, more recently, by this tweet from an OpenAI engineer that Sam responded to and engaged with (1):

"scaling has hit a wall and that wall is 100% eval saturation"

I interpret this to mean that, in his view, models are no longer showing significant performance improvements because they have already maxed out the existing evaluation metrics.

Are those evaluations (or even LLMs) the RIGHT measures to achieve AGI? Probably not.

But have they been useful tools for demonstrating that the confluence of compute, engineering, and tactical models is leading toward significant breakthroughs in artificial (computer) intelligence?

I would say yes.

Which in turn are driving the funding, power innovation, public policy etc needed to take that next step?

I hope so.

(1) https://x.com/willdepue/status/1856766850027458648

2. ActionHank No.42139702
> Which in turn are driving the funding, power innovation, public policy etc needed to take that next step?

They are driving the shoveling of VC money into a furnace to power their servers.

Should that money run dry before they hit another breakthrough, "AI" popularity is going to drop like a stone. I believe this to be a far more likely outcome than AGI or even the next big breakthrough.

3. Bjorkbat No.42142811
I agree that existing benchmarks are no longer useful now that there's basically nothing left in them that seems to stump LLMs.

But when I hear that models are failing to meet expectations, I imagine what they mean is that the researchers had some eval in mind, one with room to grow and a specific target, and that the model in question fell short of that target.

Honestly, the problem with sentiments like these on Twitter is that you can't tell whether they're sincere or just a snarky, useless remark. Probably a mix of both.