Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.
But why does this paper impact your thinking on it? It's about budgets and recognizing that different LLMs have different cost structures. It's not really an attempt to improve LLM performance in absolute terms.
arXiv is essentially a blog in an academic format, popular among Asian and South Asian academic communities
currently you can launder reputation with it, just as “white papers” in the crypto world let people raise capital for a while
this ability will diminish as more people catch on
And most would have accepted the recommendation, because the model sold it as a less common tactic while sounding very logical.
Once you've started to argue with an LLM, you're already barking up the wrong tree. Maybe you're right, maybe not, but there's no point in arguing it out with an LLM.
It's mostly hand-waving, hype, credulity, and unproven claims of scalability right now.
You can't move the goal posts because they don't exist.
So many people just want to believe, instead of accepting the reality that LLMs are quite unreliable.
Personally, it's usually fairly obvious to me when LLMs are bullshitting, probably because I have lots of experience detecting it in humans.
It'll be a while before the ability to move the goalposts of "actual intelligence" is exhausted entirely.
In this case I just happened to be a domain expert and knew it was wrong. For someone less experienced, verifying everything would have required significant effort.
And the kind of automation brought by LLMs is decidedly different from automation in the past, which almost always created new (usually better) jobs. LLMs won't do this (at least not to an extent that would matter), I think. In ten years most people will have worse jobs (more physically straining, longer hours, less pay) unless there is political intervention.
Doesn't mean there aren't practical definitions depending on the context.
In essence, an AI you could teach using resources meant for humans, and nothing more, would be considered AGI. That could be a practical definition, without needing much more rigour.
There is indeed no evidence we'll get there. But there is also no evidence that LLMs should work as well as they do.