This is the main point that proves to me that these companies are mostly selling us snake oil. Yes, there is a great deal of utility from even the current technology. It can detect patterns in data that no human could; that alone can be revolutionary in some fields. It can generate data that mimics anything humans have produced, and certain permutations of that can be insightful. It can produce fascinating images, audio, and video. Some of these capabilities raise safety concerns, particularly in the wrong hands, and important questions that society needs to address. These hurdles are surmountable, but they require focusing on the reality of what these tools can do, instead of on whatever a group of serial tech entrepreneurs looking for the next cashout opportunity tell us they can do.
The constant anthropomorphization of this technology is dishonest at best, and harmful and dangerous at worst.
Hallucination does seem to be much less of an issue now. I hardly even hear about it - like it just faded away.
As far as I can tell smart engineers are using AI tools, particularly people doing coding, but even non-coding roles.
The criticism feels about three years out of date.
No, it can generate data that mimics anything humans have put on the WWW
The other reason is because the primary focus of the last 3 years has been scaling the data and hardware up, with a bunch of (much needed) engineering around it. This has produced better results, but it can't sustain the AGI promises for much longer. The industry can only survive on shiny value added services and smoke and mirrors for so long.
Last week I had Claude and ChatGPT both tell me different non-existent options to migrate a virtual machine from vmware to hyperv.
Week before that one of them (don't remember which, honestly) gave me non existent options for fio.
Both of these are things that the first party documentation or man page has correct but i was being lazy and was trying to save time or be more efficient like these things are supposed to do for us. Not so much.
Hallucinations are still a problem.
Nonsense, there is a TON of discussion around how the standard workflow is "have Cursor-or-whatever check the linter and try to run the tests and keep iterating until it gets it right" that is nothing but "work around hallucinations." Functions that don't exist. Lines that don't do what the code would've required them to do. Etc. And yet I still hit cases weekly-at-least, when trying to use these "agents" to do more complex things, where it talks itself into a circle and can't figure it out.
What are you trying to get these things to do, and how are you validating that there are no hallucinations? You hardly ever "hear about it" but ... do you see it? How deeply are you checking for it?
(It's also just old news - a new hallucination is less newsworthy now, we are all so used to it.)
Of course, the internet is full of people claiming that they are using the same tools I am but with multiple factors higher output. Yet I wonder... if this is the case, where is the acceleration in improvement in quality in any of the open source software I use daily? Or where are the new 10x-AI-agent-produced replacements? (Or the closed-source products, for that matter - but there it's harder to track the actual code.) Or is everyone who's doing less-technical, less-intricate work just getting themselves hyped into a tizzy about getting faster generation of basic boilerplate for languages they hadn't personally mastered before?
Even just in industry, I think data functions at companies will have a dicey future.
I haven't seen many places where there's scientific peer review - or even software-engineering-level code-review - of findings from data science teams. If the data scientist team says "we should go after this demographic" and it sounds plausible, it usually gets implemented.
So if the ability to validate was already missing even pre-LLM, what hope is there for validation of the LLM-powered replacement. And so what hope is there of the person doing the non-LLM-version of keeping their job (at least until several quarters later when the strategy either proves itself out or doesn't.)
How many other departments are there where the same lack of rigor already exists? Marketing, sales, HR... yeesh.
A few days ago, I asked free ChatGPT to tell me the head brewer of a small brewery in Corpus Christi. It told me that the brewery didn't exist, which it did, because we were going there in a few minutes, but after re-prompting it, it gave me some phone number that it found in a business filing. (ChatGPT has been using web search for RAG for some time now.)
Hallucinations are still a massive problem IMO.
That’s not the case in my experience. Gemini is almost as good as Claude for most of the things I try.
That said, for queries tgat don’t use agentic search or rag, hallucination is as bad a problem as ever and it won’t improve because hallucination is all these models do. In Karpathy’s phrase they “dream text”. Agentic search and rag and similar techniques disguise the issue because they stuff the context of the model with real results, so the scope for it to go noticeably off the rails is less. But it’s still very visible if you ask for references, links etc many/most/sometimes all will be hallucinations depending on the prompt.