Observations of reality are more consistent with company FOMO than with actual usefulness.
Now they are cancelling those plans. For them "AGI" was cancelled.
OpenAI claims to be getting closer and closer to "AGI" even as more top scientists leave or get poached by other labs that are behind.
So why would you leave if the promise of achieving "AGI" was going to produce "$100B of profits," as per OpenAI's and Microsoft's definition in their deal?
Their actions tell you more than any of their statements or claims.
They are leaving for more money, more seniority, or because they don't like their boss. Nothing to do with AGI.
Personally I think AGI is ill-defined and won't happen as a new model release. Instead the thing to look for is how LLMs are being used in AI research and there are some advances happening there.
Of course, but that's part of my whole point.
Such statements and targets about how close we are to "AGI" have become nothing but false promises, with AGI used as the prime excuse to keep raising more money.
Another way to say it is that people think it's much more likely for a decent LLM startup to grow strongly for its first several years and then plateau than for an established player to hit hypergrowth because of AGI.
To fund yourself while building AGI? To hedge risk that AGI takes longer? Not saying you're wrong, just saying that even if they did believe it, this behavior could be justified.
Microsoft itself hasn't said they're doing this because of an oversupply of infrastructure for its AI offerings, but they very likely wouldn't say that publicly even if that were the reason.
Seems to be about this:
> As per the current terms, when OpenAI creates AGI - defined as a "highly autonomous system that outperforms humans at most economically valuable work" - Microsoft's access to such a technology would be void.
https://www.reuters.com/technology/openai-seeks-unlock-inves...
This is the main point that proves to me that these companies are mostly selling us snake oil. Yes, there is a great deal of utility from even the current technology. It can detect patterns in data that no human could; that alone can be revolutionary in some fields. It can generate data that mimics anything humans have produced, and certain permutations of that can be insightful. It can produce fascinating images, audio, and video. Some of these capabilities raise safety concerns, particularly in the wrong hands, and important questions that society needs to address. These hurdles are surmountable, but they require focusing on the reality of what these tools can do, instead of on whatever a group of serial tech entrepreneurs looking for the next cashout opportunity tell us they can do.
The constant anthropomorphization of this technology is dishonest at best, and harmful and dangerous at worst.
As such, even if there is a lot of money to be made from AI, it can still be the right decision to sell tools to others who will figure out how to use it. And of course, if it turns out to be another pointless fad with no real value, you still make money. (I'd predict the answer is in between: we are not going to get some AGI that takes over the world, but there will be niches where it is a big help, and those niches will be worth selling tools into.)
What if chatbots and user interactions ARE the path to AGI? Two reasons they could be: (1) Reinforcement learning in AI has proven to be very powerful. Humans get to GI through learning too - they aren’t born with much intelligence. Interactions between AI and humans may be the fastest way to get to AGI. (2) The classic Silicon Valley startup model is to push to customers as soon as possible (MVP). You don’t develop the perfect solution in isolation, and then deploy it once it is polished. You get users to try it and give feedback as soon as you have something they can try.
I don’t have any special insight into AI or AGI, but I don’t think OpenAI selling useful and profitable products is proof that there won't be AGI.
Hallucination does seem to be much less of an issue now. I hardly even hear about it - like it just faded away.
As far as I can tell smart engineers are using AI tools, particularly people doing coding, but even non-coding roles.
The criticism feels about three years out of date.
No, it can generate data that mimics anything humans have put on the WWW
Then they leave for more money.
The other reason is that the primary focus of the last three years has been scaling up the data and hardware, with a bunch of (much-needed) engineering around it. This has produced better results, but it can't sustain the AGI promises for much longer. The industry can only survive on shiny value-added services and smoke and mirrors for so long.
The 'no one jumps ship if AGI is close' assumption is really weak, and seemingly completely unsupported in TFA...
Last week I had Claude and ChatGPT both tell me different non-existent options for migrating a virtual machine from VMware to Hyper-V.
The week before that, one of them (I don't remember which, honestly) gave me non-existent options for fio.
Both of these are things the first-party documentation or man page gets right, but I was being lazy and trying to save time or be more efficient, like these things are supposed to help us do. Not so much.
Hallucinations are still a problem.
Nonsense, there is a TON of discussion around how the standard workflow is "have Cursor-or-whatever check the linter and try to run the tests and keep iterating until it gets it right" that is nothing but "work around hallucinations." Functions that don't exist. Lines that don't do what the code would've required them to do. Etc. And yet I still hit cases weekly-at-least, when trying to use these "agents" to do more complex things, where it talks itself into a circle and can't figure it out.
What are you trying to get these things to do, and how are you validating that there are no hallucinations? You hardly ever "hear about it" but ... do you see it? How deeply are you checking for it?
(It's also just old news - a new hallucination is less newsworthy now, we are all so used to it.)
Of course, the internet is full of people claiming that they are using the same tools I am but with multiple factors higher output. Yet I wonder... if this is the case, where is the acceleration in improvement in quality in any of the open source software I use daily? Or where are the new 10x-AI-agent-produced replacements? (Or the closed-source products, for that matter - but there it's harder to track the actual code.) Or is everyone who's doing less-technical, less-intricate work just getting themselves hyped into a tizzy about getting faster generation of basic boilerplate for languages they hadn't personally mastered before?
Even just in industry, I think data functions at companies will have a dicey future.
I haven't seen many places where there's scientific peer review - or even software-engineering-level code-review - of findings from data science teams. If the data scientist team says "we should go after this demographic" and it sounds plausible, it usually gets implemented.
So if the ability to validate was already missing even pre-LLM, what hope is there for validating the LLM-powered replacement? And what hope is there for the person doing the non-LLM version of keeping their job (at least until several quarters later, when the strategy either proves itself out or doesn't)?
How many other departments are there where the same lack of rigor already exists? Marketing, sales, HR... yeesh.
A few days ago, I asked free ChatGPT to tell me the head brewer of a small brewery in Corpus Christi. It told me the brewery didn't exist (it does; we were headed there a few minutes later), and after re-prompting it, it gave me some phone number it found in a business filing. (ChatGPT has been using web search for RAG for some time now.)
Hallucinations are still a massive problem IMO.
Maybe it is the reverse? It is not them offering a product; it is the users offering their interaction data, data which might be harvested for further training of the real deal, which is not the product. Think about it: they (companies like OpenAI) have created a broad and diverse user base which, without a second thought, feeds them up-to-date info about everything happening in the world, down to individual lives and even inner thoughts. No one in the history of mankind ever had such a holistic, almost god's-eye view. That is certainly something a superintelligence would be interested in. They may have achieved it already, and we are seeing one of its strategies playing out. Not saying they have, but this observation wouldn't be incompatible with that, nor does it indicate they haven't.
More like usher in climate catastrophe way ahead of schedule. AI-driven data center build outs are a major source of new energy use, and this trend is only intensifying. Dangerously irresponsible marketing cloaks the impact of these companies on our future.
People bring problems to the LLM, the LLM produces some text, and people use it and later return to iterate. This iteration functions as feedback on the LLM's earlier responses. If you judge an AI response by the next 20 or more rounds of interaction, you can gauge whether it was useful. They can create RLHF data this way, using hindsight or extra context from other related conversations the same user has had on the same topic. That works because users try the LLM's ideas in reality and bring the outcomes back to the model, or they simply recall from personal experience whether that approach would work. The system isn't just built to be right; it's built to be correctable by the user base, at scale.
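Roughly, as a toy sketch of that hindsight-scoring idea (my own illustration with made-up cue lists, not anything any lab has described; a real system would presumably use a trained classifier rather than keywords):

    # Score an assistant turn by scanning the user's follow-up messages
    # for outcome cues within a window of later turns.
    from dataclasses import dataclass

    @dataclass
    class Turn:
        role: str   # "user" or "assistant"
        text: str

    # Hypothetical cue lists, for illustration only.
    POSITIVE_CUES = ("that worked", "thanks, that fixed it", "runs now")
    NEGATIVE_CUES = ("didn't work", "doesn't exist", "still failing")

    def hindsight_score(turns, assistant_idx, window=20):
        """+1 / -1 per follow-up user turn containing a positive / negative cue."""
        score = 0.0
        for turn in turns[assistant_idx + 1 : assistant_idx + 1 + window]:
            if turn.role != "user":
                continue
            lowered = turn.text.lower()
            if any(cue in lowered for cue in POSITIVE_CUES):
                score += 1.0
            if any(cue in lowered for cue in NEGATIVE_CUES):
                score -= 1.0
        return score

    conversation = [
        Turn("user", "How do I migrate a VM from VMware to Hyper-V?"),
        Turn("assistant", "Use the --convert-to-hyperv flag."),   # hallucinated option
        Turn("user", "That flag doesn't exist, still failing."),
    ]
    print(hindsight_score(conversation, assistant_idx=1))   # -1.0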
OpenAI has 500M users; if they generate 1,000 tokens per user per day, that's 0.5T interactive tokens per day. The chat logs dwarf the original training set in size and are very diverse, targeted to our interests, and mixed with feedback. They are also "on policy" for the LLM, meaning they contain corrections to mistakes the LLM made, not generic information like a web scrape.
You're right that LLMs eventually might not even need to crawl the web; they have the whole of society dumping data into their open mouths. That did not happen with web search engines; only social networks did that in the past. But social networks are filled with our culture wars and self-conscious posing, while the chat room is an environment where we don't need to signal our group alignment.
Web scraping gives you humanity's external productions - what we chose to publish. But conversational logs capture our thinking process, our mistakes, our iterative refinements. Google learned what we wanted to find, but LLMs learn how we think through problems.
That’s not the case in my experience. Gemini is almost as good as Claude for most of the things I try.
That said, for queries that don't use agentic search or RAG, hallucination is as bad a problem as ever, and it won't improve because hallucination is all these models do. In Karpathy's phrase, they "dream text." Agentic search, RAG, and similar techniques disguise the issue because they stuff the model's context with real results, so there is less scope for it to go noticeably off the rails. But it's still very visible if you ask for references, links, etc.: many/most/sometimes all will be hallucinations, depending on the prompt.
For example, the model might propose "try doing X", and I come back later and say "I tried X but this and that happened"; it can use that as feedback. It might be feedback generated from the real-world outcome of the X suggestion, or even from my own experience; maybe I have seen X in practice and know whether it works. The longitudinal analysis can span multiple days; the more context, the better for self-analysis.
The cool thing is that generating preference scores for LLM responses, training a judge model on them, and then doing RLHF with this judge model on the base LLM ensures isolation, so personal data leaks might not be an issue. Another beneficial effect is that the judge model learns to transfer judgement skills across similar contexts, so there might be some generalization going on.
Of course there is always the risk of systematic bias and random noise in the data, but I believe AI researchers are equipped to deal with it. It won't be as simple as I described, but the size of the interaction dataset, the human in the loop, and real-world testing are certainly useful for LLMs.
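To make the judge-model step concrete, here's a minimal sketch under assumed data shapes (my guess at the shape of such a pipeline, not any lab's actual one). A tiny scikit-learn text classifier stands in for the judge/reward model, and the RLHF step itself is left out:

    # Hindsight-labeled interactions: ("prompt || response", did it work out?).
    # Labels like these would come from later turns, as sketched above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    examples = [
        ("migrate vmware to hyperv || use the --convert-to-hyperv flag", 0),
        ("migrate vmware to hyperv || export the VM and import it via the Hyper-V wizard", 1),
        ("tune fio || pass the made-up --turbo-io option", 0),
        ("tune fio || set iodepth and numjobs in the job file", 1),
    ]
    texts, labels = zip(*examples)

    # Train the "judge" (a stand-in reward model) on the hindsight labels.
    judge = make_pipeline(TfidfVectorizer(), LogisticRegression())
    judge.fit(texts, labels)

    # The judge scores fresh candidate responses; in the full pipeline these
    # scores would drive RLHF against the base model (not shown here).
    candidates = [
        "migrate vmware to hyperv || run the nonexistent vmware2hyperv tool",
        "migrate vmware to hyperv || convert the VMDK and attach it to a new Hyper-V VM",
    ]
    for text, score in zip(candidates, judge.predict_proba(candidates)[:, 1]):
        print(f"{score:.2f}  {text}")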
Another important group to remember is those who owned the infrastructure necessary for the prospectors to survive. The folks who owned (or strong-armed their way into) the services around housing, food, alcohol, etc. made off like bandits.