
200 points | baylearn
empiko ◴[] No.44471933[source]
Observe what the AI companies are doing, not what they are saying. If they expected to achieve AGI soon, their behaviour would be completely different. Why bother developing chatbots or doing sales when you will be operating an AGI in a few short years? Surely all resources should go towards that goal, as it is supposed to usher humanity into a new prosperous age (somehow).
replies(9): >>44471988 #>>44471991 #>>44472148 #>>44472874 #>>44473259 #>>44473640 #>>44474131 #>>44475570 #>>44476315 #
Lichtso ◴[] No.44475570[source]
> Why bother developing chatbots

Maybe it is the reverse: it is not them offering a product, it is the users offering their interaction data. Data which might be harvested for further training of the real deal, which is not the product. Think about it: they (companies like OpenAI) have created a broad and diverse user base which, without a second thought, feeds them up-to-date info about everything happening in the world, down to individual lives and even people's inner thoughts. No one in the history of mankind has ever had such a holistic, almost god's-eye view. That is certainly something a superintelligence would be interested in. They may have achieved it already and we are seeing one of its strategies playing out. I'm not saying they have, but this observation would not be incompatible with it, nor does it indicate they haven't.

replies(3): >>44476075 #>>44476079 #>>44478319 #
visarga ◴[] No.44478319[source]
It's not about achieving AGI as a final product; it's about building a perpetual learning machine fueled by real-time human interaction. I call it the human-AI experience flywheel.

People bring problems to the LLM, the LLM produces some text, people use it and later return to iterate. That iteration functions as feedback on the LLM's earlier responses. If you judge an AI response by the next 20 or more rounds of interaction, you can gauge whether it was useful. They can create RLHF data this way, using hindsight or extra context from other related conversations of the same user on the same topic. This works because users try the LLM's ideas in the real world and bring the outcomes back to the model, or they simply recall from personal experience whether that approach would work. The system isn't just built to be right; it's built to be correctable by the user base, at scale.
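
Concretely, a minimal sketch of what that hindsight scoring might look like (all names and the toy judge below are my own illustration, not anyone's actual system):

    # Score an earlier assistant turn by what happened in the next N turns of
    # the same conversation. "judge" is a stand-in for any preference/reward
    # model; none of these names come from a real pipeline.
    from dataclasses import dataclass

    @dataclass
    class Turn:
        role: str   # "user" or "assistant"
        text: str

    def hindsight_score(history: list[Turn], idx: int, judge, window: int = 20) -> float:
        """Rate history[idx] (an assistant turn) using the following `window` turns."""
        response = history[idx]
        followup = history[idx + 1 : idx + 1 + window]
        return judge(response.text, [t.text for t in followup])

    # Trivial stand-in judge: did later user turns report success or failure?
    def toy_judge(response: str, followup: list[str]) -> float:
        text = " ".join(followup).lower()
        if "didn't work" in text or "error" in text:
            return 0.0
        if "worked" in text or "thanks" in text:
            return 1.0
        return 0.5

Pairs of (response, score) collected this way would be exactly the RLHF preference data described above.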

OpenAI has roughly 500M users; if each generates 1000 tokens/day, that is 0.5T interactive tokens/day. The chat logs dwarf the original training set in size, and they are very diverse, targeted to our interests, and mixed with feedback. They are also "on policy" for the LLM, meaning they contain corrections to mistakes the LLM actually made, not generic information like a web scrape.
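
Just to make the back-of-envelope explicit (both inputs are rough assumptions, not published figures):

    users = 500_000_000              # assumed user count
    tokens_per_user_per_day = 1_000  # assumed interaction volume
    print(users * tokens_per_user_per_day)  # 500,000,000,000, i.e. about 0.5T tokens/day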

You're right that LLMs might eventually not even need to crawl the web; they have whole societies dumping data into their open mouths. That never happened with web search engines; only social networks did that in the past. But social networks are filled with our culture wars and self-conscious posing, while the chat room is an environment where we don't need to signal our group alignment.

Web scraping gives you humanity's external productions - what we chose to publish. But conversational logs capture our thinking process, our mistakes, our iterative refinements. Google learned what we wanted to find, but LLMs learn how we think through problems.

replies(1): >>44478430 #
FuckButtons ◴[] No.44478430[source]
I see where you're coming from, but I think teasing out something that looks like a clear objective function, one that generalizes to improved intelligence, from LLM interaction logs is going to be hellishly difficult. Consider that most of the best LLM pretraining comes from being very, very judicious with the training data. Selecting the right corpus of LLM interaction logs and then defining an objective function that correctly models... what? Being helpful? That sounds far harder than just working from scratch with RLHF.
replies(1): >>44479317 #
visarga ◴[] No.44479317[source]
The way I see it is to use hindsight, not to come up with predefined criteria. The criterion is the usefulness of one LLM response in the interactions that follow it down the line.

For example, the model might propose "try doing X", and I come back later and say "I tried X but this and that happened"; it can use that as feedback. It might be feedback generated from the real-world outcomes of the X suggestion, or even from my own experience; maybe I have seen X in practice and know whether it works. The longitudinal analysis can span multiple days; the more context, the better for self-analysis.
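
A rough sketch of the longitudinal part, i.e. linking an outcome report from a later session back to the earlier suggestion by the same user on the same topic (the data model and the "topic" field are just assumptions for illustration):

    from dataclasses import dataclass

    @dataclass
    class Message:
        user_id: str
        topic: str
        day: int
        role: str    # "assistant" (suggestion) or "user" (outcome report)
        text: str

    def pair_suggestions_with_outcomes(log: list[Message], max_gap_days: int = 7):
        """Yield (suggestion, outcome) pairs that could later train a judge model."""
        suggestions = [m for m in log if m.role == "assistant"]
        reports = [m for m in log if m.role == "user"]
        for s in suggestions:
            for r in reports:
                if (r.user_id == s.user_id and r.topic == s.topic
                        and 0 < r.day - s.day <= max_gap_days):
                    yield s, r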

The cool thing is that generating preference scores for LLM responses, training a judge model on them, and then doing RLHF with this judge model on the base LLM ensures isolation, so personal data leaks might not be an issue. Another beneficial effect is that the judge model learns to transfer judgment skills across similar contexts, so there might be some generalization going on.
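
In outline, the three stages might look like this (every function body is a stub; this is a sketch of the idea, not any vendor's actual setup):

    def collect_preferences(chat_logs):
        # Stage 1: hindsight-label responses from later turns in the logs.
        # Placeholder: assume each entry already carries a (response, score) pair.
        return list(chat_logs)

    def train_judge(preferences):
        # Stage 2: fit a judge/reward model on the labelled responses.
        # Only this stage touches raw user logs, which is the isolation point:
        # the base LLM is later tuned against the judge, not the logs themselves.
        def judge(response_text: str) -> float:
            return 0.5  # placeholder score
        return judge

    def rlhf_finetune(base_model, judge):
        # Stage 3: optimise the base model against the judge's scores
        # (PPO- or DPO-style training in a real system).
        return base_model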

Of course there is always the risk of systematic bias and random noise in the data, but I believe AI researchers are equipped to deal with that. It won't be as simple as I described, but the size of the interaction dataset, the human in the loop, and the real-world testing are certainly useful for LLMs.