578 points huseyinkeles | 26 comments
1. karimf ◴[] No.45569878[source]
I've always thought about the best way to contribute to humanity as: (number of people you help) x (how much you help them). I think what Karpathy is doing is one of the highest-leverage ways to achieve that.

Our current world is built on top of open source projects. This is possible because there are a lot of free resources for learning to code, so anyone from anywhere in the world can learn and make a great piece of software.

I just hope the same will happen with the AI/LLM wave.

replies(11): >>45571834 #>>45571836 #>>45571900 #>>45571959 #>>45571975 #>>45572208 #>>45572425 #>>45572536 #>>45572555 #>>45572584 #>>45572596 #
2. viccis ◴[] No.45571834[source]
I recommend his ANN/LLM-from-scratch videos to people a lot because not only is he a clear instructor, but his code tends to be very Pythonic and strikes just the right balance of terse and readable (not counting the PyTorch vectorization stuff, but that's not his fault; it's just complex). So I think people benefit just from watching and imitating his code style.
3. croes ◴[] No.45571836[source]
I'm afraid the technology will do more damage because many people will abuse it for fake news and misinformation.
replies(1): >>45572210 #
4. shafyy ◴[] No.45571900[source]
If only it were so easy.
6. bkettle ◴[] No.45571975[source]
This free tradition is, I think, one of the things I love so much about software, but I don't see how it can continue with LLMs, given the extremely high training costs and the powerful hardware required for inference. It just seems like writing software will necessarily require paying rent to the LLM hosts to keep up. I guess it's possible that we'll figure out a way to do local inference that is accessible to everyone, the way most other modern software tools are, but the high training costs make that seem unlikely to me.

I also worry that as we rely on LLMs more and more, we will stop producing the kind of tutorials and other content aimed at beginners that makes it so easy to pick up programming the manual way.

replies(2): >>45572063 #>>45572731 #
7. hodgesrm ◴[] No.45572063[source]
This. It looks like one of the keys to maintaining open source is to ensure OSS developers have access to capable models. In the best of worlds, LLM vendors would recognize that open source software is the commons that feeds their models and ensure it flourishes.

In the real world...

8. martin-t ◴[] No.45572208[source]
As noble as the goal sounds, I think it's wrong.

Software is just a tool. Much like a hammer, a knife, or ammonium nitrate, it can be used for both good and bad.

I say this as someone who has spent almost 15 years writing software in my free time and publishing it as open source: building software and allowing anyone to use it does not automatically make other people's lives better.

A lot of my work has been used for bad purposes or what some people would consider bad purposes - cheating on tests, cheating in games, accessing personal information without permission, and in one case my work contributed to someone's doxxing. That's because as soon as you publish it, you lose control over it.

But at least with open source software, every person can use it to the same extent, so if the majority of people are good, the result is likely to be more positive than negative.

With what is called AI today, only the largest corporations can afford to train the models, which means the models are controlled by people who have entirely different incentives from the general working population, many of whom have quite obvious antisocial personality traits.

At least 2 billion people live in dictatorships. AI has the potential to become a tool of mass surveillance and total oppression from which those countries will never recover, because just as models can detect that a woman is pregnant before she knows it, they will detect a dissenter long before dissent turns into resistance.

I don't have high hopes for AI to be a force for good, and teaching people how toy models work, as fun as it is, is not gonna change that.

replies(3): >>45572393 #>>45572401 #>>45572770 #
9. IntrepidPig ◴[] No.45572210[source]
Yeah it feels similar to inventing the nuke. Or it’s even more insidious because the harmful effects of the tech are not nearly as obvious or immediate as the good effects, so less restraint is applied. But also, similar to the nuke, once the knowledge on how to do it is out there, someone’s going to use it, which obligates everyone else to use it to keep up.
10. isaacremuant ◴[] No.45572393[source]
> At least 2 billion people live in dictatorships. AI has the potential to become a tool of mass surveillance and total oppression from which those countries will never recover because just like the models can detect a woman is pregnant before she knows it, it will detect a dissenter long before dissent turns into resistance.

It already works like this in your precious Western democracies, and they didn't need AI to be authoritarian total-surveillance states in spirit, with quite a lot of support from a propagandized populace that begged for, or pretended to agree with, the infringement of their civil rights because of terrorism, drugs, covid, or protecting the poor, poor children.

You can combat tech with legislation and culture, but the legislation and culture were way ahead of the tech in being extremely authoritarian in the first place.

11. oliveiracwb ◴[] No.45572401[source]
I would genuinely love to think otherwise. But I've grown up seeing good things used in stupid ways (not necessarily out of malice).
12. carlcortright ◴[] No.45572425[source]
strong +1 - developers like him are heroes
13. epolanski ◴[] No.45572536[source]
Then a single person who has learned those skills decides to poison all of us, thanks to the skills acquired.
14. contingencies ◴[] No.45572555[source]
While documenting a build path is nice, IMHO renting hardware nobody can afford from VC-backed cloud providers, paying cold hard cash to produce clones of legacy tech on toy datasets under the guise of education, is propping up the AI bubble and primarily helping institutional shareholders in those AI bubble companies, particularly their hardware supplier NVidia. Personally, I do not see this as helping people or humanity.

This would sit better with me if the repo included a first-tier use case for local execution, a non-NVidia hardware reference, etc.
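To be concrete, a minimal sketch of what such a local-execution path might look like, assuming stock PyTorch and nothing nanochat-specific: pick whatever accelerator is present and fall back to CPU, rather than hard-wiring CUDA.

    import torch

    def pick_device() -> torch.device:
        # Prefer CUDA if present (NVidia, or AMD via ROCm builds of PyTorch).
        if torch.cuda.is_available():
            return torch.device("cuda")
        # Then Apple Silicon's Metal backend.
        if torch.backends.mps.is_available():
            return torch.device("mps")
        # Always fall back to plain CPU so the code runs anywhere.
        return torch.device("cpu")

    device = pick_device()
    x = torch.randn(4, 4, device=device)
    print(device, x.sum().item())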

replies(3): >>45572724 #>>45572753 #>>45572792 #
15. Yizahi ◴[] No.45572584[source]
I would adjust your formula to this:

(number of people you help x how much you help them) - (number of people you harm x how much you harm them)

For example: harming a little bit all the content creators of the world by stealing their work without compensation or permission. How much does that cost globally, year after year? How do we even quantify the long-term consequences of that? Stuff like that.
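A back-of-the-envelope version of that adjusted formula, with made-up numbers purely to show its shape (the names and figures are mine, not anyone's real accounting):

    # Net impact: help produced minus harm produced. Illustrative numbers only.
    helped, help_per_person = 1_000_000, 0.5   # many people, small benefit each
    harmed, harm_per_person = 50_000, 2.0      # fewer people, larger harm each
    net_impact = helped * help_per_person - harmed * harm_per_person
    print(net_impact)  # 400000.0, positive only because the help outweighs the harm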

17. jstummbillig ◴[] No.45572724[source]
I think you got your proportions slightly wrong there. This will be contributing about as much to an AI bubble as a kid tinkering around with combustion contributes to global warming.
replies(1): >>45572778 #
18. levocardia ◴[] No.45572731[source]
There's a Stephen Boyd quote that's something like "if your optimization problem is too computationally expensive, just go on vacation to Greece for a few weeks, and by the time you get back, computers might be fast enough to solve it." With LLMs there's sort of an equivalent situation with cost: how mind-blowing would it have been to train this kind of LLM at all even just 4 years ago? And today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. So if you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two (very slow) gradient-descent iterations on your machine to make sure it works, then download a pre-trained version from someone who could spare the compute, as sketched below.
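A minimal sketch of that workflow in stock PyTorch; the tiny stand-in model and the checkpoint filename are illustrative placeholders, not the actual nanochat code:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for a from-scratch model (a real one would be a transformer).
    vocab, dim = 256, 32
    model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    # Step 1: run a couple of slow training steps just to prove the plumbing
    # works: the loss should be finite and gradients should flow.
    for step in range(2):
        x = torch.randint(0, vocab, (8,))  # fake token batch
        y = torch.randint(0, vocab, (8,))  # fake next-token targets
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(step, loss.item())

    # Step 2: skip the expensive part entirely by loading weights that
    # someone with spare compute trained and published (path is hypothetical).
    # state = torch.load("pretrained.pt", map_location="cpu")
    # model.load_state_dict(state)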

replies(1): >>45572897 #
19. simonw ◴[] No.45572753[source]
"This would sit better with me if the repo included a first tier use case for local execution, non-NVidia hardware reference, etc."

This is a pretty disheartening way to respond to something like this. Someone puts a great deal of effort into giving something interesting away for free, and is told "you should have also done THIS work for free as well in order for me to value your contribution".

replies(1): >>45572839 #
20. simonw ◴[] No.45572770[source]
"With what is called AI today, only the largest corporations can afford to train the models"

I take it you're very positive about Andrej's new project then, which allows anyone to train, for a few hundred dollars, a model comparable to the state of the art from just 5 years ago.

21. contingencies ◴[] No.45572778{3}[source]
Not really. Anything that guy does sets the tone for an extended cacophony of fans and followers. It would be a sad day if nobody critically assessed the motivations, effects, and framing of those moves. I question the claim that this move helps humanity and stand by the assessment that it's just more feeding of an unfree ecosystem, which equates to propping up the bubble.
22. CamperBob2 ◴[] No.45572792[source]
If you can't afford $100, or can't learn how to train it locally with more time and less money, then this isn't something you should be focusing on at all.
replies(1): >>45572882 #
23. contingencies ◴[] No.45572839{3}[source]
It is an objective and transparent response based on free-software-world norms. Feel free to interpret it differently and to be disheartened. Hell, many of us are disheartened by the AI VC political theater we are seeing right now: experienced programmers, artists, lawyers, perhaps much of humanity. Let's stick to objective elements of the discussion, not emotional opining.
24. contingencies ◴[] No.45572882{3}[source]
It is amusing to note the dichotomy between the clearly compassionate, empathetic and altruistic perspective displayed here and the comically overstated framing of helping humanity.
25. dingnuts ◴[] No.45572897{3}[source]
> today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

No, it's extremely hard to imagine, since I used one of Karpathy's own models to run a basic chat bot like six years ago. Yes, it spoke nonsense; so did my GPT-2 fine-tune four years ago, and so does this.

And so does ChatGPT.

Improvement is linear at best. I still think it's actually a log curve and GPT-3 was the peak of the "fun" part of the curve. The only evidence I've seen otherwise is bullshit benchmarks, "agents" that increase performance 2x by increasing token usage 100x, and excited salesmen proclaiming the imminence of AGI.

replies(1): >>45573051 #
26. simonw ◴[] No.45573051{4}[source]
Apparently 800 million weekly users are finding ChatGPT useful in its present state.