346 points swatson741 | 2 comments
gchadwick ◴[] No.45788468[source]
Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material, from short articles like this to longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

replies(2): >>45788631 #>>45788885 #
throwaway290 ◴[] No.45788631[source]
And to all the LLM heads here, this is his work process:

> Yesterday I was browsing for a Deep Q Learning implementation in TensorFlow (to see how others deal with computing the numpy equivalent of Q[:, a], where a is an integer vector — turns out this trivial operation is not supported in TF). Anyway, I searched “dqn tensorflow”, clicked the first link, and found the core code. Here is an excerpt:

Notice how it's "browse" and "search", not just "I asked ChatGPT". Notice how it led him to spot a bug.

replies(2): >>45788657 #>>45788753 #
stingraycharles ◴[] No.45788657[source]
First of all, this is not a competition over whether LLMs are better than search.

Secondly, the article is from 2016; ChatGPT didn’t exist back then.

replies(1): >>45788723 #
code51 ◴[] No.45788723[source]
I doubt he's letting LLMs creep into his decision-making in 2025, aside from fun side projects (vibes). We never come across Karpathy going to an LLM, or saying that an LLM helped him, in any of his YouTube videos about building LLMs.

He's just test driving LLMs, nothing more.

Nobody's asking this core question in podcasts. "How much and how exactly are you using LLMs in your daily flow?"

I'm guessing it's like actors not wanting to watch their own movies.

replies(3): >>45788765 #>>45788773 #>>45788777 #
mquander ◴[] No.45788773[source]
Karpathy talking for 2 hours about how he uses LLMs:

https://www.youtube.com/watch?v=EWvNQjAaOHw

replies(1): >>45789070 #
code51 ◴[] No.45789070[source]
Vibing, not firing at his ML problems.

He's doing a capability check in this video (for the general audience, which is good of course), not attacking a hard problem in the ML domain.

Despite this tweet: https://x.com/karpathy/status/1964020416139448359 , I've never seen him cite an LLM as having helped him in ML work.

replies(1): >>45789767 #
soulofmischief ◴[] No.45789767[source]
You're free to believe whatever fantasy you wish, but as someone who frequently consults an LLM alongside other resources when thinking about complex and abstract problems, there is no way in hell that Karpathy intentionally limits his options by excluding LLMs when seeking knowledge or understanding.

If he did not believe in the capability of these models, he would be doing something else with his time.

replies(1): >>45790322 #
strogonoff ◴[] No.45790322[source]
One can believe in the capability of a technology but on principle refuse to use implementations of it built on ethically flawed approaches (e.g., violating GPL license terms and/or copyright, thus harming the open source ecosystem).
replies(2): >>45791892 #>>45791914 #
soulofmischief ◴[] No.45791914[source]
What you see as copyright violation, I see as liberation. I have open models running locally on my machine that would have felled kingdoms in the past.
replies(1): >>45802214 #
strogonoff ◴[] No.45802214[source]
I personally see no issue with individuals training and running open local models. When corporations run scrapers and expropriate IP at an industrial scale, then charge for using the resulting models, it is different.
replies(1): >>45803486 #
1. soulofmischief ◴[] No.45803486[source]
What about Meta and the commercially licensed family of Llama open-weight models?
replies(1): >>45807938 #
2. strogonoff ◴[] No.45807938[source]
I have not researched this closely enough, but I think it falls under what corporations do. The models are commercially licensed, you cannot use them freely, and crucially they were trained on data scraped at an industrial scale, contributing to the degradation of the Web for humans.