
267 points by slyall | 2 comments
gregw2
The article credits two academics (Hinton, Fei-Fei Li) and a CEO (Jensen Huang). But really it was three academics.

Jensen Huang, reasonably, was desperate for any market that could suck up more compute, one he could pivot to from gaming GPUs once gaming saturated its ability to use compute. Screen resolutions, visible polygons, and texture maps only demand so much compute; it's an S-curve like everything else. So from a marketing/market-development and capital-investment perspective I do think he deserves credit. Certainly the Intel guys struggled to recognize it similarly (and to execute even on plain GPUs).

But... the technical/academic insight behind the CUDA/GPU vision, in my view, came from Ian Buck's "Brook" PhD thesis at Stanford under Pat Hanrahan (Pixar and Tableau co-founder, Turing Award winner); Ian promptly took it to Nvidia, where it was commercialized under Jensen.
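
To make the Brook-to-CUDA lineage concrete: Brook expressed computation as kernels mapped over streams, and CUDA kept essentially that model, just exposed through C. A minimal sketch of the idea (my own illustration, not code from the thesis; all names here are made up):

    // Sketch of the stream/kernel model: one lightweight thread per element,
    // launched in bulk over the whole array (a SAXPY, y = a*x + y).
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));   // unified memory, for brevity
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // grid of 256-thread blocks
        cudaDeviceSynchronize();
        printf("y[0] = %f\n", y[0]);                     // expect 5.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

The point is not the particular kernel but that the programmer writes scalar-looking code and the hardware supplies the parallelism, which is what made GPUs approachable for non-graphics workloads.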

For a good telling of this under-told story, see one of Hanrahan's lectures at MIT: https://www.youtube.com/watch?v=Dk4fvqaOqv4

Corrections welcome.

1. a-dub
That's what I remember. I remember reading an academic paper about a cool hack where someone was getting the shaders in GPUs to do massively parallel, general-purpose vector ops. It was that orders-of-magnitude scaling that enabled neural networks to jump out of obscurity and into the limelight.

I remember that prior to that, support vector machines and RKHS methods were the hotness for continuous-signal-style ML tasks. They weren't particularly scalable, and transfer-learning formulations seemed quite complicated. (They were, however, pretty good for demos and contests.)

2. sigmoid10
You're probably thinking of this paper: https://ui.adsabs.harvard.edu/abs/2004PatRe..37.1311O/abstra...

They were running a massive neural network (by the standards of the time) on a GPU years before CUDA even existed. Even funnier, they demoed it on ATI cards. But it still took until 2012, and AlexNet making heavy use of CUDA's simpler interface, before the deep learning hype started to take off outside purely academic playgrounds.
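
For a sense of why the interface mattered: the workhorse op in both that 2004 paper and AlexNet is basically a big multiply-accumulate, and in CUDA it can be written as plain C, whereas the pre-CUDA trick was to pack the matrices into textures and phrase the same loop as a fragment shader. A rough, illustrative kernel (not code from either paper):

    // Naive dense layer, C = A (MxK) * B (KxN), one thread per output element.
    // Pre-CUDA GPGPU had to express this multiply-accumulate as a fragment
    // shader reading A and B out of textures; here it is ordinary C.
    __global__ void matmul(int M, int N, int K,
                           const float *A, const float *B, float *C) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < M && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[row * K + k] * B[k * N + col];   // row-major layout
            C[row * N + col] = acc;
        }
    }

    // Launched over a 2D grid, e.g.:
    //   dim3 block(16, 16);
    //   dim3 grid((N + 15) / 16, (M + 15) / 16);
    //   matmul<<<grid, block>>>(M, N, K, dA, dB, dC);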

So the insight came neither from Jensen nor from the others credited above, but they were the first ones to capitalise on it.