
251 points by slyall | 6 comments
kleiba ◴[] No.42061089[source]
> “Pre-ImageNet, people did not believe in data,” Li said in a September interview at the Computer History Museum. “Everyone was working on completely different paradigms in AI with a tiny bit of data.”

That's baloney. The old ML adage "there's no data like more data" is as old as mankind itself.

replies(6): >>42061617 #>>42061818 #>>42061987 #>>42063019 #>>42063076 #>>42064875 #
FrustratedMonky ◴[] No.42061617[source]
Not really. This is referring back to the 80's. People weren't even doing 'ML'. And back then people were more focused on teasing out 'laws' in as few data points as possible. The focus was more on formulas and symbols, and finding relationships between individual data points. Not the broad patterns we take for granted today.
replies(2): >>42062250 #>>42063993 #
mistrial9 ◴[] No.42063993[source]
The mid-90s had neural nets, even a few popular-science books about them. The common hardware was so much less capable then.
replies(1): >>42064954 #
1. sgt101 ◴[] No.42064954{3}[source]
mid-60's had neural nets.

mid-90's had LeCun telling everyone that big neural nets were the future.

replies(1): >>42065537 #
2. dekhn ◴[] No.42065537[source]
Mid 90s I was working on neural nets and other machine learning, based on gradient descent with manually computed derivatives, on genomic data (from what I can recall, we had no awareness of LeCun; I didn't find out about his great OCR results until much later). It worked fine and it seemed like a promising area.
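
A minimal sketch of what "gradient descent with manually computed derivatives" looks like in practice, assuming a toy least-squares loss standing in for the actual genomic model in this thread: the gradient is worked out on paper and coded directly, with no autodiff framework involved.

    # Hand-derived gradient descent on a toy least-squares problem.
    # Illustrative only; the real model discussed here was far more complex.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                 # 100 samples, 5 features
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ true_w + 0.1 * rng.normal(size=100)

    w = np.zeros(5)
    lr = 0.01
    for step in range(1000):
        residual = X @ w - y                      # prediction error
        grad = 2.0 * X.T @ residual / len(y)      # gradient of mean squared error, derived by hand
        w -= lr * grad                            # plain gradient descent update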

My only surprise is how long it took to get to ImageNet, but in retrospect I appreciate that a number of conditions had to be met (much more data, much better algorithms, much faster computers). I also didn't recognize just how poorly suited MLPs were for sequence modelling, compared to RNNs and transformers.

replies(1): >>42069033 #
3. sgt101 ◴[] No.42069033[source]
I'm so out of things! What do you mean by manually computed derivatives?
replies(2): >>42071400 #>>42072510 #
4. mistrial9 ◴[] No.42071400{3}[source]
It means that the code has to read values from each layer and do some summarizing math, instead of passing layer blocks to a graphics card in one primitive operation implemented on the card.
replies(1): >>42072523 #
5. dekhn ◴[] No.42072510{3}[source]
I mean we didn't know automatic differentiation was a thing, so we (my advisor, not me) analytically derived the partial derivatives of our loss function. After I wrote up my thesis, I spent a lot of time learning Mathematica and advanced calculus.

I haven't invested the time to take the loss function from our paper and implement it in a modern framework, but IIUC I wouldn't need to provide the derivatives manually. That would be a satisfying outcome (indicating I had wasted a lot of effort learning math that simply wasn't necessary, because somebody had automated it better than I could do manually, in a way I can understand more easily).
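
A sketch of that point, assuming a toy quadratic loss in place of the loss from the paper: with automatic differentiation in a modern framework (here jax.grad), only the loss function is written down and the partial derivatives come for free.

    # Automatic differentiation sketch: write the loss, get the gradient for free.
    # The toy quadratic loss stands in for the (much more complex) loss from the paper.
    import jax
    import jax.numpy as jnp

    def loss(w, X, y):
        return jnp.mean((X @ w - y) ** 2)

    grad_fn = jax.grad(loss)            # d(loss)/dw, no manual calculus required

    X = jnp.ones((10, 3))
    y = jnp.zeros(10)
    w = jnp.array([0.5, -1.0, 2.0])
    print(grad_fn(w, X, y))             # gradient of the loss w.r.t. w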

6. dekhn ◴[] No.42072523{4}[source]
No. I should have said "determined the partial derivatives of the loss with respect to the weights analytically". We didn't have layers; the whole architecture was a truly crazy combination of dynamic programming with multiple different matrices and a loss function that combined many different types of evidence. AFAICT nobody does any of this any more for finding genes. We just take enormous amounts of genetic data and run an autoencoder or a sequence model over it.