
251 points by slyall | 1 comment
DeathArrow ◴[] No.42058383[source]
I think neural nets are just a subset of machine learning techniques.

I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

I'm not saying that transformers, LLMs, deep learning and the other great things that have happened in the neural network space aren't very valuable, because they are.

But I think in the future we should also study other options which might be better suited than neural networks for some classes of problems.

Can a very large and expensive LLM do sentiment analysis or classification? Yes, it can. But so can a simple SVM or KNN, and sometimes better.
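
To make that concrete, here is a minimal sketch of the kind of lightweight classifier meant here, using scikit-learn's TF-IDF plus a linear SVM. The tiny inline texts and labels are placeholders, not a real dataset:

    # Minimal sentiment-classification sketch with a linear SVM (scikit-learn).
    # The inline texts/labels are placeholders; swap in real labeled data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["great movie, loved it", "terrible plot, waste of time",
             "pretty good overall", "boring and way too long"]
    labels = ["pos", "neg", "pos", "neg"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["surprisingly good", "awful acting"]))

A pipeline like this trains in seconds on a laptop for typical sentiment datasets, which is the cost point being made here.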

I saw some YouTube coders making calls to OpenAI's o1 model for some very simple classification tasks. That isn't the best tool for the job.

replies(10): >>42058980 #>>42059047 #>>42059100 #>>42059544 #>>42059813 #>>42060244 #>>42060447 #>>42060561 #>>42060833 #>>42062658 #
jasode ◴[] No.42059813[source]
>I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

But that's backwards from how new techniques and progress are made. What actually happens is somebody (maybe a student at a university) has an insight or a new idea for an algorithm that costs nearly $0 to implement as a proof of concept. Then everybody else notices the improvement, and the extra millions/billions get directed toward it.

New ideas -- that didn't cost much at the start -- ATTRACT the follow-on billions in investment.

This timeline of tech progress in computer science is the opposite of other disciplines such as materials science or the biomedical fields. Trying to discover the next super-alloy or cancer drug requires expensive experiments; manipulating atoms & molecules requires very expensive specialized equipment. In contrast, computer science experiments can be cheap. You just need a clever insight.

An example of that was the 2012 AlexNet image recognition algorithm that blew all the other approaches out of the water. Alex Krizhevsky had a new insight: run a convolutional neural network on CUDA. He bought 2 NVIDIA cards (GTX 580 3GB GPUs) from Amazon. It didn't require NASA levels of investment at the start to implement his idea. Once everybody else noticed his superior results, the billions began pouring in to iterate/refine on CNNs.

Both the "attention mechanism" and the refinement of the "transformer architecture" were also cheap to prove out at a very small scale. In 2014, Jakob Uszkoreit thought about an "attention mechanism" instead of RNNs and LSTMs for machine translation. It didn't cost billions to come up with that idea. Yes, ChatGPT-the-product cost billions, but the "attention mechanism" algorithm did not.

>into SVMs, random forests, KNN, etc.

If anyone has found an unknown insight into SVMs, KNN, etc. that everybody else in the industry has overlooked, they can run cheap experiments to prove it. E.g. the entire Wikipedia text download is currently only ~25GB. Run the new SVM classification idea on that corpus. Very low-cost experiments in computer science algorithms can still be done in the proverbial "home garage".
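
To make the "home garage" claim concrete, here is a hedged sketch of what such an experiment could look like: streaming a plain-text dump through a hashing vectorizer and training a linear SVM incrementally (hinge-loss SGD), so the corpus never has to fit in memory. The file name "corpus.txt" and the label_of() rule are hypothetical placeholders, not a real labeling scheme:

    # Sketch: stream a large text dump and train a linear SVM (hinge-loss SGD)
    # incrementally, so the experiment runs on ordinary hardware.
    # "corpus.txt" and label_of() are hypothetical placeholders.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vec = HashingVectorizer(n_features=2**20, alternate_sign=False)
    clf = SGDClassifier(loss="hinge")   # hinge loss = linear SVM objective
    classes = ["class_a", "class_b"]    # assumed label set

    def label_of(line):
        # Placeholder rule; a real experiment would use real labels.
        return "class_a" if len(line) % 2 == 0 else "class_b"

    with open("corpus.txt", encoding="utf-8") as f:
        batch, labels = [], []
        for line in f:
            batch.append(line)
            labels.append(label_of(line))
            if len(batch) == 10_000:
                clf.partial_fit(vec.transform(batch), labels, classes=classes)
                batch, labels = [], []
        if batch:
            clf.partial_fit(vec.transform(batch), labels, classes=classes)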

replies(3): >>42061648 #>>42063764 #>>42065288 #
1. scotty79 ◴[] No.42063764[source]
Do the transformer architecture and attention mechanism actually give any benefit to anything other than scalability?

I thought the main insights were embeddings, positional encoding and shortcuts through layers to improve backpropagation.
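
For reference, the "attention mechanism" in question is small enough to write out. A minimal NumPy sketch of scaled dot-product attention (single head, no masking; the shapes are chosen only for illustration):

    # Minimal scaled dot-product attention in NumPy (single head, no mask).
    # Q has shape (n_queries, d); K and V have shape (n_keys, d).
    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])           # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
        return weights @ V                                # weighted sum of values

    Q = np.random.randn(4, 8); K = np.random.randn(6, 8); V = np.random.randn(6, 8)
    print(attention(Q, K, V).shape)   # (4, 8)

The design point of this block is that every position can look at every other position in a single step rather than through a recurrence, which is also what lets it parallelize (and scale) so well.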