251 points slyall | 26 comments
1. DeathArrow ◴[] No.42058383[source]
I think neural nets are just a subset of machine learning techniques.

I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

I'm not saying that transformers, LLMs, deep learning and the other great things that happened in the neural network space aren't very valuable, because they are.

But I think in the future we should also study other options which might be better suited than neural networks for some classes of problems.

Can a very large and expensive LLM do sentiment analysis or classification? Yes, it can. But so can simple SVMs and KNN, and sometimes they do it even better.

I saw some YouTube coders making calls to OpenAI's o1 model for some very simple classification tasks. That isn't the best tool for the job.
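
For illustration, a minimal scikit-learn sketch of the kind of classical baseline meant here; the toy training texts and labels are invented, and in practice you'd fit on a real labelled corpus:

    # Sketch: TF-IDF + linear SVM as a cheap sentiment classifier.
    # Requires scikit-learn; the tiny training set is made up for illustration.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    texts = ["great product, loved it", "terrible, waste of money",
             "works fine, would buy again", "broke after one day"]
    labels = ["pos", "neg", "pos", "neg"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)

    print(clf.predict(["absolutely loved it", "what a waste"]))  # e.g. ['pos' 'neg']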

replies(10): >>42058980 #>>42059047 #>>42059100 #>>42059544 #>>42059813 #>>42060244 #>>42060447 #>>42060561 #>>42060833 #>>42062658 #
2. Meloniko ◴[] No.42058980[source]
And based on what, though, do you think that?

I think neural networks are fundamental, and we will focus/experiment a lot more with architectures, layers and the other parts involved, but emergent features arise through size.

3. mentalgear ◴[] No.42059047[source]
KANs (Kolmogorov-Arnold Networks) are one example of a promising exploration pathway to real AGI, with the advantage of full explainability.
replies(1): >>42059624 #
4. trhway ◴[] No.42059100[source]
>I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

people did that to horses. No car resulted from it, just slightly better horses.

>I saw some YouTube coders doing calls to OpenAI's o1 model for some very simple classification tasks. That isn't the best tool for the job.

This "not best tool" is just there for the coders to call while the "simple SVMs and KNN" would require coding and training by those coders for the specific task they have at hand.

replies(1): >>42060054 #
5. empiko ◴[] No.42059544[source]
Deep learning is easy to adapt to various domains, use cases, and training criteria. Other approaches do not have the flexibility of combining arbitrary layers and subnetworks and then training them with arbitrary loss functions. The depth in deep learning is also pretty important, as it allows the model to build hierarchical representations of the inputs.
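
As a rough sketch of that flexibility (PyTorch assumed; all sizes and the loss choice are arbitrary placeholders): subnetworks compose freely and train end to end against whatever differentiable criterion you pick.

    # Sketch: two subnetworks composed into one model, trained with an arbitrary loss.
    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    head    = nn.Sequential(nn.ReLU(), nn.Linear(16, 1))
    model   = nn.Sequential(encoder, head)        # swap layers/subnets at will

    loss_fn = nn.SmoothL1Loss()                   # or any other differentiable loss
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x, y = torch.randn(8, 32), torch.randn(8, 1)  # dummy batch
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()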
replies(1): >>42060613 #
6. astrange ◴[] No.42059624[source]
"Explainable" is a strong word.

As a simple example, if you ask a question and part of the answer is directly quoted from a book from memory, that text is not computed/reasoned by the AI and so doesn't have an "explanation".

But I also suspect that any AGI would necessarily produce answers it can't explain. That's called intuition.

replies(1): >>42059743 #
7. diffeomorphism ◴[] No.42059743{3}[source]
Why? If I ask you what the height of the Empire State Building is, then a reference is a great, explainable answer.
replies(1): >>42061157 #
8. jasode ◴[] No.42059813[source]
>I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

But that's backwards from how new techniques and progress are made. What actually happens is that somebody (maybe a student at a university) has an insight or a new idea for an algorithm that costs nearly $0 to implement as a proof of concept. Then everybody else notices the improvement, and the extra millions/billions get directed toward it.

New ideas -- which didn't cost much at the start -- ATTRACT the follow-on billions in investment.

This timeline of tech progress in computer science is the opposite of other disciplines such as materials science or the biomedical fields. Trying to discover the next superalloy or cancer drug requires expensive experiments. Manipulating atoms & molecules requires very expensive specialized equipment. In contrast, computer science experiments can be cheap. You just need a clever insight.

An example of that was the 2012 AlexNet image recognition algorithm that blew all the other approaches out of the water. Alex Krizhevsky had a new insight: a convolutional neural network running on CUDA. He bought 2 NVIDIA cards (GTX 580 3GB GPUs) from Amazon. It didn't require NASA levels of investment at the start to implement his idea. Once everybody else noticed his superior results, the billions began pouring in to iterate/refine on CNNs.

Both the "attention mechanism" and the refinement of "transformer architecture" were also cheap to prove out at a very small scale. In 2014, Jakob Uszkoreit thought about an "attention mechanism" instead of RNN and LSTM for machine translation. It didn't cost billions to come up with that idea. Yes, ChatGPT-the-product cost billions but the "attention mechanism algorithm" did not.

>into SVMs, random forests, KNN, etc.

If anyone has found an insight into SVMs, KNN, etc. that everybody else in the industry has overlooked, they can do cheap experiments to prove it. E.g. the entire Wikipedia text download is currently only ~25GB; run the new SVM classification idea on that corpus. Very low-cost experiments in computer science algorithms can still be done in the proverbial "home garage".

replies(3): >>42061648 #>>42063764 #>>42065288 #
9. guappa ◴[] No.42060054[source]
[citation needed]
10. edude03 ◴[] No.42060244[source]
Transformers were made for machine translation - someone had the insight that when going from one language to another, the context mattered, such that the tokens that came before would bias which ones came after. It just so happened that transformers were more performant on other tasks, and at the time you could demonstrate the improvement at a small scale.
11. ldjkfkdsjnv ◴[] No.42060447[source]
This is such a terrible opinion. I'm so tired of reading the LLM deniers.
12. f1shy ◴[] No.42060561[source]
> neural nets are just a subset of machine learning techniques.

Fact by definition

13. f1shy ◴[] No.42060613[source]
But it is very hard to validate for important or critical applications.
14. dr_dshiv ◴[] No.42060833[source]
The best tool for the job is, I’d argue, the one that does the job most reliably for the least amount of money. When you consider how little expertise or data you need to use OpenAI's offerings, I’d be surprised if sentiment analysis using classical ML methods is actually better (unless you are an expert and have a good dataset).
15. astrange ◴[] No.42061157{4}[source]
It wouldn't be a reference; "explanation" for an LLM means it tells you which of its neurons were used to create the answer, i.e. what internal computations it did and which parts of the input it read. Their architecture isn't capable of referencing things.

What you'd get is an explanation saying "it quoted this verbatim", or possibly "the top neuron is used to output the word 'State' after the word 'Empire'".

You can try out a system here: https://monitor.transluce.org/dashboard/chat

Of course the AI could incorporate web search, but then what if the explanation is just "it did a web search and that was the first result"? It seems pretty difficult to recursively make every external tool also explainable…
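
(Not the linked tool, but for a flavour of the underlying idea, here is a sketch of recording which hidden units fire for a given input via a forward hook. The tiny model is a stand-in; with a real LLM you would hook its transformer blocks.)

    # Sketch: record a layer's activations with a PyTorch forward hook.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    activations = {}

    def record(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    model[1].register_forward_hook(record("relu"))  # watch the hidden units

    x = torch.randn(1, 16)                          # stand-in for an embedded input
    model(x)

    top = activations["relu"].squeeze(0).topk(5)
    print(top.indices.tolist())                     # indices of the most active "neurons"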

replies(2): >>42061585 #>>42061651 #
16. Retric ◴[] No.42061585{5}[source]
LLMs are not the only possible option here. When talking about AGI, none of what we are doing currently is that promising.

The search is for something that can write an essay, drive a car, and cook lunch, so we need something new.

replies(1): >>42064107 #
17. FrustratedMonky ◴[] No.42061648[source]
"$0 cost to implement a proof-of concept"

This falls apart for breakthroughs where the proof of concept is not zero-cost.

I think that is what the parent is referring to: that other technologies might have more potential, but would take money to build out.

18. diffeomorphism ◴[] No.42061651{5}[source]
Then you should have a stronger notion of "explanation". Why were these specific neurons activated?

Simplest example: OCR. A network identifying digits can often be explained as recognizing lines, curves, numbers of segments, etc. That is an explanation, not "computer says it looks like an 8".

replies(1): >>42065185 #
19. jensgk ◴[] No.42062658[source]
> I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

From my perspective, that is actually what happened between the mid-90s and 2015. Neural networks were dead in that period, but every other ML method was very, very hot.

20. scotty79 ◴[] No.42063764[source]
Do the transformer architecture and attention mechanisms actually give any benefit to anything other than scalability?

I thought the main insights were embeddings, positional encoding, and shortcuts through layers to improve backpropagation.

21. Vampiero ◴[] No.42064107{6}[source]
When people talk about explainability I immediately think of Prolog.

A Prolog query is explainable precisely because, by construction, it itself is the explanation. And you can go step by step and understand how you got a particular result, inspecting each variable binding and predicate call site in the process.

Despite all the billions being thrown at modern ML, no one has managed to create a model that does something like what Prolog does with its simple recursive backtracking.

So the moral of the story is that you can 100% trust the result of a Prolog query, but you can't ever trust the output of an LLM. Given that, which technology would you rather use to build software on which lives depend?

And which of the two methods is more "artificially intelligent"?
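
To make the contrast concrete, here is a toy Python sketch (not Prolog itself; the facts and rule are invented) of a recursive, backtracking query whose trace of bindings is its own explanation:

    # Sketch: grandparent(X, Z) :- parent(X, Y), parent(Y, Z), resolved by
    # simple recursion over explicit facts. Every binding is inspectable.
    FACTS = [
        ("parent", "tom", "bob"),
        ("parent", "bob", "ann"),
        ("parent", "bob", "pat"),
    ]

    def parent(x, y):
        # Yield (x, y) pairs matching the parent/2 facts; None means "unbound".
        for _, a, b in FACTS:
            if (x is None or x == a) and (y is None or y == b):
                yield a, b

    def grandparent(x, z):
        for a, y in parent(x, None):      # bind Y
            for _, c in parent(y, z):     # bind/check Z
                yield {"X": a, "Y": y, "Z": c}

    for binding in grandparent("tom", None):
        print(binding)  # {'X': 'tom', 'Y': 'bob', 'Z': 'ann'}, then ... 'pat'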

replies(1): >>42070201 #
22. krisoft ◴[] No.42065185{6}[source]
But can humans do that? If you show someone a picture of a cat, can they "explain" why is it a cat and not a dog or a pumpkin?

And is that explanation actually how they obtained the "cat-ness" of the picture, or do they just see that it is a cat immediately and obviously, and when you ask them for an explanation they come up with some explaining noises until you are satisfied?

replies(2): >>42067149 #>>42067384 #
23. DeathArrow ◴[] No.42065288[source]
True, you might not need lots of money to test some ideas. But LLMs and transformers are all the rage, so they gather all the attention and research funds.

People don't even think of doing anything else, and those who might are paid to pursue research on LLMs.

24. diffeomorphism ◴[] No.42067149{7}[source]
Wild cat, house cat, lynx,...? Sure, they can. They will tell you about proportions, shape of the ears, size as compared to other objects in the picture etc.

For cat vs pumpkin they will think you are making fun of them, but it very much is explainable. Though now I am picturing a puzzle about finding orange cats in a picture of a pumpkin field.

25. fragmede ◴[] No.42067384{7}[source]
Shown a picture of a cloud, why it looks like a cat sometimes does need an explanation before others can see the cat, and it's not just "explaining noises".
26. astrange ◴[] No.42070201{7}[source]
The site I linked above does that for LLaMa 8B.

https://transluce.org/observability-interface

LLMs don't have enough self-awareness to produce really satisfying explanations though, no.