
296 points by todsacerdoti | 3 comments
cheesecompiler:
The reverse is possible too: throwing massive compute at a problem can mask the existence of a simpler, more general solution. General-purpose methods tend to win out over time—but how can we be sure they’re truly the most general if we commit so hard to one paradigm (e.g. LLMs) that we stop exploring the underlying structure?
api:
CS is full of trivial examples of this. You can use an optimized parallel SIMD merge sort to sort a huge list of ten trillion records, or you can sort it just as fast with a bubble sort if you throw more hardware at it.

The real bitter lesson in AI is that we don't really know what we're doing. We're hacking on models, looking for architectures that train well, but we don't fully understand why they work. And because we don't fully understand them, we can't design anything optimal or know how good a solution can possibly get.

xg15:
> You can use an optimized parallel SIMD merge sort to sort a huge list of ten trillion records, or you can sort it just as fast with a bubble sort if you throw more hardware at it.

Well, technically, that's not true: the entire point of complexity theory is that there are tasks you can't throw more hardware at, at least not for interesting problem sizes or remotely feasible amounts of hardware. Bubble sort needs on the order of n^2 comparisons where merge sort needs n log n, so at ten trillion records the gap is roughly eleven orders of magnitude.
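
A quick back-of-envelope makes the gap concrete. This is a sketch that counts comparisons only; constants, memory traffic, and bubble sort's poor parallelism are all ignored, and each of those only widens the gap further:

    import math

    # Comparison counts, to leading order:
    # merge sort ~ n * log2(n), bubble sort ~ n**2 / 2.
    n = 10**13  # "ten trillion records"

    merge_ops = n * math.log2(n)   # ~4.3e14 comparisons
    bubble_ops = n**2 / 2          # ~5.0e25 comparisons

    print(f"merge sort:  ~{merge_ops:.1e} comparisons")
    print(f"bubble sort: ~{bubble_ops:.1e} comparisons")
    # Hardware multiplier needed for bubble sort to break even,
    # assuming (generously) it parallelizes as well as merge sort:
    print(f"extra hardware needed: ~{bubble_ops / merge_ops:.0e}x")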

I wonder if we'll reach a similar situation in AI where "throw more context/layers/training data at the problem" won't help anymore and people will be forced to care more about understanding again.
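
For what it's worth, the empirical scaling-law literature already hints at what such a wall would look like: loss falls as a power law in parameters and data and approaches an irreducible floor. The toy sketch below uses the published Chinchilla fit (Hoffmann et al. 2022) and its rough 20-tokens-per-parameter rule purely as illustrative constants; real frontier models need not follow this curve.

    # Toy Chinchilla-style scaling law: loss = E + A/N**alpha + B/D**beta.
    # Constants are the published Hoffmann et al. (2022) fit, used here
    # only for illustration.
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28

    def loss(n_params: float, n_tokens: float) -> float:
        """Predicted pretraining loss for n_params trained on n_tokens."""
        return E + A / n_params**alpha + B / n_tokens**beta

    # Each 10x in scale buys a shrinking absolute gain, and nothing
    # pushes the loss below the irreducible floor E.
    for n in (1e9, 1e10, 1e11, 1e12):
        print(f"{n:.0e} params: predicted loss ~ {loss(n, 20 * n):.3f}")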

jimbokun:
And whether that understanding will come from humans or from the AIs themselves.
svachalek:
I think it can be argued that GPT-4.5 was exactly that situation.