cheesecompiler No.44367317
The reverse is possible too: throwing massive compute at a problem can mask the existence of a simpler, more general solution. General-purpose methods tend to win out over time—but how can we be sure they’re truly the most general if we commit so hard to one paradigm (e.g. LLMs) that we stop exploring the underlying structure?
willvarfar No.44375546
I think the "bitter lesson" here is that while startup A is busy tuning and optimising so it can train its model on hardware quantity B, another startup C is lucky enough to have 2*B hardware (credits, etc.), doesn't try nearly as hard to optimise, and reaches the end quicker.
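A toy back-of-envelope sketch of that trade-off (all numbers are made up for illustration): if optimisation costs calendar time up front but multiplies effective throughput, the better-funded, non-optimising startup can still finish first.

    # Toy model of the trade-off; every number here is hypothetical,
    # chosen only to illustrate the calendar-time arithmetic.
    def months_to_result(engineering_months, train_flops, flops_per_month, efficiency_gain):
        # Total calendar time: months spent optimising, plus training time
        # once the optimisation multiplies effective throughput.
        return engineering_months + train_flops / (flops_per_month * efficiency_gain)

    TRAIN_FLOPS = 1e24  # assumed total compute the model needs

    # Startup A: hardware quantity B, spends 6 months earning a 2x efficiency gain.
    a = months_to_result(6, TRAIN_FLOPS, flops_per_month=1e23, efficiency_gain=2.0)

    # Startup C: 2*B hardware, no optimisation effort.
    c = months_to_result(0, TRAIN_FLOPS, flops_per_month=2e23, efficiency_gain=1.0)

    print(f"A: {a:.0f} months, C: {c:.0f} months")  # -> A: 11 months, C: 5 months

Flip the assumptions (a bigger efficiency win, a smaller hardware gap) and the optimiser finishes first instead.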

Of course, DeepSeek was forced to take the optimisation approach, but it still got to the end in time to stake a claim. So ymmv.