The real bitter lesson in AI is that we don't really know what we're doing. We're hacking on models, looking for architectures that train well, but we don't fully understand why they work. And because we don't understand them, we can't design anything optimal or know how good a solution could possibly get.
Well, technically, that's not true: the whole point of complexity theory is that there are some problems you can't just throw more hardware at - at least not for interesting problem sizes or any remotely feasible amount of hardware.
I wonder if we'll reach a similar situation in AI where "throw more context/layers/training data at the problem" won't help anymore and people will be forced to care more about understanding again.
More precisely, I think producing a good, fast merge of about 5 sorted lists was a problem I didn't have good answers for, but maybe I was too fixated on a streaming solution and didn't apply enough tricks.
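For what it's worth, here's a minimal sketch of the streaming approach in Python, using the standard heapq module (the function name and structure are just illustrative, not whatever I was actually working with):

```python
import heapq

def merge_k_sorted(*lists):
    """Lazily merge several sorted lists into one sorted stream.

    Keeps a min-heap of (value, list index, position) entries, so each
    emitted element costs O(log k), where k is the number of lists.
    """
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    while heap:
        value, i, pos = heapq.heappop(heap)
        yield value
        if pos + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][pos + 1], i, pos + 1))

# The standard library already provides this as a lazy iterator:
# merged = list(heapq.merge(list_a, list_b, list_c))
```

For only ~5 lists the heap overhead is small, and in practice a pairwise merge or even sorting the concatenation can be competitive, which is probably the kind of trick I didn't try.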