
169 points | mattmarcus | 1 comment
nonameiguess | No.43612761
As every fifth thread becomes some discussion of LLM capabilities, I think we need to shift the way we talk about this to be less like how we talk about software and more like how we talk about people.

"LLM" is a valid category of thing in the world, but it's not a thing like Microsoft Outlook that has well-defined capabilities and limitations. It's frustrating reading these discussions that constantly devolve into one person saying they tried something that either worked or didn't, then 40 replies from other people saying they got the opposite result, possibly with a different model, different version, slight prompt altering, whatever it is.

LLMs may well have the capability to understand nullability, but that doesn't mean every instance of every model will consistently understand that or anything else. This is the same way humans operate. Humans can run a 4-minute mile. Humans can run a 10-second 100-meter dash. Humans can develop and prove novel math theorems. But not all humans, and not all the time; performance depends on conditions, timing, and luck, and there has probably never been a single human who could do all three. It takes practice in one specific discipline to get really good at it, and that practice competes with or even limits other abilities. For LLMs, this shows up in how they are fine-tuned and how they respond to specific prompt sequences that should all be different ways of expressing the same command or query but nonetheless produce different results. This is very different from the way we are used to machines and software behaving.
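For concreteness, here's a rough sketch of the kind of nullability question people probe models with (not taken from the linked post; the find_user and greet names are made up purely for illustration):

    from typing import Optional

    def find_user(users: dict[str, int], name: str) -> Optional[int]:
        # Returns None when the name is absent, so callers have to handle that case.
        return users.get(name)

    def greet(users: dict[str, int], name: str) -> str:
        user_id = find_user(users, name)
        # A model that "understands nullability" should flag that user_id may be
        # None here, and that the arithmetic below can then raise a TypeError.
        return f"Hello, user #{user_id + 1}"

Whether a given model catches that reliably is exactly the kind of thing that varies with the prompt and fine-tuning differences described above.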

replies(2): >>43612819 >>43614110
1. aSanchezStern | No.43612819
Yeah, the link title overclaims a bit; the actual post title doesn't make such a general claim, and the post itself examines several specific models and compares their understanding.