Even if we assume there is value in it, why should it replace (even if in part) the previous activity of reliably making computers do exactly what we want?
A contrived example: there are only 100 MB of disk space left, but 1 GB of logs to write. The LLM discards 900 MB of logs and keeps only the most important lines.
Sure, you can nitpick this example, but it's the kind of edge case where an LLM can "do something reasonable" where previously you needed hard coding and special casing.
So you trade reliability to get to that extra 20% of hard cases.
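To make that concrete, here's a rough sketch of what I mean (ask_llm is just a placeholder for whatever client/model you'd actually call, and the paths and thresholds are made up):

```python
import shutil

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call -- swap in whatever client/model you actually use."""
    raise NotImplementedError

def select_log_lines(log_lines: list[str], log_dir: str = "/var/log",
                     min_free_bytes: int = 100 * 1024 * 1024) -> list[str]:
    """Keep everything while space allows; otherwise ask an LLM which lines matter."""
    if shutil.disk_usage(log_dir).free >= min_free_bytes:
        return log_lines  # normal path: plenty of space, write everything

    allowed = set(log_lines)
    try:
        # Delegate the judgment call: which lines are actually worth keeping?
        prompt = ("Disk space is nearly full. From the log lines below, return only "
                  "the lines worth keeping (errors, warnings, anything anomalous), "
                  "one per line, unchanged:\n\n" + "\n".join(log_lines))
        kept = [line for line in ask_llm(prompt).splitlines() if line in allowed]
        if kept:
            return kept
    except Exception:
        pass  # the fallback path itself must never fail hard

    # Deterministic guardrail: if the LLM is unavailable or returns junk,
    # do something dumb but predictable instead.
    return [line for line in log_lines if "ERROR" in line or "WARN" in line]
```

Note the guardrails: the LLM can only ever narrow the set of lines, and if it misbehaves you still fall back to a predictable rule.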
When I watch juniors struggle, they seem to think it's because they don't think hard enough, whereas it's usually because they didn't build enough infrastructure to prevent them from needing to think too hard.
As it happens, when it comes to programming, LLM unreliabilities seem to align quite closely with ours, so the same guardrails that protect against human programmers' tendencies to fuck up (mostly tests and types) work pretty well for LLMs too.
Maybe that does add up to solving harder, higher-level real-world problems (business problems) from a practical standpoint; perhaps that's what you mean, rather than technical problems.
Or maybe you're referring to producing software which utilizes LLMs, rather than using LLMs to program software (which is what I think the blog post is about, but we should certainly discuss both).
If you've never done web-dev and want to create a web app, where does that fall? In principle you could learn web-dev in a week or a month, so technically you could do it.
> maybe you're referring to producing software which utilizes LLMs
But yes, this is what I meant: outsourcing "business logic" to an LLM instead of trying to express it in code.
And it’s not just this specific problem. I don’t think letting an LLM handle edge cases is really ever an appropriate use case in production.
I’d much rather the system just fail so that someone will fix it. Imagine a world where, at every level, instead of failing and halting, every error just got bubbled up to an LLM that tried to do something reasonable.
Talk about emergent behavior, or more likely catastrophic cascading failures.
I can kind of see your point if you’re talking about a truly hopeless scenario. Like some imaginary autonomous spacecraft that is going to crash into the sun, so in a last ditch effort the autopilot turns over the controls to an LLM.
But even in that scenario we have to have some way of knowing that we truly are in a hopeless scenario. Maybe it just appears that way and the LLM makes it worse.
Or maybe the LLM decides to pilot it into another spacecraft to reduce velocity.
My point is there aren’t many scenarios where “do something reasonable 90% of the time, but do something insane the other 10% of the time” is better than doing nothing.
I’ve been using LLMs at work, and my gut feeling says I’m getting some productivity boost, but I’m not even certain of that because I have also spent time chasing subtle bugs that I wouldn’t have introduced myself. I think I’m going to need to see the results of some large, well-designed studies and several years of output before I really feel confident saying one way or the other.