For starters, "sufficient checks" does mean sufficient, and that inherently means I need to fully understand the risks.
You're jumping to conclusions not supported by the comment at all.
Also, the comment has two parts: one about writing code, and one about integrating models into workflows.
As to the latter, the point is that for a whole lot of uses you can trivially ensure the failure modes are safe.
E.g., I'm integrating GPT with my email. "Mostly ok most of the time" applies to things like summaries and prioritisation, because worst case I just get to an email a bit later. "Sufficient checks" applies to things like writing proposed replies: there's no way I'd send one without reading it, and it's sufficient for me to read through it before pressing send (and make adjustments as needed). Failures here would matter if I intended to make a product of it, but as a productivity tool for myself it just needs to be close enough.
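To make the reply-drafting bit concrete, the shape of it is roughly this (a minimal sketch in Python, assuming the OpenAI SDK; `send_email` is a hypothetical placeholder for whatever mail integration you'd actually wire in):

```python
from openai import OpenAI

client = OpenAI()

def draft_reply(original_email: str) -> str:
    """Ask the model for a proposed reply; the draft is never sent automatically."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model would do here
        messages=[
            {"role": "system", "content": "Draft a short, polite reply to the email below."},
            {"role": "user", "content": original_email},
        ],
    )
    return resp.choices[0].message.content

def send_email(body: str) -> None:
    """Hypothetical stand-in for whatever mail client integration you actually use."""
    raise NotImplementedError("wire this up to your mail client")

def review_and_send(original_email: str) -> None:
    draft = draft_reply(original_email)
    print("--- proposed reply ---")
    print(draft)
    # The "sufficient check": nothing goes out until a human has read the draft
    # and explicitly confirmed it. Worst case, you discard it and write your own.
    if input("Send this reply? [y/N] ").strip().lower() == "y":
        send_email(draft)
    else:
        print("Draft discarded.")
```

The point isn't the code, it's that the failure mode is capped at "I had to edit or discard a draft", which is exactly the kind of check that's cheap enough to actually do.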
There are a whole lot of possibilities like that.
But even for coding-related tasks there are a whole lot of low-risk uses, such as generating HTML or CSS, providing usage examples, or providing a scaffold for something you know well how to do but which is time-consuming to write by hand.
If you're trying to make it do things that'd be time-consuming to verify sufficiently well, then that's a bad use. The good uses are those where errors are low-impact and easy to catch.