This is serious. Researchers and educators rely on these systems every day to do their jobs. Tell me why this work should be discredited. Because I used AI (followed by understanding what it did, testing, a lot of tuning, a lot of changes, a lot of "how would that work" conversations, a lot of "what are the pros and cons" conversations)?
How about we just discredit the lazy use of AI instead?
If high school kids copy-paste Wikipedia and call it their essay, should we discredit Wikipedia?
The common failure mode of AI is also concerning. If you ask it to do something that can't be done trivially or at all, or that wasn't present enough in the training data, it often won't tell you it doesn't know how to do it. Instead, it'll make shit up with utmost confidence.
Just yesterday I stumbled upon this article that closely matches my opinion: https://eev.ee/blog/2025/07/03/the-rise-of-whatever/
> So the whole appeal of AI seems to be to let it do things without much oversight.
No?? The whole appeal of AI for me is doing things where I know what I want the result to look like but don't know how to get there.
> The common failure mode of AI is also concerning. If you ask it to do something that can't be done trivially or at all, or that wasn't present enough in the training data, it won't tell you it doesn't know how to do it. Instead, it'll make shit up with utmost confidence.
I also feel like a lot of people drew a lot of conclusions from GPT-3.5 that simply aren't true anymore.
Usually o3, and even 4o, and probably most modern models rely a lot more on search results than on their training data. I often see "I know how to do this but I need to check the documentation for up-to-date information in case anything changed" in the chain of thought, even for trivial queries.
But yeah, sometimes you get the old failure mode: stuff that doesn't work. So you try it and it fails. You tell it that it failed and how. And it either fixes it (90%+ of cases, at least with something as powerful as o3), or it starts arguing with you nonsensically. If it's the latter, you burn the chat and start a new one, building better context, or just fall back to doing it manually like before.
So the failure mode doesn't mean you can't identify failure. The failure mode means you can't trust its unchecked output. Ok. So? It's not a finite state machine, it's a statistical inference machine trained on the data that currently exists. It doesn't enter a failure state. Neither does a PID regulator when the parameters of the physical model change and no one recalibrates it: it just starts outputting garbage and overshooting like crazy.
But both PID regulators and LLMs are hella useful if you have something to use them for.
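To make the PID analogy concrete, here's a minimal sketch (all values are made up for illustration, and it's a PI controller on a toy first-order plant to keep it short): tune it once, then let the plant's time constant drift without retuning. Nothing errors out; the same gains just start overshooting.

```python
def step_response_peak(tau, gain=1.0, kp=2.0, ki=1.5, setpoint=1.0,
                       dt=0.01, steps=3000):
    """Forward-Euler sim of dx/dt = (gain*u - x)/tau under a PI controller."""
    x = integral = peak = 0.0
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        u = kp * error + ki * integral      # PI control law
        x += (gain * u - x) / tau * dt      # plant update
        peak = max(peak, x)
    return peak

# Plant the gains were tuned for: settles cleanly at the setpoint of 1.0.
print("peak, original plant (tau=1):  %.2f" % step_response_peak(tau=1.0))
# Same gains after the time constant drifts: roughly 30% overshoot, no error raised.
print("peak, drifted plant (tau=10):  %.2f" % step_response_peak(tau=10.0))
```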
Then you absolutely shouldn't be touching Ansible or FreeIPA in production until you've developed enough understanding of the basics and can look up reliable sources for the nitty-gritty details. FreeIPA is security-critical software, for heaven's sake. "Let's make up for zero understanding with AI" is a totally unacceptable approach.
a) I develop the understanding with AI (I would never use something I don't understand at all),
b) I test before pushing to prod and
c) this replaces a bunch of shoddy shell scripts, so even if there are hiccups, there were a lot more hiccups before?