
177 points by ohjeez | 1 comment
gmerc ◴[] No.44473491[source]
Good. Everyone should do this everywhere, not just in research papers, because that's the only way we get the necessary focus on fixing the prompt-injection nonsense, which requires a new architecture.
replies(3): >>44473658 #>>44473930 #>>44474283 #
grishka ◴[] No.44474283[source]
No, we don't need to fix prompt injection. We need to discredit AI so much that no one relies on it for anything serious.
replies(3): >>44474778 #>>44474913 #>>44475408 #
serbuvlad ◴[] No.44475408[source]
Define "discredit". Define "rely". I administer some servers and a few classrooms at my uni, along with two colleagues. This is not my primary job. This is not anyone's primary job. We went from a bunch of ad hoc solutions with shell scripts that sort of kept everything together to an entirely declarative system, with centralized accounts, access control and floating homes using Ansible, FreeIPA, NFSv4 w/ Kerberos etc. For bringing up a new classroom computer, we went from hard-cloning the hard disk with clonezilla to installing Ubuntu, enrolling the key and running the ansible install everything playbook.

This is serious. Researchers and educators rely on these systems every day to do their jobs. Tell me why this work should be discredited. Because I used AI (followed by understanding what it did, testing, a lot of tuning, a lot of changes, and a lot of "how would that work" and "what are the pros and cons" conversations)?

How about we just discredit the lazy use of AI instead?

If high school kids copy-paste Wikipedia and call it their essay, does that mean we should discredit Wikipedia?

replies(1): >>44475641 #
grishka ◴[] No.44475641[source]
Well, that's the thing: if you understand the technology you're working with and know how to verify the result, chances are that completing the same task with AI will take you longer than without it. So the whole appeal of AI seems to be to let it do things without much oversight.

The common failure mode of AI is also concerning. If you ask it to do something that can't be done trivially or at all, or that wasn't well represented in its training data, it often won't tell you it doesn't know how. Instead, it'll make shit up with the utmost confidence.

Just yesterday I stumbled upon this article that closely matches my opinion: https://eev.ee/blog/2025/07/03/the-rise-of-whatever/

replies(1): >>44475831 #
serbuvlad ◴[] No.44475831[source]
But that's exactly the thing: I DON'T understand the technology without AI. I know stuff about Linux, but I knew NOTHING about Ansible, FreeIPA etc. So I guess you could say I understand the problem space, not the solution space? Either way, it would have taken us many months to do what took us a few weeks with AI.

> So the whole appeal of AI seems to be to let it do things without much oversight.

No? The whole appeal of AI for me is doing things where I know what I want the end result to look like but don't know how to get there.

> The common failure mode of AI is also concerning. If you ask it to do something that can't be done trivially or at all, or that wasn't well represented in its training data, it often won't tell you it doesn't know how. Instead, it'll make shit up with the utmost confidence.

I also feel like a lot of people drew a lot of conclusions from GPT-3.5 that simply aren't true anymore.

Usually, o3, and even 4o, and probably most modern models rely a lot more on search results than on their training data. I often even see "I know how to do this, but I need to check the documentation for up-to-date information in case anything changed" in the chain of thought for trivial queries.

But yeah, sometimes you get the old failure mode: stuff that doesn't work. So you try it and it fails. You tell it that it failed and how, and it either fixes it (90%+ of the time, at least with something powerful like o3) or it starts arguing with you nonsensically. If it's the latter, you burn the chat and start a new one with better context, or just fall back to the manual approach like before.

So the failure mode doesn't mean you can't identify failure. The failure mode means you can't trust its unchecked output. OK. So? It's not a finite state machine; it's a statistical inference machine trained on the data that currently exists. It doesn't enter a failure state. Neither does a PID regulator when the parameters of the physical model change and no one recalibrates it: it just starts outputting garbage and overshooting like crazy.
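For reference, the textbook PID control law (standard form, nothing specific to any system here):

    u(t) = K_p*e(t) + K_i*∫e(τ)dτ + K_d*de(t)/dt

If the plant changes but the gains K_p, K_i, K_d stay fixed, the loop overshoots and oscillates without ever entering an explicit error state, which is exactly the analogy.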

But both PID regulators and LLMs are hella useful if you have a use for them.

replies(1): >>44477241 #
soraminazuki ◴[] No.44477241[source]
> I know stuff about Linux, but I knew NOTHING about Ansible, FreeIPA etc.

Then you absolutely shouldn't be touching Ansible or FreeIPA in production until you've developed enough understanding of the basics and can look up reliable sources for the nitty-gritty details. FreeIPA is security-critical software, for heaven's sake. "Let's make up for zero understanding with AI" is a totally unacceptable approach.

replies(1): >>44478124 #
serbuvlad ◴[] No.44478124[source]
Did you miss the part where I said that:

a) I develop the understanding with AI (I would never use something I don't understand at all),

b) I test before pushing to prod (see the sketch after this list), and

c) This replaces a bunch of shoddy shell scripts, so even if there are hiccups, there were a lot more hiccups before?
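For (b), the testing step is roughly this (inventory and host names made up; --check and --diff are standard ansible-playbook flags):

    # dry run against a test machine: report what would change, change nothing
    ansible-playbook -i inventory/classrooms site.yml --limit lab-test --check --diff

Only after the diff looks right does the playbook get run for real against the prod hosts.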