Are they really lies if the grading program considers them correct answers? It sounds more like an issue with using a faulty grading program than with the protocol.
replies(1):
It doesn't use the exact proving scheme broken in this research, but consider https://risczero.com/blog/zkpoex, which is about proving that you have an exploit (a program) that puts a protocol into an unexpected state, without revealing the exploit. Imagine you had a specially crafted program that lets you prove you have an exploit when none actually exists. The proof goes through only because your program computes the same hash that the Fiat-Shamir heuristic uses, which violates the assumptions of the random oracle model.
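
To make the Fiat-Shamir part concrete, here is a minimal sketch of where the hash enters. It is not the scheme from the research and it does not reproduce the attack; it is a toy Schnorr-style proof of knowledge, chosen only because it is short. The group parameters `P`, `Q`, `G` and the function names `challenge`, `prove`, and `verify` are my own illustrative assumptions, and the numbers are far too small for real use.

```python
import hashlib
import secrets

# Toy parameters (assumption, insecure): G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def challenge(*values):
    """Fiat-Shamir step: hash the transcript instead of asking a verifier
    for a random challenge."""
    data = b"|".join(str(v).encode() for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x):
    """Prove knowledge of x such that y = G^x mod P, non-interactively."""
    y = pow(G, x, P)
    k = secrets.randbelow(Q)   # prover's one-time nonce
    t = pow(G, k, P)           # commitment
    c = challenge(G, y, t)     # the hash plays the role of the verifier
    s = (k + c * x) % Q        # response
    return y, t, s


def verify(y, t, s):
    """Recompute the challenge from the transcript and check the response."""
    c = challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


y, t, s = prove(7)
print(verify(y, t, s))
```

The soundness argument treats `challenge` as a random oracle, something the prover can query but cannot predict or embed in the statement. The scenario above is about what can go wrong when the statement being proven is itself a program that is able to evaluate that same hash.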