I keep seeing this comment all over the place. Just because something appears once in the training data doesn't mean the model can simply regurgitate it. That's not how training works. An LLM is not a knowledge database.
In this case, I don't think having seen the ARC set would help much in writing and selecting Python scripts to solve the test cases. (Unless someone else has tried this approach before and _their_ results are in the training data.)
It will be good to see the private-set results, though.
That said, public discussions of solutions to the public test set will presumably share analogies with, or lie close in embedding space to, aspects of the Python programs that solve them.