Have you had any neurologists use your dataset? My own reaction after solving a few of the puzzles was "Why is this so intuitive for me, but not for an LLM?"
Our human ability to abstract things is underrated.
There have already been some human studies on ARC 1, and I expect there will be more in the future. See this paper from 2021, which was one of the earliest works in this direction: https://arxiv.org/abs/2103.05823