> as a hiring manager it makes my problem harder. How do you compare the results from different choices equitably?
That makes sense, and it's the perspective that's being drilled into a lot of us. Implicit bias and all that. But in my experience, the comparison problem has never been that big of an issue in practice. I guess it depends on the hiring climate, but I'm much more familiar with spending a lot of time going through mediocre candidates and looking for someone who's "good enough". And that's when the recruiter is doing their job and I understand why each candidate made it to the interview stage.

Sure, sometimes it's a hot position (or rather, a hot group to work for) and we get multiple good candidates, but then the decision-making process is more about "do we want A, who has deep and solid experience in this exact area, or B, who blew us away with their breadth of knowledge, flexibility, and creativity?" than "do we want A, who did great on the take-home test but was unimpressive in person, or B, whose solution to the take-home test was so-so but who was clearly very knowledgeable and experienced in person?" The latter is the hypothetical case where every candidate went through the same exercises, just to make comparison easier, but even in that setup it's an uncommon and uninteresting choice. You're comparing test results and trying to infer Truth from them. Test results don't give a lot of signal or predictive power in the first place, so if you're trying to be objective by relying only on normalized scores, you're not working with much of value.
Take-home tests or whiteboard tests or whatever are fine to use as a filter, but people aren't points along a one-dimensional "quality" line. Use whatever you have to in order to find people who have a decent probability of being good enough, then stop thinking about how they might fail and start thinking about what it would look like if they succeed. They'll have different strengths and advantages. Standardizing your tests isn't going to help you explore those.