I've been working on this problem for a while. There are whole companies that do this. They all work by having a human review a sample of the results and score them (with various uses of magic to make that more efficient), and then suggest changes to make the results more accurate in the future.
The best companies can get up to 90% accuracy. Most are closer to 80%.
It's important to remember that we're expecting perfection here. But think about this: have you ever asked someone to book a flight for you? How did it go?
At least in my experience, there are usually a few back-and-forth emails, and then something is always not quite right, or not as good as if you had done it yourself, but you're OK with that because it saved you time. The one thing that makes it better is if the same person does it for you a couple of times and has learned your specific habits and what you care about.
I think the biggest problem in AI accuracy is expecting the AI to be better than a human.