I can bring data here:
We recently did the first speed run where Louie.ai beat teams of professional cybersecurity analysts in an open competition, Splunk's annual Boss of the SOC. Think writing queries, wrangling Python, and scanning through 100+ log sources to answer frustratingly sloppy database questions:
- We get 100% correct for basic stuff in the first half that takes most people 5-15 minutes per question, and 50% correct in the second half that most people take 15-45+ minute per question, and most teams time out in that second half.
- ... Louie does a median 2-3min per question irrespective of the expected difficulty, so about 10X faster than a team of 5 (wall clock), and 30X less work (person hours). Louie isn't burnt out at the end ;-)
- This doesn't happen out-of-the-box with frontier models, including fancy reasoning ones. Likewise, letting the typical tool here burn tokens until it finds an answer would cost more than a new hire, which is why we measure as a speedrun vs deceptively uncapped auto-solve count.
- The frontier models DO have good intuition , understand many errors, and for popular languages, DO generate good text2query. We are generally happy with OpenAI for example, so it's more on how Louie and the operator uses it.
- We found we had to add in key context and strategies. You see a bit in Claude Code and Cursor, except those are quite generic, so would have failed as well. Intuitively in coding, you want to use types/lint/tests, and same but diff issues if you do database stuff. But there is a lot more, by domain, in my experience, and expecting tools to just work is unlikely, so having domain relevant patterns baked in and that you can extend is key, and so is learning loops.
A bit more on louie's speed run here: https://www.linkedin.com/posts/leo-meyerovich-09649219_genai...
This is our first attempt at the speed run. I expect Louie to improve: my answers represent the current floor, not the ceiling of where things are (dizzyingly) going. Happy to answer any other q's where data might help!