How good is it in terms of scalability?
Im looking for something that wont crap out when we get to 100s or 1000s of tests
Hmm, I didn’t test this many tests. But you can run all tests in parallel — two at a time — and we should be good. The limitation isn’t the framework; it’s your budget and the tier you have on OpenAI. If you have Tier 2, which allows more requests to the OpenAI API, you could potentially run 1,000 tests in a single run.