I haven't looked deeply into pydantic-ai's implementation, but this might be related to tool-usage vs. completion [0], the backend LLM model, etc. All I know is that with the same models, `openai`'s `client.chat.completions` + a custom prompt that passes in the pydantic JSON schema + post-processing to instantiate `SomePydanticModel(**json)` creates objects successfully, whereas vanilla pydantic-ai rarely does, regardless of the number of retries.
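Roughly, the manual path looks like this (a minimal sketch, assuming pydantic v2 and the current openai client; `SomePydanticModel`, its fields, and the model name are placeholders, not my actual code):

```python
import json
from openai import OpenAI
from pydantic import BaseModel, Field


class SomePydanticModel(BaseModel):
    """Placeholder model; substitute your own."""
    name: str = Field(description="A short name")
    score: float = Field(description="Confidence between 0 and 1")


client = OpenAI()

# Put the JSON schema straight into the prompt instead of relying on tool calls.
schema = json.dumps(SomePydanticModel.model_json_schema(), indent=2)
system = (
    "Reply with a single JSON object that validates against this JSON schema, "
    "and nothing else:\n" + schema
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: whatever chat model you already use
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Summarize the attached document."},
    ],
)

# Post-processing: parse the text, then let pydantic validate/instantiate.
raw = json.loads(resp.choices[0].message.content)
obj = SomePydanticModel.model_validate(raw)  # raises ValidationError on mismatch
```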
I went with what works in my code, but I didn't remove the pydantic-ai dependency completely because I'm hoping something changes. Getting dynamic prompt context by leveraging JSON schemas, model and field docs from pydantic, plus maybe other results from runtime inspection (like the actual source code) is clearly a good idea; see the sketch below. Many people want something like "fuzzy compilers" with structured output, not magical oracles that might return anything.
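Most of that context can be assembled mechanically from the model itself (again a sketch assuming pydantic v2, with a placeholder model):

```python
import inspect
import json
from pydantic import BaseModel, Field


class SomePydanticModel(BaseModel):
    """A record extracted from a support ticket."""  # placeholder docstring
    name: str = Field(description="A short name")
    score: float = Field(description="Confidence between 0 and 1")


def build_context(model_cls: type[BaseModel]) -> str:
    """Assemble prompt context from what pydantic and the runtime already know."""
    parts = [
        "JSON schema:\n" + json.dumps(model_cls.model_json_schema(), indent=2),
        "Model doc: " + (inspect.getdoc(model_cls) or ""),
        "Fields:\n" + "\n".join(
            f"- {name}: {field.description or 'no description'}"
            for name, field in model_cls.model_fields.items()
        ),
        # Runtime inspection can go further, e.g. the actual source code:
        "Source:\n" + inspect.getsource(model_cls),
    ]
    return "\n\n".join(parts)


print(build_context(SomePydanticModel))
```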
Documentation is context, and even very fuzzy context is becoming a force multiplier. Similarly, languages/frameworks with good support for runtime inspection/reflection, and ecosystems with strong tooling for things like ASTs, should be among the best things to pair with AI and agents.
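In Python's case the standard library already gets you much of the way; for example, a sketch of turning a module into a compact outline of signatures and docstrings for an agent's context (the file name is hypothetical):

```python
import ast
import textwrap


def outline(source: str) -> str:
    """Compact outline (def/class lines plus first docstring line) of a module."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            header = source.splitlines()[node.lineno - 1].strip()
            doc = ast.get_docstring(node) or ""
            lines.append(header)
            if doc:
                lines.append(textwrap.indent(doc.splitlines()[0], "    # "))
    return "\n".join(lines)


print(outline(open("some_module.py").read()))  # hypothetical file
```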