
169 points | constantinum | 1 comment
resiros No.40717267
I expected to read about the methods each library uses to get structured output, not a comparison of their language compatibility.

Fortunately the same author has a blog post (https://www.boundaryml.com/blog/type-definition-prompting-ba...) explaining how their approach works and how it compares to instructor (https://github.com/jxnl/instructor).

Basically these libraries provide two things: (1) a way to prompt the LLM, and (2) a way to get valid JSON back.

For (1), instructor does it through the JSON Schema definition; BAML's innovation is a simplified, lossless schema definition that uses fewer tokens.
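(A rough sketch of that difference, not either library's actual prompt template; the Person model is made up:)

    from pydantic import BaseModel

    class Person(BaseModel):
        name: str
        age: int

    # Schema-definition prompting: paste the model's full JSON Schema into the prompt.
    verbose_block = Person.model_json_schema()
    # -> {'properties': {'name': {'title': 'Name', 'type': 'string'},
    #                    'age': {'title': 'Age', 'type': 'integer'}},
    #     'required': ['name', 'age'], 'title': 'Person', 'type': 'object'}

    # Simplified schema in the spirit of BAML: the same information
    # as a terse type definition, costing far fewer tokens.
    compact_block = """Answer with JSON matching this type:
    {
      name: string,
      age: int,
    }"""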

For (2), instructor does it through reprompting until it receives valid JSON; BAML's innovation is a fuzzy parser able to parse non-perfect JSON.
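(A toy illustration of the fuzzy-parsing idea, nothing like BAML's actual parser: strip code fences, grab the outermost object, drop trailing commas, then parse strictly:)

    import json
    import re

    def lenient_json_loads(text: str):
        """Tolerate a few common LLM output quirks before giving up."""
        # Drop ```json ... ``` fences if present.
        text = re.sub(r"```(?:json)?", "", text)
        # Keep only the outermost {...} block, ignoring prose around it.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end != -1:
            text = text[start : end + 1]
        # Remove trailing commas before } or ].
        text = re.sub(r",\s*([}\]])", r"\1", text)
        return json.loads(text)

    lenient_json_loads('Sure! ```json\n{"name": "Ada", "age": 36,}\n```')
    # -> {'name': 'Ada', 'age': 36}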

Personally I think there is no need for all these abstractions to get structured outputs from LLMs. A simple .to_prompt() function that takes a Pydantic model and translates it into a prompt block you can add to your prompt, plus a retry loop, is sufficient to get the same results.
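Something along these lines (a minimal sketch; to_prompt and call_llm are hypothetical names, and call_llm stands in for whatever LLM client you already use):

    import json
    from pydantic import BaseModel, ValidationError

    class Person(BaseModel):
        name: str
        age: int

    def to_prompt(model_cls: type[BaseModel]) -> str:
        """Render a Pydantic model as a prompt block asking for matching JSON."""
        schema = json.dumps(model_cls.model_json_schema(), indent=2)
        return "Respond only with JSON that validates against this schema:\n" + schema

    def extract_structured(call_llm, model_cls, question: str, retries: int = 3):
        """Ask, validate, and re-ask with the error message until the JSON parses."""
        prompt = f"{question}\n\n{to_prompt(model_cls)}"
        for _ in range(retries):
            raw = call_llm(prompt)  # your own wrapper around the LLM API
            try:
                return model_cls.model_validate_json(raw)
            except ValidationError as err:
                prompt = f"{prompt}\n\nYour previous answer was invalid:\n{err}\nTry again."
        raise RuntimeError("No valid structured output after retries")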

replies(1): >>40717685 #
Jayakumark No.40717685
Would you be able to share example code or a gist?
replies(2): >>40719010 #>>40720535 #
vladsanchez No.40719010
https://agenta.ai/
replies(1): >>40720463 #
resiros No.40720463
Thanks for sharing the link, but no, Agenta is not a library that can help with getting structured outputs from LLMs (at least not in the way discussed in the parent comment). It's a prompt management, evaluation, and observability platform for LLM apps.
replies(1): >>40805638 #
vladsanchez No.40805638
Just stumbled upon https://controlflow.ai/ today. Perhaps it serves to structure outputs as "agentic" workflows in pursuit of LLM autonomy.

Let us know your opinion.