←back to thread

169 points constantinum | 1 comments | | HN request time: 0s | source
Show context
resiros ◴[] No.40717267[source]
I expected to read about the methods used by the libraries to get the structured output and not a comparison of the language compatibility for each.

Fortunately the same author have a blog post (https://www.boundaryml.com/blog/type-definition-prompting-ba...) explaining how their approach works and how it compares to instructor (https://github.com/jxnl/instructor).

Basically these libraries provide two things: 1. A way to prompt the LLM 2. A way to get a valid JSON

For 1. instructor does it through the json schema definition, BAML's innovation is that they use a simplified lossless schema definition that uses less tokens.

For 2. instructor does it through reprompting until they receive a valid JSON. BAML's innovation is a fuzzy parser able to to parse non-perfect JSON.

Personally I think that there is no need to all these abstractions to get structured outputs from LLMs. A simple .to_prompt() function that takes a pydantic and translate it into some prompt block you can add to your prompt and a retry is sufficient to get the same results.

replies(1): >>40717685 #
Jayakumark ◴[] No.40717685[source]
Will you be able to share an example code or gist ?
replies(2): >>40719010 #>>40720535 #
1. resiros ◴[] No.40720535[source]
If you look into the instructor code(https://github.com/jxnl/instructor/blob/06a49e7824729b8df1f7...). Here is the core code snippet they use:

            message = dedent(
                f"""
                As a genius expert, your task is to understand the content and provide
                the parsed objects in json that match the following json_schema:\n

                {json.dumps(response_model.model_json_schema(), indent=2)}

                Make sure to return an instance of the JSON, not the schema itself
                """
            )

Then depending on the mode, either they add another message `Return the correct JSON response within a ```json codeblock. not the JSON_SCHEMA` or they set the response format to json.