
169 points | constantinum | 5 comments

CraftingLinks:
We just use OpenAI function calls (tools) and then use Pydantic to verify the JSON. When validation fails we retry the prompt.
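
A minimal sketch of that pattern, assuming the OpenAI Python SDK and Pydantic v2 (the Person schema, model name, and retry count are placeholders, and error handling is trimmed):

    from openai import OpenAI
    from pydantic import BaseModel, ValidationError

    class Person(BaseModel):
        # Placeholder schema; swap in your real model.
        name: str
        age: int

    client = OpenAI()

    def extract_person(text: str, max_retries: int = 3) -> Person:
        for _ in range(max_retries):
            resp = client.chat.completions.create(
                model="gpt-4o",  # any tool-calling model
                messages=[{"role": "user", "content": text}],
                tools=[{
                    "type": "function",
                    "function": {
                        "name": "extract_person",
                        "parameters": Person.model_json_schema(),
                    },
                }],
                tool_choice={"type": "function", "function": {"name": "extract_person"}},
            )
            args = resp.choices[0].message.tool_calls[0].function.arguments
            try:
                return Person.model_validate_json(args)  # Pydantic does the checking
            except ValidationError:
                continue  # validation failed: just re-run the prompt
        raise RuntimeError("LLM output never passed validation")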

aaronvg:
[Other BAML creator here!] One time we told a customer to do this to fix small JSON mistakes, but it turns out their customers don't tolerate a 20-30s increase in latency from regenerating a long JSON structure.

We instead had to write a parser that catches small mistakes like missing commas or quotes, and parses the content even if there are things like reasoning in the response, like here: https://www.promptfiddle.com/Chain-of-Thought-KcSBh
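
A toy sketch of that "fix instead of retry" idea (nothing like the real parser, just the shape of it): pull the JSON out of a response that also contains reasoning text, and repair a couple of common slips before handing it to Pydantic.

    import json
    import re

    def lenient_parse(raw: str) -> dict:
        """Toy forgiving parse: extract the JSON object from an LLM response that
        may contain surrounding reasoning text, and fix trailing commas."""
        # Keep only the outermost {...} span (ignores chain-of-thought before/after).
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("no JSON object found in LLM output")
        candidate = raw[start:end + 1]
        # Drop trailing commas before a closing brace/bracket, a common LLM slip.
        # (Naive: would also touch commas inside strings; fine for a toy example.)
        candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
        return json.loads(candidate)

    # Reasoning text plus a slightly malformed object still parses:
    print(lenient_parse('Let me think... {"name": "Harrison", "hobbies": ["chess",],}'))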

b2v:
I'm not sure I understand: the docs for the Python client say that BAML types get converted to Pydantic models. Doesn't that step add the extra latency you mentioned?

aaronvg:
My bad, I don't think I explained it correctly. Basically you have two options when a "," is missing (among other issues) in an LLM output and causes a parsing error:

- retry the request, which may take 30+ seconds (if your LLM outputs are really long and you're using something like GPT-4)

- fix the parsing issue

In our library we do the latter. The conversion from BAML types to Pydantic ones is a compile-time step unrelated to the problem above. That doesn't happen at runtime.
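
For context, a rough sketch of what that compile-time step produces: a BAML class definition is turned into an ordinary Pydantic model by codegen, so no conversion work happens per request. The generated code below is illustrative only, not BAML's actual output.

    # A BAML class like
    #
    #   class Person {
    #     name string
    #     height float? @description("Height in meters")
    #   }
    #
    # is generated (at build time, not per request) into roughly:
    from typing import Optional
    from pydantic import BaseModel, Field

    class Person(BaseModel):
        name: str
        height: Optional[float] = Field(default=None, description="Height in meters")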

b2v:
Thanks for the clarification. How do you handle dynamic types, i.e. types determined at runtime?

hellovai:
We recently added dynamic type support; here's a snippet! (docs coming soon!)

Python: https://github.com/BoundaryML/baml/blob/413fdf12a0c8c1ebb75c...

Typescript: https://github.com/BoundaryML/baml/blob/413fdf12a0c8c1ebb75c...

Snippet:

# Import paths assume BAML's generated Python client; they may differ per setup.
from baml_client import b
from baml_client.type_builder import TypeBuilder

async def test_dynamic():
    tb = TypeBuilder()

    # Add extra properties to the Person type at runtime.
    tb.Person.add_property("last_name", tb.string().list())
    tb.Person.add_property("height", tb.float().optional()).description(
        "Height in meters"
    )

    # Extend the Hobby enum and alias its values to lowercase.
    tb.Hobby.add_value("chess")
    for name, val in tb.Hobby.list_values():
        val.alias(name.lower())

    tb.Person.add_property("hobbies", tb.Hobby.type().list()).description(
        "Some suggested hobbies they might be good at"
    )

    # no_tb_res = await b.ExtractPeople("My name is Harrison. My hair is black and I'm 6 feet tall.")
    tb_res = await b.ExtractPeople(
        "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop.",
        {"tb": tb},
    )

    assert len(tb_res) > 0, "Expected non-empty result but got empty."
    for r in tb_res:
        print(r.model_dump())

b2v:
Neat, thanks! I'm still pondering whether I should be using this, since most of the retries I have to do are because the LLM itself doesn't understand the schema it's asked for (e.g. output with missing fields, or a value not present in `Literal[]`); certain models are especially bad with deeply nested schemas and output gibberish. Anything on your end that can help with that?

hellovai:
Nothing specific, but you can try your prompt / data model out on https://www.promptfiddle.com

Or if you're open to sharing your prompt / data model, I can send over my best guess of a good prompt! We've found these models work decently well even with 50+ fields, nested structures, and whatnot.

b2v:
I might share it with you later on your Discord server.

> I can send over my best guess of a good prompt!

Now if you could automate the above process by "fitting" a first-draft prompt to a wanted schema, i.e. where your library makes a few adjustments by having a chat of its own with the LLM whenever some assertions don't pass, that would be super useful! Heck, I might just implement it myself.

aaronvg:
[Another BAML creator here.] I agree this is an interesting direction! We have a "chat" feature on our roadmap to do this right in the VSCode playground, where an AI agent will have context on your prompt, schema, BAML test results, etc., and help you iterate on the prompt automatically. We've done this before and have been surprised by how good the LLM feedback can be.

We just need a slightly better testing flow within BAML, since we don't support adding assertions just yet.