
169 points | constantinum | 5 comments

CraftingLinks:
We just use OpenAI function calls (tools) and then use Pydantic to verify the JSON. When validation fails we retry the prompt.
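
A minimal sketch of that pattern, assuming the OpenAI Python SDK and Pydantic v2 (the Person schema, model name, and retry count are placeholders, and error handling is trimmed):

    from openai import OpenAI
    from pydantic import BaseModel, ValidationError

    class Person(BaseModel):
        # Placeholder schema; swap in your real model.
        name: str
        age: int

    client = OpenAI()

    def extract_person(text: str, max_retries: int = 3) -> Person:
        for _ in range(max_retries):
            resp = client.chat.completions.create(
                model="gpt-4o",  # any tool-calling model
                messages=[{"role": "user", "content": text}],
                tools=[{
                    "type": "function",
                    "function": {
                        "name": "extract_person",
                        "parameters": Person.model_json_schema(),
                    },
                }],
                tool_choice={"type": "function", "function": {"name": "extract_person"}},
            )
            args = resp.choices[0].message.tool_calls[0].function.arguments
            try:
                return Person.model_validate_json(args)  # Pydantic does the checking
            except ValidationError:
                continue  # validation failed: just re-run the prompt
        raise RuntimeError("LLM output never passed validation")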

aaronvg:
[Other BAML creator here!] One time we told a customer to do this to fix small JSON mistakes, but it turns out their customers don't tolerate a 20-30s increase in latency from regenerating a long JSON structure.

We instead had to write a parser that catches small mistakes like missing commas or quotes, and parses the content even if there are things like reasoning in the response, like here: https://www.promptfiddle.com/Chain-of-Thought-KcSBh
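
A toy sketch of that "fix instead of retry" idea (nothing like the real parser, just the shape of it): pull the JSON out of a response that also contains reasoning text, and repair a couple of common slips before handing it to Pydantic.

    import json
    import re

    def lenient_parse(raw: str) -> dict:
        """Toy forgiving parse: extract the JSON object from an LLM response that
        may contain surrounding reasoning text, and fix trailing commas."""
        # Keep only the outermost {...} span (ignores chain-of-thought before/after).
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("no JSON object found in LLM output")
        candidate = raw[start:end + 1]
        # Drop trailing commas before a closing brace/bracket, a common LLM slip.
        # (Naive: would also touch commas inside strings; fine for a toy example.)
        candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
        return json.loads(candidate)

    # Reasoning text plus a slightly malformed object still parses:
    print(lenient_parse('Let me think... {"name": "Harrison", "hobbies": ["chess",],}'))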

b2v:
I'm not sure I understand: the docs for the Python client say that BAML types get converted to Pydantic models. Doesn't that step add the extra latency you mentioned?

aaronvg:
My bad, I don't think I explained it correctly. Basically you have two options when a "," is missing (among other issues) in an LLM output and causes a parsing error:

- retry the request, which may take 30+ seconds (if your LLM outputs are really long and you're using something like GPT-4)

- fix the parsing issue

In our library we do the latter. The conversion from BAML types to Pydantic ones is a compile-time step unrelated to the problem above. That doesn't happen at runtime.
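
For context, a rough sketch of what that compile-time step produces: a BAML class definition is turned into an ordinary Pydantic model by codegen, so no conversion work happens per request. The generated code below is illustrative only, not BAML's actual output.

    # A BAML class like
    #
    #   class Person {
    #     name string
    #     height float? @description("Height in meters")
    #   }
    #
    # is generated (at build time, not per request) into roughly:
    from typing import Optional
    from pydantic import BaseModel, Field

    class Person(BaseModel):
        name: str
        height: Optional[float] = Field(default=None, description="Height in meters")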

b2v:
Thanks for the clarification. How do you handle dynamic types, i.e. types determined at runtime?

hellovai:
We recently added dynamic type support; here's a snippet! (docs coming soon!)

Python: https://github.com/BoundaryML/baml/blob/413fdf12a0c8c1ebb75c...

Typescript: https://github.com/BoundaryML/baml/blob/413fdf12a0c8c1ebb75c...

Snippet:

# Import paths assume BAML's generated Python client; they may differ per setup.
from baml_client import b
from baml_client.type_builder import TypeBuilder

async def test_dynamic():
    tb = TypeBuilder()

    # Add extra properties to the Person type at runtime.
    tb.Person.add_property("last_name", tb.string().list())
    tb.Person.add_property("height", tb.float().optional()).description(
        "Height in meters"
    )

    # Extend the Hobby enum and alias its values to lowercase.
    tb.Hobby.add_value("chess")
    for name, val in tb.Hobby.list_values():
        val.alias(name.lower())

    tb.Person.add_property("hobbies", tb.Hobby.type().list()).description(
        "Some suggested hobbies they might be good at"
    )

    # no_tb_res = await b.ExtractPeople("My name is Harrison. My hair is black and I'm 6 feet tall.")
    tb_res = await b.ExtractPeople(
        "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop.",
        {"tb": tb},
    )

    assert len(tb_res) > 0, "Expected non-empty result but got empty."
    for r in tb_res:
        print(r.model_dump())

b2v:
Neat, thanks! I'm still pondering whether I should be using this, since most of the retries I have to do are because the LLM itself doesn't understand the schema it's asked for (e.g. output with missing fields, or a value not present in `Literal[]`); certain models are especially bad with deeply nested schemas and output gibberish. Anything on your end that can help with that?

hellovai:
Nothing specific, but you can try your prompt / data model out on https://www.promptfiddle.com

Or if you're open to sharing your prompt / data model, I can send over my best guess of a good prompt! We've found these models work decently well even with 50+ fields, nested structures, and whatnot.

b2v:
I might share it with you later on your Discord server.

> I can send over my best guess of a good prompt!

Now if you could automate the above process by "fitting" a first-draft prompt to a wanted schema, i.e. where your library makes a few adjustments by having a chat of its own with the LLM whenever some assertions don't pass, that would be super useful! Heck, I might just implement it myself.

aaronvg:
[Another BAML creator here.] I agree this is an interesting direction! We have a "chat" feature on our roadmap to do this right in the VSCode playground, where an AI agent will have context on your prompt, schema, BAML test results, etc., and help you iterate on the prompt automatically. We've done this before and have been surprised by how good the LLM feedback can be.

We just need a slightly better testing flow within BAML, since we don't support adding assertions just yet.