
169 points by constantinum | 5 comments
1. JakaJancar No.40714866
AI noob question:

Why do OpenAI/Anthropic/... not support constraining token generation? I'd imagine producing valid structured output would be at the top of their feature request lists.

replies(3): >>40714901 >>40715217 >>40717249
2. hellovai No.40714901
not a noob question, here's how the LLM works:

```
def generate(prompt):
    output = []
    while True:
        # the model scores every token in the vocabulary, given the prompt
        # plus everything generated so far
        token_probabilities = call_model(prompt + "".join(output))
        best_token = pick_best(token_probabilities)
        if best_token == '<END>':
            break
        output.append(best_token)
    return "".join(output)
```

basically, to support constrained generation they would need to modify pick_best, and that would make it so they can't optimize the hot loop at their scale. They do support super broad output constraints like JSON, which apply to everyone, but that leads to other issues (things like chain-of-thought/reasoning perform way worse in structured responses).
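To make that concrete, here's a rough sketch (not any vendor's actual code) of what a constrained pick_best could look like, assuming the output grammar/schema hands you the set of token ids that are legal at this step:

```
import math

def pick_best_constrained(token_probabilities, allowed_token_ids):
    # greedy pick, but only over the tokens the output grammar allows here
    best_id, best_p = None, -math.inf
    for token_id in allowed_token_ids:
        if token_probabilities[token_id] > best_p:
            best_id, best_p = token_id, token_probabilities[token_id]
    return best_id
```

Every request could bring its own grammar, so that per-step filtering lands right in the hot loop they're trying to keep uniform across users.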

replies(1): >>40718510
3. joatmon-snoo No.40715217
Author here- besides hellovai’s point about the performance bottleneck, it’s a really tricky semantic problem!

LLMs today are really good at producing output that satisfies the very vague metric of “this looks good to a human”, but they aren’t nearly as good at producing output that satisfies a complex set of syntax and schema constraints. The state space of the former is much larger than that of the latter, so there’s a lot more opportunity for an LLM to succeed by targeting the “looks good to a human” state space. Plus, there’s still a lot of room for advancement in multimodality and data quality.

Search problems, in general, deal with this too: it’s easy to provide a good search experience when there are a lot of high-quality candidates, because all you have to do is return a few of the best ones, and much harder when there are fewer. (This is partly why Google Drive Search has always sucked compared to Web Search: it’s really hard to guess exactly which document in a 10k-file Drive a user is looking for, as opposed to finding something on Wikipedia/NYTimes/Instagram that the user might be looking for!)

4. RockyMcNuts No.40717249
This is the right question, and the OpenAI API supports requesting JSON with e.g.

client.chat.completions.create(..., response_format={"type": "json_object"})

But LLMs are stochastic by nature; nothing is 100%. The LLM vendors aren't dummies and train hard for this use case, but you still need a prompt that OpenAI can handle, validation (and fixing) of the output with an output parser, and retries.

In my experience asking for simple stuff, requesting json_object is reliable.
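A minimal sketch of that request/validate/retry pattern, assuming the current openai Python SDK; the model name and helper function are just placeholders:

```
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_json(prompt, retries=3):
    for _ in range(retries):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any JSON-mode-capable model
            messages=[
                {"role": "system", "content": "Reply with a JSON object only."},
                {"role": "user", "content": prompt},
            ],
            response_format={"type": "json_object"},
        )
        try:
            return json.loads(resp.choices[0].message.content)
        except json.JSONDecodeError:
            continue  # malformed output, retry
    raise ValueError("no valid JSON after retries")
```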

With LangChain, even! Eye-roll. You can't really title the post 'every way' and then omit possibly the most popular way with only a weak dig. I have literally no idea why they would omit it; it's just a thin wrapper over the LLM APIs and has a JSON output parser. Of course people do use LangChain in production, although there is merit to the idea of using it for research, where it makes it easy to try different LLMs and patterns, and then calling the underlying LLM directly in prod, which gives you a more stable API and fewer hinky layers.
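For reference, the LangChain route is roughly this much code (a sketch assuming the current langchain-openai / langchain-core packages; the prompt, model, and schema are just examples):

```
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

prompt = ChatPromptTemplate.from_template(
    "Return a JSON object with the person's name and age from: {text}"
)
# prompt -> chat model -> parser, composed with LCEL's pipe operator
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | JsonOutputParser()

result = chain.invoke({"text": "Ada Lovelace died at 36."})
print(result)  # e.g. {"name": "Ada Lovelace", "age": 36}
```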

This post is a little frustrating, since it doesn't explain things that a dev would want to know and omits the popular modules. The comment by resiros offers some good additional info.

5. PheonixPharts No.40718510
> things like chain-of-thought/reasoning perform way worse in structured responses

That is fairly well established to be untrue.