
169 points constantinum | 1 comment
StrauXX ◴[] No.40714872[source]
Did I understand the documentation correctly that many of these libraries reprompt until they receive valid JSON? If so, I don't understand why one would do that when token masking is a deterministically verifiable way to get structured output of any kind (as done by Guidance and LMQL, for instance). This is not meant to be snarky, I really am curious. Is there an upside to reprompting, aside from easier implementation?
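For reference, a minimal sketch of the reprompting pattern the comment describes: call the model, try to parse the output as JSON, and retry (feeding the parse error back) until it validates. The `generate` callable here is a hypothetical stand-in for a real LLM call, not any particular library's API.

```python
import json

def reprompt_until_valid(generate, prompt, max_attempts=3):
    """Call `generate` (a stand-in for an LLM call) until it returns valid JSON."""
    last_error = None
    for _ in range(max_attempts):
        text = generate(prompt, error=last_error)
        try:
            return json.loads(text)
        except json.JSONDecodeError as e:
            # Feed the parse error back so the next attempt can self-correct.
            last_error = str(e)
    raise ValueError(f"no valid JSON after {max_attempts} attempts")

# Toy generator: returns truncated JSON on the first call, valid JSON after.
_attempts = {"n": 0}
def fake_llm(prompt, error=None):
    _attempts["n"] += 1
    return '{"name": "Ada"' if _attempts["n"] == 1 else '{"name": "Ada"}'

print(reprompt_until_valid(fake_llm, "Return a JSON object."))  # → {'name': 'Ada'}
```

By contrast, token masking (as in Guidance or LMQL) never lets an invalid token be sampled in the first place: at each decoding step the logits of grammar-violating tokens are masked out, so no retry round-trip is needed.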
replies(4): >>40714984 #>>40714988 #>>40715185 #>>40715620 #
torginus ◴[] No.40715185[source]
Isn't reprompting a decent technique? Considering most modern languages are LL(k), meaning you need at most k tokens of lookahead to parse the output (to be fair, these are programming-language tokens, not LLM tokens), with k=1 being the most common choice, wouldn't it be reasonable to expect to regenerate only a handful of tokens at most?
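The point above is that with bounded lookahead, a parse failure is detected at (or very near) the first offending token, so in principle only the tail from that point needs regenerating. A small illustration using Python's own JSON parser, whose `JSONDecodeError.pos` reports the first offset where parsing fails:

```python
import json

def first_invalid_pos(text):
    """Return the character offset where JSON parsing first fails,
    or None if the text is already valid JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return e.pos

# An unquoted string value makes this invalid JSON.
bad = '{"temperature": 21.5, "unit": celsius}'
pos = first_invalid_pos(bad)
print(pos, repr(bad[pos:]))  # only the text from `pos` on would need regenerating
```

Character offsets are of course not LLM tokens, but the idea carries over: a regeneration strategy could keep the valid prefix and resample only from the failure point, rather than redoing the whole completion.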
replies(1): >>40715256 #
1. joatmon-snoo ◴[] No.40715256[source]
Author here: yes, reprompting can work well enough if the latency hit is acceptable to you.

If you’re driving user-facing interactions with LLMs, though, and you’re already dealing with >1min latency on the first call (as many of our current users are!), waiting for another LLM call to come back is a really frustrating thing to block your UX on.