
169 points constantinum | 1 comment
StrauXX ◴[] No.40714872[source]
Did I understand the documentation correctly that many of these libraries reprompt until they receive valid JSON? If so, I don't understand why one would do that when token masking is a deterministically verifiable way to get structured output of any kind (as done by Guidance and LMQL, for instance). This is not meant to be snarky, I really am curious. Is there an upside to reprompting, aside from easier implementation?
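For reference, a minimal sketch of the reprompting pattern the comment describes: call the model, try to parse the output as JSON, and retry (feeding the parse error back) until it validates. The `generate` callable here is a hypothetical stand-in for a real LLM call, not any particular library's API.

```python
import json

def reprompt_until_valid(generate, prompt, max_attempts=3):
    """Call `generate` (a stand-in for an LLM call) until it returns valid JSON."""
    last_error = None
    for _ in range(max_attempts):
        text = generate(prompt, error=last_error)
        try:
            return json.loads(text)
        except json.JSONDecodeError as e:
            # Feed the parse error back so the next attempt can self-correct.
            last_error = str(e)
    raise ValueError(f"no valid JSON after {max_attempts} attempts")

# Toy generator: returns truncated JSON on the first call, valid JSON after.
_attempts = {"n": 0}
def fake_llm(prompt, error=None):
    _attempts["n"] += 1
    return '{"name": "Ada"' if _attempts["n"] == 1 else '{"name": "Ada"}'

print(reprompt_until_valid(fake_llm, "Return a JSON object."))  # → {'name': 'Ada'}
```

By contrast, token masking (as in Guidance or LMQL) never lets an invalid token be sampled in the first place: at each decoding step the logits of grammar-violating tokens are masked out, so no retry round-trip is needed.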
replies(4): >>40714984 #>>40714988 #>>40715185 #>>40715620 #
torginus ◴[] No.40715185[source]
Isn't reprompting a decent technique? Considering most modern languages are LL(k), meaning you need at most k tokens of lookahead to parse the output (to be fair, these are programming-language tokens, not LLM tokens), with k=1 being the most common choice, wouldn't it be reasonable to expect to regenerate only a handful of tokens at most?
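The point above is that with bounded lookahead, a parse failure is detected at (or very near) the first offending token, so in principle only the tail from that point needs regenerating. A small illustration using Python's own JSON parser, whose `JSONDecodeError.pos` reports the first offset where parsing fails:

```python
import json

def first_invalid_pos(text):
    """Return the character offset where JSON parsing first fails,
    or None if the text is already valid JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return e.pos

# An unquoted string value makes this invalid JSON.
bad = '{"temperature": 21.5, "unit": celsius}'
pos = first_invalid_pos(bad)
print(pos, repr(bad[pos:]))  # only the text from `pos` on would need regenerating
```

Character offsets are of course not LLM tokens, but the idea carries over: a regeneration strategy could keep the valid prefix and resample only from the failure point, rather than redoing the whole completion.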
replies(1): >>40715256 #
1. joatmon-snoo ◴[] No.40715256[source]
Author here: yes, reprompting can work well enough if the latency hit is acceptable to you.

If you’re driving user-facing interactions with LLMs, though, and you’re already dealing with >1min latency on the first call (as many of our current users are!), waiting for another LLM call to come back is a really frustrating thing to block your UX on.