
54 points tudorizer | 7 comments
nmaley No.44373601
I'm in the process of actually building LLM-based apps at the moment, and Martin Fowler's comments are on the money. The fact is that seemingly insignificant changes to prompts can yield dramatically different outcomes, and the odd new outcomes have all these unpredictable downstream impacts. After working with deterministic systems for most of my career, it requires a different mindset.

It's also a huge barrier to adoption by mainstream businesses, which are used to working to unambiguous business rules. If it's tricky for us developers, it's even more frustrating for end users. Very often they end up just saying, f* it, this is too hard.

I also use LLMs to write code, and for that they are a huge productivity boon. Just remember to test! But I'm noticing that use of LLMs in mainstream business applications lags the hype quite a bit. They are touted as panaceas, but like any IT technology they are tricky to implement. People always underestimate the effort necessary to get a real return, even with deterministic apps. With nondeterministic apps it's an even bigger problem.

replies(1): >>44402970 #
CraigJPerry No.44402970
Some failure modes can be annoying to test for. For example, if you exceed the model’s context window, nothing will happen in terms of errors or exceptions, but the observable performance on the task will tank.

Counting tokens is the only reliable defence I've found against this.
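
Roughly something like this (a minimal sketch using tiktoken; the context limit, the reserved output budget and the per-message overhead here are assumptions, not exact figures for any particular model):

    import tiktoken

    # Assumed figures for illustration; check the real limits of your model.
    MAX_CONTEXT_TOKENS = 128_000   # hypothetical context window of the target model
    RESERVED_FOR_OUTPUT = 4_096    # room left for the completion itself

    def count_message_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
        """Approximate token count for a chat-style message list."""
        enc = tiktoken.encoding_for_model(model)
        # +4 per message is a rough allowance for role/formatting overhead;
        # the exact overhead varies by model.
        return sum(len(enc.encode(m["content"])) + 4 for m in messages)

    def fits_in_context(messages: list[dict]) -> bool:
        return count_message_tokens(messages) + RESERVED_FOR_OUTPUT <= MAX_CONTEXT_TOKENS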

replies(1): >>44403060 #
1. danielbln No.44403060
If you exceed the context window, the remote LLM endpoint will throw you an error, which you probably want to catch; or rather, you want to catch that before it happens and deal with it. Either way, it's usually not a silent error that goes unnoticed. What makes you think that?
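
Against the OpenAI Python SDK, for example, it looks roughly like this (a sketch; the "context_length_exceeded" code is OpenAI-specific, other providers may signal it differently):

    from openai import OpenAI, BadRequestError

    client = OpenAI()

    def complete(messages: list[dict]) -> str:
        try:
            resp = client.chat.completions.create(model="gpt-4o", messages=messages)
            return resp.choices[0].message.content
        except BadRequestError as e:
            # OpenAI reports the code "context_length_exceeded" in the error body;
            # treat the exact code as provider-specific.
            if getattr(e, "code", None) == "context_length_exceeded":
                # e.g. drop the oldest messages and retry, or summarise the history
                raise RuntimeError("prompt exceeds the model's context window") from e
            raise
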
replies(2): >>44403270 #>>44403434 #
2. CraigJPerry No.44403270
Interesting. The completion return object is documented, but there's no error or exception field. In practice, the only errors I've seen so far have been at the HTTP transport layer.

It would make sense to me for the chat context to raise an exception. Maybe I should read the docs further…

3. diggan No.44403434
> If you exceed the context window the remote LLM endpoint will throw you an error which you probably want to catch

Not every endpoint works the same way. I'm pretty sure LM Studio's OpenAI-compatible endpoints will silently (from the client's perspective) truncate the context rather than throw an error. It's up to the client to make sure the context fits in those cases.

OpenAI's own endpoints do return an error and refuse the request if you exceed the context length, though. I've also seen others use the "finish_reason" attribute to signal that the context length was exceeded, rather than setting an error status code on the response.

Overall, even "OpenAI-compatible" endpoints often aren't 100% faithful reproductions of the OpenAI endpoints, sadly.
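
As a defensive check against endpoints like that, something along these lines works (a sketch assuming an OpenAI-compatible client pointed at LM Studio's default local port; the model name is a placeholder, and whether finish_reason actually comes back as "length" on overflow depends on the server):

    from openai import OpenAI

    # LM Studio's default local server address; the model name is a placeholder
    # for whatever model is loaded there.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Summarise this document: ..."}],
    )
    choice = resp.choices[0]
    if choice.finish_reason == "length":
        # The output was cut off (or the context was truncated upstream);
        # decide whether to shrink the prompt, retry, or accept a partial answer.
        print("warning: completion stopped on length, result may be incomplete")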

replies(1): >>44403624 #
4. danielbln No.44403624
That seems like terrible API design, to just truncate without telling the caller. Anthropic, Google and OpenAI will all fail very loudly if you exceed the context window, and that's how it should be. But fair enough: this shouldn't happen anyway, and the context should be actively managed before it blows up either way.
replies(2): >>44404066 #>>44404527 #
5. dist-epoch No.44404066{3}
It's complicated. For example, some models (o3) will throw an error if you set the temperature parameter.

What do you do if you want to support multiple models in your LLM gateway? Do you throw an error if a user sets temperature for o3, thus dumping the problem on them? Or just ignore it, potentially creating confusion because temperature will seem not to work for some models?
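
A gateway basically has to pick one of the two (rough sketch; the unsupported-parameter table and model names are illustrative assumptions, not a real provider matrix):

    # Illustrative table of parameters some models reject; not a real provider matrix.
    UNSUPPORTED_PARAMS = {
        "o3": {"temperature", "top_p"},
    }

    def build_request(model: str, params: dict, strict: bool = True) -> dict:
        blocked = UNSUPPORTED_PARAMS.get(model, set()) & params.keys()
        if blocked and strict:
            # Option 1: fail loudly and push the problem back to the caller.
            raise ValueError(f"{model} does not accept: {', '.join(sorted(blocked))}")
        # Option 2: silently drop the parameters -- the source of the confusion
        # described above.
        return {k: v for k, v in params.items() if k not in blocked}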

replies(1): >>44404385 #
6. danielbln No.44404385{4}
I'm a big fan of fail early and fail loudly.
7. diggan No.44404527{3}
> That seems like terrible API design to just truncate without telling the caller

Agree, confused me a lot the first time I encountered it.

It would be great if implementations/endpoints could converge, but with OpenAI moving to the Responses API while the rest of the ecosystem seemingly still implements ChatCompletion, with various small differences (like how to do structured outputs), it feels like we're getting further away, not closer...
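
For illustration, the two request shapes look roughly like this with the OpenAI Python SDK (most "OpenAI-compatible" servers only implement the first):

    from openai import OpenAI

    client = OpenAI()

    # Chat Completions shape, the one most other providers clone.
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(chat.choices[0].message.content)

    # Responses API shape, currently OpenAI-specific.
    resp = client.responses.create(model="gpt-4o", input="hello")
    print(resp.output_text)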