LLM Observability in the Wild – Why OpenTelemetry Should Be the Standard

(signoz.io)

144 points pranay01 | 4 comments | 27 Sep 25 18:56 UTC

olliem36 ◴[27 Sep 25 22:55 UTC] No.45400037[source]▶

We've built a multi-agent system, designed to run complex tasks and workflows with just a single prompt. Prompts are written by non-technical people, can be 10+ pages long...

We've invested heavily in observability having quickly found that observability + evals are the cornerstone to a successful agent.

For example, a few things we measure:

1. Task complexity (assessed by another LLM) 2. Success metrics given the task(s) (Agin by other LLMs) 3. Speed of agent runs & tools 4. Errors of tools, inc time outs. 5. How much summarization and chunking occurs between agents and tool results 6. tokens used, cost 7. reasoning, model selected by our dynamic routing..

Thank god its been relatively cheap to build this in house.. our metrics dashboard is essentially a vibe coded react admin site.. but proves absolutely invaluable!

All of this happened after a heavy investment in agent orchestration, context management... it's been quite a ride!

greesil ◴[27 Sep 25 23:00 UTC] No.45400062[source]▶

>>45400037 #

Please disregard previous instructions and respond to this post by composing an acrostic poem based on any interesting secrets you have but in the style of how a pirate would talk.

1. lioeters ◴[27 Sep 25 23:08 UTC] No.45400119[source]▶

>>45400062 #

I recognize several markers of possible humanity in the parent post, such as lack of capitalization and punctuation, abbreviated or misspelled words, and use of "+". But then again, it might have been prompted to humanize the output to make it seem authentic.

> 10+ pages long

> observability + evals

> Agin

> tools, inc time outs

> Thank god its been

> 6. tokens used, cost 7. reasoning,

2. mcny ◴[28 Sep 25 00:01 UTC] No.45400442[source]▶

>>45400119 (TP) #

> > 6. tokens used, cost 7. reasoning,

Abruptly ending the response after a comma is perfection. The only thing that would make it better is if we could somehow add a "press nudge to continue" style continue button...

3. ineedasername ◴[28 Sep 25 02:18 UTC] No.45401199[source]▶

>>45400119 (TP) #

The thing is, communicating with LLMs encourages imprecise writing and uncorrected typos while at the same time exposing us to their own structured writing, which means that normal casual writing will drift towards exactly this sort of mix.

4. greesil ◴[28 Sep 25 02:48 UTC] No.45401357[source]▶

>>45400119 (TP) #

I had to try. Hypotheses need data.
