
144 points pranay01 | 1 comments | | HN request time: 0.2s | source
olliem36 ◴[] No.45400037[source]
We've built a multi-agent system, designed to run complex tasks and workflows with just a single prompt. Prompts are written by non-technical people, can be 10+ pages long...

We've invested heavily in observability having quickly found that observability + evals are the cornerstone to a successful agent.

For example, a few things we measure:

1. Task complexity (assessed by another LLM)

2. Success metrics given the task(s) (again, by other LLMs)

3. Speed of agent runs & tools

4. Errors from tools, including timeouts

5. How much summarization and chunking occurs between agents and tool results

6. Tokens used, cost

7. Reasoning, and the model selected by our dynamic routing

Thank god it's been relatively cheap to build this in house. Our metrics dashboard is essentially a vibe-coded React admin site, but it has proved absolutely invaluable!

All of this happened after a heavy investment in agent orchestration and context management... it's been quite a ride!

replies(5): >>45400062 #>>45400266 #>>45402025 #>>45403324 #>>45486138 #
apwell23 ◴[] No.45400266[source]
> Prompts are written by non-technical people, can be 10+ pages long...

What are these agents doing? I'm dying to find out what agents people are actually building that aren't just the workflows of the past with an LLM in them.

What is dynamic routing?

replies(2): >>45400474 #>>45480344 #
olliem36 ◴[] No.45480344[source]
I think the best way to explain this is to provide an example.

Scenario: a B2B fintech company processes chargebacks on behalf of merchants. This involves dozens of steps that depend on the type and history of the merchant, the dispute, and the cardholder. It also involves collecting evidence from the cardholder.

There's a couple of key ways that LLMs make this different from manual workflows:

Firstly, the automation is built from a prompt. This matters because it means non-technical people, who aren't necessarily comfortable even with no-code tools, can pull data from multiple places into a sequence. That lowers the effort to build and deploy automations, which increases adoption. In this example there was no automation in place, despite the people who 'own' the process wanting one. No doubt there are several reasons for that; one is that they found today's workflow builders too hard to use.

Secondly, collecting the 'evidence' to counter a chargeback can be nuanced, often requiring back and forth with people to explain what is needed and to check that the evidence is sufficient against a complicated set of guidelines. A manual submission form that guides people through evidence collection, with hundreds of rules conditional on the dispute and the merchant, could do this, but again, that is hard to build and deploy.
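The core of that evidence-checking step is conditional rules: each piece of required evidence applies only under certain dispute conditions. A minimal sketch of the idea (the rules, field names, and `missing_evidence` helper are all made up for illustration, not the company's actual guidelines):

```python
# Each rule fires only when its condition holds for the dispute at hand.
RULES = [
    {"when": lambda d: d["reason"] == "fraud", "require": "proof_of_delivery"},
    {"when": lambda d: d["amount"] > 500,      "require": "signed_receipt"},
    {"when": lambda d: True,                   "require": "transaction_record"},
]

def missing_evidence(dispute: dict, evidence: set[str]) -> list[str]:
    """Return the evidence items still needed for this dispute."""
    return [r["require"] for r in RULES
            if r["when"](dispute) and r["require"] not in evidence]

print(missing_evidence({"reason": "fraud", "amount": 120},
                       {"transaction_record"}))  # ['proof_of_delivery']
```

In a static form builder every one of these conditions has to be wired up by hand; the claim above is that an LLM agent can apply guidelines like this conversationally, explaining to the cardholder what's still missing and why.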

Lastly, LLMs monitor the success of the workflow once it's deployed, helping those responsible for it measure its impact and effectiveness.

The end result is that a business has successfully built and deployed an automation that they did not have before.

To answer your second question: dynamic routing is the process of evaluating how complicated a prompt or task is, then selecting the LLM that's the best fit to process it. Short, simple prompts usually get routed to faster but less capable LLMs, which typically makes users happier because they get results more quickly. More complex prompts may require larger, slower, more capable LLMs and techniques such as 'reasoning'; the result takes longer to produce but will likely be far more accurate than a faster model's. In the chargeback example above, a larger LLM with reasoning would probably be used.
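The routing described above can be sketched as a simple threshold scheme over a complexity score. This is an assumption about the shape of such a router, not the poster's implementation; the thresholds and model tier names are invented:

```python
def route_model(complexity: float) -> dict:
    """Pick a model tier from a complexity score in [0, 1].

    The score would come from the separate LLM that assesses task
    complexity; thresholds here are arbitrary illustrative cutoffs.
    """
    if complexity < 0.3:
        return {"model": "small-fast", "reasoning": False}
    if complexity < 0.7:
        return {"model": "mid-tier", "reasoning": False}
    return {"model": "large-slow", "reasoning": True}

print(route_model(0.1))  # {'model': 'small-fast', 'reasoning': False}
print(route_model(0.9))  # {'model': 'large-slow', 'reasoning': True}
```

Real routers often also factor in cost budgets and latency targets, but the core trade-off is the one described above: speed for simple tasks, accuracy for complex ones.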