
144 points pranay01 | 15 comments
1. CuriouslyC ◴[] No.45398725[source]
A full observability stack is just a docker compose away: OTel + Phoenix + ClickHouse and off to the races. No excuse not to do it.
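Roughly, something like this compose sketch (the image names and ports are assumptions based on the projects' docs, and any wiring between Phoenix and ClickHouse is left out):

```yaml
services:
  phoenix:
    image: arizephoenix/phoenix:latest   # assumed image name; check Phoenix docs
    ports:
      - "6006:6006"   # Phoenix UI + OTLP/HTTP collector
      - "4317:4317"   # OTLP/gRPC collector
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # native protocol
```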
replies(3): >>45398832 #>>45399127 #>>45399917 #
2. perfmode ◴[] No.45398832[source]
Phoenix as in Elixir?
replies(1): >>45398972 #
3. mindcrime ◴[] No.45398972[source]
I imagine they meant:

https://github.com/Arize-ai/phoenix

4. pranay01 ◴[] No.45399127[source]
One of the issues we have observed is that Phoenix doesn't completely stick to OTel conventions.

More specifically, one issue I observed is how it handles span kinds: if you send spans via plain OTel, the span kinds are classified as `unknown`.

e.g. the Phoenix screenshot here - https://signoz.io/blog/llm-observability-opentelemetry/#the-...

replies(3): >>45399660 #>>45399902 #>>45400349 #
5. CuriouslyC ◴[] No.45399660[source]
If it doesn't work for your use case, that's cool, but in terms of the interface for doing this kind of work it's the best. Tradeoffs.
replies(1): >>45399697 #
6. 7thpower ◴[] No.45399697{3}[source]
I’ve found Phoenix to be a clunky experience and have been far happier with tools like Langfuse.

I don’t know how you can confidently say one is “the best”.

replies(1): >>45399886 #
7. a_khan ◴[] No.45399886{4}[source]
Curious what you prefer about Langfuse over Phoenix!
replies(1): >>45431387 #
8. ijk ◴[] No.45399902[source]
Spans being labeled 'unknown' when I definitely labeled them in the code is probably the most annoying part of Phoenix right now.
replies(1): >>45400428 #
9. dcreater ◴[] No.45399917[source]
Is Phoenix really the no-brainer go-to? There are so many choices - Langfuse, W&B, etc.
replies(2): >>45400143 #>>45402284 #
10. CuriouslyC ◴[] No.45400143[source]
I suppose it depends on the way you approach your work. It's designed with an experimental mindset, which makes it very easy to keep stuff organized and separate, and to integrate with the rest of my experimental stack.

If you come from an ops background, other tools like SigNoz or Langfuse might feel more natural; I guess it's just a matter of perspective.

11. cephalization ◴[] No.45400349[source]
Phoenix ingests any OpenTelemetry-compliant spans into the platform, but the UI is geared towards displaying spans whose attributes adhere to “OpenInference” naming conventions.

There are numerous open community standards for where to put LLM information within OTel spans, but OpenInference predates most of them.

12. pranay01 ◴[] No.45400428{3}[source]
Yes, it happens because OpenInference assumes these span kind values: https://github.com/Arize-ai/openinference/blob/b827f3dd659fc...

Anything that doesn't fall into one of the other span kinds is classified as `unknown`.

For reference, these are the span kinds that OpenTelemetry itself emits - https://github.com/open-telemetry/opentelemetry-python/blob/...
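To illustrate the fallback behavior (this is not Phoenix's actual code, and the set of kinds below is transcribed from memory of the OpenInference spec - the linked file is the authoritative list):

```python
# Illustrative sketch of how a UI keyed on `openinference.span.kind`
# ends up showing "unknown" for generic OTel spans that never set it.
OPENINFERENCE_SPAN_KINDS = {
    "CHAIN", "RETRIEVER", "RERANKER", "LLM", "EMBEDDING",
    "AGENT", "TOOL", "GUARDRAIL", "EVALUATOR",
}

def classify(span_attributes: dict) -> str:
    """Return the OpenInference span kind, or UNKNOWN if none is set."""
    kind = span_attributes.get("openinference.span.kind", "")
    return kind if kind in OPENINFERENCE_SPAN_KINDS else "UNKNOWN"

# A span carrying only generic OTel / gen_ai attributes has no
# OpenInference kind, so it falls through:
print(classify({"gen_ai.system": "openai"}))         # UNKNOWN
print(classify({"openinference.span.kind": "LLM"}))  # LLM
```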

13. jkisiel ◴[] No.45402284[source]
Working at a small startup, I evaluated numerous solutions for our LLM observability stack. That was early this year (IIRC Langfuse was not open source then), and Phoenix was the only solution that worked out of the box and seemed to have the right 'mindset', i.e. using OTel and integrating with Python and JS/LangChain. I wasted lots of time with others; some solutions did not even boot.
replies(1): >>45403147 #
14. dcreater ◴[] No.45403147{3}[source]
This is exactly what I was looking for! An actual practitioner's experience from trials! Thanks.

Is it fair to assume you are happy with it?

15. 7thpower ◴[] No.45431387{5}[source]
Sorry for the delayed response!

The main thing was wrestling with Phoenix's instrumentation vs. the out-of-the-box Langfuse Python decorator, which works pretty well for basic use cases.

It’s been a while, but I also recall that prompt management and other features in Phoenix weren’t really built out (probably not a goal for them, but I like having that functionality under the same umbrella).