←back to thread

116 points ndhandala | 1 comments | | HN request time: 0s | source
Show context
drivenextfunc ◴[] No.45085422[source]
Has anyone used OpenTelemetry for long-running batch jobs? OTel seems designed for web apps where spans last seconds/minutes, but batch jobs run for hours or days. Since spans are only submitted after completion, there's no way to track progress during execution, making OTel nearly unusable for batch workloads.

I have a similar issue with Prometheus -- not great for batch job metrics either. It's frustrating how many otherwise excellent OSS tools are optimized for web applications but fall short for batch processing use cases.

replies(5): >>45086098 #>>45087174 #>>45092745 #>>45103972 #>>45107467 #
1. nucleardog ◴[] No.45103972[source]
Nothing running for days, but sometimes a half hour or so. When the process kicks off it starts a trace, but individual steps of the process create separate spans within that trace (and sometimes further nested spans) that don't run the entire length of the job. As the job progresses, the spans and their related events, logs, etc all appear.

I think this does highlight, to me, the biggest weakness of OTel--the actual documentation and examples for "how to solve problems with this" really suck.