Why are traces so important to OpenTelemetry?
Because traces are so important to cloud-native developers!
I see a lot of people looking at the project with an eye towards replacing their metrics or logging pipelines, and while that's a totally valid use case, it misses the forest for the trees. An observable system requires hard context between signals; that is to say, it's not enough to just have some shared attributes and a time window. You need to be able to link your telemetry together using a durable, unique per-request identifier. Tracing is the best way to get that done!
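For instance, with the OpenTelemetry Python API you can stamp the active trace and span IDs onto your log records yourself (the logging instrumentation packages will generally do this for you); here's a hand-rolled helper just to show the idea:

```python
import logging

from opentelemetry import trace

def log_with_trace_context(logger: logging.Logger, msg: str) -> None:
    # A log line carrying these IDs can be joined to the exact request that
    # produced it, rather than to a time window and some shared attributes.
    ctx = trace.get_current_span().get_span_context()
    logger.info(
        msg,
        extra={
            "trace_id": format(ctx.trace_id, "032x"),
            "span_id": format(ctx.span_id, "016x"),
        },
    )
```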
“But aren’t all those traces expensive? Won’t I sample out a lot of them?” Sure — but that’s fine! Using the OpenTelemetry Collector, you can create pipelines that only save the interesting or important traces and put the rest in cheap storage for later analysis or processing.
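As a rough sketch of what that can look like, the Collector's contrib distribution ships a tail_sampling processor; the configuration below keeps error and slow traces for the tracing backend and writes everything to a file in a second pipeline. The endpoint, file path, and thresholds are placeholders for whatever backend and cheap storage you actually use:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 500}

exporters:
  otlphttp:
    endpoint: https://tracing-backend.example.com  # placeholder backend
  file:
    path: /var/lib/otelcol/all-traces.json         # stand-in for cheap blob storage

service:
  pipelines:
    traces/interesting:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlphttp]
    traces/everything:
      receivers: [otlp]
      processors: [batch]
      exporters: [file]
```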
You can also use traces to create interesting new telemetry streams without having to re-instrument your code. Imagine you have some work that goes onto a queue; there are plenty of questions you might want to ask (a sketch of the producer/consumer instrumentation follows the list):
– “How many steps did it take for each job to complete? What do fast and slow jobs have in common, and what’s different about them?”
– “How long is work sitting on the queue waiting to be picked up? What is contributing to long consumer or producer lag?”
– “How frequently are particular events occurring, like re-delivery?”
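All of those questions hinge on the producer and the consumer ends of the queue landing in the same trace, which you get by carrying the trace context inside the message itself. Here's a minimal sketch using the OpenTelemetry Python API; enqueue, handle, and the message shape are stand-ins for whatever queue client you actually use:

```python
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("queue-demo")

def publish(payload, enqueue):
    # Producer side: open a span, stamp the current trace context onto the
    # message headers, and hand the message to your queue client.
    with tracer.start_as_current_span("job publish", kind=trace.SpanKind.PRODUCER):
        headers = {}
        inject(headers)  # writes W3C traceparent/tracestate into the dict
        enqueue({"headers": headers, "payload": payload})

def consume(message, handle):
    # Consumer side: pull the context back out so this span joins the same
    # trace, which is what lets you measure producer-to-consumer lag per job.
    ctx = extract(message["headers"])
    with tracer.start_as_current_span("job process", context=ctx, kind=trace.SpanKind.CONSUMER):
        handle(message["payload"])
```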
While you could try to stitch these answers together from a combination of signals, traces can answer most of these questions on their own. Using the Collector, you can send partial traces to a queue, then write some code to turn those traces into counts and histograms with exemplars that point back to the complete trace kept in S3 or other cheap blob storage.
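That processing code doesn't have to be fancy. Here's a toy sketch that assumes each span arrives off the queue as a dict carrying a trace_id, name, duration_ms, and attributes; those field names are illustrative, not a fixed schema:

```python
from collections import Counter, defaultdict

# Histogram bucket boundaries in milliseconds (pick whatever suits your workload).
BUCKET_BOUNDS_MS = [50, 100, 250, 500, 1000, 5000]

def spans_to_metrics(spans):
    counts = Counter()            # event counts, e.g. how often re-delivery happens
    histogram = defaultdict(int)  # (span name, bucket) -> number of spans
    exemplars = {}                # (span name, bucket) -> trace_id of one real trace

    for span in spans:
        name = span["name"]
        counts[name] += 1
        if span.get("attributes", {}).get("redelivered"):
            counts[name + ".redelivered"] += 1

        # Find the duration bucket and remember one trace_id per bucket, so the
        # metric can point back at a complete trace sitting in blob storage.
        bucket = next((b for b in BUCKET_BOUNDS_MS if span["duration_ms"] <= b), float("inf"))
        histogram[(name, bucket)] += 1
        exemplars.setdefault((name, bucket), span["trace_id"])

    return counts, histogram, exemplars
```

Keeping one trace_id per bucket is exactly what makes these exemplars useful: a suspicious bucket in the histogram links straight back to a full trace you can pull out of cheap storage.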
Don’t think of OpenTelemetry as an appliance that just creates telemetry for you; think of it as a box of Legos: your imagination is the only limit!
(reprinted with permission from LinkedIn)