Custom instrumentation: how much is just right?

Instrumentation: usually there’s not enough, and you can’t find the problem. Sometimes there’s too much, and you can’t find the problem, as with great piles of log strings. That doesn’t usually happen with traces, but we do see “too much” in the sense of “too much work” (or “too expensive”).

Custom instrumentation, where you create your own spans or add your own business-specific data, takes work. It is code that you have to write and maintain. There is a sweet spot in effort (and cost) for how much custom instrumentation you want.

Case 1: standard frameworks

If your app uses standard frameworks, then automatic instrumentation will create the spans you need. The sweet spot is: add several attributes to the active span. Include what’s important to your service. Top priority: customer ID (tenant ID, user ID, whatever identifies your customer). When a problem occurs, you can now ask “who is affected?”

Add any other attributes that might someday help in debugging. Order size, product IDs, region, price plan, payment provider, whatever might be relevant. The more attributes linked together in a span, the more value each one has! If you’re sending spans to Honeycomb, each attribute costs nothing.
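Here is a minimal sketch of what "add several attributes to the active span" can look like. It is an illustration, not a prescribed implementation. The helper `add_business_attributes`, the `app.` prefix, the field names, and the `RecordingSpan` stand-in are all hypothetical. The helper only assumes an object with a `set_attribute(key, value)` method, which is the shape of an OpenTelemetry span; in an app with OpenTelemetry's Python API installed, you would pass it the result of `trace.get_current_span()`.

```python
from typing import Any, Mapping, Protocol


class SpanLike(Protocol):
    """Anything with the set_attribute(key, value) shape of an OpenTelemetry span."""

    def set_attribute(self, key: str, value: Any) -> None: ...


# Simplification: OpenTelemetry also accepts homogeneous sequences of these
# types, but this sketch stringifies everything that is not a plain primitive.
_PRIMITIVES = (str, bool, int, float)


def add_business_attributes(span: SpanLike, fields: Mapping[str, Any],
                            prefix: str = "app.") -> int:
    """Copy business fields onto a span and return how many were set.

    None values are skipped (they are not valid attribute values), and
    non-primitive values are converted to strings so that one odd field
    does not cost you the rest.
    """
    count = 0
    for name, value in fields.items():
        if value is None:
            continue
        if not isinstance(value, _PRIMITIVES):
            value = str(value)
        span.set_attribute(prefix + name, value)
        count += 1
    return count


class RecordingSpan:
    """Hypothetical stand-in for a real span, used only to show the result."""

    def __init__(self) -> None:
        self.attributes = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


span = RecordingSpan()
add_business_attributes(span, {
    "customer_id": "cust-1234",   # top priority: who is affected?
    "order_size": 3,
    "region": "eu-west",
    "price_plan": "pro",
    "coupon": None,               # unknown values are simply left off
})
print(span.attributes)
```

Putting all the fields on one span in one call is the point: when every attribute lives on the same span, you can slice by any combination of them later.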

Case 2: your own frameworks

If you use your own frameworks, then you get to add instrumentation to them. This means creating your own spans.

The sweet spot for span creation is: network hops (incoming and outgoing) and process boundaries (like database calls). Whenever you receive a request or call out to an external system, create a span to represent that work.

On outgoing calls, propagate the trace ID and span ID in request metadata. On incoming calls, look for a trace ID and span ID, and continue that story in your spans. When you control both sides of the interface, you can make the traces great.
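As a rough sketch of that propagation, the code below writes and reads the IDs using the W3C Trace Context `traceparent` header format (`00-<trace id>-<span id>-<flags>`). Choosing that format is an assumption on my part; your framework may carry the IDs in different metadata. The names `SpanContext`, `inject`, `extract`, and `continue_or_start` are hypothetical, and the sketch handles only version `00` of the header.

```python
import re
import secrets
from typing import Mapping, MutableMapping, NamedTuple, Optional, Tuple

# Version 00 only: 32 hex chars of trace ID, 16 of span ID, 2 of flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str   # 32 lowercase hex characters
    span_id: str    # 16 lowercase hex characters
    sampled: bool


def new_trace_id() -> str:
    return secrets.token_hex(16)


def new_span_id() -> str:
    return secrets.token_hex(8)


def inject(ctx: SpanContext, headers: MutableMapping[str, str]) -> None:
    """Outgoing call: write the current trace and span IDs into the headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Mapping[str, str]) -> Optional[SpanContext]:
    """Incoming call: read the caller's IDs, or None if absent or malformed."""
    for name, value in headers.items():
        if name.lower() != "traceparent":
            continue
        match = _TRACEPARENT.match(value.strip())
        if match is None:
            return None
        trace_id, span_id, flags = match.groups()
        # All-zero IDs are invalid in the W3C format.
        if set(trace_id) == {"0"} or set(span_id) == {"0"}:
            return None
        return SpanContext(trace_id, span_id, bool(int(flags, 16) & 1))
    return None


def continue_or_start(headers: Mapping[str, str]) -> Tuple[SpanContext, Optional[str]]:
    """Return (context for the new server span, parent span ID or None)."""
    parent = extract(headers)
    if parent is None:
        # Nobody upstream started a trace, so this span is the root.
        return SpanContext(new_trace_id(), new_span_id(), True), None
    # Same trace, fresh span ID, and the caller's span becomes the parent.
    return SpanContext(parent.trace_id, new_span_id(), parent.sampled), parent.span_id


# Client side: attach the IDs to the outgoing request.
caller = SpanContext(new_trace_id(), new_span_id(), True)
outgoing_headers = {}
inject(caller, outgoing_headers)

# Server side: pick the story up where the caller left off.
server_ctx, parent_span_id = continue_or_start(outgoing_headers)
```

The server span keeps the caller's trace ID and records the caller's span ID as its parent, which is what lets the two sides show up as one connected trace.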

Add attributes to those spans that follow OpenTelemetry conventions. Add attributes that describe what’s relevant to your business. For instance, if your service is multi-tenant, then tenant ID is essential. Add all the relevant fields that your framework has access to, and then let each service add what is specific to it.

Limitations

What I’ve described here applies to the back end; request/response structure is the sweet spot for traces. If your processing is asynchronous, consider span links. For client side… watch this space, we’re working on it.

If it’s cost you’re worried about, then fewer spans with more information on each are usually best. If you’re using Honeycomb, then add attributes and metrics willy-nilly. Watch out for span events; they cost as much as a whole span. Also, consider sampling: a few detailed examples (plus every problem) is the optimal number of stories to see.
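To make "a few detailed examples (plus every problem)" concrete, here is one possible sampling rule, sketched under some assumptions: the decision is made after the trace finishes (so you already know whether it was a problem), ordinary traces are kept at roughly 1 in `rate` by hashing the trace ID, and the function name `keep_trace` and its parameters are hypothetical. Hashing the trace ID means every service makes the same decision for the same trace.

```python
import hashlib
from typing import Tuple


def keep_trace(trace_id: str, is_problem: bool, rate: int = 20) -> Tuple[bool, int]:
    """Decide whether to keep a finished trace.

    Returns (keep, sample_rate). The sample rate says how many traces each
    kept trace stands for, so counts can be re-weighted later.
    """
    if rate < 1:
        raise ValueError("rate must be at least 1")
    if is_problem or rate == 1:
        # Every problem is kept, and it represents only itself.
        return True, 1
    # Deterministic choice: the same trace ID always lands in the same bucket.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % rate
    return bucket == 0, rate


# Problems are always kept; ordinary traces are kept about 1 time in 20.
print(keep_trace("4bf92f3577b34da6a3ce929d0e0e4736", is_problem=True))
kept = sum(keep_trace(f"trace-{i}", is_problem=False)[0] for i in range(10_000))
print(f"kept {kept} of 10000 ordinary traces")
```

Recording the sample rate alongside each kept trace matters: without it, a graph of request counts would quietly shrink by a factor of `rate`.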

For more personalized advice, Ask Miss O11y on Zoom.
