When I look at my traces, or when I’m frustrated by the quantity of metric events, I want to know how that data got to me. That’s why the first thing I add to any OpenTelemetry Collector configuration is: attribution.
This came up in the End User Discussion Group yesterday; two people said they do this too.
Here’s a processor configuration to add an attribute to every trace span, saying what collector it went through:
```yaml
transform/labelme:
  trace_statements:
    - context: resource
      statements:
        - set(attributes["collector.collector"], "${COLLECTOR_NAME}")
```
Then I add that to the pipeline configuration:
```yaml
traces:
  receivers: [otlp]
  processors: [transform/labelme, batch]
  exporters: [otlp]
```
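Assembled into a complete collector configuration, the two pieces fit together roughly like this (the OTLP endpoint and receiver choices here are illustrative, not from the original):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
  transform/labelme:
    trace_statements:
      - context: resource
        statements:
          - set(attributes["collector.collector"], "${COLLECTOR_NAME}")

exporters:
  otlp:
    endpoint: collector.example.com:4317  # hypothetical endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/labelme, batch]
      exporters: [otlp]
```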
Next, I often want to know which receiver a metric came from; I can’t graph a metric properly until I know how it was collected. To add attribution for the receiver, I have to split the configuration into multiple pipelines, so that each one can have its own transform processor.
For an example, check this processor configuration at otelbin.io. Also available as a gist.
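The shape of that configuration is one metrics pipeline per receiver, each with its own transform processor. A sketch of the pipelines section, with illustrative receiver and processor names:

```yaml
service:
  pipelines:
    metrics/kubeletstats:
      receivers: [kubeletstats]
      processors: [transform/kubeletstats, batch]
      exporters: [otlp]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [transform/prometheus, batch]
      exporters: [otlp]
```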
Each processor definition looks like this:
```yaml
transform/kubeletstats:
  metric_statements:
    - context: resource
      statements:
        - set(attributes["meta.signal_type"], "metrics") where attributes["meta.signal_type"] == nil
        - set(attributes["collector.receiver"], "kubeletstats")
        - set(attributes["collector.collector"], "daemonset-opentelemetry-collector") where attributes["collector.collector"] == nil
```
Here, I define three attributes: the collector, the receiver, and the signal type. Why the signal type? Because traces automatically get “trace” in this field and logs get “log,” and I want metrics to be consistent with them. The collector lets me make it so.
Creating a pipeline per receiver means repetition: when you add a metrics processor, you have to add it in three places (in this example). Dan Nelson uses Helm templates to generate their collector configuration, abstracting over this duplication.
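One way to do that kind of templating, as a minimal sketch: a Helm template that stamps out one transform processor per receiver. The receiver list and processor bodies here are illustrative, not Dan’s actual chart:

```yaml
# templates/collector-config.yaml (fragment)
processors:
  {{- range $receiver := list "kubeletstats" "prometheus" "hostmetrics" }}
  transform/{{ $receiver }}:
    metric_statements:
      - context: resource
        statements:
          - set(attributes["collector.receiver"], "{{ $receiver }}")
  {{- end }}
```

The same range over receivers can also generate the matching `metrics/<receiver>` pipeline entries, so adding a receiver becomes a one-line change to the list instead of edits in three places.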