Collector best practices: where did this telemetry come from?

When I look at my traces, or when I’m frustrated by the quantity of metric events, I want to know how that data got to me. That’s why the first thing I add to any OpenTelemetry Collector configuration is attribution.

This came up in the End User Discussion Group yesterday; two people said they do this too.

Here’s a transform processor configuration (I call it transform/labelme) that adds an attribute to the resource of every trace span, saying which collector it went through:

trace_statements:
  - context: resource
    statements:
      - set(attributes["collector.collector"], "${COLLECTOR_NAME}")

Then I add that to the pipeline configuration:

receivers: [otlp]
processors: [transform/labelme, batch]
exporters: [otlp]
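
Put together, a complete configuration file might look like the sketch below. It is a minimal illustration, not the exact file from my setup: the receiver protocols and the exporter endpoint are assumptions, and COLLECTOR_NAME has to be set in the collector's environment for the substitution to work.

```yaml
# Minimal sketch of a Collector config with attribution on the traces pipeline.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  transform/labelme:
    trace_statements:
      - context: resource
        statements:
          # COLLECTOR_NAME is read from the collector's environment
          - set(attributes["collector.collector"], "${COLLECTOR_NAME}")

exporters:
  otlp:
    # hypothetical backend endpoint, not from the original post
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/labelme, batch]
      exporters: [otlp]
```

Because the statement runs in the resource context, the attribute is set once per resource and applies to every span that resource carries.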

Next, I often want to know what receiver a metric came from. I can’t graph a metric properly until I know how it was collected. To add attribution for the receiver, I have to break into multiple pipelines, so each can have its own transform processor.

For an example, check this processor configuration (also available as a gist).

Each processor definition looks like this:

metric_statements:
  - context: resource
    statements:
      - set(attributes["meta.signal_type"], "metrics") where attributes["meta.signal_type"] == nil
      - set(attributes["collector.receiver"], "otlp")
      - set(attributes["collector.collector"], "daemonset-opentelemetry-collector") where attributes["collector.collector"] == nil

Here, I define three attributes: the collector, the receiver, and the signal type. The where ... == nil conditions leave alone any value that was already set, for instance by a collector earlier in the chain. Why the signal type? Because traces automatically get “trace” in this field and logs get “log”, and I want metrics to be consistent with that. The collector lets me make it so.
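
To show how one pipeline per receiver fits together, here is a sketch of the processors and pipelines sections. Only the otlp receiver comes from the example above; the hostmetrics and prometheus receivers and the transform/label_* names are assumptions for illustration, and the receiver and exporter definitions are assumed to exist elsewhere in the file.

```yaml
# Sketch: one labeling processor and one metrics pipeline per receiver.
processors:
  batch:
  transform/label_otlp:
    metric_statements:
      - context: resource
        statements:
          - set(attributes["collector.receiver"], "otlp")
  transform/label_hostmetrics:   # hypothetical second receiver
    metric_statements:
      - context: resource
        statements:
          - set(attributes["collector.receiver"], "hostmetrics")
  transform/label_prometheus:    # hypothetical third receiver
    metric_statements:
      - context: resource
        statements:
          - set(attributes["collector.receiver"], "prometheus")

service:
  pipelines:
    metrics/otlp:
      receivers: [otlp]
      processors: [transform/label_otlp, batch]
      exporters: [otlp]
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [transform/label_hostmetrics, batch]
      exporters: [otlp]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [transform/label_prometheus, batch]
      exporters: [otlp]
```

Each pipeline has exactly one receiver, so the processor in that pipeline can hard-code the receiver name.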

Creating a pipeline per receiver means repetition: when you add a metrics processor, you have to add it in three places (in this example). Dan Nelson uses Helm templates to generate their collector configuration, abstracting over this duplication.
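
I haven't seen those templates, so the following is only a guess at the general shape, not Dan's actual setup. It assumes a hypothetical metricsReceivers list in the chart's values and loops over it to generate one processor and one pipeline per receiver.

```yaml
# values.yaml (hypothetical):
#   metricsReceivers: [otlp, hostmetrics, prometheus]
#
# Helm template fragment that renders into the collector configuration:
processors:
  batch:
{{- range $receiver := .Values.metricsReceivers }}
  transform/label_{{ $receiver }}:
    metric_statements:
      - context: resource
        statements:
          - set(attributes["collector.receiver"], "{{ $receiver }}")
{{- end }}

service:
  pipelines:
{{- range $receiver := .Values.metricsReceivers }}
    metrics/{{ $receiver }}:
      receivers: [{{ $receiver }}]
      processors: [transform/label_{{ $receiver }}, batch]
      exporters: [otlp]
{{- end }}
```

With a layout like this, adding a processor to every metrics pipeline is a one-line change inside the loop instead of three separate edits.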

