You’ve probably heard of an ‘observability stack’, but what about an ‘observability layer cake’? Let’s talk about it!
Historically, monitoring and observability tools have been tightly coupled and vertically integrated. From instrumentation to storage, query to visualization, everything is bound up together in a mostly seamless (to you, at least) way. This is a pretty good deal for people building observability tools – if you control the entire pipeline, end-to-end, you can do a lot of optimization. You know exactly what data is available in order to build out visualizations and workflows. You can also make some pretty intelligent decisions around pricing and billing — people might not know how many spans or metrics their system emits, but they probably know how many servers or nodes they’re running. What’s interesting to me is that this model keeps perpetuating itself — even newer entrants to the observability space (or vendors that are diversifying) are adopting it.
I don’t think it’s great for users in the long run. Why? Well, first off, it severely reduces your option space. Innovative tools get passed over because of duplicate ‘work’ being done by all-in-one stacks. Your development teams or SREs are artificially limited in observability use cases due to pricing or packaging concerns. Second, this approach results in duplicated spend and work split along team and use-case boundaries. Your security team uses whatever SIEM solution they like, your SREs are looking at infra metrics and logs in something else, and your frontend/client engineers are capturing errors in yet another tool… but these are all, roughly, the same underlying data being captured at different parts of the stack and then duplicated to multiple storage and analysis backends.
Why is this the case? The key driver is instrumentation. Without data from observability sources, you’ve got nothing; that’s why instrumentation and logging agents are so popular. OpenTelemetry changes this by making instrumentation a built-in feature of cloud-native software. If all of your libraries, frameworks, and infrastructure emit OpenTelemetry data, then suddenly you’re not reliant on these agents to collect and format the data for you. What’s valuable in that case? Transforming and pipelining data, rather than simply collecting it.
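To make that shift concrete, here’s a minimal, hypothetical sketch of what “transforming and pipelining” means once the data already arrives in a standard shape. This is plain Python, not the real OpenTelemetry Collector API (which is configured in YAML and written in Go); the `Span` class and processor names like `redact_pii` are invented for illustration.

```python
# Toy model of a telemetry pipeline: once instrumentation is built in,
# the interesting work is a chain of small, swappable transform stages.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Span:
    # Simplified stand-in for a telemetry span: a name plus attributes.
    name: str
    attributes: dict = field(default_factory=dict)

def redact_pii(span: Span) -> Span:
    # Drop attributes that might contain user data before export.
    span.attributes = {
        k: v for k, v in span.attributes.items()
        if k not in {"user.email", "http.client_ip"}
    }
    return span

def add_environment(span: Span) -> Span:
    # Enrich every span with deployment context.
    span.attributes["deployment.environment"] = "production"
    return span

def pipeline(span: Span, stages: list[Callable[[Span], Span]]) -> Span:
    # Run the span through each transform stage in order.
    for stage in stages:
        span = stage(span)
    return span

# A span as an instrumented library might emit it:
incoming = Span("GET /checkout",
                {"http.method": "GET", "user.email": "a@example.com"})
outgoing = pipeline(incoming, [redact_pii, add_environment])
print(outgoing.attributes)
# {'http.method': 'GET', 'deployment.environment': 'production'}
```

The point of the sketch is that each stage is independent of the others and of any particular backend: you can reorder, add, or remove transforms without touching instrumentation or storage, which is exactly the loose coupling a vertically integrated stack doesn’t give you.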
If you think of the ‘old way’ as a stack, the ‘new way’ looks more like a set of loosely coupled layers tied together with open APIs and standards. OpenTelemetry provides the instrumentation, collection, and transformation. The next step is storage, then query, then finally visualization and workflows. This is where I see innovation happening over the next few years, as organizations consolidate around OpenTelemetry and start to think about how to manage their observability data lakes.
Want to learn more about how OpenTelemetry helps you manage complex systems? I wrote about it here.