Ask Miss O11y: Observability for Internal PaaS?

Q: As part of an internal Platform as a Service, our platform team provisions Kubernetes clusters for development teams to use. What observability tools should we pre-deploy into the cluster for each team?

Flags go up! Wait! Don’t do this!

[Image: A big red NO in front of three little infrastructures, each with some apps sending telemetry to their own little storage and UI. The teams each look at their own little world.]

Well, partly. You need some observability pipeline components nearby, in infrastructure shared with your application, but the pipeline should never end right next to your app.

[Image: Three application infrastructures, each with their own collector. The collectors send to a fourth infrastructure, this one with telemetry storage and UI. All teams are looking at this shared observability UI. An arrow points at the transmission of telemetry from one infrastructure to another: nice network boundaries.]

The pipeline starts with agents and instrumentation, producing data—that part must live with your code and your running apps. You need to get the telemetry out of your apps quickly, so send it to an OpenTelemetry Collector running in the same cluster.

But then get it out! Make the Collector ship that data across a network boundary for storage.
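Here is a minimal sketch of what that per-cluster Collector might be configured to do, written as a Python function a platform team could use to stamp out configs. The function name and the example endpoint are hypothetical; the keys (`receivers`, `processors`, `exporters`, `service.pipelines`) follow the Collector's configuration layout, and the dict would still need to be rendered to YAML for a real deployment.

```python
import json


def build_collector_config(central_endpoint: str) -> dict:
    """Build an in-cluster Collector config that ships traces elsewhere.

    `central_endpoint` is assumed to be an OTLP/gRPC endpoint that lives
    OUTSIDE the application cluster (hypothetical value in the demo below).
    """
    return {
        "receivers": {
            # Apps in the same cluster send OTLP here, so telemetry
            # leaves the application process quickly.
            "otlp": {
                "protocols": {
                    "grpc": {"endpoint": "0.0.0.0:4317"},
                    "http": {"endpoint": "0.0.0.0:4318"},
                }
            }
        },
        # Batch spans before sending them over the network boundary.
        "processors": {"batch": {}},
        # The pipeline does not end here: export to the shared store.
        "exporters": {"otlp": {"endpoint": central_endpoint}},
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlp"],
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical central endpoint, for illustration only.
    config = build_collector_config("otel-gateway.observability.example.com:4317")
    print(json.dumps(config, indent=2))
```

The point of the design is in the `exporters` section: the Collector receives locally and its only destination is somewhere else.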

Why?

You want your observability storage & UI in a different failure domain than your applications. When your application is in trouble, that’s when you need observability the most. If the application problems are caused by (or the cause of) infrastructure constraints, don’t let that choke your observability tools.

A Kubernetes cluster can get horked by network partitions, running out of resources, DNS difficulties, failed operator upgrades, you name it. Don’t let the problems tripping up your app also leave you blind.
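A platform team could guard against accidentally keeping telemetry in the same failure domain with a small check like the one sketched below. It is only a heuristic: it assumes the default `cluster.local` cluster domain, treats bare service names as in-cluster, and cannot tell whether a raw IP address is inside the cluster. The function names are made up for this example.

```python
from urllib.parse import urlsplit

# Hostname suffixes that typically resolve to Services inside a
# Kubernetes cluster (assumes the default cluster domain).
IN_CLUSTER_SUFFIXES = (".svc", ".svc.cluster.local", ".cluster.local")


def endpoint_host(endpoint: str) -> str:
    """Extract the hostname from 'host:port' or 'scheme://host:port'."""
    if "://" not in endpoint:
        # urlsplit only recognizes a host when the string has a '//' prefix.
        endpoint = "//" + endpoint
    return (urlsplit(endpoint).hostname or "").lower()


def stays_in_cluster(endpoint: str) -> bool:
    """Return True if the endpoint looks like it never leaves the cluster."""
    host = endpoint_host(endpoint)
    # Empty hosts, localhost, and bare service names (no dots) are
    # treated as in-cluster destinations.
    if host in ("", "localhost") or "." not in host:
        return True
    return host.endswith(IN_CLUSTER_SUFFIXES)


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    for candidate in (
        "tempo.monitoring.svc.cluster.local:4317",
        "otel-gateway.observability.example.com:4317",
    ):
        verdict = "stays in cluster" if stays_in_cluster(candidate) else "leaves cluster"
        print(f"{candidate}: {verdict}")
```

A check like this could run when a team's Collector config is generated or reviewed, so that an exporter pointing at an in-cluster store gets flagged before an outage does the flagging for you.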

Platform-as-a-service includes infrastructure to gather telemetry data and process it: maybe an OpenTelemetry Operator for automatic instrumentation, and one or more OpenTelemetry Collectors to process and forward it. Separately, the platform needs a central observability store and UI on different infrastructure.

This is one reason SaaS works so well for observability: your vendor is a completely different failure domain.

Centralized observability has another key benefit: you see the connections across different teams’ apps! Distributed tracing is only useful when all the trace spans come together.

Most software bugs are at the seams, and those seams are revealed when all teams share one tool, backed by interconnected data from all of their apps.
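To make the "spans come together" point concrete, here is a toy sketch with made-up span data. One request passes through three hypothetical teams' services. In per-team storage, the downstream spans have parents that live in someone else's store; in shared storage, every parent resolves and the trace is whole.

```python
def orphaned_spans(spans: list[dict]) -> list[str]:
    """Return IDs of spans whose parent span is missing from this store."""
    known_ids = {span["span_id"] for span in spans}
    return [
        span["span_id"]
        for span in spans
        if span["parent_id"] is not None and span["parent_id"] not in known_ids
    ]


# One trace crossing three teams' services (all names are illustrative).
TRACE = [
    {"trace_id": "t1", "span_id": "a1", "parent_id": None, "team": "checkout"},
    {"trace_id": "t1", "span_id": "b1", "parent_id": "a1", "team": "payments"},
    {"trace_id": "t1", "span_id": "c1", "parent_id": "b1", "team": "ledger"},
]


def store_for(team: str) -> list[dict]:
    """Simulate per-team storage: each team only sees its own spans."""
    return [span for span in TRACE if span["team"] == team]


if __name__ == "__main__":
    for team in ("checkout", "payments", "ledger"):
        print(team, "store, orphaned spans:", orphaned_spans(store_for(team)))
    print("shared store, orphaned spans:", orphaned_spans(TRACE))
```

In the per-team stores, the payments and ledger teams each hold a span that points at a parent they cannot see. The seam between services is exactly the part that goes missing.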

Standardize on OpenTelemetry, and standardize on one storage and UI for observability. Make it easy for teams to ship data out. Do not duplicate storage and UI per provisioned infrastructure.
