Q: As part of an internal Platform as a Service, our platform team provisions Kubernetes clusters for development teams to use. What observability tools should we pre-deploy into the cluster for each team?
Flags go up! Wait! Don’t do this!

Well, partly. You need some observability pipeline components nearby, in infrastructure shared with your application, but the pipeline should never end right next to your app.

The pipeline starts with agents and instrumentation, producing data—that part must live with your code and your running apps. You need to get the telemetry out of your apps quickly, so send it to an OpenTelemetry Collector running in the same cluster.
But then get it out! Make the Collector ship that data across a network boundary for storage.
Why?
You want your observability storage & UI in a different failure domain than your applications. When your application is in trouble, that’s when you need observability the most. If the application problems are caused by (or the cause of) infrastructure constraints, don’t let that choke your observability tools.
A Kubernetes cluster can get horked by network partitions, running out of resources, DNS difficulties, failed operator upgrades, you name it. Don’t let the problems tripping up your app also leave you blind.
So the Platform as a Service should include in-cluster infrastructure to gather telemetry data and process it: maybe an OpenTelemetry Operator for automatic instrumentation, and one or more OpenTelemetry Collectors to receive the data and forward it. Separately, the platform needs a central observability store and UI on different infrastructure.
This is one reason SaaS works so well for observability: your vendor is a completely different failure domain.
Centralized observability has another key benefit: you see the connections across different teams’ apps! Distributed tracing is only useful when all the trace spans come together.
Most software bugs are at the seams, and those seams are revealed when all teams share one tool, backed by interconnected data from all of their apps.
Standardize on OpenTelemetry, and standardize on one storage and UI for observability. Make it easy for teams to ship data out. Do not duplicate storage and UI per provisioned infrastructure.
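
To make "ship data out" concrete, here is a minimal Python sketch of what a platform team might template for each cluster. It builds a Collector configuration that receives OTLP inside the cluster and exports over OTLP/HTTP to a central backend, then checks that no pipeline ends at an in-cluster address. The backend URL, the function names, and the hostname heuristic are all assumptions made for illustration; the config keys follow the Collector's documented layout, but check them against the Collector version you run.

```python
import json
from urllib.parse import urlparse

# Assumption: hostnames with these suffixes resolve inside the cluster.
IN_CLUSTER_SUFFIXES = (".svc", ".cluster.local")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_collector_config(backend_endpoint: str) -> dict:
    """Collector config: OTLP in (same cluster), OTLP/HTTP out (elsewhere)."""
    return {
        "receivers": {
            "otlp": {
                "protocols": {
                    "grpc": {"endpoint": "0.0.0.0:4317"},
                    "http": {"endpoint": "0.0.0.0:4318"},
                }
            }
        },
        "processors": {
            # Keep the Collector from being starved along with the apps.
            "memory_limiter": {
                "check_interval": "1s",
                "limit_percentage": 80,
                "spike_limit_percentage": 20,
            },
            "batch": {},
        },
        "exporters": {"otlphttp": {"endpoint": backend_endpoint}},
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["memory_limiter", "batch"],
                    "exporters": ["otlphttp"],
                }
            }
        },
    }


def leaves_cluster(endpoint: str) -> bool:
    """Rough heuristic: does this endpoint point outside the cluster?

    It will miss cases such as a namespace-qualified service name
    ("tempo.monitoring") or a raw pod IP, so treat it as a first filter.
    """
    host = (urlparse(endpoint).hostname or "").lower()
    if not host or host in LOCAL_HOSTS:
        return False
    if "." not in host:  # bare Kubernetes service name, e.g. "tempo"
        return False
    return not host.endswith(IN_CLUSTER_SUFFIXES)


def pipeline_ends_outside_cluster(config: dict) -> bool:
    """True only if every exporter used by every pipeline leaves the cluster."""
    exporters = config["exporters"]
    for pipeline in config["service"]["pipelines"].values():
        for name in pipeline["exporters"]:
            if not leaves_cluster(exporters[name].get("endpoint", "")):
                return False
    return True


if __name__ == "__main__":
    # Placeholder URL for the central store; substitute your own.
    config = build_collector_config("https://otlp.observability.example.com")
    assert pipeline_ends_outside_cluster(config)
    print(json.dumps(config, indent=2))
```

The script prints JSON. YAML parsers generally accept JSON, so the output can usually be dropped into a Collector config file, though that is worth verifying rather than assuming. The check at the end is the part that encodes the advice here: a config whose exporter points at something like `tempo.monitoring.svc.cluster.local` fails it, because that pipeline ends right next to the app.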





