Introduction

OpenTelemetry is a CNCF project/organisation with the following aim:

High-quality, ubiquitous, and portable telemetry to enable effective observability

The organisation maintains 3 main project “categories”:

  • a protocol, OTLP
  • a collector, OpenTelemetry Collector, including Helm charts and other cloud-native stuff (more on that below)
  • a set of SDKs and auto-instrumentation libraries

Or at least, as I understand it; the organisation has 78 repositories at the time of writing.

Protocol

The protocol supports the transmission of 3 (soon 4) types of signal:

  • metrics: think CPU/Memory/storage… basically numbers that fluctuate over time
  • logs: your app’s journal of what it’s doing and how it’s going
  • traces: in a distributed, microservices-oriented system, a record of which services contributed to answering a user request
  • profiles (not production ready): the execution stack a request generated

Because it’s a protocol, different pieces of software can interoperate through it.
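As a mental model of the four signal types (this is NOT the actual OTLP schema — the real protocol is protobuf-defined — just a toy illustration):

```python
from dataclasses import dataclass, field

# Toy records illustrating the four signal types.
# A mental model only; real OTLP messages are protobuf-defined.

@dataclass
class Metric:
    name: str          # e.g. "cpu.usage"
    value: float       # a number that fluctuates over time
    unit: str = ""

@dataclass
class LogRecord:
    body: str          # what the app is doing and how it's going
    severity: str = "INFO"

@dataclass
class Span:
    name: str                      # one step of a distributed request
    trace_id: str                  # shared by all spans of the same request
    attributes: dict = field(default_factory=dict)

@dataclass
class Profile:
    stack: list = field(default_factory=list)  # sampled execution stacks

# A trace is simply the set of spans sharing a trace_id:
spans = [
    Span("frontend:GET /cart", trace_id="abc123"),
    Span("cart-service:lookup", trace_id="abc123"),
]
assert all(s.trace_id == "abc123" for s in spans)
```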

Collector

A collector does nothing more than collect metrics/logs/traces, process them (sample/filter/aggregate/transform), and send them to a backend.

The official collector is called OpenTelemetry Collector. A competitor is Grafana Alloy.

Sampling

Sampling is a statistical concept (from now on, double-check my sayings) that aims to keep only part of the telemetry while still being representative of all requests. In the OTel collector, you can configure the percentage of telemetry to keep based on a set of criteria (e.g.: whether the request failed or succeeded).
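As a toy illustration of the idea (plain Python — not the collector's actual sampling processor, which is configured in YAML): keep every failed request, and only a fixed percentage of successful ones.

```python
import random

def sample(requests, keep_ratio=0.1, seed=42):
    """Toy sampler: keep all failures, plus keep_ratio of successes.

    `requests` is a list of dicts with a boolean "failed" key.
    """
    rng = random.Random(seed)
    kept = []
    for req in requests:
        if req["failed"] or rng.random() < keep_ratio:
            kept.append(req)
    return kept

# 1000 requests, 1 in 50 fails.
requests = [{"id": i, "failed": i % 50 == 0} for i in range(1000)]
kept = sample(requests)

# Every failure survives; successes are heavily downsampled,
# yet the kept set still reflects overall traffic.
assert all(r in kept for r in requests if r["failed"])
print(f"kept {len(kept)} of {len(requests)} requests")
```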

Backend

Each telemetry type has a different backend. I’ll only be talking about projects from the Grafana stack.

From my understanding:

  • metrics = Prometheus
  • logs = Grafana Loki
  • traces = Grafana Tempo
  • profiles = Pyroscope

Kubernetes

Instrumentation

The organisation provides an operator to run collectors based on an OpenTelemetryCollector CRD. The same operator also provides an Instrumentation CRD that allows injecting an agent into the runtime of compatible languages (read: a language with some kind of VM, like Java or Python). Languages that can be auto-instrumented: .NET, Java, Node.js, Python, and Go. (TODO: check CRD names)

This is called auto-instrumentation. It doesn’t require any code change under certain conditions:

  1. The framework handling requests supports it
  2. The language has some kind of runtime which can be extended (in Java, it uses a Java Agent)

Any other type of app will have to modify a bit of code.

It has some drawbacks:

  • for some languages, a rooted/privileged container is required (the Go getting-started docs state it) (TODO: check if this is really an issue)
  • the attributes attached to a request are chosen by the framework’s defaults, so you don’t control which attributes you do (or don’t) get (potential performance overhead); you might also want to add attributes relevant to your business
  • the language must have a runtime
  • the framework must support OTLP

The remedy to those drawbacks is manual instrumentation (read: you create your spans yourself, which means more coding).
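To make “create your spans yourself” concrete, here is a toy tracer in plain Python. It is NOT the OpenTelemetry SDK (the real Python SDK exposes a similar `start_as_current_span` context-manager API, but this sketch is standalone); it just shows that with manual instrumentation, you decide where spans begin and end and which business attributes they carry.

```python
import time
from contextlib import contextmanager

# Toy stand-in for a tracer; a real SDK would export spans over OTLP.
finished_spans = []

@contextmanager
def span(name, **attributes):
    """Record a named span with custom attributes and a duration."""
    start = time.monotonic()
    try:
        yield attributes          # caller can add attributes mid-span
    finally:
        finished_spans.append({
            "name": name,
            "attributes": attributes,
            "duration_s": time.monotonic() - start,
        })

# Manual instrumentation: you choose the span boundaries and the
# attributes relevant to your business, not the framework's defaults.
with span("checkout", customer_tier="premium") as attrs:
    attrs["order.items"] = 3

assert finished_spans[0]["name"] == "checkout"
```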

Rust instrumentation

As of today (2024-12-23), no Rust library implements auto-instrumentation. You’re on your own 🩲.

Go instrumentation

The automatic instrumentation in Go relies on eBPF and requires a privileged container. TODO: check which container should be privileged, and whether it’s a security issue.

Apparently you can also tell well-known components, like nginx, to export signals (metrics/logs/traces). TODO: Check how and what.

Deploy

There are 3 patterns to connect applications and a backend:

  • no collector: the app’s SDK exports directly to the backend
  • agent: a collector runs close to the app (e.g. a sidecar or a node-level DaemonSet)
  • gateway: a standalone collector service that the whole cluster sends to

Agent and gateway can be combined
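A toy sketch of the agent-and-gateway combination (plain Python, purely to show the data flow — real collectors speak OTLP over gRPC/HTTP, and all names here are illustrative):

```python
class Agent:
    """Node-local collector: buffers signals, forwards batches onward."""
    def __init__(self, forward, flush_at=3):
        self.forward = forward      # next hop (here: the gateway)
        self.flush_at = flush_at
        self.buffer = []

    def export(self, signal):
        self.buffer.append(signal)
        if len(self.buffer) >= self.flush_at:
            self.forward(list(self.buffer))
            self.buffer.clear()

backend = []

def gateway(batch):
    # Central collector: the last hop before the backend.
    backend.extend(batch)

# app -> agent (local) -> gateway (central) -> backend
agent = Agent(gateway)
for i in range(6):
    agent.export({"span": i})   # the app only talks to its local agent

assert len(backend) == 6        # everything arrived, in two batches of 3
```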

Kubernetes Attributes Processor

See the documentation of the OTel collector components.

The Kubernetes Attributes Processor is one of the most important components for a collector running in Kubernetes. Any collector receiving application data should use it. Because it adds Kubernetes context to your telemetry, the Kubernetes Attributes Processor lets you correlate your application’s traces, metrics, and logs signals with your Kubernetes telemetry, such as pod metrics and traces.
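A toy version of what the processor does (plain Python; the real processor resolves pods via the Kubernetes API based on where the telemetry came from — this sketch just hard-codes a lookup table keyed by source IP):

```python
# Toy sketch of the Kubernetes Attributes Processor: enrich each
# telemetry record with the pod metadata of its emitter, so app
# signals can be correlated with Kubernetes-level telemetry.
# The "cluster state" below is hard-coded; the real processor
# queries the Kubernetes API.

pods_by_ip = {
    "10.0.0.12": {"k8s.pod.name": "cart-7f9c", "k8s.namespace.name": "shop"},
}

def add_k8s_attributes(record):
    """Merge pod metadata into the record's attributes, if known."""
    pod = pods_by_ip.get(record.get("source_ip"), {})
    record.setdefault("attributes", {}).update(pod)
    return record

span = {"name": "GET /cart", "source_ip": "10.0.0.12"}
enriched = add_k8s_attributes(span)
assert enriched["attributes"]["k8s.namespace.name"] == "shop"
```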