Module 5: The Collector — Architecture and Pipelines

The Assumption to Destroy

“The Collector is optional.”

This is technically true in the same way that a load balancer is optional — you can skip it and have your clients talk directly to every server. It works. Until you need to change something.

Without a Collector, the backend URL lives in every service’s configuration. The sampling strategy lives in every service. The enrichment logic — adding Kubernetes metadata, tagging by deployment environment — lives in every service. When you have four services, this is annoying. When you have forty, it’s a crisis in waiting.

Changing your tracing backend from Jaeger to Grafana Tempo? Update forty configs, redeploy forty services. Want to add tail-based sampling? Add sampling logic to forty services and figure out how to coordinate the sampling decision across all of them. Want to stop exporting user.email in plaintext because legal sent you an email? Find every service that sets that attribute and update them all, then coordinate the rollout.

The Collector doesn’t just make these operations easier. It makes them possible to do correctly — with one config change, zero service restarts, zero risk of one service being out of sync with the rest.

Here’s the mental model for this module:

The Collector is the single control plane for your telemetry data. It is where you receive, transform, filter, sample, and route signals from all services — without touching service code. Once you have it in place, you can change backends, add sampling, enrich spans with infrastructure metadata, and route different signals to different destinations — all from one config file, all without a service deploy.

The Collector is not optional in any serious production setup. Let’s understand exactly how it works.

Why the Collector Exists

Let’s make the cost of not having a Collector concrete.

InsureWatch runs Python claims processing, a Node.js policy management API, and a Java underwriting engine. Before the Collector was added, this is what the topology looked like:

Without Collector:

  claims-api (Python)    ──────────────────► Jaeger
  policy-api (Node.js)   ──────────────────► Jaeger
  underwriting (Java)    ──────────────────► Prometheus
  notify-service         ──────────────────► Jaeger

        (switch backends = redeploy all four services)
        (add sampling = implement in all four services)
        (add k8s metadata = k8s API call from all four services)

Every service holds its own exporter config. Each one talks directly to a backend. Every operational concern — sampling, enrichment, routing — is duplicated across the fleet.

Here’s what the topology looks like with a Collector:

With Collector:

  claims-api (Python)  ─┐
  policy-api (Node.js) ─┤
  underwriting (Java)  ─┼──► Collector ──► Jaeger
  notify-service       ─┤                ──► Grafana Tempo
                        │                ──► S3 (archive)
                        │
            (switch backends = update one Collector config)
            (add sampling = add one processor to one pipeline)
            (add k8s metadata = one k8sattributes processor)

Every service points to the same OTLP endpoint. They do not know or care what happens downstream. The Collector handles everything else.

This is why the Collector exists: it decouples the instrumentation layer from the backend layer. Services produce OTLP. The Collector decides where it goes.
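
On the service side, that decoupling usually comes down to a single endpoint setting. The fragment below is a minimal sketch of a Kubernetes container spec using the standard OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, and OTEL_SERVICE_NAME environment variables. The image name and the otel-collector hostname are assumptions for illustration, not part of the InsureWatch setup described above.

```yaml
# Hypothetical container spec for claims-api; only the env block matters here
containers:
  - name: claims-api
    image: insurewatch/claims-api:latest   # assumed image name
    env:
      - name: OTEL_SERVICE_NAME
        value: claims-api
      # Every service points at the same Collector endpoint (assumed DNS name)
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector:4317
      - name: OTEL_EXPORTER_OTLP_PROTOCOL
        value: grpc
```

Nothing in this fragment names a backend. Swapping Jaeger for Tempo later touches the Collector config only.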

Core Components: Receivers, Processors, Exporters, Extensions, Connectors

The Collector’s architecture is a set of composable building blocks. Each is defined independently in the config, and then wired together into pipelines in the service section. Understanding what each component does — and which ones map to which part of the pipeline — is foundational to both operating the Collector and passing the OTCA exam.

Receivers — Ingesting Data

Receivers are how data gets into the Collector. They listen for incoming telemetry, pull from configured sources, or read from the host.

otlp — The standard receiver for OTel SDKs. Supports gRPC on port 4317 and HTTP on port 4318. Every service instrumented with an OTel SDK should be configured to export to an OTLP receiver. This is the receiver you’ll use for virtually all greenfield instrumentation.

prometheus — A pull-based receiver. It scrapes /metrics endpoints, just like a standalone Prometheus server would. Use this to bring existing Prometheus-instrumented services into the OTel pipeline without changing their instrumentation. The Collector becomes the scraper.

filelog — Reads log files from disk. For services that write structured logs to files instead of exporting via OTLP — which is most legacy services. If those logs contain trace_id and span_id fields, the Filelog receiver can extract them and create properly correlated OTel log records.

hostmetrics — Collects host-level metrics: CPU utilization, memory usage, disk I/O, network throughput. No application instrumentation required. Run the Collector as a DaemonSet on Kubernetes and every node’s host metrics flow into your pipeline automatically.

jaeger — Accepts data in the legacy Jaeger wire format. This is a migration path: services still using the Jaeger SDK can send to the Collector’s Jaeger receiver, and the Collector exports downstream in OTLP. Zero service code change required during migration.
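
As a rough sketch of the two migration-oriented receivers above, the fragment below scrapes one Prometheus endpoint and accepts Jaeger traffic on its conventional ports. The job name, the scrape target underwriting:9464, and the scrape interval are assumptions chosen for illustration.

```yaml
receivers:
  prometheus:
    config:
      # Same syntax as a standalone Prometheus scrape_configs section
      scrape_configs:
        - job_name: underwriting          # assumed job name
          scrape_interval: 30s
          static_configs:
            - targets: ["underwriting:9464"]   # assumed /metrics target
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
```

Both receivers still have to be listed in a pipeline under service.pipelines before they receive anything: prometheus in a metrics pipeline, jaeger in a traces pipeline.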

Processors — Transforming Data In-Flight

Processors sit between receivers and exporters. They are where you mutate, filter, enrich, and shape data before it reaches its destination. They form an ordered chain — the sequence matters.

memory_limiter — Caps the Collector’s memory usage. When the Collector is processing high-volume data, this processor monitors memory pressure and begins dropping or rejecting data before an OOM crash occurs. There is a critical rule about this processor’s placement that we will cover in detail shortly.

batch — Buffers spans, metric data points, and log records and sends them in batches. Without batching, every span generates an individual export call — catastrophic for both the Collector and the backend. The batch processor dramatically reduces the number of exporter calls by grouping data by timeout or batch size, whichever comes first.

attributes — Add, update, delete, hash, or extract attributes on spans, metric data points, or log records. Simple and powerful for common transformations: rename an attribute, remove a sensitive field, insert a static tag.

resource — Modify Resource attributes — the metadata attached to the entire process, not individual signals. Use this to insert deployment.environment, cloud.region, or any attribute that describes the source process and should appear on every signal it emits.

filter — Drop entire spans, metric data points, or log records matching specified criteria. Use this to eliminate health check spans, suppress noisy metrics below a threshold, or discard debug-level logs before they reach your backend.

transform — The most powerful processor. Uses OTTL (OpenTelemetry Transformation Language) to express arbitrary transformations: conditional logic, attribute manipulation, cross-field operations. We will cover OTTL in depth in its own section.
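
For the simpler cases, the attributes processor is often enough on its own. The sketch below assumes three hypothetical attribute names (user.email, internal.debug_token, team) and shows one action of each common kind. The instance name attributes/scrub is also just an illustration.

```yaml
processors:
  attributes/scrub:
    actions:
      # Replace the value with its hash so the raw email never leaves the Collector
      - key: user.email
        action: hash
      # Remove an attribute outright
      - key: internal.debug_token
        action: delete
      # Add a static tag only if the key is not already present
      - key: team
        value: claims
        action: insert
```

Actions run in the order listed, so a delete placed before a hash on the same key would leave nothing to hash.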

Exporters — Sending Data Out

Exporters send processed data to backends. Like receivers, they are defined independently and referenced in pipeline configs.

otlp — Sends data over gRPC to any OTLP-compatible backend: Grafana Tempo, Jaeger, Uptrace, Honeycomb, Datadog (with OTLP ingestion enabled), your own Collector gateway. This is the primary production exporter for most pipelines.

otlphttp — Same as otlp, but over HTTP. Use this when gRPC is blocked by a firewall, a proxy, or a backend that only supports HTTP. Identical semantics, different transport.

prometheus — Instead of pushing data out, this exporter exposes a /metrics endpoint that Prometheus scrapes. Use this when your metrics backend is a Prometheus server and you do not want to change how it collects data. The Collector becomes a Prometheus-compatible target.

debug — Prints telemetry data to stdout or stderr. Not for production. Invaluable during development and debugging — add it to any pipeline to see exactly what data is flowing through at what point in the pipeline.

file — Writes telemetry data to a local file in JSON or OTLP binary format. Useful for offline testing, building test fixtures, or archiving data locally before a backend is available.
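
A minimal sketch of the pull-style prometheus exporter described above. The port 8889 and the insurewatch namespace prefix are assumptions, and the Prometheus server would need a scrape job pointing at this address.

```yaml
exporters:
  prometheus:
    # The Collector serves /metrics here; Prometheus scrapes it
    endpoint: 0.0.0.0:8889
    namespace: insurewatch          # assumed prefix added to metric names
    # Copy resource attributes (service.name and so on) onto each series as labels
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```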

Extensions — Capabilities Outside the Pipeline

Extensions add operational capabilities to the Collector itself. They do not process telemetry data — they operate on the Collector process.

health_check — Exposes an HTTP endpoint (default port 13133, served at the root path / unless you configure a different path) that returns 200 when the Collector is running. Use this as the liveness and readiness probe in Kubernetes. Without it, your pod has no health signal for the kubelet.

zpages — An in-process debug HTTP server (default port 55679) that serves live diagnostic pages: which components are wired into each pipeline, build and runtime information, and sampled internal spans grouped by latency and error. Often more useful than reading logs when you are trying to understand pipeline behavior.

pprof — Exposes the standard Go pprof profiling endpoint (default :1777). When the Collector is consuming more CPU or memory than expected, pprof lets you capture a flame graph and identify the source.

basicauth — Adds HTTP Basic authentication to receivers. Use this when Collector receiver endpoints need to be protected — for example, when agents are sending data over a network segment that is not fully trusted.

Connectors — Linking Pipelines Together

Connectors are a newer component type, and they are specifically tested on the OTCA exam. A connector joins two pipelines: the output of one pipeline feeds the input of another. They act simultaneously as an exporter (on the upstream pipeline) and a receiver (on the downstream pipeline).

spanmetrics — Reads trace spans and generates RED metrics from them: request rate (spans per second), error rate (error spans per second), and duration (a histogram of span durations). These metrics get fed into a metrics pipeline and ultimately into your metrics backend. This is how you get SLI dashboards from trace data without any additional instrumentation.

routing — Routes signals to different pipelines based on attribute values. Send traces from deployment.environment=production to one pipeline and traces from deployment.environment=staging to another, using a single receiver.

Exam callout: The OTCA exam tests connector knowledge. Know that connectors are distinct from exporters — they connect pipelines within the Collector, not to external backends. The spanmetrics connector specifically generates metrics from trace data, making it a bridge between the traces pipeline and the metrics pipeline.
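
The dual role is easiest to see in config. In this sketch the spanmetrics connector appears once as an exporter of the traces pipeline and once as a receiver of the metrics pipeline. The histogram bucket boundaries and the extra dimension are assumptions, and the otlp receiver, the otlp exporter, and the two processors are assumed to be defined elsewhere in the same file.

```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100ms, 500ms, 2s]     # assumed latency buckets
    dimensions:
      - name: http.request.method       # assumed extra metric dimension

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp, spanmetrics]    # connector used as an exporter
    metrics:
      receivers: [otlp, spanmetrics]    # same connector used as a receiver
      processors: [memory_limiter, batch]
      exporters: [otlp]
```

A connector that is listed on only one side is a configuration error: it must end one pipeline and start another.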

The Pipeline Model

Every signal in the Collector flows through a pipeline. A pipeline is a named, typed processing chain that wires together one or more receivers, zero or more processors (in order), and one or more exporters.

The pipeline definition lives under service.pipelines. The components it references — receivers, processors, exporters — are defined separately in their own top-level sections. The pipeline is purely the wiring.

Here is a complete, minimal Collector config covering all three signal types for InsureWatch:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  batch:
    timeout: 5s
    send_batch_size: 1024
  resource:
    attributes:
      - action: insert
        key: deployment.environment
        value: production

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [otlp, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]

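Read the service.pipelines section as the source of truth. The traces pipeline takes OTLP data, runs it through three processors in order, and sends the result to both the OTLP exporter (Grafana Tempo) and the debug exporter (the Collector’s console output). The metrics and logs pipelines share the same receiver but have shorter processor chains and only one exporter. They reuse the otlp exporter here only to keep the example short: Tempo stores traces, so in a real deployment those two pipelines would export to backends built for metrics and logs.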

The internal flow through the traces pipeline looks like this:

┌──────────────────────────────────────────────────────────────┐
│  PIPELINE: traces                                            │
│                                                              │
│  [OTLP Receiver]                                             │
│       │                                                      │
│       ▼                                                      │
│  [memory_limiter]  <-- must be first                         │
│       │                                                      │
│       ▼                                                      │
│  [batch]                                                     │
│       │                                                      │
│       ▼                                                      │
│  [resource]  <-- adds deployment.environment                 │
│       │                                                      │
│       ├──────────────────────────┐                           │
│       ▼                          ▼                           │
│  [OTLP Exporter/tempo]    [debug Exporter]                   │
└──────────────────────────────────────────────────────────────┘

When a pipeline has multiple exporters, the data is fanned out — both exporters receive the same data. This is not a split: every span goes to Tempo and every span also goes to debug. The Collector handles this internally with a fanout consumer.

One important nuance: the same component definition can be used in multiple pipelines. The otlp receiver above feeds all three pipelines simultaneously. It is one receiver, one listener, one config block — but it routes incoming data to whichever pipeline is registered to consume from it.

The memory_limiter Rule — Exam Critical

This is a rule you must know, not just understand.

memory_limiter must be the first processor in every pipeline.

Here is why.

Receivers deserialize incoming data and hand it to the first processor in the chain. If memory_limiter is positioned third, processors one and two have already accepted that data and may be holding it: the batch processor, for example, buffers items in memory until its timeout or size threshold fires. By the time the limiter checks memory, the allocation has already happened, and its refusal can no longer travel back to the receiver as back-pressure. When a burst of high-volume data arrives, the OOM crash can happen before the limiter has any ability to intervene.

When memory_limiter is first, it acts at the earliest possible point in the chain, before any other processor runs. Once memory crosses the configured limit it refuses new data, the receiver returns an error to the sender, and a well-behaved client retries later instead of piling more data onto the heap.

In a concrete sense: memory_limiter is a gate, not a cleanup operation. Gates work at the entrance, not in the middle.

Exam callout: The OTCA exam directly tests this. “What is the recommended position for the memory_limiter processor in a Collector pipeline?” The answer is: first in the processor chain, before all other processors. “What is the consequence of placing it later?” Earlier processors have already accepted and buffered the data, so the limiter cannot push back on the receiver and is far less effective at preventing OOM crashes under high load.

This applies to every pipeline independently. If your Collector runs separate traces, metrics, and logs pipelines — each one needs memory_limiter first.
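
The limit does not have to be a fixed number of MiB. As a sketch, the variant below sizes the limiter relative to the memory available to the process, which is convenient when the same config runs in containers with different memory limits. The 80 and 20 percent values are assumptions, not recommendations from this module.

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    # Soft limit = limit_percentage - spike_limit_percentage of total memory
    limit_percentage: 80
    spike_limit_percentage: 20
  batch:
    timeout: 5s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # limiter first, always
      exporters: [otlp]
```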

OTTL — The Transformation Language

OTTL (OpenTelemetry Transformation Language) is the expression language used by the transform processor. It is how you express non-trivial data transformations in the Collector — field renames, conditional mutations, hash operations, cross-attribute moves — without writing a custom processor in Go.

OTTL operates on a context: resource, scope, span, spanevent, metric, datapoint, or log. Statements are evaluated against each item in the pipeline. Functions operate on the item’s fields. Conditions (where clauses) constrain which items a statement applies to.

Here is a realistic OTTL config for InsureWatch, covering traces, metrics, and logs:

processors:
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Remove all internal debug attributes added during development
          - delete_matching_keys(attributes, "^internal\\.")
          # Migrate deprecated HTTP attribute to current semantic convention
          - set(attributes["http.request.method"], attributes["http.method"]) where attributes["http.method"] != nil
          - delete_key(attributes, "http.method")
          # Dropping health check spans is left to the filter processor:
          # transform modifies telemetry and has no drop() function
    metric_statements:
      - context: datapoint
        statements:
          # Stamp every metric data point with the environment label
          - set(attributes["environment"], "production")
          # Rename legacy metric attribute to current convention
          - set(attributes["http.request.method"], attributes["method"]) where attributes["method"] != nil
          - delete_key(attributes, "method")
    log_statements:
      - context: log
        statements:
          # Hash PII before it reaches the backend
          - set(attributes["user.email"], SHA256(attributes["user.email"])) where attributes["user.email"] != nil
          # Promote severity to ERROR for specific business-critical conditions
          - set(severity_number, SEVERITY_NUMBER_ERROR) where body == "payment gateway timeout"

The common OTTL functions you need to know:

set(target, value) — Assign a value to a field. set(attributes["env"], "prod").
delete_key(attributes, "key") — Remove a specific attribute by name.
delete_matching_keys(attributes, "regex") — Remove all attributes whose keys match a regular expression. Useful for bulk-removing debug attributes matching a prefix pattern.
keep_keys(attributes, ["key1", "key2"]) — Inverse of delete: remove all attributes except the ones listed.
drop() — Not available in the transform processor. To discard an entire span, metric data point, or log record, use the filter processor, which evaluates the same OTTL conditions.
SHA256(value) — Hash a string value. Use this for PII fields you need to pseudonymize before exporting.
where clause — Conditional. Any statement can be followed by where <condition> to apply it only when the condition evaluates to true.

The error_mode: ignore setting tells the processor to continue if a statement fails (for example, trying to hash a nil field). Use error_mode: propagate in development to surface issues; use ignore in production to avoid dropping data because of a transformation error.
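
keep_keys is the one function in the list above that the InsureWatch config does not exercise. A minimal sketch of an allowlist follows, assuming the three HTTP attributes shown are the only span attributes worth keeping. The instance name transform/allowlist is hypothetical.

```yaml
processors:
  transform/allowlist:
    # propagate surfaces statement errors instead of silently continuing
    error_mode: propagate
    trace_statements:
      - context: span
        statements:
          # Every span attribute not named here is removed
          - keep_keys(attributes, ["http.request.method", "http.route", "http.response.status_code"])
```

An allowlist fails closed: a new attribute added by a service stays out of the backend until someone adds it here, which is usually the safer default for PII.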

The filter Processor

To drop entire items, use the filter processor. Its conditions are written in OTTL, but unlike transform it exists to discard data rather than modify it:

processors:
  filter:
    error_mode: ignore
    traces:
      span:
        # Drop health check and readiness probe spans
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
        - 'attributes["http.route"] == "/metrics"'
    metrics:
      datapoint:
        # Drop metrics with no meaningful value
        # system.filesystem.usage is an integer sum, so compare value_int
        - 'metric.name == "system.filesystem.usage" and value_int == 0'
    logs:
      log_record:
        # Drop DEBUG logs before they reach the backend
        - 'severity_number < SEVERITY_NUMBER_INFO'

The filter processor evaluates OTTL conditions. Any item matching a condition is dropped. Simpler than transform when all you need is selective exclusion.

Exam callout: Know the distinction between the filter processor (drops entire items matching criteria) and the transform processor (modifies items in place and does not drop them). Both use OTTL syntax. The filter processor can also drop based on resource attributes, for example a condition on resource.attributes["service.name"].
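
A sketch of a resource-level filter condition follows. The service name load-generator and the instance name filter/synthetic are hypothetical.

```yaml
processors:
  filter/synthetic:
    error_mode: ignore
    traces:
      span:
        # Drop every span emitted by one (hypothetical) synthetic-traffic service
        - 'resource.attributes["service.name"] == "load-generator"'
    logs:
      log_record:
        # Same idea for logs from that service
        - 'resource.attributes["service.name"] == "load-generator"'
```

Conditions in a list are OR-ed: an item is dropped as soon as any one of them evaluates to true.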

Agent vs Gateway Topology

The Collector supports two deployment archetypes. In practice, production Kubernetes environments use both simultaneously.

Agent (Sidecar or DaemonSet)

An Agent Collector runs close to the data source — either as a sidecar container in the same pod, or as a DaemonSet (one pod per node). Services on the same host export to the local Agent over loopback or localhost. The Agent:

Receives local telemetry from all services on the host
Collects host metrics (hostmetrics receiver) and attaches pod-level metadata (k8sattributes processor)
Performs lightweight processing: batching, local enrichment
Forwards to a central Gateway Collector

The key advantage of the Agent pattern is that host and Kubernetes metadata is trivially available — the Agent is on the same node as the pods it serves. Attaching k8s.pod.name, k8s.node.name, k8s.namespace.name to every span is a one-processor config change. If services had to query the Kubernetes API themselves, each service would need service account permissions, and the k8s API server would bear N times the load.
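
A minimal sketch of that one-processor change, assuming the Agent runs as a DaemonSet with a service account allowed to read pods and with its node name injected as the environment variable KUBE_NODE_NAME (an assumed name, set through the Kubernetes downward API).

```yaml
processors:
  k8sattributes:
    auth_type: serviceAccount
    # Only watch pods on this Agent's own node to keep API server load low
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.deployment.name
    # Match incoming telemetry to a pod by the source IP of the connection
    pod_association:
      - sources:
          - from: connection
```

The Agent still needs permission to read pod metadata, but that is one service account for the DaemonSet instead of one per application.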

Gateway (Central Deployment)

A Gateway Collector runs as a centrally-accessible deployment — two or three replicas behind a load balancer or a Kubernetes Service. The Gateway:

Receives from Agent Collectors (or directly from services in simpler setups)
Performs heavy-weight processing: tail-based sampling (requires seeing all spans for a trace), aggregation, fan-out to multiple backends
Routes different signal types to different backends
Applies organization-wide policies: cost controls, PII redaction, routing rules

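The Gateway is the right place for tail-based sampling because it can see complete traces. The Agent only sees the spans from its local node. A tail-sampling decision requires all spans from a given trace, which means those spans must converge at a central point. With more than one Gateway replica, that point has to be a single replica per trace: spans must be routed by trace ID (the loadbalancing exporter does this) so a trace is never split across replicas.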
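
As an illustration of what the Gateway-side decision can look like, here is a sketch of a tail_sampling processor with three policies. The thresholds, the 10 percent rate, and the policy names are assumptions, not InsureWatch’s actual settings.

```yaml
processors:
  tail_sampling:
    # Wait this long after a trace's first span before deciding
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 2000
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

A trace is kept if any policy votes to sample it, so errors and slow requests survive while routine traffic is thinned.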

The Hybrid Pattern — Recommended for Kubernetes

┌─── Node 1 ──────────────────────────┐
│  claims-api  ──► Agent Collector    │
│  policy-api  ──┘  (DaemonSet)       │─────┐
└─────────────────────────────────────┘     │
                                            │     ┌──────────────────────────┐
┌─── Node 2 ──────────────────────────┐     ├────►│  Gateway Collector        │──► Backends
│  underwriting ──► Agent Collector   │     │     │  (Deployment, 2 replicas) │
│  notify-svc   ──┘  (DaemonSet)      │─────┘     └──────────────────────────┘
└─────────────────────────────────────┘

Agents handle local enrichment and batching. The Gateway handles routing, sampling, and fan-out. The services themselves point at a fixed local address: localhost:4317 for a sidecar, or for a DaemonSet the node’s own IP or a node-local Service such as otel-agent:4317 via Kubernetes DNS. They never need to know about the Gateway’s existence.

Exam callout: The OTCA exam tests Kubernetes-specific deployment knowledge. Know that DaemonSet means one pod per node — the correct pattern for the Agent Collector, ensuring every node’s services have a local Collector. Know that Deployment means a replicated set of pods — the correct pattern for the Gateway Collector, providing horizontal scalability and high availability.
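
When the Gateway runs several replicas and performs tail-based sampling, the Agents need to send all spans of a trace to the same replica. One way to sketch this is the loadbalancing exporter on the Agent side. The headless Service name otel-gateway-headless is an assumption.

```yaml
exporters:
  loadbalancing:
    # Hash on trace ID so every span of a trace lands on one Gateway replica
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Assumed headless Service that resolves to the individual Gateway pods
        hostname: otel-gateway-headless
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loadbalancing]
```

A regular ClusterIP Service would hide the individual pods, which is why the resolver points at a headless one.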

Multi-Pipeline Fan-out

One of the most compelling Collector capabilities is the ability to send the same telemetry data to multiple backends simultaneously — without changing instrumentation. This is how you prove vendor neutrality, run parallel backend evaluations, or maintain redundant pipelines.

Named component instances use a slash syntax: otlp/tempo and otlp/uptrace are two separate instances of the otlp exporter, each with different configuration. Both receive the same data from the pipeline.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  batch:
    timeout: 5s
    send_batch_size: 1024

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  otlp/uptrace:
    endpoint: otlp.uptrace.dev:4317
    headers:
      uptrace-dsn: "${env:UPTRACE_DSN}"
  otlp/s3:
    endpoint: otel-s3-proxy:4317
    tls:
      insecure: true

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [health_check, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo, otlp/uptrace, otlp/s3]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]

In this config, every trace simultaneously goes to Grafana Tempo (primary visualization), Uptrace (parallel evaluation), and an S3 proxy (long-term archive). Services produce one stream of OTLP. The Collector fans it out. When InsureWatch decides Uptrace isn’t worth the cost, they remove one exporter from the pipeline config and reload the Collector. Zero service changes. Zero redeploy.

This is vendor neutrality made operational, not theoretical.

Exam callout: The slash syntax (exporter/name) allows multiple instances of the same exporter type. This is the standard way to export to multiple backends in a single pipeline. The OTCA exam tests whether you understand that this is configured at the Collector level, not the service level — services remain unaware of how many backends receive their data.

Collector Security Basics

A Collector with unprotected receiver endpoints exposed to a network is an ingest API anyone can write to. That is not a theoretical concern — it means anyone who can reach port 4317 can inject arbitrary spans, fabricate traces, and pollute your observability data.

The baseline security posture for production Collector deployments:

Never expose receivers publicly without authentication. The OTLP receiver should listen on a private network address (10.x.x.x, an internal Kubernetes Service ClusterIP, or a pod-local 127.0.0.1). Services connect to the Collector via the internal network, not via a public IP.

Use TLS for Collector-to-Collector communication. Agent-to-Gateway traffic may cross node boundaries or leave the cluster network. Configure TLS on both the Gateway receiver and the Agent exporter.
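
A minimal sketch of both halves follows. The certificate paths are assumptions, and the two YAML documents belong to two different Collectors.

```yaml
# Gateway Collector: serve OTLP over TLS
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otel/tls/server.crt   # assumed path
          key_file: /etc/otel/tls/server.key    # assumed path
---
# Agent Collector: verify the Gateway's certificate against a known CA
exporters:
  otlp/gateway:
    endpoint: otel-gateway:4317
    tls:
      ca_file: /etc/otel/tls/ca.crt             # assumed path
```

Adding client_ca_file on the Gateway side and cert_file plus key_file on the Agent side turns this into mutual TLS.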

Use the basicauth extension for receiver authentication. When services must connect to a Collector over a less-trusted network:

extensions:
  basicauth/server:
    htpasswd:
      inline: |
        otel-agent:$2y$10$<bcrypt-hash>

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        auth:
          authenticator: basicauth/server

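Use the headers_setter extension for authenticated exporter calls. When the downstream backend requires an API key or bearer token, this extension sets the header on outgoing requests. The from_context form below copies the value from the metadata of the incoming request, which only works if the receiver is configured with include_metadata: true; for a single static key, a headers entry on the exporter, as in the fan-out config earlier, is simpler: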

extensions:
  headers_setter/uptrace:
    headers:
      - action: insert
        key: uptrace-dsn
        from_context: uptrace-dsn

exporters:
  otlphttp/uptrace:
    endpoint: https://otlp.uptrace.dev
    auth:
      authenticator: headers_setter/uptrace

Keep API keys in environment variables, never hardcoded in config files. The ${env:VARIABLE_NAME} syntax in Collector YAML expands environment variables at startup.
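
For a backend that expects a bearer token, one option is the bearertokenauth extension fed from an environment variable. This is a sketch: the variable name BACKEND_API_TOKEN and the endpoint otlp.example.com are placeholders, not real InsureWatch values.

```yaml
extensions:
  bearertokenauth:
    # Expanded at startup; the token never appears in the config file
    token: "${env:BACKEND_API_TOKEN}"

exporters:
  otlphttp/backend:
    endpoint: https://otlp.example.com   # placeholder endpoint
    auth:
      authenticator: bearertokenauth

service:
  extensions: [bearertokenauth]
```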

The Filelog Receiver — Bridging Legacy Logs

Most production services do not send logs via OTLP. They write to stdout or to rotating log files, and a log shipper collects them. The Filelog receiver brings these services into the OTel pipeline without changing their code.

For InsureWatch’s claims processing service, which writes structured JSON logs to /var/log/insurewatch/:

receivers:
  filelog:
    include:
      - /var/log/insurewatch/*.log
    start_at: beginning
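    operators:
      # Parse each line as JSON; parsed fields land in attributes
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        severity:
          parse_from: attributes.level
      # Promote trace context into the log record's first-class fields
      - type: trace_parser
        trace_id:
          parse_from: attributes.trace_id
        span_id:
          parse_from: attributes.span_id

The operators chain is where log parsing happens. The json_parser reads each line as JSON and promotes the fields to log record attributes; its severity block maps the level field onto the record’s severity. The trace_parser operator then reads trace_id and span_id from the parsed attributes and sets them as first-class fields on the OTel log record. (The move operator can only shuffle values between body, attributes, and resource, so it cannot populate these fields.)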

This last step is critical. If the claims-api writes trace_id into its JSON logs (which it should if it is instrumented with OTel), the Filelog receiver can extract that value and place it in the proper OTel log record field. The result is a fully-correlated log record: even though the log came from a file, it carries trace context, and the log/trace correlation works in Grafana the same way it would for a service exporting logs via OTLP.

For log data flowing from the Filelog receiver into the broader pipeline, you add it to the service.pipelines.logs receivers list alongside otlp:

service:
  pipelines:
    logs:
      receivers: [otlp, filelog]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]

The two log streams — direct OTLP logs and file-sourced logs — merge at the pipeline level. The backend sees one unified log stream.

Debugging the Collector

When telemetry data is not arriving in your backend, the Collector is almost always where to look first. Three questions: is data arriving at the receiver, is it making it through the processors, and is the exporter delivering it?

The debug Exporter

Add the debug exporter to any pipeline temporarily. Every item flowing through that pipeline is printed to the Collector’s console output:

exporters:
  debug:
    verbosity: detailed  # 'basic' for count only, 'detailed' for full payloads

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [otlp/tempo, debug]  # add debug here

verbosity: detailed dumps the full span content. You can see exactly what attributes are present, whether enrichment applied, and whether the resource processor added deployment.environment. Remove the debug exporter before treating this config as production.

zpages

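The zpages extension serves live diagnostic pages at http://<collector-host>:55679/debug/tracez, http://<collector-host>:55679/debug/pipelinez, and http://<collector-host>:55679/debug/servicez. It shows:

Sampled spans from the Collector’s own internal operations, grouped by latency bucket and by error
Which receivers, processors, and exporters are wired into each pipeline
Build and runtime information for the running Collector

This is often faster than log parsing when you need to know “is data flowing” versus “is data being dropped.” For exact counts of accepted, refused, and sent items, read the Collector’s own internal metrics (for example otelcol_receiver_refused_spans and otelcol_exporter_sent_spans).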

Log Level

Set service.telemetry.logs.level to debug in the Collector config for verbose output of every internal decision (the old --log-level command-line flag has been removed from current releases). Do not run this in production under load, because the log volume is significant. Use it in staging or during incident investigation when the pipeline behavior is unclear.
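
In current Collector releases the log level is part of the config file, under service.telemetry. A minimal sketch:

```yaml
service:
  telemetry:
    logs:
      # One of debug, info, warn, error; info is the default
      level: debug
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
```

Because it lives in the config, the level changes the same way every other Collector setting does: edit, reload, and revert when the investigation is over.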

Health Check

The health_check extension on port 13133 returns HTTP 200 when the Collector is up and able to serve, and 503 when it is not. The check is served at the root path (/) unless you set a different path on the extension. Wire this up as both the liveness and readiness probe in Kubernetes:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

service:
  extensions: [health_check, zpages, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]

Extensions must be listed in service.extensions to be activated. Defining them under extensions: alone is not sufficient.
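
On the Kubernetes side, the matching probes might look like the sketch below. It assumes the extension keeps its default root path and that port 13133 is exposed on the Collector container; the timing values are illustrative.

```yaml
# Fragment of the Collector container spec
livenessProbe:
  httpGet:
    path: /
    port: 13133
  initialDelaySeconds: 5     # assumed timing
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: 13133
  periodSeconds: 5
```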

Putting It Together: A Production-Ready InsureWatch Config

Here is a full Collector config representing a reasonable production baseline for InsureWatch — Agent topology, all three signal types, OTTL-based transformation, security basics, and debug tooling:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      network: {}
  filelog:
    include:
      - /var/log/insurewatch/*.log
    start_at: end
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
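      # trace_parser sets the log record's trace context fields;
      # the move operator cannot write to trace_id or span_id.
      - type: trace_parser
        trace_id:
          parse_from: attributes.trace_id
        span_id:
          parse_from: attributes.span_id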

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch:
    timeout: 5s
    send_batch_size: 1024
  resource:
    attributes:
      - action: insert
        key: deployment.environment
        value: ${env:DEPLOYMENT_ENV}
      - action: insert
        key: collector.version
        value: agent
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
        - 'attributes["http.route"] == "/metrics"'
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - delete_matching_keys(attributes, "^internal\\.")
    log_statements:
      - context: log
        statements:
          - set(attributes["user.email"], SHA256(attributes["user.email"])) where attributes["user.email"] != nil

exporters:
  otlp/gateway:
    endpoint: otel-gateway:4317
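    tls:
      # Plaintext gRPC to the in-cluster Gateway. Configure TLS certificates
      # instead if this hop ever leaves a trusted network.
      insecure: true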

service:
  extensions: [health_check, zpages]
  pipelines:
    # memory_limiter goes first; batch goes last so it only batches
    # data that survived filtering and transformation.
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter, resource, transform, batch]
      exporters: [otlp/gateway]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/gateway]
    logs:
      receivers: [otlp, filelog]
      processors: [memory_limiter, resource, transform, batch]
      exporters: [otlp/gateway]

Note that memory_limiter is first in every processor chain. The Agent forwards everything to the Gateway Collector, which handles fan-out, sampling, and backend routing. Services point at otel-agent:4317 via Kubernetes internal DNS and never need to change.
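
For orientation, the Gateway side of that hop can be sketched as follows. This is a minimal illustration rather than InsureWatch's actual Gateway config: the backend addresses, the otlp/secondary exporter name, and the memory limits are all assumptions, and sampling is left out here.

```yaml
# Gateway Collector sketch: receives OTLP from the Agents, fans out to two backends.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024          # assumption: size this to the Gateway's memory request
    spike_limit_mib: 256
  batch: {}

exporters:
  otlp/tempo:
    endpoint: tempo:4317             # hypothetical in-cluster address
    tls:
      insecure: true                 # assumption: plaintext inside the cluster
  otlp/secondary:
    endpoint: second-backend:4317    # hypothetical second backend
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo, otlp/secondary]   # fan-out: every batch goes to both
```

Listing two exporters in one pipeline is all that fan-out requires; swapping or adding a backend is an edit to this file, not to any service.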

Exam Domain Coverage: What You Need to Know

The OTCA exam allocates 26% of its questions to the OpenTelemetry Collector domain. This module covers the core of that domain. The specific areas to review before the exam:

Component taxonomy: receivers vs processors vs exporters vs extensions vs connectors — what each does, which layer it operates on
memory_limiter placement: first in every pipeline, why it must be first, what breaks when it is not
Pipeline wiring: how service.pipelines connects independently-defined components, what fan-out means, how named instances work with the slash syntax
Agent vs Gateway: DaemonSet vs Deployment in Kubernetes, which processing belongs where, why tail-based sampling requires a Gateway
OTTL: set, delete_key, keep_keys, delete_matching_keys, where conditions, drop(), SHA256()
Connectors: what they are, the spanmetrics connector specifically, how they differ from exporters
Filelog receiver: that it can extract trace context from log files, enabling log/trace correlation for services that write to files
Extensions vs processors: extensions operate on the Collector process, processors operate on telemetry data in the pipeline

Exam callout: At 26%, the Collector is one of the most heavily weighted domains on the OTCA. Questions cover both conceptual knowledge (what does memory_limiter do) and applied scenarios (given this config, what happens to a span with http.route=/health). Read every YAML example in this module until you can predict its behavior without running it.

What’s Next

Lab 3 is the hands-on complement to everything in this module. You will start from a skeleton Collector config and build it out in stages: wiring the initial OTLP pipeline, adding a Filelog-sourced logs pipeline, writing OTTL transformations to drop health check spans and hash a PII field, and finally proving vendor neutrality by configuring a second exporter so the same traces arrive at two backends simultaneously — without touching any service code.

The lab is where the config syntax becomes muscle memory. It is also where the memory_limiter-must-be-first rule stops being a thing you read and becomes a thing you feel, once you put it in the wrong position and watch the pipeline behave unexpectedly.

After Lab 3: Module 6 covers sampling: head-based vs tail-based, the probability math behind sampling rates, and how to configure the tail_sampling processor in a Gateway Collector to make sampling decisions based on trace attributes, error status, and duration thresholds.