The Assumption to Destroy
“Context propagation is automatic.”
This is half-true, and the half that’s false is exactly where incidents become undiagnosable.
Propagation is automatic within a single process. When code in your Python API calls another function in the same Python API, OTel knows which span is active and sets the parent automatically. You don’t think about it.
The moment a network boundary appears — an HTTP call to another service, a message published to Kafka, an async job dispatched to a worker — propagation requires something to carry the context across. OTel handles the mechanism of injecting and extracting that context. Whether the context actually survives the journey depends on everything in between: proxies, load balancers, API gateways, message broker configurations, custom HTTP clients.
In InsureWatch, the Python API calls the Java claims service over HTTP. In a correctly configured system, you get one trace that spans both services — you can see the full causal chain. With propagation broken, you get two disconnected traces — the Python side and the Java side — with no link between them. At 2 AM during an incident, that’s the difference between a 5-minute diagnosis and a 45-minute mystery.
Here is how propagation actually works, where it breaks, and how to fix it when it does.
What Context Is
Before propagation can happen, there has to be something to propagate.
Context in OTel is a container that holds the current trace state: the trace ID, the active span ID, and any baggage key-value pairs. It exists in-process as an implicit, thread-local (or async-safe equivalent) object that the OTel API manages for you.
When you create a span, OTel attaches it to the current context. When you start a child span, OTel reads the current context to find the parent. This is why, within a single process, parent/child relationships are automatic — the context is right there in memory.
Single Process (Python API)
─────────────────────────────────────────────────────────
Context: { trace_id: "7f3b...", span_id: "a1b2..." }
│
├─► handle_request()            ← root span, sets context
│     │
│     ├─► validate_input()      ← child, reads context automatically
│     │
│     └─► fetch_policy()        ← child, reads context automatically
│
└── Context automatically available to all code on this thread
Cross a network boundary and the context doesn’t travel automatically. It needs to be serialized into the outbound request and deserialized from the inbound request on the other side. That’s propagation.
W3C TraceContext: The Wire Format
OTel didn’t invent the propagation format. It implements the W3C TraceContext specification, which is a web standard ratified in 2020. The format is two HTTP headers:
traceparent — the required header. Carries trace identity.
traceparent: 00-7f3b8a2c4d5e6f7a8b9c0d1e2f3a4b5c-a1b2c3d4e5f6a7b8-01
             │  │                                │                │
             │  └── trace-id (128-bit hex)       │                │
             version                             span-id          flags
                                                 (64-bit hex)     (01 = sampled)
Four fields, hyphen-separated:
- version: always 00 in current spec
- trace-id: 128-bit, 32 hex characters. The identifier shared across the entire distributed trace
- parent-id: 64-bit, 16 hex characters. The span ID of the caller’s active span (this becomes the parent of the new span created by the callee)
- flags: 8-bit field. The only current flag: 01 = sampled (this trace is being recorded). 00 = not sampled.
Exam callout: Know the traceparent format cold. The exam tests the field order, what each field contains, the sizes (128-bit trace ID, 64-bit span ID), and what the flags byte means. “What does a flags value of 01 mean?” → the trace is being sampled.
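To make the field layout concrete, here is a small standard-library parser for the version-00 header shape. It is a sketch for study purposes: `TraceParent` and `parse_traceparent` are hypothetical names, and it deliberately rejects anything that is not four lowercase-hex fields of the sizes listed above.

```python
import re
from typing import NamedTuple, Optional

# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex)
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # The sampled flag is the lowest bit of the flags byte.
        return bool(self.flags & 0x01)


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    if match["version"] == "ff":  # version ff is invalid
        return None
    # All-zero trace or parent IDs are invalid.
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None
    return TraceParent(
        match["version"],
        match["trace_id"],
        match["parent_id"],
        int(match["flags"], 16),
    )


tp = parse_traceparent("00-7f3b8a2c4d5e6f7a8b9c0d1e2f3a4b5c-a1b2c3d4e5f6a7b8-01")
print(tp.trace_id, tp.parent_id, tp.sampled)
```

A handy side effect when debugging: pasting a header captured from a log into `parse_traceparent` tells you immediately whether it was truncated or rewritten in transit.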
tracestate — the optional header. Carries vendor-specific trace state.
tracestate: vendor1=value1,vendor2=value2
tracestate is a list of key-value pairs, vendor-namespaced. OTel uses it to carry additional sampling decisions or metadata that individual vendors need to propagate. You don’t usually interact with it directly — OTel manages it — but you need to know it exists and what it’s for.
Exam callout: The OTCA exam distinguishes between traceparent (required, standardized format, carries trace identity) and tracestate (optional, vendor-specific key-value pairs, carries additional state). Know which is which.
How OTel Propagates Context: Inject and Extract
OTel propagation is built around two operations:
Inject — before making an outbound call, serialize the current context into the carrier (HTTP headers, message attributes, etc.)
Extract — at the start of an inbound request, deserialize context from the carrier and restore it as the current context
Service A (Python)                          Service B (Java)
─────────────────                           ─────────────────
active span: a1b2c3d4
Inject:
  headers["traceparent"] =
    "00-7f3b...-a1b2c3d4-01"
HTTP POST /api/claims ─────────────►        Extract:
traceparent: 00-7f3b...-a1b2c3d4-01           context = parse(headers["traceparent"])
                                              new span: parent = a1b2c3d4
                                              trace_id: 7f3b... (same trace!)
The result: the Java service’s span has parent_span_id = a1b2c3d4, linking it to the Python service’s span. One trace. Both services visible in the same trace view.
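The round trip above can be sketched end to end with a plain dict as the carrier. This is a simplified model under stated assumptions, not the OTel API: `SpanContext`, `inject`, `extract`, and `start_server_span` are hypothetical helpers, and the header handling covers only the happy path plus a missing header.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, MutableMapping, Optional, Tuple


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool = True


def inject(ctx: SpanContext, carrier: MutableMapping[str, str]) -> None:
    """Caller side: serialize the active span's identity into the carrier."""
    flags = "01" if ctx.sampled else "00"
    carrier["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(carrier: Mapping[str, str]) -> Optional[SpanContext]:
    """Callee side: rebuild the caller's span identity, or None if absent."""
    parts = carrier.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    try:
        flags = int(parts[3], 16)
    except ValueError:
        return None
    return SpanContext(parts[1], parts[2], sampled=bool(flags & 0x01))


def start_server_span(
    remote: Optional[SpanContext],
) -> Tuple[SpanContext, Optional[str]]:
    """Return (new span, parent span ID). No remote context means a new trace."""
    if remote is None:
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8)), None
    return SpanContext(remote.trace_id, secrets.token_hex(8), remote.sampled), remote.span_id


# Service A (Python) injects; Service B (Java, modelled here) extracts.
python_span = SpanContext("7f3b8a2c4d5e6f7a8b9c0d1e2f3a4b5c", "a1b2c3d4e5f6a7b8")
headers: dict = {}
inject(python_span, headers)
java_span, java_parent = start_server_span(extract(headers))

# Same request with the header stripped in transit: a disconnected root span.
orphan_span, orphan_parent = start_server_span(extract({}))
print(java_parent, orphan_parent)
```

The second call is the failure mode the rest of this section is about: nothing errors, the callee simply starts a fresh trace.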
OTel instrumentation libraries do this automatically when you use a supported framework or HTTP client. Flask and requests (Python), Express and axios (Node.js, where axios calls are covered through the http module instrumentation), Spring and RestTemplate (Java): all of these have OTel instrumentation that calls inject/extract without you writing a line of propagation code.
The problem is when something in the middle strips the headers.
Where Propagation Breaks
This is where the real knowledge lives, and it’s what the OTCA exam tests. Propagation is not a guarantee — it’s a contract that requires every hop in the chain to honor it.
Load Balancers and Reverse Proxies
Stock nginx and HAProxy forward unknown request headers such as traceparent. The break appears in hardened setups: proxies that allowlist request headers for security, disable request-header passthrough, or rebuild the upstream request from scratch. If the proxy drops traceparent, the downstream service receives a request with no context and creates a new trace, disconnected from the upstream.
Fix: Explicitly allow traceparent and tracestate in your proxy’s header passthrough config.
# nginx: forward trace headers to the upstream explicitly
# (proxy_pass_header only affects response headers, so it does not help here)
proxy_set_header traceparent $http_traceparent;
proxy_set_header tracestate $http_tracestate;
API Gateways
AWS API Gateway, Kong, and similar products sometimes transform or filter headers. The behavior depends on configuration. If your traces are fragmenting at the boundary between an API gateway and a downstream service, the gateway is the first place to check.
Message Queues
HTTP propagation is straightforward: headers are a natural carrier. Messages have no HTTP headers, so the carrier is whatever metadata the broker offers: Kafka record headers, AMQP message headers, SQS message attributes. Context has to be injected into that metadata by the producer, and extracted by the consumer.
OTel has instrumentation for common brokers (Kafka, RabbitMQ, SQS), but if you’re using a broker without an official OTel instrumentation library, you need to do this manually.
import json
import boto3
from opentelemetry import propagate, trace
tracer = trace.get_tracer(__name__)
sqs = boto3.client("sqs")
# Producer: inject the current context into a carrier, then copy it into message attributes
carrier = {}
propagate.inject(carrier)
# with an active span, carrier now contains {"traceparent": "00-7f3b...-a1b2c3d4-01"}
message_attributes = {
    key: {"DataType": "String", "StringValue": value}
    for key, value in carrier.items()
}
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps(payload),
    MessageAttributes=message_attributes,
)
# Consumer: rebuild the carrier from message attributes and extract the context
# (receive_message must request them, e.g. MessageAttributeNames=["All"])
carrier = {
    key: attr["StringValue"]
    for key, attr in message.get("MessageAttributes", {}).items()
    if "StringValue" in attr
}
ctx = propagate.extract(carrier)
with tracer.start_as_current_span("process_claim", context=ctx):
    # this span is now a child of the producer's span
    process(message)
Exam callout: Message queues are a common propagation break point. The exam tests whether you know that context must be explicitly injected into message attributes and extracted on the consumer side — it does not happen automatically unless there is an OTel instrumentation library for that specific broker.
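The same pattern applies to Kafka, where the carrier is the record's header list rather than a dict of attributes. The sketch below assumes the common Python client convention of headers as a list of `(str, bytes)` tuples; `carrier_to_kafka_headers` and `kafka_headers_to_carrier` are hypothetical helper names, and the carrier would come from `propagate.inject` as in the SQS example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

KafkaHeaders = List[Tuple[str, bytes]]


def carrier_to_kafka_headers(carrier: Dict[str, str]) -> KafkaHeaders:
    """Producer side: turn an injected carrier into Kafka record headers."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def kafka_headers_to_carrier(
    headers: Optional[Iterable[Tuple[str, Optional[bytes]]]],
) -> Dict[str, str]:
    """Consumer side: rebuild a carrier dict that extract() can read."""
    carrier: Dict[str, str] = {}
    for key, value in headers or []:
        if value is not None:  # header values may be null
            carrier[key.lower()] = value.decode("utf-8")
    return carrier


# Stand-in for what propagate.inject(carrier) would have produced.
injected = {"traceparent": "00-7f3b8a2c4d5e6f7a8b9c0d1e2f3a4b5c-a1b2c3d4e5f6a7b8-01"}
record_headers = carrier_to_kafka_headers(injected)
restored = kafka_headers_to_carrier(record_headers)
print(restored == injected)
```

On the producer you would pass `record_headers` as the `headers` argument of your client's send call; on the consumer you would feed `restored` to `propagate.extract`. Check your client's documentation for the exact parameter name and header type.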
Async Boundaries and Thread Pools
Within a process, OTel uses a context API that is async-safe in modern runtimes (Python asyncio, Node.js event loop, Java virtual threads). But some patterns break this:
- Thread pools: if you submit work to a thread pool, the context from the submitting thread is not automatically copied to the worker thread. You need to propagate it explicitly.
- Fire-and-forget patterns: starting a background task without carrying context is a propagation break. The background work runs in a new trace with no parent.
- Callbacks: in callback-heavy code (older Node.js patterns), context can get lost between the callback registration and invocation.
The general rule: any time you move work off the current execution context, ask whether OTel context traveled with it.
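The thread-pool case is easy to reproduce with the standard library alone. This sketch uses a plain `contextvars.ContextVar` as a stand-in for the OTel context (the Python implementation is, to my knowledge, built on `contextvars`); `current_trace_id` and `submit_with_context` are hypothetical names.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional

# Stand-in for "the current OTel context" on this thread.
current_trace_id: contextvars.ContextVar = contextvars.ContextVar(
    "current_trace_id", default=None
)


def read_trace_id() -> Optional[str]:
    """Work that runs on a pool thread and looks up the current context."""
    return current_trace_id.get()


def submit_with_context(
    executor: ThreadPoolExecutor, fn: Callable[..., Any], *args: Any, **kwargs: Any
) -> Future:
    """Snapshot the caller's context and run fn inside that snapshot."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args, **kwargs)


current_trace_id.set("7f3b8a2c")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Plain submit: the worker thread usually has its own, empty context,
    # so this typically comes back as None.
    plain = pool.submit(read_trace_id).result()
    # Explicit propagation: the worker sees the submitting thread's value.
    carried = submit_with_context(pool, read_trace_id).result()

print(plain, carried)
```

With the real OTel Python API the equivalent move is to capture `context.get_current()` before submitting and call `context.attach` (and later `context.detach`) inside the worker. Some OTel distributions also ship threading instrumentation that does this for you.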
Custom HTTP Clients
Auto-instrumentation wraps known HTTP client libraries. If you build a custom HTTP client (or use a lesser-known library without OTel support), inject/extract won’t happen automatically.
# Manual inject when using a custom HTTP client
from opentelemetry import propagate
headers = {}
propagate.inject(headers)
# headers now has traceparent set
response = my_custom_http_client.post(url, headers=headers, body=payload)
Baggage: Propagating Application Data
Beyond trace identity, OTel supports Baggage — a key-value store that propagates alongside the trace context.
Baggage uses its own header (baggage) and travels through the same inject/extract mechanism. Unlike trace context (which is opaque to application code), baggage is readable and writable from application code.
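from opentelemetry import baggage, context
# Set baggage in Service A. set_baggage returns a new Context; it does not modify the current one.
ctx = baggage.set_baggage("tenant.id", "acme-corp")
ctx = baggage.set_baggage("feature.flags", "new-pricing-model", context=ctx)
token = context.attach(ctx)  # make it current so outbound calls inject it
# ... make downstream calls here; this baggage propagates to every service they reach ...
context.detach(token)
# In Service B (after extract has restored the context):
tenant = baggage.get_baggage("tenant.id") # → "acme-corp"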
What baggage is for: propagating business context that’s relevant to all downstream services. A user ID, a tenant ID, a feature flag, an A/B test cohort.
What baggage is not for: large payloads. Every byte in baggage is transmitted with every downstream HTTP request in the trace. Baggage is for small, high-signal keys — not for passing request bodies.
Exam callout: The OTCA exam tests the baggage use case and its risks. Correct answer: baggage propagates key-value pairs to all downstream services. The risk: every key-value pair is added to every outbound request header in that trace. Overuse causes header bloat and potential performance issues. Baggage should be small (a few small string values).
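One way to keep baggage honest is to measure what it costs on the wire before it ships. The sketch below builds an approximate `baggage` header value (comma-separated `key=value` pairs with percent-encoded values) and enforces a size budget. `MAX_BAGGAGE_BYTES` is a hypothetical team limit chosen for illustration, not a number from any specification, and `encode_baggage` ignores optional per-entry properties.

```python
from typing import Dict
from urllib.parse import quote

MAX_BAGGAGE_BYTES = 512  # hypothetical budget; pick your own


def encode_baggage(entries: Dict[str, str]) -> str:
    """Approximate the baggage header value for a set of entries."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in entries.items()
    )


def check_baggage_budget(
    entries: Dict[str, str], limit: int = MAX_BAGGAGE_BYTES
) -> str:
    """Return the header value, or raise if it exceeds the byte budget."""
    header = encode_baggage(entries)
    size = len(header.encode("utf-8"))
    if size > limit:
        raise ValueError(f"baggage is {size} bytes, budget is {limit}")
    return header


small = check_baggage_budget(
    {"tenant.id": "acme-corp", "feature.flags": "new-pricing-model"}
)
print(small, len(small))
```

Remember that this cost is paid on every outbound request in the trace, so a few hundred bytes of baggage multiplied across a deep call graph adds up quickly.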
The Danger: Baggage Is Not Filtered
There is no automatic filtering of baggage. If Service A adds user.session_token to baggage, it propagates to every service downstream — including third-party services you call. This is a real data leakage risk.
Rule: Only put data in baggage that you would be comfortable sending to every service your trace touches, including external ones.
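Because nothing filters baggage for you, one defensive pattern is an explicit allowlist applied before calling services outside your trust boundary. This is an illustrative sketch: `SAFE_FOR_EXTERNAL` and `filter_baggage` are hypothetical, and the entries are plain dicts rather than a real OTel context.

```python
from typing import Dict, FrozenSet

# Hypothetical allowlist: keys you are willing to send to third parties.
SAFE_FOR_EXTERNAL: FrozenSet[str] = frozenset({"tenant.id"})


def filter_baggage(
    entries: Dict[str, str], allowed: FrozenSet[str] = SAFE_FOR_EXTERNAL
) -> Dict[str, str]:
    """Keep only allowlisted keys; everything else is dropped."""
    return {key: value for key, value in entries.items() if key in allowed}


internal = {
    "tenant.id": "acme-corp",
    "user.session_token": "s3cr3t",  # should never leave your services
}
outbound = filter_baggage(internal)
print(outbound)
```

With the OTel Python API, the corresponding step would be to build a trimmed context (for example with `baggage.remove_baggage`) and pass it to the inject call for that outbound request. The better fix is still not to put the sensitive value into baggage at all.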
The Propagator API
The mechanism behind inject and extract is the Propagator interface. OTel ships with two built-in propagators:
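- TraceContextPropagator: implements W3C TraceContext (traceparent + tracestate). The Python class is TraceContextTextMapPropagator.
- BaggagePropagator: implements W3C Baggage (baggage header). The Python class is W3CBaggagePropagator.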
You configure propagators at SDK initialization. The default composite propagator uses both:
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry import propagate
propagate.set_global_textmap(
CompositePropagator([
TraceContextTextMapPropagator(),
W3CBaggagePropagator(),
])
)
If you’ve instrumented a service but its upstream uses a different propagation format (B3, Jaeger), you add the corresponding propagator to the composite and it handles both formats simultaneously.
Exam callout: The OTCA exam tests propagator configuration. Know that: (1) propagators are configured at SDK init, not per-request; (2) the W3C TraceContext propagator is the OTel default; (3) B3 and Jaeger propagators exist for legacy system compatibility and can be composed with the W3C propagator; (4) if no propagator is configured (the bare API falls back to a no-op propagator when nothing has pre-configured one), inject/extract are no-ops and propagation silently fails.
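To see why composing propagators matters, here is a toy model of extraction that understands both W3C `traceparent` and the B3 single-header format (`b3: {trace-id}-{span-id}-{sampling}`). It is a simplification with hypothetical function names: it returns the first format that parses, whereas OTel's composite propagator runs each configured propagator in order against the same carrier.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

TraceIds = Tuple[str, str]  # (trace_id, parent_span_id)
Extractor = Callable[[Mapping[str, str]], Optional[TraceIds]]


def extract_w3c(headers: Mapping[str, str]) -> Optional[TraceIds]:
    parts = headers.get("traceparent", "").split("-")
    if len(parts) == 4 and len(parts[1]) == 32 and len(parts[2]) == 16:
        return parts[1], parts[2]
    return None


def extract_b3_single(headers: Mapping[str, str]) -> Optional[TraceIds]:
    parts = headers.get("b3", "").split("-")
    # B3 trace IDs may be 64-bit (16 hex) or 128-bit (32 hex).
    if len(parts) >= 2 and len(parts[0]) in (16, 32) and len(parts[1]) == 16:
        return parts[0].rjust(32, "0"), parts[1]
    return None


def extract_composite(
    headers: Mapping[str, str],
    extractors: Sequence[Extractor] = (extract_w3c, extract_b3_single),
) -> Optional[TraceIds]:
    """Try each configured format; None means a new root span downstream."""
    for extractor in extractors:
        result = extractor(headers)
        if result is not None:
            return result
    return None


b3_request = {"b3": "7f3b8a2c4d5e6f7a8b9c0d1e2f3a4b5c-a1b2c3d4e5f6a7b8-1"}

w3c_only = extract_composite(b3_request, extractors=(extract_w3c,))
both = extract_composite(b3_request)
print(w3c_only, both)
```

A service configured for W3C only sees the B3 request as having no context at all, even though trace headers are physically present. Adding the second format to the composite restores the link without touching the upstream.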
Diagnosing a Propagation Break
When traces are fragmented — services appear as disconnected root spans instead of a unified trace — the diagnostic path is systematic:
Symptom: Service A and Service B traces are disconnected
(B appears as a new root span, not a child of A)
Step 1: Is Service A injecting?
→ Check: does the outbound request from A contain traceparent?
→ Tool: debug exporter in Collector, or HTTP request logging
Step 2: Is the header surviving transit?
→ Check: does the inbound request to B contain traceparent?
→ Common culprit: load balancer, API gateway, proxy stripping headers
→ Tool: log request headers at B's entry point
Step 3: Is Service B extracting?
→ Check: is B's OTel SDK initialized with a propagator?
→ Check: is the instrumentation library (e.g., Flask, Spring) calling extract?
→ Common culprit: SDK initialized but propagator not set; or custom framework
not calling extract
Step 4: Is the context being used?
→ Check: is B creating spans inside the extracted context?
→ Common culprit: context extracted but new span created outside it
The most common cause in practice: a load balancer or API gateway in the path stripping traceparent. Check there first.
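The checklist can be turned into a small triage helper once you have captured three facts: the headers A sent, the headers B received, and the parent span ID B recorded. The function below is a hypothetical sketch of that decision path; `diagnose` and its return strings are illustrative, and header names are assumed to be lowercased already.

```python
from typing import Mapping, Optional


def diagnose(
    sent_headers: Mapping[str, str],
    received_headers: Mapping[str, str],
    b_span_parent_id: Optional[str],
) -> str:
    """Walk the diagnostic steps and name the first one that fails."""
    sent = sent_headers.get("traceparent")
    if sent is None:
        return "step 1: A is not injecting traceparent"

    parts = sent.split("-")
    if len(parts) != 4:
        return "step 1: A is sending a malformed traceparent"

    received = received_headers.get("traceparent")
    if received is None:
        return "step 2: traceparent was stripped in transit (check proxy/gateway)"
    if received != sent:
        return "step 2: traceparent was altered in transit"

    expected_parent = parts[2]  # A's span ID becomes B's parent
    if b_span_parent_id != expected_parent:
        return "step 3/4: B received the header but its span is not parented to A"

    return "ok"


sent = {"traceparent": "00-7f3b8a2c4d5e6f7a8b9c0d1e2f3a4b5c-a1b2c3d4e5f6a7b8-01"}

print(diagnose(sent, {}, None))
print(diagnose(sent, sent, "a1b2c3d4e5f6a7b8"))
```

Steps 3 and 4 are merged here because telling them apart requires looking inside B (propagator configuration versus where spans are created), which header captures alone cannot show.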
InsureWatch: The Three-Service Propagation Chain
InsureWatch has three services, each with a different language and HTTP framework:
Browser / Test Client
│
▼
Python API (Flask) ← extracts traceparent from inbound request
│ creates SERVER span
│ POST /claims
│ traceparent: 00-7f3b...-a1b2c3d4-01 ← injected by requests library
▼
Java Claims Service (Spring) ← extracts traceparent via Spring instrumentation
│ creates SERVER span, parent = a1b2c3d4
│ GET /policies/{id}
│ traceparent: 00-7f3b...-b2c3d4e5-01 ← injected with Java span ID
▼
Node.js Policy Store (Express) ← extracts via axios/express instrumentation
creates SERVER span, parent = b2c3d4e5
One trace ID (7f3b...) spans all three services, three languages. In Grafana Tempo or Jaeger, this appears as a single waterfall with three service segments — you can see the full causal chain from the API call to the database query on the other side of the Java service.
Lab 1 breaks this intentionally. The propagators on the Python and Java services are mismatched: the Python service is using a B3 propagator while Java expects W3C TraceContext. The headers are present, but in the wrong format. Java can’t parse them and creates a new root span. Your job in Lab 1 is to find this by reading the fragmented traces in Grafana and fix it.
What’s Next
Lab 1 puts this into practice: a live InsureWatch environment with context propagation intentionally misconfigured between the Python API and Java claims service. Traces are fragmented — you diagnose why from the Grafana trace viewer and fix the propagator configuration to restore the full distributed trace. No step-by-step instructions — just the symptoms and the tools.
Lab 1 is available as part of the paid course tier.