
Module 4: Semantic Conventions

Semantic conventions are not optional naming suggestions. Break them and every downstream tool that expects standard attribute names breaks silently. They are a portability contract.

The Assumption to Destroy

“Semantic conventions are optional naming suggestions — I can name my attributes whatever makes sense for my team.”

If you’ve ever added a span attribute called http_method or statusCode or query_string and thought “close enough,” this is the assumption we’re going to destroy in this module.

Semantic conventions are not style preferences. They are a portability contract. When you use them correctly, every OTel-compatible backend, visualization tool, and auto-instrumentation library can build features against your telemetry with confidence — because they know exactly what to expect. When you invent your own names, all of that silently disappears. Your dashboards show nothing. Your vendor’s out-of-box APM views are blank. Your SLO alerts don’t fire. And nothing tells you why.

Here’s the mental model we’re going to build:

Semantic conventions are the schema that makes OpenTelemetry’s portability promise real. Any tool that understands OTel understands the conventions. If your data doesn’t follow the conventions, it is not actually portable — it is just data that happens to move through OTel’s plumbing.

Let’s take this apart piece by piece.


What Semantic Conventions Are and Where They Live

Semantic conventions define the names, types, and meanings of attributes for common operations: HTTP requests, database calls, messaging systems, runtime metrics, cloud resource metadata, and more.

They live in the open-telemetry/semantic-conventions repository on GitHub, a dedicated repository under the OpenTelemetry organization. This is separate from the main opentelemetry-specification repository, which covers the API, SDK, and protocol definitions. Semantic conventions have their own release cadence.

The point of having a shared specification is this: any backend vendor can build a “database query performance” view that works for every customer — not just customers on a specific SDK or language. Why? Because the vendor knows:

  • The database system will always be in db.system
  • The database name will always be in db.name
  • The operation type will always be in db.operation
  • The actual query will always be in db.statement (when present)

When you follow these conventions in your InsureWatch claims service — whether it’s written in Python, Java, or Node.js — the vendor’s dashboard works for you out of the box. When you invent your own names (database_type, sql_query, op), the dashboard shows zero data, with no error, no warning, and no indication of why.

That last part is what makes convention violations so dangerous. They don’t cause exceptions. They cause silence.

Your App (following conventions)
   │  db.system = "postgresql"
   │  db.operation = "SELECT"
   │  db.statement = "SELECT * FROM claims WHERE ..."

OTel Collector
   │  (passes through unchanged)

Backend A (Grafana)                Backend B (Datadog)
"I know db.system exists —         "I know db.system exists —
 building DB latency panel"         adding to DB analytics view"

─────────────────────────────────────────────────────────────────

Your App (inventing names)
   │  database_type = "postgresql"
   │  op = "SELECT"
   │  query = "SELECT * FROM claims WHERE ..."

OTel Collector
   │  (passes through unchanged)

Backend A (Grafana)                Backend B (Datadog)
"db.system not found —             "db.system not found —
 panel shows no data"               DB view shows no data"

Takeaway: Every attribute name you invent is a name no downstream tool will ever understand. Use the conventions.
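To make the silence concrete, here is a minimal sketch of a backend panel that only knows the conventional name. The function db_panel_rows and the span dictionaries are hypothetical stand-ins for illustration, not any vendor’s implementation.

```python
# Hypothetical stand-in for a backend's "database latency" panel.
# It only knows the conventional attribute name db.system.

def db_panel_rows(spans):
    """Return the spans the panel can chart: those carrying db.system."""
    return [s for s in spans if "db.system" in s["attributes"]]


conventional = [
    {"name": "query", "attributes": {"db.system": "postgresql", "db.operation": "SELECT"}}
]
invented = [
    {"name": "query", "attributes": {"database_type": "postgresql", "op": "SELECT"}}
]

print(len(db_panel_rows(conventional)))  # 1
print(len(db_panel_rows(invented)))      # 0, and no error is raised
```

The second call does not fail. It returns an empty list, which is exactly what an empty dashboard panel looks like from the inside.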


Stability Levels

Not all semantic conventions are equal. The specification defines three stability levels, and each has real consequences for how you should use them.

Stable

Stable conventions are locked. The names, types, and meanings will not change. You can instrument against them today and expect them to work in five years, across any SDK version, any Collector version, any backend version.

When you build SLO dashboards, alerting rules, or on-call runbooks on stable attributes, you’re building on solid ground.

Experimental

Experimental conventions are subject to change. The names you use today might be renamed, restructured, or removed in a future release. The specification makes no backward-compatibility guarantee.

This does not mean “don’t use them.” Many of the most practically useful attributes are still experimental — particularly in newer areas like GenAI instrumentation, some messaging patterns, and certain runtime metrics. The point is: know what you’re building on. An experimental attribute is a conscious trade-off. You accept that a future SDK or library upgrade might rename it.

Exam callout: The OTCA exam tests the stability system specifically. Know that “experimental” means “may change,” not “do not use.” Know that stable means “guaranteed not to change.” Know that deprecated means “was stable, now replaced — migrate to the new name.”

Deprecated

Deprecated conventions were once stable, then superseded by better names — usually because the original design had structural problems, or the naming didn’t compose well with other namespaces. The old name continues to work for backward compatibility, but you should migrate to the current stable name in new instrumentation.

The HTTP namespace evolution is the clearest example of this, and it is directly tested on the exam. We’ll cover it in detail next.


The Naming Evolution Trap — http.method vs http.request.method

This is a specific exam topic. It is also a real production problem that has bitten teams on SDK upgrades.

The Old Names (pre-1.21)

Before semantic conventions version 1.21, HTTP attributes looked like this:

  • http.method — the request method (GET, POST, etc.)
  • http.url — the full URL
  • http.status_code — the response status code
  • http.host — the target host
  • http.scheme — http or https

These names were flat. They mixed request attributes, response attributes, and URL components in the same namespace without any clear structure.

The New Names (1.21+)

Starting with 1.21, the HTTP conventions were rearchitected:

  • http.request.method — the request method
  • url.full — the full URL (moved to the url.* namespace)
  • url.path — the URL path component
  • url.scheme — http or https (moved to url.*)
  • http.response.status_code — the response status code (now clearly scoped to response)
  • server.address — the target host (moved to server.*)
  • server.port — the target port

Instrumentation v0.x → v1.21+

Old name                    New name
─────────────────────────────────────────────────────────
http.method           →     http.request.method
http.url              →     url.full
http.status_code      →     http.response.status_code
http.host             →     server.address
http.scheme           →     url.scheme
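The mapping in this table can be written down as data. The sketch below is one way to translate a dictionary of span attributes from the old names to the new ones; the function name to_current_names is hypothetical, and the mapping covers only the five pairs listed here, not the full convention.

```python
# Old-to-new HTTP attribute names, copied from the table above.
HTTP_RENAMES = {
    "http.method": "http.request.method",
    "http.url": "url.full",
    "http.status_code": "http.response.status_code",
    "http.host": "server.address",
    "http.scheme": "url.scheme",
}


def to_current_names(attributes):
    """Return a copy with deprecated HTTP keys renamed; other keys pass through."""
    renamed = {}
    for key, value in attributes.items():
        new_key = HTTP_RENAMES.get(key, key)
        # If both the old and the new name are present, the value stored
        # under the new name wins regardless of iteration order.
        if new_key in renamed and key != new_key:
            continue
        renamed[new_key] = value
    return renamed


old_span = {"http.method": "GET", "http.status_code": 200, "custom.tag": "x"}
print(to_current_names(old_span))
# {'http.request.method': 'GET', 'http.response.status_code': 200, 'custom.tag': 'x'}
```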

Why the Rename?

The old names blurred the boundary between request data and response data. http.method and http.status_code live in the same flat namespace, but one describes the request you receive and the other describes the response you send. The new names make the lifecycle clear: http.request.* describes the incoming request, http.response.* describes the outgoing response.

The URL components were also moved to a shared url.* namespace — because URL attributes apply to many protocols beyond HTTP, and centralizing them prevents duplication.

The Practical Failure Mode

Here’s where teams get burned. Your auto-instrumentation library gets upgraded as part of a routine dependency bump. The new version emits http.request.method. Your dashboards query http.method. The data stops matching at the exact moment the upgrade deploys.

No error. No warning. The charts just go flat.

Dashboard panel:    filter by http.method = "GET"

Before upgrade:     http.method = "GET"     [1,423 req/min shown]

After upgrade:      http.request.method = "GET"
                    http.method = (not emitted)

Dashboard result:   [0 req/min shown — query matches nothing]

For InsureWatch, this means if the Python API service’s opentelemetry-instrumentation-fastapi package is on an older version while your Grafana dashboard queries http.request.method, you’re flying blind on inbound request rates without knowing it.

The fix: always check the auto-instrumentation library changelog when upgrading. Any attribute rename will be documented there. When upgrading, temporarily emit both old and new names (many libraries support this via compatibility flags during the migration window), then flip dashboards to the new names, then stop emitting old names.

Exam callout: Know both old and new HTTP attribute names. The exam tests whether you can identify the current stable conventions and recognize deprecated names. http.method is deprecated. http.request.method is current stable.

Takeaway: Auto-instrumentation library upgrades can silently rename attributes. When dashboards go flat after an upgrade, attribute renames are the first thing to check.


Key Namespaces to Know for the Exam

The OTCA exam expects you to recognize attributes across six major namespaces. For each one, know the most important attribute names and what values they take.

http.* — HTTP Requests and Responses

These apply to HTTP server spans (incoming requests) and HTTP client spans (outgoing requests).

Current stable attributes:

  • http.request.method — GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS
  • http.response.status_code — 200, 404, 500, etc.
  • url.full — the complete URL including scheme, host, path, and query string
  • url.path — just the path component: /api/quote
  • url.scheme — “http” or “https”
  • server.address — the hostname of the target server
  • server.port — the port number

For InsureWatch, every inbound request to the Python API service emits these. The /api/quote endpoint gets http.request.method = "POST" and http.response.status_code = 200 on success, http.response.status_code = 400 on validation failure.

One important nuance: for server spans, server.address and server.port reflect the local server’s address. For client spans, they reflect the target server’s address. The same attribute name, two different perspectives depending on span kind.

db.* — Database Calls

These apply to CLIENT spans representing outbound database operations.

Key attributes:

  • db.system — the type of database: “postgresql”, “redis”, “mongodb”, “mysql”, “cassandra”
  • db.name — the specific database name: “insurewatch_claims”
  • db.operation — the operation type: “SELECT”, “INSERT”, “UPDATE”, “DELETE”, “FINDONE”
  • db.statement — the actual query string: SELECT * FROM claims WHERE claim_id = $1

For InsureWatch, the Java service querying the claims PostgreSQL database would emit:

db.system    = "postgresql"
db.name      = "insurewatch_claims"
db.operation = "SELECT"
db.statement = "SELECT claim_id, status, amount FROM claims WHERE policy_id = $1"

A backend vendor’s “slow query” panel knows exactly where to find the query text because it’s always in db.statement. An APM tool’s database topology view knows the system type because it’s always in db.system.

One critical note on db.statement: this attribute can contain PII. A query like SELECT * FROM members WHERE email = 'user@example.com' exposes an email address. We’ll cover this in the cardinality and sensitivity section below.

messaging.* — Message Queues and Streaming

These apply to PRODUCER and CONSUMER spans for asynchronous messaging systems.

Key attributes:

  • messaging.system — “kafka”, “rabbitmq”, “aws_sqs”, “gcp_pubsub”
  • messaging.destination.name — the topic or queue name: “claims.submitted”, “quote.requested”
  • messaging.operation — “publish”, “receive”, “process”

For InsureWatch, if the platform adds event-driven claims processing — where a new claim submission publishes to a Kafka topic consumed by the underwriting service — the producer span on the API service would carry:

messaging.system           = "kafka"
messaging.destination.name = "claims.submitted"
messaging.operation        = "publish"

And the consumer span on the underwriting service would carry:

messaging.system           = "kafka"
messaging.destination.name = "claims.submitted"
messaging.operation        = "process"

Both sides of the message exchange use the same conventions. A distributed tracing tool can draw the producer-to-consumer connection automatically — provided the trace context is propagated in the message headers.
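The propagation step is what links the two spans. As a rough illustration, the sketch below writes and reads a W3C Trace Context traceparent header on a plain dictionary of message headers. In a real service the SDK’s propagator does this for you; the helper names inject and extract here are hypothetical and not the OpenTelemetry API.

```python
import re
import secrets

# W3C Trace Context "traceparent" layout: version-traceid-spanid-flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Producer side: write the current trace context into message headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Consumer side: read the trace context back, or None if absent or malformed."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": int(flags, 16) & 1 == 1,
    }


trace_id = secrets.token_hex(16)  # 32 hex characters
span_id = secrets.token_hex(8)    # 16 hex characters
message_headers = inject({}, trace_id, span_id)
context = extract(message_headers)
```

If the producer never injects the header, extract returns None on the consumer side and the two spans end up in separate traces, even though both carry correct messaging.* attributes.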

service.* — Resource Attributes

Here is an important distinction: service.* attributes are Resource attributes, not span attributes. You do not set them per span. You set them once at SDK initialization, and the SDK attaches them to every piece of telemetry emitted by that process.

Key attributes:

  • service.name — required. The logical name of the service: “claims-api”, “quote-engine”, “policy-service”. If missing, most backends show “unknown_service” in their UIs, which makes the data nearly useless.
  • service.version — the deployed version: “2.1.0”, “2025-03-15-abcdef”. Critical for correlating incidents with deployments.
  • service.instance.id — a unique identifier for this running instance. When you have 20 replicas of claims-api, this lets you distinguish pod-a1b2 from pod-c3d4. Crucial when one replica is having a bad time and the others aren’t.
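
SDK initialization (Python):

import os
import socket

from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "claims-api",
    "service.version": "2.1.0",
    # Prefer the pod name when running in Kubernetes; fall back to the hostname.
    "service.instance.id": os.environ.get("POD_NAME", socket.gethostname()),
})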

# Every span, metric, and log record from this process
# carries these three attributes automatically.

Exam callout: Know that service.* are Resource attributes, not span attributes. They are set at SDK init, not per-span. The exam distinguishes between Resource attributes (process-level) and span attributes (operation-level).

cloud.* — Cloud Provider Context

These describe the cloud environment where the service is running. They are typically set automatically by resource detectors — SDK components that interrogate the cloud provider’s metadata API at startup.

Key attributes:

  • cloud.provider — “aws”, “gcp”, “azure”
  • cloud.region — “us-east-1”, “northeurope”, “us-central1”
  • cloud.account.id — the cloud account or subscription ID
  • cloud.availability_zone — “us-east-1a”

For InsureWatch running on AWS EKS, the AWS resource detector automatically populates these at startup. You get cloud context on every piece of telemetry without writing a single line of application code.

k8s.* — Kubernetes Context

Similar to cloud attributes, these are typically set by the Kubernetes resource detector.

Key attributes:

  • k8s.namespace.name — “insurewatch-prod”, “insurewatch-staging”
  • k8s.pod.name — “claims-api-7b9d4f-xk2p1”
  • k8s.node.name — the cluster node hosting the pod
  • k8s.deployment.name — “claims-api”
  • k8s.container.name — the container within the pod

Here’s why these are operationally critical: when you have 40 replicas of claims-api and one of them is misbehaving, k8s.pod.name is how you find it. Without it, you know something is wrong with claims-api at the service level, but you’re grepping logs across 40 pods to find the sick one. With k8s.pod.name as a Resource attribute on every span and log, you filter to the specific pod in seconds.

Scenario: claims-api p99 latency spike, 40 replicas running

Without k8s.* attributes:
  - 40 pods to check
  - ssh into each one and look at logs
  - ~15 minutes to isolate

With k8s.pod.name on every span:
  - filter spans by duration > 500ms
  - group by k8s.pod.name
  - one pod is responsible for 95% of the slow spans
  - ~30 seconds to isolate

Takeaway: Resource attributes — service.*, cloud.*, k8s.* — are not optional decoration. They are the operational context that makes telemetry actionable at scale.


Requirement Levels

Within each namespace, the specification further categorizes attributes by requirement level. This is separate from stability level: a stable attribute can be optional, and an experimental attribute can be required.

Required

A backend can reasonably assume a Required attribute is present on any span in that namespace. If it’s missing, the telemetry is considered incomplete.

Recommended

Should be present when you have the data. Backends may build features around Recommended attributes, but they won’t assume they’re always present.

Opt-in

Available when you want them, but not expected. These are often higher-cardinality or privacy-sensitive attributes — the kind where the cost of emitting them needs to be weighed against the benefit.

Here’s how this plays out for db.*:

Requirement level  Attribute        Notes
─────────────────  ───────────────  ─────────────────────────────────────────
Required           db.system        Always emit. What kind of DB is this?
Recommended        db.name          Emit when known. Which database?
Recommended        db.operation     Emit when known. SELECT, INSERT, etc.
Opt-in             db.statement     Can contain PII. Emit intentionally, not by default.

The practical implication: auto-instrumentation for PostgreSQL will always emit db.system = "postgresql". It will try to emit db.name and db.operation. It will not emit db.statement by default — you have to explicitly opt in, because doing so means potentially shipping your SQL queries (and any PII embedded in them) to your observability backend.
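One way to internalize the three levels is to write them as a check. The sketch below encodes the db.* rows from the table above and reports two kinds of finding: a missing Required attribute, and an Opt-in attribute emitted without a deliberate opt-in. The function check_db_span and its opted_in parameter are hypothetical, for illustration only.

```python
# Requirement levels for db.* as listed in the table above.
DB_LEVELS = {
    "db.system": "required",
    "db.name": "recommended",
    "db.operation": "recommended",
    "db.statement": "opt-in",
}


def check_db_span(attributes, opted_in=frozenset()):
    """Return a list of findings for one database span's attributes."""
    findings = []
    for name, level in DB_LEVELS.items():
        present = name in attributes
        if level == "required" and not present:
            findings.append(f"missing required attribute {name}")
        elif level == "opt-in" and present and name not in opted_in:
            findings.append(f"{name} emitted without explicit opt-in")
        # Recommended attributes are never flagged: absent is acceptable.
    return findings


print(check_db_span({"db.system": "postgresql"}))
# []
print(check_db_span({"db.name": "insurewatch_claims", "db.statement": "SELECT 1"}))
# ['missing required attribute db.system', 'db.statement emitted without explicit opt-in']
```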

For InsureWatch, a query like:

SELECT * FROM members WHERE email = 'user@example.com' AND dob = '1985-03-15'

contains PII. If db.statement is enabled, that email address and date of birth are now in your observability data. That’s a compliance problem, a data residency problem, and a potential breach disclosure problem if your observability backend is breached.
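If you do opt in, a common mitigation is to strip literal values before the statement leaves the process. The sketch below is a deliberately naive, regex-based illustration of the idea, assuming single-quoted string literals and plain numeric literals. It is not a SQL parser and should not be treated as a complete sanitizer; sanitize_statement is a hypothetical helper.

```python
import re

# Single-quoted string literals, including doubled '' escapes.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone numbers; skips digits inside identifiers and $1-style placeholders.
_NUMBER_LITERAL = re.compile(r"(?<![\w$.])\d+(?:\.\d+)?(?![\w.])")


def sanitize_statement(sql):
    """Replace literal values with ? so the query shape survives but the data does not."""
    sql = _STRING_LITERAL.sub("?", sql)
    return _NUMBER_LITERAL.sub("?", sql)


print(sanitize_statement(
    "SELECT * FROM members WHERE email = 'user@example.com' AND dob = '1985-03-15'"
))
# SELECT * FROM members WHERE email = ? AND dob = ?
```

Parameterized queries such as WHERE policy_id = $1 pass through unchanged, because the placeholder carries no data.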

The specification’s distinction between Required, Recommended, and Opt-in is intentional: it creates a default posture that is useful but not irresponsible.

Exam callout: The OTCA tests requirement levels within namespaces. Know that db.system is required, db.statement is opt-in, and why.

Takeaway: Opt-in attributes exist because emitting them always has a cost — either cardinality, privacy, or both. Know which attributes require explicit consent before enabling them.


Cardinality Explosions from High-Value Attributes

This is a failure mode that semantic conventions help you avoid, and it is tested on the OTCA exam.

The Rule

Attribute values that end up as metric dimensions must be bounded in cardinality. In practice: the number of distinct values such an attribute can take should be finite and small, ideally in the dozens or hundreds, not millions.

This matters because span attributes often get promoted to metric labels. When the OTel Collector or your backend converts trace data into metrics, it creates one metric series per unique combination of attribute values. If one of those attribute values is a UUID, you now have one metric series per UUID ever generated — which is unbounded cardinality, and it will crash your metrics backend.

What Breaks It

High-cardinality attributes that seem reasonable but are not:

  user.id          →  one series per user (millions)
  request.id       →  one series per request (billions over time)
  url.full         →  /api/policy/12345, /api/policy/67890, ...
  error.message    →  "Connection timeout after 5001ms" vs "5002ms"
  session.token    →  a different value on every request

The URL example is particularly subtle. url.full is a legitimate attribute — it’s in the HTTP conventions. But if you are using the full URL including query parameters (/api/quote?policy_id=abc123&zip=90210), the cardinality is unbounded. Every unique combination of query parameters is a new series.

How Conventions Steer You Away From This

The HTTP conventions make a specific distinction that addresses this:

  • url.full — the actual URL with all parameters. Fine as a span attribute (one value per request is harmless in a trace). Not safe for metric labels.
  • url.path — just the path: /api/quote. Still varies with path parameters.
  • http.route — the route template: /api/policy/{id}. This is bounded. Ten thousand requests to ten thousand different policy IDs all share one route template.

For InsureWatch, this is critical. The claims API has endpoints like:

/api/claims/7382
/api/claims/7383
/api/claims/7384

If you use url.path as a metric dimension, you create one series per claim ID. Use http.route = "/api/claims/{id}" instead, and you have one series for the entire claims endpoint — regardless of how many claims exist.

Wrong (unbounded cardinality):
  span attribute:  url.path = "/api/claims/7382"
  metric label:    url.path = "/api/claims/7382"
  → one metric series per claim_id. Millions of series over time.

Right (bounded cardinality):
  span attribute:  http.route = "/api/claims/{id}"
  metric label:    http.route = "/api/claims/{id}"
  → one metric series for the claims endpoint. Always.
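The collapse from many paths to one route can be shown with a small sketch. Web framework instrumentation normally supplies http.route from the matched route definition, so you rarely compute it yourself; the route_template helper below is a hypothetical heuristic that only handles purely numeric path segments, and it is here to show the cardinality effect rather than to recommend a technique.

```python
import re

# A path segment made only of digits, e.g. the 7382 in /api/claims/7382.
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def route_template(path):
    """Replace purely numeric path segments with an {id} placeholder."""
    return _NUMERIC_SEGMENT.sub("/{id}", path)


paths = ["/api/claims/7382", "/api/claims/7383", "/api/claims/7384"]
print({route_template(p) for p in paths})
# {'/api/claims/{id}'}
```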

The InsureWatch Disaster Scenario

The claims team at InsureWatch decides to add quote_id — a UUID — as a span attribute on every quote request, so they can correlate traces to quotes. Reasonable motivation. But they also have a dashboard that generates RED metrics (rate, error, duration) from span attributes.

At 1,000 quotes per day, in a year you have 365,000 unique quote_id values. Each one is a separate metric series. At 100,000 quotes per day (after a good sales quarter), you have 36.5 million series. Prometheus OOM kills. Grafana queries time out. Alerting stops working.

The solution isn’t to remove quote_id from spans; it’s valuable for tracing. The solution is to not promote high-cardinality attributes to metric dimensions. The Collector’s spanmetrics connector (the successor to the older spanmetrics processor) and similar tools have explicit configuration for which attributes to promote. Leave quote_id in the span, out of the metrics.
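The effect of the dimension list is easy to see in a sketch. The code below counts distinct label combinations, which is what a metrics backend stores as series, for one day of InsureWatch quote traffic at 1,000 quotes per day. The quote_id values are simple stand-ins for UUIDs, and series_count is a hypothetical helper.

```python
def series_count(spans, dimensions):
    """Count distinct label combinations (metric series) for the chosen dimensions."""
    combos = {tuple(span.get(d) for d in dimensions) for span in spans}
    return len(combos)


# One day of quote requests: same route and method, a unique quote_id each.
spans = [
    {"http.route": "/api/quotes", "http.request.method": "POST", "quote_id": f"q-{i}"}
    for i in range(1000)
]

print(series_count(spans, ["http.route", "http.request.method"]))              # 1
print(series_count(spans, ["http.route", "http.request.method", "quote_id"]))  # 1000
```

The spans are identical in both calls. Only the list of promoted dimensions changes, and that list is the thing to control.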

Exam callout: The OTCA exam tests cardinality awareness. Know why high-cardinality attribute values create metric series explosion when those attributes are promoted to metric dimensions. Know that trace attributes and metric labels have different cardinality requirements.

Takeaway: High-cardinality values belong in trace spans where they help with individual request debugging. They do not belong in metric labels where they create unbounded series. Semantic conventions steer you toward bounded values — use the route template, not the URL.


How to Stay Current

Semantic conventions are not static. They evolve as OTel matures, as new technologies need coverage, and as the community discovers that existing names have structural problems (see: the entire HTTP rename story).

What was experimental 12 months ago may be stable today. What was stable a year before that may now be deprecated. Staying current is an ongoing responsibility.

What to Watch

Subscribe to open-telemetry/semantic-conventions releases on GitHub. New versions are tagged with changelogs. Each release notes what moved from experimental to stable, what was deprecated, and what new namespaces were added.

Check instrumentation library changelogs on every upgrade. When you bump opentelemetry-instrumentation-django from 0.44 to 0.46, read the changelog before deploying. Any attribute rename will be documented there. The change may be invisible until your dashboards go dark.

When dashboards stop showing data after an upgrade: attribute renames are the first hypothesis to test. Compare the attribute names your dashboard queries against the attribute names your upgraded library emits. In Grafana Tempo or Jaeger, pull a fresh trace and look at the raw span attributes. Compare them to what your dashboard filter expressions expect.
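That comparison is a set difference. A minimal sketch, assuming you have copied the attribute names your dashboard filters on and the attributes from one fresh span (resource attributes flattened in for simplicity); unmatched_queries is a hypothetical helper:

```python
def unmatched_queries(dashboard_attributes, span_attributes):
    """Return attribute names the dashboard filters on that the span does not carry."""
    return sorted(set(dashboard_attributes) - set(span_attributes))


dashboard = ["http.method", "http.status_code", "service.name"]
fresh_span = {
    "http.request.method": "GET",
    "http.response.status_code": 200,
    "service.name": "claims-api",
}
print(unmatched_queries(dashboard, fresh_span))
# ['http.method', 'http.status_code']
```

Any name in the result is a query that can no longer match, and a likely explanation for a flat chart.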

The Migration Pattern

When a rename happens, the transition isn’t instantaneous. Instrumentation libraries typically offer a compatibility period where they emit both old and new attribute names simultaneously. Use it:

Phase 1 (compatibility mode on):
  Library emits: http.method = "GET" (old)
                 http.request.method = "GET" (new)
  Dashboard queries: http.method  [still works]

Phase 2 (update dashboards):
  Switch dashboard queries from http.method to http.request.method
  Validate data appears correctly

Phase 3 (disable compatibility mode):
  Library emits: http.request.method = "GET" (new only)
  Dashboard queries: http.request.method  [works]

Skipping phase 2 — disabling the old name without updating dashboards — is how teams end up with silently empty charts.
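The three phases, and the failure from skipping one, can be modeled in a few lines. In this sketch the mode argument is a hypothetical switch standing in for whatever compatibility setting your library offers. For HTTP, the OpenTelemetry migration guidance describes an OTEL_SEMCONV_STABILITY_OPT_IN environment variable with http and http/dup values, but check your library’s documentation to confirm it is honored.

```python
def emitted_http_method(method, mode):
    """Attributes a library would emit for the request method in each phase.

    mode is a hypothetical switch: "old", "dup" (both names), or "new".
    """
    attributes = {}
    if mode in ("old", "dup"):
        attributes["http.method"] = method
    if mode in ("dup", "new"):
        attributes["http.request.method"] = method
    return attributes


def dashboard_matches(query_attribute, attributes):
    """A dashboard filter only matches if the queried attribute is present."""
    return query_attribute in attributes


# Phase 1: compatibility mode on, dashboard still on the old name.
assert dashboard_matches("http.method", emitted_http_method("GET", "dup"))
# Phase 2: dashboard switched while both names are still emitted.
assert dashboard_matches("http.request.method", emitted_http_method("GET", "dup"))
# Phase 3: old name dropped, dashboard already on the new name.
assert dashboard_matches("http.request.method", emitted_http_method("GET", "new"))
# Skipping phase 2: old name dropped, dashboard never updated.
assert not dashboard_matches("http.method", emitted_http_method("GET", "new"))
```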


Putting It Together: InsureWatch Attribute Audit

Let’s apply everything in this module to a concrete scenario. The InsureWatch platform has three services:

  • Python API service — handles inbound REST requests
  • Java claims service — reads and writes the PostgreSQL claims database
  • Node.js notification service — publishes to an SQS queue

Here’s what correct semantic convention usage looks like across all three:

Python API service (SERVER span for POST /api/quotes):
  Resource:
    service.name = "api-gateway"
    service.version = "3.2.1"
    k8s.namespace.name = "insurewatch-prod"
    k8s.pod.name = "api-gateway-6f7d9b-xk2p1"
    cloud.provider = "aws"
    cloud.region = "us-east-1"

  Span attributes:
    http.request.method = "POST"
    http.route = "/api/quotes"
    http.response.status_code = 201
    url.scheme = "https"
    server.address = "api.insurewatch.com"

───────────────────────────────────────────────────────────────

Java claims service (CLIENT span for DB query):
  Resource:
    service.name = "claims-service"
    service.version = "1.8.0"
    k8s.namespace.name = "insurewatch-prod"
    k8s.pod.name = "claims-service-7b9d4f-m3n4k"

  Span attributes:
    db.system = "postgresql"
    db.name = "insurewatch_claims"
    db.operation = "SELECT"
    [db.statement not enabled — PII risk]

───────────────────────────────────────────────────────────────

Node.js notification service (PRODUCER span for SQS):
  Resource:
    service.name = "notification-service"
    service.version = "0.9.4"
    k8s.namespace.name = "insurewatch-prod"

  Span attributes:
    messaging.system = "aws_sqs"
    messaging.destination.name = "insurewatch-claim-events"
    messaging.operation = "publish"

Every backend tool that understands OTel semantic conventions can now build a full topology view of InsureWatch — HTTP entry points, database dependencies, messaging patterns — without any custom configuration. The data is self-describing.

The same data sent with invented names (api_method, db_type, queue_name) would flow through the same Collector, reach the same backend, and produce an empty topology view. Same infrastructure. No portability.


What’s Next

We’ve covered the attribute naming layer — what the conventions say, where they live, how they evolve, and what breaks when you ignore them. But we haven’t covered the operational layer: how does telemetry actually get from your application to your backend?

That’s the Collector — the proxy, processor, and router that sits between your instrumented services and your observability backends. It is the most operational component in the OTel stack, and it is the component that gives you the most leverage: attribute renaming for convention migrations, cardinality reduction, tail-based sampling, and multi-backend fan-out all happen here.

That’s Module 5: The OpenTelemetry Collector — architecture, pipeline design, and production configuration.