Full visibility into every LLM call across your stack
Stereos is OTel-native. Every gateway request emits a span. Send spans from your own instrumented services too; Stereos relays and surfaces everything in one place.
Built on open standards
Stereos accepts standard OTLP/HTTP JSON on /v1/traces, /v1/logs, and /v1/metrics. Any OpenTelemetry SDK or exporter that speaks OTLP/HTTP can send data with minimal configuration.
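As a sketch of what that wire format looks like, the snippet below builds a minimal OTLP/HTTP JSON trace payload using only the Python standard library. The endpoint URL and service name are placeholders, not real Stereos values:

```python
import json
import os
import time
import urllib.request

def build_otlp_trace(service_name: str, span_name: str) -> dict:
    """Build a minimal OTLP/JSON payload containing a single span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex chars
                    "spanId": os.urandom(8).hex(),    # 16 hex chars
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(now),
                    "endTimeUnixNano": str(now + 5_000_000),
                }],
            }],
        }]
    }

payload = build_otlp_trace("checkout-api", "POST /chat")
req = urllib.request.Request(
    "https://otel.example.com/v1/traces",  # placeholder; use your gateway URL
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment once pointed at a real endpoint
```

In practice an OpenTelemetry SDK's OTLP/HTTP exporter produces this shape for you; the hand-rolled version is only meant to show that the endpoint speaks a plain, open format.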
Gateway-native spans
Every request through your Stereos gateway automatically emits an OTel span, with no instrumentation required on your end.
Ingest from anywhere
Send spans from your own apps, agents, or pipelines and correlate them with gateway-emitted spans end-to-end.
Everything you need to understand AI usage in production
From token usage and latency percentiles to per-service attribution and error rates — surfaced automatically from the spans you already emit.
Real-time event feed
A reverse-chronological stream of every trace, log, and metric received — filterable by vendor, model, service, or severity.
Token & cost charts
Stacked bar charts of input vs. output token usage over time, with cost attribution broken down by team and model.
Latency percentiles
p50, p90, p95, and p99 latency charts per model and service so you can catch regressions before they impact users.
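For reference, these percentiles are simple order statistics over latency samples. A sketch of computing them from raw samples with the standard library (sample values are illustrative):

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute p50/p90/p95/p99 from a list of latency samples (ms)."""
    # quantiles(..., n=100) returns 99 cut points; index i is percentile i+1.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p95": cuts[94], "p99": cuts[98]}

stats = latency_percentiles([120, 135, 150, 180, 240, 900, 95, 110, 160, 200])
```

The gap between p50 and p99 is what makes percentile charts useful: a single slow outlier (like the 900 ms sample above) barely moves the median but dominates the tail.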
Gen AI conventions
Built on the OpenTelemetry Gen AI semantic conventions. Standard attributes like gen_ai.system and gen_ai.usage.* map directly to dashboard widgets.
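For instance, a chat-completion span following these conventions might carry attributes like the ones below. The attribute names come from the Gen AI semantic conventions; the values are illustrative:

```python
# Illustrative span attributes per the OpenTelemetry Gen AI semantic
# conventions; values are made up for the example.
span_attributes = {
    "gen_ai.system": "openai",            # drives vendor detection
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 412,     # feeds token/cost charts
    "gen_ai.usage.output_tokens": 128,
}
```

Because these keys are standardized, any SDK or instrumentation that emits them maps onto the same dashboard widgets without custom configuration.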
Multi-vendor
Automatic vendor detection from gen_ai.system covers OpenAI, Anthropic, Gemini, Mistral, Bedrock, and more.
Span waterfall
Full trace timeline showing the waterfall of spans for individual requests — click any span to inspect attributes, prompts, and completions.
See your LLM usage in real time
Connect your first service in minutes with any OTel SDK.