Docs
Install otel-k8s-graph, feed it trace spans, and query the graph over REST or from Claude. Three small Go binaries, coordinating only through Redis.
Quickstart
The Helm chart bundles a single-replica Redis by default, so a fresh install is self-contained. image.registry is the only required value.
# Build + push all three images (versioned + latest), then helm upgrade --install:
REGISTRY=<your-registry> ./deploy.sh
# Or with helm directly against pre-built images:
helm upgrade --install graph helm/graph \
--namespace default --create-namespace \
--set image.registry=<your-registry>
Already have Redis? Disable the bundled one with --set redis.internal.enabled=false and supply the connection details either inline or from an existing Secret:
# Inline host and password:
helm ... --set redis.internal.enabled=false \
--set redis.host=my-redis --set redis.password=s3cret
# Or from an existing Secret (keys REDIS_HOST / REDIS_PASSWORD):
kubectl create secret generic redis-creds \
--from-literal=REDIS_HOST=my-redis \
--from-literal=REDIS_PASSWORD=s3cret
helm ... --set redis.internal.enabled=false \
--set redis.existingSecret=redis-creds
Helm knobs
The values you'll reach for most often. See the chart README for the complete list.
| Key | Default | Description |
|---|---|---|
| image.registry | (required) | Container registry for all three images, no trailing slash. The image ref is <registry>/<imageName>:<tag>. |
| image.tag | latest | Shared image tag for all components. deploy.sh pins the exact version at install time. |
| graph.keyPrefix | graph | GRAPH_REDIS_KEY_PREFIX: namespace prefix for every Redis key. All three components MUST agree on it to share one graph. |
| redis.internal.enabled | true | Deploy a bundled single-replica Redis. Set false to bring your own. |
| redis.host | "" | REDIS_HOST for external Redis (when internal.enabled is false and existingSecret is empty). |
| redis.password | "" | REDIS_PASSWORD for external Redis. Empty = no auth. |
| redis.existingSecret | "" | Source host/username/password from an existing Secret (via secretKeyRef). |
| redis.internal.persistence.enabled | false | Persist the bundled Redis in a PVC. Off by default — the graph is rebuilt from the K8s API and OTel data after a restart. |
| redis.internal.persistence.size | 1Gi | PVC size when persistence is enabled. |
| graphOtel.replicas | 2 | Replica count. graph-otel is horizontally scalable (idempotent full-flush writes + reaper); requires the collector to shard spans by trace ID to the headless Service (see the loadbalancing exporter under OTel Collector setup). |
| graphOtel.service.otlpPort | 4317 | OTLP gRPC port. Point the collector's traces exporter at graph-otel-otlp:<otlpPort>. |
| graphOtel.config.flushInterval | 60s | GRAPH_FLUSH_INTERVAL: how often the whole in-memory set is written to Redis (idempotent HSET/SADD, no deletes — safe across replicas). |
| graphOtel.config.expiryTtl | 100s | GRAPH_EXPIRY_TTL: drop in-memory entities/edges not seen for this long. MUST exceed flushInterval. |
| graphOtel.config.reapInterval | 5m | GRAPH_REAP_INTERVAL: how often the reaper deletes Redis entities not refreshed within reapTtl. 0 disables it; all replicas reap (idempotent). |
| graphOtel.config.reapTtl | 48h | GRAPH_REAP_TTL: delete graph-otel-owned Redis entities not refreshed by any replica for this long. (FLOW_REAP_INTERVAL / FLOW_REAP_TTL do the same for flows.) |
| graphK8s.config.resyncPeriod | 5m | WATCH_RESYNC_PERIOD: how often informers re-list every object (self-heal). |
| graphRead.replicas | 1 | Replica count for the query API (stateless reader; safe to raise). |
| graphRead.ingress.enabled | false | Expose the query API via Ingress. NO auth and includes destructive POST /prune — add auth or restrict IPs first. |
OTel Collector setup
graph-otel consumes trace spans directly, so no spanmetrics connector or metrics pipeline is required. Fan out the spans your apps already emit to graph-otel alongside your existing trace backend. The chart's kubernetesAttributes preset enables the k8sattributes processor. That processor adds the k8s.* attributes graph-otel needs to attach relationships to the right container.
# otel-collector-values.yaml
mode: deployment
image:
  repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
# Adds the k8sattributes processor (+ RBAC) so spans carry
# k8s.namespace.name / k8s.pod.name / k8s.container.name.
presets:
  kubernetesAttributes:
    enabled: true
config:
  exporters:
    # graph-otel reconstructs whole traces in memory (for flows), so every
    # span of a trace must reach the SAME replica. The loadbalancing exporter
    # shards by trace id across the graph-otel pods, discovered via the
    # headless Service (returns one A record per pod).
    loadbalancing:
      routing_key: traceID
      resolver:
        dns:
          hostname: graph-otel-otlp-headless.default.svc.cluster.local
          port: 4317
      protocol:
        otlp:
          tls:
            insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loadbalancing]

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
-f otel-collector-values.yaml
Already exporting traces? Add the loadbalancing exporter alongside your existing one in the traces pipeline (exporters: [your-backend, loadbalancing]). Sharding by trace ID routes every span of a trace to the same graph-otel replica, which flow assembly requires.
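Trace-ID sharding works because the routing key alone decides the destination. The following Go sketch shows that invariant. It is a deliberately simplified stand-in: it uses FNV-1a modulo the replica count, whereas the loadbalancing exporter uses a consistent-hashing ring. The span names and the replicaFor function are illustrative only.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// replicaFor maps a trace ID to one of n replicas.
// Simplified stand-in (FNV-1a modulo n), NOT the loadbalancing exporter's
// consistent-hashing ring: only the invariant "same trace ID -> same
// replica" is being illustrated.
func replicaFor(traceID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(traceID)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(n))
}

// span is an illustrative span carrying only what routing needs.
type span struct {
	TraceID string
	Name    string
}

func main() {
	const replicas = 2 // graphOtel.replicas default
	spans := []span{
		{"4bf92f3577b34da6a3ce929d0e0e4736", "frontend GET /cart"},
		{"4bf92f3577b34da6a3ce929d0e0e4736", "cart GetCart"},
		{"0af7651916cd43dd8448eb211c80319c", "checkout PlaceOrder"},
		{"4bf92f3577b34da6a3ce929d0e0e4736", "cart redis HGET"},
	}
	// Every span of the first trace lands on the same replica,
	// regardless of which service emitted it.
	for _, s := range spans {
		fmt.Printf("replica %d <- %s (trace %s...)\n",
			replicaFor(s.TraceID, replicas), s.Name, s.TraceID[:8])
	}
}
```

Modulo-N hashing remaps most traces whenever the replica count changes. A consistent-hashing ring moves only a fraction of them, so resizing graphOtel.replicas disturbs fewer in-flight traces.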
MCP / Claude Code
The same graph-read binary runs an MCP server in mcp mode (graph-read mcp). In that mode it is a thin REST client of the query API. Set GRAPH_BASE_URL to point it at the API.
Expose the query API with the chart's built-in Ingress so the MCP server can run anywhere:
helm ... \
--set graphRead.ingress.enabled=true \
--set graphRead.ingress.className=nginx \
--set graphRead.ingress.hosts[0].host=graph.example.com \
--set graphRead.ingress.hosts[0].paths[0].path=/ \
--set graphRead.ingress.hosts[0].paths[0].pathType=Prefix
The query API has no authentication and includes the destructive POST /prune. Add auth at the ingress (basic-auth / oauth2-proxy annotations) or restrict source IPs before exposing it.
Then register it with Claude Code / Claude Desktop, pointing GRAPH_BASE_URL at the ingress host:
{
  "mcpServers": {
    "graph": {
      "command": "graph-read",
      "args": ["mcp"],
      "env": { "GRAPH_BASE_URL": "https://graph.example.com" }
    }
  }
}
The six tools
- search: Find entities by keyword across IDs, names, metadata values, and edge actions.
- get_entity: Full detail for one entity by exact ID (kind, metadata, and all outbound edges).
- list_entities: Enumerate every entity of a given kind.
- get_subgraph: Every entity reachable from a root via BFS (blast radius / call neighborhood).
- list_flows: Distinct trace flows, i.e. request shapes collapsed from many traces.
- get_flow: The full canonical structure of one trace flow by root hash.
REST API
Served by graph-read on :8080. The six MCP tools mirror the read endpoints. POST /prune and /healthz have no MCP counterpart.
| Method | Path | Description |
|---|---|---|
| GET | /search?q=<text>&kind=<kind>&limit=N | Case-insensitive substring match across IDs, names, metadata values, and edge actions. |
| GET | /entities?kind=<kind> | List every entity of a kind. |
| GET | /entity/{id...} | One entity with its edges + metadata (IDs may contain /). |
| GET | /subgraph/{id...}?max_depth=N | BFS reachable set from an entity (blast radius / call neighborhood). |
| GET | /flows?limit=N | Trace-flow summaries (abstract request structures), by occurrence count. |
| GET | /flow/{hash} | One flow's canonical structure (collapsed Merkle tree with per-node metadata). |
| POST | /prune?older_than=<dur> | Drop entities not seen for <dur> (e.g. 5m); returns the count. |
| GET | /healthz | Liveness/readiness. |
The query API has no authentication and includes the destructive POST /prune. Put auth in front (ingress basic-auth / oauth2-proxy) or restrict source IPs before exposing it.
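The following is a minimal standard-library Go client sketch for these endpoints. It makes the following assumptions:

- The API is reachable at GRAPH_BASE_URL. If that is unset, the client falls back to http://localhost:8080, for example through a port-forward.
- The server unescapes each path segment of /entity/{id...} and /subgraph/{id...}. Because of this, the client escapes segments individually and keeps the / separators that IDs contain.

The helper names and the sample ID deployment:default/cart are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"strings"
	"time"
)

// escapeID escapes each "/"-separated segment of an entity ID, keeping the
// slashes as path separators for the {id...} routes.
func escapeID(id string) string {
	segs := strings.Split(id, "/")
	for i, s := range segs {
		segs[i] = url.PathEscape(s)
	}
	return strings.Join(segs, "/")
}

func trimBase(base string) string { return strings.TrimRight(base, "/") }

// searchURL builds GET /search; kind and limit are optional.
func searchURL(base, q, kind string, limit int) string {
	v := url.Values{"q": {q}}
	if kind != "" {
		v.Set("kind", kind)
	}
	if limit > 0 {
		v.Set("limit", strconv.Itoa(limit))
	}
	return trimBase(base) + "/search?" + v.Encode()
}

// entityURL builds GET /entity/{id...}.
func entityURL(base, id string) string {
	return trimBase(base) + "/entity/" + escapeID(id)
}

// subgraphURL builds GET /subgraph/{id...}?max_depth=N.
func subgraphURL(base, id string, maxDepth int) string {
	return trimBase(base) + "/subgraph/" + escapeID(id) + "?max_depth=" + strconv.Itoa(maxDepth)
}

// pruneURL builds POST /prune?older_than=<dur> (e.g. "5m"). Destructive.
func pruneURL(base, olderThan string) string {
	return trimBase(base) + "/prune?" + url.Values{"older_than": {olderThan}}.Encode()
}

func main() {
	base := os.Getenv("GRAPH_BASE_URL")
	if base == "" {
		base = "http://localhost:8080" // assumption: e.g. a local port-forward
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(entityURL(base, "deployment:default/cart")) // hypothetical ID
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```

Pruning is left out of main on purpose. If you do want it, client.Post(pruneURL(base, "5m"), "", nil) sends the request, and per the table above the response carries the number of dropped entities.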
Graph schema
Entity types
Every node kind, where it comes from, and the signal that creates it.
| Kind | ID format | Source | How it's derived |
|---|---|---|---|
| namespace | namespace:<name> | K8s Namespace | Namespace objects (also implied by every pod). Metadata: label.* |
| node | node:<name> | K8s Node | Node objects + each pod's spec.nodeName. Metadata: node label.* |
| zone | zone:<name> | K8s Node labels | From topology.kubernetes.io/zone (legacy failure-domain.beta.kubernetes.io/zone). Emitted only if the label is set; region → zone → node. |
| region | region:<name> | K8s Node labels | From topology.kubernetes.io/region (legacy fallback). Emitted only when both zone and region labels are present. |
| deployment | deployment:<ns>/<name> | K8s Deployment | Deployment objects. Metadata: label.* |
| statefulset | statefulset:<ns>/<name> | K8s StatefulSet | StatefulSet objects. Metadata: label.* |
| daemonset | daemonset:<ns>/<name> | K8s DaemonSet | DaemonSet objects. Metadata: label.* |
| job | job:<ns>/<name> | K8s Job | Job objects; linked to its CronJob via ownerRef. Metadata: label.* |
| cronjob | cronjob:<ns>/<name> | K8s CronJob | CronJob objects. Metadata: label.*, cronjob.schedule |
| rollout (Argo Rollouts) | rollout:<ns>/<name> | Argo Rollouts CRD (argoproj.io/v1alpha1) | Watched via dynamic informer; skipped if the CRD is absent. Metadata: label.*, rollout.strategy (canary/blueGreen) |
| pod | pod:<ns>/<name> | K8s Pod | Pod objects. Metadata: label.*, k8s.node.name, k8s.pod.uid |
| container | container:<ns>/<pod>/<name> | K8s Pod spec | From each pod's spec.containers. Metadata: container.image.name |
| hpa | hpa:<ns>/<name> | K8s HPA (autoscaling/v2) | HorizontalPodAutoscaler objects. Metadata: hpa.min_replicas, hpa.max_replicas, hpa.target.kind/name |
| scaledobject (KEDA) | scaledobject:<ns>/<name> | KEDA CRD (keda.sh/v1alpha1) | Dynamic informer; skipped if the CRD is absent. Metadata: keda.target.*, keda.min/max_replicas, keda.triggers, keda.scaling_policy |
| endpoint | endpoint:<service>/<METHOD>/<route> (HTTP); endpoint:<rpc.system>/<rpc.service>/<rpc.method> (RPC) | OTel HTTP & RPC spans | span.kind SERVER or CLIENT. RPC is matched first, on rpc.method (or a gRPC-shaped http.route / url.full path). The RPC ID is host-independent, so a client's CALLS and the server's EXPOSES converge on one entity (e.g. the gRPC method /oteldemo.CartService/GetCart). Otherwise HTTP: SERVER uses http.route + service.name; CLIENT uses the url.full path + peer.service (falling back to server.address). HTTP routes collapse :id, {id}, and all-digit segments to {n}; RPC names are kept verbatim. |
| topic | topic:<name> | OTel messaging spans | span.kind PRODUCER or CONSUMER. Name from messaging.destination.name (fallback: first token of span.name). Used verbatim. |
| database | database:<system>/<host>[:<port>] | OTel DB-client spans | db.system set + span.kind CLIENT + server.address. Port from server.port (optional). |
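Two of the ID rules above are mechanical enough to sketch in Go: HTTP route collapsing and the optional port in database IDs. The sketch assumes that any :name or {name} placeholder counts as a parameter, although the table names only :id and {id}. RPC names would bypass collapseRoute because they are kept verbatim. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isParam treats ":name" and "{name}" segments as route parameters.
// Assumption: the docs name :id and {id}; this generalizes to any name.
func isParam(seg string) bool {
	return (len(seg) > 1 && seg[0] == ':') ||
		(len(seg) > 2 && seg[0] == '{' && seg[len(seg)-1] == '}')
}

// isNumeric reports whether seg is a non-empty run of ASCII digits.
func isNumeric(seg string) bool {
	if seg == "" {
		return false
	}
	for i := 0; i < len(seg); i++ {
		if seg[i] < '0' || seg[i] > '9' {
			return false
		}
	}
	return true
}

// collapseRoute rewrites parameter and numeric segments of an HTTP route
// to {n}; other segments (including mixed ones like "v2") are kept.
func collapseRoute(route string) string {
	segs := strings.Split(route, "/")
	for i, s := range segs {
		if isParam(s) || isNumeric(s) {
			segs[i] = "{n}"
		}
	}
	return strings.Join(segs, "/")
}

// databaseID builds database:<system>/<host>[:<port>]; port is optional.
func databaseID(system, host, port string) string {
	id := "database:" + system + "/" + host
	if port != "" {
		id += ":" + port
	}
	return id
}

func main() {
	fmt.Println(collapseRoute("/api/users/123/orders/:id")) // /api/users/{n}/orders/{n}
	fmt.Println(databaseID("postgresql", "orders-db", "5432")) // database:postgresql/orders-db:5432
}
```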
Edge types
Each edge type has a counterpart in the reverse direction. Span-derived edges originate at the emitting container, falling back to its pod when the container is unknown.
| Edge / counterpart | Source | Meaning | How it's derived |
|---|---|---|---|
| CONTAINS / RUNS_IN | K8s | Structural containment (A contains B). | namespace → pod, node → pod, pod → container (pod spec); zone → node, region → zone (node topology labels). |
| MANAGES / MANAGED_BY | K8s | A workload owns/controls its children. | deployment/statefulset/daemonset/job/rollout → pod (pod ownerRef); cronjob → job (Job ownerRef); scaledobject → hpa (HPA ownerRef = ScaledObject). |
| SCALES / SCALED_BY | K8s | An autoscaler scales a workload. | hpa → target (HPA spec.scaleTargetRef); scaledobject → target (KEDA spec.scaleTargetRef, kind defaults to Deployment). Edges are created only for Deployment/StatefulSet/Rollout targets; other kinds are recorded in metadata only. |
| EXPOSES / EXPOSED_BY | Spans | A container serves an HTTP or RPC endpoint. | SERVER span: HTTP (http.request.method + http.route + service.name) or RPC (rpc.method) — see the endpoint entity for id details. |
| CALLS / CALLED_BY | Spans | A container calls an HTTP or RPC endpoint. | CLIENT span: HTTP (http.request.method + url.full; target = peer.service / server.address) or RPC (rpc.method). |
| QUERIES / QUERIED_BY | Spans | A container queries a database. | DB CLIENT span: db.system + span.kind=CLIENT + server.address. Carries action = templatized span.name (the SQL op); one edge per distinct operation. |
| PUBLISHES / PUBLISHED_BY | Spans | A container publishes to a topic. | Messaging PRODUCER span. |
| CONSUMES / CONSUMED_BY | Spans | A container consumes from a topic. | Messaging CONSUMER span. |
| EMITS / EMITTED_BY | Spans (rollup) | Workload-level rollup of everything its containers touch — one hop to answer “what does this deployment talk to?” | deployment/statefulset/… → every endpoint/topic/database (the union of its containers' CALLS/EXPOSES/QUERIES/PUBLISHES/CONSUMES), hoisted onto the stable workload id. graph-otel resolves container → pod → workload via the pod's MANAGED_BY edge at flush time. |
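The EMITS rollup is a union over a two-step lookup. The following Go sketch assumes in-memory maps for the container → pod (RUNS_IN) and pod → workload (MANAGED_BY) relationships. The edge type, map shapes, and function names are hypothetical and are not graph-otel's data structures. In this sketch, containers whose pod has no managing workload contribute nothing, because there is no stable workload ID to hoist onto.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a hypothetical in-memory edge: type plus target entity ID.
type edge struct {
	Type   string
	Target string
}

// counterpart pairs each edge type with its reverse, per the table above.
var counterpart = map[string]string{
	"CONTAINS": "RUNS_IN", "MANAGES": "MANAGED_BY", "SCALES": "SCALED_BY",
	"EXPOSES": "EXPOSED_BY", "CALLS": "CALLED_BY", "QUERIES": "QUERIED_BY",
	"PUBLISHES": "PUBLISHED_BY", "CONSUMES": "CONSUMED_BY", "EMITS": "EMITTED_BY",
}

// rolledUp lists the container edge types that feed EMITS.
var rolledUp = map[string]bool{
	"CALLS": true, "EXPOSES": true, "QUERIES": true, "PUBLISHES": true, "CONSUMES": true,
}

// rollupEmits hoists span-derived container edges onto workloads via
// container -> pod (podOf) -> workload (workloadOf), unioning the targets.
// It returns workload ID -> sorted, de-duplicated EMITS targets.
func rollupEmits(containerEdges map[string][]edge, podOf, workloadOf map[string]string) map[string][]string {
	targets := map[string]map[string]bool{}
	for container, edges := range containerEdges {
		workload, ok := workloadOf[podOf[container]]
		if !ok {
			continue // no managing workload to hoist onto
		}
		for _, e := range edges {
			if !rolledUp[e.Type] {
				continue // structural edges (e.g. RUNS_IN) are not rolled up
			}
			if targets[workload] == nil {
				targets[workload] = map[string]bool{}
			}
			targets[workload][e.Target] = true
		}
	}
	out := map[string][]string{}
	for workload, set := range targets {
		for t := range set {
			out[workload] = append(out[workload], t)
		}
		sort.Strings(out[workload])
	}
	return out
}

func main() {
	// Illustrative data: two pods of one Deployment.
	containerEdges := map[string][]edge{
		"container:default/cart-abc/cart": {
			{"EXPOSES", "endpoint:grpc/oteldemo.CartService/GetCart"},
			{"QUERIES", "database:redis/cache:6379"},
		},
		"container:default/cart-def/cart": {
			{"QUERIES", "database:redis/cache:6379"},
			{"RUNS_IN", "pod:default/cart-def"},
		},
	}
	podOf := map[string]string{
		"container:default/cart-abc/cart": "pod:default/cart-abc",
		"container:default/cart-def/cart": "pod:default/cart-def",
	}
	workloadOf := map[string]string{
		"pod:default/cart-abc": "deployment:default/cart",
		"pod:default/cart-def": "deployment:default/cart",
	}
	for w, ts := range rollupEmits(containerEdges, podOf, workloadOf) {
		for _, t := range ts {
			fmt.Printf("%s -EMITS-> %s (reverse: %s)\n", w, t, counterpart["EMITS"])
		}
	}
}
```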
Redis schema (prefix set by graph.keyPrefix, default graph)
| Key | Type | Contents |
|---|---|---|
| <prefix>:entity:<id> | HASH | id, kind, name, last_seen_at_ms |
| <prefix>:entity:<id>:metadata | HASH | Arbitrary string key/values |
| <prefix>:entity:<id>:edges | SET | JSON-encoded Edge objects |
| <prefix>:by_kind:<kind> | SET | Entity IDs of the given kind |
| <prefix>:ids | SET | All entity IDs |
Full source and chart values on GitHub.
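The Redis key layout can be expressed as a few string builders. This Go sketch also lists the kind of HSET/SADD writes that the flush description (idempotent, no deletes) implies for one entity. It expresses them as Redis argv slices rather than going through a client library. The sketch makes the following simplifications and assumptions:

- It omits the :edges set, because the Edge JSON shape isn't documented here.
- The type and function names and the sample entity are hypothetical.
- graph-otel's actual command sequence may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// keySpace builds Redis keys under a configurable prefix (default "graph").
type keySpace struct{ prefix string }

func (k keySpace) entity(id string) string   { return k.prefix + ":entity:" + id }
func (k keySpace) metadata(id string) string { return k.entity(id) + ":metadata" }
func (k keySpace) edges(id string) string    { return k.entity(id) + ":edges" }
func (k keySpace) byKind(kind string) string { return k.prefix + ":by_kind:" + kind }
func (k keySpace) ids() string               { return k.prefix + ":ids" }

// flushCommands returns HSET/SADD argv slices for one entity. They only set
// fields and add set members (no deletes), so repeating them, or running
// them from several replicas, never removes data.
func flushCommands(k keySpace, id, kind, name string, lastSeenMs int64, meta map[string]string) [][]string {
	cmds := [][]string{{
		"HSET", k.entity(id),
		"id", id, "kind", kind, "name", name,
		"last_seen_at_ms", strconv.FormatInt(lastSeenMs, 10),
	}}
	if len(meta) > 0 {
		keys := make([]string, 0, len(meta))
		for key := range meta {
			keys = append(keys, key)
		}
		sort.Strings(keys) // deterministic field order
		cmd := []string{"HSET", k.metadata(id)}
		for _, key := range keys {
			cmd = append(cmd, key, meta[key])
		}
		cmds = append(cmds, cmd)
	}
	return append(cmds,
		[]string{"SADD", k.byKind(kind), id},
		[]string{"SADD", k.ids(), id},
	)
}

func main() {
	k := keySpace{prefix: "graph"}
	meta := map[string]string{"label.app": "cart", "k8s.node.name": "node-1"}
	for _, cmd := range flushCommands(k, "pod:default/cart-abc", "pod", "cart-abc", 1700000000000, meta) {
		fmt.Println(strings.Join(cmd, " "))
	}
	fmt.Println("edges key:", k.edges("pod:default/cart-abc"))
}
```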