Docs

Install otel-k8s-graph, feed it trace spans, and query the graph over REST or from Claude. Three small Go binaries, coordinating only through Redis.

Quickstart

The Helm chart bundles a single-replica Redis by default, so a fresh install is self-contained. image.registry is the only required value.

install

# Build + push all three images (versioned + latest), then helm upgrade --install:
REGISTRY=<your-registry> ./deploy.sh

# Or with helm directly against pre-built images:
helm upgrade --install graph helm/graph \
  --namespace default --create-namespace \
  --set image.registry=<your-registry>

Already have Redis? Point at it with --set redis.internal.enabled=false:

external redis (inline)

helm ... --set redis.internal.enabled=false \
  --set redis.host=my-redis --set redis.password=s3cret

external redis (existing secret)

kubectl create secret generic redis-creds \
  --from-literal=REDIS_HOST=my-redis \
  --from-literal=REDIS_PASSWORD=s3cret

helm ... --set redis.internal.enabled=false \
  --set redis.existingSecret=redis-creds

Helm knobs

The values you'll reach for most often. See the chart README for the complete list.

Key	Default	Description
image.registry	(required)	Container registry for all three images, no trailing slash. The image ref is <registry>/<imageName>:<tag>.
image.tag	latest	Shared image tag for all components. deploy.sh pins the exact version at install time.
graph.keyPrefix	graph	GRAPH_REDIS_KEY_PREFIX namespacing every Redis key. All three components MUST agree on this to share one graph.
redis.internal.enabled	true	Deploy a bundled single-replica Redis. Set false to bring your own.
redis.host	""	REDIS_HOST for external Redis (when internal.enabled is false and existingSecret is empty).
redis.password	""	REDIS_PASSWORD for external Redis. Empty = no auth.
redis.existingSecret	""	Source host/username/password from an existing Secret (via secretKeyRef).
redis.internal.persistence.enabled	false	Persist the bundled Redis in a PVC. Off by default — the graph is rebuilt from the K8s API and OTel data after a restart.
redis.internal.persistence.size	1Gi	PVC size when persistence is enabled.
graphOtel.replicas	2	Replica count. graph-otel is horizontally scalable (idempotent full-flush writes + reaper), but the collector must shard spans by trace ID to the headless Service (see the loadbalancing exporter under OTel Collector setup).
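graphOtel.service.otlpPort	4317	OTLP gRPC port. Point the collector's traces exporter at graph-otel-otlp:<otlpPort>; with more than one replica, use the loadbalancing exporter against the headless Service instead (see OTel Collector setup).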
graphOtel.config.flushInterval	60s	GRAPH_FLUSH_INTERVAL: how often the whole in-memory set is written to Redis (idempotent HSET/SADD, no deletes — safe across replicas).
graphOtel.config.expiryTtl	100s	GRAPH_EXPIRY_TTL: drop in-memory entities/edges not seen for this long. MUST exceed flushInterval.
graphOtel.config.reapInterval	5m	GRAPH_REAP_INTERVAL: how often the reaper deletes Redis entities not refreshed within reapTtl. 0 disables it; all replicas reap (idempotent).
graphOtel.config.reapTtl	48h	GRAPH_REAP_TTL: delete graph-otel-owned Redis entities not refreshed by any replica for this long. (FLOW_REAP_INTERVAL / FLOW_REAP_TTL do the same for flows.)
graphK8s.config.resyncPeriod	5m	WATCH_RESYNC_PERIOD: how often informers re-list every object (self-heal).
graphRead.replicas	1	Replica count for the query API (stateless reader; safe to raise).
graphRead.ingress.enabled	false	Expose the query API via Ingress. NO auth and includes destructive POST /prune — add auth or restrict IPs first.
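
The one hard constraint in the table is that expiryTtl must exceed flushInterval. A minimal sketch of a pre-install check follows; parse_duration and check_timing are hypothetical helpers, not part of the chart, and the parser only covers the ms/s/m/h units used in the defaults above.

```python
import re

# Seconds per unit for the subset of Go-style duration units used in the table.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text):
    """Parse a duration such as '60s', '5m' or '1h30m' into seconds."""
    if text == "0":
        return 0.0
    pos, total = 0, 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # reject gaps between tokens
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # reject empty input or trailing junk
        raise ValueError(f"invalid duration: {text!r}")
    return total


def check_timing(flush_interval, expiry_ttl):
    """Return True when expiryTtl is strictly greater than flushInterval."""
    return parse_duration(expiry_ttl) > parse_duration(flush_interval)


if __name__ == "__main__":
    # Chart defaults: flushInterval=60s, expiryTtl=100s.
    print(check_timing("60s", "100s"))
```

Running the check in CI before helm upgrade is one way to catch an override that shortens expiryTtl without also shortening flushInterval.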

OTel Collector setup

graph-otel consumes trace spans directly — no spanmetrics connector or metrics pipeline required. Fan the spans your apps already emit to graph-otel alongside your existing trace backend. The k8sattributes preset adds the k8s.* attributes graph-otel needs to attach relationships to the right container.

otel-collector-values.yaml

# otel-collector-values.yaml
mode: deployment

image:
  repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib

# Adds the k8sattributes processor (+ RBAC) so spans carry
# k8s.namespace.name / k8s.pod.name / k8s.container.name.
presets:
  kubernetesAttributes:
    enabled: true

config:
  exporters:
    # graph-otel reconstructs whole traces in memory (for flows), so every
    # span of a trace must reach the SAME replica. The loadbalancing exporter
    # shards by trace id across the graph-otel pods, discovered via the
    # headless Service (returns one A record per pod).
    loadbalancing:
      routing_key: traceID
      resolver:
        dns:
          hostname: graph-otel-otlp-headless.default.svc.cluster.local
          port: 4317
      protocol:
        otlp:
          tls:
            insecure: true

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loadbalancing]

install collector

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
  -f otel-collector-values.yaml

Already exporting traces? Add the loadbalancing exporter alongside your existing one in the traces pipeline (exporters: [your-backend, loadbalancing]). Sharding by trace ID keeps every span of a trace on one graph-otel replica, which flow assembly requires.
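
To illustrate why trace-ID routing keeps a trace whole, here is a simplified sketch. It is not the loadbalancing exporter's actual algorithm (the exporter uses a consistent-hash ring so that fewer traces move when the pod set changes); pick_replica and the pod names are hypothetical.

```python
import hashlib


def pick_replica(trace_id, replicas):
    """Map a trace ID to one replica; every span of that trace gets the same answer."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    # Sort so the choice does not depend on the order DNS returned the pods in.
    return sorted(replicas)[index]


if __name__ == "__main__":
    pods = ["graph-otel-0", "graph-otel-1"]  # hypothetical pod names
    trace = "4bf92f3577b34da6a3ce929d0e0e4736"
    # Two spans of the same trace always land on the same pod.
    print(pick_replica(trace, pods) == pick_replica(trace, list(reversed(pods))))
```

Plain round-robin across pods would split a trace's spans between replicas, and neither replica would see the complete tree.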

MCP / Claude Code

The same graph-read binary runs an MCP server in mcp mode. It's a thin REST client of the query API — set GRAPH_BASE_URL to point it at the API. Expose the query API with the chart's built-in Ingress so the MCP server can run anywhere:

expose via ingress

# Quote the indexed keys so the shell does not treat [0] as a glob pattern.
helm ... \
  --set graphRead.ingress.enabled=true \
  --set graphRead.ingress.className=nginx \
  --set 'graphRead.ingress.hosts[0].host=graph.example.com' \
  --set 'graphRead.ingress.hosts[0].paths[0].path=/' \
  --set 'graphRead.ingress.hosts[0].paths[0].pathType=Prefix'

The query API has no authentication and includes the destructive POST /prune — add auth at the ingress (basic-auth / oauth2-proxy annotations) or restrict source IPs before exposing it.

Then register it with Claude Code / Claude Desktop, pointing GRAPH_BASE_URL at the ingress host:

mcp config (json)

{
  "mcpServers": {
    "graph": {
      "command": "graph-read",
      "args": ["mcp"],
      "env": { "GRAPH_BASE_URL": "https://graph.example.com" }
    }
  }
}

The six tools

search	Find entities by keyword across IDs, names, metadata values, and edge actions.
get_entity	Full detail for one entity by exact ID: kind, metadata, and all outbound edges.
list_entities	Enumerate every entity of a given kind.
get_subgraph	Every entity reachable from a root via BFS (blast radius / call neighborhood).
list_flows	Distinct trace flows: request shapes collapsed from many traces.
get_flow	The full canonical structure of one trace flow by root hash.
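
The BFS behind get_subgraph can be pictured with a small depth-limited traversal. This is a sketch under assumptions, not the server's code: the adjacency dict stands in for outbound edges, the entity IDs are made up to match the documented ID formats, and max_depth mirrors the REST parameter of the same name.

```python
from collections import deque


def reachable(edges, root, max_depth):
    """Return entity IDs reachable from root within max_depth hops, in BFS order."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, depth + 1))
    return order


# Hypothetical graph: a deployment, its pod, the pod's container, and a database.
GRAPH = {
    "deployment:shop/cart": ["pod:shop/cart-abc"],
    "pod:shop/cart-abc": ["container:shop/cart-abc/app"],
    "container:shop/cart-abc/app": ["database:redis/cache"],
}

if __name__ == "__main__":
    print(reachable(GRAPH, "deployment:shop/cart", 2))
```

A small max_depth gives the immediate neighborhood; a larger one approximates the blast radius of the root entity.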

REST API

Served by graph-read on :8080. The MCP tools mirror these endpoints.

Method	Path	Description
GET	/search?q=<text>&kind=<kind>&limit=N	Case-insensitive substring across IDs, names, metadata values, and edge actions.
GET	/entities?kind=<kind>	List every entity of a kind.
GET	/entity/{id...}	One entity with its edges + metadata (IDs may contain /).
GET	/subgraph/{id...}?max_depth=N	BFS reachable set from an entity (blast radius / call neighborhood).
GET	/flows?limit=N	Trace-flow summaries (abstract request structures), sorted by occurrence count.
GET	/flow/{hash}	One flow's canonical structure (collapsed Merkle tree with per-node metadata).
POST	/prune?older_than=<dur>	Drop entities not seen for <dur> (e.g. 5m); returns the number of entities dropped.
GET	/healthz	Liveness/readiness.
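
One way to picture the "collapsed Merkle tree" behind /flow/{hash}: hash each node from its own label plus the hashes of its children, so traces with the same shape produce the same root hash. The sketch below is an illustration only. The label and children fields, and the choice to merge identical sibling subtrees, are assumptions rather than the documented algorithm.

```python
import hashlib


def shape_hash(node):
    """Hash a span tree by structure: the node's label plus its distinct child hashes."""
    # The set drops repeated identical subtrees; sorting makes sibling order irrelevant.
    child_hashes = sorted({shape_hash(child) for child in node.get("children", [])})
    payload = node["label"] + "|" + ",".join(child_hashes)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical traces: the same request shape with one and with three DB calls.
one_query = {"label": "GET /cart", "children": [{"label": "SELECT items"}]}
three_queries = {"label": "GET /cart", "children": [{"label": "SELECT items"}] * 3}
other_shape = {"label": "GET /cart", "children": [{"label": "PUBLISH orders"}]}

if __name__ == "__main__":
    print(shape_hash(one_query) == shape_hash(three_queries))
    print(shape_hash(one_query) == shape_hash(other_shape))
```

Under these assumptions the first two traces count as one flow, while the third is a distinct flow.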

The query API has no authentication and includes the destructive POST /prune. Put auth in front (ingress basic-auth / oauth2-proxy) or restrict source IPs before exposing it.
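
A minimal sketch of calling the read-only endpoints from Python with the standard library only. The helper names are hypothetical and the base URL is the example host from the ingress section. It assumes the server accepts unescaped / and : in entity IDs, as the {id...} path suggests, and that responses are JSON.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def search_url(base, text, kind=None, limit=None):
    """Build a /search URL; kind and limit are added only when given."""
    params = {"q": text}
    if kind is not None:
        params["kind"] = kind
    if limit is not None:
        params["limit"] = str(limit)
    return f"{base.rstrip('/')}/search?{urlencode(params)}"


def entity_url(base, entity_id):
    """Build an /entity URL, leaving '/' and ':' in the ID unescaped."""
    return f"{base.rstrip('/')}/entity/{quote(entity_id, safe='/:')}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    base = "https://graph.example.com"  # example host, replace with your own
    print(search_url(base, "cart service", kind="container", limit=5))
    print(entity_url(base, "pod:default/cart-abc"))
```

The sketch leaves out POST /prune on purpose, since that call deletes data.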

Graph schema

Entity types

Every node kind, where it comes from, and the signal that creates it.

Kind	ID format	Source	How it's derived
namespace	namespace:<name>	K8s Namespace	Namespace objects (also implied by every pod). Metadata: label.*
node	node:<name>	K8s Node	Node objects + each pod's spec.nodeName. Metadata: node label.*
zone	zone:<name>	K8s Node labels	From topology.kubernetes.io/zone (legacy failure-domain.beta.kubernetes.io/zone). Emitted only if the label is set; region → zone → node.
region	region:<name>	K8s Node labels	From topology.kubernetes.io/region (legacy fallback). Emitted only when both zone and region labels are present.
deployment	deployment:<ns>/<name>	K8s Deployment	Deployment objects. Metadata: label.*
statefulset	statefulset:<ns>/<name>	K8s StatefulSet	StatefulSet objects. Metadata: label.*
daemonset	daemonset:<ns>/<name>	K8s DaemonSet	DaemonSet objects. Metadata: label.*
job	job:<ns>/<name>	K8s Job	Job objects; linked to its CronJob via ownerRef. Metadata: label.*
cronjob	cronjob:<ns>/<name>	K8s CronJob	CronJob objects. Metadata: label.*, cronjob.schedule
rollout (Argo Rollouts)	rollout:<ns>/<name>	Argo Rollouts CRD (argoproj.io/v1alpha1)	Watched via dynamic informer; skipped if the CRD is absent. Metadata: label.*, rollout.strategy (canary/blueGreen)
pod	pod:<ns>/<name>	K8s Pod	Pod objects. Metadata: label.*, k8s.node.name, k8s.pod.uid
container	container:<ns>/<pod>/<name>	K8s Pod spec	From each pod's spec.containers. Metadata: container.image.name
hpa	hpa:<ns>/<name>	K8s HPA (autoscaling/v2)	HorizontalPodAutoscaler objects. Metadata: hpa.min_replicas, hpa.max_replicas, hpa.target.kind/name
scaledobject (KEDA)	scaledobject:<ns>/<name>	KEDA CRD (keda.sh/v1alpha1)	Watched via dynamic informer; skipped if the CRD is absent. Metadata: keda.target.*, keda.min/max_replicas, keda.triggers, keda.scaling_policy
endpoint	endpoint:<service>/<METHOD>/<route>	OTel HTTP & RPC spans	span.kind SERVER or CLIENT. RPC is matched first, on rpc.method or a gRPC-shaped http.route / url.full path. The RPC id is endpoint:<rpc.system>/<rpc.service>/<rpc.method>; it is host-independent, so a client's CALLS and the server's EXPOSES converge on one entity (e.g. grpc /oteldemo.CartService/GetCart). Otherwise the span is treated as HTTP: SERVER uses http.route + service.name; CLIENT uses the url.full path + peer.service (else server.address). HTTP routes collapse :id, {id}, and digit segments to {n}; RPC names are kept verbatim.
topic	topic:<name>	OTel messaging spans	span.kind PRODUCER or CONSUMER. Name from messaging.destination.name (fallback: first token of span.name). Used verbatim.
database	database:<system>/<host>[:<port>]	OTel DB-client spans	db.system set + span.kind CLIENT + server.address. Port from server.port (optional).

Edge types

Each edge has a counterpart in the other direction. Span-derived edges anchor on the emitting container (container preferred over pod).

Edge / counterpart	Source	Meaning	How it's derived
CONTAINS / RUNS_IN	K8s	Structural containment (A contains B).	namespace → pod, node → pod, pod → container (pod spec); zone → node, region → zone (node topology labels).
MANAGES / MANAGED_BY	K8s	A workload owns/controls its children.	deployment/statefulset/daemonset/job/rollout → pod (pod ownerRef); cronjob → job (Job ownerRef); scaledobject → hpa (HPA ownerRef = ScaledObject).
SCALES / SCALED_BY	K8s	An autoscaler scales a workload.	hpa → target (HPA spec.scaleTargetRef); scaledobject → target (KEDA spec.scaleTargetRef, defaults to Deployment). Targets: Deployment/StatefulSet/Rollout; other kinds recorded in metadata only.
EXPOSES / EXPOSED_BY	Spans	A container serves an HTTP or RPC endpoint.	SERVER span: HTTP (http.request.method + http.route + service.name) or RPC (rpc.method) — see the endpoint entity for id details.
CALLS / CALLED_BY	Spans	A container calls an HTTP or RPC endpoint.	CLIENT span: HTTP (http.request.method + url.full; target = peer.service / server.address) or RPC (rpc.method).
QUERIES / QUERIED_BY	Spans	A container queries a database.	DB CLIENT span: db.system + span.kind=CLIENT + server.address. Carries action = templatized span.name (the SQL op); one edge per distinct operation.
PUBLISHES / PUBLISHED_BY	Spans	A container publishes to a topic.	Messaging PRODUCER span.
CONSUMES / CONSUMED_BY	Spans	A container consumes from a topic.	Messaging CONSUMER span.
EMITS / EMITTED_BY	Spans (rollup)	Workload-level rollup of everything its containers touch — one hop to answer “what does this deployment talk to?”	deployment/statefulset/… → every endpoint/topic/database (the union of its containers' CALLS/EXPOSES/QUERIES/PUBLISHES/CONSUMES), hoisted onto the stable workload id. graph-otel resolves container → pod → workload via the pod's MANAGED_BY edge at flush time.
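
The EMITS rollup amounts to a union over a two-hop lookup (container → pod → workload). A sketch under assumptions: the three dicts stand in for RUNS_IN, MANAGED_BY, and the span-derived edges, all IDs are invented, and the real flush-time logic may differ.

```python
def rollup(runs_in, managed_by, container_edges):
    """Hoist each container's span-derived targets onto its owning workload.

    runs_in:         container id -> pod id
    managed_by:      pod id -> workload id
    container_edges: container id -> set of endpoint/topic/database ids
    """
    emits = {}
    for container, targets in container_edges.items():
        workload = managed_by.get(runs_in.get(container))
        if workload is None:
            continue  # unmanaged pod or unknown container: nothing to roll up
        emits.setdefault(workload, set()).update(targets)
    return emits


# Hypothetical data: two pods of one deployment touching overlapping targets.
RUNS_IN = {
    "container:shop/cart-a/app": "pod:shop/cart-a",
    "container:shop/cart-b/app": "pod:shop/cart-b",
}
MANAGED_BY = {
    "pod:shop/cart-a": "deployment:shop/cart",
    "pod:shop/cart-b": "deployment:shop/cart",
}
EDGES = {
    "container:shop/cart-a/app": {"database:redis/cache", "topic:orders"},
    "container:shop/cart-b/app": {"database:redis/cache"},
}

if __name__ == "__main__":
    print(rollup(RUNS_IN, MANAGED_BY, EDGES))
```

Because pods come and go, anchoring the union on the workload ID gives a stable place to read what a deployment talks to.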

Redis schema (prefix configurable, default graph)

redis keys

<prefix>:entity:<id>            HASH  id, kind, name, last_seen_at_ms
<prefix>:entity:<id>:metadata   HASH  arbitrary string key/values
<prefix>:entity:<id>:edges      SET   JSON-encoded Edge objects
<prefix>:by_kind:<kind>         SET   entity IDs of the given kind
<prefix>:ids                    SET   all entity IDs
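
For ad hoc inspection it can help to build the key names programmatically. The helpers below are hypothetical and only format strings according to the layout above; they do not talk to Redis.

```python
def entity_keys(entity_id, prefix="graph"):
    """Return the three per-entity key names for one entity ID."""
    base = f"{prefix}:entity:{entity_id}"
    return {
        "entity": base,  # HASH: id, kind, name, last_seen_at_ms
        "metadata": f"{base}:metadata",  # HASH: string key/values
        "edges": f"{base}:edges",  # SET: JSON-encoded edges
    }


def kind_index_key(kind, prefix="graph"):
    """SET of entity IDs for one kind."""
    return f"{prefix}:by_kind:{kind}"


def all_ids_key(prefix="graph"):
    """SET of every entity ID."""
    return f"{prefix}:ids"


if __name__ == "__main__":
    print(entity_keys("pod:default/cart-abc")["edges"])
    print(kind_index_key("deployment", prefix="staging"))
```

The prefix argument corresponds to graph.keyPrefix, which all three components must share.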

Full source and chart values on GitHub.