root-cause analysis · kubernetes

Service graph on steroids.

get_subgraph(prod/cart-service, max_depth=2)
[Graph preview, zone asia-south-1, node k8s-04: the frontend mobile deployment CALLS the /v1/add_to_cart endpoint, which cart-service EXPOSES. cart-service QUERIES the postgres:cart-db database and PUBLISHES to the cart.events Kafka topic.]
1 zone · 1 node · 2 deployments · 1 endpoint · 1 database · 1 topic

arrca builds a live relationship graph of your Kubernetes cluster — from read-only APIs and the OpenTelemetry traces you already emit. Explore it yourself, or let Claude navigate it for you.

read-only k8s APIs · OTel spans · REST · MCP
01 What it is

Your cluster, as one queryable graph.

A microservices topology is scattered across two sources, the Kubernetes API and your telemetry, and neither shows the whole picture. arrca reads both into a single graph of entities and their relationships, one that engineers can actually explore. No kubectl incantations, no RBAC barriers, no guessing which dashboard holds what.

Kubernetes API (cluster topology): nodes, zones, workloads, pods, containers, and how they contain, manage & scale each other
OTel spans (service interactions): endpoints, topics, databases, and who calls, queries & publishes to what
→ one graph of entities + edges, over REST or straight from Claude
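
To picture the merged model, here is a minimal sketch. It assumes entities are keyed by a string ID and edges are (source, action, target) triples. The `Edge` and `Graph` names, the ID formats, the pod name, and the `MANAGES` action are illustrative, not arrca's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str  # entity ID the edge starts from
    action: str  # e.g. "MANAGES", "QUERIES", "PUBLISHES"
    target: str  # entity ID the edge points to


@dataclass
class Graph:
    # Hypothetical in-memory stand-in for the shared graph store.
    entities: dict[str, str] = field(default_factory=dict)  # id -> kind
    edges: set[Edge] = field(default_factory=set)

    def add_entity(self, entity_id: str, kind: str) -> None:
        self.entities[entity_id] = kind

    def add_edge(self, source: str, action: str, target: str) -> None:
        self.edges.add(Edge(source, action, target))

    def outbound(self, entity_id: str) -> list[Edge]:
        """All edges leaving one entity, in a stable order."""
        return sorted(
            (e for e in self.edges if e.source == entity_id),
            key=lambda e: (e.action, e.target),
        )


graph = Graph()

# Structure, as it might be derived from the Kubernetes API.
graph.add_entity("prod/cart-service", "deployment")
graph.add_entity("prod/cart-service-7d9f", "pod")  # hypothetical pod name
graph.add_edge("prod/cart-service", "MANAGES", "prod/cart-service-7d9f")

# Interactions, as they might be derived from OTel spans.
graph.add_entity("postgres:cart-db", "database")
graph.add_entity("cart.events", "topic")
graph.add_edge("prod/cart-service", "QUERIES", "postgres:cart-db")
graph.add_edge("prod/cart-service", "PUBLISHES", "cart.events")

for edge in graph.outbound("prod/cart-service"):
    print(edge.action, "->", edge.target)
```

Both sources write into the same structure, so a single lookup on the deployment returns its structural and its interaction edges together.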
02 For platform & SRE teams

The structural risks hiding in your services.

Microservice anti-patterns become queries. Ask them yourself over REST, or let Claude walk the graph for you.

Single points of failure

One service that nearly everything calls. If it falls, the fleet falls. Find the entities with the heaviest CALLED_BY fan-in.
get_entity → count CALLED_BY

Large fan-outs

A request that triggers a cascade of downstream calls: a latency and failure amplifier. Spot entities with many outbound CALLS edges.
get_entity → count CALLS

Cyclical dependencies

A calls B calls C calls A. Cycles make deploys and failures hard to reason about. Detect loops across calls and flow trees.
get_subgraph + list_flows

Blast radius

Before you take a service down or ship a breaking change, see what depends on it: the upstream callers that break if it fails.
get_subgraph(id, max_depth)

Dead code

Endpoints that are exposed but have no observed callers are candidates for deprecation. Walk every endpoint and find the ones with zero CALLED_BY.
list_entities → no callers
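
Several of these patterns reduce to simple graph computations. The sketch below runs them over a hand-written edge list. It assumes CALLS edges have already been flattened to (caller, callee) pairs; the service names, endpoint names, and helper functions are hypothetical and not part of arrca's API.

```python
from collections import Counter, defaultdict

# Hypothetical service-to-service CALLS edges: (caller, callee).
CALLS = [
    ("frontend", "cart-service"),
    ("mobile", "cart-service"),
    ("cart-service", "auth-service"),
    ("frontend", "auth-service"),
    ("auth-service", "user-service"),
    ("user-service", "cart-service"),  # closes a cycle
]

# Hypothetical endpoint data for the dead-code check.
ENDPOINTS = ["/v1/add_to_cart", "/v1/legacy_checkout"]
ENDPOINT_CALLS = [("frontend", "/v1/add_to_cart")]


def fan_in(calls):
    """Number of inbound calls per callee (single points of failure)."""
    return Counter(callee for _, callee in calls)


def fan_out(calls):
    """Number of outbound calls per caller (large fan-outs)."""
    return Counter(caller for caller, _ in calls)


def uncalled(endpoints, calls):
    """Endpoints that never appear as a call target (dead-code candidates)."""
    called = {callee for _, callee in calls}
    return sorted(set(endpoints) - called)


def find_cycle(calls):
    """Return one call cycle as a list of services, or None if there is none."""
    graph = defaultdict(list)
    for caller, callee in calls:
        graph[caller].append(callee)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = defaultdict(int)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(graph):
        if colour[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


print("heaviest fan-in:", fan_in(CALLS).most_common(1))
print("heaviest fan-out:", fan_out(CALLS).most_common(1))
print("uncalled endpoints:", uncalled(ENDPOINTS, ENDPOINT_CALLS))
print("cycle:", find_cycle(CALLS))
```

The cycle check is a depth-first search that reports the first back edge it meets, so it returns one cycle rather than enumerating all of them.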
03 For developers

Explore the real topology — no RBAC battle.

New engineers shouldn't need cluster-admin to understand how the system fits together. Every node and edge is aggregated from read-only Kubernetes APIs and telemetry — no secrets, no write access, nothing sensitive. A safe map of production you can hand to anyone on day one.

Walk from a deployment to the endpoints it serves, the databases it queries, the zones it runs in, and the services that depend on it — without ever touching a live resource.

04 Natively integrated

Built for Claude Code, out of the box.

graph-read mcp is a Model Context Protocol server. Point Claude at it and the model reasons over your live topology — reviewing a PR, planning a migration, or running an incident.

search: Find entities by keyword across IDs, names, metadata values, and edge actions.
get_entity: Full detail for one entity, including kind, metadata, and all outbound edges.
list_entities: Enumerate every entity of a given kind.
get_subgraph: Everything reachable from a root via BFS, with dependencies downstream and dependents upstream.
list_flows: Distinct request shapes collapsed from many traces.
get_flow: The full canonical structure of one trace flow.
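
As an illustration of what "collapsing traces into flows" can mean, the sketch below reduces each trace to an order-independent shape and counts how many traces share it. This is one plausible approach under stated assumptions, not necessarily how arrca computes flows: spans are assumed to be plain dicts with `span_id`, `parent_id`, `service`, and `name`, and each trace is assumed to have exactly one root span.

```python
from collections import Counter, defaultdict


def span(span_id, parent_id, service, name):
    """Build a minimal span record (hypothetical shape)."""
    return {"span_id": span_id, "parent_id": parent_id,
            "service": service, "name": name}


def flow_shape(spans):
    """Canonical shape of one trace's span tree, ignoring IDs and sibling order."""
    children = defaultdict(list)
    root = None
    for s in spans:
        if s["parent_id"] is None:
            root = s
        else:
            children[s["parent_id"]].append(s)

    def shape(s):
        label = f'{s["service"]}:{s["name"]}'
        # De-duplicate and sort child shapes so retries and sibling order
        # do not produce a new flow.
        kids = sorted({shape(child) for child in children[s["span_id"]]})
        return (label, tuple(kids))

    return shape(root)


def collapse(traces):
    """Count how many traces share each distinct shape."""
    return Counter(flow_shape(spans) for spans in traces)


trace_a = [
    span("a1", None, "frontend", "POST /v1/add_to_cart"),
    span("a2", "a1", "cart-service", "INSERT cart"),
    span("a3", "a1", "cart-service", "publish cart.events"),
]
trace_b = [  # same request shape, different IDs and sibling order
    span("b1", None, "frontend", "POST /v1/add_to_cart"),
    span("b3", "b1", "cart-service", "publish cart.events"),
    span("b2", "b1", "cart-service", "INSERT cart"),
]
trace_c = [  # a different shape: no publish step
    span("c1", None, "frontend", "POST /v1/add_to_cart"),
    span("c2", "c1", "cart-service", "INSERT cart"),
]

flows = collapse([trace_a, trace_b, trace_c])
print(len(flows), "distinct flows from 3 traces")
```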
ask claude
> What breaks if I take down
  auth-service?
Claude calls get_subgraph(prod/auth-service) and walks its callers (CALLED_BY):
  7 services break
  3 direct
  4 transitive
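
The direct / transitive split in an answer like this falls out of a breadth-first walk over CALLED_BY edges. Below is a minimal sketch, assuming the callers of each service are available as an adjacency map. The service names and the `blast_radius` helper are hypothetical and only mirror the shape of the example above.

```python
from collections import deque

# Hypothetical CALLED_BY adjacency: service -> services that call it.
CALLED_BY = {
    "auth-service": ["cart-service", "checkout", "profile"],
    "cart-service": ["frontend", "mobile"],
    "checkout": ["frontend"],
    "profile": ["admin-ui", "mobile", "support-tool"],
}


def blast_radius(root, called_by, max_depth=None):
    """Breadth-first walk over CALLED_BY edges; returns {service: depth}."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if max_depth is not None and depth[current] >= max_depth:
            continue  # do not expand beyond the requested depth
        for caller in called_by.get(current, []):
            if caller not in depth:  # visit each service once, cycles included
                depth[caller] = depth[current] + 1
                queue.append(caller)
    del depth[root]  # the root itself is not part of its own blast radius
    return depth


affected = blast_radius("auth-service", CALLED_BY)
direct = sorted(s for s, d in affected.items() if d == 1)
transitive = sorted(s for s, d in affected.items() if d > 1)
print(len(affected), "services break")
print(len(direct), "direct:", direct)
print(len(transitive), "transitive:", transitive)
```

Passing `max_depth=1` limits the walk to direct callers, which is the same idea as the `max_depth` argument shown for `get_subgraph`.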
05 How it works

Three small binaries, coordinating through Redis.

Kubernetes API (pods, nodes, …) → graph-k8s (structure · single writer) → Redis (the graph) → graph-read (REST + MCP)
OTel Collector (trace spans) → graph-otel (relationships) → Redis (the graph)

graph-k8s

Watches the Kubernetes API and writes structural entities plus containment / management edges. Single writer.

graph-otel

Consumes OTel trace spans and writes relationship entities plus CALLS / QUERIES / PUBLISHES / CONSUMES / EXPOSES edges.

graph-read

Serves the read / query REST API and the MCP server.

Assumptions
  • Reads only — arrca never writes to your cluster. graph-k8s uses read-only Kubernetes APIs.
  • Trace spans carry k8s.* attributes so relationships attach to the right container.
  • graph-otel consumes trace spans directly — no spanmetrics connector or metrics pipeline required.
  • Argo Rollouts and KEDA CRDs are modelled natively alongside core workloads and autoscalers — and arrca runs fine on clusters without them.
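
To show why the k8s.* attributes matter, here is a hedged sketch of how one span could be turned into edges. The attribute names (`k8s.namespace.name`, `k8s.deployment.name`, `db.system`, `db.name`, `messaging.destination.name`, `http.route`) come from the OpenTelemetry semantic conventions, though exact names vary between convention versions. The mapping rules, the entity ID formats, and the `edges_from_span` helper are assumptions, not graph-otel's actual logic. CALLS edges are left out because they would need a client span joined to the server span it reaches.

```python
def edges_from_span(span):
    """Map one span (kind + attributes) to (source, action, target) edges."""
    attrs = span["attributes"]
    # k8s.* attributes identify the workload that emitted the span.
    source = f'{attrs["k8s.namespace.name"]}/{attrs["k8s.deployment.name"]}'
    kind = span["kind"]
    edges = []

    if "db.system" in attrs:
        target = f'{attrs["db.system"]}:{attrs.get("db.name", "unknown")}'
        edges.append((source, "QUERIES", target))
    elif "messaging.destination.name" in attrs:
        action = {"PRODUCER": "PUBLISHES", "CONSUMER": "CONSUMES"}.get(kind)
        if action:
            edges.append((source, action, attrs["messaging.destination.name"]))
    elif "http.route" in attrs and kind == "SERVER":
        # A server span means this workload serves the route.
        edges.append((source, "EXPOSES", attrs["http.route"]))
    return edges


K8S = {"k8s.namespace.name": "prod", "k8s.deployment.name": "cart-service"}

spans = [
    {"kind": "SERVER", "attributes": {**K8S, "http.route": "/v1/add_to_cart"}},
    {"kind": "CLIENT", "attributes": {**K8S, "db.system": "postgresql",
                                      "db.name": "cart-db"}},
    {"kind": "PRODUCER", "attributes": {**K8S,
                                        "messaging.destination.name": "cart.events"}},
]

for s in spans:
    for edge in edges_from_span(s):
        print(edge)
```

Without the k8s.* attributes there would be no reliable `source` to attach each edge to, which is why they are listed as an assumption.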
06 Quickstart

Two steps to a live graph.

1  Install the chart (bundles Redis)

install
REGISTRY=<your-registry> ./deploy.sh

2  Fan trace spans to graph-otel

otel-collector-values.yaml
exporters:
  # shard spans by trace id so every span of a
  # trace reaches the same graph-otel replica
  loadbalancing:
    routing_key: traceID
    resolver:
      dns:
        hostname: graph-otel-otlp-headless.default.svc.cluster.local
        port: 4317
    protocol:
      otlp:
        tls:
          insecure: true
service:
  pipelines:
    traces:
      exporters: [loadbalancing]
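
Why route by trace ID? A deterministic function of the trace ID sends every span of a trace to the same replica, so one replica sees the whole request. The sketch below illustrates the idea with a plain hash-and-modulo scheme. It is not the exporter's actual algorithm, and the replica names are hypothetical.

```python
import hashlib


def pick_replica(trace_id: str, replicas: list[str]) -> str:
    """Deterministically map a trace ID to one replica."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["graph-otel-0", "graph-otel-1", "graph-otel-2"]  # hypothetical names

spans = [
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "name": "POST /v1/add_to_cart"},
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "name": "INSERT cart"},
    {"trace_id": "00f067aa0ba902b7a3ce929d0e0e4736", "name": "GET /v1/cart"},
]

for s in spans:
    print(s["name"], "->", pick_replica(s["trace_id"], replicas))
```

The first two spans share a trace ID and therefore always land on the same replica, whichever one that turns out to be.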
Full docs, every knob
07 Demo

See it on a real system.

coming soon

A live walkthrough on the OpenTelemetry Astronomy Shop — the graph, the patterns, and Claude navigating it end to end.

watch the repo ↗