nf-infra-diagram: Infra Diagrams as Code With Official AWS Icons

Post #7 in the Claude Code Toolkit series. Earlier posts: nf-agents, nf-git-workflow, nf-memory, nf-cc-sync, nf-ignore, nf-direnv. This post is about the skill that draws architecture diagrams, or more precisely, the skill that stops Claude from drawing them by hand.

Every project eventually needs an infrastructure diagram. A new teammate asks how traffic flows, a client wants an architecture slide, or you simply need to see the thing you built. Ask an LLM to produce one and you get one of two outcomes: a hand-drawn SVG where the “AWS icons” are orange rectangles that fool nobody, or a Mermaid flowchart that communicates topology about as well as a grocery list.

There is a third option that has existed for years: mingrammer’s diagrams library. You describe the topology in Python, Graphviz lays it out, and the icons come from the official AWS, GCP, Azure, and OSS icon sets bundled with the package. The output looks like an architect made it, because the icons are the same ones the architect would have dragged out of draw.io.

The library is great. Using it cold from an LLM session is not. nf-infra-diagram packages the setup, the class-name lookup, and the render invocation so none of the sharp edges get rediscovered per session.

What the skill does

1. Runs scripts/ensure-env.sh, which checks for Graphviz, a skill-local .venv, and the diagrams package. The script is idempotent and prints ready when everything is in place.
2. Reads the project’s architecture docs before drawing (docs/architecture.md, infra state files). If no doc exists, it asks. It never invents topology.
3. Resolves every technology to a verified import, either from references/node-catalog.md or via scripts/find-node.sh <keyword>. Guessed class names are banned.
4. Writes one .py file per diagram, following fixed conventions: show=False, direction="LR", Cluster for VPC/AZ/tier grouping, Edge(label=":port") for protocol annotations.
5. Renders via scripts/render.sh <file.py> [outdir].
6. Reads the rendered PNG back and iterates if the layout is crowded or mislabeled.

How to invoke

/nf-infra-diagram <what to diagram> [output dir]

For example: /nf-infra-diagram our AWS stack docs/diagrams.

A walkthrough

Take a repo whose docs/architecture.md describes a small production stack: CloudFront in front of S3, an API tier behind a load balancer, Kafka feeding workers, and RDS plus an analytics Postgres.

$ /nf-infra-diagram production stack docs/diagrams

ensure-env.sh:
  python 3.11.8 - diagrams 0.25.1 - dot: graphviz version 12.2.1
  ready

Reading docs/architecture.md... topology found.

Node lookup:
  CloudFront  -> from diagrams.aws.network import CloudFront
  ELB         -> from diagrams.aws.network import ELB
  Kafka       -> from diagrams.onprem.queue import Kafka
  RDS         -> from diagrams.aws.database import RDS
  PostgreSQL  -> from diagrams.onprem.database import Postgresql

Wrote docs/diagrams/production-stack.py
render.sh: rendered into: docs/diagrams
  production-stack.png

Claude then reads production-stack.png back, checks that clusters nest correctly and edge labels are readable, and fixes the .py if not. The .py and the PNG get committed together, so six months from now the diagram regenerates from code instead of rotting in a wiki.
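The walkthrough does not show production-stack.py itself. A file that follows the skill's conventions might look roughly like the sketch below. This is an illustration, not the skill's actual output: the node labels, the port numbers, the EC2 icons standing in for the API and worker tiers, and the build_production_stack wrapper are all assumptions. The imports sit inside the function so the module loads without the diagrams package or Graphviz installed; a real diagram file would more commonly run the with-block at top level.

```python
# Keyword arguments shared by the diagram; note filename has no extension.
DIAGRAM_KWARGS = {
    "filename": "production-stack",
    "outformat": "png",
    "direction": "LR",
    "show": False,  # never try to open a viewer in a headless session
}


def build_production_stack() -> None:
    """Render the stack from the walkthrough (labels and ports are assumed)."""
    from diagrams import Cluster, Diagram, Edge
    from diagrams.aws.compute import EC2
    from diagrams.aws.database import RDS
    from diagrams.aws.network import ELB, CloudFront
    from diagrams.aws.storage import S3
    from diagrams.onprem.database import Postgresql
    from diagrams.onprem.queue import Kafka

    with Diagram("Production stack", **DIAGRAM_KWARGS):
        cdn = CloudFront("CDN")
        assets = S3("static assets")

        with Cluster("VPC"):
            lb = ELB("load balancer")
            with Cluster("API tier"):
                api = EC2("api")
            queue = Kafka("events")
            with Cluster("Workers"):
                workers = EC2("workers")
            with Cluster("Data tier"):
                primary = RDS("primary")
                analytics = Postgresql("analytics")

        # Edge labels carry the protocol/port annotation.
        cdn >> Edge(label=":443") >> assets
        cdn >> Edge(label=":443") >> lb
        lb >> Edge(label=":8080") >> api
        api >> Edge(label=":9092") >> queue
        queue >> Edge(label=":9092") >> workers
        api >> Edge(label=":5432") >> primary
        workers >> Edge(label=":5432") >> analytics
```

Calling build_production_stack() with docs/diagrams as the working directory would write production-stack.png there, because the output path is resolved against the CWD.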

Decision 1: the Python file is the diagram

The skill’s hardest rule is a prohibition: NEVER hand-draw SVG for an infra diagram. Claude is genuinely good at hand-drawing SVG, which is exactly the problem. A hand-drawn diagram looks 90 percent right, and the missing 10 percent is the icons. Hand-approximated AWS icons read as wrong to anyone who has seen the real ones, and the real ones are the entire reason people want “a proper architecture diagram” in the first place.

Diagram-as-code also fixes the second chronic failure of infra diagrams: drift. A PNG in a wiki is dead on arrival; nobody re-draws it after the third infra change. A .py file in the repo is version-controlled, diffable, and regenerable. When the topology changes, the diff shows exactly which nodes and edges moved, and the render is one command.

The skill treats the rendered image as a build artifact and the Python file as the source. Both get committed, but only one of them is the truth.

Decision 2: an import-verified catalog instead of model memory

The diagrams library has a class-naming convention that no model reliably remembers, because it is inconsistent in precisely the way that looks like a typo:

Cloudwatch, not CloudWatch
Postgresql, not PostgreSQL
Mssql, not MSSQL
GithubActions, not GitHubActions

A guessed import does not fail softly. It raises ImportError at render time, and an LLM that guessed wrong once will often “correct” the casing to another wrong variant. The skill bans guessing entirely and provides two alternatives.

First, references/node-catalog.md: a table of import lines for the common AWS and OSS stack, every row import-verified against the installed package version. Second, scripts/find-node.sh <keyword>: a script that walks every module in the installed diagrams package with pkgutil, matches the keyword against module paths and class names, and prints import-ready lines.

$ scripts/find-node.sh kafka
from diagrams.aws.analytics import ManagedStreamingForKafka
from diagrams.onprem.queue import Kafka

The script has one more job: it exits with code 2 when nothing matches. That exit code is a signal, not an error. It means there is no built-in icon for this technology, and the skill should either substitute the nearest built-in (TimescaleDB renders as Postgresql, because it is Postgres) or fall back to diagrams.custom.Custom with a downloaded logo PNG.
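The lookup technique can be sketched in Python as follows. This is not the contents of find-node.sh, only a minimal version of the same idea: walk every module of an installed package with pkgutil, match a keyword against module paths and class names, and report "no match" as exit code 2. The function names find_nodes and run are hypothetical, and the package name is a parameter so the sketch can be tried against any installed package.

```python
import importlib
import inspect
import pkgutil


def find_nodes(package_name: str, keyword: str) -> list[str]:
    """Return import-ready lines for classes matching keyword in a package."""
    package = importlib.import_module(package_name)
    needle = keyword.lower()
    lines = set()
    walker = pkgutil.walk_packages(
        package.__path__,
        prefix=package.__name__ + ".",
        onerror=lambda name: None,  # ignore subpackages that fail to import
    )
    for info in walker:
        try:
            module = importlib.import_module(info.name)
        except Exception:
            continue  # e.g. a module with a missing optional dependency
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Keep only public classes defined in this module, not re-exports.
            if name.startswith("_") or obj.__module__ != module.__name__:
                continue
            if needle in info.name.lower() or needle in name.lower():
                lines.add(f"from {info.name} import {name}")
    return sorted(lines)


def run(package_name: str, keyword: str) -> int:
    """Print matches and return the exit code: 0 on a match, 2 on none."""
    matches = find_nodes(package_name, keyword)
    for line in matches:
        print(line)
    return 0 if matches else 2
```

Against the diagrams package this would be called as run("diagrams", "kafka"), and a command-line entry point could hand the return value to sys.exit so the caller can branch on 2.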

Decision 3: the render wrapper owns the working directory

The diagrams library writes its output relative to the process CWD, not relative to the .py file. Run a diagram file from the repo root and the PNG lands in the repo root, regardless of where the .py lives. This is documented library behavior and it bites every single first-time user.

scripts/render.sh absorbs the trap: it resolves the .py to an absolute path, cds into the output directory, and runs the file through the skill venv from there.

cd "$outdir"
"$PY" "$src"

The same CWD rule applies to Custom node icons, in the other direction: icon paths resolve at render time against the render CWD. A relative icon path that works when you test from one directory silently breaks from another. The skill’s rule: use absolute paths for Custom icons, or place the icon inside the output directory.
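For readers who prefer Python to shell, the wrapper's logic (resolve the source to an absolute path, then run it with the output directory as CWD) could be sketched like this. It is an equivalent under assumptions, not a port of render.sh: the function name render, the default interpreter, and the returned list of new files are all made up for illustration, and it uses subprocess's cwd argument rather than a literal cd.

```python
import subprocess
import sys
from pathlib import Path


def render(src: str, outdir: str = ".", python: str = sys.executable) -> list[Path]:
    """Run a diagram file with outdir as CWD; return the files it created."""
    # Resolve before changing directories, or a relative src would break.
    src_path = Path(src).resolve()
    out_path = Path(outdir).resolve()
    out_path.mkdir(parents=True, exist_ok=True)

    before = set(out_path.iterdir())
    # cwd= pins where relative output paths land, independent of the caller.
    subprocess.run([python, str(src_path)], cwd=out_path, check=True)
    return sorted(set(out_path.iterdir()) - before)
```

In the skill's setup the python argument would point at the interpreter inside the skill-local .venv instead of the default.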

Decision 4: topology comes from docs, never from the model

An LLM asked to “diagram the infrastructure” without a source of truth will produce something plausible. Plausible is the failure mode. A diagram that is 80 percent right is worse than no diagram, because the 20 percent that is wrong looks exactly as authoritative as the rest.

So the skill’s step 2 is a gate: read the project’s architecture docs before writing any diagram code. If there is no doc, or the doc is stale, ask the user instead of improvising. The diagram renders what the docs say, and disagreements between docs and reality surface as questions, not as confident wrong pictures.

Gotchas

show=False is mandatory. Without it the library tries to open the rendered image in a viewer, which goes badly in a headless tool call.
filename= takes no extension. Pass filename="stack" plus outformat="png", or pass a list such as outformat=["png", "svg"] to render both.
Graphviz is a system dependency. The setup script installs it via Homebrew on macOS; on Linux it tells you the package manager command instead of guessing.
The venv is skill-local. ensure-env.sh creates .venv inside the skill folder, preferring uv when present. No project ever gains a Python dependency because you drew a diagram of it.
Wrong tool for flowcharts. Sequence diagrams, swimlanes, and ERDs are not infra topology. The skill’s description says so explicitly, so Claude routes those to a flowchart tool instead of forcing everything into Graphviz.

What you should keep even if you never use this skill

Diagram-as-code beats hand-drawn for anything with official icons. The mingrammer diagrams library is the rare tool where the lazy path and the professional-looking path are the same path.
Verify imports against the installed package, not against memory. The pkgutil walk in find-node.sh is twenty lines and generalizes to any Python library with a large, inconsistently-named class surface.
Wrap render commands that depend on CWD. Any tool that writes output relative to the working directory deserves a wrapper script that pins the CWD, because “works from my directory” is not reproducibility.
Make the no-icon case explicit. Exit code 2 for “no match, use the fallback” turns a silent quality degradation into a visible branch in the workflow.
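The "verify imports against the installed package" point can be made concrete with a few lines of standard-library Python. The sketch below checks a single import line of the form used in the catalog; the helper name verify_import_line and the regular expression are assumptions, and the skill's own verification may work differently.

```python
import importlib
import re

# Matches lines such as: from diagrams.onprem.queue import Kafka
IMPORT_LINE = re.compile(r"^from\s+([\w.]+)\s+import\s+(\w+)$")


def verify_import_line(line: str) -> bool:
    """Return True only if the module imports and exposes the exact name."""
    match = IMPORT_LINE.match(line.strip())
    if match is None:
        return False
    module_name, class_name = match.groups()
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # Attribute lookup is case-sensitive, so a casing guess fails here
    # instead of at render time.
    return hasattr(module, class_name)
```

Running every row of a catalog file through a check like this after each package upgrade would catch renamed or removed classes before a render does.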

Get the skill

White-labeled in the claude-skills-toolkit repo alongside the rest of this series.

git clone https://github.com/llawliet11/claude-skills-toolkit.git
cp -r claude-skills-toolkit/nf-infra-diagram ~/.claude/skills/

Folder reference: claude-skills-toolkit/nf-infra-diagram.

Verify in a new session: /nf-infra-diagram plus a sentence about what to draw. The first run sets up the venv and, on macOS, installs Graphviz via Homebrew (on Linux it prints the package manager command to run); after that, the distance from “we need an architecture diagram” to a PNG with real AWS icons is one prompt.

The useful part is not the rendering. It is that the diagram became a file in the repo that diffs, regenerates, and never has to be re-drawn by a human who has better things to do.