tensix-viz — Tenstorrent hardware visualizer

Architecture

What are we looking at?

Every canvas maps directly to real Tenstorrent silicon. Understanding the grid makes the animations meaningful.

Tensix cores
The bright cells. 120 on Blackhole (P100/P150/P300c), 64 on Wormhole (N150/N300). Each Tensix runs five RISC-V CPUs plus a matrix engine (the FPU tile that does 8×8 matmuls). These are the cores that get hot during model inference — and what the animations are showing.
DRAM rows
Top and bottom rows are DRAM controllers. Blackhole has 8× 16 GB GDDR6 banks (128 GB total, 2 TB/s bandwidth). DRAM bandwidth (not compute) is the bottleneck for batch=1 LLM decode. The visualizer shows them dimmer because they're not "compute" in the animation sense.
ETH columns
Left and right columns (and some edges) are Ethernet links. 2× 400G per side on Blackhole, used for the 2D torus NoC that connects chips within a card, cards within a server, and servers within a Galaxy cluster. The inter-chip ring animations in CardViz trace these links.
PCIe bridge (col 8)
A single column reserved for the host PCIe interface. This is the gap you see splitting the compute grid on Blackhole — col 8 is not a Tensix core. Models run entirely on-chip once loaded; PCIe is only active during weight transfer.
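
The layout rules above can be sketched as a small cell classifier. This is an illustration only, not the library's implementation: the 12×15 grid size and the `cellRole` and `countRoles` helpers are assumptions, chosen so that the Tensix count matches the 120 cores quoted for Blackhole.

```javascript
// Hypothetical grid dimensions; the real floorplan may differ.
const ROWS = 12;
const COLS = 15;
const PCIE_COL = 8; // column index taken from the description above

// Classify one cell using the rules in the text:
// top/bottom rows are DRAM, left/right columns are ETH, col 8 is PCIe.
function cellRole(row, col) {
  if (row === 0 || row === ROWS - 1) return 'dram';
  if (col === 0 || col === COLS - 1) return 'eth';
  if (col === PCIE_COL) return 'pcie';
  return 'tensix';
}

// Count cells per role across the whole grid.
function countRoles() {
  const counts = { dram: 0, eth: 0, pcie: 0, tensix: 0 };
  for (let r = 0; r < ROWS; r++) {
    for (let c = 0; c < COLS; c++) counts[cellRole(r, c)]++;
  }
  return counts;
}

console.log(countRoles());
```

With these assumed dimensions, the PCIe column splits the compute area into two blocks, which is the gap visible on the Blackhole canvas.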

Animation

Ten modes. Visual metaphors, not telemetry.

These animations are not hardware telemetry. Real Tenstorrent workloads are dominated by matrix multiplications. Those keep most Tensix cores busy simultaneously — a uniform bright grid — regardless of whether the model is a language model, a diffusion model, or a custom kernel. The patterns below are chosen to be visually distinct and conceptually evocative, not physically accurate. Modes marked grounded are the closest to what tt-toplike would actually show. For true per-core telemetry, pipe tt-smi heat data into the heatmap step type.

metaphor

idle

Quiet random shimmer at 20–35%. Background system activity.

Hardware: ARC management firmware, DDR refresh cycles, thermal monitoring. Compute cores are mostly clock-gated; nothing is scheduled.
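
A shimmer like this could be produced with one random draw per cell per frame. The sketch below is an assumption about how such an effect might be written, not the library's code; `idleBrightness` and `idleFrame` are hypothetical names, and only the 20–35% range comes from the description.

```javascript
// Return a brightness in [0.20, 0.35) for one cell on one frame.
// `rng` is injectable so the shimmer can be made deterministic.
function idleBrightness(rng = Math.random) {
  const MIN = 0.20;
  const MAX = 0.35;
  return MIN + rng() * (MAX - MIN);
}

// Fill a rows x cols frame with independent shimmer values.
function idleFrame(rows, cols, rng = Math.random) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => idleBrightness(rng))
  );
}

const frame = idleFrame(10, 12);
console.log(frame[0].map((v) => v.toFixed(2)).join(' '));
```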

metaphor

inference

Left-to-right column sweep. Sequential token generation.

Hardware: Batch=1 decode is memory-bandwidth-bound, not compute-bound. Matmul tiles distribute across the full mesh simultaneously. The sweep is an abstraction of autoregressive token dependency, not a physical firing sequence.
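
As a rough illustration of the sweep metaphor, the function below computes a per-column brightness with a short trailing glow. It is a sketch under stated assumptions: `sweepBrightness`, the speed of 6 columns per second, and the 2-column falloff are all hypothetical and not taken from the library.

```javascript
// Brightness of a column during a left-to-right sweep.
// t: time in seconds; speed: columns per second; width: trail length in columns.
function sweepBrightness(col, t, cols, speed = 6, width = 2) {
  const head = (t * speed) % cols; // current sweep position
  // Distance behind the head, wrapping around the grid edge.
  const behind = (head - col + cols) % cols;
  // Linear falloff: 1 at the head, 0 once `width` columns behind it.
  return Math.max(0, 1 - behind / width);
}

// Print one row of brightness values at t = 0.5 s on a 15-column grid.
const row = Array.from({ length: 15 }, (_, c) => sweepBrightness(c, 0.5, 15));
console.log(row.map((v) => v.toFixed(1)).join(' '));
```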

metaphor

diffusion

Expanding ring from chip center. Denoising timestep propagation.

Hardware: FLUX.1 and SD3 use DiT (Diffusion Transformer) architectures. Each denoising step is a standard transformer forward pass, with attention + FFN matmuls across all cores. Physically identical to the inference pattern.
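
An expanding ring can be expressed as a function of each cell's distance from the chip center. The following is a minimal sketch, assuming a hypothetical `ringBrightness` helper with made-up speed and thickness defaults; it is not how the library necessarily draws the effect.

```javascript
// Brightness of cell (row, col) for a ring expanding from the grid center.
function ringBrightness(row, col, rows, cols, t, speed = 4, thickness = 1.5) {
  const cy = (rows - 1) / 2;
  const cx = (cols - 1) / 2;
  const dist = Math.hypot(row - cy, col - cx); // distance from center
  const maxR = Math.hypot(cy, cx);             // distance to the farthest corner
  // The ring restarts once it has fully left the grid.
  const radius = (t * speed) % (maxR + thickness);
  // Brightest where the cell sits on the ring; fades with distance from it.
  return Math.max(0, 1 - Math.abs(dist - radius) / thickness);
}

console.log(ringBrightness(5, 7, 11, 11, 0.5));
```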

metaphor

agents

Random burst clusters. Async tool-call dispatch and response.

Hardware: Tool execution phases are compute-bound; all 120 cores are available to every task. The "clusters" represent logical agent roles, not physical core groupings. The actual hardware view looks closer to inference.

metaphor

explore

Phase-offset sinusoidal wave. Particle Life physics field propagation.

Hardware: Custom RISC-V kernels on Metalium do distribute work spatially across the mesh. For Particle Life specifically, compute regions roughly correspond to particle neighbourhoods, so this mode is the closest to spatial truth for custom kernel workloads.
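
A phase-offset sinusoidal wave can be written as a travelling sine whose phase shifts from row to row. The sketch below assumes a hypothetical `waveBrightness` helper; the frequency, wavelength, and per-row phase values are illustrative defaults, not the library's.

```javascript
// Brightness in [0, 1] for a travelling sine wave with a per-row phase offset.
function waveBrightness(row, col, t, opts = {}) {
  const { freq = 0.5, wavelength = 6, rowPhase = 0.4 } = opts;
  // The column term moves the wave across the grid; the row term offsets each row.
  const phase = 2 * Math.PI * (col / wavelength - freq * t) + row * rowPhase;
  return 0.5 + 0.5 * Math.sin(phase);
}

// Sample the first three rows of one column at t = 0.
for (let r = 0; r < 3; r++) {
  console.log(r, waveBrightness(r, 0, 0).toFixed(3));
}
```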

grounded

thinking

Sustained full-grid glow with slow oscillation. Extended reasoning / chain-of-thought.

Hardware: The most physically accurate of all ten modes. Long chain-of-thought inference (DeepSeek R1, QwQ-32B) is sustained high matmul utilisation across all Tensix cores with minimal idle time between tokens, which is essentially what the glow shows.

grounded

prefill

Wide bright band sweeping the full grid. All prompt tokens processed in parallel.

Hardware: Prefill is genuinely compute-bound and uses all cores simultaneously, unlike decode, which is memory-bound. The wide band is a reasonable approximation of what tt-toplike shows during a long-context prompt ingestion burst.

metaphor

video

Two phase-offset expanding rings. Denoising across consecutive video frames.

Hardware: Wan 2.2 and SkyReels use 3D DiT models. Each frame step is a full transformer forward pass, which in practice looks the same as the thinking mode. The second ring is a visual hint at temporal sequence; it has no physical correlate.

grounded

batch

Three concurrent column sweeps. Parallel batched decode for multiple sequences.

Hardware: Batched inference does run multiple sequences through the same compute simultaneously, and larger batches improve compute efficiency. The concurrent sweeps are a reasonable visual simplification of that parallelism.

grounded

kernel_dispatch

Rectangle of cores lights up via ripple from dispatch origin. Metalium kernel launch via NOC multicast.

Hardware: Metalium dispatches kernels to rectangular Tensix core grids via NOC multicast. Multiple kernels can run concurrently on disjoint grids. The ripple origin corresponds to the dispatching core.
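
The ripple-in-a-rectangle effect can be sketched by lighting each core once a wavefront from the dispatch origin reaches it. This is an illustration only: `isLit`, the `{ r0, c0, r1, c1 }` rectangle shape, the Manhattan-distance hop count, and the speed default are all assumptions, not the library's or Metalium's API.

```javascript
// Decide whether core (r, c) is lit at time t (seconds).
// rect: { r0, c0, r1, c1 } inclusive bounds of the kernel's core grid.
// origin: { r, c } of the dispatching core. speed: cells per second.
function isLit(rect, origin, r, c, t, speed = 8) {
  const inside =
    r >= rect.r0 && r <= rect.r1 && c >= rect.c0 && c <= rect.c1;
  if (!inside) return false; // cores outside the rectangle never light
  // Manhattan distance is an illustrative stand-in for hop count.
  const hops = Math.abs(r - origin.r) + Math.abs(c - origin.c);
  return t >= hops / speed;
}

const rect = { r0: 2, c0: 3, r1: 5, c1: 8 };
const origin = { r: 2, c: 3 };
console.log(isLit(rect, origin, 5, 8, 0.5), isLit(rect, origin, 5, 8, 1.0));
```

Two disjoint rectangles with their own origins can be evaluated independently, which mirrors the concurrent kernels described above.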

Memory

DRAM activity & L1 fill overlay

Pass showMemory: true to add a DRAM bandwidth glow around memory rows and per-core L1 fill bars on every Tensix cell. Drive both values from real tt-smi telemetry or let the mode preset animate them.

Blackhole · batch mode · showMemory: true

The overlay composites on top of any animation mode. DRAM rows pulse brighter as bandwidth rises; each Tensix cell grows a slim bar tracking L1 SRAM utilization. Values animate smoothly between updates and can be set independently — only the keys you provide are overridden.

const viz = new TensixViz(canvas, {
  arch: 'blackhole',
  showMemory: true       // enable DRAM glow + L1 bars
})
viz.activate('batch')

// override with live tt-smi data
viz.setMemoryStats({
  dram_bw: 0.82,  // 0–1  (1.0 ≈ 900 GB/s peak on BH)
  l1_fill: 0.61   // 0–1  (fraction of 108 KB L1 in use)
})

// partial override — only dram_bw changes; l1_fill stays on preset
viz.setMemoryStats({ dram_bw: 0.95 })

Works on all architectures and all ten animation modes. Values are cleared when activate() is called and can be re-applied immediately after.
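
The override behaviour described in this section (partial updates, smooth transitions, reset on `activate()`) can be modelled in a few lines. The class below is a hypothetical stand-in written for illustration; `MemoryStatsModel`, its `step` method, the clamping to the 0 to 1 range, and the easing factor are assumptions and do not reflect the internals of `TensixViz`.

```javascript
// Hypothetical model of the override semantics; not the library's code.
class MemoryStatsModel {
  constructor(preset) {
    this.preset = { ...preset }; // values the mode preset animates
    this.overrides = {};         // keys supplied via setMemoryStats
    this.shown = { ...preset };  // values currently drawn
  }

  // Merge only the provided keys; the rest keep following the preset.
  setMemoryStats(stats) {
    for (const [key, value] of Object.entries(stats)) {
      // Clamping to [0, 1] is an assumption based on the documented range.
      this.overrides[key] = Math.min(1, Math.max(0, value));
    }
  }

  // Switching mode clears overrides, mirroring activate().
  activate(preset) {
    this.preset = { ...preset };
    this.overrides = {};
  }

  // Ease the drawn values toward their targets; call once per frame.
  step(alpha = 0.2) {
    const target = { ...this.preset, ...this.overrides };
    for (const key of Object.keys(target)) {
      const current = this.shown[key] ?? target[key];
      this.shown[key] = current + (target[key] - current) * alpha;
    }
    return this.shown;
  }
}

const model = new MemoryStatsModel({ dram_bw: 0.5, l1_fill: 0.6 });
model.setMemoryStats({ dram_bw: 0.95 }); // l1_fill stays on the preset
for (let i = 0; i < 50; i++) model.step();
console.log(model.shown);
```

Because overrides are cleared on mode change, a caller streaming live data would re-apply the latest sample right after each `activate()` call.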

Integration

Quick start

Drop in the IIFE bundle and start any mode with a single call. No build step, no dependencies.

<!-- Stylesheet and script tags -->
<link rel="stylesheet" href="tensix-viz.css">
<script src="tensix-viz.js"></script>

// Single Blackhole chip — any mode
const viz = new TensixViz(canvas, { arch: 'blackhole' })
viz.activate('thinking')  // or 'prefill', 'batch', 'inference' …

// Memory visualization layer (opt-in)
const memViz = new TensixViz(canvas, { arch: 'blackhole', showMemory: true })
memViz.activate('inference')
memViz.setMemoryStats({ dram_bw: 0.75, l1_fill: 0.60 })  // override with live data

// Full QB2 system (4 chips, intra/inter-card ETH links)
const sys = new SystemViz(container, 'qb2')
sys.activate('batch')

<!-- HTML auto-init: no JS required -->
<canvas data-viz="chip" data-config="blackhole" data-mode="prefill"
        width="340" height="240"></canvas>

All modes

Mode	Represents	Accuracy
`idle`	Background system activity	metaphor
`inference`	Sequential token generation	metaphor
`diffusion`	Image denoising timestep (DiT)	metaphor
`agents`	Async tool-call dispatch	metaphor
`explore`	Particle Life physics field	metaphor — closest for Metalium
`thinking`	Extended reasoning / chain-of-thought	◆ grounded
`prefill`	Parallel prompt ingestion	◆ grounded
`video`	Temporal video frame denoising (3D DiT)	metaphor
`batch`	Batched parallel decode	◆ grounded
`kernel_dispatch`	Metalium kernel launch via NOC multicast	◆ grounded

Visualize the silicon.
