tensix-viz renders Tenstorrent hardware — from a single Tensix core up to a Galaxy SuperCluster — in a zero-dependency Canvas file. Ten animation modes make each workload visible on the chip.
Used in QB2 welcome pages, VS Code lessons, and standalone explainers.
Every canvas maps directly to real Tenstorrent silicon. Understanding the grid makes the animations meaningful.
The bright cells. 120 on Blackhole (P100/P150/P300c), 64 on Wormhole (N150/N300). Each Tensix runs five RISC-V CPUs plus a matrix engine (the FPU tile that does 8×8 matmuls). These are the cores that get hot during model inference — and what the animations are showing.
Top and bottom rows are DRAM controllers. Blackhole has 8× 16 GB GDDR6 banks (128 GB total, 2 TB/s bandwidth). DRAM bandwidth, not compute, is the bottleneck for batch=1 LLM decode. The visualizer shows these rows dimmer because they're not "compute" in the animation sense.
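To see why batch=1 decode is bandwidth-bound, note that each generated token has to stream every weight from DRAM once, so bandwidth divided by model size caps tokens per second. The sketch below is a back-of-envelope estimate only: it assumes a dense model, ignores KV-cache traffic and compute time, and uses illustrative inputs that are not hardware specifications. The function name is hypothetical.

```javascript
// Upper bound on batch=1 decode throughput when every weight is read
// from DRAM once per token. Ignores KV-cache reads and compute time.
function decodeTokensPerSecUpperBound(bandwidthGBps, paramsBillions, bytesPerParam) {
  const modelGB = paramsBillions * bytesPerParam; // 1e9 params x bytes/param = GB
  return bandwidthGBps / modelGB;
}

// Illustrative inputs only (assumptions, not measured figures):
// an 8B-parameter model at 1 byte per parameter, 1000 GB/s of DRAM bandwidth.
const bound = decodeTokensPerSecUpperBound(1000, 8, 1);
console.log(bound.toFixed(1) + ' tokens/s upper bound');
```

Halving the bytes per parameter doubles the bound, which is why weight quantization helps decode even when compute is idle.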
Left and right columns (and some edges) are Ethernet links. 2× 400G per side on Blackhole, used for the 2D torus NoC that connects chips within a card, cards within a server, and servers within a Galaxy cluster. The inter-chip ring animations in CardViz trace these links.
A single column reserved for the host PCIe interface. This is the gap you see splitting the compute grid on Blackhole — col 8 is not a Tensix core. Models run entirely on-chip once loaded; PCIe is only active during weight transfer.
These animations are not hardware telemetry.
Real Tenstorrent workloads are dominated by matrix multiplications. Those keep most Tensix cores busy simultaneously — a uniform bright grid — regardless of whether the model is a language model, a diffusion model, or a custom kernel. The patterns below are chosen to be visually distinct and conceptually evocative, not physically accurate. Modes marked grounded are the closest to what tt-toplike would actually show. For true per-core telemetry, pipe tt-smi heat data into the heatmap step type.
idle: Quiet random shimmer at 20–35%. Background system activity.
Hardware: ARC management firmware, DDR refresh cycles, thermal monitoring. Compute cores are mostly clock-gated; nothing is scheduled.
inference: Left-to-right column sweep. Sequential token generation.
Hardware: Batch=1 decode is memory-bandwidth-bound, not compute-bound. Matmul tiles distribute across the full mesh simultaneously. The sweep is an abstraction of autoregressive token dependency, not a physical firing sequence.
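A left-to-right sweep like the one described above can be sketched as a brightness function of column index and sweep phase. This is a minimal illustration, not tensix-viz's actual implementation; the function name, the `width` parameter, and the linear falloff are all assumptions.

```javascript
// Brightness (0..1) of one column during a left-to-right sweep.
// `phase` in [0, 1) is the sweep position across the grid;
// `width` is the half-width of the bright band, in columns.
function sweepBrightness(col, cols, phase, width = 2) {
  const head = phase * cols;                 // sweep position in column units
  const dist = Math.abs(col + 0.5 - head);   // distance from column center to the head
  return Math.max(0, 1 - dist / width);      // linear falloff, clamped at 0
}

// One frame of a 10-column grid with the sweep 30% of the way across.
const frame = Array.from({ length: 10 }, (_, c) => sweepBrightness(c, 10, 0.3));
console.log(frame.map((v) => v.toFixed(2)).join(' '));
```

A larger `width` gives a broader band, and several phases evaluated at once give concurrent sweeps.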
diffusion: Expanding ring from chip center. Denoising timestep propagation.
Hardware: FLUX.1 and SD3 use DiT (Diffusion Transformer) architectures. Each denoising step is a standard transformer forward pass: attention + FFN matmuls across all cores. Physically identical to the inference pattern.
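An expanding ring can be sketched by comparing each cell's distance from the chip center with a radius that grows over time. This is a hypothetical illustration of the visual idea, not the library's code; `ringBrightness` and `thickness` are assumed names.

```javascript
// Brightness (0..1) of the cell at (x, y) for a ring of the given radius
// centered on (cx, cy). Cells exactly on the ring are brightest.
function ringBrightness(x, y, cx, cy, radius, thickness = 1.5) {
  const d = Math.hypot(x - cx, y - cy);                       // distance from center
  return Math.max(0, 1 - Math.abs(d - radius) / thickness);   // falloff either side of the ring
}

// Animate by growing the radius each frame and wrapping at the grid edge.
function radiusAt(t, speed, maxRadius) {
  return (t * speed) % maxRadius;
}

console.log(ringBrightness(3, 0, 0, 0, radiusAt(1.5, 2, 8)));
```

Calling `ringBrightness` twice with two different radii and taking the maximum gives a two-ring pattern.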
agents: Random burst clusters. Async tool-call dispatch and response.
Hardware: Tool execution phases are compute-bound; all 120 cores are available to every task. The "clusters" represent logical agent roles, not physical core groupings. The actual hardware view looks closer to inference.
explore: Phase-offset sinusoidal wave. Particle Life physics field propagation.
Hardware: Custom RISC-V kernels on Metalium do distribute work spatially across the mesh. For Particle Life specifically, compute regions roughly correspond to particle neighbourhoods, so this mode is the closest to spatial truth for custom kernel workloads.
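A phase-offset sinusoidal wave gives each cell the same oscillation shifted by a phase that depends on its grid position. The sketch below shows one way to compute that; the wave numbers `kx`, `ky` and the `speed` default are made-up values, and this is not taken from the tensix-viz source.

```javascript
// Brightness (0..1) of the cell at (col, row) at time t (seconds).
// Each cell runs the same sine wave, delayed by a position-dependent phase.
function waveBrightness(col, row, t, { kx = 0.6, ky = 0.3, speed = 2 } = {}) {
  const phase = kx * col + ky * row;               // per-cell phase offset
  return 0.5 + 0.5 * Math.sin(speed * t - phase);  // map [-1, 1] onto [0, 1]
}

// Sample one row of a 6-column grid at t = 1 s.
const row0 = Array.from({ length: 6 }, (_, c) => waveBrightness(c, 0, 1));
console.log(row0.map((v) => v.toFixed(2)).join(' '));
```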
thinking: Sustained full-grid glow with slow oscillation. Extended reasoning / chain-of-thought.
Hardware: The most physically accurate of all ten modes. Long chain-of-thought inference (DeepSeek R1, QwQ-32B) means sustained high matmul utilisation across all Tensix cores with minimal idle time between tokens, which is essentially what the glow shows.
prefill: Wide bright band sweeping the full grid. All prompt tokens processed in parallel.
Hardware: Prefill is genuinely compute-bound and uses all cores simultaneously, unlike decode, which is memory-bound. The wide band is a reasonable approximation of what tt-toplike shows during a long-context prompt ingestion burst.
video: Two phase-offset expanding rings. Denoising across consecutive video frames.
Hardware: Wan 2.2 and SkyReels use 3D DiT models. Each frame step is a full transformer forward pass, so in practice the hardware view is the same as the thinking mode. The second ring is a visual hint at temporal sequence; it has no physical correlate.
batch: Three concurrent column sweeps. Parallel batched decode for multiple sequences.
Hardware: Batched inference does run multiple sequences through the same compute simultaneously, and larger batches improve compute efficiency. The concurrent sweeps are a reasonable visual simplification of that parallelism.
kernel_dispatch: Rectangle of cores lights up via ripple from dispatch origin. Metalium kernel launch via NoC multicast.
Hardware: Metalium dispatches kernels to rectangular Tensix core grids via NoC multicast. Multiple kernels can run concurrently on disjoint grids. The ripple origin corresponds to the dispatching core.
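The rectangle-plus-ripple idea can be sketched as a function that assigns each core in a rectangular grid the animation step at which it lights up, and a helper that checks whether two grids are disjoint. This only illustrates the visual. It is not Metalium's dispatch API or the visualizer's code, and the names and the `{ x0, y0, x1, y1 }` rectangle shape are assumptions.

```javascript
// Cells of a rectangular core grid, each tagged with the step at which it
// lights up as the ripple spreads from the dispatch origin.
function dispatchRipple(rect, origin) {
  const cells = [];
  for (let y = rect.y0; y <= rect.y1; y++) {
    for (let x = rect.x0; x <= rect.x1; x++) {
      // Chebyshev distance produces square ripple rings around the origin.
      const step = Math.max(Math.abs(x - origin.x), Math.abs(y - origin.y));
      cells.push({ x, y, step });
    }
  }
  return cells.sort((a, b) => a.step - b.step);
}

// True when two inclusive rectangles share no cell, so two kernels
// could be drawn on them at the same time.
function rectsDisjoint(a, b) {
  return a.x1 < b.x0 || b.x1 < a.x0 || a.y1 < b.y0 || b.y1 < a.y0;
}

const ripple = dispatchRipple({ x0: 0, y0: 0, x1: 2, y1: 1 }, { x: 0, y: 0 });
console.log(ripple);
```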
Pass showMemory: true to add a DRAM bandwidth glow around memory rows and per-core L1 fill bars on every Tensix cell. Drive both values from real tt-smi telemetry or let the mode preset animate them.
Blackhole · batch mode · showMemory: true
The overlay composites on top of any animation mode. DRAM rows pulse brighter as bandwidth rises; each Tensix cell grows a slim bar tracking L1 SRAM utilization. Values animate smoothly between updates and can be set independently — only the keys you provide are overridden.
```javascript
const viz = new TensixViz(canvas, {
  arch: 'blackhole',
  showMemory: true // enable DRAM glow + L1 bars
})
viz.activate('batch')

// override with live tt-smi data
viz.setMemoryStats({
  dram_bw: 0.82, // 0–1 (1.0 ≈ 900 GB/s peak on BH)
  l1_fill: 0.61  // 0–1 (fraction of 108 KB L1 in use)
})

// partial override: only dram_bw changes; l1_fill stays on preset
viz.setMemoryStats({ dram_bw: 0.95 })
```
Works on all architectures and all ten animation modes. Values are cleared when activate() is called and can be re-applied immediately after.
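Because `setMemoryStats` expects values in the 0–1 range, raw telemetry has to be normalized first. The helper below is a minimal sketch of that step. The input field names (`dramGBps`, `l1UsedKB`) are hypothetical and do not reflect tt-smi's actual output format, and the peak values are simply the ones quoted in the comments of the example above.

```javascript
const clamp01 = (v) => Math.min(1, Math.max(0, v));

// Peaks as quoted in the example comments; adjust for your hardware.
const PEAKS = { dramGBps: 900, l1KB: 108 };

// Convert a raw telemetry sample into setMemoryStats-style fractions.
// Missing or non-finite fields are omitted, so the corresponding value
// is left on whatever the mode preset is animating.
function toMemoryStats(sample, peaks = PEAKS) {
  const stats = {};
  if (Number.isFinite(sample.dramGBps)) {
    stats.dram_bw = clamp01(sample.dramGBps / peaks.dramGBps);
  }
  if (Number.isFinite(sample.l1UsedKB)) {
    stats.l1_fill = clamp01(sample.l1UsedKB / peaks.l1KB);
  }
  return stats;
}

console.log(toMemoryStats({ dramGBps: 450, l1UsedKB: 54 }));
// In the browser: viz.setMemoryStats(toMemoryStats(sample))
```

Since `activate()` clears overrides, a caller would re-apply the latest normalized sample right after switching modes.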
Drop in the IIFE bundle and start any mode with a single call. No build step, no dependencies.
```html
<!-- Script tag -->
<link rel="stylesheet" href="tensix-viz.css">
<script src="tensix-viz.js"></script>
```

```javascript
// Single Blackhole chip, any mode
const viz = new TensixViz(canvas, { arch: 'blackhole' })
viz.activate('thinking') // or 'prefill', 'batch', 'inference' …

// Memory visualization layer (opt-in)
const memViz = new TensixViz(canvas, { arch: 'blackhole', showMemory: true })
memViz.activate('inference')
memViz.setMemoryStats({ dram_bw: 0.75, l1_fill: 0.60 }) // override with live data

// Full QB2 system (4 chips, intra/inter-card ETH links)
const sys = new SystemViz(container, 'qb2')
sys.activate('batch')
```

```html
<!-- HTML auto-init: no JS required -->
<canvas data-viz="chip" data-config="blackhole" data-mode="prefill"
        width="340" height="240"></canvas>
```
| Mode | Represents | Accuracy |
|---|---|---|
| idle | Background system activity | metaphor |
| inference | Sequential token generation | metaphor |
| diffusion | Image denoising timestep (DiT) | metaphor |
| agents | Async tool-call dispatch | metaphor |
| explore | Particle Life physics field | metaphor (closest for Metalium) |
| thinking | Extended reasoning / chain-of-thought | ◆ grounded |
| prefill | Parallel prompt ingestion | ◆ grounded |
| video | Temporal video frame denoising (3D DiT) | metaphor |
| batch | Batched parallel decode | ◆ grounded |
| kernel_dispatch | Metalium kernel launch via NoC multicast | ◆ grounded |
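If you pick modes programmatically, the table above can be mirrored as a small lookup so that callers can validate a mode name or prefer the grounded modes. This is a convenience sketch, not part of tensix-viz; `MODE_ACCURACY`, `isGrounded`, and `groundedModes` are hypothetical names.

```javascript
// Mode name -> accuracy label, transcribed from the table above.
const MODE_ACCURACY = {
  idle: 'metaphor',
  inference: 'metaphor',
  diffusion: 'metaphor',
  agents: 'metaphor',
  explore: 'metaphor',
  thinking: 'grounded',
  prefill: 'grounded',
  video: 'metaphor',
  batch: 'grounded',
  kernel_dispatch: 'grounded',
};

// Throws on unknown names so typos surface early.
function isGrounded(mode) {
  if (!Object.hasOwn(MODE_ACCURACY, mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return MODE_ACCURACY[mode] === 'grounded';
}

function groundedModes() {
  return Object.keys(MODE_ACCURACY).filter(isGrounded);
}

console.log(groundedModes().join(', '));
```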