Tenstorrent × Infocom, 2026

Zork on Tenstorrent

Running a 1977 stack-based virtual machine on 2026 AI accelerator hardware — through three hardware stages and an LLM remix layer.

TT-Lang / ttnn Blackhole p300c Llama-3.3-70B-Instruct Python · C++ · RISC-V
Background

Why Zork?

Zork was written in 1977–1979 at MIT, shipped commercially by Infocom in 1980. To target the fragmented microcomputer market — CP/M, Apple II, TRS-80, Commodore 64 — Infocom designed the Z-machine: a small stack-based virtual machine whose bytecode is compiled once and interpreted on any platform they cared to support. The interpreter fit in a few kilobytes of assembly per target. The Z-machine specification was later released publicly and inspired Graham Nelson's Inform language.

Tenstorrent builds AI accelerators with two distinct kinds of processing elements. Tensix cores handle tensor operations — matrix multiplications, the inner loop of transformer inference. Each Tensix block is managed by a cluster of small embedded RISC-V cores that handle data movement and coordination. These management cores are fully programmable through the ttnn API exposed by TT-Metal and TT-Lang.

The question: can a Z-machine run on those cores? And if the game runs there, can a 70B language model running on the Tensix matrix cores rewrite every room in real time?

StageWhere the game runsWhat lives on chip
1 — sim Python interpreter on the host CPU Nothing
2 — device Python interpreter on the host CPU 86 KB game binary in Blackhole DRAM
3 — risc-v C++ Z-machine kernel on Blackhole RISC-V cores Game binary in DRAM + interpreter in L1
+ remix Python + Z-machine; LLM rewrites responses Llama-3.3-70B running on Tensix cores

The project lives at ttlang/zmachine_v3.py (the Python interpreter), engines/ (the three stage backends), and kernels/zork_interpreter_l1.cpp (the RISC-V kernel).

Before TT-Lang — the C++ era

The project did not start with Python. The first version was a full C integration of Frotz (the reference Infocom interpreter) alongside custom TT-Metal C++ kernels written directly against the RISC-V management core API — no Python layer, no pyenv, just cmake and raw EnqueueMeshWorkload calls.

That path produced real results: a custom Z-machine V3 interpreter (kernels/zork_interpreter.cpp, ~1,000 lines) that decoded all 199 Zork objects on chip, resolved abbreviations correctly, and eventually printed the actual Zork opening text — "ZORK I: The Great Underground Empire" — from opcodes executing on a Blackhole RISC-V core. Getting there required fighting DRAM page fragmentation, a runtime-args hang in get_arg_val<>, and a firmware execution budget that limited each kernel run to roughly 100 instructions before the watchdog fired.

TT-Lang became the active path because it solved the interface problem cleanly. The ~/code/tt-lang/build/env/activate pyenv provides ttnn device access, a green-thread simulator for kernel validation, and a Python-first authoring model that made the DRAM ↔ L1 data paths much easier to iterate on. The C++ kernel work validated the hardware story; TT-Lang made it reproducible. The src/ and kernels/ trees from that era are preserved in git history.

TT-Lang

The Toolkit — TT-Lang and the TT-Lang Pyenv

TT-Lang is a Python DSL from Tenstorrent for authoring high-performance custom kernels on Tenstorrent hardware. It sits between the high-level ttnn tensor API and hand-written RISC-V C++, giving Python authors direct access to the on-chip NoC, L1 dataflow buffers, and the three execution threads inside each Tensix core.

The ~/code/tt-lang/build/env/activate virtualenv bundles ttnn (device open/close, DRAM tensor allocation, data transfer), the TT-Lang green-thread simulator, and the compiler backend that maps Python kernel code to RISC-V. Every stage of this project runs inside that pyenv.

The @ttl.operation kernel model

Each TT-Lang kernel is a Python function decorated with @ttl.operation(grid=(r, c)). Inside it, three nested functions define what runs concurrently on each Tensix core. They communicate through Dataflow Buffers (DFBs) — bounded queues that live in L1 cache and enforce producer/consumer ordering without explicit locks:

@ttl.operation(grid=(1, 1))
def zmachine_kernel(
    game:       Tensor,   # 87 KB game binary — lives in Blackhole DRAM
    state:      Tensor,   # interpreter state checkpoint — DRAM
    input_buf:  Tensor,   # player command string — DRAM
    output:     Tensor,   # output text written back to DRAM
) -> None:

    # Double-buffered DFBs: while compute processes block N,
    # dm_reader is already prefetching block N+1.
    game_dfb = ttl.make_dataflow_buffer_like(game,   shape=(CHUNK_SIZE,), block_count=2)
    out_dfb  = ttl.make_dataflow_buffer_like(output, shape=(CHUNK_SIZE,), block_count=2)

    @ttl.compute()
    def compute() -> None:
        # Runs on the Tensix compute engine (mapped to RISC-V core 0 in DM mode).
        # Receives DFB blocks from dm_reader, runs opcodes, pushes output blocks.
        with game_dfb.wait() as game_blk, out_dfb.reserve() as o_blk:
            o_blk.store(game_blk)   # stub — Task 10: zm.interpret(BATCH_SIZE)

    @ttl.datamovement()
    def dm_reader() -> None:
        # Runs on RISC-V DM0. NoC async read: DRAM → L1.
        with game_dfb.reserve() as g_blk:
            tx = ttl.copy(game[:CHUNK_SIZE], g_blk)
            tx.wait()

    @ttl.datamovement()
    def dm_writer() -> None:
        # Runs on RISC-V DM1. NoC async write: L1 → DRAM.
        with out_dfb.wait() as o_blk:
            tx = ttl.copy(o_blk, output[:CHUNK_SIZE])
            tx.wait()

The code above is from ttlang/zork_kernel.py. The double-buffered DFBs overlap DRAM transfers with computation — the same pipeline strategy that makes GPU memory systems fast, here expressed as pure Python that compiles down to Tenstorrent's NoC fabric.

Simulator first, hardware second

TT-Lang ships a green-thread simulator that runs the exact same kernel code on the host CPU before any hardware is involved. The Zork kernel was validated end-to-end in the simulator — all three threads, all DFB handshakes, and full data integrity across the DRAM → L1 → DRAM round-trip — before touching the device:

source ~/code/tt-lang/build/env/activate
python ttlang/zork_kernel.py game/zork1.z3

# [OK] Version byte: 3
# [OK] Initial PC: 0x50D5
# [OK] Abbreviation table: 0x01F0
# [OK] All 512 bytes match game file exactly.
# Smoke test PASSED.

The smoke test checks that output[0] == 3 (Z-machine version byte), the initial PC (header bytes 6–7 → 0x50D5), the abbreviation table address (header bytes 0x18–0x19 → 0x01F0), and that all 512 bytes of the first game chunk pass through the DFB pipeline without corruption. Confirming those numbers in the simulator meant Stage 2 and Stage 3 started from a verified data path.

Stage 1 · sim

Pure Python — Getting Zork Running

Before touching hardware, we needed a working Z-machine interpreter in the TT-Lang pyenv. ttlang/zmachine_v3.py is a complete implementation of the Z-machine V3 specification in pure Python (~700 lines): full opcode coverage, Z-string decoding with abbreviation expansion, the object tree, dictionary lookups, the call stack, and the READ/PRINT I/O loop.

Z-string decoding

Z-strings are packed at 5 bits per character, three characters per 16-bit word. The decoder handles three alphabets (A0 lowercase, A1 uppercase, A2 punctuation) and recursively expands abbreviations — entries stored in a table at header offset 0x18. Getting this right was the first milestone: object 64 decoding to "West of House" (not "West eHouse") confirmed the abbreviation table lookup was working.

# Abbreviated form would appear as "West eHouse" without this step.
# Abbreviation code 1/2/3 triggers a table lookup:
#   index = (code - 1) * 32 + next_5bit_value
#   word_addr = abbreviation_table[index]  →  byte_addr = word_addr * 2
#   result = decode_zstring(byte_addr)     (depth-limited to prevent loops)

The READ opcode

READ (VAR opcode 0x04) is what makes Zork interactive. It reads a line from stdin into the Z-machine's text buffer, tokenises it by spaces, and writes word tokens into the parse buffer using the game's own dictionary for lookup. Getting READ working in the Python interpreter meant the game could actually be played.

demos/stage1.cast — pure Python, no hardware

Run it yourself: source ~/code/tt-lang/build/env/activate && python play.py --stage sim

Stage 2 · device

Blackhole DRAM — The Game File Goes On-Chip

Stage 2 moves the 86,838-byte Zork game binary onto a Tenstorrent Blackhole chip's on-chip DRAM using ttnn. The Python Z-machine interpreter on the host reads game bytecode directly from that DRAM buffer. The interpreter logic stays on the host CPU; the game data lives on silicon.

Opening and uploading

import ttnn

device = ttnn.open_device(device_id=0)

# Allocate a contiguous DRAM buffer for the game binary.
# page_size must equal buffer_size to guarantee contiguous allocation —
# fragmented pages cause silent data corruption at offsets beyond the first page.
game_tensor = ttnn.from_torch(
    torch.frombuffer(game_bytes, dtype=torch.uint8).unsqueeze(0),
    device=device,
    layout=ttnn.ROW_MAJOR_LAYOUT,
    memory_config=ttnn.DRAM_MEMORY_CONFIG,
)

After upload, the host reads back the first 8 bytes (the Z-machine header) and confirms byte 0 == 3 (Z-machine version 3). If the page_size is wrong the header looks fine but all subsequent offsets return garbage — a subtle failure mode we hit during development.

The data path Stage 3 reuses

Stage 2 validates the exact DRAM layout that Stage 3's RISC-V kernel will consume. The kernel reads game bytecode via the on-chip NoC using chunked async reads — the same physical buffer, the same offsets. Getting data integrity confirmed here made Stage 3 debugging much simpler.

demos/stage2.cast — game binary on Blackhole DRAM, Python interpreter on host

Run it yourself: source ~/code/tt-lang/build/env/activate && python play.py --stage device

Stage 3 · risc-v

RISC-V Cores — The Interpreter Goes On-Chip

Stage 3 compiles kernels/zork_interpreter_l1.cpp — a complete Z-machine V3 interpreter in C++ — and dispatches it to the Blackhole chip's RISC-V management cores via ttnn.generic_op. The kernel loads game bytecode from DRAM via the on-chip NoC, executes instructions in L1 cache, and writes output back to DRAM.

What works

The kernel decodes Z-strings, handles all V3 opcodes (PRINT, CALL, RET, STORE, LOAD, branching, arithmetic, object operations), and produces the correct Zork opening text:

ZORK I: The Great Underground Empire
Infocom interactive fiction - a fantasy story
© Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.

That is real 1977 bytecode executing on a 2026 AI accelerator's management cores. The data path — NoC async read from DRAM to L1, instruction fetch and decode, Z-string output back to DRAM — is fully operational.

How the kernel runs on chip

The TT-Lang kernel scaffolding (ttlang/zork_kernel.py) defines the DRAM ↔ L1 data paths and the three-thread structure described in the TT-Lang section above. The Python host dispatches it via ttnn.generic_op, passing the game, state, input, and output tensors as DRAM buffers. On chip:

ThreadMaps toResponsibility
dm_reader RISC-V DM0 NoC async read — pulls game chunks + state + command from DRAM into L1 DFBs
compute Tensix compute engine Fetches opcodes from L1, dispatches to 24 V3 opcode handlers, accumulates output text
dm_writer RISC-V DM1 NoC async write — pushes output text + updated state snapshot back to DRAM

The game binary (87 KB) is transferred to L1 in 170 chunks of 512 bytes each. Double-buffered DFBs keep the NoC busy while compute is dispatching opcodes — the same overlap strategy TT-Lang uses for neural network weight streaming, repurposed here for a 1977 text adventure.

demos/stage3.cast — Z-machine interpreter running on Blackhole RISC-V cores

The wall

Chip firmware budget: The Blackhole chip firmware limits each generic_op kernel invocation to approximately 10 Z-machine instructions before the execution watchdog fires. A third invocation within a single device session hangs reliably — diagnosed in ttlang/diag_batch3.py. The workaround is opening and closing the device for each batch (one Python process per 10 instructions), which works but makes a full game turn take minutes.

The Zork opening sequence needs ~400 instructions. At 10 per batch with per-batch device open/close overhead, that's 40+ round-trips. This is a firmware execution budget, not an architecture limitation. The Blackhole cards are designed for short, massively parallel kernels — not long sequential interpreter loops. Longer execution budgets would make the RISC-V path fully interactive.

Stage 3 is a proof of concept, not a playable game. The hybrid and AI demos below are.

Run it yourself: source ~/code/tt-lang/build/env/activate && python play.py --stage risc-v

Set ZORK_BATCHES=20 to control how many 10-instruction batches to run.

+ Remix · Tensix

The LLM Layer — Llama-3.3-70B Rewrites Every Room

The remix layer sits between the Z-machine and your terminal. After each game response, it sends the raw output to Llama-3.3-70B running on Tensix cores via tt-inference-server, gets a richer prose rewrite back, and displays that instead. The Z-machine state is unchanged — the LLM changes the voice, not the facts.

TaskFileWhat it does
Input mapping remix/input_mapper.py Translates freeform English into a valid Zork command
Output remix remix/output_remixer.py Rewrites the Z-machine's terse response in an expansive voice
ASCII art remix/ascii_artist.py Generates a room illustration on room entry (cached)
Postcards remix/narrative_enhancer.py Collects postcard-style snapshots at notable moments
Routing remix/router.py Sends tasks to appropriate model sizes; collapses to one model when only one is available

Human plays, remix active

In this recording a scripted session plays through the opening — look around, open the mailbox, take and read the leaflet, go north — with Llama-3.3-70B rewriting each Z-machine response on Tensix in real time.

demos/hybrid.cast — human plays, Llama-3.3-70B rewrites every response

Run it yourself: python play.py --stage sim --remix (requires tt-inference-server at http://localhost:8000/v1)

AI · auto-play

The Showpiece — Llama Plays Zork

With the remix layer in place, the natural next step was to remove the human entirely. The AI auto-play mode feeds each Z-machine response back to the LLM with a persona system prompt, receives the next command, and loops. The experimental persona tries unusual commands, pushes edges, and makes choices no human speedrunner would. With the remix layer also active, even its strange decisions come back in vivid, expansive prose.

Full terminal UI

The TUI (built with Textual) splits the terminal into a game pane on the left and a context pane on the right. The context pane cycles between three states:

StateWhat it shows
HARDWARELive tt-smi telemetry — chip temps, device utilisation
THINKINGStreaming LLM tokens with vocabulary-aware colour highlighting
ARTASCII room illustration generated by the 70B model
demos/ai.cast — experimental persona, 30 turns, full TUI, remix on
# To reproduce:
python play.py \
  --stage sim \
  --game game/zork1.z3 \
  --remix \
  --tui \
  --persona experimental \
  --turns 30

Other personas: --persona expert plays the known optimal route; --persona naive explores curiously and makes mistakes; --persona completionist pursues maximum score. All work with --remix and --tui.

Zork II

Same Engine, Different World — Zork II: The Wizard of Frobozz

The Python Z-machine interpreter is not limited to Zork I. Because it implements the full V3 specification, any Infocom V3 title runs without modification. Zork II: The Wizard of Frobozz loads from game/zork2.z3 with a single flag change (--game game/zork2.z3), and the same three stages — sim, device, risc-v — apply unchanged.

Zork II opens deep inside a barrow. The Wizard of Frobozz appears periodically to cast spells on the player — a mechanic that only works because the interpreter handles the Z-machine object tree, attribute bits, and property tables faithfully. These features are exercised heavily in the second game.

Gameplay — Zork II in the interpreter

demos/zork2-stage2.cast — Zork II, pure Python interpreter

AI plays Zork II

The same persona system works with any game. Here, the naive persona explores the barrow for the first time: it picks up the lantern, tries directions, makes the occasional mistake, and gradually uncovers the world. The remix layer rewrites every room description in the same atmospheric voice as the Zork I demo.

demos/zork2-ai.cast — Zork II × Llama 3.3 70B, naive persona, 15 turns
Adventure

Going Further — Colossal Cave Adventure (Z-machine V5)

The Z-machine interpreter is not limited to Zork. Colossal Cave Adventure — Will Crowther and Don Woods' 1976 original, the text adventure that inspired Zork — runs from Graham Nelson's Z-machine V5 port (game/advent.z5). V5 is a meaningfully harder target than V3: the instruction encoding is different (CALL_VS2 uses two consecutive operand-type bytes rather than one), the AREAD opcode stores a result variable, and the dictionary uses 9-byte entries instead of 7. The same interpreter handles all of it transparently.

The opening puts the player at the end of a dirt road before a small brick building, with a stream running out of it down a gully. The brass lamp is inside. It is exactly where it has always been.

demos/advent.cast — Colossal Cave Adventure, Z5, pure Python interpreter
HHGG · Blackhole DRAM

The Hitchhiker's Guide to the Galaxy — Infocom, 1984

Douglas Adams and Steve Meretzky's adaptation of The Hitchhiker's Guide to the Galaxy is the most notoriously cruel game in the Infocom catalogue — and one of the best-written. The opening sequence, in Arthur Dent's dark bedroom, is a masterclass in player manipulation: the game's very first puzzle has no solution other than giving up. It is also one of the few Infocom titles to have a genuine literary reputation independent of the medium.

The compiled story file (s4.zip, Release 60) is sourced directly from historicalsource/hitchhikersguide at runtime via game/fetch-infocom.sh and is not redistributed here. It runs on the same interpreter — game binary on Blackhole DRAM, Python on the host.

The demo shows the dark bedroom opener, the legendary "turn on light" response, Arthur's dressing gown and its mysterious pocket contents, and a move through the house toward the sitting room where the adventure begins.

demos/hhgg.cast — The Hitchhiker's Guide to the Galaxy, V3 Release 60, Blackhole DRAM
# Download the story file (not included in repo)
bash game/fetch-infocom.sh

# Run on Blackhole DRAM
python play.py --stage device --game game/hhgg.z3
Planetfall · Blackhole DRAM

Planetfall — Infocom, 1983

Steve Meretzky's Planetfall is famous for Floyd — the cheerful robot companion who will become one of interactive fiction's most beloved characters. The opening act takes place on a deteriorating starship, where you are Ensign Seventh Class and your assigned duty is swabbing the decks with a scrub brush. The alien ambassador who ambles through the corridor leaving a trail of green slime appears in the first few screens and sets the tone perfectly.

The compiled story file (planetfall.zip, Release 39) is fetched from historicalsource/planetfall via game/fetch-infocom.sh.

The demo covers the ship-deck opener, the inventory check ("a Patrol uniform... a scrub brush"), the alien ambassador, and movement through the Feinstein's corridor to the Reactor Lobby.

demos/planetfall.cast — Planetfall, V3 Release 39, Blackhole DRAM
python play.py --stage device --game game/planetfall.z3
LGOP · Blackhole DRAM

Leather Goddesses of Phobos — Infocom, 1986

Steve Meretzky's Leather Goddesses of Phobos is Infocom's most overtly comedic title — a send-up of 1930s pulp science fiction and B-movies, complete with a content-warning disclaimer before the game begins. The setting is Upper Sandusky, Ohio, 1936, in a bar. Your adventure starts with the most consequential bathroom decision in the history of interactive fiction.

The compiled story file (x1.zip, Release 59) is fetched from historicalsource/leathergoddesses via game/fetch-infocom.sh.

The demo shows the 1936 Upper Sandusky, Ohio bar setting, the satirical opening disclaimer, and the first puzzle — choosing a bathroom.

demos/lgop.cast — Leather Goddesses of Phobos, V3 Release 59, Blackhole DRAM
python play.py --stage device --game game/lgop.z3
Reflections

What We Learned

TT-Lang's simulator is a real first-class citizen

The TT-Lang green-thread simulator is not a toy. The DFB state machine it enforces — every .wait() must be followed by a legal .store() or copy() before the context manager exits — matches the hardware constraints exactly. Writing the kernel in the simulator first meant that when the kernel ran on hardware, the data path was already proven. No "works in sim, broken on device" surprises from DFB sequencing errors.

bfloat16 preserves 0–255 exactly

The ttnn API uses bfloat16 as the default dtype for DRAM tensors. Game bytecode lives in that range, and a bfloat16 has 7 explicit mantissa bits — enough to represent every integer from 0 to 255 without rounding. The engines/device.py round-trip test (header bytes 0–7 read back and compared) verifies this on every startup. Without that check, a dtype change or precision regression would silently corrupt the game.

DRAM page_size must equal buffer_size

TT-Metal DRAM uses paged allocation. If page_size < buffer_size, pages are not guaranteed to be physically contiguous, and NoC reads at offsets beyond the first page silently return data from the wrong physical address. Setting page_size = buffer_size for each allocation forces a single contiguous page. This affected both the Python (Stage 2) and C++ (Stage 3) paths.

The chip firmware budget is a real constraint

The Blackhole chip firmware limits RISC-V kernel execution time. Long-running loops — anything beyond about 10–15 Z-machine instructions — trigger the execution watchdog. A third generic_op call within the same device session hangs indefinitely. These are not bugs in the kernel; they are firmware design constraints intended to prevent runaway kernels from blocking the chip. The Blackhole cards are optimised for short, massively parallel workloads. A sequential interpreter loop that runs for thousands of cycles is exactly the wrong shape for this hardware.

The right pivot

Rather than fight the firmware budget to make Stage 3 interactive, we moved Tensix into a different role: running the LLM. Tensix is exactly the right hardware for transformer inference, and Llama-3.3-70B on two p300c cards runs fast enough for a text adventure. The Z-machine runs on the host; the interesting hardware work happens in the remix layer on Tensix. That division of labour made the demos actually playable.

Small models need minimal prompts

Early testing used a 0.5B model for input mapping and found that adding game history as context hurt accuracy — the model would repeat the previous command instead of translating the new one. The fix: strip all context and give the model one job only (translate this phrase to a Zork command). 100% accuracy with a 1.5B model followed. Let the Z-machine handle disambiguation; it already does it well.

Try It

Running the Project

# Activate the TT-Lang pyenv (required for stages 2 and 3)
source ~/code/tt-lang/build/env/activate

# Interactive menu — pick game, stage, and remix mode
python play.py

# Stage 1: pure Python, no hardware needed
python play.py --stage sim

# Stage 2: game binary on Blackhole DRAM
python play.py --stage device

# Stage 3: Z-machine interpreter on RISC-V cores
python play.py --stage risc-v

# Remix layer (requires tt-inference-server at localhost:8000)
python play.py --stage sim --remix

# Full TUI with AI auto-play, 30 turns
python play.py --stage sim --remix --tui --persona experimental --turns 30

Game files in game/: zork1.z3, zork2.z3, zork3.z3, advent.z5. Zork I–III are MIT-licensed (open-sourced by Microsoft/Activision in November 2025).

Re-recording the demos

# All demos (inference server auto-starts for hybrid/ai/zork2-ai)
bash demos/record-all.sh

# Just the AI showpiece
bash demos/record-all.sh ai

# Zork II demos only
bash demos/record-all.sh zork2-stage2 zork2-ai