Live demo · Open source

Usability Engine — an audit catalog you can run.

Nielsen's 10 rewritten for modern surfaces. Plus two extensions for AI agents. Each heuristic carries its own audit question, its own LLM prompt, its own interactive demo.

Period
2026 · live on this site
Role
Designer · Engineer
Tags
UX research · Heuristic evaluation · Local LLM · Open source

Most usability writing online — Nielsen's 10, Norman doors, the WCAG checklist — gives you the principle but not the audit. You read the heuristic. You nod. You close the tab. The product you were going to fix is still broken.

The Usability Engine is the catalog as engine. Twelve heuristics: Nielsen's 10 rewritten in the vocabulary of modern product surfaces, plus two extensions for AI, "Uncertainty must be legible" and "Reversibility is the policy axis." Each row carries its audit question, its fix, its automation spec, and, where it makes sense, an interactive good-vs-bad demo.

Type a URL in. The engine pages through each heuristic, marks a verdict, reports back. Heuristics a script can answer get a script. Ones that need judgment route through your local Ollama. Ones that need a human reading a system diagram stay manual and say so. Nothing is faked. The static site never touches the cloud.

heuristics.ts · 12 rows · one renderer

The catalog is the spec.

One row of data per heuristic — story, severity, audit question, fix, checkability, automation spec, optional demo key. The engine handles surface filtering, demo lookup, verdict aggregation, and report generation. Add a row, the engine picks it up.
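A minimal sketch of what "one row per heuristic, one renderer" might look like. The field names, the `demoFor` and `renderCards` helpers, and the placeholder strings are assumptions made for illustration; they are not taken from the real `heuristics.ts`.

```typescript
// Hypothetical row shape. Field names are guesses from the prose above,
// not the contents of the real heuristics.ts.
type Severity = "blocker" | "major";
type Checkability = "script" | "llm" | "hybrid" | "manual";

interface Heuristic {
  id: number;
  title: string;
  story: string;
  severity: Severity;
  auditQuestion: string;
  fix: string;
  checkability: Checkability;
  automationSpec: string;
  demoKey?: string; // optional: only rows with a registered demo set this
}

const catalog: Heuristic[] = [
  {
    id: 1,
    title: "Visibility of system status",
    story: "Silence is the most expensive UX bug.",
    severity: "blocker",
    auditQuestion: "(placeholder audit question)",
    fix: "(placeholder fix)",
    checkability: "hybrid",
    automationSpec: "(placeholder automation spec)",
    demoKey: "visibility-of-status",
  },
  {
    id: 12,
    title: "Reversibility is the policy axis",
    story: "Map each agent action to its recovery cost.",
    severity: "blocker",
    auditQuestion: "(placeholder audit question)",
    fix: "(placeholder fix)",
    checkability: "manual",
    automationSpec: "(placeholder automation spec)",
    // no demoKey: the renderer simply leaves the demo slot empty
  },
];

// Demo lookup: a row without a key, or with an unregistered key, gets no demo.
function demoFor(row: Heuristic, registry: Record<string, string>): string | undefined {
  return row.demoKey === undefined ? undefined : registry[row.demoKey];
}

// One renderer for every row: adding a row to the catalog adds a card.
function renderCards(rows: Heuristic[], registry: Record<string, string>): string[] {
  return rows.map((row) => {
    const num = row.id < 10 ? "0" + row.id : String(row.id);
    const demo = demoFor(row, registry);
    const base = `${num} ${row.title} [${row.severity}/${row.checkability}]`;
    return demo ? `${base} demo:${demo}` : base;
  });
}
```

The point of the shape is that the renderer never special-cases a heuristic: anything a card needs lives on the row, and the optional demo key is the only field allowed to be absent.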

Static export · No backend · Local Ollama, opt-in · Apache 2.0

What's in the catalog

Twelve heuristics. Ten are Nielsen's, rewritten in the vocabulary of modern product surfaces — gone is "Help and documentation" as a placid footer link; in is "help that arrives where the user is stuck." Two are mine: an AI confidence axis and an agent reversibility axis. Severity is opinionated — blockers are the ones I will not ship past.

# | Heuristic | Severity | Checkability | Note
01 | Visibility of system status | blocker | hybrid | Silence is the most expensive UX bug.
02 | Match the user's world, not the system's | major | llm | Jargon audit: replace anything 2+ users define differently.
03 | User control & freedom | blocker | hybrid | Soft-delete with snackbar > confirmation modal.
04 | Consistency & standards | major | script | Same word, same icon, same action, everywhere.
05 | Error prevention | blocker | hybrid | Make the wrong state unreachable, not just recoverable.
06 | Recognition over recall | major | llm | Show options. Don't make people remember them.
07 | Flexibility & efficiency | major | script | Power users deserve shortcuts; novices shouldn't see them.
08 | Aesthetic & minimalist design | major | llm | Every element competes for attention.
09 | Recognize, diagnose, recover | blocker | llm | Errors must say what, why, and what to do next.
10 | Help & documentation | major | llm | Help that arrives where the user is stuck, not buried in a menu.
11 | Uncertainty must be legible | blocker | llm | New. Every AI claim shows its confidence.
12 | Reversibility is the policy axis | blocker | manual | New. Map each agent action to its recovery cost.

Checkability tiers — what an audit can honestly automate

"Run an audit" is a vague verb. Some heuristics reduce to a regex on the DOM. Some are entirely judgment calls a script can never resolve. Each row in the catalog declares which it is — so the engine never pretends to answer a question it can't.

  • Script

    A deterministic check on the DOM, accessibility tree, or rendered text. No model needed; the answer is yes/no.

    e.g. Find every interactive element on the page and verify it has a visible focus ring.

  • LLM

    A prompt against the visible content. The model evaluates judgment-shaped questions a script can't reduce to a regex.

    e.g. Read every error message on the page and judge whether it tells the user what went wrong, why, and what to do next.

  • Hybrid

    Script enumerates candidates, LLM evaluates them. The split is mechanical: scripts find the elements, models judge the quality.

    e.g. Script lists every destructive button; LLM follows each click and rates whether recovery is visible without a modal.

  • Manual

    The judgment requires reading the system architecture or the user model. Pattern detection isn't enough; this is design review.

    e.g. Mapping every agentic action to its recovery cost — needs the actual blast-radius diagram and the approval-authority model.
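The four tiers above can be read as a routing table: who enumerates the candidates, and who judges them. A rough sketch, assuming a hypothetical `AuditPlan` shape and helper names that are not taken from the engine's source:

```typescript
type Checkability = "script" | "llm" | "hybrid" | "manual";

// Hypothetical plan shape: who lists the candidates and who makes the call.
interface AuditPlan {
  enumeratedBy: "script" | null; // null when nothing is enumerated mechanically
  judgedBy: "script" | "llm" | "human";
}

function planFor(tier: Checkability): AuditPlan {
  switch (tier) {
    case "script":
      // Deterministic check: the script both finds and answers.
      return { enumeratedBy: "script", judgedBy: "script" };
    case "llm":
      // The model reads the visible content and judges it.
      return { enumeratedBy: null, judgedBy: "llm" };
    case "hybrid":
      // Mechanical split: the script finds elements, the model rates them.
      return { enumeratedBy: "script", judgedBy: "llm" };
    case "manual":
      // Design review: needs the architecture or the user model.
      return { enumeratedBy: null, judgedBy: "human" };
  }
}

// A report may only describe a verdict as automated when no human judged it.
function mayClaimAutomated(tier: Checkability): boolean {
  return planFor(tier).judgedBy !== "human";
}
```

Because the switch covers every member of the union, adding a fifth tier would be a compile error until the routing is updated, which is one way to keep the engine from quietly pretending.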

Two heuristics that aren't Nielsen's

Nielsen's 10 were written in 1994 for desktop GUIs. They still hold. They don't cover what generative interfaces broke. These two are the additions I argue for — both rated blocker, both shipped in the catalog.

11 · Blocker

Uncertainty must be legible.

Generative interfaces output confident prose regardless of how much they actually know. Without a visible confidence signal, the user has no way to weight the output — and repeated overconfidence erodes trust in the whole system.

Fix: a calibrated vocabulary — Confident, Likely, Unsure, Low. Reserve raw percentages for power users who hover. Show the basis for every confident claim.
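One way to sketch that fix in code. The four labels come from the text above; the numeric cut-offs, the `Claim` shape, and the `basisMissing` flag are assumptions chosen for illustration, and real thresholds would need calibrating against the model in use.

```typescript
type ConfidenceLabel = "Confident" | "Likely" | "Unsure" | "Low";

// Assumed cut-offs, for illustration only. Calibrate against real outcomes.
function labelFor(probability: number): ConfidenceLabel {
  if (!(probability >= 0 && probability <= 1)) {
    throw new RangeError("probability must be between 0 and 1");
  }
  if (probability >= 0.9) return "Confident";
  if (probability >= 0.7) return "Likely";
  if (probability >= 0.4) return "Unsure";
  return "Low";
}

// Hypothetical claim shape: the text, the model's probability, and its basis.
interface Claim {
  text: string;
  probability: number;
  basis?: string; // e.g. a citation or the source passage
}

interface RenderedClaim {
  label: ConfidenceLabel; // always visible
  hoverText: string;      // raw percentage, shown only on hover
  basisMissing: boolean;  // true when a Confident claim has no basis to show
}

function renderClaim(claim: Claim): RenderedClaim {
  const label = labelFor(claim.probability);
  return {
    label,
    hoverText: `${Math.round(claim.probability * 100)}%`,
    basisMissing: label === "Confident" && !claim.basis,
  };
}
```

A `basisMissing` claim is the case the heuristic is meant to catch: the interface is asserting confidence it cannot show its work for.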

12 · Blocker

Reversibility is the policy axis.

"Safety" is too vague to design around. Recovery cost is the lever: how quickly, how completely, and at what cognitive cost can the user undo the agent's action?

Fix: a reversibility chip in the agent UX. Cheap to undo → run autonomously. Expensive to undo → present for human approval first. The recovery path is part of the design, not an afterthought.
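A small sketch of that policy, assuming recovery cost can be approximated by three fields that mirror the questions above (how quickly, how completely, at what cognitive cost). The `RecoveryProfile` shape, the thresholds, and the chip wording are all hypothetical.

```typescript
// Hypothetical profile of what it takes to undo one agent action.
interface RecoveryProfile {
  secondsToUndo: number;  // how quickly
  restoresFully: boolean; // how completely
  userSteps: number;      // rough proxy for cognitive cost
}

type RecoveryCost = "cheap" | "expensive";
type Policy = "run-autonomously" | "ask-for-approval";

// Assumed thresholds: an action is cheap only if all three dimensions are cheap.
function recoveryCost(p: RecoveryProfile): RecoveryCost {
  const cheap = p.restoresFully && p.secondsToUndo <= 10 && p.userSteps <= 1;
  return cheap ? "cheap" : "expensive";
}

// Cheap to undo: run. Expensive to undo: present for human approval first.
function policyFor(p: RecoveryProfile): Policy {
  return recoveryCost(p) === "cheap" ? "run-autonomously" : "ask-for-approval";
}

// Text for the reversibility chip shown next to the action.
function chipLabel(p: RecoveryProfile): string {
  return recoveryCost(p) === "cheap"
    ? `Undo in ~${p.secondsToUndo}s`
    : "Hard to undo: needs approval";
}

// Illustrative profiles, not measurements.
const archiveEmail: RecoveryProfile = { secondsToUndo: 5, restoresFully: true, userSteps: 1 };
const sendPayment: RecoveryProfile = { secondsToUndo: 86400, restoresFully: false, userSteps: 6 };
```

With these made-up numbers, `policyFor(archiveEmail)` yields `"run-autonomously"` and `policyFor(sendPayment)` yields `"ask-for-approval"`. The real mapping stays a manual design review, as the catalog says.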

The live audit mode

The engine has two modes. Manifesto mode is what most visitors see — twelve numbered heuristics, each with its story, an interactive demo where one is registered, and a self-audit question with a tap-to-reveal fix.

Audit mode takes a URL. The engine pages through every applicable heuristic, asks the user to mark Pass / Fail / N/A, and assembles a report with the per-heuristic verdicts and a severity-weighted tally. For heuristics with an LLM prompt, the prompt is right there in the interface — copy it, paste it into Ollama with the page text, get an answer.
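A sketch of how a severity-weighted tally might be assembled from the marked verdicts. The weights (3 for a blocker, 1 for a major), the `Result` shape, and the report fields are assumptions; the text above does not specify how the engine weights severity.

```typescript
type Verdict = "pass" | "fail" | "n/a";
type Severity = "blocker" | "major";

// Hypothetical per-heuristic result as marked by the human auditor.
interface Result {
  id: number;
  severity: Severity;
  verdict: Verdict;
}

// Assumed weights, for illustration only.
const WEIGHT: Record<Severity, number> = { blocker: 3, major: 1 };

interface Tally {
  passed: number;
  failed: number;
  notApplicable: number;
  weightedFailures: number; // sum of weights over failed heuristics
  failedBlockers: number[]; // ids that should stop the launch
}

function tally(results: Result[]): Tally {
  const out: Tally = {
    passed: 0,
    failed: 0,
    notApplicable: 0,
    weightedFailures: 0,
    failedBlockers: [],
  };
  for (const r of results) {
    if (r.verdict === "pass") {
      out.passed += 1;
    } else if (r.verdict === "n/a") {
      out.notApplicable += 1; // N/A never counts for or against the page
    } else {
      out.failed += 1;
      out.weightedFailures += WEIGHT[r.severity];
      if (r.severity === "blocker") out.failedBlockers.push(r.id);
    }
  }
  return out;
}
```

Keeping `failedBlockers` as a list of ids, separate from the weighted sum, means a single failed blocker stays visible even when the overall score looks healthy.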

What it doesn't do: pretend to be an autonomous crawler. The site is a static export — there is no backend, no headless browser, no cloud call. The audit is human-in-the-loop on purpose. The engine's job is to make the audit cheap and well-organised, not to fake it.

Design moves I'm proud of

  • LESSON 01

    Checkability as a first-class field, not a footnote.

    Every row declares whether a script, an LLM, a hybrid, or a human eye answers its question. The engine renders the tier on the card; the report respects it. No claim is made that a manual heuristic was 'automated.'

  • LESSON 02

    Severity is opinionated, not democratic.

    Six of twelve are blockers, including both AI extensions. I'd rather over-call severity than ship a checklist where everything reads as major and nothing as a stop-the-line. A blocker says: I won't help you launch past this.

  • LESSON 03

    Demos pair a good and a bad version side by side.

    The good-vs-bad pattern is the smallest reproducible UX experiment. One artifact teaches the principle better than a paragraph of prose can. Where a heuristic has a registered demo (visibility-of-status, undo, error-prevention, recognition), that demo is the focal point of the card.

  • LESSON 04

    Ollama is opt-in, never default.

    The LLM prompts are part of the spec, not a hosted feature. Anyone with Ollama can run them on their own machine; the static site never makes a network call. The product is the catalog and the engine — the model is whichever one you brought.

  • LESSON 05

    Two new heuristics in a place that respects Nielsen.

    Adding to a canon is delicate work. The 12 are presented in number order — Nielsen's 10 keep their original numbering; the AI extensions land at 11 and 12 with their own claim. The reader can audit the lineage without being asked to take the additions on faith.
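For readers who take the opt-in path from Lesson 04, here is a rough sketch of pairing a heuristic's prompt with page text for a local Ollama instance. It only builds the request; sending it is left to whatever HTTP client you run on your own machine, outside the static site. The URL is Ollama's default local address and the body uses its `/api/generate` fields (`model`, `prompt`, `stream`). The function name, the separator line, and the `maxChars` clipping are assumptions.

```typescript
// Request body for Ollama's local /api/generate endpoint.
interface OllamaGenerateRequest {
  model: string;   // whichever model you have pulled locally
  prompt: string;
  stream: boolean; // false asks for a single JSON response
}

// Ollama's default local address; change it if yours listens elsewhere.
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Hypothetical helper: joins a heuristic's prompt with the copied page text.
function buildAuditRequest(
  heuristicPrompt: string,
  pageText: string,
  model: string,
  maxChars: number = 8000, // assumed cap so long pages fit a small context window
): { url: string; body: OllamaGenerateRequest } {
  const clipped = pageText.length > maxChars ? pageText.slice(0, maxChars) : pageText;
  const prompt = `${heuristicPrompt.trim()}\n\n--- PAGE TEXT ---\n${clipped}`;
  return { url: OLLAMA_URL, body: { model, prompt, stream: false } };
}
```

POST `body` as JSON to `url`; the model's answer comes back in the `response` field of the reply. The verdict still gets marked by a person, so the audit stays human-in-the-loop.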

A heuristic without an audit is a poster. The Usability Engine's bet is that every principle worth writing down deserves the verb that proves it — the question you can answer, the fix you can ship, the check that catches you when you don't.

Open the live engine →