ainativeui

01 Fidelity scores

How three frontier models render the same canvas.

Opus 100.0%
Sonnet 100.0%
Haiku 97.9%

Three models, three price points, all near ceiling. The Haiku gap is almost entirely recipe-choice variance on one prompt (explain_set_theory, 81.8), with a smaller dip on chip_tap_filter (97.4). This is reasonable model behavior, not a framework failure.

02 Per-prompt breakdown

Every prompt, every dimension.

ID                    Opus    Sonnet  Haiku   Prompt
boredom               100.0   100.0   100.0   I'm bored.
help_focus            100.0   100.0   100.0   Help me focus for the next hour.
set_timer_5min        100.0   100.0   100.0   Set a 5-minute timer.
play_tic_tac_toe      100.0   100.0   100.0   Let's play tic-tac-toe, you go first.
show_q3_dashboard     100.0   100.0   100.0   Show me my Q3 dashboard.
plan_tomorrow_form    100.0   100.0   100.0   Help me plan tomorrow with three priorities and a date picker.
chip_tap_filter       100.0   100.0   97.4    Show my expenses last month with category filters.
explain_set_theory    100.0   100.0   81.8    Explain set theory with examples.
explain_python_loops  100.0   100.0   100.0   Teach me Python loops with an interactive example.
i_have_an_hour        100.0   100.0   100.0   I have an hour. What should I do?
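
As a sanity check on the table, the headline figures appear to be the plain average of the per-prompt scores. That aggregation is an assumption inferred from the numbers, not something the page states; the sketch below only reproduces the arithmetic for the ten rows listed above.

```python
from statistics import fmean

# Per-prompt scores copied from the table above, in row order.
PER_PROMPT = {
    "Opus": [100.0] * 10,
    "Sonnet": [100.0] * 10,
    "Haiku": [100.0] * 6 + [97.4, 81.8, 100.0, 100.0],
}


def overall(scores):
    """Unweighted mean across prompts (assumed aggregation), to one decimal."""
    return round(fmean(scores), 1)


for model, scores in PER_PROMPT.items():
    print(f"{model}: {overall(scores):.1f}%")
```

Under that assumption the Haiku average works out to 979.2 / 10 = 97.92, which rounds to the 97.9% shown in the summary.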

03 Methodology

A deterministic rubric.

We score each rendered tree against five dimensions, then take the weighted mean. No LLM judge — every score is reproducible by re-running the scorer on the same recording.

The corpus is 10 prompts spanning recipes, widgets, and freeform layouts. A model passes when its overall score is ≥ 95%.
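
The pass rule is a single inclusive comparison on the overall score. A minimal sketch, using the three headline figures from this page; the function and constant names are illustrative and not taken from the scorer:

```python
PASS_THRESHOLD = 95.0  # "overall >= 95%" from the text above


def passes(overall_pct: float) -> bool:
    """A model passes when its overall score meets the threshold (inclusive)."""
    return overall_pct >= PASS_THRESHOLD


# Headline scores reported on this page.
overall_scores = {"Opus": 100.0, "Sonnet": 100.0, "Haiku": 97.9}

for model, pct in overall_scores.items():
    print(model, "pass" if passes(pct) else "fail")
```

Because the threshold applies to the overall score, a single low prompt (such as Haiku's 81.8) does not by itself fail a model.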

Wired (weight 40): Every interactive node carries an action. Orphan taps are impossible.

Coherent (weight 25): The layout regularizer didn't have to fix anything.

On-style (weight 15): Heuristic match against the prompt's StyleBrief prefer/avoid items.

On-shape (weight 15): Top-level node matches the expected contract; minimum interactive count met.

Interactivity (weight 5): The scripted user interaction succeeds against the rendered tree.
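
The five weights sum to 100, so a per-prompt score can be read as a weighted mean of the dimension results. The sketch below assumes each dimension yields a value between 0 and 1; that range, the dictionary keys, and the example tree are all hypothetical, since the page does not describe the scorer's internals.

```python
# Weights from the rubric above; they sum to 100.
WEIGHTS = {
    "wired": 40,
    "coherent": 25,
    "on_style": 15,
    "on_shape": 15,
    "interactivity": 5,
}


def prompt_score(dimensions: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores (each assumed in [0, 1]), as 0-100."""
    if set(dimensions) != set(WEIGHTS):
        raise ValueError("expected exactly the five rubric dimensions")
    total = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return round(100 * total / sum(WEIGHTS.values()), 1)


# Hypothetical tree: everything passes except half of the style checks.
example = {
    "wired": 1.0,
    "coherent": 1.0,
    "on_style": 0.5,
    "on_shape": 1.0,
    "interactivity": 1.0,
}
print(prompt_score(example))  # prints 92.5
```

Since the function has no randomness and no model call, re-running it on the same inputs always gives the same number, which is the property the rubric relies on.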

04 Score your model

Two commands. Same results.

1 — Record responses

swift run ainativeui eval-record \
  --provider anthropic \
  --model claude-opus-4-7 \
  --corpus Tests/Eval/Fidelity/corpus.json \
  --output recordings/opus-$(date +%Y-%m-%d).json

2 — Score

swift run ainativeui eval \
  --corpus Tests/Eval/Fidelity/corpus.json \
  --responses recordings/opus-2026-05-07.json
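
The record step writes a date-stamped file and the score step must read that same file, so it can help to build both commands from one path. A small sketch that only assembles the argument lists, using the flags shown above; the helper name and the `label` parameter are illustrative:

```python
import shlex
from datetime import date

CORPUS = "Tests/Eval/Fidelity/corpus.json"  # path from the commands above


def eval_commands(provider: str, model: str, label: str, day: date):
    """Return (record, score) argv lists that share one recording path."""
    recording = f"recordings/{label}-{day.isoformat()}.json"
    record = [
        "swift", "run", "ainativeui", "eval-record",
        "--provider", provider,
        "--model", model,
        "--corpus", CORPUS,
        "--output", recording,
    ]
    score = [
        "swift", "run", "ainativeui", "eval",
        "--corpus", CORPUS,
        "--responses", recording,
    ]
    return record, score


record, score = eval_commands("anthropic", "claude-opus-4-7", "opus", date(2026, 5, 7))
print(shlex.join(record))
print(shlex.join(score))
```

To actually run them, each list can be passed to `subprocess.run(..., check=True)` from the package root.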

Recording costs (May 2026): Opus ~$3–7 · Sonnet ~$1–3 · Haiku ~$0.30–1 · On-device $0