
LLM Model Selector

Top Models — Default View

Top 8 by capability score — apply levers to filter by task, cost, context, or vendor

Jul 2026 data
1

Claude Fable 5

Anthropic
AA 60

AA v4.1 peak (60) — Mythos-class with safety classifiers; access suspended June 12, staged return.

$50/M out · 1M ctx
AA 60 · Mythos-class · Limited access
2

Claude Mythos 5

Anthropic
AA 60

Same weights as Fable 5 without cyber/bio classifiers — Project Glasswing partners only.

$50/M out · 1M ctx
AA 60 · Glasswing · Restricted
3

GPT-5.6 Sol

OpenAI

June 26 preview — gated to government-vetted partners; broader GA expected mid-July.

$30/M out · 1.05M ctx
Preview · Gated · Next-gen
4

Claude Opus 4.8

Anthropic
AA 56

Highest AA score among generally available models: SWE-Bench Pro 69.2%; pricing flat at $5/$25 vs Opus 4.6.

$25/M out · 1M ctx
AA 56 · SWE-Bench Pro · Agents · AA v4.1 leader (available)
5

GPT-5.5

OpenAI
AA 55

OpenAI flagship (Apr 2026) — AA 55 xhigh; Terminal-Bench 2.1 leader among closed models; GPT-5.6 Sol preview gated.

$30/M out · 1.05M ctx
AA 55 · Frontier · Terminal-Bench · xhigh
6

Claude Sonnet 5

Anthropic
AA 53

Launched June 30 — default Free/Pro model; Terminal-Bench 2.1 80.4%; intro $2/$10 through Aug 31.

$10/M out · 1M ctx
AA 53 · New · Agents · Default
7

GPT-5.4 Pro

OpenAI

GPT-5.4 at maximum reasoning depth — reserved for highest-stakes professional tasks.

$180/M out · 272K ctx
Ultra-premium · Max effort
8

GPT-5.4 (xhigh)

OpenAI
AA 51

Superseded by GPT-5.5 — still strong at $15/M output vs $30/M for 5.5.

$15/M out · 1.05M ctx
AA 51 · Previous flagship · GPQA / AIME
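
As a rough illustration of how levers could narrow this default view, here is a minimal Python sketch. The eight entries copy the rank, vendor, AA score, output price, and context size from the cards above; everything else is an assumption made for the example: the `Model` class, the `apply_levers` function and its parameter names are hypothetical, context sizes are read as decimal token counts (1M = 1,000,000), and a missing AA score is stored as `None`. It is not the selector's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    rank: int                # position in the default view above
    name: str
    vendor: str
    aa: Optional[int]        # AA score; None where the card lists none
    out_usd_per_m: float     # output price, USD per million tokens
    ctx_tokens: int          # context window (assumed decimal token count)


TOP_MODELS = [
    Model(1, "Claude Fable 5", "Anthropic", 60, 50, 1_000_000),
    Model(2, "Claude Mythos 5", "Anthropic", 60, 50, 1_000_000),
    Model(3, "GPT-5.6 Sol", "OpenAI", None, 30, 1_050_000),
    Model(4, "Claude Opus 4.8", "Anthropic", 56, 25, 1_000_000),
    Model(5, "GPT-5.5", "OpenAI", 55, 30, 1_050_000),
    Model(6, "Claude Sonnet 5", "Anthropic", 53, 10, 1_000_000),
    Model(7, "GPT-5.4 Pro", "OpenAI", None, 180, 272_000),
    Model(8, "GPT-5.4 (xhigh)", "OpenAI", 51, 15, 1_050_000),
]


def apply_levers(
    models: List[Model],
    vendor: Optional[str] = None,
    max_out_usd_per_m: Optional[float] = None,
    min_ctx_tokens: Optional[int] = None,
) -> List[Model]:
    """Keep models that pass every lever that is set; preserve listed rank order."""
    kept = []
    for m in models:
        if vendor is not None and m.vendor != vendor:
            continue
        if max_out_usd_per_m is not None and m.out_usd_per_m > max_out_usd_per_m:
            continue
        if min_ctx_tokens is not None and m.ctx_tokens < min_ctx_tokens:
            continue
        kept.append(m)
    return sorted(kept, key=lambda m: m.rank)


# Example: OpenAI models at or below $30/M output.
picks = apply_levers(TOP_MODELS, vendor="OpenAI", max_out_usd_per_m=30)
print([m.name for m in picks])
# -> ['GPT-5.6 Sol', 'GPT-5.5', 'GPT-5.4 (xhigh)']
```

The sketch sorts by the listed rank rather than by AA score because two cards (GPT-5.6 Sol, GPT-5.4 Pro) carry no AA score yet still hold a position in the list.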

Lever Reference

1
Capability

AA composite index (MMLU-Pro, SWE-bench, GPQA, ARC-AGI, AIME, context)

2
Task Fit

The highest AA score does not guarantee the best fit for a specific task (e.g. coding)

3
Context Window

Standard ≤262K · Large 200K–1M · Massive 1M–10M

4
Cost Per Token

Budget < $1/M · Mid $1–10/M · Premium $10+/M output

5
Deployment

API-only vs open-weight self-host vs both

6
Vendor Origin

Geographic / residency-style grouping for vendor choice
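
The cost and context bands from levers 3 and 4 can be expressed as two small classifiers. This is a sketch under stated assumptions, not the selector's own logic: the function names `cost_tier` and `context_tier` are hypothetical; "K" and "M" are read as decimal thousands and millions; a price of exactly $10/M counts as Premium (from "$10+"); the published Standard and Large bands overlap between 200K and 262K, and the sketch assigns that overlap to Standard; exactly 1M is assigned to Large.

```python
def cost_tier(out_usd_per_m: float) -> str:
    """Classify an output price (USD per million tokens) into the lever-4 bands."""
    if out_usd_per_m < 1:
        return "Budget"
    if out_usd_per_m < 10:
        return "Mid"
    return "Premium"  # assumption: exactly $10/M falls in "$10+"


def context_tier(ctx_tokens: int) -> str:
    """Classify a context window (tokens) into the lever-3 bands."""
    if ctx_tokens <= 262_000:
        return "Standard"  # assumption: the 200K-262K overlap goes to Standard
    if ctx_tokens <= 1_000_000:
        return "Large"     # assumption: exactly 1M goes to Large
    return "Massive"       # listed as 1M-10M; larger values are not distinguished here


# Example: GPT-5.4 Pro as listed ($180/M out, 272K ctx).
print(cost_tier(180), context_tier(272_000))
# -> Premium Large
```

Under these assumptions every model in the default view lands in either the Mid or Premium cost band only if its output price is at least $1/M; all eight listed prices are $10/M or higher, so all eight classify as Premium.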