
LLM Model Selector

Top Models — Default View

Top 8 by capability score — apply levers to filter by task, cost, context, or vendor

Apr 2026 data
1. GPT-5.4 (xhigh) · OpenAI · AA 57
   OpenAI flagship — top composite score, 1.05M context, 57.7% SWE-bench Pro.
   $15/M out · 1.05M ctx
   Tags: AA 57 · Frontier · GPQA / AIME · Computer-use

2. Gemini 3.1 Pro · Google · AA 57
   Tied #1 overall — 2M context, ARC-AGI-2 leader, best all-round Google model.
   $12/M out · 2M ctx
   Tags: AA 57 · Frontier · Multimodal · 2M ctx

3. GPT-5.4 Pro · OpenAI
   GPT-5.4 at maximum reasoning depth — reserved for highest-stakes professional tasks.
   $180/M out · 272K ctx
   Tags: Ultra-premium · Max effort

4. Claude Opus 4.6 · Anthropic · AA 53
   Coding and agentic flagship — SWE-bench Verified leader, 1M context.
   $25/M out · 1M ctx
   Tags: SWE-bench · Agents · Coding · AA 53

5. Claude Sonnet 4.6 · Anthropic · AA 52
   AA 52 — leads on agentic & terminal tasks; 40% cheaper than Opus at 1M context.
   $15/M out · 1M ctx
   Tags: AA 52 · Agents · Coding · Agentic leader

6. GLM-5 · Z.ai · AA 50
   New top-tier entrant from Z.ai — strong agentic engineering at mid-tier cost.
   $3.2/M out · 200K ctx
   Tags: AA 50 · Agentic · Top 10 · China

7. MiniMax-M2.7 · MiniMax · AA 50
   AA 50 top-10 entrant — strong multimodal open-weight model at 256K context; pricing TBD.
   Pricing N/A · 256K ctx · open weight
   Tags: AA 50 · Multimodal · Open weight · Top 10

8. MiMo-V2-Pro · Xiaomi · AA 49
   Xiaomi frontier entry — AA 49, 1M context; pricing not yet disclosed.
   Pricing N/A · 1M ctx
   Tags: AA 49 · 1M ctx · China · Top 10

Lever Reference

1. Capability: AA composite index (MMLU-Pro, SWE-bench, GPQA, ARC-AGI, AIME, context)
2. Task Fit: the peak AA score is not always the best fit for a given task (e.g. coding)
3. Context Window: Standard ≤262K · Large 262K–1M · Massive 1M–10M
4. Cost Per Token: Budget <$1/M · Mid $1–10/M · Premium $10+/M (output pricing)
5. Deployment: API-only vs open-weight self-host vs both
6. Vendor Origin: geographic / data-residency grouping for vendor choice