
LLM Model Selector

Top Models — Default View

Top 8 by capability score — apply levers to filter by task, cost, context, or vendor

Apr 2026 data
1. GPT-5.4 (xhigh) · OpenAI · AA 57
   OpenAI flagship — top composite score, 1.05M context, 57.7% SWE-bench Pro.
   $15/M out · 1.05M ctx
   Tags: AA 57 · Frontier · GPQA / AIME · Computer-use

2. Gemini 3.1 Pro · Google · AA 57
   Tied #1 overall — 2M context, ARC-AGI-2 leader, best all-round Google model.
   $12/M out · 2M ctx
   Tags: AA 57 · Frontier · Multimodal · 2M ctx

3. GPT-5.4 Pro · OpenAI
   GPT-5.4 at maximum reasoning depth — reserved for highest-stakes professional tasks.
   $180/M out · 272K ctx
   Tags: Ultra-premium · Max effort

4. Claude Opus 4.6 · Anthropic · AA 53
   Coding and agentic flagship — SWE-bench Verified leader, 1M context.
   $25/M out · 1M ctx
   Tags: SWE-bench · Agents · Coding · AA 53

5. Claude Sonnet 4.6 · Anthropic · AA 52
   AA 52 — leads on agentic & terminal tasks; 40% cheaper than Opus at 1M context.
   $15/M out · 1M ctx
   Tags: AA 52 · Agents · Coding · Agentic leader

6. GLM-5 · Z.ai · AA 50
   New top-tier entrant from Z.ai — strong agentic engineering at mid-tier cost.
   $3.2/M out · 200K ctx
   Tags: AA 50 · Agentic · Top 10 · China

7. MiniMax-M2.7 · MiniMax · AA 50
   AA 50 top-10 entrant — strong multimodal open-weight model at 256K context; pricing TBD.
   Pricing N/A · 256K ctx · open weight
   Tags: AA 50 · Multimodal · Open weight · Top 10

8. MiMo-V2-Pro · Xiaomi · AA 49
   Xiaomi frontier entry — AA 49, 1M context; pricing not yet disclosed.
   Pricing N/A · 1M ctx
   Tags: AA 49 · 1M ctx · China · Top 10

Lever Reference

1. Capability: AA composite index (MMLU-Pro, SWE-bench, GPQA, ARC-AGI, AIME, context)
2. Task Fit: the peak AA score is not always the best fit for a given task (e.g. coding)
3. Context Window: Standard ≤262K · Large 262K–1M · Massive 1M–10M
4. Cost Per Token: Budget <$1/M · Mid $1–10/M · Premium $10+/M (output pricing)
5. Deployment: API-only vs open-weight self-host vs both
6. Vendor Origin: geographic / data-residency grouping for vendor choice