Top Models — Default View
Top 8 by capability score — apply levers to filter by task, cost, context, or vendor
Apr 2026 data
Lever Reference
1
Capability
AA composite index (MMLU-Pro, SWE-bench, GPQA, ARC-AGI, AIME, context)
2
Task Fit
Peak AA score does not equal best task fit (e.g. coding)
3
Context Window
Standard ≤262K · Large 200K–1M · Massive 1M–10M
4
Cost Per Token
Budget < $1/M · Mid $1–10/M · Premium $10+/M output
5
Deployment
API-only vs open-weight self-host vs both
6
Vendor Origin
Geographic / residency-style grouping for vendor choice