Context Window (tokens)
Maximum number of input tokens the model can process in a single request
API Pricing ($ per 1M tokens)
Cost of input tokens; output tokens typically cost 2-4x more
Reasoning Score (GPQA Diamond)
Performance on a complex reasoning benchmark of graduate-level science questions
Coding Score (LiveCodeBench)
Performance on coding challenges and code generation
Math Score (AIME 2025)
Performance on mathematical reasoning problems
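
To make the pricing metric concrete, here is a minimal sketch of how a per-request cost could be estimated from a price quoted in dollars per 1M tokens. The function name, its parameters, and the default output multiplier of 3.0 are illustrative assumptions (the multiplier is simply a midpoint of the 2-4x range mentioned above); a real estimate should use the provider's published output price.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_multiplier=3.0):
    """Estimate the cost of one request in USD.

    input_price_per_m is the input price in dollars per 1M tokens.
    output_multiplier is an ASSUMED ratio of output price to input price
    (a midpoint of the 2-4x range), not a published figure.
    """
    output_price_per_m = input_price_per_m * output_multiplier
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000  # prices are quoted per 1M tokens


# Hypothetical request: 10,000 input tokens, 2,000 output tokens, $2.00 per 1M input tokens.
cost = request_cost_usd(10_000, 2_000, input_price_per_m=2.00)
print(f"${cost:.3f}")  # prints "$0.032"
```

With these assumed numbers, output tokens make up only a sixth of the request but over a third of the cost, which is why the output multiplier matters when comparing models on price.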