Season 1 · March 2026 · SHIFT Framework

SMOL AI WORLDCUP
Size · Honesty · Intelligence · Fast · Thrift

World's first 5-axis benchmark for small AI · 125 questions · 7 languages · 🥅 League One · ⚽ La Liga · 🏅 Premier · 🏆 Champions
🏆 OFFICIAL RANKING FORMULA
WCS = √(SHIFT × PIRnorm)
QUALITY: SHIFT = H×0.4 + I×0.6
EFFICIENCY: PIRnorm = (I×H×F) ÷ (S×T) → log scale
Both quality AND efficiency must be high. A model that's smart but huge, or tiny but dumb, ranks low.
Size · Honesty · Intelligence · Fast · Thrift: all 5 axes matter.
18 Models · 125 Questions · 7 Languages · 5 SHIFT Axes · 4 Leagues
🏆 STANDINGS
Table columns: Player, WCS, PIR, SHIFT, Honesty, Intel, Union, League, Params, tok/s, RAM, Trap, Calib, Refuse, Fix, Logic, Math, Code, Lang, Know, Meta
🥊 PIR = (I×H×F)/(S×T) │ 🥅 League One (<2GB) ⚽ La Liga (2-4GB) 🏅 Premier (4-8GB) 🏆 Champions (8-16GB)
SHIFT 5-AXIS DEEP DIVE
📦 SIZE · 🛡 HONESTY · 🧠 INTEL · ⚡ FAST · 💾 THRIFT
💰 BEST VALUE: GIANT KILLERS
Which models deliver the most intelligence per GB of RAM?

🏟 INTELLIGENCE vs RESOURCE β€” WHO PUNCHES UP?

Upper-left = best value (high performance, low resource). Dot size = PIR. 🥅 League One models in the upper-left are Giant Killers.

🥅 LEAGUE CHAMPIONS: SHIFT RADAR

Best model from each league, compared on 5 SHIFT axes. Outer = better.

⚡ SPEED EFFICIENCY: tok/s PER GB

Who squeezes the most speed from each GB of RAM? Bigger slice = more efficient.

🏅 GIANT KILLING INDEX
Small models vs Frontier giants: same Union Eval questions

🏟 SMOL vs SOTA: SCATTER MAP

Red zone = Frontier giants. Colored dots = Smol challengers. Closer to red = closer to SOTA.

⚔ TALE OF THE TAPE
🔵 BLUE CORNER vs 🔴 RED CORNER
🔬 KEY DISCOVERIES
Data-driven insights from SHIFT 125Q + Union 19Q + Speed measurement on 18 models
🥊 4B vs 8B · 🏭 MoE Edge · 🧠 Thinking · 🪤 Hallucination · ⚡ Speed · 🏅 Recommend

🥊 "4B BEATS 8B"

A 4B model using only 2GB RAM achieves higher SHIFT scores than most 8B models requiring 5.5GB. Doubling parameters ≠ doubling performance.

⚽ Gemma-3n-E4B (2GB): 77.3
⚽ Qwen3-4B (2.8GB): 76.8
🏅 Qwen3-8B (5.5GB): 76.9
🏅 Llama-3.1-8B (5.5GB): 61.0

→ SHIFT gap: Qwen3-8B leads Qwen3-4B by 0.1 points on about 2× the RAM, and trails Gemma-3n-E4B by 0.4 points despite 2.75× more RAM
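
As a rough cross-check of the figures in this card, SHIFT points per GB of RAM can be computed directly. This is only a sketch: the ratio is not the leaderboard's own value metric, and the name `shift_per_gb` is introduced here for illustration.

```python
# (SHIFT score, RAM in GB) exactly as listed in the card above.
MODELS = {
    "Gemma-3n-E4B": (77.3, 2.0),
    "Qwen3-4B": (76.8, 2.8),
    "Qwen3-8B": (76.9, 5.5),
    "Llama-3.1-8B": (61.0, 5.5),
}


def shift_per_gb(score: float, ram_gb: float) -> float:
    """Crude value ratio (an assumption for illustration): SHIFT points per GB of RAM."""
    return score / ram_gb


# Sort model names from best to worst value under this ratio.
ranked = sorted(MODELS, key=lambda name: shift_per_gb(*MODELS[name]), reverse=True)

for name in ranked:
    score, ram_gb = MODELS[name]
    print(f"{name}: {shift_per_gb(score, ram_gb):.1f} SHIFT points per GB")
```

Under this ratio the two La Liga models come out ahead of both 8B models, which is the point the card makes.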

📏 1.7B REBELLION

Qwen3-1.7B (1.2GB) outscores three 7-14B models. Latest architecture + small size > old architecture + big size.

🥅 Qwen3-1.7B (1.2GB): 66.8
🏅 Mistral-7B (5GB): 60.6
🏅 Llama-3.1-8B (5.5GB): 61.0
🏆 DeepSeek-R1-14B (9.5GB): 59.8

→ 1.7B beats 7B, 8B, and 14B models

🏟 WHAT IS THIS?

World's first 5-axis benchmark for small language models (≤10B active params). SHIFT measures what matters for edge deployment: not just intelligence, but also honesty, speed, and efficiency.

📊 SHIFT FRAMEWORK

Size: Model footprint
Honesty: Hallucination, calibration, refusal, self-correction
Intelligence: Reasoning, math, coding, 7 languages, metacognition
Fast: Tokens/sec, TTFA
Thrift: Peak VRAM/RAM

🏆 WCS: WORLDCUP SCORE

WCS = √(SHIFT × PIRnorm)
The official ranking metric. Geometric mean of quality (SHIFT) and efficiency (PIR). Both must be high to score well.
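
The effect of the geometric mean can be sketched in a few lines of Python. The 0.4/0.6 weights come from the ranking formula on this page; the function names and the input numbers are hypothetical and do not come from the leaderboard.

```python
import math


def shift_score(honesty: float, intelligence: float) -> float:
    """Quality score: H x 0.4 + I x 0.6, as in the ranking formula."""
    return 0.4 * honesty + 0.6 * intelligence


def wcs(shift: float, pir_norm: float) -> float:
    """WorldCup Score: geometric mean of SHIFT and PIRnorm."""
    return math.sqrt(shift * pir_norm)


# Hypothetical models (illustrative numbers only).
balanced = wcs(shift_score(70, 70), pir_norm=70)   # solid on both sides
lopsided = wcs(shift_score(95, 95), pir_norm=20)   # smart but inefficient

print(f"balanced: {balanced:.1f}, lopsided: {lopsided:.1f}")
```

The lopsided model has the higher quality score but the lower WCS, because a weak factor drags a geometric mean down more than it would an arithmetic mean.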

🥊 PIR FORMULA

PIR = (I × H × F) ÷ (S × T)
PIRnorm = log₁₀(PIR) / log₁₀(max) × 100
Efficiency rating. Like boxing's pound-for-pound (P4P) ranking: how much punch per pound of hardware.
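
A minimal sketch of the PIR calculation follows. Several details are assumptions, since the page does not state them: `max` is taken to be the highest raw PIR among the ranked models, S is in billions of parameters, T is in GB, and F is in tokens per second. The model names and inputs are hypothetical.

```python
import math


def pir(i: float, h: float, f: float, s: float, t: float) -> float:
    """Raw efficiency: (I x H x F) / (S x T)."""
    return (i * h * f) / (s * t)


def pir_norm(value: float, max_value: float) -> float:
    """Log-scaled PIR, where 100 corresponds to the best raw PIR (assumed meaning of max)."""
    return math.log10(value) / math.log10(max_value) * 100


# Hypothetical models: (I, H, F tok/s, S billions of params, T GB).
raw = {
    "model_a": pir(70, 65, 40, 4.0, 2.8),
    "model_b": pir(60, 55, 90, 1.7, 1.2),
}
best = max(raw.values())
normalized = {name: pir_norm(value, best) for name, value in raw.items()}

for name, score in normalized.items():
    print(f"{name}: PIRnorm = {score:.1f}")
```

Note that a raw PIR below 1 would give a negative PIRnorm under this formula; how the benchmark treats that case is not stated here.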

⚖ FOOTBALL LEAGUE TIERS

🥅 League One (<2GB): Raspberry Pi
⚽ La Liga (2-4GB): Smartphone
🏅 Premier League (4-8GB): Laptop
🏆 Champions League (8-16GB): PC
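
The tiers above can be expressed as a small lookup on peak RAM. This is a sketch only: the function name is hypothetical, and treating each lower bound as inclusive is an assumption, made because Gemma-3n-E4B (2GB) is listed under La Liga.

```python
def league_for_ram(ram_gb: float) -> str:
    """Map peak RAM in GB to a league tier (boundary handling is an assumption)."""
    if ram_gb < 2:
        return "League One"
    if ram_gb < 4:
        return "La Liga"
    if ram_gb < 8:
        return "Premier League"
    if ram_gb <= 16:
        return "Champions League"
    raise ValueError(f"{ram_gb} GB is above the 16GB ceiling of the top tier")


# RAM figures quoted elsewhere on this page.
for name, ram_gb in [("Qwen3-1.7B", 1.2), ("Gemma-3n-E4B", 2.0),
                     ("Mistral-7B", 5.0), ("DeepSeek-R1-14B", 9.5)]:
    print(f"{name} ({ram_gb}GB): {league_for_ram(ram_gb)}")
```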

🌍 7 LANGUAGES

🇬🇧 EN · 🇰🇷 KO · 🇸🇦 AR · 🇧🇷 PT · 🇹🇷 TR · 🇧🇩 BN · 🇹🇭 TH
2.7B+ speakers. Sentiment, idioms, translation, culture.

🏅 UNION EVAL

The same 19 cross-benchmark questions are given to frontier SOTA models, allowing direct comparison with Claude, GPT-5, etc. Scores are not publicly disclosed.

SMOL AI WORLDCUP
Season 1 · v1.3 · 125Q SHIFT + 19Q Union · 18 Models · 12 Makers · 7 Languages · WCS Ranking · Apache 2.0 · 2026
Developed by Ginigen.ai
Small but Mighty AI