Your-Weights Leaderboard
A fast leaderboard and comparison hub that goes beyond one composite score: re-rank live, compare side-by-side, and stress-test in a multi-model arena.
5 dimensions · ∞ rank orders · 1 shareable URL
Three ways in
Sliders for quality, speed, cost, context, safety.
Re-sort instantly and share the exact view via URL. The ranking follows your priorities—not a vendor's default formula.
Compare 2–4 models across 25+ fields.
Differences highlighted, plus export as Markdown.
Stream responses, see TTFT and tokens/sec.
Blind-vote the winner; votes feed community Elo.
No hidden default score
Weights update in real time
Arena votes shape the crowd rank
AI Benchmark Hub is the fastest way to choose an LLM for production: rank models with your own priorities, compare GPT vs Claude vs Gemini side-by-side, and stress-test prompts in a live arena at ai-benchmark-hub.com.
AI Benchmark Hub ranks frontier models using five dimensions you control: reasoning quality, inference speed, token cost, maximum context, and safety moderation. Drag sliders and the table re-sorts instantly—no spreadsheet required.
Open the model leaderboard →
Pick two to four models and compare pricing per million tokens, context limits, modalities, Arena Elo, and provider metadata. Differences are highlighted and you can export the table as Markdown for docs or procurement.
Compare models →
Stream answers from multiple LLMs on the same prompt, measure time-to-first-token and tokens per second, then blind-vote the winner. Community votes feed an Elo-style crowd ranking on AI Benchmark Hub.
Try the live arena →
AI Benchmark Hub is a free web app to rank, compare, and battle-test large language models (LLMs). Re-sort models by your own weights for quality, speed, cost, context window, and safety, then share the exact view via URL.
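As an illustration of how weight-based re-ranking and a shareable view could work, here is a minimal sketch. It is not the site's actual implementation: the model records, the pre-normalized 0 to 1 scores, the weighted-average formula, and the query-parameter scheme on `example.com` are all assumptions made for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

DIMENSIONS = ("quality", "speed", "cost", "context", "safety")

# Hypothetical records; every number is invented for illustration.
# Scores are assumed pre-normalized to [0, 1] with higher = better,
# so a cheap model has a HIGH "cost" score.
MODELS = [
    {"name": "model-a", "quality": 0.9, "speed": 0.4, "cost": 0.2, "context": 0.8, "safety": 0.7},
    {"name": "model-b", "quality": 0.7, "speed": 0.9, "cost": 0.9, "context": 0.5, "safety": 0.6},
    {"name": "model-c", "quality": 0.8, "speed": 0.6, "cost": 0.6, "context": 1.0, "safety": 0.9},
]


def rank(models, weights):
    """Sort models by the weighted average of their dimension scores."""
    total = sum(weights.get(d, 0) for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def score(model):
        return sum(weights.get(d, 0) * model[d] for d in DIMENSIONS) / total

    return sorted(models, key=score, reverse=True)


def share_url(base, weights):
    """Encode slider weights as query parameters (assumed URL scheme)."""
    return base + "?" + urlencode({d: weights.get(d, 0) for d in DIMENSIONS})


def weights_from_url(url):
    """Recover the slider weights from a shared URL."""
    query = parse_qs(urlparse(url).query)
    return {d: int(query[d][0]) for d in DIMENSIONS if d in query}


if __name__ == "__main__":
    w = {"quality": 50, "speed": 10, "cost": 30, "context": 5, "safety": 5}
    print([m["name"] for m in rank(MODELS, w)])
    print(share_url("https://example.com/leaderboard", w))
```

Because the weights travel in the URL, anyone opening the link can recompute the same order from the same data, which is one plausible way a "share the exact view" feature could be built.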
Unlike sites that hide everything behind one composite score, AI Benchmark Hub lets you set priorities with sliders, compare up to four models field-by-field, and run a live multi-model arena with blind voting and latency metrics.
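The latency metrics mentioned above, time-to-first-token (TTFT) and tokens per second, can be measured around any streaming response. The sketch below is an assumption-laden illustration rather than the arena's own code: `measure_stream` is a hypothetical helper, each yielded chunk is counted as one token, and throughput is computed over the whole request including the wait for the first token.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a token stream and report TTFT and throughput.

    `chunks` is any iterable that yields text as it arrives. Counting one
    chunk as one token is a simplification; a real client would use the
    provider's token counts if available.
    """
    start = clock()
    first_at: Optional[float] = None
    tokens = 0
    for _chunk in chunks:
        if first_at is None:
            first_at = clock()  # moment the first chunk arrived
        tokens += 1
    end = clock()
    total = end - start
    return {
        "ttft": None if first_at is None else first_at - start,
        "tokens": tokens,
        "tokens_per_sec": tokens / total if total > 0 else 0.0,
    }


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a streaming LLM response.
        for word in ["Hello", ",", " world", "!"]:
            time.sleep(0.05)
            yield word

    print(measure_stream(fake_stream()))
```

The `clock` parameter is injectable so the measurement logic can be checked with a deterministic fake clock instead of real wall time.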
The leaderboard and arena pull from the OpenRouter catalog—covering GPT, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen, and hundreds more—with Arena Elo quality signals and live API pricing where available.
AI Benchmark Hub is free for browsing the leaderboard, sharing weighted rankings, and comparing specs. The arena supports free-tier models; paid models use your own API keys when configured.
To compare GPT and Claude, open our GPT vs Claude guide at /topics/gpt-vs-claude, or go directly to Compare and pick OpenAI GPT and Anthropic Claude models. You will see pricing per million tokens, context limits, Arena Elo quality, and more.
Cheapest models change often. On the leaderboard, set the cost weight to 100% to re-rank by blended price per million tokens using live OpenRouter data, or read /topics/cheapest-llm-api.
There is no single best coding LLM for every team. Rank models on the leaderboard with high quality and speed weights, compare finalists on /compare, then test real prompts in the /arena.
LMSYS Chatbot Arena focuses on crowd Elo from blind votes. AI Benchmark Hub adds live pricing, context, custom weighting, side-by-side specs, and a multi-model arena you control with your own prompts and API keys.
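For readers unfamiliar with how blind votes can turn into a crowd Elo ranking, the sketch below applies the standard Elo update to one pairwise vote. The K-factor of 32, the starting rating of 1000, and the `apply_vote` helper are assumptions for illustration; neither AI Benchmark Hub nor LMSYS is claimed to use these exact parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def apply_vote(ratings: dict, winner: str, loser: str,
               k: float = 32.0, initial: float = 1000.0) -> dict:
    """Update ratings in place after one blind vote.

    k and initial are assumed values; unseen models start at `initial`.
    """
    r_win = ratings.get(winner, initial)
    r_lose = ratings.get(loser, initial)
    # The winner gains more when the win was unexpected.
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta
    return ratings


if __name__ == "__main__":
    ratings = {}
    apply_vote(ratings, winner="model-a", loser="model-b")
    apply_vote(ratings, winner="model-a", loser="model-c")
    for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Each vote moves the same number of points from the loser to the winner, so the total rating pool stays constant while the order shifts toward models that win more often than expected.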