Prompt Engineering Made Easy

Run AI Evaluations

Compare models, test prompts, and find the best AI for your use case. All in one powerful evaluation suite.

45+ Models

Side-by-Side Compare

45+ LLMs supported

Including the top open source models

Claude 3.5 Sonnet

Claude 3.7 Sonnet

Claude 3.7 Sonnet (Thinking)

Claude 3.5 Haiku

Claude 4 Sonnet

Claude 4 Opus

Claude 4 Opus (Thinking)

Claude 4.5 Sonnet

Claude 4.5 Haiku

GPT-5.1

o4 mini

o3 mini

Gemini 2.5 Flash

Gemini 2.5 Flash Lite

Gemini 2.0 Flash

Gemini 3 Pro

Grok 3 Mini

Grok 2 Vision

DeepSeek V3.2 Speciale

DeepSeek V3 (Chat)

Mistral Large 2.1

Mistral Saba

Kimi K2

Cogito v2 Llama 405B

Cogito v2 DeepSeek 671B MoE

Qwen 3 Max

Qwen3 VL 235B

Qwen3 235B Instruct

Qwen 3 Coder Plus

Qwen2.5-VL 32B Instruct

GLM 4.5

GLM 4.6

MiniMax M2

Claude 4.5 Opus

Claude 4.6 Opus

Claude 4.6 Sonnet

GPT-4o

GPT-4o mini

GPT-4.1

GPT-4.1 mini

o3

GPT-5

GPT-5 Mini

GPT-5 Nano

GPT-5.2

GPT-5.4

GPT-5.4 Mini

GPT-5.4 Nano

Gemini 3 Flash

Gemini 3.1 Flash Lite

Gemini 3.1 Pro

Gemini 2.5 Pro

Grok 4.1 Fast

Grok 4 Fast

Grok 4

Grok 3

DeepSeek V3.2

DeepSeek V3.1

DeepSeek R1

Mistral Medium 3.1

Magistral Medium 1.2

Pixtral Large

Perplexity Sonar

Perplexity Sonar Pro

Perplexity Sonar Reasoning Pro

Kimi K2.5

Qwen3 Max Thinking

Qwen3.5 Plus

Qwen3.5 122B A10B

Qwen3.5 Flash

GLM 5

GLM 4.7

Cogito v2.1 671B

GPT OpenSource 20B

GPT OpenSource 120B

MiniMax M2.5

MiniMax M2-her

Find your perfect output

Four ways to get better answers from AI.

Compare models. Pit Claude against GPT against Gemini in seconds. Pick the best part of each response.

Compare prompts. Tweak your wording until you find the right way to unlock dramatically better answers.

Research mode. Send up to 5 models in parallel to search the web and reason in depth, then compare their research analyses.

Custom mode. Ready to race? Full manual control for every setting each model offers - temperature, thinking effort, and more.

Share, learn, improve

AI gets better when you don't go it alone.

Share your work. Found a killer prompt? Discovered which model aces your use case? One click generates a shareable link for teammates, clients, or your future self. Privacy-first—revoke access anytime.

Clone and remix. Every shared evaluation can be duplicated and tweaked. Learn from how others phrase their requests, then make it your own.

Prompt Doctor. Stuck? Our AI assistant analyzes what's not working and suggests improvements, like having an expert prompt engineer on call 24/7.

Ready to optimize your AI workflow?

Get started free