Prompt Engineering Made Easy

Run AI Evaluations

Compare models, test prompts, and find the best AI for your use case. All in one powerful evaluation suite.

45+ Models
Side-by-Side Compare

45+ LLMs supported

Including the top open source models

Anthropic: Claude 3.5 Sonnet
Anthropic: Claude 3.7 Sonnet
Anthropic: Claude 3.7 Sonnet (T)
Anthropic: Claude 3.5 Haiku
Anthropic: Claude 4 Sonnet
Anthropic: Claude 4 Opus
Anthropic: Claude 4 Opus (T)
Anthropic: Claude 4.5 Sonnet
Anthropic: Claude 4.5 Haiku
OpenAI: GPT-5.1
OpenAI: o4 mini
OpenAI: o3 mini
Google: Gemini 2.5 Flash
Google: Gemini 2.5 Flash Lite
Google: Gemini 2.0 Flash
Google: Gemini 3 Pro
xAI: Grok 3 Mini
xAI: Grok 2 Vision
DeepSeek: DeepSeek V3.2 Speciale
DeepSeek: DeepSeek V3 (Chat)
Mistral: Mistral Large 2.1
Mistral: Mistral Saba
Moonshot AI: Kimi K2
Deep Cogito: Cogito v2 Llama 405B
Deep Cogito: Cogito v2 DeepSeek 671B MoE
Alibaba: Qwen 3 Max
Alibaba: Qwen3 VL 235B
Alibaba: Qwen3 235B Instruct
Alibaba: Qwen 3 Coder Plus
Alibaba: Qwen2.5-VL 32B Instruct
Z AI: GLM 4.5
Z AI: GLM 4.6
MiniMax: MiniMax M2
Anthropic: Claude 4.5 Opus
Anthropic: Claude 4.6 Opus
Anthropic: Claude 4.6 Sonnet
OpenAI: GPT-4o
OpenAI: GPT-4o mini
OpenAI: GPT-4.1
OpenAI: GPT-4.1 mini
OpenAI: o3
OpenAI: GPT-5
OpenAI: GPT-5 Mini
OpenAI: GPT-5 Nano
OpenAI: GPT-5.2
OpenAI: GPT-5.4
OpenAI: GPT-5.4 Mini
OpenAI: GPT-5.4 Nano
Google: Gemini 3 Flash
Google: Gemini 3.1 Flash Lite
Google: Gemini 3.1 Pro
Google: Gemini 2.5 Pro
xAI: Grok 4.1 Fast
xAI: Grok 4 Fast
xAI: Grok 4
xAI: Grok 3
DeepSeek: DeepSeek V3.2
DeepSeek: DeepSeek V3.1
DeepSeek: DeepSeek R1
Mistral: Mistral Medium 3.1
Mistral: Magistral Medium 1.2
Mistral: Pixtral Large
Perplexity: Perplexity Sonar
Perplexity: Perplexity Sonar Pro
Perplexity: Perplexity Sonar Reasoning Pro
Moonshot AI: Kimi K2.5
Alibaba: Qwen3 Max Thinking
Alibaba: Qwen3.5 Plus
Alibaba: Qwen3.5 122B A10B
Alibaba: Qwen3.5 Flash
Z AI: GLM 5
Z AI: GLM 4.7
Deep Cogito: Cogito v2.1 671B
OpenAI: GPT OpenSource 20B
OpenAI: GPT OpenSource 120B
MiniMax: MiniMax M2.5
MiniMax: MiniMax M2-her

Find your perfect output

Four ways to get better answers from AI.

Compare models. Pit Claude against GPT against Gemini in seconds. Pick the best part of each response.

Compare prompts. Tweak your wording until you find the right way to unlock dramatically better answers.

Research mode. Run up to 5 models in parallel, each searching the web and reasoning in depth, for thorough research analyses.

Custom mode. Ready to race? Full manual control over every setting each model offers: temperature, thinking effort, and more.

Share, learn, improve

AI gets better when you don't go it alone.

Share your work. Found a killer prompt? Discovered which model aces your use case? One click generates a shareable link for teammates, clients, or your future self. Privacy-first: revoke access anytime.

Clone and remix. Every shared evaluation can be duplicated and tweaked. Learn from how others phrase their requests, then make it your own.

Prompt Doctor. Stuck? Our AI assistant analyzes what's not working and suggests improvements, like having an expert prompt engineer on call 24/7.

Ready to optimize your AI workflow?

Sign up now and start running evaluations across dozens of AI models.

Get started free