What are the best open source large language models (LLMs) in English?

Best Open Source LLMs in English (Early 2026)

As of February 2026, open source LLMs have closed the gap with proprietary models like GPT-5.x and Gemini 3, especially in reasoning, coding, and efficiency. Chinese labs (DeepSeek, Alibaba Qwen, Zhipu GLM, Moonshot Kimi, MiniMax) dominate leaderboards such as the Hugging Face Open LLM Leaderboard v2, the LMSYS Chatbot Arena (open-model section), and community benchmarks (e.g., AIME for math, LiveCodeBench for coding, GPQA for reasoning). Meta's Llama 4 series remains a strong all-rounder with multimodal capabilities. All the top models excel in English (multilingual training ensures this) and are freely downloadable from Hugging Face under permissive licenses (Apache 2.0/MIT).

The focus here is on fully open-weight models (downloadable weights plus licenses permitting commercial use). Typical scores from recent benchmarks: ~85-92% on MMLU and ~80-90% on LiveCodeBench.

Top 6 Recommendations

These are consensus picks from BentoML, Clarifai, WhatLLM.org, Shakudo, and r/LocalLLaMA, with supporting numbers from LLM-Stats and Artificial Analysis.

  1. DeepSeek-V3.2 / R1 (DeepSeek AI)

    • Size: 600B+ MoE (efficient active params ~30-60B).
    • License: MIT/Apache 2.0.
    • HF Repo: deepseek-ai/DeepSeek-V3.2.
    • Pros: #1 in reasoning/math (tops AIME/GPQA), coding (89% LiveCodeBench), agentic tasks; fast inference; rivals GPT-5 on LMSYS Arena open rankings.
    • Cons: High VRAM (80-200GB at FP16; ~40GB with Q4 quantization).
    • Best for: Coding, math, agents. Runs well locally quantized.
  2. Qwen3-235B-A22B-Instruct (Alibaba)

    • Size: 235B MoE (22B active).
    • License: Apache 2.0.
    • HF Repo: Qwen/Qwen3-235B-A22B-Instruct.
    • Pros: Outperforms GPT-4o/Llama-3.1-405B on MMLU/multilingual; 256K+ context; top on LMSYS for open models.
    • Cons: Slightly slower than pure dense models.
    • Best for: General chat, multilingual English tasks, long-context RAG.
  3. GLM-4.7 (Zhipu AI)

    • Size: ~400B MoE (exact parameter count unconfirmed).
    • License: MIT.
    • HF Repo: THUDM/glm-4.7 (or similar Zhipu repo).
    • Pros: Highest overall quality score (68/100 on WhatLLM); 89% LiveCodeBench; close to Claude Opus in coding/reasoning.
    • Cons: Newer, less community fine-tunes.
    • Best for: Coding/programming (r/LocalLLaMA favorite).
  4. Llama 4 Maverick / Scout (Meta)

    • Size: Maverick 400B MoE (17B active); Scout 109B MoE (10M token context!).
    • License: Llama License (permissive, commercial OK).
    • HF Repo: meta-llama/Llama-4-Maverick.
    • Pros: Multimodal (text/image/video); beats GPT-4o on STEM; huge context for docs.
    • Cons: Larger VRAM (100GB+); trails the 2026 Chinese releases on some benchmarks.
    • Best for: Multimodal, long-context analysis, enterprise.
  5. Kimi K2.5 (Moonshot AI)

    • Size: ~200-400B MoE.
    • License: Apache 2.0.
    • HF Repo: moonshot-ai/kimi-k2.5.
    • Pros: Top reasoning (GPQA/HLE leader); strong agentic abilities; ~1/8th the cost of closed models.
    • Cons: Less optimized for local (focus on cloud).
    • Best for: Complex reasoning, tools/agents.
  6. MiniMax-M2.1 (MiniMax)

    • Size: 230B MoE (10B active).
    • License: Apache 2.0.
    • HF Repo: minimax-ai/MiniMax-M2.1.
    • Pros: Agent-focused; 204K context; ~60 tokens/s generation.
    • Cons: Specialized (weaker general chat).
    • Best for: Automation, coding agents.

Comparison Table

| Model | Params (Active) | Key Benchmarks (MMLU / LiveCodeBench) | VRAM Est. (Q4 Quant) | Context Length | Pros | Cons |
|---|---|---|---|---|---|---|
| DeepSeek-V3.2 | 600B (30-60B) | 92% / 89% | 40-80GB | 128K+ | Reasoning/coding king | Size |
| Qwen3-235B | 235B (22B) | 91% / 85% | 50GB | 256K | Balanced, multilingual | Inference speed |
| GLM-4.7 | ~400B | 90% / 89% | 60GB | 128K | Coding excellence | Fewer variants |
| Llama 4 Maverick | 400B (17B) | 89% / 84% | 70GB | 1M | Multimodal, long context | VRAM-heavy |
| Kimi K2.5 | ~300B | 90% / 87% | 50GB | 128K | Agentic reasoning | Cloud-optimized |
| MiniMax-M2.1 | 230B (10B) | 88% / 86% | 40GB | 204K | Fast agents | Niche use |

Notes: Benchmark numbers are approximate, drawn from 2026 sources (e.g., WhatLLM, BentoML). Quantization (Q4/Q5 via GGUF) cuts VRAM by 50-70%. Always test on your own hardware.
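A quick way to sanity-check VRAM figures yourself: weight memory scales with total parameter count times bits per weight, while per-token compute in an MoE scales with the much smaller active-parameter count. Below is a minimal back-of-the-envelope sketch; the ~4.5 effective bits per weight for Q4 GGUF is an assumption, KV cache is ignored, and MoE runtimes can offload inactive experts to CPU RAM, which can pull GPU requirements below the raw weight size.

```python
# Back-of-the-envelope estimates; assumptions, not measurements.
# Q4 GGUF lands around ~4.5 effective bits/weight once quantization
# metadata is counted; FP16 is 16. KV cache and activations are ignored.

def weight_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """GB needed to hold the quantized weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def flops_per_token(active_params_billion: float) -> float:
    """Rough decode cost per token: ~2 FLOPs per active parameter."""
    return 2 * active_params_billion * 1e9

# Dense 72B model (e.g., a quantized Qwen3-72B for limited hardware):
print(f"72B dense @ Q4:   ~{weight_size_gb(72):.0f} GB of weights")
print(f"72B dense @ FP16: ~{weight_size_gb(72, 16):.0f} GB of weights")

# MoE trade-off: memory tracks TOTAL params, per-token speed tracks ACTIVE.
print(f"235B-total MoE @ Q4: ~{weight_size_gb(235):.0f} GB of weights, "
      f"yet only ~{flops_per_token(22) / 1e9:.0f} GFLOPs/token at 22B active")
```

The jump from FP16 (16 bits) to Q4/Q5 (~4.5-5.5 effective bits) is roughly where the 50-70% VRAM reduction quoted above comes from.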

Platforms & Tools to Get Started

  • Hugging Face: Download/run models (pip install transformers; pipeline("text-generation", model="deepseek-ai/DeepSeek-V3.2"); see the Python sketch after this list).
  • Ollama (best for local): ollama run qwen3:235b – easy, quantized, GPU/CPU.
  • LM Studio / Jan.ai: GUI for testing/running quantized models.
  • vLLM / TensorRT-LLM: Production serving (high throughput); a vLLM sketch also follows below.
  • Check Live Leaderboards: HF Open LLM Leaderboard, LMSYS Arena, Artificial Analysis.
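To make the Hugging Face and vLLM bullets concrete, here are two minimal Python sketches. Both assume the repo ids cited above (which may not match the final upload names) and hardware that can actually hold the model; treat them as templates rather than turnkey commands.

```python
# Minimal transformers sketch. device_map="auto" shards the model across
# whatever GPUs/CPU RAM you have; for 200B+ models, prefer a quantized
# build or a dedicated serving stack.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-V3.2",  # assumption: repo id from the list above
    device_map="auto",   # spread layers across available devices
    torch_dtype="auto",  # keep the checkpoint's native precision
)

result = generate(
    "Explain mixture-of-experts routing in two sentences.",
    max_new_tokens=128,
    do_sample=False,
)
print(result[0]["generated_text"])
```

For production-style batched serving, the equivalent vLLM sketch (tensor_parallel_size=8 assumes an 8-GPU node; tune it to yours):

```python
# Minimal vLLM offline-inference sketch for high-throughput generation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct",  # assumption: repo id from the list above
    tensor_parallel_size=8,                 # shard across 8 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the trade-offs of MoE models."], params)
print(outputs[0].outputs[0].text)
```

For single-machine testing, Ollama or LM Studio with a Q4 GGUF build is usually the lower-friction route.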

Choosing One

  • Limited hardware (<24GB VRAM): Quantized Qwen3-72B or DeepSeek-Coder-V2-Lite.
  • Coding: GLM-4.7 or DeepSeek-V3.2.
  • General/English chat: Qwen3 or Llama 4 Scout.
  • Server/deploy: Any MoE for cost-efficiency.

What’s your primary use case (e.g., coding, chatbots, RAG)? What hardware do you have (GPU VRAM)? Do you prefer small/fast models or max performance? Let me know for tailored recs!