DeepSeek V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5
The whale has resurfaced.
DeepSeek, the Chinese AI startup and offshoot of the quantitative analysis firm High-Flyer Capital Management, became a near-overnight sensation globally (https://venturebeat.com/ai/why-everyone-in-ai-is-freaking-out-about-deepseek) in January 2025 with the release of its open source R1 model, which matched the models of proprietary U.S. giants.
It's been an epoch in AI since then, and while DeepSeek has released several (https://venturebeat.com/ai/deepseek-just-dropped-two-insanely-powerful-ai-models-that-rival-gpt-5-and?mc_cid=2dcac1da6a) updates (https://venturebeat.com/ai/deepseek-r1-0528-arrives-in-powerful-open-source-challenge-to-openai-o3-and-google-gemini-2-5-pro) to that model and its V3 series, the international AI and business community has largely been waiting with bated breath for the follow-up to the R1 moment.
Now it's arrived with last night's release (https://x.com/deepseek_ai/status/2047516922263285776) of DeepSeek-V4, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model available free under the commercially friendly, open source MIT License. It nears, and on some benchmarks surpasses, the performance of the world’s most advanced closed-source systems at approximately 1/6th the cost over the application programming interface (API).
This release, which DeepSeek AI researcher Deli Chen described on X (https://x.com/victor207755822/status/2047518146689732858?s=20) as a "labor of love" 484 days after the launch of V3, is being hailed as the "second DeepSeek moment".
As Chen noted in his post, "AGI belongs to everyone".
It's available now on AI code sharing community Hugging Face (https://huggingface.co/collections/deepseek-ai/deepseek-v4) and through DeepSeek's API (https://api-docs.deepseek.com/quick_start/pricing).
Frontier-class AI gets pushed into a lower price band
The most immediate impact of the DeepSeek-V4 launch is economic.
DeepSeek's published pricing shows the company is not pricing its new Pro model at near-zero levels, but it is still pushing high-end model access into a far lower cost tier than the leading U.S. frontier models.
DeepSeek-V4-Pro is priced through its API (https://api-docs.deepseek.com/quick_start/pricing) at $1.74 USD per 1 million input tokens on a cache miss and $3.48 per million output tokens.
That puts a simple one-million-input, one-million-output comparison at $5.22.
With cached input, the input price drops to $0.145 per million tokens, bringing that same blended comparison down to $3.625.
That is dramatically cheaper than the current premium pricing from OpenAI and Anthropic.
GPT-5.5 is priced at $5.00 per million input tokens and $30.00 per million output tokens, for a combined $35.00 in the same simple comparison.
Claude Opus 4.7 is priced at $5.00 input and $25.00 output, for a combined $30.00.
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Total cost (1M input + 1M output) | Source |
| --- | --- | --- | --- | --- |
| Grok 4.1 Fast | $0.20 | $0.50 | $0.70 | |
| MiniMax M2.7 | $0.30 | $1.20 | $1.50 | MiniMax (https://platform.minimax.io/docs/guides/models-intro) |
| Gemini 3 Flash | $0.50 | $3.00 | $3.50 | Google (https://ai.google.dev/pricing) |
| Kimi-K2.5 | $0.60 | $3.00 | $3.60 | |
| MiMo-V2-Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi MiMo (https://platform.xiaomimimo.com/) |
| GLM-5 | $1.00 | $3.20 | $4.20 | |
| GLM-5-Turbo | $1.20 | $4.00 | $5.20 | |
| DeepSeek V4 Pro | $1.74 | $3.48 | $5.22 | |
| GLM-5.1 | $1.40 | $4.40 | $5.80 | |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic (https://www.anthropic.com/pricing) |
| Qwen3-Max | $1.20 | $6.00 | $7.20 | Alibaba Cloud (https://www.alibabacloud.com/help/en/model-studio/developer-reference/model-pricing) |
| Gemini 3 Pro | $2.00 | $12.00 | $14.00 | Google (https://ai.google.dev/pricing) |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI (https://openai.com/pricing) |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI (https://openai.com/api/pricing/) |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic (https://www.anthropic.com/pricing) |
| Claude Opus 4.7 | $5.00 | $25.00 | $30.00 | Anthropic (https://platform.claude.com/docs/en/about-claude/pricing) |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI (https://openai.com/api/pricing/) |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 | OpenAI (https://openai.com/api/pricing/) |
On standard, cache-miss pricing, DeepSeek-V4-Pro comes in at roughly one-seventh the cost of GPT-5.5 and about one-sixth (1/6th) the cost of Claude Opus 4.7.
With cached input, the gap widens: DeepSeek-V4-Pro costs about one-tenth as much as GPT-5.5 and about one-eighth as much as Claude Opus 4.7.
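The ratios are easy to reproduce. The short Python sketch below uses only the per-million-token prices quoted in this article; the helper names `blended_cost` and `cost_ratio` are illustrative and not part of any vendor SDK.

```python
# Per-million-token API prices in USD (input, output), as quoted in the article.
PRICES = {
    "DeepSeek-V4-Pro (cache miss)": (1.74, 3.48),
    "DeepSeek-V4-Pro (cached input)": (0.145, 3.48),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def blended_cost(input_price: float, output_price: float) -> float:
    """Cost in USD of 1M input tokens plus 1M output tokens."""
    return input_price + output_price


def cost_ratio(cheaper: str, pricier: str) -> float:
    """How many times more the pricier model costs in the blended comparison."""
    return blended_cost(*PRICES[pricier]) / blended_cost(*PRICES[cheaper])


if __name__ == "__main__":
    for name, (inp, out) in PRICES.items():
        print(f"{name}: ${blended_cost(inp, out):.3f}")
    for rival in ("GPT-5.5", "Claude Opus 4.7"):
        for variant in ("DeepSeek-V4-Pro (cache miss)", "DeepSeek-V4-Pro (cached input)"):
            print(f"{rival} vs {variant}: {cost_ratio(variant, rival):.2f}x")
```

The ratios come out near 6.7x and 5.7x at cache-miss prices and near 9.7x and 8.3x with cached input, in line with the rounded fractions above. Note that the cached comparison applies caching only to DeepSeek's input price.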
The more extreme near-zero story belongs to DeepSeek-V4-Flash, not the Pro model.
Flash is priced at $0.14 per million input tokens on a cache miss and $0.28 per million output tokens, for a combined $0.42.
With cached input, that drops to $0.308.
In that case, DeepSeek’s cheaper model is more than 98% below GPT-5.5 and Claude Opus 4.7 in a simple input-plus-output comparison, or nearly 1/100th the cost — though the performance dips significantly.
DeepSeek is compressing advanced model economics into a much lower band, forcing developers and enterprises to revisit the cost-benefit calculation around premium closed models.
For companies running large inference workloads, that price gap can change what is worth automating.
Tasks that look too expensive on GPT-5.5 or Claude Opus 4.7 may become economically viable on DeepSeek-V4-Pro, and even more so on DeepSeek-V4-Flash.
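The one-million-in, one-million-out comparison is a simplification, since real workloads rarely split tokens evenly. As a rough sketch, the Python below estimates cost for an arbitrary token mix at the cache-miss list prices quoted in this article. The monthly volumes are hypothetical, and the `Pricing` class and `workload_cost` function are illustrative names rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, as quoted in the article."""
    input_per_m: float
    output_per_m: float


PRICING = {
    "DeepSeek-V4-Flash": Pricing(0.14, 0.28),
    "DeepSeek-V4-Pro": Pricing(1.74, 3.48),
    "Claude Opus 4.7": Pricing(5.00, 25.00),
    "GPT-5.5": Pricing(5.00, 30.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at cache-miss list prices."""
    p = PRICING[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 2B input tokens and 200M output tokens.
    monthly_in, monthly_out = 2_000_000_000, 200_000_000
    for model in PRICING:
        print(f"{model}: ${workload_cost(model, monthly_in, monthly_out):,.2f}")
```

With this input-heavy mix, GPT-5.5 costs about 3.8 times as much as DeepSeek-V4-Pro rather than 6.7 times, because the price difference is largest on output tokens. The savings a given team sees will depend on its own input-to-output ratio.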
The launch does not make intelligence free, but it does make the market harder for premium providers to defend on performance alone.
Benchmarking the frontier: DeepSeek-V4-Pro gets close, but GPT-5.5 and Opus 4.7 still lead on most shared tests
DeepSeek-V4-Pro-Max is best understood as a major open-weight leap, not a clean across-the-board defeat of the newest closed frontier systems.
The model’s strongest benchmark claims come from DeepSeek’s own comparison tables, where it is shown against GPT-5.4 xHigh, Claude Opus 4.6 Max and Gemini 3.1 Pro High and bests them on several tests, including Codeforces and Apex Shortlist.
But that is not the same as a head-to-head against OpenAI’s newer GPT-5.5 or Anthropic’s newer Claude Opus 4.7.
Looking only at DeepSeek V4 versus the latest proprietary models, the picture is more restrained.
On this shared set, GPT-5.5 and Claude Opus 4.7 still lead most categories.
DeepSeek-V4-Pro-Max’s best showing is on BrowseComp, the benchmark measuring agentic AI web browsing prowess (especially locating hard-to-find information), where it scores 83.4%, narrowly behind GPT-5.5 at 84.4% and ahead of Claude Opus 4.7 at 79.3%.
On Terminal-Bench 2.0, DeepSeek scores 67.9%, close to Claude Opus 4.7’s 69.4%, but far behind GPT-5.5’s 82.7%.
| Benchmark | DeepSeek-V4-Pro-Max | GPT-5.5 | GPT-5.5 Pro (where shown) | Claude Opus 4.7 | Best result among these |
| --- | --- | --- | --- | --- | --- |
| GPQA Diamond | 90.1% | 93.6% | n/a | 94.2% | Claude Opus 4.7 |
| Humanity’s Last Exam, no tools | 37.7% | 41.4% | 43.1% | 46.9% | Claude Opus 4.7 |
| Humanity’s Last Exam, with tools | 48.2% | 52.2% | 57.2% | 54.7% | GPT-5.5 Pro |
| Terminal-Bench 2.0 | 67.9% | 82.7% | n/a | 69.4% | GPT-5.5 |
| SWE-Bench Pro / SWE Pro | 55.4% | 58.6% | n/a | 64.3% | Claude Opus 4.7 |
| BrowseComp | 83.4% | 84.4% | 90.1% | 79.3% | GPT-5.5 Pro |
| MCP Atlas / MCPAtlas Public | 73.6% | 75.3% | n/a | 79.1% | Claude Opus 4.7 |
The shared academic-reasoning results favor the closed models: On GPQA Diamond, DeepSeek-V4-Pro-Max scores 90.1%, while GPT-5.5 reaches 93.6% and Claude Opus 4.7 reaches 94.2%.
On Humanity’s Last Exam without tools, DeepSeek scores 37.7%, behind GPT-5.5 at 41.4%, GPT-5.5 Pro at 43.1% and Claude Opus 4.7 at 46.9%.
With tools enabled, DeepSeek rises to 48.2%, but still trails GPT-5.5 at 52.2%, GPT-5.5 Pro at 57.2% and Claude Opus 4.7 at 54.7%.
The agentic and software-engineering results are more mixed, but they still show DeepSeek-V4-Pro-Max trailing GPT-5.5 on every test and Claude Opus 4.7 on all but BrowseComp.
On Terminal-Bench 2.0, DeepSeek’s 67.9% is competitive with Claude Opus 4.7’s 69.4%, but GPT-5.5 is much higher at 82.7%.
On SWE-Bench Pro, DeepSeek’s 55.4% trails GPT-5.5 at 58.6% and Claude Opus 4.7 at 64.3%.
On MCP Atlas, DeepSeek’s 73.6% is slightly behind GPT-5.5 at 75.3% and Claude Opus 4.7 at 79.1%.
BrowseComp is the standout: DeepSeek’s 83.4% beats Claude Opus 4.7’s 79.3% and nearly matches GPT-5.5’s 84.4%, though GPT-5.5 Pro’s 90.1% remains well ahead.
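To see how far behind DeepSeek sits on each row, the sketch below computes the gap in percentage points to the best standard-tier rival, using the scores from the table in this article. GPT-5.5 Pro is left out because it is reported on only some benchmarks, and the function name `gap_to_best_rival` is illustrative.

```python
MODEL = "DeepSeek-V4-Pro-Max"

# Scores in percent, taken from the article's shared-benchmark table.
SCORES = {
    "GPQA Diamond": {MODEL: 90.1, "GPT-5.5": 93.6, "Claude Opus 4.7": 94.2},
    "HLE (no tools)": {MODEL: 37.7, "GPT-5.5": 41.4, "Claude Opus 4.7": 46.9},
    "HLE (with tools)": {MODEL: 48.2, "GPT-5.5": 52.2, "Claude Opus 4.7": 54.7},
    "Terminal-Bench 2.0": {MODEL: 67.9, "GPT-5.5": 82.7, "Claude Opus 4.7": 69.4},
    "SWE-Bench Pro": {MODEL: 55.4, "GPT-5.5": 58.6, "Claude Opus 4.7": 64.3},
    "BrowseComp": {MODEL: 83.4, "GPT-5.5": 84.4, "Claude Opus 4.7": 79.3},
    "MCP Atlas": {MODEL: 73.6, "GPT-5.5": 75.3, "Claude Opus 4.7": 79.1},
}


def gap_to_best_rival(row: dict, model: str = MODEL) -> tuple:
    """Return (best rival, rival score minus model score) in percentage points."""
    rivals = {name: score for name, score in row.items() if name != model}
    best = max(rivals, key=rivals.get)
    return best, round(rivals[best] - row[model], 1)


if __name__ == "__main__":
    for benchmark, row in SCORES.items():
        rival, gap = gap_to_best_rival(row)
        print(f"{benchmark}: {gap:.1f} points behind {rival}")
```

The smallest gap is on BrowseComp, at 1.0 point behind GPT-5.5, and the largest is on Terminal-Bench 2.0, at 14.8 points behind GPT-5.5.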
So ultimately, DeepSeek-V4-Pro-Max does not appear to dethrone GPT-5.5 or Claude Opus 4.7 on the benchmarks that can be directly compared across the companies’ published tables.
But it gets close enough on several of them — especially BrowseComp, Terminal-Bench 2.0 and MCP Atlas — that its much lower API pricing becomes the headline.
In practical terms, DeepSeek does not need to win every leaderboard row to matter.
If it can deliver near-frontier performance on many enterprise-relevant agent and reasoning tasks at roughly one-sixth to one-seventh the standard API cost of GPT-5.5 or Claude Opus 4.7, it still forces a major rethink of the economics of advanced AI deployment.
DeepSeek-V4-Pro-Max is clearly the strongest open-weight model in the field right now, and it is unusually close to frontier closed systems on several practical benchmarks.
While GPT-5.5 and Claude Opus 4.7 still retain the lead in most direct head-to-head comparisons across the companies' benchmark charts, DeepSeek-V4-Pro gets close while being dramatically cheaper and openly available.
A big jump from DeepSeek V3.2
To understand the magnitude of this release, one must look at the performance gains of the base models.
DeepSeek-V4-Pro-Base represents a significant advancement over the previous generation, DeepSeek-V3.2-Base.
On world knowledge, V4-Pro-Base scored 90.1 on MMLU (5-shot), up from V3.2-Base's 87.8, and jumped from 65.5 to 73.5 on MMLU-Pro.
The improvement in high-level reasoning and verified facts is even more pronounced: on SuperGPQA, V4-Pro-Base reached 53.9 compared to V3.2's 45.0, and on the FACTS Parametric benchmark, it more than doubled its predecessor's performance, jumping from 27.1 to 62.6.
SimpleQA Verified scores also rose dramatically, from 28.3 to 55.2.
The Long Context capabilities have also been refined.
On LongBench-V2, V4-Pro-Base scored 51.5, significantly outpacing the 40.2 achieved by V3.2-Base.
In Code and Math, V4-Pro-Base reached 76.8 on HumanEval (Pass@1), up from 62.8 on V3.2-Base.
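For a sense of scale, the sketch below turns the base-model scores quoted in this section into absolute and relative gains. The numbers are the ones reported above; the `gains` helper is an illustrative name.

```python
# benchmark: (DeepSeek-V3.2-Base score, DeepSeek-V4-Pro-Base score), from the article
BASE_SCORES = {
    "MMLU (5-shot)": (87.8, 90.1),
    "MMLU-Pro": (65.5, 73.5),
    "SuperGPQA": (45.0, 53.9),
    "FACTS Parametric": (27.1, 62.6),
    "SimpleQA Verified": (28.3, 55.2),
    "LongBench-V2": (40.2, 51.5),
    "HumanEval (Pass@1)": (62.8, 76.8),
}


def gains(old: float, new: float) -> tuple:
    """Return (absolute gain in points, relative gain in percent)."""
    return round(new - old, 1), round(100 * (new - old) / old, 1)


if __name__ == "__main__":
    for benchmark, (old, new) in BASE_SCORES.items():
        absolute, relative = gains(old, new)
        print(f"{benchmark}: +{absolute} points ({relative}% relative)")
```

The relative view shows where the jump is largest: FACTS Parametric improves by roughly 131%, while MMLU, already near saturation, moves by under 3%.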
These numbers underscore that DeepSeek has not just optimized for inference cost, but has fundamentally improved the intelligence density of its base architecture.
The efficiency story is equally compelling for the Flash variant.
DeepSeek-V4-Flash-Base, despite using substantially fewer parameters, outperforms the larger V3.2-Base across a wide range of benchmarks, particularly in long-context scenarios.
A new information 'traffic controller,' Manifold-Constrained Hyper-Connections (mHC)
DeepSeek’s ability to offer these prices and performance figures is rooted in radical architectural innovations detailed in its technical report also released today, "Towards Highly Efficient Million-Token Context Intelligence."
The standout technical achievement of V4 is its native one-million-token context window.
Historically, maintaining such a large context required massive memory for the key-value (KV) cache.
DeepSeek solved this by introducing a Hybrid Attention Architecture that combines Compressed Sparse Attention (CSA) to reduce initial token dimensionality and Heavily Compressed Attention (HCA) to aggressively compress the memory footprint for long-range dependencies.
In practice, the V4-Pro model requires only 10% of the KV cache and 27% of the single-token inference FLOPs compared to its predecessor, DeepSeek-V3.2, even when operating at a 1M token context.
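To illustrate why the KV cache becomes the bottleneck at long context, here is a minimal sketch of the textbook size formula for an uncompressed cache in a standard attention stack. The layer count, head count and head dimension are hypothetical and are not DeepSeek's configuration; the 10% ratio is the article's figure relative to V3.2 and is applied to this made-up baseline only to show scale.

```python
def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Size of an uncompressed key/value cache for one sequence.

    Each layer stores two tensors (keys and values), and each tensor holds
    num_kv_heads * head_dim values per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical configuration chosen for illustration, NOT DeepSeek's real one.
    baseline = kv_cache_bytes(seq_len=1_000_000, num_layers=60,
                              num_kv_heads=8, head_dim=128)
    reduced = 0.10 * baseline  # scale by the article's "10% of the KV cache" figure
    gib = 1024 ** 3
    print(f"hypothetical baseline at 1M tokens: {baseline / gib:.1f} GiB")
    print(f"same cache scaled to 10%:           {reduced / gib:.1f} GiB")
```

Under these assumptions a single one-million-token sequence would need roughly 229 GiB of cache at 16-bit precision, and about 23 GiB at one-tenth of that. The cache grows linearly with sequence length, which is why compression matters most at the million-token scale.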
To stabilize a network of 1.6 trillion parameters, DeepSeek moved beyond traditional residual connections.
The company's researchers incorporated Manifold-Constrained Hyper-Connections (mHC) to strengthen signal propagation across layers while preserving the model’s expressivity.
mHC allows an AI to have a much wider flow of information (so it can learn more complex things) without the risk of the model becoming unstable or "breaking" during its training.
It’s like giving a city a 10-lane highway but adding a perfect AI traffic controller to ensure no one ever hits the brakes.
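The article does not spell out what the manifold constraint is. The sketch below assumes, purely as an illustration and not as a description of DeepSeek's implementation, that the matrix mixing several parallel residual streams is kept doubly stochastic (every row and column sums to 1) by Sinkhorn normalization. Under that assumption, mixing can neither amplify the largest value nor change the total signal across streams, which is the stability intuition behind the traffic-controller analogy. All function names are hypothetical.

```python
def sinkhorn(matrix, iterations=50):
    """Alternately normalize the rows and columns of a positive square matrix.

    The result approaches a doubly stochastic matrix, in which every row
    and every column sums to 1.
    """
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        for row in m:
            total = sum(row)
            for j in range(n):
                row[j] /= total
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
    return m


def mix_streams(mixing, streams):
    """Mix residual streams: new_stream[i] = sum_j mixing[i][j] * streams[j]."""
    width = len(streams[0])
    return [
        [sum(mixing[i][j] * streams[j][k] for j in range(len(streams)))
         for k in range(width)]
        for i in range(len(mixing))
    ]


if __name__ == "__main__":
    raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]  # arbitrary weights
    mixing = sinkhorn(raw)
    streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three toy residual streams
    mixed = mix_streams(mixing, streams)
    print("row sums:", [round(sum(row), 6) for row in mixing])
    print("per-feature totals before:", [sum(s[k] for s in streams) for k in range(2)])
    print("per-feature totals after: ", [round(sum(s[k] for s in mixed), 6) for k in range(2)])
```

Because each output stream is a weighted average of the inputs, repeated mixing across many layers cannot blow up the signal, whereas an unconstrained mixing matrix could.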
This is paired with the Muon optimizer, which allowed the team to achieve faster convergence and greater training stability during the pre-training on more than 32T diverse and high-quality tokens.
This pre-training data was filtered to remove batched auto-generated content, mitigating the risk of model collapse, and to prioritize material with unique academic value.
The model’s 1.6T parameters utilize a Mixture-of-Experts (MoE) design where only 49B parameters are activated per token, further driving down compute requirements.
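The savings from sparse activation follow from simple arithmetic: 49B active out of 1.6T total is about 3% of the weights per token. The sketch below computes that fraction and shows a generic softmax top-k router of the kind commonly used in MoE layers. The router is a textbook illustration, not DeepSeek's gating function, and the expert count and `k` in the demo are hypothetical.

```python
import math


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their softmax weights.

    Returns a dict mapping expert index to mixing weight (weights sum to 1).
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    # Figures from the article: 49B active parameters out of 1.6T total.
    print(f"active share per token: {active_fraction(49e9, 1.6e12):.2%}")
    # Hypothetical router scores for four experts, keeping the top two.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, so compute per token scales with the active parameter count rather than the full 1.6 trillion.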
Training the mixture-of-experts (MoE) to work as a whole
DeepSeek-V4 was not simply trained; it was "cultivated" through a unique two-stage paradigm.
First, through Independent Expert Cultivation, domain-specific experts were trained through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) using the GRPO (Group Relative Policy Optimization) algorithm.
This allowed each expert to master specialized skills like mathematical reasoning or codebase analysis.
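The core idea of GRPO, as described in DeepSeek's earlier published work, is to score each sampled answer relative to the other answers drawn for the same prompt, rather than against a separate learned value model. The sketch below shows only that normalization step. The reward values are made up, and the use of the population standard deviation and the small `eps` term are assumptions of this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    Answers that beat the group average get a positive advantage and the
    rest get a negative one; eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled answers to one prompt
    # (1.0 = passed a correctness check, 0.0 = failed).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

These advantages then weight the policy update, so the model is pushed toward the answers that did better than their siblings.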
Second, Unified Model Consolidation integrated these distinct proficiencies into a single model via on-policy distillation, in which the unified model acts as the student and is trained to minimize a reverse KL loss against the teacher models.
This distillation process ensures that the model preserves the specialized capabilities of each expert while operating as a cohesive whole.
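Reverse KL here means the divergence is measured from the student's side: the expectation is taken over the student's own next-token distribution, which fits the on-policy setup in which the student generates the samples. The sketch below computes it for a single token position from raw logits. It is a minimal illustration with made-up logits, not DeepSeek's training code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    peak = max(logits)
    log_total = math.log(sum(math.exp(x - peak) for x in logits))
    return [x - peak - log_total for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher), with the expectation taken under the student."""
    log_s = log_softmax(student_logits)
    log_t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_s, log_t))


if __name__ == "__main__":
    student = [3.0, 0.0]  # hypothetical logits over a two-token vocabulary
    teacher = [0.0, 0.0]
    print(f"KL(student || teacher) = {reverse_kl(student, teacher):.4f}")
    print(f"KL(teacher || student) = {reverse_kl(teacher, student):.4f}")
```

The two directions give different values, which is why the choice matters. Reverse KL penalizes the student most for placing probability where the teacher places little, so the student tends to commit to the teacher's preferred outputs instead of spreading mass across everything the teacher might say.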
The model’s reasoning capabilities are further segmented into three increasing "effort" modes.
The "Non-think" mode provides fast, intuitive responses for routine tasks.
"Think High" provides conscious logical analysis for complex problem-solving.
Finally, "Think Max" pushes the boundaries of model reasoning, bridging the gap with frontier models on complex reasoning and agentic tasks.
This flexibility allows users to match the compute effort to the difficulty of the task, further enhancing cost-efficiency.
Breaking the Nvidia GPU stranglehold with local Chinese Huawei Ascend NPUs
While the model weights are the headline, the software stack released alongside them is arguably more important for the future of "Sovereign AI."
Analyst Rui Ma (https://x.com/ruima/status/2047548613711327541) highlighted a single sentence from the release as the most critical: DeepSeek validated its fine-grained Expert Parallelism (EP) scheme on Huawei Ascend (https://www.reuters.com/business/media-telecom/huawei-ascend-supernode-support-deepseek-v4-2026-04-24/) NPUs, or neural processing units (https://en.wikipedia.org/wiki/Neural_processing_unit).
By achieving a 1.50x to 1.73x speedup on non-Nvidia GPU platforms, DeepSeek has provided a blueprint for high-performance AI deployment that is resilient to Western GPU supply chains and export controls.
However, it's important to note that DeepSeek still claims it used officially licensed, legal Nvidia GPUs for DeepSeek V4's training, in addition to the Huawei NPUs.
DeepSeek has also open-sourced the MegaMoE mega-kernel as a component of its DeepGEMM library.
This CUDA-based implementation delivers up to a 1.96x speedup for latency-sensitive tasks like RL rollouts and high-speed agent serving.
This move ensures that developers can run these massive models with extreme efficiency on existing hardware, further cementing DeepSeek’s role as the primary driver of open-source AI infrastructure.
The technical report emphasizes that these optimizations are crucial for supporting a standard 1M context across all official services.
Licensing and local deployment
DeepSeek-V4 is released under the MIT License, one of the most permissive licenses in wide use.
This allows developers to use, copy, modify, and distribute the weights for commercial purposes without royalties—a stark contrast to the "restricted" open-weight licenses favored by other companies.
For local deployment, DeepSeek recommends setting sampling parameters to temperature = 1.0 and top_p = 1.0.
For those utilizing the "Think Max" reasoning mode, the team suggests setting the context window to at least 384K tokens to avoid truncating the model's internal reasoning chains.
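As a rough sketch, these recommendations can be captured in a small settings check. The `LocalRunConfig` class, its field names and the mode labels are hypothetical and do not correspond to any official DeepSeek tool, and the code assumes "384K" means 384 × 1024 tokens.

```python
from dataclasses import dataclass

# Assumption: "384K" is read as 384 * 1024 tokens.
THINK_MAX_MIN_CONTEXT = 384 * 1024


@dataclass(frozen=True)
class LocalRunConfig:
    """Hypothetical settings container; field names are illustrative."""
    mode: str = "non-think"           # "non-think", "think-high" or "think-max"
    temperature: float = 1.0          # value recommended in the article
    top_p: float = 1.0                # value recommended in the article
    context_window: int = 128 * 1024  # arbitrary default for this sketch


def validate(config: LocalRunConfig) -> list:
    """Return warnings for settings that depart from the article's guidance."""
    warnings = []
    if config.temperature != 1.0 or config.top_p != 1.0:
        warnings.append("sampling differs from the recommended temperature=1.0, top_p=1.0")
    if config.mode == "think-max" and config.context_window < THINK_MAX_MIN_CONTEXT:
        warnings.append("Think Max may truncate reasoning below a 384K-token context window")
    return warnings


if __name__ == "__main__":
    print(validate(LocalRunConfig(mode="think-max")))
    print(validate(LocalRunConfig(mode="think-max", context_window=THINK_MAX_MIN_CONTEXT)))
```

The first call reports the context-window warning and the second returns an empty list.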
The release includes a dedicated encoding folder with Python scripts demonstrating how to encode messages in OpenAI-compatible format and parse the model's output, including reasoning content.
DeepSeek-V4 is also seamlessly integrated with leading AI agents like Claude Code, OpenClaw, and OpenCode.
This native integration underscores its role as a bedrock for developer tools, providing an open-source alternative to the proprietary ecosystems of major cloud providers.
Community reactions and what comes next
The community reaction has been one of shock and validation. Hugging Face (https://x.com/huggingface/status/2047572895832915977?s=20) officially welcomed the "whale" back, stating that the era of cost-effective 1M context length has arrived.
Industry experts noted that the "second DeepSeek moment" (https://x.com/AILeaksAndNews/status/2047650325943714014) has effectively reset the developmental trajectory of the entire field, placing massive pressure on closed-source providers like OpenAI and Anthropic to justify their premiums.
AI evaluation firm Vals AI (https://x.com/ValsAI/status/2047513613750202452) noted that DeepSeek-V4 is now the "#1 open-weight model on our Vibe Code Benchmark, and it’s not close".
DeepSeek is moving quickly to retire its older architectures.
The company announced that the legacy deepseek-chat and deepseek-reasoner endpoints will be fully retired on July 24, 2026.
All traffic is currently being rerouted to the V4-Flash architecture, signifying a total transition to the million-token standard.
DeepSeek-V4 is more than just a new model; it is a challenge to the status quo.
By proving that architectural innovation can substitute for raw compute-maximalism, DeepSeek has made the highest levels of AI intelligence accessible to the global developer community at a far lower cost. That could benefit the globe, even at a time when lawmakers and leaders in Washington, D.C. are raising concerns (https://www.bbc.com/news/articles/cpqxgxx9nrqo) about Chinese labs "distilling" from U.S. proprietary giants to train open source models, along with fears of open source or jailbroken proprietary models (https://www.politico.com/news/2026/04/22/ai-chatbots-jailbreak-safety-00887869) being used to create weapons and commit terror.
The truth is that all of these are potential risks, as they were with prior technologies that broadened information access, like search and the internet itself (https://www.washingtonpost.com/news/wonk/wp/2015/04/02/dianne-feinstein-says-the-anarchists-cookbook-should-be-removed-from-the-internet/). But the benefits seem to far outweigh them, and DeepSeek's quest to keep frontier AI models open benefits the entire planet of potential AI users, especially enterprises looking to adopt the cutting edge at the lowest possible cost.