DeepSeek Unveils V3.2-Exp: Sparse Attention Promises Cheaper Long-Context AI
September 30, 2025
DeepSeek, a China-based AI laboratory, unveiled its latest experimental model, DeepSeek-V3.2-Exp, on September 29, promising substantial efficiency gains in both training and inference for Large Language Models (LLMs). At the heart of the release is “DeepSeek Sparse Attention” (DSA), a novel sparse attention mechanism.
To understand how DSA works, it helps to break it down into its two core components: a lightning indexer and a sparse variant of Multi-head Latent Attention (MLA). The indexer maintains a compact key cache of just 128 entries per token, a fraction of the full attention cache, which lets it rapidly score and filter candidate tokens so the model can handle long-context scenarios efficiently, as sketched below.
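A minimal sketch of this two-stage idea in plain NumPy, with hypothetical names and shapes (the production version runs as fused FP8 kernels inside MLA; the `top_k` budget here is an assumption for illustration, not a documented constant):

```python
import numpy as np

def lightning_indexer_scores(index_query, index_keys):
    """Score every cached token against the current query.

    index_query: (d_idx,)          small per-query feature vector
    index_keys:  (seq_len, d_idx)  compact per-token index keys
    Returns one relevance score per past token.
    """
    return index_keys @ index_query  # (seq_len,)

def sparse_attention(query, keys, values, index_query, index_keys, top_k=2048):
    """Two-stage sparse attention: cheap indexer scoring first,
    then full attention over only the top_k selected tokens."""
    seq_len, d = keys.shape
    scores = lightning_indexer_scores(index_query, index_keys)

    # Keep only the top_k most relevant past tokens.
    k = min(top_k, seq_len)
    selected = np.argpartition(scores, -k)[-k:]

    # Standard scaled dot-product attention, restricted to the selection.
    k_sel, v_sel = keys[selected], values[selected]
    logits = k_sel @ query / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ v_sel
```

The design choice that makes this pay off is that the indexer keys are much smaller than the full attention keys, so scanning the entire cache with the indexer stays cheap even when the expensive attention step touches only a fixed budget of tokens.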
Despite processing far fewer tokens during long-context tasks, DSA has been shown to perform on par with the model’s predecessor, DeepSeek-V3.1-Terminus. To put this achievement in perspective, consider the scores obtained by prominent LLMs: the Artificial Analysis Intelligence Index, which aggregates performance across 10 diverse evaluations, places DeepSeek-V3.2-Exp at a competitive level alongside Anthropic’s Claude 4.1 Opus (59 points), Gemini 2.5 Pro (60 points), and OpenAI’s GPT-5 (68 points).
The appeal of DSA goes beyond benchmark parity; it also brings significant computational efficiency gains. According to the company, MLA, the attention architecture that DSA builds on, runs approximately 5.6 times faster than traditional Multi-Head Attention (MHA), and DSA in turn is about 9 times faster than MLA.
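Where speedups of that order come from is easy to see at the level of raw attention work: dense attention compares each new query against every cached key, while a fixed selection budget caps that comparison count regardless of context length. A back-of-the-envelope illustration (the 2,048-token budget is a hypothetical figure, not one from DeepSeek’s announcement):

```python
context = 128_000  # tokens of context
budget  = 2_048    # hypothetical number of tokens the indexer keeps

dense_ops  = context  # keys each new token attends to (dense attention)
sparse_ops = budget   # keys attended to after top-k selection

print(f"dense attends to {dense_ops:,} keys per query")
print(f"sparse attends to {sparse_ops:,} keys per query")
print(f"reduction: {dense_ops / sparse_ops:.0f}x fewer attention comparisons")
# -> ~62x fewer core attention comparisons at 128k context; the indexer's
#    own scan is linear and lightweight, so it does not erase the saving.
```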
This achievement has been met with widespread acclaim from industry experts and enthusiasts alike. “The DeepSeek team cracked cheap long context for LLMs: a ~3.5x cheaper prefill and ~10x cheaper decode at 128k context at inference with the same quality,” said Deedy Das, partner at Menlo Ventures.
Beyond the attention mechanism itself, DeepSeek-V3.2-Exp features several other notable improvements. The model ships with Day-0 support for Huawei Ascend and Cambricon chips, further solidifying China’s growing domestic AI hardware stack. Additionally, the release incorporates TileLang, an ML compiler that lets users write Python code and compile it to optimized kernels for diverse hardware, offering considerable flexibility.
The API pricing has also been revised, with DeepSeek cutting prices by 50% or more: cache-hit input drops from $0.07 to $0.028 per 1M tokens, cache-miss input from $0.56 to $0.28, and output from $1.68 to $0.42.
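At those published rates, the savings are straightforward to compute. A quick sketch for a hypothetical request mix (100k uncached input tokens and 2k output tokens; the mix itself is illustrative):

```python
# DeepSeek's published per-million-token rates, in dollars.
old = {"cache_hit": 0.07,  "cache_miss": 0.56, "output": 1.68}
new = {"cache_hit": 0.028, "cache_miss": 0.28, "output": 0.42}

# Hypothetical request: 100k uncached input tokens, 2k output tokens.
tokens_in, tokens_out = 100_000, 2_000

def cost(rates):
    return (tokens_in  / 1e6 * rates["cache_miss"]
          + tokens_out / 1e6 * rates["output"])

print(f"old: ${cost(old):.4f}   new: ${cost(new):.4f}")
# old: $0.0594   new: $0.0288 -- a bit over 50% cheaper for this mix
```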
The availability of DeepSeek-V3.2-Exp on the DeepSeek app, web, and API allows users to seamlessly integrate this cutting-edge technology into their workflows. Furthermore, the model’s weights are now accessible on Hugging Face, providing developers with an opportunity to explore and build upon this innovation.
As AI continues to evolve at a rapid pace, it’s clear that DeepSeek is at the forefront of this revolution. With its groundbreaking introduction of DSA and other notable features, this experimental model sets the stage for even more exciting developments in the world of LLMs.