Researchers at DeepSeek have released a new experimental AI model, V3.2-exp, designed to dramatically lower inference costs in long-context operations. The model was announced Monday via Hugging Face, accompanied by an academic paper posted on GitHub.
At the core of the release is DeepSeek Sparse Attention, a new attention mechanism built around two modules. The first, a “lightning indexer,” prioritizes the most relevant excerpts from the context window. The second, a “fine-grained token selection system,” picks specific tokens from within those excerpts to load into the model’s limited attention window. Together, the two modules let the model process long inputs with comparatively modest server load.
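For readers who want a feel for the mechanics, the sketch below illustrates the general shape of an index-then-select sparse attention scheme: a cheap scoring pass over coarse blocks of the context, followed by a fine-grained top-k selection of tokens, with full attention computed only over the survivors. This is a simplified illustration under assumptions, not DeepSeek's implementation; the scoring rules, block size, and token budget here are placeholders chosen for clarity.

```python
import numpy as np

def sparse_attention_sketch(q, keys, values, block_size=64, top_k_tokens=256):
    """Illustrative index-then-select sparse attention for a single query.

    Stage 1: a lightweight "indexer" scores fixed-size blocks of the context.
    Stage 2: a fine-grained pass picks the top-k tokens from the best blocks.
    Stage 3: ordinary softmax attention runs over only those selected tokens.
    All scoring heuristics below are placeholders, not DeepSeek's method.
    """
    n, d = keys.shape

    # Stage 1: cheap block-level scores (dot product with each block's mean key).
    block_scores = []
    for start in range(0, n, block_size):
        block = keys[start:start + block_size]
        block_scores.append(q @ block.mean(axis=0))
    block_scores = np.array(block_scores)

    # Keep only the most promising blocks (top quarter, an arbitrary budget).
    n_keep = max(1, len(block_scores) // 4)
    kept_blocks = np.argsort(block_scores)[-n_keep:]

    # Stage 2: fine-grained token selection within the kept blocks.
    candidate_idx = np.concatenate([
        np.arange(b * block_size, min((b + 1) * block_size, n))
        for b in kept_blocks
    ])
    token_scores = keys[candidate_idx] @ q
    top = candidate_idx[np.argsort(token_scores)[-top_k_tokens:]]

    # Stage 3: standard softmax attention over only the selected tokens.
    logits = keys[top] @ q / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[top]

# Example: a 4,096-token context attended through a 256-token budget.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
keys = rng.standard_normal((4096, 64))
values = rng.standard_normal((4096, 64))
out = sparse_attention_sketch(q, keys, values)
print(out.shape)  # (64,)
```

The point of the pattern is that the expensive softmax runs over a few hundred selected tokens rather than the full context, which is where the reduced server demands in long-context serving would come from.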
Preliminary testing suggests the system could cut API costs in half for long-context scenarios. While more robust third-party testing is needed, the model is open-weight and freely available, making independent verification likely in the near term.
The breakthrough adds to a growing set of innovations targeting the cost of inference, that is, the ongoing expense of serving a trained model, as distinct from the upfront cost of training it. DeepSeek’s approach focuses on making the core transformer architecture itself run more efficiently.
DeepSeek, based in China, has positioned itself as an unconventional player in the global AI race. Earlier this year, it gained attention with its R1 model, trained primarily through reinforcement learning at a fraction of U.S. competitors’ costs. While R1 did not deliver the predicted upheaval in AI training, V3.2-exp may influence how providers approach the persistent problem of inference expenses.
Though unlikely to generate the same level of debate as R1, DeepSeek’s sparse attention method could still offer practical lessons for U.S. and global AI developers struggling with long-context efficiency.

