DeepSeek-V4 optimizes 1M-token context for agents through efficient attention and specialized training.
- Hybrid attention (CSA/HCA) reduces the KV cache to 2% of that of standard grouped-query attention
- V4-Pro uses 27% of the inference FLOPs and 10% of the KV cache of V3.2; V4-Flash uses 10% and 7%, respectively
- Preserves reasoning across tool-call boundaries with an XML-based tool schema
- RL training via the DSec sandbox with fast image loading and trajectory replay
- Competitive agent benchmarks: 67.9 Terminal Bench, 80.6 SWE Verified, 73.6 MCPAtlas
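
To make the efficiency ratios in the list concrete, the short sketch below converts them into reduction factors and applies them to a baseline. Only the four percentages come from the summary; the helper names (`reduction_factor`, `scaled_cost`) and the 100-unit baseline are hypothetical, chosen purely for illustration and not measured V3.2 figures.

```python
# Ratios relative to DeepSeek-V3.2, as stated in the summary bullets.
RELATIVE_COST = {
    "V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}


def reduction_factor(ratio: float) -> float:
    """Return how many times smaller a cost is, given its ratio to the baseline."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the interval (0, 1]")
    return 1.0 / ratio


def scaled_cost(baseline: float, model: str, metric: str) -> float:
    """Scale a baseline (V3.2) cost by the stated ratio for a model and metric."""
    return baseline * RELATIVE_COST[model][metric]


if __name__ == "__main__":
    # Hypothetical baseline: 100 arbitrary units of V3.2 cost (an assumption).
    baseline_units = 100.0
    for model, ratios in RELATIVE_COST.items():
        for metric, ratio in ratios.items():
            cost = scaled_cost(baseline_units, model, metric)
            factor = reduction_factor(ratio)
            print(f"{model} {metric}: {cost:.1f} units ({factor:.1f}x smaller)")
```

Read this way, a 10% ratio corresponds to a tenfold reduction, and a 27% ratio to a bit under fourfold, which is the more intuitive framing when budgeting memory or compute for long-context agent runs.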