Granite 4.1 comprises dense, decoder-only LLMs in three sizes (3B, 8B, 30B) trained on 15 trillion tokens with multi-stage optimization techniques.
- Decoder-only transformer architecture with Grouped Query Attention, Rotary Position Embeddings, SwiGLU activations, and RMSNorm
- Five-phase pre-training progression from general web data to curated high-quality and domain-specific content
- Context window extended to 512K tokens through staged long-context training phases
- Supervised fine-tuning with an LLM-as-Judge quality framework evaluating 4.1M samples across six dimensions
- Multi-stage reinforcement learning using on-policy GRPO with DAPO loss for domain performance optimization
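The architecture components named in the first bullet can be sketched together. This is a minimal NumPy illustration of the general techniques (Grouped Query Attention, RMSNorm, SwiGLU), not Granite's actual implementation; all function names, shapes, and the head counts are illustrative assumptions, and rotary embeddings and masking are omitted for brevity.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the features;
    # unlike LayerNorm, no mean subtraction (learned gain omitted here).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU(x @ W_gate) gates (x @ W_up),
    # then the result is projected back down.
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ w_up)) @ w_down

def grouped_query_attention(q, k, v, n_kv_heads):
    # q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).
    # GQA: each group of query heads shares one key/value head,
    # shrinking the KV cache relative to full multi-head attention.
    n_heads, _, d = q.shape
    repeat = n_heads // n_kv_heads
    k = np.repeat(k, repeat, axis=0)  # broadcast KV heads to query heads
    v = np.repeat(v, repeat, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v
```

With, say, 8 query heads and 2 KV heads, each group of 4 query heads attends against the same shared key/value head.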
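The last bullet's GRPO step rests on a group-relative advantage: several completions are sampled per prompt, and each reward is normalized against its group's mean and standard deviation instead of a learned value function. A minimal sketch of that normalization (the function name is illustrative; the DAPO loss modifications are not shown):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: z-score each completion's reward within its
    # sampling group, so no separate critic/value model is needed.
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For a group of rewards like [1.0, 0.0, 1.0, 0.0], the successes get positive advantages and the failures negative ones, summing to zero across the group.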
This summary was automatically generated by AI based on the original article and may not be fully accurate.