Dropbox Tech Blog | Machine Learning

Half-Quadratic Quantization of large machine learning models

2025-10-22
30 min read
by Hicham Badri, Appu Shaji, Craig Wilhite, Josh Clemm, Jason Shang, Artem Nabirkin, Dropbox Team, Ameya Bhatawdekar, Sean-Michael Lewis

Summary

This post introduces Half-Quadratic Quantization (HQQ), a calibration-free quantization method for large machine learning models that achieves calibration-based quality at data-free speeds.

  • HQQ minimizes weight quantization error using a sparsity-promoting ℓp norm with p < 1, modeling outliers via a hyper-Laplacian distribution instead of squared error
  • A Half-Quadratic solver splits the non-convex optimization into two closed-form sub-problems solved via alternating optimization, avoiding gradient descent entirely
  • HQQ quantizes Llama-2-70B in under 5 minutes, over 50x faster than GPTQ, and over 100x faster than autograd-based approaches for Llama-2-7B
  • At 2-bit precision, HQQ on Llama-2-70B achieves lower perplexity than full-precision Llama-2-13B at comparable memory usage
  • The method generalizes beyond LLMs; it is also benchmarked on OpenCLIP Vision Transformer (ViT) models, evaluated on zero-shot ImageNet accuracy
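The alternating closed-form sub-problems described above can be sketched in NumPy. This is an illustrative toy, not HQQ's implementation: the function names (`quantize`, `shrink_lp`, `hqq_solve`), the scalar scale/zero-point (real HQQ works per-group over weight matrices), the β schedule, and the exact zero-point update form are all assumptions. The first sub-problem is a generalized soft-thresholding step (the proximal operator of the ℓp norm), and the second is a closed-form zero-point update; no gradient descent is involved.

```python
import numpy as np

def quantize(W, scale, zero, nbits=4):
    # Affine quantize-then-dequantize (simulated quantization).
    q = np.clip(np.round(W / scale + zero), 0, 2**nbits - 1)
    return scale * (q - zero)

def shrink_lp(x, beta, p=0.7):
    # Generalized soft-thresholding: closed-form solution of the
    # l_p (p < 1) proximal sub-problem, which keeps large outliers
    # and zeroes out small residuals.
    ax = np.abs(x) + 1e-12                        # avoid 0 ** (p - 1)
    return np.sign(x) * np.maximum(ax - ax ** (p - 1) / beta, 0.0)

def hqq_solve(W, scale, zero, nbits=4, iters=20, beta=1.0, kappa=1.01, p=0.7):
    # Alternate between the two closed-form sub-problems:
    #   1) W_e  <- shrink_lp(W - dequant(W))        (outlier estimate)
    #   2) zero <- mean(q - (W - W_e) / scale)      (zero-point update; assumed form)
    for _ in range(iters):
        Wq = quantize(W, scale, zero, nbits)
        We = shrink_lp(W - Wq, beta, p)
        q = np.clip(np.round(W / scale + zero), 0, 2**nbits - 1)
        zero = np.mean(q - (W - We) / scale)
        beta *= kappa                              # gradually tighten the coupling
    return zero
```

Because every step is closed-form, each iteration is just a few elementwise array operations, which is why this style of solver can quantize very large models in minutes rather than hours.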