The Cloudflare Blog | Data Engineering

Announcing support for GROUP BY, SUM, and other aggregation queries in R2 SQL

2025-12-18
11 min read
by Jérôme Schneider

Endigest AI Core Summary

Cloudflare announces support for GROUP BY, SUM, and other aggregation queries in R2 SQL, its serverless analytics query engine over R2 Data Catalog.

  • Aggregations split into two phases: pre-aggregate computation on worker nodes, then final merge at the coordinator (scatter-gather)
  • Pre-aggregates allow horizontal scaling: for example, the pre-aggregate for count(*) is a partial row count, and the one for avg(value) stores the sum and the count separately so that partial results can be merged
  • Scatter-gather fails for ORDER BY/HAVING on aggregates when grouping by high-cardinality columns, as local top-N results can miss global leaders
  • Shuffling solves this via deterministic hash partitioning: every worker routes rows with the same GROUP BY key to the same destination worker, chosen by hashing the key
  • A synchronization barrier ensures all workers finish sending data before any worker computes final aggregates
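
The two strategies summarized above can be illustrated with a small, single-process Python sketch. It is not R2 SQL's implementation (the tags suggest the engine is written in Rust). The names PartialAvg, pre_aggregate, merge_partials, destination_worker, and shuffle are hypothetical, CRC32 is only a stand-in for whatever hash function the engine uses, and workers are simulated as plain lists. The sketch shows an avg(value) pre-aggregate kept as a sum and a count, the coordinator-side merge, and hash routing that sends every row for a given key to one worker.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PartialAvg:
    """Hypothetical pre-aggregate for avg(value): sum and count kept separately."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        # Sums and counts can be added together; averages cannot.
        self.total += other.total
        self.count += other.count

    def finalize(self):
        return self.total / self.count


def pre_aggregate(rows):
    """Worker phase: fold (key, value) rows into one partial state per group."""
    partials = defaultdict(PartialAvg)
    for key, value in rows:
        partials[key].add(value)
    return dict(partials)


def merge_partials(worker_outputs):
    """Coordinator phase: merge partial states and return {key: (count, avg)}."""
    merged = defaultdict(PartialAvg)
    for partials in worker_outputs:
        for key, state in partials.items():
            merged[key].merge(state)
    return {key: (s.count, s.finalize()) for key, s in merged.items()}


def destination_worker(key, num_workers):
    """Deterministic routing: the same key always maps to the same worker.

    CRC32 is an assumption for this sketch. Python's built-in hash() is
    avoided because string hashing is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def shuffle(worker_inputs, num_workers):
    """Route every row to the worker that owns its GROUP BY key.

    Returning only after all rows are routed stands in for the
    synchronization barrier in this sequential simulation.
    """
    inboxes = [[] for _ in range(num_workers)]
    for rows in worker_inputs:
        for key, value in rows:
            inboxes[destination_worker(key, num_workers)].append((key, value))
    return inboxes


# Two simulated workers, each holding a slice of the table.
worker_inputs = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0), ("c", 10.0)],
]

# Scatter-gather: workers pre-aggregate, the coordinator merges.
scatter_gather = merge_partials([pre_aggregate(rows) for rows in worker_inputs])

# Shuffle: after routing, each worker holds complete groups and can
# finalize (and then sort or filter) its groups without a global merge.
inboxes = shuffle(worker_inputs, num_workers=2)
shuffled = {}
for inbox in inboxes:
    for key, state in pre_aggregate(inbox).items():
        shuffled[key] = (state.count, state.finalize())
```

Both paths yield the same per-group results here. The difference is where each group becomes complete: at the coordinator for scatter-gather, and on the owning worker after the shuffle, which is what makes ORDER BY or HAVING on the aggregate safe to evaluate locally.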
Tags:
#R2
#Data
#Edge Computing
#Rust
#Serverless
#SQL