This post introduces Coban, Grab's platform for real-time Kafka stream data quality monitoring using user-defined data contracts with syntactic and semantic test rules.
- Data contracts define the schema agreement for a Kafka stream, field-level semantic rules (string patterns, number ranges, constant values), and observability metadata used for alerting.
- LLM-based recommendations help users predict semantic test rules from Kafka stream schemas and anonymized sample data, reducing the overhead of defining rules manually.
- The Test Runner uses FlinkSQL as its compute engine, converting human-readable semantic rules into inverse SQL queries that select the records violating the defined contracts.
- Problematic data events are published to a dedicated Kafka topic and sunk to AWS S3; Grab's Genchi platform sends Slack notifications with sample data links and bad record counts.
- The solution actively monitors 100+ critical Kafka topics and enables immediate identification and halting of invalid data pro
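
The conversion of semantic rules into inverse SQL queries, as described for the Test Runner, can be illustrated with a minimal sketch. It assumes a hypothetical `FieldRule` structure covering the three rule types named above (string pattern, number range, constant value) and builds a query that selects only the violating records. The class, function, table, and field names are illustrative assumptions, not Grab's actual contract format or implementation; the only FlinkSQL features relied on are the built-in `REGEXP(string, regex)` function and backtick-quoted identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldRule:
    """Hypothetical field-level semantic rule (not Grab's real contract schema)."""
    field: str
    kind: str  # one of "pattern", "range", "constant"
    pattern: Optional[str] = None
    min_value: Optional[int] = None
    max_value: Optional[int] = None
    constant: Optional[str] = None


def quote(value: str) -> str:
    """Render a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def violation_predicate(rule: FieldRule) -> str:
    """Return a predicate that is TRUE when a record violates the rule."""
    col = f"`{rule.field}`"
    if rule.kind == "pattern":
        # Inverse of "value matches the pattern".
        return f"NOT REGEXP({col}, {quote(rule.pattern)})"
    if rule.kind == "range":
        # Inverse of "min <= value <= max"; either bound may be omitted.
        parts = []
        if rule.min_value is not None:
            parts.append(f"{col} < {rule.min_value}")
        if rule.max_value is not None:
            parts.append(f"{col} > {rule.max_value}")
        if not parts:
            raise ValueError(f"range rule for {rule.field} has no bounds")
        return " OR ".join(parts)
    if rule.kind == "constant":
        # Inverse of "value equals the constant".
        return f"{col} <> {quote(rule.constant)}"
    raise ValueError(f"unknown rule kind: {rule.kind}")


def build_inverse_query(table: str, rules: List[FieldRule]) -> str:
    """Build a query selecting records that break at least one rule."""
    if not rules:
        raise ValueError("at least one rule is required")
    predicates = " OR ".join(f"({violation_predicate(r)})" for r in rules)
    return f"SELECT * FROM `{table}` WHERE {predicates}"


if __name__ == "__main__":
    example_rules = [
        FieldRule("booking_id", "pattern", pattern="^[A-Z]{3}-[0-9]+$"),
        FieldRule("fare", "range", min_value=0, max_value=1000),
        FieldRule("currency", "constant", constant="SGD"),
    ]
    print(build_inverse_query("bookings", example_rules))
```

Writing the query in inverse form means the result contains only bad records, which matches the flow of publishing them to a dedicated topic and counting them for alerts. This sketch does not treat NULL values as violations; a real contract would need to state how missing fields are handled.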