This post describes Square's development of a RoBERTa-based ML model for accurately categorizing merchants into business types.
• Square uses merchant categorization for product personalization, business strategy, growth targeting, product eligibility, and assigning accurate Merchant Category Codes (MCCs) for payment processing
• Historically, merchants self-selected a category during onboarding; these labels are highly inaccurate because merchants rush through signup, the categories are overly granular, and the category definitions are unclear
• Previous ML approaches relied on outdated methods and over-indexed on self-selected labels as ground truth
• The new RoBERTa model is trained on more than 20,000 manually reviewed sellers, which provide high-quality ground truth labels
• Data preprocessing includes removing auto-created services, ranking catalog items by purchase frequency, and formatting inputs as structured prompt strings
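
The three preprocessing steps in the last bullet can be sketched roughly as follows. This is a minimal illustration, not Square's actual pipeline: the `CatalogItem` fields, the `auto_created` flag, the `top_k` cutoff, and the prompt template are all assumptions made for the example, since the post does not specify them.

```python
from dataclasses import dataclass


@dataclass
class CatalogItem:
    """Hypothetical catalog record; field names are assumptions."""
    name: str
    purchase_count: int
    auto_created: bool = False  # assumed flag marking system-generated services


def build_prompt(business_name: str, items: list[CatalogItem], top_k: int = 3) -> str:
    """Turn a seller's catalog into one structured prompt string."""
    # Step 1: drop auto-created services, which say little about the business.
    kept = [item for item in items if not item.auto_created]
    # Step 2: rank the remaining items by purchase frequency, most purchased first.
    ranked = sorted(kept, key=lambda item: item.purchase_count, reverse=True)
    # Step 3: format the top items into a structured string (template is illustrative).
    top_items = ", ".join(item.name for item in ranked[:top_k]) or "none"
    return f"Business name: {business_name}. Top items: {top_items}."


if __name__ == "__main__":
    catalog = [
        CatalogItem("Default Service", 0, auto_created=True),
        CatalogItem("Latte", 120),
        CatalogItem("Bagel", 45),
        CatalogItem("Espresso", 300),
    ]
    print(build_prompt("Corner Cafe", catalog, top_k=2))
    # prints: Business name: Corner Cafe. Top items: Espresso, Latte.
```

A string like this would then be tokenized and passed to the classifier. Capping the list at the most frequently purchased items keeps the input short and puts the most representative signal first, which matters for a model with a fixed maximum sequence length.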