Using LLMs to amplify human labeling and improve Dash search relevance
2026-02-26
13 min read
by Dmitriy Meyerzon
Endigest AI Core Summary
This post explains how Dropbox Dash trains its search ranking model by combining small-scale human labeling with LLM-generated relevance judgments to produce training data at scale.
- Dash follows a RAG pattern where enterprise search retrieves candidate documents before an LLM generates answers, making search ranking quality critical to overall response quality
- The ranking model uses XGBoost trained on query-document pairs annotated with 1-5 relevance scores, where higher scores indicate closer alignment with user intent
- Human labeling is expensive and hard to scale, while LLMs offer cheaper and more consistent relevance judgments across large multilingual datasets
- A small human-labeled dataset is used to tune LLM prompts and validate quality thresholds before deploying the LLM to generate hundreds of thousands to millions of training labels
- LLM accuracy is measured via mean squared error against human judgments, and document sampling prioritizes cases where LLM predictions diverge from the human labels
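The validation step the summary describes — scoring LLM labels against a small human-labeled set with mean squared error, then surfacing the largest disagreements for review — can be sketched roughly as follows. This is an illustrative sketch, not Dropbox's implementation; the function names, the example scores, and the disagreement threshold are all hypothetical:

```python
# Hypothetical sketch: compare LLM relevance judgments (1-5) against human
# labels on a validation set, then flag large disagreements for human review.
# All names, data, and the min_gap threshold are illustrative.

def mse(llm_scores, human_scores):
    """Mean squared error between paired 1-5 relevance judgments."""
    assert len(llm_scores) == len(human_scores) and llm_scores
    return sum((l - h) ** 2 for l, h in zip(llm_scores, human_scores)) / len(llm_scores)

def disagreements(pairs, llm_scores, human_scores, min_gap=2):
    """Query-document pairs where LLM and human labels differ by >= min_gap."""
    return [p for p, l, h in zip(pairs, llm_scores, human_scores)
            if abs(l - h) >= min_gap]

# Toy validation set of query-document pair IDs with both label sources.
pairs = ["q1-d1", "q1-d2", "q2-d1", "q2-d2", "q3-d1"]
human = [5, 4, 1, 3, 2]
llm   = [5, 3, 3, 3, 2]

print(mse(llm, human))                   # 1.0
print(disagreements(pairs, llm, human))  # ['q2-d1']
```

In this framing, a low MSE on the human-labeled set gates whether the prompt is trusted to label the full corpus, and the disagreement list drives where additional human judgments are spent.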
Tags:
#LLM
#models
#Search
#Machine Learning
#Dash
#RAG
