Endigest AI Core Summary
Falcon Perception is a 0.6B-parameter early-fusion Transformer for open-vocabulary object grounding and segmentation from natural language prompts.
•Uses a hybrid attention mask, with bidirectional image tokens and causal text/task tokens in a single backbone, eliminating separate vision and fusion stages.
•Implements Chain-of-Perception: sequential prediction of coordinate → size → segmentation with Fourier feature encoding for precise localization.
•Lightweight output heads compute masks via dot products with upsampled image features, avoiding Hungarian matching and separate mask decoders.
•Trained on 54M images with 195M positive expressions and 488M hard negatives through three-stage curriculum with multi-teacher distillation.
•Achieves 68.0 Macro-F1 on the SA-Co benchmark (vs. 62.3 for SAM 3), with the remaining gap primarily in presence calibration (MCC 0.64 vs. 0.82).
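The hybrid attention mask in the first bullet can be sketched as follows. This is a minimal NumPy illustration, not the article's implementation: image tokens attend bidirectionally among themselves, text/task tokens attend causally among themselves and freely to all image tokens. Whether image tokens may attend back to text is an assumption; here they cannot, keeping the visual encoding prompt-independent.

```python
import numpy as np

def hybrid_attention_mask(n_img: int, n_txt: int) -> np.ndarray:
    """Boolean attention mask (True = may attend) for a sequence of
    n_img image tokens followed by n_txt text/task tokens."""
    n = n_img + n_txt
    mask = np.zeros((n, n), dtype=bool)
    # Image block: full bidirectional attention among image tokens.
    mask[:n_img, :n_img] = True
    # Text/task rows: attend to every image token...
    mask[n_img:, :n_img] = True
    # ...and causally to themselves (lower-triangular block).
    mask[n_img:, n_img:] = np.tril(np.ones((n_txt, n_txt), dtype=bool))
    return mask

m = hybrid_attention_mask(4, 3)
```

In a single-backbone setup like this, one mask replaces the separate vision encoder and fusion module: the same attention layers serve both roles, with the block structure deciding who sees whom.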
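The Fourier feature encoding mentioned for Chain-of-Perception can be sketched like this. The power-of-two band spacing and band count are assumptions for illustration, not details from the article; the idea is that mapping a normalized coordinate through sines and cosines at multiple frequencies lets the model resolve fine positional differences.

```python
import numpy as np

def fourier_features(xy: np.ndarray, num_bands: int = 8) -> np.ndarray:
    """Encode normalized coordinates in [0, 1] as sin/cos Fourier features.

    xy: (..., D) array of coordinates; returns (..., D * 2 * num_bands).
    Band frequencies are powers of two times pi (an assumed schedule).
    """
    freqs = 2.0 ** np.arange(num_bands) * np.pi   # (num_bands,)
    angles = xy[..., None] * freqs                # (..., D, num_bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*xy.shape[:-1], -1)

enc = fourier_features(np.array([0.25, 0.5]))
```

A sequential coordinate → size → segmentation pipeline would feed encodings like this into each stage, so later predictions condition on precisely localized earlier ones.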
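The lightweight mask head in the third bullet reduces to a dot product between a per-object query embedding and upsampled image features. A hedged sketch, with shapes and the sigmoid output assumed rather than taken from the article:

```python
import numpy as np

def predict_mask(query: np.ndarray, feat_map: np.ndarray) -> np.ndarray:
    """Score each spatial location by the dot product between a query
    embedding (C,) and upsampled image features (H, W, C), then squash
    the logits to per-pixel probabilities with a sigmoid."""
    logits = feat_map @ query             # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))  # per-pixel mask probability

rng = np.random.default_rng(0)
mask = predict_mask(rng.standard_normal(16),
                    rng.standard_normal((8, 8, 16)))
```

Because each query scores pixels independently, there is no set-prediction matching step: this is what lets the design skip Hungarian matching and a separate mask decoder.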
This summary was automatically generated by AI based on the original article and may not be fully accurate.