Gemini Embedding 2 is a multimodal embedding model that maps text, images, video, audio, and documents into a single semantic space.
- Processes up to 8,192 text tokens, 6 images, 120 seconds of video, 180 seconds of audio, and 6 PDFs in a single call
- Enables agentic RAG for multi-step reasoning tasks with improved accuracy
- Reported real-world improvements: Harvey saw a 3% increase in Recall@20, and Supermemory saw a 40% improvement in search accuracy
- Supports multimodal search, reranking, clustering, classification, and anomaly detection
- Uses Matryoshka Representation Learning to compress vectors from 3,072 to 768 dimensions for cost efficiency
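
The Matryoshka point above can be illustrated with a minimal sketch. Matryoshka-trained embeddings are generally shortened by keeping the leading dimensions of the full vector and re-normalizing, so that cosine similarity still behaves sensibly on the shorter vector. The code below assumes that approach; it does not call the Gemini API, the helper names (`truncate_embedding`, `cosine_similarity`) are hypothetical, and the random vectors are stand-ins for real embeddings.

```python
import math
import random

FULL_DIM = 3072   # full vector size mentioned in the summary
SMALL_DIM = 768   # reduced vector size mentioned in the summary


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_embedding(vec, dim=SMALL_DIM):
    """Keep the first `dim` values and re-normalize (hypothetical helper).

    Assumption: the model was trained so that leading dimensions carry the
    most information, which is the premise of Matryoshka-style training.
    """
    if dim > len(vec):
        raise ValueError("target dimension exceeds vector length")
    return l2_normalize(vec[:dim])


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Random vectors stand in for real embeddings; they are NOT model output.
    random.seed(0)
    query = [random.gauss(0.0, 1.0) for _ in range(FULL_DIM)]
    doc = [random.gauss(0.0, 1.0) for _ in range(FULL_DIM)]

    small_query = truncate_embedding(query)
    small_doc = truncate_embedding(doc)

    print("dimensions:", len(query), "->", len(small_query))
    print("similarity on truncated vectors:",
          cosine_similarity(small_query, small_doc))
```

If the vectors are stored as 32-bit floats (an assumption), going from 3,072 to 768 dimensions cuts storage per vector from 12,288 bytes to 3,072 bytes, a 4x reduction, which is where the cost saving comes from.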