
🎯 The Objective
This project builds an industry-style search and relevance pipeline on the Amazon Shopping Queries Dataset (ESCI). The goal is to retrieve the most relevant products for a given e-commerce query by implementing a high-performance, two-stage search architecture.
🏗️ The Architecture & Methodology
The pipeline was engineered to balance rapid candidate retrieval with high-precision reranking, utilizing advanced representation learning:
**Matryoshka Fine-Tuning:** The core engineering addition was fine-tuning a Matryoshka bi-encoder with MultipleNegativesRankingLoss (MNRL). This concentrates the most critical semantic information in the earliest dimensions of the vector, enabling highly compressed 64-dimensional embeddings without destroying retrieval quality.
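Conceptually, the training objective can be sketched as below. This is a simplified NumPy illustration of in-batch MNRL applied at several truncation levels, not the actual training code (in practice, sentence-transformers' `MatryoshkaLoss` wrapping `MultipleNegativesRankingLoss` handles this); the dimension list and scale are illustrative.

```python
import numpy as np

def mnrl_loss(q, d, scale=20.0):
    # In-batch contrastive loss: each query's positive is the same-index
    # document; every other document in the batch acts as a negative.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = scale * q @ d.T                      # (N, N) cosine similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy, diagonal labels

def matryoshka_mnrl_loss(q, d, dims=(768, 256, 64)):
    # Apply the same contrastive loss at each truncation level, so the
    # earliest dimensions are trained to carry meaning on their own.
    return sum(mnrl_loss(q[:, :k], d[:, :k]) for k in dims)
```

Because the 64-dim prefix is optimized directly, truncating the trained embeddings at serving time degrades gracefully instead of collapsing.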
**Stage 1 (Candidate Generation):** Implemented a hybrid retrieval stack to maximize coverage. This combined a dense bi-encoder, lexical search (BM25) restricted to product titles for precision, and learned sparse expansion (SPLADE) to capture synonyms. The candidate lists were merged with weighted Reciprocal Rank Fusion (RRF).
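Weighted RRF scores each document by the reciprocal of its rank in every retriever's list, scaled by a per-retriever weight. A minimal sketch (the weights and smoothing constant `k` are assumptions, not the tuned values used here):

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked candidate lists with weighted Reciprocal Rank Fusion.

    ranked_lists: one ranked list of doc ids per retriever, best first.
    weights: per-retriever weights (e.g. dense vs. BM25 vs. SPLADE).
    k: smoothing constant; larger k flattens the rank contributions.
    """
    scores = {}
    for docs, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF operates on ranks rather than raw scores, it sidesteps the incompatible score scales of dense, BM25, and SPLADE retrievers.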
**Stage 2 (Reranking):** Applied a cross-encoder to the retrieved candidates to maximize top-K precision.
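The reranking step reduces to scoring each (query, candidate) pair jointly and keeping the best K. A minimal sketch, where `score_fn` stands in for a cross-encoder's `predict` call; the token-overlap scorer below is a toy placeholder, not the model used in this project:

```python
def rerank(query, candidates, score_fn, top_k=20):
    # Score each (query, candidate) pair jointly, then keep the top_k.
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def overlap_score(query, doc):
    # Toy stand-in for a cross-encoder: Jaccard overlap of tokens.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0
```

In production the scorer would be a trained cross-encoder (e.g. sentence-transformers' `CrossEncoder.predict` over query-document pairs), which is far more expensive per pair than the bi-encoder, hence applying it only after candidate generation.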
📊 Key Performance Metrics
By leveraging a hybrid architecture at just 64 dimensions, the system achieved strong e-commerce relevance metrics entirely on consumer-grade hardware.
- Reranker nDCG@20: 0.5395
- Retrieval Recall@200 (64-dim): 81.25%
- Retrieval QPS (Queries Per Second): 70.51
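The metrics above follow their standard definitions; a minimal sketch of Recall@K and nDCG@K with graded gains (the gain values for Exact/Substitute labels are illustrative assumptions):

```python
import math

def recall_at_k(retrieved, relevant, k=200):
    # Fraction of all relevant items that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, gains, k=20):
    # gains maps doc id -> graded relevance (e.g. Exact=3, Substitute=2).
    dcg = sum(gains.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(retrieved[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```

Recall@200 measures candidate-generation coverage (did the relevant items make it into the pool at all), while nDCG@20 measures how well the reranker orders the head of the list.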
💡 Core Insights & Business Impact
- Solving Dimensionality Collapse: Standard baseline embedding models truncated to 64 dimensions suffered a catastrophic recall collapse, dropping to a Recall@200 of 0.4270. The Matryoshka fine-tuned model preserved semantic meaning, reaching a dense-only Recall@200 of 0.7392 at the exact same size.
- Drastic Cost Reduction: Serving 64-dimensional vectors rather than standard 768-dimensional vectors cuts raw vector storage 12x, shrinking index size, memory footprint, and latency, which translates to substantial infrastructure cost savings in a production environment.
- Strategic Retrieval Rules: By treating both “Exact” and “Substitute” items as positives during candidate generation, the pipeline maximizes coverage and surfaces profitable product alternatives rather than aggressively filtering them out.
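The cost argument can be made concrete with back-of-envelope arithmetic, assuming flat float32 storage and a hypothetical catalog of 1M products (real FAISS indexes add some overhead on top of this):

```python
def flat_index_bytes(n_vectors, dim, bytes_per_float=4):
    # Raw storage for a flat float32 vector index (ignores index overhead).
    return n_vectors * dim * bytes_per_float

# Hypothetical 1M-product catalog:
full = flat_index_bytes(1_000_000, 768)   # ~3.07 GB
small = flat_index_bytes(1_000_000, 64)   # ~0.26 GB
```

Truncating from 768 to 64 dimensions is a straight 12x reduction in vector bytes, and smaller vectors also mean proportionally fewer float operations per distance computation, which is what drives the QPS gains.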
⚙️ Technical Stack
- Languages & Tools: Python, SentenceTransformers, FAISS, MLflow.
- Techniques: Bi-encoders, Cross-encoders, Hybrid Retrieval, Learned Sparse Expansion (SPLADE), Matryoshka Representation Learning, Contrastive Learning (MNRL).