To recommend the next song, Spotify has to find the most similar items among billions of vectors — in milliseconds. The trick that makes it possible is admitting you don't actually need the exact answer.
Spotify long relied on Annoy, an approximate-nearest-neighbor library, but it fell behind for high-dimensional embeddings at their scale. They built Voyager on a newer algorithm (HNSW) — and the conceptual heart of the story is why "approximate" is a feature, not a compromise.
Modern recommendation represents every song, user, and podcast as a vector — a list of numbers capturing its 'meaning.' Similar items have nearby vectors. To recommend, you find the vectors closest to yours. The exact way is to compare your vector against every other one — perfectly accurate, and hopelessly slow when there are billions.
Approximate nearest neighbor (ANN) search makes a deal: give up a tiny bit of accuracy (maybe you get 98 of the true top-100) in exchange for being orders of magnitude faster. For recommendations, that's a phenomenal trade — nobody notices a near-perfect list, and everybody notices a slow app.
Voyager is built on HNSW (Hierarchical Navigable Small World) graphs. Instead of scanning all vectors, HNSW arranges them in a layered graph: a sparse top layer for big jumps across the space, denser layers below for fine-grained hops. A search starts at the top, greedily moves toward the query, and descends — touching only a tiny fraction of the dataset to arrive at the near-neighbors. It's navigation, not scanning.
Against their old Annoy setup, Voyager delivered roughly 10x faster search at equal recall (or up to 50% more accurate at equal speed) using about 4x less memory. It's been in production since 2022 and is now open source.
Worth knowing
The memory result matters as much as the speed. At billions of vectors, the index must fit in RAM to be fast, so a 4x memory reduction directly lowers how many machines you need — an accuracy/speed optimisation that's also a cost optimisation. In vector search, recall, latency, and memory form a triangle you tune together; you rarely move one without the others.
The gap it reveals
As vector search becomes standard (recommendations, RAG, semantic search), the naive mental model is 'find the closest vectors' — as if exact search were feasible. The realisation is that at scale you must go approximate, that ANN deliberately trades a little recall for huge speed/memory wins, and that recall/latency/memory are a single tunable triangle. That framing is what separates someone who's deployed vector search from someone who's read about it.
In the interview room
Any design with recommendations, semantic search, or RAG should mention ANN explicitly: "I'd use an HNSW index for approximate nearest-neighbor search — exact search doesn't scale to billions of vectors, and I can trade a few points of recall for an order of magnitude in latency and memory." Naming the recall/latency/memory trade-off shows you understand the engine, not just the buzzword.
The reframe
A great deal of scaling is the courage to give up perfection you don't need. The exact nearest neighbors are theoretically better, but a 98%-accurate answer delivered 10x faster wins every real product. The engineering maturity is identifying where 'approximately right, immediately' beats 'exactly right, eventually' — and in recommendations, it almost always does.
Stop computing the exact answer nobody can wait for. Compute the near-perfect one they'll actually feel.
Primary source →
engineering.atspotify.com — Introducing Voyager: Spotify's New Nearest-Neighbor Search Library