HomeResourcesCase study
Case study

Spotify Finds Your Next Song by Being Wrong

By The SDL team·3 min read·Updated Jun 8, 2026

Exact nearest-neighbor search doesn't scale to billions of vectors. HNSW trades a sliver of accuracy for 10x speed and 4x less memory.

To recommend the next song, Spotify has to find the most similar items among billions of vectors — in milliseconds. The trick that makes it possible is admitting you don't actually need the exact answer.

Spotify long relied on Annoy, an approximate-nearest-neighbor library, but it fell behind for high-dimensional embeddings at their scale. They built Voyager on a newer algorithm (HNSW) — and the conceptual heart of the story is why "approximate" is a feature, not a compromise.

Plain English

Modern recommendation represents every song, user, and podcast as a vector — a list of numbers capturing its 'meaning.' Similar items have nearby vectors. To recommend, you find the vectors closest to yours. The exact way is to compare your vector against every other one — perfectly accurate, and hopelessly slow when there are billions.

Approximate nearest neighbor (ANN) search makes a deal: give up a tiny bit of accuracy (maybe you get 98 of the true top-100) in exchange for being orders of magnitude faster. For recommendations, that's a phenomenal trade — nobody notices a near-perfect list, and everybody notices a slow app.

Now the engineering

Voyager is built on HNSW (Hierarchical Navigable Small World) graphs. Instead of scanning all vectors, HNSW arranges them in a layered graph: a sparse top layer for big jumps across the space, denser layers below for fine-grained hops. A search starts at the top, greedily moves toward the query, and descends — touching only a tiny fraction of the dataset to arrive at the near-neighbors. It's navigation, not scanning.

Against their old Annoy setup, Voyager delivered roughly 10x faster search at equal recall (or up to 50% more accurate at equal speed) using about 4x less memory. It's been in production since 2022 and is now open source.

Approximate nearest neighbor: skip 99% of the candidates, on purpose Exact search compare query to EVERY vector → perfect, but too slow at billions HNSW graph: navigate, don’t scan coarse top layer hop greedily toward the query — touch a tiny fraction of nodes Voyager (HNSW) vs old Annoy: ~10× faster at equal recall (or up to 50% more accurate at equal speed) · ~4× less memory Trade a sliver of accuracy for orders-of-magnitude speed — the right deal for recommendations.
Navigate the graph instead of scanning the set. HNSW reaches the near-neighbors by touching a sliver of the data — trading exactness for orders-of-magnitude speed.

Worth knowing

The memory result matters as much as the speed. At billions of vectors, the index must fit in RAM to be fast, so a 4x memory reduction directly lowers how many machines you need — an accuracy/speed optimisation that's also a cost optimisation. In vector search, recall, latency, and memory form a triangle you tune together; you rarely move one without the others.

The gap it reveals

As vector search becomes standard (recommendations, RAG, semantic search), the naive mental model is 'find the closest vectors' — as if exact search were feasible. The realisation is that at scale you must go approximate, that ANN deliberately trades a little recall for huge speed/memory wins, and that recall/latency/memory are a single tunable triangle. That framing is what separates someone who's deployed vector search from someone who's read about it.

In the interview room

Any design with recommendations, semantic search, or RAG should mention ANN explicitly: "I'd use an HNSW index for approximate nearest-neighbor search — exact search doesn't scale to billions of vectors, and I can trade a few points of recall for an order of magnitude in latency and memory." Naming the recall/latency/memory trade-off shows you understand the engine, not just the buzzword.

The reframe

A great deal of scaling is the courage to give up perfection you don't need. The exact nearest neighbors are theoretically better, but a 98%-accurate answer delivered 10x faster wins every real product. The engineering maturity is identifying where 'approximately right, immediately' beats 'exactly right, eventually' — and in recommendations, it almost always does.

Stop computing the exact answer nobody can wait for. Compute the near-perfect one they'll actually feel.

Primary source →
engineering.atspotify.com — Introducing Voyager: Spotify's New Nearest-Neighbor Search Library

Want feedback on your design?

Related articles