The Find-Nearby Problem
Why is 'find drivers near me' hard?
Every ride-hailing app, store locator, and dating app needs the same primitive: given a point (your latitude/longitude), return all entities within some radius, or the k nearest ones. This is called a range query (everything in a box/circle) or a kNN query (the k nearest neighbors).
The naive approach scans every row and computes the distance: SELECT * FROM drivers ORDER BY distance(lat, lng, my_lat, my_lng) LIMIT 10. With millions of drivers this is a full table scan on every request — completely unusable.
Why a normal index does not save you
The obvious fix is to add a B-tree index on lat and another on lng. But a B-tree only helps with a query on one dimension at a time. You can quickly narrow to all rows in a latitude band, but those rows stretch around the entire planet at that latitude. The index on longitude is independent, so the database cannot combine them efficiently — the two filtered sets are huge and must be intersected.
- Range queries over two independent B-trees scan far too many candidate rows.
- kNN is worse: 'nearest' depends on both dimensions jointly, so single-column ordering is useless.
The core insight: latitude and longitude are two-dimensional and correlated for the query, but a B-tree is fundamentally one-dimensional. We need a structure that preserves spatial locality — points that are close on the map should be close in the index. That is exactly what geohashes, quadtrees, and R-trees provide.