Concept

Why Indexes Exist

The problem: full table scans are O(n)

Without an index, answering a query like SELECT * FROM users WHERE email = 'a@b.com' forces the database to read every row and compare it — a full table scan. On a table with 100 million rows that is 100 million comparisons and a lot of disk I/O. The cost grows linearly with table size: O(n).

An index is a separate, sorted (or hashed) auxiliary data structure that maps column values to the location of the matching rows. Instead of scanning everything, the database walks the index to find exactly where the data lives. For a sorted tree index this turns the lookup into O(log n) — on 100 million rows, roughly 27 steps instead of 100 million.

The mental model

Think of the index at the back of a textbook. To find every page that mentions caching, you do not read the whole book; you jump to C in the alphabetical index and it points you straight to the pages. The book pages are the table rows; the index at the back is the database index.

  • Speeds up: equality lookups, range scans, sorting (ORDER BY), joins, and uniqueness checks.
  • Costs: extra storage, and slower writes — every INSERT/UPDATE/DELETE must also update the index.

That trade — faster reads in exchange for slower writes and more space — is the theme of this entire module.