Concept

Why Search Engines Exist

The problem with scanning

Imagine a table with 50 million product descriptions, and a user types "wireless noise cancelling headphones". A relational query like WHERE description LIKE '%wireless%' forces a full table scan: the database reads every row and runs a substring match. A pattern with a leading wildcard (%term%) cannot use a B-tree index, so cost grows linearly with data size. There is also no notion of relevance: every match is equal.
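To see the scan for yourself, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational database. The table name products, the index name idx_products_description, and the helper query_plan are illustrative assumptions, and other databases word their query plans differently.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT)")
# A B-tree index on the column being searched (hypothetical name).
conn.execute("CREATE INDEX idx_products_description ON products (description)")
conn.executemany(
    "INSERT INTO products (description) VALUES (?)",
    [("wireless noise cancelling headphones",), ("usb cable",)],
)

# Leading wildcard: the index cannot narrow the search, so every entry is visited.
like_plan = query_plan(
    conn, "SELECT id FROM products WHERE description LIKE '%wireless%'"
)

# Exact match: the B-tree index can be used to jump straight to the row.
exact_plan = query_plan(
    conn, "SELECT id FROM products WHERE description = ?", ("usb cable",)
)

print(like_plan)
print(exact_plan)
```

In SQLite's plan output, a line beginning with SCAN means every entry is visited, while SEARCH means an index was used to narrow the lookup.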

A search engine inverts the problem. Instead of asking "which words does this document contain?", it pre-computes the answer to "which documents contain this word?". That data structure is the inverted index, and it turns a multi-word query into a few index lookups plus a merge of the matching document lists. This typically takes milliseconds rather than seconds, even over very large collections.
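The idea can be sketched in a few lines of Python. This is a toy illustration, not how any particular engine stores its index: the function names (tokenize, build_index, search_all) and the sample documents are assumptions, and tokenization is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    # Intersect starting from the shortest list to keep the merge cheap.
    postings.sort(key=len)
    result = set(postings[0])
    for other in postings[1:]:
        result &= other
    return result

docs = {
    1: "wireless noise cancelling headphones",
    2: "wired studio headphones",
    3: "wireless charging pad",
    4: "noise cancelling wireless earbuds",
}
index = build_index(docs)

print(search_all(index, "wireless noise cancelling headphones"))  # {1}
print(search_all(index, "wireless noise"))                        # {1, 4}
```

Each query term costs one dictionary lookup, and the only work proportional to data size is intersecting the posting sets, which is why the shortest set is processed first.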

What a search engine adds over a database

  • Speed — term lookups instead of scans.
  • Relevance ranking — results ordered by how well they match, not just whether they match.
  • Linguistic matching — running matches run, colour matches color; typos and synonyms can be handled.
  • Rich query types — phrase, proximity, fuzzy, faceted filters, autocomplete.
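As a rough illustration of the linguistic matching and relevance ranking points above, the sketch below normalizes tokens and orders results by a simple score. Everything here is a deliberately crude assumption: the VARIANTS table, the suffix-stripping rule in normalize, and the raw term-count score in rank are toy stand-ins. Real engines use proper stemmers or lemmatizers and scoring functions such as BM25.

```python
from collections import Counter

# Hypothetical spelling-variant table; real engines use much larger resources.
VARIANTS = {"colour": "color"}

def normalize(token):
    """Toy normalizer: lowercase, map variants, strip a trailing 'ing'."""
    token = token.lower()
    token = VARIANTS.get(token, token)
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        # Collapse a doubled final consonant: "runn" -> "run".
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token

def rank(docs, query):
    """Return matching doc ids, best first, scored by query-term occurrences."""
    query_terms = {normalize(t) for t in query.split()}
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(normalize(t) for t in text.split())
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by doc id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

docs = {
    1: "Running shoes in a bright colour",
    2: "Trail run shoes",
    3: "Color swatches for painting",
}

print(rank(docs, "run color"))  # [1, 2, 3]
```

Document 1 ranks first because it matches both query terms after normalization, even though it contains neither "run" nor "color" verbatim.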

The trade-off: a search index is a derived, eventually-consistent copy of your source of truth. You still keep the canonical data in a primary store (SQL or a document DB) and feed changes into the search engine.
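The sketch below shows what "derived and eventually consistent" means in practice. It is a minimal in-memory model under stated assumptions: the Catalog class, its method names, and the pending queue are hypothetical, and a real system would typically feed changes through a message queue or change-data-capture pipeline instead.

```python
from collections import deque

class Catalog:
    """Toy model: a primary store plus a search index that lags behind it."""

    def __init__(self):
        self.primary = {}        # source of truth: doc_id -> text
        self.search_index = {}   # derived copy: term -> set of doc_ids
        self.indexed_terms = {}  # doc_id -> terms currently in the index
        self.pending = deque()   # changes not yet applied to the index

    def save(self, doc_id, text):
        """Write to the primary store and queue the change for indexing."""
        self.primary[doc_id] = text
        self.pending.append(doc_id)

    def sync(self):
        """Apply queued changes to the search index."""
        while self.pending:
            doc_id = self.pending.popleft()
            # Remove stale postings from the previous version of the document.
            for term in self.indexed_terms.pop(doc_id, set()):
                self.search_index[term].discard(doc_id)
            new_terms = set(self.primary[doc_id].lower().split())
            for term in new_terms:
                self.search_index.setdefault(term, set()).add(doc_id)
            self.indexed_terms[doc_id] = new_terms

    def search(self, term):
        return set(self.search_index.get(term.lower(), set()))

catalog = Catalog()
catalog.save(1, "wireless headphones")
before_sync = catalog.search("wireless")   # index has not caught up yet
catalog.sync()
after_sync = catalog.search("wireless")    # now the document is searchable

print(before_sync, after_sync)
```

Between save and sync, the primary store already holds the document but a search does not find it. That window is the consistency gap the trade-off refers to.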