Why Search Engines Exist
The problem with scanning
Imagine a table with 50 million product descriptions, and a user types "wireless noise cancelling headphones". A relational query like WHERE description LIKE '%wireless%' forces a full table scan: the database reads every row and runs a substring match. A leading wildcard (%term%) cannot use a B-tree index, so cost grows linearly with data size. There is also no notion of relevance: a row either matches or it does not, and every match counts the same.
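To make the cost concrete, here is a minimal Python sketch of what a leading-wildcard LIKE amounts to when no index can help. The function name `like_scan` and the case-insensitive comparison are assumptions for illustration. Real LIKE semantics depend on the database's collation.

```python
from typing import List, Tuple


def like_scan(rows: List[str], term: str) -> Tuple[List[int], int]:
    """Emulate WHERE description LIKE '%term%' with no usable index.

    Returns the matching row ids and the number of rows examined.
    """
    matches: List[int] = []
    examined = 0
    needle = term.lower()  # assumption: case-insensitive collation
    for row_id, text in enumerate(rows):
        examined += 1  # every row is read, whether it matches or not
        if needle in text.lower():
            matches.append(row_id)  # a plain yes/no result: no score, no ordering
    return matches, examined


rows = [
    "Wireless noise cancelling headphones",
    "Wired earbuds",
    "Wireless charger",
]
ids, examined = like_scan(rows, "wireless")
print(ids, examined)  # [0, 2] 3
```

The `examined` counter always equals the table size, which is the linear growth described above, and the result is an unordered set of hits with nothing to rank them by.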
A search engine inverts the problem. Instead of asking "which words does this document contain?", it precomputes "which documents contain this word?". That data structure is the inverted index. It turns a multi-word query into a few index lookups plus a merge of the resulting document lists, which typically completes in milliseconds even over very large collections.
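The lookup-plus-merge idea can be sketched in a few lines of Python. This is a toy under stated assumptions: whitespace tokenization, lowercase terms, and AND semantics for multi-word queries. The names `build_index`, `intersect`, and `search_all` are hypothetical. Production engines add compression, skip lists, and scoring on top of the same basic structure.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to a sorted list of ids of documents containing it (a postings list)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):  # assumption: whitespace tokenizer
            index[term].append(doc_id)  # ids are visited in order, so each list stays sorted
    return dict(index)


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_all(index: Dict[str, List[int]], query: str) -> List[int]:
    """Return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(t, []) for t in terms]
    postings.sort(key=len)  # start from the rarest term so intermediate results stay small
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


docs = [
    "wireless noise cancelling headphones",
    "wired headphones",
    "wireless charger",
    "noise cancelling wireless earbuds",
]
index = build_index(docs)
print(search_all(index, "wireless noise cancelling"))  # [0, 3]
```

No document text is read at query time: only the postings lists for the query terms are touched, and their length depends on how common each term is rather than on the total number of documents.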
What a search engine adds over a database
- Speed: term lookups instead of scans.
- Relevance ranking: results ordered by how well they match, not just whether they match.
- Linguistic matching: "running" matches "run", "colour" matches "color", and typos and synonyms can be handled.
- Rich query types: phrase, proximity, fuzzy, faceted filters, autocomplete.
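Linguistic matching works because the same analysis step runs on documents at index time and on queries at search time, so different surface forms reduce to the same indexed term. The Python sketch below covers only the "running"/"run" and "colour"/"color" examples. Its suffix stripper is a deliberately crude stand-in, not Porter or Snowball stemming. The `VARIANTS` table and all function names are hypothetical.

```python
import re
from typing import List

# Hypothetical spelling-variant table; real engines load synonym or mapping files.
VARIANTS = {"colour": "color"}


def toy_stem(token: str) -> str:
    """Very crude suffix stripping, for illustration only."""
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and token.endswith("ss"):
            continue  # keep words like "wireless" intact
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # undo consonant doubling, e.g. "runn" -> "run", "cancell" -> "cancel"
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def analyze(text: str) -> List[str]:
    """Lowercase, tokenize, stem, then map spelling variants to one form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [VARIANTS.get(toy_stem(t), toy_stem(t)) for t in tokens]


print(analyze("Running"), analyze("run"))  # ['run'] ['run']
print(analyze("Colours"), analyze("color"))  # ['color'] ['color']
```

Because both sides of the comparison go through `analyze`, a query for "color" finds a document that says "Colours". Typo tolerance and synonym expansion need further machinery (such as edit-distance matching) that this sketch does not attempt.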
The trade-off: a search index is a derived, eventually consistent copy of your source of truth. You still keep the canonical data in a primary store (a SQL database or a document DB) and feed changes into the search engine.
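One way to picture "derived, eventually consistent" is a primary store that records every write in a change log, and an indexer that consumes that log later. The Python sketch below is a hypothetical in-memory model of that pattern (similar in spirit to an outbox or change-data-capture feed). `PrimaryStore`, `SearchIndex`, and `sync` are illustrative names, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PrimaryStore:
    """Stand-in for the source of truth, e.g. a SQL table."""
    rows: Dict[int, str] = field(default_factory=dict)
    changelog: List[Tuple[str, int, str]] = field(default_factory=list)  # (op, id, text)

    def upsert(self, row_id: int, text: str) -> None:
        self.rows[row_id] = text
        self.changelog.append(("upsert", row_id, text))  # recorded alongside the write

    def delete(self, row_id: int) -> None:
        self.rows.pop(row_id, None)
        self.changelog.append(("delete", row_id, ""))


@dataclass
class SearchIndex:
    """Derived copy: term -> ids of rows containing it."""
    postings: Dict[str, Set[int]] = field(default_factory=dict)
    doc_terms: Dict[int, List[str]] = field(default_factory=dict)
    applied: int = 0  # how far into the changelog this index has consumed

    def _remove(self, row_id: int) -> None:
        for term in self.doc_terms.pop(row_id, []):
            self.postings[term].discard(row_id)

    def sync(self, store: PrimaryStore) -> int:
        """Apply changes not yet seen; returns how many were applied."""
        pending = store.changelog[self.applied:]
        for op, row_id, text in pending:
            self._remove(row_id)  # an update replaces the row's old terms
            if op == "upsert":
                terms = text.lower().split()
                self.doc_terms[row_id] = terms
                for t in terms:
                    self.postings.setdefault(t, set()).add(row_id)
        self.applied += len(pending)
        return len(pending)

    def search(self, term: str) -> List[int]:
        return sorted(self.postings.get(term.lower(), set()))


store, index = PrimaryStore(), SearchIndex()
store.upsert(1, "wireless headphones")
print(index.search("wireless"))  # [] : the index has not caught up yet
index.sync(store)
print(index.search("wireless"))  # [1]
```

Between a write to `store` and the next `sync`, the two copies disagree; that window is the eventual-consistency gap the trade-off refers to. If the index is lost, replaying the change log (or re-reading `store.rows`) rebuilds it, which is why the primary store stays canonical.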