When a System Is Read-Heavy
What Makes a System Read-Heavy?
A system is read-heavy when its read/write ratio exceeds roughly 10:1 — SELECT statements dominate over INSERT and UPDATE. This pattern is typical of consumption-based business models: content platforms, product catalogs, dashboards, and social feeds. Users browse far more than they create.
The primary danger is that standard optimizations applied to write-balanced systems are wasteful or even counter-productive in a read-heavy context. Knowing the right lever to pull at the right time is the skill.
The 6-Step Diagnostic Ladder
The ladder is a sequential decision process. Work through each rung in order and stop at the first step that resolves the bottleneck. Jumping ahead wastes engineering time and introduces unnecessary complexity.
- CPU-bound on reads? → Add indexes. A missing index on a WHERE or JOIN column forces a full table scan on every read. Adding the correct index can eliminate 95% of the load. Always check this first — it is free.
- Queries fast but DB still overloaded? → Add read replicas. If indexed queries run quickly but the database is still saturated, the problem is volume. Read replicas distribute SELECT traffic across multiple servers. All writes still go to the primary.
- Same data fetched repeatedly? → Add Redis caching. If the same query result is computed thousands of times per minute, cache the result in Redis. Reduces database calls to near-zero for hot data sets.
- App servers overwhelmed? → Horizontal scale behind a load balancer. If the application servers — not the database — are the bottleneck, add instances and distribute traffic with a load balancer. Requires stateless app servers.
- Complex multi-table JOINs bottleneck? → Denormalize. If the queries themselves are the problem — slow because they JOIN many tables — flatten the data model. Store pre-joined data. Trades write complexity for read speed.
- Analytical aggregations? → Move to a columnar database. If the use case is large-scale aggregations over many rows (GROUP BY, SUM, AVG across millions of records), row-oriented databases are structurally mismatched. Move those workloads to Redshift, BigQuery, or Snowflake.
Stop at the first rung that solves the bottleneck. Don't skip ahead. Each subsequent rung is more expensive and more complex to operate.