Learn System Design

Master core concepts before your next interview. Work through modules at your own pace.

Learning is free. Sign up to track your progress, save completed modules, and practice with an AI interviewer.

44 modules
Easy · Distributed Systems

Scalability Fundamentals

Learn how systems grow to handle more load through vertical and horizontal scaling, statelessness, load distribution, replicas, caching, and the AKF Scale Cube.

~15 min · 9 sections

Easy · Traffic & Entry Layer

Load Balancing: From Core Concepts to High Availability

Master L4 vs L7 routing, sticky session pitfalls, and active-active HA design for load balancers.

~15 min · 3 sections

Easy · Traffic & Entry Layer

API Gateway: The Front Door for Microservices

Learn what an API Gateway does, when to use unified vs BFF patterns, and how to avoid its two core failure modes.

~10 min · 3 sections

Hard · Databases

Database Sharding & Partitioning

Learn how to split a database across many machines — partitioning strategies, shard-key selection, hotspots, query routing, cross-shard queries, and zero-downtime resharding.

~25 min · 9 sections

Medium · Databases

Database Indexing

How indexes turn slow full-table scans into fast lookups, the data structures behind them (B+tree, hash, LSM-tree), and when indexing helps versus hurts.

~18 min · 11 sections

Easy · Caching & CDN

Caching: The Performance Multiplier

Understand why caching exists, where to place it, the cache-aside pattern, and how to choose the right invalidation strategy.

~15 min · 3 sections

Medium · API Design

API Design: REST vs gRPC vs GraphQL

Compare REST, gRPC, and GraphQL and learn how to choose, version, paginate, and design idempotent APIs for system design interviews.

~18 min · 10 sections

Medium · Caching & CDN

Cache Stampede, TTL Jitter & CDN Strategy

Defend against cache stampede with TTL jitter and probabilistic early expiration, and choose the right cache layer (CDN vs Redis).

~15 min · 3 sections

Easy · Databases

Database Fundamentals: SQL vs NoSQL & Scaling Limits

Understand what a database actually does, when to choose SQL vs NoSQL, and why vertical scaling is a strategic trap.

~12 min · 3 sections

Easy · Networking

Content Delivery Networks (CDN)

Understand how CDNs cut latency by serving content from edge servers near users — pull vs push, cache-control and TTLs, invalidation, origin shield, anycast routing, and cache hit ratio.

~12 min · 10 sections

Medium · Databases

Database Scaling: Read Replicas, Write Bottlenecks & Indexing

Scale reads with replicas, diagnose the four write bottlenecks, and use indexing as the most cost-effective optimization.

~18 min · 3 sections

Medium · Architecture

Microservices vs Monolith

Understand monoliths and microservices, how to draw service boundaries, how services communicate and own data, and when to choose each.

~15 min · 11 sections

Hard · Distributed Systems

Consensus: Raft & Paxos

How distributed systems agree on a single value or replicated log despite crashes and network failures.

~25 min · 9 sections

Medium · Messaging

Messaging Systems: Queues, Pub/Sub & Async Processing

Understand when to go asynchronous, the queue vs stream distinction, and how to draw the sync/async boundary correctly.

~15 min · 3 sections

Hard · Distributed Systems

Distributed Transactions: 2PC & Saga

How to keep data consistent across multiple services and databases using Two-Phase Commit and the Saga pattern.

~22 min · 9 sections

Hard · Messaging

Messaging Guarantees: Delivery, Ordering & Dead-Letter Queues

Master at-least-once vs exactly-once semantics, idempotent consumers, and how dead-letter queues prevent poison pills.

~18 min · 3 sections

Medium · Distributed Systems

Idempotency & Exactly-Once Processing

Learn how distributed systems survive retries and duplicates using delivery guarantees, idempotency keys, and dedup stores.

~15 min · 10 sections

Medium · Consistency

CAP Theorem & Eventual Consistency in Practice

Apply the CAP theorem as a practical design tool, not a definition to recite. Use polyglot persistence to make different CAP choices per component.

~15 min · 3 sections

Medium · Networking

Real-Time: WebSockets, SSE & Polling

How to push data to clients in real time using polling, Server-Sent Events, and WebSockets, and how to scale persistent connections across servers.

~15 min · 10 sections

Medium · Rate Limiting & Security

Rate Limiting & Throttling: Protecting Your System

Compare five rate limiting algorithms and learn when to reject (rate limit) vs delay (throttle) excess traffic.

~15 min · 3 sections

Medium · Databases

Search Systems & Inverted Indexes

How full-text search engines build inverted indexes, rank results with BM25, and scale via shards and replicas.

~18 min · 11 sections

Medium · Rate Limiting & Security

Authentication & Authorization: The System's Gatekeepers

Distinguish AuthN from AuthZ, choose between session-based and JWT patterns, and place auth correctly in a microservices architecture.

~12 min · 3 sections

Easy · Scaling Patterns

Scaling Reads: The Diagnostic Ladder

Apply the 6-step Diagnostic Ladder to systematically resolve read-heavy bottlenecks with the right tool at each rung.

~12 min · 3 sections

Hard · Architecture

Event-Driven Architecture: Event Sourcing & CQRS

Learn how event-driven systems use immutable event logs (Event Sourcing) and separate read/write models (CQRS) to build auditable, scalable architectures.

~22 min · 9 sections

Medium · Scaling Patterns

Replication Strategies: Full Spectrum

Compare single-leader, multi-leader, and leaderless replication topologies and their conflict resolution strategies, and learn when to use each.

~15 min · 3 sections

Easy · Storage

Blob & Object Storage

How object storage like S3 stores blobs as objects addressed by bucket and key, and when to reach for it over block or file storage.

~12 min · 12 sections

Medium · Data Structures

Bloom Filters & Probabilistic Data Structures

Learn how Bloom filters and other probabilistic structures trade a tiny error rate for massive savings in space and lookup cost.

~15 min · 10 sections

Easy · Scaling Patterns

Edge Caching & CDN Strategy

Understand PoP routing, cache invalidation at the edge, and content-addressable URLs for zero-downtime cache busting.

~10 min · 3 sections

Easy · Observability

Observability: Logs, Metrics & Traces

Learn the three pillars of observability — logs, metrics, and traces — and how SLIs, SLOs, and symptom-based alerting keep distributed systems debuggable in production.

~12 min · 10 sections

Medium · Scaling Patterns

Scaling Writes: Four Strategies

Master sharding, async write queues, CQRS/Event Sourcing, and LSM-Tree databases as complementary strategies for write-heavy systems.

~18 min · 3 sections

Medium · Observability

Circuit Breakers & Resilience Patterns

Stop one slow or failing dependency from cascading into a full system outage using circuit breakers, timeouts, retries, bulkheads, and graceful degradation.

~15 min · 9 sections

Hard · Scaling Patterns

Write Amplification & Write Queues

Understand write amplification sources across LSM-Trees, B-Trees, and SSDs, and design write queue architectures with correct delivery guarantees.

~15 min · 3 sections

Easy · Scaling Patterns

Batch Processing vs Stream Processing

Know when to reach for batch processing vs stream processing, and how to build resilient batch jobs with idempotency and DLQ handling.

~12 min · 3 sections

Easy · Operations

Deployment: Blue-Green, Canary & Rolling

How to ship new versions to production safely using recreate, rolling, blue-green, and canary strategies with health gates and automated rollback.

~12 min · 10 sections

Hard · Scaling Patterns

Multi-Region Systems: The Four-Step Playbook

Design global systems using the four-step playbook: topology, data strategy, traffic routing, and failover planning.

~20 min · 3 sections

Medium · Architecture

API Gateway & Service Mesh

How an API gateway manages client (north-south) traffic at the edge while a service mesh handles service-to-service (east-west) traffic inside the cluster.

~15 min · 10 sections

Medium · Scaling Patterns

Data Sovereignty & Compliance Architecture

Design systems that satisfy data residency and localization requirements using geographic sharding, regional KMS, and sovereignty-aware pipelines.

~12 min · 3 sections

Medium · Databases

Geospatial Indexing: Geohash & Quadtrees

How geohashes, quadtrees, and R-trees make find-nearby and proximity queries fast at scale.

~15 min · 10 sections

Hard · Distributed Systems

Concurrency Control & Distributed Locks

How to coordinate concurrent writers safely using optimistic and pessimistic concurrency, database row locks, and distributed locks with TTLs and fencing tokens.

~22 min · 9 sections

Medium · Scaling Patterns

Latency vs Throughput: Optimizing the Right Axis

Understand the latency-throughput tension and apply the correct optimization playbook for latency-sensitive vs throughput-oriented workloads.

~12 min · 3 sections

Hard · Scaling Patterns

Conflict Resolution in Multi-Leader Systems

Master the four conflict resolution strategies — LWW, application merge, CRDTs, and vector clocks — and know when each is appropriate.

~15 min · 3 sections

Medium · Resilience

Reliability Engineering & Designing for Failure

Use MTBF, MTTR, and availability math to frame reliability, then eliminate SPOFs and apply bulkheads, graceful degradation, and the DR spectrum.

~18 min · 3 sections

Hard · Resilience

Circuit Breakers, Retries & Cascading Failures

Implement the three-state circuit breaker and exponential backoff with jitter, and prevent cascading failures through bulkheads and load shedding.

~20 min · 3 sections

Medium · Resilience

Chaos Engineering: Proactive Resilience Testing

Apply the five principles of chaos engineering, run targeted fault injection experiments, and build the organizational culture for resilience.

~10 min · 3 sections