Master core concepts before your next interview. Work through modules at your own pace.
Learning is free. Sign up to track your progress, save completed modules, and practice with an AI interviewer.
Learn how systems grow to handle more load through vertical and horizontal scaling, statelessness, load distribution, replicas, caching, and the AKF Scale Cube.
Master L4 vs L7 routing, sticky session pitfalls, and active-active HA design for load balancers.
Learn what an API Gateway does, when to use unified vs BFF patterns, and how to avoid its two core failure modes.
Learn how to split a database across many machines — partitioning strategies, shard-key selection, hotspots, query routing, cross-shard queries, and zero-downtime resharding.
How indexes turn slow full-table scans into fast lookups, the data structures behind them (B+tree, hash, LSM-tree), and when indexing helps versus hurts.
Understand why caching exists, where to place it, the cache-aside pattern, and how to choose the right invalidation strategy.
Compare REST, gRPC, and GraphQL and learn how to choose, version, paginate, and design idempotent APIs for system design interviews.
Defend against cache stampede with TTL jitter and probabilistic early expiration, and choose the right cache layer (CDN vs Redis).
Understand what a database actually does, when to choose SQL vs NoSQL, and why vertical scaling is a strategic trap.
Understand how CDNs cut latency by serving content from edge servers near users — pull vs push, cache-control and TTLs, invalidation, origin shield, anycast routing, and cache hit ratio.
Scale reads with replicas, diagnose the four write bottlenecks, and use indexing as the most cost-effective optimization.
Understand monoliths and microservices, how to draw service boundaries, how services communicate and own data, and when to choose each.
How distributed systems agree on a single value or replicated log despite crashes and network failures.
Understand when to go asynchronous, the queue vs stream distinction, and how to draw the sync/async boundary correctly.
How to keep data consistent across multiple services and databases using Two-Phase Commit and the Saga pattern.
Master at-least-once vs exactly-once semantics, idempotent consumers, and how dead-letter queues keep poison-pill messages from blocking a queue.
Learn how distributed systems survive retries and duplicates using delivery guarantees, idempotency keys, and dedup stores.
Apply the CAP theorem as a practical design tool, not a definition to recite. Use polyglot persistence to make different CAP choices per component.
How to push data to clients in real time using polling, Server-Sent Events, and WebSockets, and how to scale persistent connections across servers.
Compare five rate limiting algorithms and learn when to reject (rate limit) vs delay (throttle) excess traffic.
How full-text search engines build inverted indexes, rank results with BM25, and scale via shards and replicas.
Distinguish AuthN from AuthZ, choose between session-based and JWT patterns, and place auth correctly in a microservices architecture.
Apply the 6-step Diagnostic Ladder to systematically resolve read-heavy bottlenecks with the right tool at each rung.
Learn how event-driven systems use immutable event logs (Event Sourcing) and separate read/write models (CQRS) to build auditable, scalable architectures.
Compare single-leader, multi-leader, and leaderless replication topologies, conflict resolution strategies, and when to use each.
How object storage like S3 stores blobs as buckets and keys, and when to reach for it over block or file storage.
Learn how Bloom filters and other probabilistic structures trade a tiny error rate for massive savings in space and lookup cost.
Understand PoP routing, cache invalidation at the edge, and content-addressable URLs for zero-downtime cache busting.
Learn the three pillars of observability — logs, metrics, and traces — and how SLIs, SLOs, and symptom-based alerting keep distributed systems debuggable in production.
Master sharding, async write queues, CQRS/Event Sourcing, and LSM-tree databases as complementary strategies for write-heavy systems.
Stop one slow or failing dependency from cascading into a full system outage using circuit breakers, timeouts, retries, bulkheads, and graceful degradation.
Understand write amplification sources across LSM-trees, B-trees, and SSDs, and design write queue architectures with correct delivery guarantees.
Know when to reach for batch processing vs stream processing, and how to build resilient batch jobs with idempotency and DLQ handling.
How to ship new versions to production safely using recreate, rolling, blue-green, and canary strategies with health gates and automated rollback.
Design global systems using the four-step playbook: topology, data strategy, traffic routing, and failover planning.
How an API gateway manages client (north-south) traffic at the edge while a service mesh handles service-to-service (east-west) traffic inside the cluster.
Design systems that satisfy data residency and localization requirements using geographic sharding, regional KMS, and sovereignty-aware pipelines.
How geohashes, quadtrees, and R-trees make find-nearby and proximity queries fast at scale.
How to coordinate concurrent writers safely using optimistic and pessimistic concurrency, database row locks, and distributed locks with TTLs and fencing tokens.
Understand the latency-throughput tension and apply the correct optimization playbook for latency-sensitive vs throughput-oriented workloads.
Master the four conflict resolution strategies — LWW, application merge, CRDTs, and vector clocks — and know when each is appropriate.
Use MTBF, MTTR, and availability math to frame reliability, then eliminate SPOFs and apply bulkheads, graceful degradation, and the DR spectrum.
Implement the three-state circuit breaker and exponential backoff with jitter, and prevent cascading failures through bulkheads and load shedding.
Apply the five principles of chaos engineering, run targeted fault injection experiments, and build the organizational culture for resilience.