Write Amplification: The Hidden Tax
Defining Write Amplification
Write amplification (WA) is the ratio of the bytes physically written to storage to the bytes the application asked to write. A WA of 10 means that for every 1 GB your application sends, 10 GB are physically written to disk. This hidden cost erodes storage throughput, increases SSD wear, and raises infrastructure costs.
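The definition reduces to a single division. The sketch below restates the 1 GB to 10 GB example. The function name and the byte counts are illustrative; in practice the two inputs would come from application metrics and device-level counters.

```python
def write_amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Return physical bytes written per logical byte the application sent."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


GB = 10**9  # decimal gigabyte, used here only for the worked example

# The application sends 1 GB; the storage stack physically writes 10 GB.
wa = write_amplification(physical_bytes=10 * GB, logical_bytes=1 * GB)
print(wa)  # 10.0
```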
Sources of Write Amplification
LSM-Tree compaction: Data written to an LSM-Tree (Cassandra, RocksDB, LevelDB) goes first to an in-memory memtable and to a Write-Ahead Log for durability; the memtable is later flushed to an immutable SSTable on disk. As SSTables accumulate, background compaction merges them and rewrites the result into consolidated files. Each compaction round rewrites data that has already been written, sometimes several times as it moves down through the compaction levels. A total WA of 10-30× is common, particularly under leveled compaction.
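To see how the rewrites add up, here is a toy simulation that counts bytes. It is a sketch under strong assumptions, not a model of any real engine. It uses a simple size-tiered policy (merge once `threshold` tables of the same tier exist), assumes no overwrites or deletes so a merge shrinks nothing, and charges one WAL write plus one flush per memtable. All names and parameters are hypothetical.

```python
from collections import defaultdict


def simulate_size_tiered(num_flushes: int, memtable_bytes: int,
                         threshold: int = 4) -> float:
    """Return the WA of a toy size-tiered LSM after `num_flushes` flushes."""
    logical = 0    # bytes the application asked to write
    physical = 0   # bytes actually written (WAL + flush + compaction)
    tiers = defaultdict(list)  # tier number -> sizes of SSTables in that tier

    for _ in range(num_flushes):
        logical += memtable_bytes
        physical += memtable_bytes   # WAL append
        physical += memtable_bytes   # memtable flushed to a new SSTable
        tiers[0].append(memtable_bytes)

        # Cascade: merge a full tier into one larger table in the next tier.
        tier = 0
        while len(tiers[tier]) >= threshold:
            merged = sum(tiers[tier])
            tiers[tier].clear()
            physical += merged       # compaction rewrites every merged byte
            tiers[tier + 1].append(merged)
            tier += 1

    return physical / logical


# 64 flushes of 64 MiB with threshold 4: each byte is written to the WAL,
# flushed once, and rewritten by three rounds of compaction.
print(simulate_size_tiered(num_flushes=64, memtable_bytes=64 * 2**20))  # 5.0
```

The result here is well below the 10-30× range because this toy policy is size-tiered and the dataset is tiny. The point is only that every additional tier a byte passes through adds one more full rewrite.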
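B-Tree updates: A single row update requires writing the modified data page, a page in every index the update affects, and a WAL entry. An update that touches five indexes can therefore produce seven physical writes (one data page, five index pages, one WAL record), and each page write covers the whole page even when only a few bytes changed.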
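The count above can be turned into a rough worst-case estimate. This sketch assumes every dirty page is flushed once per update (real engines often coalesce several updates into one page flush at checkpoint time), an 8 KiB page, and a fixed per-record WAL overhead. The function name and all three default values are illustrative assumptions.

```python
def btree_update_cost(row_bytes: int, affected_indexes: int,
                      page_bytes: int = 8192,
                      wal_overhead_bytes: int = 64) -> tuple[int, float]:
    """Return (physical write count, byte-level WA) for one row update.

    Worst-case model: one data page and one page per affected index are
    each written in full, plus one WAL record carrying the row image.
    """
    pages = 1 + affected_indexes                 # data page + index pages
    wal_bytes = row_bytes + wal_overhead_bytes   # assumed WAL record size
    physical_bytes = pages * page_bytes + wal_bytes
    return pages + 1, physical_bytes / row_bytes  # +1 for the WAL write


writes, wa = btree_update_cost(row_bytes=200, affected_indexes=5)
print(writes)  # 7
print(round(wa, 1))
```

Even with a small write count, the byte-level ratio is large because each write covers a full page for a change of a few hundred bytes.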
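SSD hardware amplification: NAND flash cannot be overwritten in place. It is written in pages (typically 4-16KB) but erased only in whole blocks (typically 256KB or more). An update is written to a fresh page and the old copy is marked invalid; to reclaim space, the controller's garbage collector must copy the still-valid pages out of a block before erasing it. In the worst case, a 4KB update causes an entire 256KB block to be rewritten. This hardware-level WA multiplies with database-level WA.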
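Because the two layers multiply, a modest factor at each level compounds quickly. The sketch below uses a common simplified model of SSD garbage collection: if the block chosen for reclamation still holds a fraction u of valid data, that data must be copied elsewhere first, giving a device-level WA of 1 / (1 - u). The function names and sample values are illustrative assumptions, not measurements.

```python
def ssd_gc_write_amplification(valid_fraction: float) -> float:
    """Device-level WA when reclaimed blocks are `valid_fraction` full.

    Freeing a block yields (1 - u) of its space for new host writes but
    costs u of its space in copy-forward writes, so WA = 1 / (1 - u).
    """
    if not 0 <= valid_fraction < 1:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1 / (1 - valid_fraction)


def end_to_end_write_amplification(db_wa: float, ssd_wa: float) -> float:
    """Database-level and hardware-level WA compound multiplicatively."""
    return db_wa * ssd_wa


ssd_wa = ssd_gc_write_amplification(valid_fraction=0.5)
print(ssd_wa)                                        # 2.0
print(end_to_end_write_amplification(10.0, ssd_wa))  # 20.0
```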
Mitigation Strategies
- Choose the right compaction strategy: Size-tiered compaction (Cassandra's long-standing default) writes less but leaves more SSTables to consult on each read. Leveled compaction (LevelDB; RocksDB's default) makes reads cheaper but writes more. Match the strategy to your read/write ratio.
- Batch small writes: Accumulate small writes in a buffer and flush them as one larger write. This amortizes per-write overhead and can reduce the total number of compaction cycles.
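- Design write-aligned keys: In stores that keep data sorted by key, monotonically increasing keys (ULIDs, time-prefixed IDs) make new data land at the end of the key range, so freshly flushed SSTables overlap little with existing ones and need less rewriting during compaction. Random UUID keys scatter writes across the whole keyspace, so each new SSTable overlaps many existing ones and compaction churn is at its highest.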
- Align writes to storage geometry: Issue writes at offsets and in sizes that are multiples of the SSD page size (commonly 4KB). Misaligned writes trigger extra read-modify-write cycles at the hardware level.
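The batching and alignment strategies can be combined in one small buffer. The following is a minimal sketch, assuming a 4 KiB page, a sink object with a `write(bytes)` method, and a record format that tolerates zero padding at the tail. The class and helper names are hypothetical. It only controls write sizes; real aligned I/O also depends on file offsets and how the file is opened.

```python
import io

PAGE_BYTES = 4096  # assumed SSD page size; check your device


def align_up(n: int, alignment: int = PAGE_BYTES) -> int:
    """Round n up to the next multiple of `alignment`."""
    return -(-n // alignment) * alignment


class BatchingWriter:
    """Buffer small records and emit them in page-aligned chunks."""

    def __init__(self, sink, flush_bytes: int = 64 * 1024):
        if flush_bytes <= 0 or flush_bytes % PAGE_BYTES:
            raise ValueError("flush_bytes must be a positive multiple of PAGE_BYTES")
        self._sink = sink
        self._flush_bytes = flush_bytes
        self._buf = bytearray()
        self.flush_count = 0  # number of writes issued to the sink

    def write(self, record: bytes) -> None:
        self._buf += record
        # Emit full chunks only; the remainder waits for more data.
        while len(self._buf) >= self._flush_bytes:
            self._sink.write(bytes(self._buf[:self._flush_bytes]))
            del self._buf[:self._flush_bytes]
            self.flush_count += 1

    def close(self) -> None:
        # Pad the tail with zeros so the final write is also page-aligned.
        if self._buf:
            padded = align_up(len(self._buf))
            self._sink.write(bytes(self._buf).ljust(padded, b"\0"))
            self._buf.clear()
            self.flush_count += 1


sink = io.BytesIO()
writer = BatchingWriter(sink, flush_bytes=8192)
for _ in range(100):
    writer.write(b"x" * 100)   # 100 small writes, 10,000 bytes in total
writer.close()
print(writer.flush_count)      # 2
print(len(sink.getvalue()))    # 12288
```

One hundred 100-byte records reach the sink as two page-aligned writes (8192 bytes, then 1808 bytes padded to 4096). The padding itself is a small amount of amplification, traded for avoiding misaligned partial-page writes.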