Skip to main content

Command Palette

Search for a command to run...

Cache Strategies in Distributed Systems

Published
11 min read

The Avengers of Backend Engineering — Assembled Against the Thundering Herd

System Design Cohort · March 8, 2026

18 min read · #DistributedSystems #Caching #SystemDesign

Caching TTL Jitter SWR Mutex Cache Warming Thundering Herd

🏏

Live Scenario — Today, 7:00 PM

India vs New Zealand T20 Final on JioHotstar. 300 million fans. One platform. One cache layer standing between epic cricket and a catastrophic outage. This is the story of the heroes that save it.

It's 6:55 PM. Rohit Sharma is walking to the toss. Your phone is open. 300 million other phones are open. In that single moment, JioHotstar's distributed systems face what the database engineering team quietly calls the most dangerous five seconds of the year.

If caches are cold, the database sees an avalanche it was never designed to handle. If caches expire together, a different kind of avalanche hits — the Thundering Herd. And if nobody thought about this? You get buffering screens. Rage tweets. A PR catastrophe at the worst possible moment.

This is not a horror story. This is the story of how great distributed systems are engineered — using five battle-tested cache strategies that act as Avengers-level protection for high-traffic systems.

Suit up. Let's go.

origin story

ISSUE #01

Why Basic TTL Caching

Isn't Enough

Every distributed system story starts the same way. Your app is growing. Database queries are slow. Someone says: "Let's cache it." You store data in Redis with a TTL — a fixed expiry time. When the TTL runs out, the key evicts. The next request recomputes and refills it.

Clean. Simple. Wrong — at scale.

Basic TTL works beautifully for low-traffic, spread-out use cases. The fatal flaw reveals itself only under coordinated, synchronized load — which is exactly what every JioHotstar match creates. Millions of users open the app at the same time. They all fetch the same cached keys. Those keys were all cached around the same time. And they all expire at exactly the same time.

⚡ The Core Problem Basic TTL assumes cache misses are rare and spread out. But in distributed systems under synchronized load, they cluster together — creating a burst of simultaneous recomputation that overwhelms the origin database.

This is the origin story of the Thundering Herd — and it's the villain every strategy in this article is designed to defeat.

the villain

ISSUE #02

Cache Expiry & Traffic Spikes:

The Thundering Herd

Here's the scenario in precise terms. JioHotstar sets TTL = 300 seconds for match scorecard data. At 6:55 PM, all 50 app servers cached this data. At 7:00:00 PM sharp, every cached key expires simultaneously. Every one of those 300 million requests hits the origin database at the same moment.

Your database handles 10,000 queries/second on a good day. It's just been asked for 2,000,000 queries/second.

Synchronized TTL Expiry — The Villain's Plan

  Time →     6:55:00 PM              7:00:00 PM
    ──────────────────────────────────────────────────────────
    Key A:  [■■■■■■■■■■■■■■■■■■]  EXPIRED  ← 50M requests hit DB
    Key B:  [■■■■■■■■■■■■■■■■■■]  EXPIRED  ← 80M requests hit DB
    Key C:  [■■■■■■■■■■■■■■■■■■]  EXPIRED  ← 60M requests hit DB
    Key D:  [■■■■■■■■■■■■■■■■■■]  EXPIRED  ← 110M requests hit DB
                                        ↓
                            ALL HIT DATABASE SIMULTANEOUSLY
                            DB Load: 2,000,000 queries/sec  💥 CRASH

This failure mode has a name: Cache Stampede. It's been responsible for outages at Netflix, Amazon, Twitter, and countless others. It's not a bug in your code — it's an architectural oversight in how you set up your caching layer.

Now meet the heroes.

heroes assemble

HERO #01 — CAPTAIN CHAOS

TTL Jitter: The Randomizer

Power: Spreading recomputation load across time instead of a single moment

TTL Jitter is the simplest, most universally applicable defense against the Thundering Herd. Instead of setting all cache keys to expire at the same fixed TTL, you add a random offset to each entry's expiration time.

Keys that were cached together now expire at slightly different times — spreading the recomputation load across a window instead of a single millisecond.

Formula

TTL_actual = base_TTL + random(0, jitter_window)

For JioHotstar's scorecard cache:

base_TTL = 300s, jitter = random(0, 60s)

Result: Keys expire between 300s and 360s — spread over a full minute, turning a 2M QPS spike into a manageable 33K QPS trickle.

Jittered TTL — Hero's Defense

  Time →  7:00:00  7:00:12  7:00:24  7:00:36  7:00:48  7:01:00
    ──────────────────────────────────────────────────────────────
    Key A:                  ↑ expires (300 + 12s)
    Key B:                           ↑ expires (300 + 24s)
    Key C:                                    ↑ expires (300 + 36s)
    Key D:                                             ↑ expires (300 + 48s)

DB Load: ~33,000 queries/sec (spread, manageable) ✅
vs. 2,000,000 queries/sec with synchronised TTL

Jitter alone can reduce peak database load by 80–90%. It's the first line of defense and costs almost nothing to implement. But it doesn't eliminate stampedes entirely — individual hot keys still cause spikes on expiry. That's where the next heroes come in.

HERO #02 — DOCTOR STRANGE

Probabilistic Early Re-computation (PER)

Power: Seeing the future — refreshing the cache before it even expires

What if your cache could detect, before an entry expires, that it's about to expire — and refresh it early? That's the idea behind Probabilistic Early Re-computation, also known as XFetch.

As a cache entry approaches its expiration time, each incoming request has an increasing probability of triggering a background recomputation — even before the TTL hits zero. The closer to expiry, the higher the chance. By the time the TTL actually runs out, the cache has almost certainly been refreshed already.

⚡ The Prophet's Logic
Imagine the scorecard has 30 seconds left. At T-30s → 10% refresh chance. At T-15s → 40%. At T-5s → 80%. The stampede never happens because someone always refreshed it early. Doctor Strange has already seen the future.

Probabilistic Early Expiration Timeline

    TTL Remaining:   60s       45s       30s       15s        0s
    ────────────────────────────────────────────────────────────
    Refresh Prob:    2%        10%       25%       60%      100%

Standard: ────────────────────────────────────── EXPIRE 💥
PER: ────────────────── ↑ Background Refresh ─── Never Expires
(single worker refreshes silently)

JioHotstar uses this for live match state — current score, overs bowled, run rate. As the cached snapshot ages, background workers are increasingly likely to refresh it. By the time the next 50 million requests arrive, the data is fresh and the cache is already warm.

HERO #03 — NICK FURY

Mutex / Cache Locking: The Gatekeeper

Power: Only one request allowed through the door. Everyone else waits.

Even with jitter and PER, a cold cache miss can still happen — especially for keys that were never cached to begin with. When a miss occurs, you face a race: multiple concurrent requests all discover the miss and all rush to recompute simultaneously.

Mutex locking solves this with absolute authority: when a cache miss occurs, only ONE request is allowed to recompute and refill the cache. All other concurrent requests either wait at the lock, or are served stale data if available.

Mutex Locking Flow — Nick Fury's Door

    300M requests arrive → Cache MISS on 'match_score_key'
    ─────────────────────────────────────────────────────────────
    Request #1:  Acquires Lock → Computes → Fills Cache → Releases Lock ✅
    Request #2:  Waits at lock .......................→ Cache HIT ✅
    Request #3:  Waits at lock .......................→ Cache HIT ✅
    Request #4:  Waits at lock .......................→ Cache HIT ✅
    ... 299,999,996 others → all served from cache ✅

DB receives: 1 query (not 300 million) 🎯

⚠️ Critical Warning
The lock MUST have its own TTL. If the computing request crashes before releasing, you've replaced a stampede with a permanent deadlock. Even Nick Fury can lock himself in his own vault. Use Redis SETNX with an expiry: SET lock_key 1 EX 10 NX.

The pattern: try cache → on miss, try SETNX distributed lock → if acquired, compute and fill → release. If not acquired, retry after a brief sleep, or serve stale data. Simple. Ruthlessly effective.

HERO #04 — THE WATCHER

Stale-While-Revalidate:

The Time-Keeper

Power: Serves instantly, revalidates in the background, never blocks

Stale-While-Revalidate (SWR) is one of the most elegant patterns in distributed systems. When a cache entry expires, instead of blocking while you recompute — serve the stale data immediately, and simultaneously trigger a background refresh.

The user gets an instant response. The next request gets fresher data. The database is never stampeded.

Stale-While-Revalidate Flow

    Request arrives at T+301s (entry expired at T+300s)
    ─────────────────────────────────────────────────────────────
    Step 1: Cache MISS (expired) → Serve STALE data immediately ← User sees in 1ms
    Step 2: Background worker triggered → Queries DB for fresh data
    Step 3: Cache updated with fresh result
    Step 4: Next request at T+302s → Cache HIT with fresh data ✅

User Latency: ~1ms | DB Load: 1 query per expiry window

CDNs like Cloudflare and Fastly make SWR a first-class feature. When you see Cache-Control: max-age=300, stale-while-revalidate=60 in HTTP headers — that's exactly this. JioHotstar uses SWR for player images, team logos, match schedules, and semi-dynamic commentary feeds.

⚡ The Tradeoff
The first user after expiry always gets slightly stale data. For match scores, a 1-second lag is totally fine. For a payment confirmation? Absolutely not. Know your tolerance before deploying SWR.

HERO #05 — BLACK WIDOW

Cache Warming: The Scout

Power: Prepares the battlefield long before the battle begins

Cache Warming is the proactive hero — it fills your cache before traffic arrives, so the first real user never experiences a cold miss. No waiting. No recomputation. No stampede. The data is already there.

For JioHotstar, cache warming before the T20 Final means: automated scripts start loading the cache from 6:00 PM — two hours before kickoff. Match metadata, team rosters, player statistics, streaming endpoints, ad configurations — all warmed up in priority order before Rohit Sharma walks to the toss.

Cache Warming — Pre-Match Timeline

    6:00 PM  ──── Tier 1: Team rosters, player stats (slow-changing)
    6:20 PM  ──── Tier 2: Match schedule, venue, umpires
    6:40 PM  ──── Tier 3: Streaming URLs, CDN edge configs
    6:55 PM  ──── Tier 4: Live scorecard, toss result, XI
    ─────────────────────────────────────────────────────────
    7:00 PM → MATCH STARTS → 300M requests → 99% Cache HIT ✅
                                                                
    Without warming: 7:00 PM → 300M requests → 99% Cache MISS 💥

Netflix runs cache warming scripts for hours before every major release. Before a big series drop, content metadata, thumbnails, recommendation lists, and user preference caches are pre-loaded across every edge node globally.

Amazon and Flipkart warm caches hours before flash sales — product pages, pricing, inventory counts, recommendation carousels — so the first sale-hunter gets a sub-100ms response, not a cold cache miss.

the avengers table

ISSUE #06

Tradeoffs: Freshness, Latency & Consistency

No single strategy wins on all three dimensions. The art is in knowing which levers matter most for your specific use case.

Strategy Freshness Latency Consistency Best For
Basic TTL Medium Low Medium Static, low-traffic data
TTL Jitter Medium Low Medium Always — apply everywhere
Probabilistic (PER) High Low High Hot keys, live scoreboards
Mutex Locking High Medium Very High Financial data, auth tokens
Stale-While-Revalidate Lower Very Low Lower CDN, UI configs, media
Cache Warming High Very Low High Predictable traffic spikes

real world

ISSUE #07

Real-World Playbooks

JioHotstar

Jitter on all match data. Warming from 6 PM. SWR for commentary. Mutex for streaming token generation. PER for live scorecards. All five heroes deployed together.

Netflix

Pre-warms content metadata globally 6 hours before drops. SWR for recommendation caches. Jitter on personalization TTLs to prevent global coordinated expiry.

Amazon / Flipkart

Cache warming 2 hours before flash sales. Mutex on inventory write-through. Jitter on price caches. SWR for non-critical UI. Prevents stampedes at sale open.

CDN Providers

Cloudflare and Fastly implement SWR natively via Cache-Control headers. Edge nodes use probabilistic early refresh for hot assets close to TTL boundary.

mission briefing

ISSUE #08

When to Use Which Strategy

Think of assembling your Avengers team based on the mission type:

  • Always deploy TTL Jitter — It's your base shield. Zero cost, 80–90% spike reduction. There is no valid reason to skip it on any distributed cache.

  • Add Cache Warming when you can predict spikes — product launches, match kickoffs, election nights, Black Friday. If you know traffic is coming, prepare for it.

  • Use SWR for high-read, slight-lag-tolerant data — CDN assets, UI configs, recommendation lists, news feeds. Never for financial or inventory data.

  • Use Mutex Locking when stale means wrong — financial totals, inventory counts, auth tokens, payment state. Correctness beats latency here.

  • Use Probabilistic Early Expiration for hot keys under sustained high read load — live scoreboards, trending feeds, viral content, real-time leaderboards.

The Avengers
of Backend Engineering

TTL Jitter · Cache Warming · SWR · Mutex · Probabilistic Expiry

Together, they protect your system from the Thundering Herd.

They'll make sure 300 million fans watch every boundary, every six,
every wicket of India vs New Zealand — without a single dropped frame.

More from this blog