
Vanishing Links: I Designed a URL Shortener and the Expiry Logic Was the Hard Part


A URL shortener is not a mapping problem. It is a lifecycle problem under adversarial conditions.

I didn't see that until I sat down and actually designed one. I kept thinking about hashing, about base62, about how to make a short string. But the hard questions were all downstream: what happens when a link expires? Who gets the alias next? What stops someone from weaponizing the recycling window? The moment I started answering those, the system stopped looking like a key-value store and started looking like three distinct systems sharing a data layer.

I wrote about what functional requirements, non-functional requirements, and capacity estimation mean and why they matter in this Netflix design walkthrough. This post is part of my "Learning System Design the Easy Way" series.

The Three Systems Inside a URL Shortener

The insight that unlocked this design for me: a URL shortener is three systems pretending to be one.

System 1: The Redirect Engine. Optimized purely for read latency. Its job is to resolve an alias to a destination URL and return an HTTP redirect as fast as possible. It lives behind a CDN and multi-layer caching. The database should almost never be in its critical path.

System 2: The Creation Service. Optimized for write safety. Its job is to validate URLs, enforce alias uniqueness, and guarantee ownership. It talks to the database on every request and prioritizes consistency over speed.

System 3: The Lifecycle Manager. Optimized for correctness over time. Its job is to expire links, enforce cooldown periods, purge or archive dead aliases, and decide whether an alias can ever be reused. This is the system nobody thinks about until it breaks.

Every design decision I made traced back to which of these three systems it belonged to. The rest of this post is organized around them.

System 1: The Redirect Engine


Core Behavior

Alias lookup has four possible outcomes, each producing a specific HTTP response:

  • Active → alias found, link live → 301 or 302 redirect (depending on permanence policy).
  • Not found → alias never existed → 404.
  • Expired, in cooldown → alias exists but TTL passed → 410 Gone.
  • Purged → cooldown complete, alias recyclable → 410 Gone.
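
As a minimal sketch, the dispatch from lifecycle state to HTTP response is a few lines (the `LinkState` enum and the `lookup` callable are illustrative names, not a prescribed API):

```python
from enum import Enum

class LinkState(Enum):
    ACTIVE = "active"                      # link live
    NOT_FOUND = "not_found"                # alias never existed
    EXPIRED_COOLDOWN = "expired_cooldown"  # TTL passed, alias still reserved
    PURGED = "purged"                      # cooldown complete, alias recyclable

def resolve(alias: str, lookup) -> tuple[int, str | None]:
    """Map an alias's lifecycle state to an (HTTP status, redirect target) pair."""
    state, destination = lookup(alias)
    if state is LinkState.ACTIVE:
        return 301, destination  # or 302, depending on permanence policy
    if state is LinkState.NOT_FOUND:
        return 404, None
    return 410, None             # expired-in-cooldown and purged both answer Gone
```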

Latency Target: p99 < 20ms

The redirect path must be fast enough that users never perceive the hop. The mechanism: CDN + multi-layer caching with a ~95% hit rate, keeping the database entirely out of the critical read path for the vast majority of requests.

At a 50x viral spike (~575K RPS, i.e. 50x on a ~11.5K RPS baseline), 95% CDN absorption means only ~29K RPS reaches origin. At ~2K RPS per app server, that's ~15 servers handling the spike. Without the CDN, you'd need hundreds.

Cache Strategy: Beyond Hit Rates

A 95% hit rate is the target, but the strategy for getting there matters as much as the number.

TTL policy: fixed, not sliding. Sliding TTLs (reset-on-access) keep popular keys alive indefinitely, which sounds good until you realize it means popular expired links could keep serving stale redirects long after they should be returning 410. Fixed TTLs ensure cache entries expire on a predictable schedule aligned with the link's actual lifecycle state.

Stale-while-revalidate. When a cache entry expires, serve the stale entry while asynchronously fetching a fresh one from the database. This prevents the user-facing latency spike that happens when a cache miss forces a synchronous DB lookup on the read path. The tradeoff: for a brief window, a recently expired link might still redirect instead of returning 410. For most use cases, that sub-second inconsistency is acceptable.

Request coalescing. When a popular key expires and 10,000 concurrent requests arrive for it, without coalescing all 10,000 hit the database. With coalescing, the first request triggers the DB lookup and the remaining 9,999 wait for that single result. This is the primary defense against cache stampedes.
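
Here is a minimal single-process sketch of stale-while-revalidate and request coalescing working together; `fetch_from_db` is a stand-in, and a production version would use Redis plus a distributed lock or a singleflight library rather than an in-process dict:

```python
import threading
import time

_cache: dict[str, tuple[str, float]] = {}   # alias -> (destination, expires_at)
_inflight: dict[str, threading.Event] = {}  # keys with a refresh already in progress
_lock = threading.Lock()

CACHE_TTL = 60.0  # fixed TTL in seconds, deliberately NOT reset on access

def get_destination(alias: str, fetch_from_db) -> str | None:
    now = time.time()
    with _lock:
        entry = _cache.get(alias)
        if entry and now < entry[1]:
            return entry[0]           # fresh hit: the database is never touched
        if alias in _inflight:
            if entry:
                return entry[0]       # stale-while-revalidate: serve the old value
            event = _inflight[alias]  # no stale copy to serve: wait for the one fetch
        else:
            event = None
            _inflight[alias] = threading.Event()

    if event is not None:             # coalesced caller: wait for the winner's result
        event.wait()
        with _lock:
            refreshed = _cache.get(alias)
        return refreshed[0] if refreshed else None

    try:                              # this caller won the race: one DB lookup total
        destination = fetch_from_db(alias)
        if destination is not None:
            with _lock:
                _cache[alias] = (destination, time.time() + CACHE_TTL)
    finally:
        with _lock:
            _inflight.pop(alias).set()
    return destination
```

Ten thousand concurrent requests for an expired popular key produce exactly one `fetch_from_db` call; everyone else either serves the stale entry or waits on the event.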

Cache Stampede: The Numbers

Even with coalescing, stampedes can still hurt at scale: 100M active URLs, a 1% cache eviction = 1M keys suddenly uncached. If the DB supports 10K QPS, 1M near-simultaneous lookups will choke it completely without connection pooling and horizontal DB scaling (sharding, partitioning). Request coalescing reduces the blast radius, but the underlying DB must still be sized to absorb bursts.

System 2: The Creation Service

Validation and Alias Assignment

On creation, the system validates the long URL first. Malformed URLs are rejected, as is anything beyond the allowed size cap (say, 2,000 characters). This is the first abuse gate.

Two alias paths:

Custom alias (e.g., /promo): must be globally unique, reserved atomically on success, owned by the creator. If /promo already exists → 409 Conflict. Not a silent reassignment. A clear conflict error.

Generated alias: base62-encoded, typically 7 characters. If the generated value collides, retry until a free key is found. The collision retry is an internal success-path operation, never user-visible.
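
A sketch of both creation paths, assuming an atomic `reserve(alias, url)` backed by a unique-index insert that raises on duplicates (the generated alias here is a random base62 draw; a counter-plus-base62-encoding scheme works equally well):

```python
import secrets
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
ALIAS_LENGTH = 7  # 62^7 ≈ 3.5 trillion possible values

class AliasTaken(Exception):
    """Raised by reserve() when the unique constraint finds the alias in use."""

def create_link(long_url: str, reserve, custom_alias: str | None = None):
    if custom_alias is not None:
        try:
            reserve(custom_alias, long_url)  # atomic reservation, owned by creator
        except AliasTaken:
            return 409, None                 # clear conflict, never silent reassignment
        return 201, custom_alias

    while True:  # collision retry: internal success-path detail, never user-visible
        alias = "".join(secrets.choice(BASE62) for _ in range(ALIAS_LENGTH))
        try:
            reserve(alias, long_url)
            return 201, alias
        except AliasTaken:
            continue  # vanishingly rare in a 62^7 keyspace
```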

Alias Ownership

Custom aliases carry explicit user intent: someone typed /promo because they want that exact string. They are user-owned identifiers, not reassigned unless the original link is deleted or a release policy allows it. Generated aliases are also creator-owned, but the system controls the naming.

The rule: one alias maps to exactly one active destination at a time.

URI constraints: URL-safe characters only, enforced min/max length, explicit case policy (lowercase-only to prevent /Promo vs /promo ambiguity). Long URLs have a hard upper bound for storage, validation, and abuse control.
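
Those constraints compress into a small validator; the exact character set, minimum length, and http(s) check below are assumptions layered onto the rules above:

```python
import re

ALIAS_PATTERN = re.compile(r"^[a-z0-9_-]{4,64}$")  # lowercase-only, URL-safe, capped length
MAX_URL_LENGTH = 2000                              # hard upper bound on long URLs

def validate_custom_alias(alias: str) -> str | None:
    """Return an error message, or None if the alias is acceptable."""
    if not ALIAS_PATTERN.fullmatch(alias):
        return "alias must be 4-64 lowercase URL-safe characters"
    return None

def validate_long_url(url: str) -> str | None:
    """Reject malformed or oversized destination URLs before touching the DB."""
    if len(url) > MAX_URL_LENGTH:
        return f"URL exceeds {MAX_URL_LENGTH} characters"
    if not url.startswith(("http://", "https://")):
        return "URL must be an absolute http(s) URL"
    return None
```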

Latency Target: p99 < 200ms

Latency is dominated by DB writes (50-100ms) and replication. The system enforces consistency through atomic writes and uniqueness constraints.

Rate Limiting: Derived from DB Capacity

This is a constraint, not a guideline. Rate limits are derived from database write capacity.

If peak writes are ~5-6K/sec and a single DB node supports ~1K writes/sec, a write tier of about six nodes caps global write throughput at ~6K/sec. Enforcement layers: 10 req/sec per IP, per-account limits, global throttling.

Even a botnet coordinating 1,000 IPs generates ~10K req/sec. Per-IP limits cap each at 10, and the global throttle catches anything beyond 6K/sec total.
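
Two of those layers fit in a classic token bucket; this in-process sketch would become Redis counters in a real multi-node deployment:

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; one token per request."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

GLOBAL_BUCKET = TokenBucket(rate=6_000, capacity=6_000)  # derived from DB write capacity
_ip_buckets: dict[str, TokenBucket] = {}

def allow_create(ip: str) -> bool:
    # Per-IP check first, so a throttled IP can't drain the global budget.
    bucket = _ip_buckets.setdefault(ip, TokenBucket(rate=10, capacity=10))
    return bucket.allow() and GLOBAL_BUCKET.allow()
```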

Alias length is capped at 64 characters. Without this, an attacker submitting maximum-length strings could inflate storage from a few GB to hundreds of GB.

System 3: The Lifecycle Manager

This is the system I underestimated. It governs what happens after a link stops being active.

The Cooldown Model

When a link expires (by TTL or by hitting its max click count), it does not get deleted. It enters a cooldown period of approximately one month. During cooldown:

  • The alias is blocked from reassignment.
  • Incoming traffic receives 410 Gone.
  • The original creator can still extend the expiry (re-activate the link).

After cooldown, the link is purged or moved to cold storage. Only then does the alias become a candidate for recycling.

Why the cooldown? If /promo was on a billboard, people are still typing that URL days after expiry. Without the cooldown window, an attacker could claim the alias immediately and capture all residual traffic.
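
The rules so far make a compact state machine. A sketch, with the clock passed in for testability (field and state names are illustrative):

```python
import datetime as dt
from dataclasses import dataclass

COOLDOWN = dt.timedelta(days=30)

@dataclass
class Link:
    alias: str
    destination: str
    owner_id: str
    expires_at: dt.datetime
    purged: bool = False

    def state(self, now: dt.datetime) -> str:
        if self.purged:
            return "purged"     # alias is a recycling candidate (if policy allows)
        if now < self.expires_at:
            return "active"     # redirect normally
        if now < self.expires_at + COOLDOWN:
            return "cooldown"   # 410 to traffic, blocked from reassignment
        return "purge_due"      # lifecycle job should archive or delete it

    def extend(self, user_id: str, new_expiry: dt.datetime, now: dt.datetime) -> bool:
        """Only the original creator may re-activate, and only before purge."""
        if user_id != self.owner_id or self.state(now) == "purged":
            return False
        self.expires_at = new_expiry
        return True
```

The edge cases below are essentially unit tests against this machine.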

Edge Cases That Stress-Test the Model

The one-second-late request. Alias expires at 12:00:00, GET arrives at 12:00:01. Response: 410 Gone. But stale cache entries may still serve the old redirect briefly. This is an accepted tradeoff. Sub-second expiry precision across distributed caches is not practical.

Premium user extends an already-passed expiry. If the link is still in cooldown, allow the extension; the alias is still reserved. If cooldown has completed and the link was purged, extension is not possible. The system can only reallocate the alias if no other user has claimed it.

Max click limit reached. Same lifecycle flow. Limit hit → cooldown → 410 to incoming traffic → purge.

30-second reclaim attempt. User deletes a famous alias, someone else tries to grab it 30 seconds later. Blocked. The alias is in cooldown. No exceptions.

Taking a Stance: Should Aliases Ever Be Reused?

This is a real design fork, and I think the answer is no. Not by default.

The case for reuse is namespace efficiency. With a 7-character base62 alias space, you have ~3.5 trillion possible values. You're not running out. But if you allow custom aliases to be recycled, you get a system where /promo might point to a shoe company today and a phishing site next month. Anyone who bookmarked, embedded, or cached that alias now hits a different destination.

Reuse creates an attack surface. No-reuse creates predictability. For generated aliases, the namespace is large enough that reuse is unnecessary. For custom aliases, the user chose that string for a reason. Once it's gone, it should stay gone.

The system should treat alias deletion as permanent retirement, not recycling. If someone truly needs the same alias, they need to prove ownership of the original (same account, same org) and explicitly re-register it.

Cleanup Process Contention

With 1M new links per day and a 2-month cleanup policy, ~1M records hit TTL daily and need deletion.

If the DB handles 10K QPS and cleanup queries share capacity with normal traffic, the split is roughly 50/50. At 5K deletion QPS, processing 1M deletions takes ~200 seconds. During those 200 seconds, half the DB capacity is consumed by cleanup. Normal traffic faces degraded performance and timeouts.

The fix: throttle the deletion rate and schedule cleanup during low-traffic windows. Cleanup is not optional (annual storage without it: ~200 GB vs ~30-35 GB with a 2-month policy, per the estimates below), but it must never starve read traffic.
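
A throttled cleanup pass might look like this, assuming a hypothetical `delete_batch(limit)` that removes up to `limit` purge-due rows and returns the count actually deleted:

```python
import time

DELETION_QPS_BUDGET = 500  # a deliberate slice of DB capacity, nowhere near half
BATCH_SIZE = 500

def cleanup_pass(delete_batch, max_seconds: float = 3600.0) -> int:
    """Delete purge-due rows at a bounded rate inside a low-traffic window."""
    deadline = time.monotonic() + max_seconds
    deleted = 0
    while time.monotonic() < deadline:
        n = delete_batch(BATCH_SIZE)  # e.g. DELETE ... WHERE purge_due LIMIT 500
        deleted += n
        if n < BATCH_SIZE:
            break                     # backlog drained for today
        time.sleep(BATCH_SIZE / DELETION_QPS_BUDGET)  # hold at ~500 deletions/sec
    return deleted
```

At 500 deletions/sec the daily 1M-record backlog clears in roughly 35 minutes while leaving 95% of a 10K-QPS database for live traffic.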

Graceful Degradation


Normal traffic: ~1,000 req/sec. Viral spike: 50x → 50,000 req/sec. DB capacity: 2,000-4,000 req/sec.

The math forces the design. At 50K req/sec with a 4K DB ceiling, 46,000-48,000 requests must be served from cache. Required cache hit rate: 92-96%.

Circuit breaker trigger: the request rate reaching the DB exceeds 4,000 req/sec. In circuit-open state, all requests are served from cache. Cache miss = temporary 404. Cache hit = serve the cached entry, even if stale.

The governing principle during degradation: read availability takes priority over write consistency. A new link taking extra seconds to propagate is acceptable. Existing links failing for millions of users is not.
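
A minimal breaker over the read path, assuming a per-second counter of DB requests (a production breaker would add rolling windows and a half-open probe state); `cache` and `db` are stand-in objects:

```python
import time

DB_CEILING_RPS = 4_000

class DbCircuitBreaker:
    """Trip when DB traffic exceeds its ceiling; serve cache-only while open."""

    def __init__(self, cooldown_seconds: float = 5.0):
        self.window_start, self.count = time.monotonic(), 0
        self.open_until, self.cooldown = 0.0, cooldown_seconds

    def db_allowed(self) -> bool:
        now = time.monotonic()
        if now < self.open_until:
            return False                  # circuit open: cache-only mode
        if now - self.window_start >= 1.0:
            self.window_start, self.count = now, 0
        self.count += 1
        if self.count > DB_CEILING_RPS:
            self.open_until = now + self.cooldown
            return False
        return True

def redirect(alias, cache, db, breaker: DbCircuitBreaker):
    entry = cache.get(alias)
    if entry is not None:
        return 301, entry                 # serve whatever the cache has, stale or not
    if breaker.db_allowed():
        dest = db.get(alias)
        if dest is None:
            return 404, None
        cache.set(alias, dest)
        return 301, dest
    return 404, None                      # circuit open + cache miss = temporary 404
```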

Capacity Estimation


Traffic

Starting assumptions: 1M DAU, 10:1 read-to-write ratio.

Metric | Calculation | Result
Read requests/day | 1M × 10 | 10M
Write requests/day | 1M × 1 | 1M
Avg read RPS | 10M / 86,400 | ~115
Avg write RPS | 1M / 86,400 | ~11.5
Combined avg QPS | 115 + 11.5 | ~130
Peak QPS (5x) | 130 × 5 | ~650

Cache Sizing

Strategy | Calculation | Result
90% hit rate (10% of read data cached) | 1M entries × 500 bytes | 500 MB
Pareto (20% of data, 80% of traffic) | 2 × 500 MB | ~1 GB RAM

Storage

Scenario | Calculation | Result
Annual, no expiry | 1M × 365 × 500 bytes | ~200 GB
With 3-month expiry + 1-month cooldown | ~1/3 of annual | ~70 GB
With 2-month policy (stricter) | ~1/6 of annual | ~30-35 GB
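
The whole estimation is small enough to re-derive in a few lines, which also makes the rounding visible (500 bytes/record is the assumption used throughout this section):

```python
DAU = 1_000_000
READS_PER_WRITE = 10
RECORD_BYTES = 500
SECONDS_PER_DAY = 86_400

reads = DAU * READS_PER_WRITE                 # 10M reads/day
writes = DAU                                  # 1M writes/day
avg_qps = (reads + writes) / SECONDS_PER_DAY  # ~130
peak_qps = avg_qps * 5                        # ~650

annual_gb = DAU * 365 * RECORD_BYTES / 1e9    # ~183 GB, rounded up to ~200 GB
four_month_gb = annual_gb / 3                 # 3-month expiry + 1-month cooldown: ~60-70 GB
two_month_gb = annual_gb / 6                  # stricter 2-month policy: ~30-35 GB

print(f"avg {avg_qps:.0f} QPS, peak {peak_qps:.0f} QPS")
print(f"storage: {annual_gb:.0f} GB/yr unbounded, "
      f"{four_month_gb:.0f} GB vs {two_month_gb:.0f} GB with expiry")
```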

The expiry policy is not just a product feature. It is a storage constraint that cuts long-term requirements by 65% or more.

What to Explore Next

Paste service (like Pastebin). Same lifecycle model but with much larger payloads. Forces different storage tier decisions and content-based expiry.

Distributed rate limiter. This post derived rate limits from DB capacity. Designing a standalone rate limiter makes sliding windows, token buckets, and Redis coordination into first-class problems.

Web crawler. Flips the URL shortener. Instead of mapping short to long, you're discovering and storing URLs at massive scale. Same deduplication and rate management concerns.

CDN design. We leaned on a CDN to keep redirect latency under 20ms. Designing the CDN itself is the problem underneath this one.

The hardest part of a URL shortener is not generating short links. It's deciding when a link stops being safe to exist.
