Caching options in Cloudflare Workers
When you get into systems design, eventually you are going to have to deal with caching. I am currently writing my serverless MUD and part of that system requires an automatic registration of users hitting the API. This means that every single API call reads from Cloudflare D1 in the hot path. Reading from D1 is incredibly fast, but it’s not latency-free, particularly during periods of high load.
This got me thinking about caching the read “somewhere cheaper”. By cheaper, I mean lower latency and cost. So, what are your options, and when would you choose one over the other?

In my Worker, I have a simple cache definition:
export interface CacheManager<T> { get(key: string): Promise<T | undefined>; set(key: string, value: T): Promise<void>; remove(key: string): Promise<void>; clear(): Promise<void>;}I can then have multiple implementations of the cache manager and inject them into my auto-registration middleware to get different behavior.
An in-memory cache
An in-memory cache is perhaps the easiest to comprehend. Use a Map<K, V> to store data and automatically evict stored values to prevent the cache from getting too big. Here is the code:
export class InMemoryCacheManager<T> implements CacheManager<T> { private readonly store = new Map<string, T>(); private readonly maxEntries: number;
constructor(maxEntries: number = DEFAULT_MAX_ENTRIES) { this.maxEntries = maxEntries; }
get(key: string): Promise<T | undefined> { return Promise.resolve(this.store.get(key)); }
set(key: string, value: T): Promise<void> { if (!this.store.has(key) && this.store.size >= this.maxEntries) { const oldest = this.store.keys().next().value; if (oldest !== undefined) { this.store.delete(oldest); } } this.store.set(key, value); return Promise.resolve(); }
// ... other methods not shown}This is your fastest and least reliable option and best used as an opportunistic L1 cache. There is no explicit per-read/write billing, but there are significant drawbacks:
- Not durable.
- Not shared across isolates or colos.
- Not guaranteed to exist on the next request.
- Limited by the Workers memory limit of 128MB (total).
Cloudflare Workers are stateless; each request may run on a different instance, in a different location, with no shared memory between requests. At the same time, global scope can persist when an isolate is reused. There is no fixed TTL for Worker isolates - each request may hit the same isolate or it may hit a completely different one. Assume a Worker isolate can vanish at any time.
As a result, the in-memory cache should be considered a best-effort hot-isolate cache, not a real cross-request store. Use it for ultra-short-lived hot-key caching and per-isolate memoization. Don’t use it to reduce the reads/writes on your D1 database or for revocation-sensitive auth state.
Workers Cache API
Each colo within the Cloudflare network has a CDN cache. The Workers Cache API can be used to read and write data to that cache, which provides a mechanism for storing small bits of data. It’s useful when your users are likely to be hitting the same colo all the time (which, in normal operation, they would be). Here is the implementation:
export class ColoCacheManager<T> implements CacheManager<T> { private url(key: string) { return `https://cache.local/${encodeURIComponent(key)}`; }
async get(key: string): Promise<T | undefined> { const res = await caches.default.match(this.url(key)); return res ? (await res.json()) as T : undefined; }
async set(key: string, value: T): Promise<void> { const res = new Response(JSON.stringify(value), { headers: { "Content-Type": "application/json", "Cache-Control": "max-age=60" }, }); await caches.default.put(this.url(key), res); }
// ... other methods not shown}The Cache API mirrors the browser CacheStorage API, so it should be a familiar interface. All operations use a Request -> Response store, so you have to serialize data accordingly.
The Cache API has major advantages over the in-memory cache - most notably, it’s really good when the user tends to land in the same colo repeatedly, which is the common case. Also, the TTL is under your control. It’s still included in your Workers / CDN price.
That being said, it does not replicate outside the originating data center. That makes revoking or deleting cache keys problematic, especially when you are storing authorization data. Workers Cache API is a good fit when user data changes infrequently and slight staleness is acceptable. Workers Cache API does not work when the application is fronted by Cloudflare Access, and it requires a custom domain.
This is the solution to choose when your main goal is to reduce repeated reads from the same region. Use it for user profile blobs, hydrated session responses and per-user response caching. Don’t use it when the data must be globally fresh (e.g. authorization revocation data) or where users roam between colos frequently.
Workers KV
Workers KV is the key-value store of choice when a global cache is required and eventual consistency is acceptable. KV data is centrally stored and then cached through Cloudflare’s network after access. As a result of this design, it has good ‘read-hit’ latency, but relatively poor ‘read-miss’ latency. Use this for read-heavy workloads.
export class KVCacheManager<T> implements CacheManager<T> { private kv: KVNamespace;
constructor(kv: KVNamespace) { this.kv = kv; }
async get(key: string): Promise<T | undefined> { const value = await this.kv.get<T>(ey, { type: "json", cacheTtl: 60 }); return value ?? undefined; }
async set(key: string, value: T): Promise<void> { await this.kv.put(key, JSON.stringify(value), { expirationTtl: 60, metadata: { cachedAt: Date.now() } }); }
// ... other methods not shown}You will need access to the KV namespace binding, which is passed in via context to a Hono app. This makes the construction of the pipeline doing auto-registration a little more complex since you no longer have a globally defined cache.
There are two TTL values you control here:
expirationTtlis set onput()and controls when the key is deleted (min: 60s).cacheTtlis set onget()and determines how long a read stays cached at the edge (min: 30s).
Workers KV is more globally available than Workers Cache and built for read-heavy workloads. It’s also got better limits than the in-memory cache - up to 25MB values. It’s good for write-once / write-rarely workloads. However, it is eventually consistent, so you have to tune the TTL according to your needs by setting the combination of expirationTtl (on writes) and cacheTtl (on reads). It’s also more expensive than a D1 read (although it has lower latency).
No cache option
Of course, you don’t have to use a cache. D1 has many advantages even in situations where you think a cache might be useful. Writes are durable and atomic (ACID), and the D1 Sessions API can introduce (eventually consistent) read replicas so reads reflect committed writes (with a slight lag).
This does involve making certain decisions about your code, however. For instance, reads are much cheaper than writes:
| Unit billed | Free plan/day | Included amount/month | Cost per million |
|---|---|---|---|
| Rows read | 5 million | 25 billion | $0.001 |
| Rows written | 100 thousand | 50 million | $1.00 |
If I’m not paying for the writes, I’d be tempted to use an INSERT operation that protects against race conditions on every API call:
INSERT INTO users (email, sub) VALUES (?1, ?2) ON CONFLICT(email) DO UPDATE SET email = excluded.email RETURNING *This is an atomic operation. I don’t have to worry about ordering. However, it incurs a write cost per API call. Instead, do a SELECT operation before the INSERT. The SELECT call hits your D1 read usage, but it avoids writing in the hot path. Your write usage remains low. For my use case, I only incur a write when a user first uses the site (and potentially during a race condition where two API calls for the same new user happen at the same time).
Final thoughts
What can you expect from each layer?
| Layer | Latency | Scope | Freshness | Durability | Max value size | Good for |
|---|---|---|---|---|---|---|
| In-memory | Best | One isolate | Unpredictable | None | Very small | Burst reuse / memoization |
| Workers Cache | Very good on hit | One colo | TTL-based, colo-local | Temporary | 512MB | Edge response caching |
| Workers KV | Good, especially hot | Global read distribution | Eventually consistent | Durable | 25MB | Read-heavy workloads |
| D1 | Higher than cache | Primary + replicas | Stronger than KV for truth | Durable | N/A | Authoritative user state |
For my situation, I’m using D1 with a small in-memory cache. The reasons?
- My hot path is on every single API call.
- Users tend to issue API calls in rapid succession, so they are more likely to hit the same isolate given the low volume of users I expect.
- I’m using D1 as the source of truth.
- I’m not using the cache for authorization decisions.
- I can’t use Workers Cache API because my app is behind Cloudflare Access.
Every situation is unique. You need to look at your constraints, consider the cost and latency strategy, then decide which architecture to pick for effective use of the resources you have available.
Comments