High‑load in‑memory HTTP cache & reverse proxy for Go.
Designed for low latency and sustained high throughput — hundreds of thousands of RPS on commodity hosts.
Built around sharded storage, LRU with TinyLFU admission (Doorkeeper), background refresh, a resilient upstream cluster, and a lightweight worker orchestrator. The hot path is engineered to avoid allocations and global locks.
- Sharded storage (power‑of‑two shards) with per‑shard LRU and a global shard balancer for proportional eviction.
- Admission = W‑TinyLFU + Doorkeeper (Count‑Min Sketch + gated Bloom‑like filter).
- Background refresh with TTL, β‑staggering, scan‑rate and upstream rate limiting.
- Reverse proxy mode with an upstream cluster: per‑backend rate limiting, health probing, slow‑start, throttling/quarantine, and a fan‑in slot‑selection pattern for balanced dispatch.
- Worker orchestration for eviction/refresh/GC: on/off/start/reload/scale via a lightweight governor.
- Careful memory discipline: pooled buffers, zero‑copy headers, predictable storage budget.
- Dump/restore per shard with CRC32 and version rotation (optional GZIP).
- fasthttp HTTP layer, focused REST surface, Prometheus/VictoriaMetrics metrics.
- Kubernetes‑friendly: liveness probe, graceful shutdown, configurable GOMAXPROCS (auto when `0`).
See also: `METRICS.md` and `ROADMAP.md`.
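As a rough illustration of the sharded-storage idea from the feature list above: a power-of-two shard count lets the shard index be computed with a bit mask instead of a modulo, and each shard carries its own lock and LRU list, so the hot path never takes a global lock. This is a minimal sketch with hypothetical names (`Store`, `Shard`, xxhash as the hash function), not the project's actual types:

```go
// Hypothetical sketch of power-of-two sharding; not the project's real types.
package storage

import (
	"container/list"
	"sync"

	"github.com/cespare/xxhash/v2" // any stable 64-bit hash would do
)

const (
	shardCount = 256            // must be a power of two
	shardMask  = shardCount - 1 // lets us replace `% shardCount` with `&`
)

// Shard owns its lock, index and LRU list, so concurrent requests only
// contend when they hash to the same shard; there is no global lock.
type Shard struct {
	mu    sync.RWMutex
	items map[uint64]*list.Element // key hash -> LRU node
	lru   list.List                // front = most recently used
}

type Store struct {
	shards [shardCount]Shard
}

// shardFor computes the shard index with a mask, which is why the shard
// count is kept a power of two.
func (s *Store) shardFor(key []byte) *Shard {
	return &s.shards[xxhash.Sum64(key)&shardMask]
}
```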
Quick orientation to major components.
cmd/
  main.go                  # entrypoint, flags, wiring, logger, probes
internal/cache/api/        # HTTP controllers: main cache route, on/off, clear, metrics
pkg/
  config/                  # YAML config loader & derived fields
  http/server/             # fasthttp server, middlewares (Server header, JSON, etc)
  orchestrator/            # worker governor & transport
  pools/                   # buffer and slice pools
  prometheus/metrics/      # Prometheus/VictoriaMetrics exposition
  storage/                 # sharded map, per‑shard LRU, LFU, dumper, refresher, evictor
    lru/  lfu/  map/
  upstream/                # backend & cluster: rate‑limit, health, proxy logic
  k8s/                     # probes
  utils/, common/, types/  # helpers
To keep keys consistent and deterministic, requests are normalized before lookup/insert. Only items listed in config participate in the key:

- Query: `rules.*.cache_key.query` — exact parameter names (supports names like `project[id]`).
- Headers: `rules.*.cache_key.headers` — exact header names to include (e.g. `Accept-Encoding`).

All other query params and headers are ignored for the key. Selected query params and headers are sorted lexicographically (by name, then value) before key construction, so semantically identical requests map to the same key.

Source: `pkg/model/query.go`, `pkg/model/header.go`, `pkg/common/sort/key_value.go`.
If you whitelist `Accept-Encoding`, its normalized value becomes part of the key to isolate gzip/brotli/plain variants.
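A minimal sketch of the key normalization described above. The function name `Build` and the string-based output are illustrative only; the real code (`pkg/model`, `pkg/common/sort`) works on pre-parsed structures and avoids allocations on the hot path:

```go
// Illustrative cache-key builder, not the project's implementation.
package cachekey

import (
	"sort"
	"strings"
)

type pair struct{ name, value string }

// Build keeps only whitelisted query params and headers, sorts the selected
// pairs lexicographically (name, then value) and joins them with the path,
// so semantically identical requests always map to the same key.
func Build(path string, query, headers map[string]string, queryAllow, headerAllow []string) string {
	picked := make([]pair, 0, len(queryAllow)+len(headerAllow))
	for _, name := range queryAllow {
		if v, ok := query[name]; ok {
			picked = append(picked, pair{"q:" + name, v})
		}
	}
	for _, name := range headerAllow {
		if v, ok := headers[name]; ok {
			picked = append(picked, pair{"h:" + name, v})
		}
	}
	sort.Slice(picked, func(i, j int) bool {
		if picked[i].name != picked[j].name {
			return picked[i].name < picked[j].name
		}
		return picked[i].value < picked[j].value
	})

	var b strings.Builder
	b.WriteString(path)
	for _, p := range picked {
		b.WriteByte('|')
		b.WriteString(p.name)
		b.WriteByte('=')
		b.WriteString(p.value)
	}
	return b.String()
}
```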
- Whitelist filtering (no sorting): only response headers from `rules.*.cache_value.headers` are stored and forwarded as‑is; no reordering is performed.
`Server: <service-name>` — always set by middleware; if an upstream server name was present, it is preserved as `X-Origin-Server` and replaced with the local `Server`. Source: `pkg/http/server/middleware/server_name.go`.
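A sketch of what such a middleware can look like with fasthttp. The names here are illustrative; the actual implementation lives in `pkg/http/server/middleware/server_name.go`:

```go
// Sketch of the Server-header middleware described above.
package middleware

import "github.com/valyala/fasthttp"

// ServerName runs the wrapped handler first, preserves any upstream Server
// value as X-Origin-Server, then overwrites Server with the local name.
func ServerName(serviceName string, next fasthttp.RequestHandler) fasthttp.RequestHandler {
	return func(ctx *fasthttp.RequestCtx) {
		next(ctx)

		if origin := ctx.Response.Header.Peek("Server"); len(origin) > 0 {
			ctx.Response.Header.SetBytesV("X-Origin-Server", origin)
		}
		ctx.Response.Header.Set("Server", serviceName)
	}
}
```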
Note: `X-Refreshed-At` is planned to indicate background refresh timing (see `ROADMAP.md`).
Two example profiles are included:

- `advcache.cfg.yaml` — deployment profile
- `advcache.cfg.local.yaml` — local/stress profile
Selected top‑level keys (under `cache:`):

- `env` — log/metrics label (`dev`, `prod`, etc).
- `runtime.gomaxprocs` — `0` = auto (via automaxprocs); set an explicit N to cap CPUs.
- `api.{name,port}` — service name and HTTP port.
- `upstream.policy` — `"await"` (back‑pressure) or `"deny"` (fail‑fast).
- `upstream.cluster.backends[]` — per‑backend: `rate`, `timeout`, `max_timeout`, `use_max_timeout_header`, `healthcheck`.
- `data.dump` — snapshots: `{enabled,dir,name,crc32_control_sum,max_versions,gzip}`.
- `storage.size` — memory budget (bytes).
- `admission` — TinyLFU: `table_len_per_shard` (power‑of‑two), `estimated_length`, `door_bits_per_counter` (8–16 typical), `sample_multiplier` (traffic‑proportional aging).
- `eviction` — pressure policy: `soft_limit` (background eviction + enforced admission), `hard_limit` (minimal hot‑path eviction + runtime memory limit); `replicas`, `scan_rate`.
- `refresh` — `{enabled,ttl,beta,rate,replicas,scan_rate,coefficient}` (see the β‑staggering sketch after this list).
- `forceGC` — periodic `FreeOSMemory`.
- `metrics.enabled` — Prometheus/VictoriaMetrics.
- `k8s.probe.timeout` — probe timeout.
- `rules` — per‑path overrides + cache key/value canonicalization.
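The README does not spell out the exact staggering formula, so here is one common way to implement β‑staggered refresh: probabilistic early expiration (XFetch‑style), with the entry TTL standing in for the recomputation cost. Treat it as an illustration of the `refresh.beta` idea, not the project's scheduler:

```go
// Illustration of β-staggered refresh (probabilistic early expiration),
// not the project's actual refresher.
package refresher

import (
	"math"
	"math/rand"
	"time"
)

// shouldRefresh reports whether an entry should be refreshed now. Entries
// may refresh shortly *before* their TTL expires, with probability rising as
// expiry approaches; beta spreads refreshes out so entries written at the
// same moment do not all hit the upstream simultaneously.
func shouldRefresh(writtenAt time.Time, ttl time.Duration, beta float64) bool {
	age := time.Since(writtenAt)
	if age >= ttl {
		return true // already expired
	}
	r := rand.Float64()
	if r == 0 {
		return true
	}
	// XFetch-style check: refresh if age - beta*ttl*ln(r) >= ttl
	// (the TTL stands in for the recomputation cost here).
	early := time.Duration(-beta * float64(ttl) * math.Log(r))
	return age+early >= ttl
}
```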
# advcache.cfg.yaml (deployment profile)
cache:
  env: "dev"
  enabled: true
  runtime:
    gomaxprocs: 0
  api:
    name: "starTeam.advCache"
    port: "8020"
  upstream:
    policy: "await"
    cluster:
      backends:
        - id: "prod-node-1"
          enabled: true
          host: "localhost:8081"
          scheme: "http"
          rate: 100000
          timeout: "10s"
          max_timeout: "1m"
          use_max_timeout_header: ""
          healthcheck: "/healthcheck"
        - id: "low-resources-prod-node-2"
          enabled: true
          host: "localhost:8082"
          scheme: "http"
          rate: 3000
          timeout: "10s"
          max_timeout: "1m"
          use_max_timeout_header: ""
          healthcheck: "/healthcheck"
        - id: "legacy-prod-node-3"
          enabled: true
          host: "localhost:8083"
          scheme: "http"
          rate: 500
          timeout: "1m"
          max_timeout: "10m"
          use_max_timeout_header: ""
          healthcheck: "/legacy/health/is-ok"
  data:
    dump:
      enabled: false
      dir: "public/dump"
      name: "cache.dump"
      crc32_control_sum: true
      max_versions: 3
      gzip: false
  storage:
    size: 53687091200 # 50 GiB
  admission:
    table_len_per_shard: 32768
    estimated_length: 10000000
    door_bits_per_counter: 12
    sample_multiplier: 12
  eviction:
    enabled: true
    replicas: 4
    scan_rate: 8
    soft_limit: 0.8
    hard_limit: 0.9
  refresh:
    enabled: true
    ttl: "3h"
    beta: 0.5
    rate: 1250
    replicas: 4
    scan_rate: 32
    coefficient: 0.5
  forceGC:
    enabled: true
    interval: "10s"
  metrics:
    enabled: true
  k8s:
    probe:
      timeout: "5s"
  rules:
    /api/v2/cloud/data:
      cache_key:
        query: [project[id], domain, language, choice, timezone]
        headers: [Accept-Encoding]
      cache_value:
        headers: [Content-Type, Content-Length, Content-Encoding, Connection, Strict-Transport-Security, Vary, Cache-Control]
    /api/v1/stats:
      enabled: true
      ttl: "36h"
      beta: 0.4
      coefficient: 0.7
      cache_key:
        query: [language, timezone]
        headers: [Accept-Encoding]
      cache_value:
        headers: [Content-Type, Content-Length, Content-Encoding, Connection, Strict-Transport-Security, Vary, Cache-Control]
# advcache.cfg.local.yaml (local/stress profile)
cache:
  env: "dev"
  enabled: true
  runtime:
    gomaxprocs: 12
  api:
    name: "starTeam.adv:8020"
    port: "8081"
  upstream:
    policy: "deny"
    cluster:
      backends:
        - id: "adv"
          enabled: true
          host: "localhost:8020"
          scheme: "http"
          rate: 250000
          timeout: "5s"
          max_timeout: "3m"
          use_max_timeout_header: "X-Google-Bot"
          healthcheck: "/k8s/probe"
  storage:
    size: 10737418240 # 10 GiB
  admission:
    table_len_per_shard: 32768
    estimated_length: 10000000
    door_bits_per_counter: 12
    sample_multiplier: 10
  eviction:
    enabled: true
    replicas: 4
    scan_rate: 8
    soft_limit: 0.9
    hard_limit: 0.99
  forceGC:
    enabled: true
    interval: "10s"
  metrics:
    enabled: true
- Background eviction at SOFT‑LIMIT: when `heap_usage >= storage.size × soft_limit`, the evictor runs in the background and does not touch the hot path. It removes items using a larger LRU sample (preferentially keeping newer entries). Increase `replicas` and `scan_rate` to shave memory continuously.
- Admission at SOFT‑LIMIT: TinyLFU admission is enforced on the hot path during pressure to avoid polluting the cache with low‑value inserts while the evictor catches up.
- Minimal hot‑path eviction at HARD‑LIMIT: when `heap_usage >= storage.size × hard_limit`, a single‑item eviction per request is applied to reduce contention with the background worker, and the runtime memory limit is set in parallel. This preserves throughput and avoids latency cliffs.
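A rough sketch of this pressure classification, assuming a background monitor samples heap usage periodically. The names (`classify`, the `pressure` levels) are hypothetical; the real evictor lives under `pkg/storage`:

```go
// Sketch of the soft/hard-limit decision; hypothetical names only.
package evictor

import (
	"runtime"
	"runtime/debug"
)

type pressure int

const (
	pressureNone pressure = iota
	pressureSoft          // background eviction + enforced admission
	pressureHard          // single-item hot-path eviction per request
)

// classify compares live heap usage against the two thresholds derived from
// storage.size, eviction.soft_limit and eviction.hard_limit.
func classify(storageSize uint64, softLimit, hardLimit float64) pressure {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // stop-the-world; only called periodically

	heap := float64(ms.HeapInuse)
	switch {
	case heap >= float64(storageSize)*hardLimit:
		// At the hard limit also cap the runtime's memory target so the GC
		// works harder before the process grows any further.
		debug.SetMemoryLimit(int64(storageSize))
		return pressureHard
	case heap >= float64(storageSize)*softLimit:
		return pressureSoft
	default:
		return pressureNone
	}
}
```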
- Count‑Min Sketch (depth = 4) with compact counters, sharded to minimize contention.
- Sample‑based aging: ages after `estimated_length × sample_multiplier` observations (traffic‑proportional).
- Doorkeeper (Bloom‑like bitset) gates first‑seen keys; it is reset with aging to avoid FPR growth.
Recommended starting points: `table_len_per_shard`: 8192–32768 · `door_bits_per_counter`: 12 · `sample_multiplier`: 8–12.
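To make the admission pieces concrete, here is a deliberately simplified sketch of a depth‑4 Count‑Min Sketch with a Doorkeeper bitset in front (8‑bit counters, no sharding, aging omitted); the project's real layout and counter width differ:

```go
// Simplified TinyLFU pieces: depth-4 Count-Min Sketch + Doorkeeper bitset.
package tinylfu

type doorkeeper struct {
	bits []uint64 // Bloom-like bitset, cleared whenever the sketch ages
}

// markAndCheck sets the key's bit and reports whether it was already set,
// so one-hit wonders never reach the frequency counters.
func (d *doorkeeper) markAndCheck(h uint64) bool {
	idx := h % uint64(len(d.bits)*64)
	word, bit := idx/64, idx%64
	seen := d.bits[word]&(1<<bit) != 0
	d.bits[word] |= 1 << bit
	return seen
}

type cmSketch struct {
	rows [4][]uint8 // depth = 4
	mask uint64     // row length minus one; row length is a power of two
}

func (s *cmSketch) add(h uint64) {
	for i := range s.rows {
		idx := (h >> (16 * uint(i))) & s.mask // per-row index from one hash
		if s.rows[i][idx] < 255 {             // saturating counter
			s.rows[i][idx]++
		}
	}
}

func (s *cmSketch) estimate(h uint64) uint8 {
	est := uint8(255)
	for i := range s.rows {
		if c := s.rows[i][(h>>(16*uint(i)))&s.mask]; c < est {
			est = c
		}
	}
	return est
}

// admit is the TinyLFU decision: a candidate enters the cache only if its
// estimated frequency is at least that of the LRU victim it would replace.
func admit(s *cmSketch, d *doorkeeper, candidate, victim uint64) bool {
	if !d.markAndCheck(candidate) {
		return false // first sighting: only remembered by the Doorkeeper
	}
	s.add(candidate)
	return s.estimate(candidate) >= s.estimate(victim)
}
```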
With randomized object sizes between 1 KiB and 16 KiB (mocks), the cache fills to ~10 GiB of logical data with ~500 MiB of overhead. Resident usage stabilizes around ~10.5 GiB for a 10 GiB dataset under these conditions.
Requirements: Go 1.24+
# Build
go build -o advCache ./cmd/main.go
# Run (uses default config path if present)
./advCache
# Run with an explicit config path
./advCache -cfg ./advcache.cfg.yaml
# Docker (example multi‑stage)
docker build -t advcache .
docker run --rm -p 8020:8020 -v "$PWD/public/dump:/app/public/dump" advcache -cfg /app/advcache.cfg.yaml
If `-cfg` is not provided, the built‑in defaults try `advcache.cfg.yaml` and then `advcache.cfg.local.yaml`.
- `GET /{any}` — main cached endpoint (cache key = path + selected query + selected request headers).
- `GET /cache/on` — enable caching.
- `GET /cache/off` — disable caching.
- `GET /cache/clear` — two‑step clear: the first call returns a token with a 5‑minute TTL; the second call with `?token=` clears (see the sketch after this list).
- `GET /metrics` — Prometheus/VictoriaMetrics exposition.
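A minimal sketch of the two‑step clear token flow described above. The `clearToken` type is hypothetical; the actual controller lives under `internal/cache/api/`:

```go
// Sketch of the two-step clear token; not the project's implementation.
package api

import (
	"crypto/rand"
	"encoding/hex"
	"sync"
	"time"
)

type clearToken struct {
	mu      sync.Mutex
	value   string
	expires time.Time
}

// issue is what the first GET /cache/clear call does: generate a random
// token that stays valid for five minutes.
func (c *clearToken) issue() string {
	buf := make([]byte, 16)
	_, _ = rand.Read(buf)

	c.mu.Lock()
	defer c.mu.Unlock()
	c.value = hex.EncodeToString(buf)
	c.expires = time.Now().Add(5 * time.Minute)
	return c.value
}

// validate is what the second call with ?token=... does; the token is
// single-use and must not have expired.
func (c *clearToken) validate(token string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	ok := token != "" && token == c.value && time.Now().Before(c.expires)
	if ok {
		c.value = ""
	}
	return ok
}
```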
- Hits, misses, proxied/fallback counts, errors, panics.
- Cache length and memory gauges.
- Upstream health: healthy/sick/dead.
- Eviction/admission activity.
- Refresh scan/attempt metrics.
Enable periodic stats to stdout with `logs.stats: true` in config.
- Upstream policy: `deny` for fail‑fast load tests; `await` in production for back‑pressure (see the sketch after this list).
- Eviction thresholds: start with `soft_limit: 0.8–0.9`, `hard_limit: 0.9–0.99`, `forceGC.enabled: true`. If hot‑path eviction triggers often, increase evictor `replicas` or `scan_rate`.
- Admission: watch Doorkeeper density and reset interval; if density > ~0.5, increase `door_bits_per_counter` or reduce `sample_multiplier`.
- CPU: leave `gomaxprocs: 0` in production; pin CPUs via container limits/quotas if needed.
- Headers: whitelist only what must participate in the key; `Accept-Encoding` is a good default when you store compressed variants.
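To illustrate the difference between the two upstream policies, here is a sketch using `golang.org/x/time/rate` as a stand‑in for the project's own per‑backend limiter (the function and error names are hypothetical):

```go
// Sketch of the "await" vs "deny" upstream policies.
package upstream

import (
	"context"
	"errors"

	"golang.org/x/time/rate"
)

var errBackendBusy = errors.New("backend rate limit exceeded")

// acquire reserves a slot on the backend limiter according to the policy:
// "await" blocks until a slot is available (back-pressure), while "deny"
// fails fast so overload is visible immediately.
func acquire(ctx context.Context, lim *rate.Limiter, policy string) error {
	switch policy {
	case "await":
		return lim.Wait(ctx) // honors ctx deadline and cancellation
	case "deny":
		if !lim.Allow() {
			return errBackendBusy
		}
		return nil
	default:
		return errors.New("unknown upstream policy: " + policy)
	}
}
```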
- Unit tests around storage hot path, TinyLFU, and shard balancer.
- Dump/load tests with CRC and rotation.
- Upstream fault injection: timeouts, spikes, error bursts.
- Benchmarks with `-benchmem`; race tests for concurrency‑sensitive code.
MIT — see `LICENSE`.
Borislav Glazunov — glazunov2142@gmail.com · Telegram @glbrslv