
Advanced Cache (advCache)

High‑load in‑memory HTTP cache & reverse proxy for Go.
Designed for low latency and sustained high throughput — hundreds of thousands of RPS on commodity hosts.

Built around sharded storage, LRU with TinyLFU admission (Doorkeeper), background refresh, a resilient upstream cluster, and a lightweight worker orchestrator. The hot path is engineered to avoid allocations and global locks.


Highlights

  • Sharded storage (power‑of‑two shards) with per‑shard LRU and a global shard balancer for proportional eviction.
  • Admission = W‑TinyLFU + Doorkeeper (Count‑Min Sketch + gated Bloom‑like filter).
  • Background refresh with TTL, β‑staggering, scan‑rate and upstream rate limiting.
  • Reverse proxy mode with an upstream cluster: per‑backend rate limiting, health probing, slow‑start, throttling/quarantine, and a fan‑in slot‑selection pattern for balanced dispatch.
  • Worker orchestration for eviction/refresh/GC: on/off/start/reload/scale via a lightweight governor.
  • Careful memory discipline: pooled buffers, zero‑copy headers, predictable storage budget.
  • Dump/restore per shard with CRC32 and version rotation (optional GZIP).
  • fasthttp HTTP layer, focused REST surface, Prometheus/VictoriaMetrics metrics.
  • Kubernetes‑friendly: liveness probe, graceful shutdown, configurable GOMAXPROCS (auto when 0).

See also: METRICS.md and ROADMAP.md.


Repository map

Quick orientation to major components.

cmd/
  main.go                  # entrypoint, flags, wiring, logger, probes

internal/cache/api/        # HTTP controllers: main cache route, on/off, clear, metrics
pkg/
  config/                  # YAML config loader & derived fields
  http/server/             # fasthttp server, middlewares (Server header, JSON, etc)
  orchestrator/            # worker governor & transport
  pools/                   # buffer and slice pools
  prometheus/metrics/      # Prometheus/VictoriaMetrics exposition
  storage/                 # sharded map, per‑shard LRU, LFU, dumper, refresher, evictor
    lru/ lfu/ map/
  upstream/                # backend & cluster: rate‑limit, health, proxy logic
  k8s/                     # probes
  utils/, common/, types/  # helpers

How requests are canonicalized (cache key)

To keep cache keys consistent and deterministic, requests are normalized before lookup/insert:

Whitelist filtering

Only items listed in config participate in the key:

  • Query: rules.*.cache_key.query — exact parameter names (supports names like project[id]).
  • Headers: rules.*.cache_key.headers — exact header names to include (e.g. Accept-Encoding).

All other query params and headers are ignored for the key.

Deterministic ordering

Selected query params and headers are sorted lexicographically (by name, then value) before key construction, so semantically identical requests map to the same key.
Source: pkg/model/query.go, pkg/model/header.go, pkg/common/sort/key_value.go.

Compression variants

If you whitelist Accept-Encoding, its normalized value becomes part of the key to isolate gzip/brotli/plain variants.
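
For illustration, a minimal sketch of this canonicalization, assuming hypothetical names (buildKey, kv) rather than the actual code in pkg/model/query.go, pkg/model/header.go, and pkg/common/sort/key_value.go:

package main

import (
	"fmt"
	"sort"
	"strings"
)

// kv is a hypothetical name/value pair used only for this sketch.
type kv struct{ name, value string }

// buildKey keeps only whitelisted query params and headers, orders them
// lexicographically by name then value, and joins them with the request path.
func buildKey(path string, query, headers map[string]string, allowQuery, allowHeaders []string) string {
	pick := func(src map[string]string, allow []string) []kv {
		out := make([]kv, 0, len(allow))
		for _, name := range allow {
			if v, ok := src[name]; ok {
				out = append(out, kv{name, v})
			}
		}
		sort.Slice(out, func(i, j int) bool {
			if out[i].name != out[j].name {
				return out[i].name < out[j].name
			}
			return out[i].value < out[j].value
		})
		return out
	}

	var b strings.Builder
	b.WriteString(path)
	for _, p := range pick(query, allowQuery) {
		fmt.Fprintf(&b, "|q:%s=%s", p.name, p.value)
	}
	for _, h := range pick(headers, allowHeaders) {
		fmt.Fprintf(&b, "|h:%s=%s", h.name, h.value)
	}
	return b.String()
}

func main() {
	// Extra query params and headers are ignored; only whitelisted items,
	// sorted, reach the key. Accept-Encoding keeps compressed variants apart.
	key := buildKey(
		"/api/v2/cloud/data",
		map[string]string{"language": "en", "project[id]": "7", "utm_source": "x"},
		map[string]string{"Accept-Encoding": "gzip", "User-Agent": "curl"},
		[]string{"project[id]", "language"},
		[]string{"Accept-Encoding"},
	)
	fmt.Println(key) // /api/v2/cloud/data|q:language=en|q:project[id]=7|h:Accept-Encoding=gzip
}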


Response headers policy

  • Whitelist filtering (no sorting): Only response headers from rules.*.cache_value.headers are stored and forwarded as‑is.
    (No reordering is performed.)

Server‑added diagnostic headers

  • Server: <service-name> — always set by middleware; if the upstream response carried its own Server header, its value is preserved in X-Origin-Server before the local Server value replaces it.
    Source: pkg/http/server/middleware/server_name.go.
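
A minimal fasthttp sketch of that behavior, assuming an illustrative withServerName wrapper (the actual middleware in server_name.go may differ):

package middleware

import "github.com/valyala/fasthttp"

// withServerName is a hypothetical wrapper illustrating the described policy:
// preserve an upstream Server header as X-Origin-Server, then stamp the
// response with the local service name.
func withServerName(serviceName string, next fasthttp.RequestHandler) fasthttp.RequestHandler {
	return func(ctx *fasthttp.RequestCtx) {
		next(ctx) // let the cache/proxy build the response first

		if origin := ctx.Response.Header.Peek("Server"); len(origin) > 0 {
			ctx.Response.Header.Set("X-Origin-Server", string(origin))
		}
		ctx.Response.Header.SetServer(serviceName)
	}
}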

Note: X-Refreshed-At is planned to indicate background refresh timing. (See ROADMAP.md.)


Configuration

Two example profiles are included:

  • advcache.cfg.yaml — deployment profile
  • advcache.cfg.local.yaml — local/stress profile

Selected top‑level keys (under cache:):

  • env — log/metrics label (dev, prod, etc).
  • runtime.gomaxprocs — 0 = auto (via automaxprocs); set an explicit N to cap CPUs.
  • api.{name,port} — service name and HTTP port.
  • upstream.policy — "await" (back‑pressure) or "deny" (fail‑fast).
  • upstream.cluster.backends[] — per‑backend: rate, timeout, max_timeout, use_max_timeout_header, healthcheck.
  • data.dump — snapshots: {enabled,dir,name,crc32_control_sum,max_versions,gzip}.
  • storage.size — memory budget (bytes).
  • admission — TinyLFU: table_len_per_shard (power‑of‑two), estimated_length, door_bits_per_counter (8–16 typical), sample_multiplier (traffic‑proportional aging).
  • eviction — pressure policy: soft_limit (background eviction + enforce admission), hard_limit (minimal hot‑path eviction + runtime memory limit); replicas, scan_rate.
  • refresh — {enabled,ttl,beta,rate,replicas,scan_rate,coefficient}; see the β‑staggering sketch after this list.
  • forceGC — periodic FreeOSMemory.
  • metrics.enabled — Prometheus/VictoriaMetrics.
  • k8s.probe.timeout — probe timeout.
  • rules — per‑path overrides + cache key/value canonicalization.
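
β‑staggering is not specified in detail here; a common reading is probabilistic early refresh (the classic "x‑fetch" rule), in which an entry becomes a refresh candidate before its TTL expires, with a probability that grows as expiry approaches. The sketch below assumes that reading; the parameter names and the exact formula used by the refresher may differ.

package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// shouldRefresh is a hypothetical illustration of β-staggered refresh:
// refresh when now - β·δ·ln(rand) >= expiry, where δ approximates the cost
// of one refresh. Larger beta values pull refreshes further ahead of expiry.
func shouldRefresh(now, expiry time.Time, delta time.Duration, beta float64) bool {
	r := rand.Float64()
	if r == 0 {
		r = 1e-12 // avoid ln(0)
	}
	earlyBy := time.Duration(-beta * float64(delta) * math.Log(r))
	return now.Add(earlyBy).After(expiry)
}

func main() {
	expiry := time.Now().Add(10 * time.Minute)
	// With beta 0.5 and a 30s refresh cost, refreshes spread over a window
	// shortly before expiry instead of firing all at once.
	fmt.Println(shouldRefresh(time.Now(), expiry, 30*time.Second, 0.5))
}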

Example (deployment excerpt)

cache:
  env: "dev"
  enabled: true

  runtime:
    gomaxprocs: 0

  api:
    name: "starTeam.advCache"
    port: "8020"

  upstream:
    policy: "await"
    cluster:
      backends:
        - id: "prod-node-1"
          enabled: true
          host: "localhost:8081"
          scheme: "http"
          rate: 100000
          timeout: "10s"
          max_timeout: "1m"
          use_max_timeout_header: ""
          healthcheck: "/healthcheck"
        - id: "low-resources-prod-node-2"
          enabled: true
          host: "localhost:8082"
          scheme: "http"
          rate: 3000
          timeout: "10s"
          max_timeout: "1m"
          use_max_timeout_header: ""
          healthcheck: "/healthcheck"
        - id: "legacy-prod-node-3"
          enabled: true
          host: "localhost:8083"
          scheme: "http"
          rate: 500
          timeout: "1m"
          max_timeout: "10m"
          use_max_timeout_header: ""
          healthcheck: "/legacy/health/is-ok"

  data:
    dump:
      enabled: false
      dir: "public/dump"
      name: "cache.dump"
      crc32_control_sum: true
      max_versions: 3
      gzip: false

  storage:
    size: 53687091200  # 50 GiB

  admission:
    table_len_per_shard: 32768
    estimated_length: 10000000
    door_bits_per_counter: 12
    sample_multiplier: 12

  eviction:
    enabled: true
    replicas: 4
    scan_rate: 8
    soft_limit: 0.8
    hard_limit: 0.9

  refresh:
    enabled: true
    ttl: "3h"
    beta: 0.5
    rate: 1250
    replicas: 4
    scan_rate: 32
    coefficient: 0.5

  forceGC:
    enabled: true
    interval: "10s"

  metrics:
    enabled: true

  k8s:
    probe:
      timeout: "5s"

  rules:
    /api/v2/cloud/data:
      cache_key:
        query: [project[id], domain, language, choice, timezone]
        headers: [Accept-Encoding]
      cache_value:
        headers: [Content-Type, Content-Length, Content-Encoding, Connection, Strict-Transport-Security, Vary, Cache-Control]

    /api/v1/stats:
      enabled: true
      ttl: "36h"
      beta: 0.4
      coefficient: 0.7
      cache_key:
        query: [language, timezone]
        headers: [Accept-Encoding]
      cache_value:
        headers: [Content-Type, Content-Length, Content-Encoding, Connection, Strict-Transport-Security, Vary, Cache-Control]

Example (local stress excerpt)

cache:
  env: "dev"
  enabled: true

  runtime:
    gomaxprocs: 12

  api:
    name: "starTeam.adv:8020"
    port: "8081"

  upstream:
    policy: "deny"
    cluster:
      backends:
        - id: "adv"
          enabled: true
          host: "localhost:8020"
          scheme: "http"
          rate: 250000
          timeout: "5s"
          max_timeout: "3m"
          use_max_timeout_header: "X-Google-Bot"
          healthcheck: "/k8s/probe"

  storage:
    size: 10737418240  # 10 GiB

  admission:
    table_len_per_shard: 32768
    estimated_length: 10000000
    door_bits_per_counter: 12
    sample_multiplier: 10

  eviction:
    enabled: true
    replicas: 4
    scan_rate: 8
    soft_limit: 0.9
    hard_limit: 0.99

  forceGC:
    enabled: true
    interval: "10s"

  metrics:
    enabled: true

Eviction & pressure policy

  • Background eviction at SOFT‑LIMIT
    When heap_usage >= storage.size × soft_limit, the evictor runs in the background and does not touch the hot path. It removes items using a larger LRU sample, preferring to keep newer entries. Increase replicas and scan_rate to reclaim memory more aggressively.

  • Admission at SOFT‑LIMIT
    TinyLFU admission is enforced on the hot path during pressure to avoid polluting the cache with low‑value inserts while the evictor catches up.

  • Minimal hot‑path eviction at HARD‑LIMIT
    When heap_usage >= storage.size × hard_limit, a single‑item eviction per request is applied to reduce contention with the background worker, and the runtime memory limit is set in parallel. This preserves throughput and avoids latency cliffs.
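
A simplified decision sketch of these thresholds, assuming hypothetical names (classify, pressure); the real evictor in pkg/storage is more involved:

package main

import "fmt"

type pressure int

const (
	pressureNone pressure = iota // below soft limit: no eviction work
	pressureSoft                 // background eviction + enforced admission
	pressureHard                 // plus single-item hot-path eviction and a runtime memory limit
)

// classify maps current heap usage against the storage budget and the
// soft/hard thresholds from the eviction config.
func classify(heapUsage, storageSize uint64, softLimit, hardLimit float64) pressure {
	switch {
	case float64(heapUsage) >= float64(storageSize)*hardLimit:
		return pressureHard
	case float64(heapUsage) >= float64(storageSize)*softLimit:
		return pressureSoft
	default:
		return pressureNone
	}
}

func main() {
	// 50 GiB budget with soft_limit 0.8 and hard_limit 0.9, as in the
	// deployment excerpt above: 42 GiB of heap falls into the soft band.
	fmt.Println(classify(42<<30, 50<<30, 0.8, 0.9) == pressureSoft) // true
}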


TinyLFU + Doorkeeper (admission)

  • Count‑Min Sketch (depth=4) with compact counters, sharded to minimize contention.
  • Sample‑based aging: ages after estimated_length × sample_multiplier observations (traffic‑proportional).
  • Doorkeeper (Bloom‑like bitset) gates first‑seen keys; reset with aging to avoid FPR growth.

Recommended starting points:
table_len_per_shard: 8192–32768 · door_bits_per_counter: 12 · sample_multiplier: 8–12
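
A compact, hypothetical sketch of the admission decision (Doorkeeper gate in front of a Count‑Min Sketch); the real implementation is sharded, uses compact counters, and lives under pkg/storage:

package main

import (
	"fmt"
	"hash/maphash"
)

const depth = 4 // Count-Min Sketch rows, matching the description above

type tinyLFU struct {
	seed   maphash.Seed
	rows   [depth][]uint8 // Count-Min Sketch counters
	door   []uint64       // Doorkeeper bitset
	seen   int            // observations since last aging
	sample int            // aging threshold (estimated_length × sample_multiplier)
}

func newTinyLFU(width, doorBits, sample int) *tinyLFU {
	t := &tinyLFU{seed: maphash.MakeSeed(), sample: sample, door: make([]uint64, (doorBits+63)/64)}
	for i := range t.rows {
		t.rows[i] = make([]uint8, width)
	}
	return t
}

func (t *tinyLFU) hash(key string, row int) uint64 {
	var h maphash.Hash
	h.SetSeed(t.seed)
	h.WriteString(key)
	h.WriteByte(byte(row))
	return h.Sum64()
}

// record notes one observation: first-seen keys only set a Doorkeeper bit,
// repeat keys increment the sketch; aging keeps counts traffic-proportional.
func (t *tinyLFU) record(key string) {
	bit := t.hash(key, depth) % uint64(len(t.door)*64)
	if t.door[bit/64]&(1<<(bit%64)) == 0 {
		t.door[bit/64] |= 1 << (bit % 64)
	} else {
		for row := range t.rows {
			i := t.hash(key, row) % uint64(len(t.rows[row]))
			if t.rows[row][i] < 255 {
				t.rows[row][i]++
			}
		}
	}
	if t.seen++; t.seen >= t.sample {
		t.age()
	}
}

// estimate is the minimum counter across rows, plus one if the Doorkeeper bit is set.
func (t *tinyLFU) estimate(key string) int {
	est := 255
	for row := range t.rows {
		i := t.hash(key, row) % uint64(len(t.rows[row]))
		if c := int(t.rows[row][i]); c < est {
			est = c
		}
	}
	if bit := t.hash(key, depth) % uint64(len(t.door)*64); t.door[bit/64]&(1<<(bit%64)) != 0 {
		est++
	}
	return est
}

// age halves all counters and clears the Doorkeeper so its FPR stays bounded.
func (t *tinyLFU) age() {
	for row := range t.rows {
		for i := range t.rows[row] {
			t.rows[row][i] /= 2
		}
	}
	for i := range t.door {
		t.door[i] = 0
	}
	t.seen = 0
}

// admit lets a candidate replace a victim only if its estimate is higher.
func (t *tinyLFU) admit(candidate, victim string) bool {
	return t.estimate(candidate) > t.estimate(victim)
}

func main() {
	lfu := newTinyLFU(32768, 32768*12, 10_000_000*12)
	for i := 0; i < 3; i++ {
		lfu.record("hot-key")
	}
	lfu.record("cold-key")
	fmt.Println(lfu.admit("hot-key", "cold-key")) // true
}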


Sizing evidence (current tests)

With randomized object sizes between 1 KiB and 16 KiB (mocks), the cache fills to ~10 GiB of logical data with ~500 MiB of overhead. Resident usage stabilizes around ~10.5 GiB for a 10 GiB dataset under these conditions.


Build & run

Requirements: Go 1.24+

# Build
go build -o advCache ./cmd/main.go

# Run (uses default config path if present)
./advCache

# Run with an explicit config path
./advCache -cfg ./advcache.cfg.yaml

# Docker (example multi‑stage)
docker build -t advcache .
docker run --rm -p 8020:8020 -v "$PWD/public/dump:/app/public/dump" advcache -cfg /app/advcache.cfg.yaml

The built‑in defaults try advcache.cfg.yaml and then advcache.cfg.local.yaml if -cfg is not provided.


HTTP endpoints

  • GET /{any} — main cached endpoint (cache key = path + selected query + selected request headers).
  • GET /cache/on — enable caching.
  • GET /cache/off — disable caching.
  • GET /cache/clear — two‑step clear (the first call returns a token with a 5‑minute TTL; a second call with ?token= performs the clear); see the client sketch after this list.
  • GET /metrics — Prometheus/VictoriaMetrics exposition.
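
For example, a hypothetical Go client for the two‑step clear; the assumption that the first response body carries the raw token is mine, not documented here:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	base := "http://localhost:8020"

	// Step 1: request a clear token (valid for 5 minutes).
	resp, err := http.Get(base + "/cache/clear")
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	// Assumption: the body contains the token; adjust if the service
	// returns JSON or another envelope.
	token := strings.TrimSpace(string(body))

	// Step 2: confirm the clear with the token.
	confirm, err := http.Get(base + "/cache/clear?token=" + url.QueryEscape(token))
	if err != nil {
		panic(err)
	}
	defer confirm.Body.Close()
	fmt.Println("clear status:", confirm.Status)
}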

Observability

  • Hits, misses, proxied/fallback counts, errors, panics.
  • Cache length and memory gauges.
  • Upstream health: healthy/sick/dead.
  • Eviction/admission activity.
  • Refresh scan/attempt metrics.

Enable periodic stats to stdout with logs.stats: true in config.


Tuning guide (ops)

  • Upstream policy: deny for fail‑fast load tests; await in production for back‑pressure.
  • Eviction thresholds: start with soft_limit: 0.8–0.9, hard_limit: 0.9–0.99, forceGC.enabled: true. If hot‑path eviction triggers often, increase evictor replicas or scan_rate.
  • Admission: watch Doorkeeper density and reset interval; if density > ~0.5, increase door_bits_per_counter or reduce sample_multiplier.
  • CPU: leave gomaxprocs: 0 in production; pin CPUs via container limits/quotas if needed.
  • Headers: whitelist only what must participate in the key; Accept-Encoding is a good default when you store compressed variants.

Testing

  • Unit tests around storage hot path, TinyLFU, and shard balancer.
  • Dump/load tests with CRC and rotation.
  • Upstream fault injection: timeouts, spikes, error bursts.
  • Benchmarks with -benchmem, race tests for concurrency‑sensitive code.

License

MIT — see LICENSE.


Maintainer

Borislav Glazunov — glazunov2142@gmail.com · Telegram @glbrslv