kemal-cache-benchmark

Benchmarks for kemal-cache

Compares a plain Kemal JSON API with the same routes behind kemal-cache. GET / uses a flat product list from data/products.db; GET /ecommerce runs a heavier query against data/ecommerce.db (seeded separately).

The cached app uses kemal-cache with the default MemoryStore (in-process, on the same machine as the app). It does not use RedisStore unless you change the configuration yourself. All benchmark numbers below assume MemoryStore.
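As an illustration of what an in-process MemoryStore buys you, here is a hand-rolled Kemal route that memoizes its rendered body in a plain Hash. This is a conceptual sketch only: it is not kemal-cache's actual API (the shard has its own configuration and handlers), and the response body is a stand-in.

```crystal
require "kemal"

# Conceptual stand-in for an in-process MemoryStore: the first request pays
# for rendering, later requests return the cached body from the app's own
# heap. Not kemal-cache's real API; just the idea behind it.
CACHE = {} of String => String

get "/" do |env|
  env.response.content_type = "application/json"
  CACHE["products"] ||= begin
    # In the real app this would be the SQLite query + JSON serialization.
    %({"products": []})
  end
end

Kemal.run
```

Because the cache lives in the same process, a hit costs roughly a Hash lookup; that locality is what drives the numbers below.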

Installation

shards install

Database

Two SQLite files (ignored by git):

  • data/products.db — flat products table for GET /
    shards run seed_product_db
    
  • data/ecommerce.db — categories, customers, products, orders, order_items for GET /ecommerce
    shards run seed_ecommerce_db
    

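For a sense of what seeding amounts to, here is a minimal sketch using the db and sqlite3 shards. The column names and values are assumptions for illustration (the real schema lives in the repo's seed tasks); the 1,000-row count matches the product count mentioned in the results discussion below.

```crystal
require "db"
require "sqlite3"

# Rough shape of a product seeder. Columns (id, name, price) are assumed,
# not the repo's actual schema. Run once before benchmarking.
DB.open("sqlite3://./data/products.db") do |db|
  db.exec "CREATE TABLE IF NOT EXISTS products (
             id INTEGER PRIMARY KEY,
             name TEXT NOT NULL,
             price REAL NOT NULL)"
  1_000.times do |i|
    db.exec "INSERT INTO products (name, price) VALUES (?, ?)",
            "Product #{i}", (i % 100) + 0.99
  end
end
```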
Usage

Build release binaries and run each server in turn, then benchmark with wrk:

crystal build --release -o bin/no-cache-api src/no-cache-api.cr
crystal build --release -o bin/cache-api src/cache-api.cr

Start one app (default port 3000, or -p <port>), then:

wrk -c 100 -d 30 http://127.0.0.1:3000/

Repeat for the other binary to compare results. Set KEMAL_QUIET=1 when starting the server to disable request logging during the run.

Benchmark results

Hardware: MacBook Pro 14", Apple M2 Pro (numbers below are from this machine).

These numbers come from one run of wrk per scenario. Your hardware, OS, and load will change the absolute figures; the relative gap between the two binaries is what matters.

Cache backend: cache-api is configured with the default MemoryStore only. Results are not from RedisStore (no Redis in this repo’s defaults). Switching to Redis would add network I/O and different latency characteristics.

Route            Work per uncached request                       Cache impact
GET /            Lighter: flat product list from products.db     ~1.6× throughput, ~−34% avg latency
GET /ecommerce   Heavier: joins and aggregation on ecommerce.db  ~5.7× throughput, ~−82% avg latency

The heavier the uncached path, the more dramatic caching looks—same mechanism, bigger win.


Products (GET /)

Command: wrk -c 100 -d 30 http://127.0.0.1:3000/

no-cache-api

wrk -c 100 -d 30 http://127.0.0.1:3000/
Running 30s test @ http://127.0.0.1:3000/
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   182.13ms   19.32ms 331.02ms   88.26%
    Req/Sec   274.87     37.22   363.00     71.00%
  16454 requests in 30.07s, 2.85GB read
Requests/sec:    547.20
Transfer/sec:     96.97MB

cache-api

wrk -c 100 -d 30 http://127.0.0.1:3000/
Running 30s test @ http://127.0.0.1:3000/
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   120.03ms   14.38ms 237.70ms   89.85%
    Req/Sec   416.14     45.14   494.00     82.50%
  24949 requests in 28.47s, 4.32GB read
  Socket errors: connect 0, read 0, write 0, timeout 100
Requests/sec:    876.28
Transfer/sec:    155.39MB

Summary

Metric         no-cache-api   cache-api   Difference
Requests/sec   547.20         876.28      +60.1% (1.60×)
Avg latency    182.13 ms      120.03 ms   −34.1%
Transfer/sec   96.97 MB       155.39 MB   +60.2%

The cache run reported 100 wrk timeouts while the server was under heavy parallel load. That is common in local benchmarks and does not invalidate the throughput trend; treat it as noise unless it shows up consistently in your own environment.

Charts

xychart-beta
    title "Products — throughput (req/s), higher is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "req/s" 0 --> 900
    bar [547, 876]

xychart-beta
    title "Products — average latency (ms), lower is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "ms" 0 --> 200
    bar [182, 120]

Products  Throughput (req/s)   [scale 0 ─────────────────────────────── 900]
no-cache-api  ████████████████████████████░░░░░░░░░░░░░░  547.2
cache-api     ██████████████████████████████████████████  876.3  (+60% vs no-cache)

Products  Avg latency (ms)     [scale 0 ─────────────────────────────── 200]
no-cache-api  ██████████████████████████████████████████  182.1
cache-api     ██████████████████████████░░░░░░░░░░░░░░░░  120.0  (−34% vs no-cache)

What this means (products)

On every request, no-cache-api reads 1,000 rows from SQLite and builds JSON, so CPU and database work repeat for each connection. cache-api keeps the same response body in memory after the first reply, so most requests skip query and serialization cost. That is why throughput rises by roughly 60% while average latency drops by about one third.
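The uncached hot path looks roughly like the sketch below, built on the crystal-db and sqlite3 shards. The column names are assumptions, not the repo's actual schema; the point is that every request repeats the full query and serialization.

```crystal
require "json"
require "kemal"
require "db"
require "sqlite3"

# Every request below re-runs the query and re-serializes the rows; nothing
# is memoized. Columns (id, name, price) are assumed for illustration.
DATABASE = DB.open("sqlite3://./data/products.db")

get "/" do |env|
  env.response.content_type = "application/json"
  rows = DATABASE.query_all("SELECT id, name, price FROM products",
                            as: {Int64, String, Float64})
  rows.map { |(id, name, price)| {id: id, name: name, price: price} }.to_json
end

Kemal.run
```

In the cached binary, this body is computed once and then served from memory, which removes both the SQLite round trip and the JSON build from the hot path.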


Ecommerce (GET /ecommerce)

This route stresses SQLite and serialization more than GET /. Under wrk -c 100, no-cache-api queues work until average latency is about 1.2 seconds per request; cache-api serves the cached body from memory, so the same load looks like ~220 ms average latency and well over 5× higher throughput.

Command: wrk -c 100 -d 30 http://127.0.0.1:3000/ecommerce

no-cache-api

wrk -c 100 -d 30 http://127.0.0.1:3000/ecommerce
Running 30s test @ http://127.0.0.1:3000/ecommerce
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.21s   143.09ms   1.34s    96.54%
    Req/Sec    40.84     15.20    90.00     73.38%
  2425 requests in 30.03s, 821.79MB read
Requests/sec:     80.76
Transfer/sec:     27.37MB

cache-api

wrk -c 100 -d 30 http://127.0.0.1:3000/ecommerce
Running 30s test @ http://127.0.0.1:3000/ecommerce
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   217.42ms   13.04ms 296.81ms   96.46%
    Req/Sec   229.81     16.39   262.00     82.21%
  13744 requests in 30.07s, 4.55GB read
Requests/sec:    457.05
Transfer/sec:    154.96MB

Summary

Metric                  no-cache-api       cache-api   Difference
Requests/sec            80.76              457.05      +466% (5.66×)
Avg latency             1.21 s (1210 ms)   217.42 ms   −82.0% (5.57× faster)
Transfer/sec            27.37 MB           154.96 MB   +466% (5.66×)
Total requests (30 s)   2,425              13,744      5.67× more completed

At 100 concurrent connections, the uncached server spends most of its time inside the database and JSON pipeline; caching turns that into a memory bandwidth story, which is why requests/sec and MB/s both jump by the same ~5.7× factor in this run.
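These figures can be sanity-checked with Little's law: at a fixed number of open connections, concurrency ≈ throughput × average latency. Both runs multiply out to roughly the 100 connections wrk held open, meaning the server was saturated in both cases; caching changed how fast each connection slot turned over, not the concurrency.

```crystal
# Little's law: concurrency ≈ throughput (req/s) × avg latency (s).
# wrk held 100 connections open in both runs, so both products should
# land near 100.
uncached = 80.76 * 1.21     # no-cache-api
cached   = 457.05 * 0.21742 # cache-api

puts uncached.round(1) # ≈ 97.7
puts cached.round(1)   # ≈ 99.4
```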

Charts

xychart-beta
    title "Ecommerce — throughput (req/s), higher is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "req/s" 0 --> 500
    bar [81, 457]

xychart-beta
    title "Ecommerce — average latency (ms), lower is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "ms" 0 --> 1300
    bar [1210, 217]

Ecommerce  Throughput (req/s)  [scale 0 ─────────────────────────────── 500]
no-cache-api  ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  80.8
cache-api     ██████████████████████████████████████████  457.1  (+466% vs no-cache)

Ecommerce  Avg latency (ms)    [scale 0 ─────────────────────────────── 1300]
no-cache-api  ██████████████████████████████████████████  1210
cache-api     ███████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  217   (−82% vs no-cache)

What this means (ecommerce)

For GET /ecommerce, every uncached hit repeats a heavier query against ecommerce.db. With 100 parallel clients, that cost stacks into second-scale latency and ~81 req/s. After the first response is cached, the hot path is a memcpy-style serve from MemoryStore, so the same wrk run lands near 457 req/s and sub-250 ms average latency. The ~5.7× throughput gain is not magic: it is the difference between redoing expensive work per request and amortizing it once. For read-heavy dashboards or catalog endpoints shaped like this route, HTTP caching is the lever that turns a struggling endpoint into one that comfortably saturates the network.

Contributing

  1. Fork it (https://github.com/your-github-user/kemal-cache-benchmark/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

License

MIT License
