kemal-cache-benchmark
Compares a plain Kemal JSON API with the same routes behind kemal-cache. GET / uses a flat product list from data/products.db; GET /ecommerce runs a heavier query against data/ecommerce.db (seeded separately).
The cached app uses kemal-cache with the default MemoryStore (in-process, on the same machine as the app). It does not use RedisStore unless you change the configuration yourself. All benchmark numbers below assume MemoryStore.
Installation
shards install
Database
Two SQLite files (ignored by git):
- data/products.db — flat products table for GET /; seed with shards run seed_product_db
- data/ecommerce.db — categories, customers, products, orders, order_items for GET /ecommerce; seed with shards run seed_ecommerce_db
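The seed scripts produce ordinary SQLite files, so the uncached read path for GET / can be pictured as a plain crystal-db query. A minimal sketch, assuming the standard sqlite3 shard and a products table with id, name, and price columns (the actual schema is whatever the seed scripts define):

```crystal
require "db"
require "sqlite3"

# Read product rows the way an uncached handler would on every request.
# Table and column names here are assumptions, not the repo's real schema.
DB.open "sqlite3://./data/products.db" do |db|
  db.query_each "SELECT id, name, price FROM products LIMIT 5" do |rs|
    id    = rs.read(Int64)
    name  = rs.read(String)
    price = rs.read(Float64)
    puts "#{id}: #{name} (#{price})"
  end
end
```

Every request through a handler like this repeats the query and the row-to-JSON work, which is exactly the cost the cached binary avoids.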
Usage
Build release binaries and run each server in turn, then benchmark with wrk:
crystal build --release -o bin/no-cache-api src/no-cache-api.cr
crystal build --release -o bin/cache-api src/cache-api.cr
Start one app (default port 3000, or -p <port>), then:
wrk -c 100 -d 30 http://127.0.0.1:3000/
Repeat for the other binary to compare results. Set KEMAL_QUIET=1 when starting the server to disable request logging during the run.
Benchmark results
Hardware: MacBook Pro 14", Apple M2 Pro (numbers below are from this machine).
These numbers come from one run of wrk per scenario. Your hardware, OS, and load will change the absolute figures; the relative gap between the two binaries is what matters.
Cache backend: cache-api is configured with the default MemoryStore only. Results are not from RedisStore (no Redis in this repo’s defaults). Switching to Redis would add network I/O and different latency characteristics.
| Route | Work per uncached request | Cache impact |
|---|---|---|
| GET / | Lighter: flat product list from products.db | ~1.6× throughput, ~−34% avg latency |
| GET /ecommerce | Heavier: joins and aggregation on ecommerce.db | ~5.7× throughput, ~−82% avg latency |
The heavier the uncached path, the more dramatic caching looks—same mechanism, bigger win.
Products (GET /)
Command: wrk -c 100 -d 30 http://127.0.0.1:3000/
no-cache-api
wrk -c 100 -d 30 http://127.0.0.1:3000/
Running 30s test @ http://127.0.0.1:3000/
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 182.13ms 19.32ms 331.02ms 88.26%
Req/Sec 274.87 37.22 363.00 71.00%
16454 requests in 30.07s, 2.85GB read
Requests/sec: 547.20
Transfer/sec: 96.97MB
cache-api
wrk -c 100 -d 30 http://127.0.0.1:3000/
Running 30s test @ http://127.0.0.1:3000/
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 120.03ms 14.38ms 237.70ms 89.85%
Req/Sec 416.14 45.14 494.00 82.50%
24949 requests in 28.47s, 4.32GB read
Socket errors: connect 0, read 0, write 0, timeout 100
Requests/sec: 876.28
Transfer/sec: 155.39MB
Summary
| Metric | no-cache-api | cache-api | Difference |
|---|---|---|---|
| Requests/sec | 547.20 | 876.28 | +60.1% (1.60×) |
| Avg latency | 182.13 ms | 120.03 ms | −34.1% |
| Transfer/sec | 96.97 MB | 155.39 MB | +60.2% |
The cache run reported 100 wrk timeouts while the server was under heavy parallel load. That is common in local benchmarks and does not invalidate the throughput trend; treat it as noise unless you see it consistently in your own environment.
Charts
```mermaid
xychart-beta
    title "Products — throughput (req/s), higher is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "req/s" 0 --> 900
    bar [547, 876]
```

```mermaid
xychart-beta
    title "Products — average latency (ms), lower is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "ms" 0 --> 200
    bar [182, 120]
```
Products Throughput (req/s) [scale 0 ─────────────────────────────── 900]
no-cache-api ████████████████████████████░░░░░░░░░░░░░░ 547.2
cache-api ██████████████████████████████████████████ 876.3 (+60% vs no-cache)
Products Avg latency (ms) [scale 0 ─────────────────────────────── 200]
no-cache-api ██████████████████████████████████████████ 182.1
cache-api ██████████████████████████░░░░░░░░░░░░░░░░ 120.0 (−34% vs no-cache)
What this means (products)
On every request, no-cache-api reads 1,000 rows from SQLite and builds JSON, so CPU and database work repeat for each connection. cache-api keeps the same response body in memory after the first reply, so most requests skip query and serialization cost. That is why throughput rises by roughly 60% while average latency drops by about one third.
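The mechanism can be reduced to a few lines. This is not kemal-cache's actual API or the repo's source, just a sketch of the memoization idea behind an in-process MemoryStore: build the body once, then serve the stored string on every later hit:

```crystal
require "kemal"

# Conceptual sketch only — kemal-cache's real configuration and helpers
# differ; this shows the idea, not the shard's interface.
CACHE = {} of String => String

get "/" do |env|
  env.response.content_type = "application/json"
  # Expensive work runs only on a cache miss; afterwards the same
  # string is returned straight from memory.
  CACHE["/"] ||= build_products_json
end

# Placeholder for the real SQLite query + serialization in the
# uncached binary; a hypothetical helper, not the repo's code.
def build_products_json : String
  %([{"id":1,"name":"example"}])
end

Kemal.run
```

Everything after the first request skips the database and the JSON builder, which is where the ~60% throughput gain on this lighter route comes from.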
Ecommerce (GET /ecommerce)
This route stresses SQLite and serialization more than GET /. Under wrk -c 100, no-cache-api queues work until average latency is about 1.2 seconds per request; cache-api serves the cached body from memory, so the same load looks like ~220 ms average latency and well over 5× higher throughput.
Command: wrk -c 100 -d 30 http://127.0.0.1:3000/ecommerce
no-cache-api
wrk -c 100 -d 30 http://127.0.0.1:3000/ecommerce
Running 30s test @ http://127.0.0.1:3000/ecommerce
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.21s 143.09ms 1.34s 96.54%
Req/Sec 40.84 15.20 90.00 73.38%
2425 requests in 30.03s, 821.79MB read
Requests/sec: 80.76
Transfer/sec: 27.37MB
cache-api
wrk -c 100 -d 30 http://127.0.0.1:3000/ecommerce
Running 30s test @ http://127.0.0.1:3000/ecommerce
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 217.42ms 13.04ms 296.81ms 96.46%
Req/Sec 229.81 16.39 262.00 82.21%
13744 requests in 30.07s, 4.55GB read
Requests/sec: 457.05
Transfer/sec: 154.96MB
Summary
| Metric | no-cache-api | cache-api | Difference |
|---|---|---|---|
| Requests/sec | 80.76 | 457.05 | +466% (5.66×) |
| Avg latency | 1.21 s (1210 ms) | 217.42 ms | −82.0% (5.57× faster) |
| Transfer/sec | 27.37 MB | 154.96 MB | +466% (5.66×) |
| Total requests (30 s) | 2,425 | 13,744 | 5.67× more completed |
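The multipliers in the table follow directly from the raw wrk numbers above; a quick arithmetic check:

```crystal
# Sanity-check the ratios reported in the ecommerce summary table.
puts (457.05 / 80.76).round(2)  # throughput ratio, ≈ 5.66×
puts (1210.0 / 217.42).round(2) # latency ratio, ≈ 5.57× faster
puts (13744.0 / 2425).round(2)  # completed requests, ≈ 5.67× more
```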
At 100 concurrent connections, the uncached server spends most of its time inside the database and JSON pipeline; caching turns that into a memory bandwidth story, which is why requests/sec and MB/s both jump by the same ~5.7× factor in this run.
Charts
```mermaid
xychart-beta
    title "Ecommerce — throughput (req/s), higher is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "req/s" 0 --> 500
    bar [81, 457]
```

```mermaid
xychart-beta
    title "Ecommerce — average latency (ms), lower is better"
    x-axis ["no-cache-api", "cache-api"]
    y-axis "ms" 0 --> 1300
    bar [1210, 217]
```
Ecommerce Throughput (req/s) [scale 0 ─────────────────────────────── 500]
no-cache-api ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 80.8
cache-api ██████████████████████████████████████████ 457.1 (+466% vs no-cache)
Ecommerce Avg latency (ms) [scale 0 ─────────────────────────────── 1300]
no-cache-api ██████████████████████████████████████████ 1210
cache-api ███████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 217 (−82% vs no-cache)
What this means (ecommerce)
For GET /ecommerce, every uncached hit repeats a heavier query against ecommerce.db. With 100 parallel clients, that cost stacks into second-scale latency and ~81 req/s. After the first response is cached, the hot path is a memcpy-style serve from MemoryStore, so the same wrk run lands near 457 req/s and sub-250 ms average latency. The ~5.7× throughput gain is not magic: it is the difference between redoing expensive work per request and amortizing it once. For read-heavy dashboards or catalog endpoints shaped like this route, HTTP caching is the lever that turns a struggling endpoint into one that comfortably saturates the network.
Contributing
- Fork it (https://github.com/your-github-user/kemal-cache-benchmark/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Contributors
- Serdar Dogruyol - creator and maintainer
MIT License