Remove global rmem page slab #392
msgpack_rmem_* slab allocator
bb0785b to b82fefc
I've always been dubious of the memory arena's benefit, so I wouldn't mind getting rid of it. That being said, MessagePack needs way more changes than this to be usable in Ractors. #390 is on my TODO list; I'll come around to it eventually.
The page-recycling slab was a process-global msgpack_rmem_t mutated through an unsynchronized bitmask, so concurrent packing or unpacking could race on it. Drop the slab and serve rmem pages from xmalloc/xfree instead. Modern arena-based mallocs are good at recycling these allocations without maintaining process-global mutable state in msgpack-ruby.
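For readers who have not looked at the allocator, the race described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the msgpack-ruby source: every `demo_*` name, the 4096-byte page size, and the 32-page slab are assumptions, and plain `malloc`/`free` stand in for Ruby's `xmalloc`/`xfree` so the snippet compiles without Ruby headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096      /* assumed page size, for illustration only */
#define DEMO_PAGES_PER_SLAB 32   /* one bit per page in a 32-bit mask */

/* Hypothetical stand-in for a process-global page slab. */
typedef struct {
    uint32_t free_mask;  /* bit i set => page i is free */
    char *pages;         /* DEMO_PAGES_PER_SLAB * DEMO_PAGE_SIZE bytes */
} demo_slab_t;

demo_slab_t demo_global_slab;  /* shared by every caller in the process */

int demo_slab_init(void)
{
    demo_global_slab.pages = malloc((size_t)DEMO_PAGES_PER_SLAB * DEMO_PAGE_SIZE);
    if (demo_global_slab.pages == NULL) {
        return -1;
    }
    demo_global_slab.free_mask = UINT32_MAX;  /* all pages start free */
    return 0;
}

void *demo_slab_alloc(void)
{
    uint32_t mask = demo_global_slab.free_mask;  /* (1) read the shared mask */
    if (mask == 0) {
        return NULL;  /* slab exhausted */
    }
    size_t i = 0;
    while (((mask >> i) & 1u) == 0) {
        i++;  /* (2) find the lowest free page */
    }
    /* (3) write back. With no lock or atomic operation, two threads that
     * both ran step (1) before either reached step (3) pick the same page. */
    demo_global_slab.free_mask = mask & ~(UINT32_C(1) << i);
    return demo_global_slab.pages + i * DEMO_PAGE_SIZE;
}

void demo_slab_free(void *page)
{
    size_t i = (size_t)((char *)page - demo_global_slab.pages) / DEMO_PAGE_SIZE;
    assert(i < DEMO_PAGES_PER_SLAB);
    demo_global_slab.free_mask |= UINT32_C(1) << i;  /* same unsynchronized read-modify-write */
}

/* The replacement approach: no shared state, so the allocator handles concurrency. */
void *demo_page_alloc(void)
{
    return malloc(DEMO_PAGE_SIZE);
}

void demo_page_free(void *page)
{
    free(page);
}
```

Single-threaded, the slab behaves correctly: a freed page is handed out again on the next allocation. The hazard only appears when two threads or Ractors interleave steps (1) through (3), which is why the malloc-backed pair needs no synchronization of its own.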
@byroot Added micro-benchmarks to the PR body. On an M4 Pro with jemalloc, it's slightly faster to just use plain xmalloc 🤷🏻 Also, I've removed all Ractor references from this PR, so it should be independently mergeable. Hope you are well!
After looking through the C code a bit more, I didn't find any remaining global mutable state that would make […] The non-shareable […] I'm tempted to add […]
The notable change here is the removal of the `msgpack_rmem_*` routines, which serve as a mechanism for efficiently providing chunks of memory for decoding work.

Why is it not Ractor-safe?
The old page-recycling slab is a process-global `msgpack_rmem_t` mutated through an unsynchronized bitmask, so parallel Ractors would race on it.

How did you address this?
Drop the global slab entirely and use plain `xmalloc`/`xfree`. Modern arena-based mallocs (e.g. jemalloc) are good at recycling and avoiding thread contention, so maintaining a custom slab allocator is not worth it.

Did you try alternatives?
Yes:
No:
`malloc(3)`

Perf (local HTTP requests)
Single-threaded, jemalloc 5.3, decode-heavy workload
On a realistic HTTP request benchmark, bare xmalloc is maybe a touch faster, but the difference is mostly noise:
RSS Impact
Memory usage seems fine as well, with bare xmalloc (this PR) showing a higher peak as the decay time increases (this is expected, and not a problem).
`dirty_decay_ms:10000` (default)
`dirty_decay_ms:1000`

Microbenchmarks (`ruby --yjit`)
Generated by `/tmp/msgpack_format_yjit_pr_body.py` from raw `benchmark/ips` output; table values were parsed, not hand-entered.
`ruby 4.0.4 (2026-05-12 revision b89eb1bcbf) +YJIT +PRISM [arm64-darwin25]`
`bundle exec rake compile && bundle exec ruby --yjit -Ilib -Iext <generated bench>`
`benchmark/ips` warmup 15s, measurement 60s per case
2026-06-15T22:14:36Z; finished: 2026-06-15T22:29:47Z
origin/master: 09c914d
PR branch: b82fefc

Raw benchmark output
origin/master
PR branch