perf(lock): shared locker latch on lock-get hot path (2.7x at 24t) #28

Merged
gburd merged 1 commit into master from perf/lock-shared-latch
Jun 20, 2026

@gburd gburd commented Jun 20, 2026

Collaborator

perf(lock): take the locker mutex shared on the lock-get hot path

Every DB_ENV->lock_get/lock_put resolves its locker through
__lock_getlocker_int under the region-global locker mutex mtx_lockers.
On the lock-get path that lookup is create=0 — a read-only walk of a
locker hash bucket — yet it was held exclusive, serializing every lock
acquisition across all cores even with objects fully partitioned (240-way)
and zero lock conflict.

Fix

Make mtx_lockers a DB_MUTEX_SHARED latch and take it shared for the
read-only locker lookup on the hot path (__lock_get_api). Locker
create/free, the deadlock detector's locker-list walk, failchk, and stat keep
it exclusive, so a reader never runs concurrently with a writer.

Measured (lab/bench/lock_bench distinct, no conflict, 24-thread box)

| threads | master (ops/s) | this branch (ops/s) | upper bound* (ops/s) |
|---|---|---|---|
| 1 | 1.38M | 1.27M | 1.87M |
| 8 | 3.03M | 6.36M (2.1×) | 10.1M |
| 24 | 2.60M | 7.01M (2.7×) | 16.1M |

Master plateaus and declines past 8 threads; the shared latch scales to
24 threads. *Upper bound = removing the mutex entirely (unsafe diagnostic);
the shared latch captures ~half, the rest needs partitioning the locker hash
(deferred — more invasive). Single-thread cost rises ~8% (shared vs plain
mutex, uncontended), dwarfed by the multi-core gain.
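
The ratios quoted above can be rechecked from the measured throughputs. The sketch below copies the numbers from the table (in millions of ops/s). The `struct sample` type and the `speedup` and `captured` helpers are hypothetical names introduced for this check only.

```c
#include <assert.h>

/* Throughput in millions of ops/s, copied from the measurements above. */
struct sample {
    int threads;
    double master;
    double branch;
    double upper_bound;
};

const struct sample samples[] = {
    {  1, 1.38, 1.27,  1.87 },
    {  8, 3.03, 6.36, 10.1  },
    { 24, 2.60, 7.01, 16.1  },
};

/* Speedup of the shared-latch branch over master (below 1.0 is a slowdown). */
double speedup(const struct sample *s)
{
    assert(s->master > 0.0);
    return s->branch / s->master;
}

/* Fraction of the no-mutex upper bound that the branch reaches. */
double captured(const struct sample *s)
{
    assert(s->upper_bound > 0.0);
    return s->branch / s->upper_bound;
}
```

With these inputs `speedup` gives about 0.92, 2.10, and 2.70 at 1, 8, and 24 threads, which matches the ~8% single-thread cost and the 2.1× and 2.7× figures. `captured` gives about 0.68, 0.63, and 0.44, so the "~half" figure fits the 24-thread row if it is read as branch throughput divided by upper-bound throughput.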

No regression on real workloads: rrand is unchanged (btree-bound) and
tproc_b is flat (deadlock/disk-bound). The change helps where the locker
mutex is the bottleneck and shows no measurable cost elsewhere.

Verified

TCL lock001/002/003 (incl. the multi-process test), txn001/002,
test001, ssi001/002 pass; concurrent shared read-lock acquisition runs
clean; clean build (gcc via Nix, Apple clang).

The probe and benchmark fixes used to find this are in #27.

Every DB_ENV->lock_get / lock_put resolves its locker through
__lock_getlocker_int under the region-global locker mutex (mtx_lockers).
On the lock-get path the lookup is create=0 -- a read-only walk of the
locker hash bucket -- yet it was held *exclusive*, serializing every lock
acquisition across all cores even when objects are fully partitioned and
there is no lock conflict.

Make mtx_lockers a DB_MUTEX_SHARED latch and take it in shared mode for the
read-only locker lookup on the hot path (__lock_get_api).  Locker create,
free, the deadlock detector's locker-list walk, failchk, and stat continue
to hold it exclusive, so they never run concurrently with a reader.

Measured with lab/bench/lock_bench (distinct mode, no lock conflict, on a
24-thread box): master plateaus and then declines past 8 threads
(~3.0M ops/s peak, 2.6M at 24t); the shared latch scales to 7.0M at 24t --
2.1x at 8 threads, 2.7x at 24.  It captures roughly half the upper bound of
removing the mutex entirely; the remainder is the shared latch's own
reference-count cache line, which would require partitioning the locker
hash to recover (left for later -- this is the low-risk 80/20).  A small
single-thread regression (~8%) reflects the shared latch's slightly higher
uncontended cost and is dwarfed by the multi-core gain.

Verified: TCL lock001/002/003 (incl. multi-process), txn001/002, test001,
ssi001/002 pass; concurrent shared read-lock acquisition (lock_bench shared)
runs clean.
@gburd gburd merged commit 8f207cf into master Jun 20, 2026
36 of 39 checks passed
@gburd gburd deleted the perf/lock-shared-latch branch June 20, 2026 19:24