perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path#27
Open
KA-ROM wants to merge 23 commits into
Open
perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path#27KA-ROM wants to merge 23 commits into
KA-ROM wants to merge 23 commits into
Conversation
This was referenced Jun 5, 2026
akundaz
reviewed
Jun 9, 2026
0x416e746f6e
reviewed
Jun 9, 2026
julio4
reviewed
Jun 15, 2026
julio4
left a comment
Member
There was a problem hiding this comment.
Nice; some comments could be more concise.
Comment on lines
+156
to
+158
| // it every request, uncached). Safe here only because the closed | ||
| // authrpc network keeps UA cardinality tiny — see the SAFETY note | ||
| // on `client_info_cache`. To actually bound cardinality on a |
Member
There was a problem hiding this comment.
It would be best to not make any assumptions on usage and make it safe by default.
| self.proxy_failure_count.inc(); | ||
| } | ||
|
|
||
| fn bump_client_info(&self, user_agent: &str) { |
Member
There was a problem hiding this comment.
nit: rename inc_client_info to match metric inc method
Member
There was a problem hiding this comment.
rather record_client_info I guess.
the increment is purely technical (it only gives an idea which of the user-agent labels is active r.n., the value doesn't matter only that it's being updated)
| conn_id: Uuid, | ||
| status: StatusCode, | ||
| headers: HeaderMap, // TODO: perhaps we don't need all headers, just select ones | ||
| content_encoding: Option<HeaderValue>, |
Member
There was a problem hiding this comment.
I think if this ProxyHttpResponseInfo is only used for the response body, and the headers are still cloned, this can be slightly confusing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hot-path tuning for the
engine_*proxy path: removes per-request allocations andredundant lookups, sets TCP_NODELAY on both client- and backend-facing connections,
and folds the cached metric handles into a single
ProxyHttpMetricsstruct. Severalpost-process and TCP-tuning experiments were explored and reverted in-branch
Removed from the hot path
is_hop_by_hop_headermatchesHeaderNameconstants directly; the per-call lowercasedStringis gone. (Set is unchanged from before: connection/host/keep-alive/transfer-encoding.)Gauge/Counter+proxy_failure_counthandles resolved once per worker, bundled intoProxyHttpMetricsbehindArcso cloning a handle to span anawaitis one ref-count bump (scc::HashMap::cloneis a deep copy).scc::HashMap<String, Counter>keyed by&str. The cache is soft-capped at 100 entries to bound the lookup map.get_or_createruns unconditionally on a miss). Safe because the authrpc client side is a closed network (sequencer/peer-mirror) with tiny UA cardinality. A safety comment on the field flags that a public-facing endpoint must gateget_or_createinstead.connection_info()bound once/request instead of 5x: skips redundant RefCell re-borrow + extensions lookup per call.headers_mut().append(...)skips reparsing names through awc's builder. Also fixes a latent multi-value drop (insert_headerreplaced same-name values).HeaderName::from_str(backend names already valid; the removed branch could only ever silently drop a header).HeaderMapclone: capture onlycontent-encoding(the one field read downstream for decompress-on-logging). Client-facing headers are still forwarded verbatim.Behavior changes (observable on wire / in ops)
ConnectionGuard::on_connectalongside the existing keepalive setup.Refactor (no behavior change)
ProxyHttpMetricsstruct: extracts cached handles + UA cache out ofProxyHttpinto anArc<ProxyHttpMetrics>withbump_client_info,inc/dec_in_flight_*,record_proxy_failure.Explored and reverted (kept in history)
Each showed no measurable benefit on our envs / flag sets, or introduced a structural concern. Kept as separate revert commits so any can be re-explored.
JrpcRequestMetaMaybeBatchinfinalise_proxying: only fires on the dedup-on path (off in all our envs), so dead code in practice.postprocess_client_requestoff the critical path: introduced a dependency on actix-web's body-poll ordering; structural fragility not worth the unmeasured saving.disconnect_timeoutto 100ms: awc callsno_disconnect_timeout()for plaintext TCP pools, so the knob was inert for ourhttp://backends. A configurable flag is deferred to a follow-up PR.conn_keep_alive(2 * backend_timeout)is also 60s, so the pool drops the idle conn at the same idleness threshold; no-op on loopback.Unchanged