Skip to content

perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path#27

Open
KA-ROM wants to merge 23 commits into
mainfrom
perf/cheap-allocs
Open

perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path#27
KA-ROM wants to merge 23 commits into
mainfrom
perf/cheap-allocs

Conversation

@KA-ROM

@KA-ROM KA-ROM commented Jun 5, 2026

Copy link
Copy Markdown

Summary

Hot-path tuning for the engine_* proxy path: removes per-request allocations and
redundant lookups, sets TCP_NODELAY on both client- and backend-facing connections,
and folds the cached metric handles into a single ProxyHttpMetrics struct. Several
post-process and TCP-tuning experiments were explored and reverted in-branch

Removed from the hot path

  • No heap alloc per header check: is_hop_by_hop_header matches HeaderName constants directly; the per-call lowercased String is gone. (Set is unchanged from before: connection/host/keep-alive/transfer-encoding.)
  • ~5 fewer hashmap lookups per request: in-flight Gauge/Counter + proxy_failure_count handles resolved once per worker, bundled into ProxyHttpMetrics behind Arc so cloning a handle to span an await is one ref-count bump (scc::HashMap::clone is a deep copy).
  • No metric lookup on UA hit path: per-worker scc::HashMap<String, Counter> keyed by &str. The cache is soft-capped at 100 entries to bound the lookup map.
    • Note: the cap bounds the cache only, not the Prometheus series (get_or_create runs unconditionally on a miss). Safe because the authrpc client side is a closed network (sequencer/peer-mirror) with tiny UA cardinality. A safety comment on the field flags that a public-facing endpoint must gate get_or_create instead.
  • connection_info() bound once/request instead of 5x: skips redundant RefCell re-borrow + extensions lookup per call.
  • No per-header revalidation on append: headers_mut().append(...) skips reparsing names through awc's builder. Also fixes a latent multi-value drop (insert_header replaced same-name values).
  • No re-parse on response forwarding: dropped HeaderName::from_str (backend names already valid; the removed branch could only ever silently drop a header).
  • No full HeaderMap clone: capture only content-encoding (the one field read downstream for decompress-on-logging). Client-facing headers are still forwarded verbatim.

Behavior changes (observable on wire / in ops)

  • TCP_NODELAY on backend conns: disables Nagle on the backend leg; logs at debug on failure.
  • TCP_NODELAY on client conns: same fix on the sequencer-facing side, in ConnectionGuard::on_connect alongside the existing keepalive setup.

Refactor (no behavior change)

  • ProxyHttpMetrics struct: extracts cached handles + UA cache out of ProxyHttp into an Arc<ProxyHttpMetrics> with bump_client_info, inc/dec_in_flight_*, record_proxy_failure.

Explored and reverted (kept in history)

Each showed no measurable benefit on our envs / flag sets, or introduced a structural concern. Kept as separate revert commits so any can be re-explored.

  • Reuse parsed JrpcRequestMetaMaybeBatch in finalise_proxying: only fires on the dedup-on path (off in all our envs), so dead code in practice.
  • Skip postprocess entirely for non-logged, non-mirrored requests: gated on logging-fully-off + no-mirroring; never fires in current envs.
  • Defer postprocess_client_request off the critical path: introduced a dependency on actix-web's body-poll ordering; structural fragility not worth the unmeasured saving.
  • Tighten awc disconnect_timeout to 100ms: awc calls no_disconnect_timeout() for plaintext TCP pools, so the knob was inert for our http:// backends. A configurable flag is deferred to a follow-up PR.
  • TCP_KEEPALIVE on backend conns (60s): awc conn_keep_alive(2 * backend_timeout) is also 60s, so the pool drops the idle conn at the same idleness threshold; no-op on loopback.

Unchanged

  • Which requests get intercepted, mirrored, or logged.
  • No new CLI flag, env var, or config surface.

@KA-ROM KA-ROM marked this pull request as draft June 5, 2026 13:10
@KA-ROM KA-ROM changed the title Perf/cheap allocs Small optimizations Jun 5, 2026
@KA-ROM KA-ROM force-pushed the perf/cheap-allocs branch from d7d7f94 to f86fd2d Compare June 9, 2026 12:40
@KA-ROM KA-ROM changed the title Small optimizations perf: set TCP_NODELAY, cheap allocs, socket tuning, postprocess deferral on engine_* path Jun 9, 2026
@KA-ROM KA-ROM requested a review from 0x416e746f6e June 9, 2026 12:54
@KA-ROM KA-ROM marked this pull request as ready for review June 9, 2026 12:54
@KA-ROM KA-ROM requested review from akundaz, avalonche and julio4 June 9, 2026 12:56
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/utils/utils_http.rs
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs
@KA-ROM KA-ROM changed the title perf: set TCP_NODELAY, cheap allocs, socket tuning, postprocess deferral on engine_* path perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path Jun 12, 2026

@julio4 julio4 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; some comments could be more concise.

Comment on lines +156 to +158
// it every request, uncached). Safe here only because the closed
// authrpc network keeps UA cardinality tiny — see the SAFETY note
// on `client_info_cache`. To actually bound cardinality on a

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to not make any assumptions on usage and make it safe by default.

self.proxy_failure_count.inc();
}

fn bump_client_info(&self, user_agent: &str) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename inc_client_info to match metric inc method

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather record_client_info I guess.

the increment is purely technical (it only gives an idea which of the user-agent labels is active r.n., the value doesn't matter only that it's being updated)

conn_id: Uuid,
status: StatusCode,
headers: HeaderMap, // TODO: perhaps we don't need all headers, just select ones
content_encoding: Option<HeaderValue>,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if this ProxyHttpResponseInfo is only used for the response body, and the headers are still cloned, this can be slightly confusing

@0x416e746f6e 0x416e746f6e left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great one. thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants