SSLIOSession: Fix regression in backpressure handling#674
Conversation
A regression was introduced by PR apache#513, submitted as a fix for HTTPCORE-775, which claimed that `SSLIOSession::write` was failing to "handle" `SSLEngineResult#BUFFER_OVERFLOW`. The issue was encountered by Apache CXF, who provided a test case demonstrating the problem: https://github.com/apache/cxf/pull/2214/changes I investigated this reproducer and found that CXF was actually hanging due to not signaling the availability of more data: https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L97 when `SSLIOSession#write` signaled backpressure here: https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L75 The return value of `buf.produceContent` (which is ultimately `SSLEngineResult#bytesConsumed` -- 0, in this case) is simply being discarded. Changing the `CXFHttpAsyncRequestProducer#available` method as follows causes the problematic test cases to succeed instantly: ``` @OverRide public int available() { if (buffer != null && buffer.hasRemaining()) { return buffer.remaining(); } else if (buf != null && buf.length() > 0) { return buf.length(); } return 0; } ``` Additionally, that PR appears to have introduced dysfunctional backpressure handling into `SSLIOSession` by simply expanding the buffer as much as necessary to hold the encrypted `src` buffer. This in turn has caused a significant regression in heap usage and latency, hence the need to revert. I've retained the test changes, since they did introduce more coverage for TLSv1.3.
|
@rschmitt Thank you for the thorough analysis. There is a single test failing on MacOS only. Ideally this should be fixed. Otherwise, I have to agree I have made a mistake reviewing the proposed change-set |
|
@ok2c The |
@ok2c (not you to blame) sorry about that folks, I think what derailed me from suspecting CXF is that the issue manifested suddenly with just bumping the HttpCore5 version, and only with TLSv1.3 (coming from JDK defaults). |
A regression was introduced by PR #513, submitted as a fix for HTTPCORE-775, which claimed that
SSLIOSession::writewas failing to "handle"SSLEngineResult#BUFFER_OVERFLOW. The issue was encountered by Apache CXF, who provided a test case demonstrating the problem:https://github.com/apache/cxf/pull/2214/changes
I investigated this reproducer and found that CXF was actually hanging due to not signaling the availability of more data:
https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L97
when
SSLIOSession#writesignaled backpressure here:https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L75
The return value of
buf.produceContent(which is ultimatelySSLEngineResult#bytesConsumed-- 0, in this case) is simply being discarded. Changing theCXFHttpAsyncRequestProducer#availablemethod as follows causes the problematic test cases to succeed instantly:Additionally, that PR appears to have introduced dysfunctional backpressure handling into
SSLIOSessionby simply expanding the buffer as much as necessary to hold the encryptedsrcbuffer. This in turn has caused a significant regression in heap usage and latency, hence the need to revert. I've retained the test changes, since they did introduce more coverage for TLSv1.3.