Skip to content

SSLIOSession: Fix regression in backpressure handling#674

Open
rschmitt wants to merge 1 commit into
apache:masterfrom
rschmitt:backpressure
Open

SSLIOSession: Fix regression in backpressure handling#674
rschmitt wants to merge 1 commit into
apache:masterfrom
rschmitt:backpressure

Conversation

@rschmitt

Copy link
Copy Markdown
Contributor

A regression was introduced by PR #513, submitted as a fix for HTTPCORE-775, which claimed that SSLIOSession::write was failing to "handle" SSLEngineResult#BUFFER_OVERFLOW. The issue was encountered by Apache CXF, who provided a test case demonstrating the problem:

https://github.com/apache/cxf/pull/2214/changes

I investigated this reproducer and found that CXF was actually hanging due to not signaling the availability of more data:

https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L97

when SSLIOSession#write signaled backpressure here:

https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L75

The return value of buf.produceContent (which is ultimately SSLEngineResult#bytesConsumed -- 0, in this case) is simply being discarded. Changing the CXFHttpAsyncRequestProducer#available method as follows causes the problematic test cases to succeed instantly:

    @Override
    public int available() {
        if (buffer != null && buffer.hasRemaining()) {
            return buffer.remaining();
        } else if (buf != null && buf.length() > 0) {
            return buf.length();
        }
        return 0;
    }

Additionally, that PR appears to have introduced dysfunctional backpressure handling into SSLIOSession by simply expanding the buffer as much as necessary to hold the encrypted src buffer. This in turn has caused a significant regression in heap usage and latency, hence the need to revert. I've retained the test changes, since they did introduce more coverage for TLSv1.3.

A regression was introduced by PR apache#513, submitted as a fix for
HTTPCORE-775, which claimed that `SSLIOSession::write` was failing to
"handle" `SSLEngineResult#BUFFER_OVERFLOW`. The issue was encountered by
Apache CXF, who provided a test case demonstrating the problem:

https://github.com/apache/cxf/pull/2214/changes

I investigated this reproducer and found that CXF was actually hanging
due to not signaling the availability of more data:

https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L97

when `SSLIOSession#write` signaled backpressure here:

https://github.com/apache/cxf/blob/f1b2c37bd9f3606cf7ca2d5d39cc1168e94fd9e4/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/CXFHttpAsyncRequestProducer.java#L75

The return value of `buf.produceContent` (which is ultimately
`SSLEngineResult#bytesConsumed` -- 0, in this case) is simply being
discarded. Changing the `CXFHttpAsyncRequestProducer#available` method
as follows causes the problematic test cases to succeed instantly:

```
    @OverRide
    public int available() {
        if (buffer != null && buffer.hasRemaining()) {
            return buffer.remaining();
        } else if (buf != null && buf.length() > 0) {
            return buf.length();
        }
        return 0;
    }
```

Additionally, that PR appears to have introduced dysfunctional
backpressure handling into `SSLIOSession` by simply expanding the buffer
as much as necessary to hold the encrypted `src` buffer. This in turn
has caused a significant regression in heap usage and latency, hence the
need to revert. I've retained the test changes, since they did introduce
more coverage for TLSv1.3.
@ok2c

ok2c commented Jun 17, 2026

Copy link
Copy Markdown
Member

@rschmitt Thank you for the thorough analysis. There is a single test failing on MacOS only. Ideally this should be fixed. Otherwise, I have to agree I have made a mistake reviewing the proposed change-set

@rschmitt

Copy link
Copy Markdown
Contributor Author

@ok2c The Http1IntegrationTest.testTruncatedChunk flakiness is 100% related to this PR somehow. I'll get to the bottom of it.

@reta

reta commented Jun 17, 2026

Copy link
Copy Markdown
Member

@rschmitt Thank you for the thorough analysis. There is a single test failing on MacOS only. Ideally this should be fixed. Otherwise, I have to agree I have made a mistake reviewing the proposed change-set

@ok2c (not you to blame) sorry about that folks, I think what derailed me from suspecting CXF is that the issue manifested suddenly with just bumping the HttpCore5 version, and only with TLSv1.3 (coming from JDK defaults).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants