feat(clipping): add clipRadius prop for rounded corner clipping #820

Open: wouterlucas wants to merge 5 commits into main from feat/rounded-corner-clipping

@wouterlucas wouterlucas commented Jun 26, 2026

Adds a new clipRadius prop to nodes that enables rounded corner clipping when clipping: true is set. clipRadius: 0 (default) is a no-op and leaves the existing rectangular gl.scissor path completely unchanged.

The WebGL renderer uses the stencil buffer; the Canvas2D renderer builds a rounded path with arcTo in its existing clip code.

Additional fixes:

Bug 1: Redundant stencil pass on every inherited-clip descendant
Bug 2: intersectRect zero-area leaves clippingRect.valid = true
Bug 3: createRenderBounds skips preload zone (textures load too late)

cc: @DouweCnossen for Bug 3 ⚡

Warning!

This needs some proper performance testing before we do anything.

Adds a new `clipRadius` prop to nodes that enables rounded corner
clipping when `clipping: true` is set. `clipRadius: 0` (default)
is a no-op and leaves the existing rectangular gl.scissor path
completely unchanged.
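
As a rough illustration of the rule above for the WebGL renderer, the sketch below maps the two props to a clipping path. `ClipProps`, `ClipPath` and `selectClipPath` are hypothetical names used only for this example; they are not taken from the PR.

```typescript
// Hypothetical, simplified view of the two props involved; the real
// CoreNodeProps has many more fields.
interface ClipProps {
  clipping: boolean;
  clipRadius: number; // 0 by default
}

type ClipPath = 'none' | 'scissor' | 'stencil';

// Illustrative helper (an assumption, not code from the PR): which clipping
// path a node would take under the rules described in the commit message.
function selectClipPath(props: ClipProps): ClipPath {
  if (!props.clipping) return 'none';
  // clipRadius: 0 keeps the existing rectangular scissor path.
  return props.clipRadius > 0 ? 'stencil' : 'scissor';
}
```

Under this reading, setting `clipRadius` without `clipping: true` has no effect.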

WebGL implementation:
- Uses the stencil buffer (already allocated at context creation with
  `stencil: true`) to mask children to a rounded-rect SDF shape
- New `StencilClip` GLSL shader handles the stencil write pass
- Stencil state is cached in `WebGlContextWrapper` (same pattern as
  the existing scissor state cache) — new methods: `setStencilTest`,
  `stencilFunc`, `stencilOp`, `stencilMask`, `clearStencil`,
  `colorMask`
- `WebGlRenderer` gains `beginRoundedClip` / `endRoundedClip` which
  push pre-allocated `StencilClipRenderOp` sentinels into `renderOps`
  bracketing the subtree; the render loop dispatches them to the stencil
  write and restore passes
- Nested rounded clips increment the stencil ref level
- Scissor test is still applied as a coarse fast-reject rectangle
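
The shader source is not shown in this conversation. As a CPU-side reference for what a rounded-rect SDF mask computes, here is a minimal sketch of the commonly used formulation. The function names and the radius clamp are assumptions, not the PR's GLSL.

```typescript
// Signed distance from point (px, py) to a rounded rectangle centred at the
// origin with half-extents (halfW, halfH) and corner radius r.
// Negative inside, zero on the edge, positive outside.
function sdRoundedRect(
  px: number,
  py: number,
  halfW: number,
  halfH: number,
  r: number,
): number {
  const qx = Math.abs(px) - halfW + r;
  const qy = Math.abs(py) - halfH + r;
  const outside = Math.hypot(Math.max(qx, 0), Math.max(qy, 0));
  const inside = Math.min(Math.max(qx, qy), 0);
  return outside + inside - r;
}

// True when pixel (x, y) would pass a rounded clip covering the rectangle
// at (rx, ry) with size (w, h). Clamping the radius to half the shorter
// side is an assumption made for this sketch.
function insideRoundedClip(
  x: number,
  y: number,
  rx: number,
  ry: number,
  w: number,
  h: number,
  clipRadius: number,
): boolean {
  const r = Math.max(0, Math.min(clipRadius, w / 2, h / 2));
  const cx = rx + w / 2;
  const cy = ry + h / 2;
  return sdRoundedRect(x - cx, y - cy, w / 2, h / 2, r) <= 0;
}
```

A stencil write pass built on this idea would discard fragments where the distance is positive, so only the rounded interior marks the stencil buffer.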

Canvas2D implementation:
- `CanvasRenderer.addQuad` already uses `Path2D` + `ctx.clip()`; when
  `clipRadius > 0` a rounded-rect path built with `arcTo` is used
  instead of `path.rect`
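
A minimal sketch of how such a path could be built with `arcTo`. `PathLike` and `buildClipPath` are hypothetical names; `PathLike` only lists the `Path2D` methods the sketch needs, so a real `Path2D` can be passed in a browser.

```typescript
// Structural subset of Path2D used by this sketch.
interface PathLike {
  moveTo(x: number, y: number): void;
  arcTo(x1: number, y1: number, x2: number, y2: number, radius: number): void;
  closePath(): void;
  rect(x: number, y: number, w: number, h: number): void;
}

// Adds either a plain rectangle (clipRadius <= 0) or a rounded rectangle
// to `path`. Clamping the radius to half the shorter side is an assumption.
function buildClipPath(
  path: PathLike,
  x: number,
  y: number,
  w: number,
  h: number,
  clipRadius: number,
): void {
  const r = Math.min(clipRadius, w / 2, h / 2);
  if (r <= 0) {
    path.rect(x, y, w, h);
    return;
  }
  path.moveTo(x + r, y);
  // Each arcTo draws the straight edge up to the corner, then the arc.
  path.arcTo(x + w, y, x + w, y + h, r); // top edge, top-right corner
  path.arcTo(x + w, y + h, x, y + h, r); // right edge, bottom-right corner
  path.arcTo(x, y + h, x, y, r); // bottom edge, bottom-left corner
  path.arcTo(x, y, x + w, y, r); // left edge, top-left corner
  path.closePath();
}
```

In a browser this would be used roughly as `const p = new Path2D(); buildClipPath(p, x, y, w, h, radius); ctx.clip(p);`.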

Other changes:
- `RectWithValid` gains a `clipRadius` field (0 by default)
- `CoreNodeProps` gains `clipRadius: number` with getter/setter
- `Stage.addQuads` calls `beginRoundedClip` before rendering the
  container node's own quad so it is also clipped to the rounded shape
- `CoreRenderer` exposes no-op `beginRoundedClip` / `endRoundedClip`
  so `Stage` needs no renderer-specific import (tree-shaking safe)
- `GlContextWrapper` declares the new stencil enum constants as
  abstract readonly properties

Tests:
- 9 new unit tests in `CoreNode.test.ts` covering clipRadius defaults,
  setter, propagation, rotation disable, and clippingRect calculation
- New visual regression test `examples/tests/clipping-rounded.ts` with
  5 pages covering overflow on all sides, rectangular fallback (regression
  guard), ancestor children, nested clips, and combined with Rounded shader
@wouterlucas wouterlucas force-pushed the feat/rounded-corner-clipping branch from 31b9972 to 3b6f429 Compare June 26, 2026 14:06
@wouterlucas wouterlucas marked this pull request as ready for review June 26, 2026 14:09
Three animated scenes to exercise the WebGL stencil-buffer clipRadius path:

Scene 0 — Scrolling list
  Vertical list of rounded-clip portlets scrolling continuously.
  Models a typical TV UI where many stencil regions translate in
  lock-step. Node count: 5 × perfMultiplier cols × 8 rows.

Scene 1 — Flying cards
  20 × perfMultiplier rounded-clip cards flying independently with
  random velocity + bounce. Tests many independent stencil regions
  being opened/closed every frame with no spatial coherence.

Scene 2 — Nested rounded clips
  Outer rounded-clip row scrolling horizontally; each cell contains
  an inner rounded-clip thumbnail zooming in/out with a staggered
  phase. Tests nested stencil ref counting (outer ref + inner ref
  per visible cell).

Scenes auto-cycle every 6 s; press LEFT/RIGHT to switch manually.
perfMultiplier scales node density for targeted load testing.
…s test

Press SPACE to flip clippingEnabled on all tracked clip nodes live:
- clipping: false + clipRadius: 0  → plain unclipped render (baseline)
- clipping: true  + clipRadius: N  → full stencil path (default)

A HUD label in the top-right corner shows the current state so FPS
difference is immediately visible on-screen alongside the renderer's
own FPS counter.

Plain-scissor viewport nodes (the list/row containers) are intentionally
excluded from the toggle so the scene layout stays intact either way.
Three bottlenecks identified by profiling at multiplier=20 (18fps with
clipping vs 33fps without):

1. 39 MB/frame of redundant GPU uploads
   The stencil write pass overwrote the shared quad VBO with scratch
   geometry, then the render loop had to re-upload the full 100 KB main
   buffer after every stencil node. At 400 stencil nodes this was 801
   arrayBufferData calls and ~39 MB transferred to the GPU every frame.

   Fix: allocate a dedicated 128-byte DYNAMIC_DRAW VBO for the stencil
   quad (stencilQuadBufferCollection). The main quad buffer is never
   touched during a stencil pass — the restore upload is gone entirely.
   Total arrayBufferData calls: 1 per frame regardless of node count.

2. 800 gl.useProgram pipeline stalls per frame
   shManager.useShader(stencilProgram) triggered a full detach/attach
   cycle (disabling the scene shader's vertex attrib arrays, then
   re-enabling them after the pass). With 400 stencil nodes this was
   2x gl.useProgram per node = 800 program switches per frame.

   Fix: new WebGlShaderProgram.bindForStencil() activates the stencil
   program directly via glw.useProgram without going through the shader
   manager cycle. After the pass shManager.releaseShader() (new method
   on CoreShaderManager) marks the cache dirty so the next real draw
   triggers exactly one gl.useProgram. Net: 1 switch per stencil node
   instead of 2.

3. Batching destroyed after every stencil region
   endRoundedClip() set curRenderOp = null, preventing any nodes after
   a stencil region from joining the previous batch. With 400 stencil
   regions interleaved, every card and child got its own draw call.

   Fix: remove the curRenderOp = null from endRoundedClip — stencil
   state is restored by drawStencilEnd so subsequent nodes can batch
   normally. A new stencilDepth field on CoreNode (stamped by
   newRenderOp) is checked in reuseRenderOp to prevent nodes at
   different stencil depths from being incorrectly merged.
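
The batching guard from fix 3 can be pictured with a simplified model. The types and function names below are hypothetical and far smaller than the renderer's real render ops; the sketch only shows that consecutive quads keep batching while a change in stencil depth forces a new op.

```typescript
// Hypothetical, minimal stand-ins for a quad and a render op.
interface QuadLike {
  shaderId: string;
  stencilDepth: number;
}

interface RenderOpLike {
  shaderId: string;
  stencilDepth: number; // depth recorded when the op was created
  quads: number;
}

// A quad may join the current op only if shader and stencil depth match.
function canBatch(op: RenderOpLike, quad: QuadLike): boolean {
  return op.shaderId === quad.shaderId && op.stencilDepth === quad.stencilDepth;
}

// Groups quads into ops in submission order.
function batchQuads(quads: QuadLike[]): RenderOpLike[] {
  const ops: RenderOpLike[] = [];
  let cur: RenderOpLike | null = null;
  for (const quad of quads) {
    if (cur === null || !canBatch(cur, quad)) {
      cur = { shaderId: quad.shaderId, stencilDepth: quad.stencilDepth, quads: 0 };
      ops.push(cur);
    }
    cur.quads++;
  }
  return ops;
}
```

With depths `[0, 0, 1, 1, 0, 0]` and one shader, this model produces three ops instead of six, and never merges quads from different depths.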

Result at multiplier=20:
  Average FPS: 18.81 → 27.40  (+46%)
  Median FPS:  10    → 21     (+110%)
  No-clipping also improved: 33.83 → 38.85 (+15%) from restored batching
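
The figures quoted above can be cross-checked with a little arithmetic. The sketch assumes two uploads per stencil node (scratch quad plus restore) plus one regular upload per frame, reads "100 KB" as 100 × 1024 bytes, and counts only the restore uploads toward the transfer volume; none of these readings are stated explicitly in the commit message.

```typescript
const stencilNodes = 400;
const mainBufferBytes = 100 * 1024;

// Upload calls before the fix: scratch + restore per node, plus one main upload.
const uploadCallsBefore = 2 * stencilNodes + 1;

// Volume of the restore uploads alone, in MiB.
const restoreMiB = (stencilNodes * mainBufferBytes) / (1024 * 1024);

// Program switches before and after the fix.
const switchesBefore = 2 * stencilNodes;
const switchesAfter = 1 * stencilNodes;

// Relative change in percent, rounded to the nearest integer.
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

const averageGain = percentChange(18.81, 27.4);
const medianGain = percentChange(10, 21);
const noClipGain = percentChange(33.83, 38.85);
```

Under these assumptions the numbers line up with the message: 801 upload calls, about 39 MiB per frame, 800 program switches reduced to 400, and gains of 46%, 110% and 15%.
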
… clip

Three bugs found and fixed in the rounded corner clipping implementation:

Bug 1 — Redundant stencil pass on every inherited-clip descendant
  Stage.addQuads() triggered beginRoundedClip whenever clippingRect.valid
  && clippingRect.clipRadius > 0, but children inherit their parent's
  clippingRect (including clipRadius). This caused every descendant of a
  rounded-clip node to independently re-draw the stencil mask at an
  ever-incrementing stencilDepth — O(depth) GPU stencil quads per child,
  and stencilDepth growing unboundedly on deep trees.

  Fix: check node.props.clipping && node.props.clipRadius > 0 instead of
  the inherited clippingRect. Only the node that declares the clip region
  arms the stencil; descendants pass through unchanged.
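
To make the difference between the two conditions concrete, the sketch below counts stencil passes over a small tree under each predicate. `NodeLike` and the helper names are hypothetical and reduced to the fields the description mentions.

```typescript
// Hypothetical, minimal node shape for this illustration.
interface NodeLike {
  props: { clipping: boolean; clipRadius: number };
  clippingRect: { valid: boolean; clipRadius: number };
  children: NodeLike[];
}

// Old condition: looks at the (inherited) clippingRect.
function armsStencilOld(node: NodeLike): boolean {
  return node.clippingRect.valid && node.clippingRect.clipRadius > 0;
}

// New condition: looks only at the node's own props.
function armsStencilNew(node: NodeLike): boolean {
  return node.props.clipping && node.props.clipRadius > 0;
}

// Counts how many nodes in the subtree would trigger a stencil pass.
function countStencilPasses(
  node: NodeLike,
  arms: (n: NodeLike) => boolean,
): number {
  let count = arms(node) ? 1 : 0;
  for (const child of node.children) {
    count += countStencilPasses(child, arms);
  }
  return count;
}

// One rounded-clip container with a child and a grandchild that inherit
// its clippingRect but declare no clipping of their own.
const inheritedRect = { valid: true, clipRadius: 16 };
const grandchild: NodeLike = {
  props: { clipping: false, clipRadius: 0 },
  clippingRect: inheritedRect,
  children: [],
};
const child: NodeLike = {
  props: { clipping: false, clipRadius: 0 },
  clippingRect: inheritedRect,
  children: [grandchild],
};
const container: NodeLike = {
  props: { clipping: true, clipRadius: 16 },
  clippingRect: inheritedRect,
  children: [child],
};
```

For this tree the old condition yields three stencil passes and the new one yields a single pass, on the container only.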

Bug 2 — intersectRect zero-area leaves clippingRect.valid = true
  When a child with clipping:true had a clip rect that did not overlap the
  parent's clip rect, intersectRect wrote {0,0,0,0} into clippingRect but
  never set valid = false. The zero-area valid clippingRect cascaded to
  all grandchildren: beginRoundedClip was called with w=0/h=0, the stencil
  covered nothing, and every grandchild was invisible even if InViewport.

  Fix: after intersectRect, if w <= 0 || h <= 0 set valid = false and
  clipRadius = 0 so the node and its descendants are correctly unclipped.
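
A minimal sketch of the guarded intersection, assuming a `RectWithValid` shape with just the fields named in this PR. `intersectClip` is a hypothetical helper, not the renderer's `intersectRect`.

```typescript
interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
}

// Assumed shape: only the fields mentioned in the PR description.
interface RectWithValid extends Rect {
  valid: boolean;
  clipRadius: number;
}

// Intersects a parent clip with a child clip and writes the result to `out`.
// A zero-area result is marked invalid and drops the radius, mirroring the fix.
function intersectClip(
  parent: Rect,
  child: Rect,
  clipRadius: number,
  out: RectWithValid,
): RectWithValid {
  const x1 = Math.max(parent.x, child.x);
  const y1 = Math.max(parent.y, child.y);
  const x2 = Math.min(parent.x + parent.w, child.x + child.w);
  const y2 = Math.min(parent.y + parent.h, child.y + child.h);
  const w = x2 - x1;
  const h = y2 - y1;

  if (w <= 0 || h <= 0) {
    out.x = out.y = out.w = out.h = 0;
    out.valid = false;
    out.clipRadius = 0;
    return out;
  }

  out.x = x1;
  out.y = y1;
  out.w = w;
  out.h = h;
  out.valid = true;
  out.clipRadius = clipRadius;
  return out;
}
```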

Bug 3 — createRenderBounds skips preload zone (textures load too late)
  createRenderBounds() returned early when a clipping parent's renderBound
  was outside strictBound (viewport), even if the parent was inside
  preloadBound (the boundsMargin zone). This prevented the parent from
  narrowing strictBound for its children, so children used the full-stage
  preloadBound rather than the parent-clipped preload area. Any child
  outside the stage's own preload margin but inside the parent's was never
  preloaded — textures only started loading once the parent entered the
  actual viewport, causing visible pop-in.

  Fix: guard against preloadBound instead of strictBound so the parent
  narrows strictBound for its children whenever it is in the preload zone,
  not only when it is fully in the viewport.
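
The guard change can be illustrated with a plain overlap test. This is only a sketch of the decision described above, with hypothetical names and a 1920 × 1080 stage with a 100 px margin assumed for the example; it does not reproduce createRenderBounds() itself.

```typescript
interface Bound {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// True when the two bounds share any area.
function boundsOverlap(a: Bound, b: Bound): boolean {
  return a.x1 < b.x2 && a.x2 > b.x1 && a.y1 < b.y2 && a.y2 > b.y1;
}

// Old guard: a clipping parent narrows bounds only when it touches the viewport.
function narrowsBoundsOld(parentRenderBound: Bound, strictBound: Bound): boolean {
  return boundsOverlap(parentRenderBound, strictBound);
}

// New guard: the parent narrows bounds as soon as it is in the preload zone.
function narrowsBoundsNew(parentRenderBound: Bound, preloadBound: Bound): boolean {
  return boundsOverlap(parentRenderBound, preloadBound);
}

// Assumed example values: full-HD stage with a boundsMargin of 100.
const strictBound: Bound = { x1: 0, y1: 0, x2: 1920, y2: 1080 };
const preloadBound: Bound = { x1: -100, y1: -100, x2: 2020, y2: 1180 };

// A clipping parent just off the right edge, inside the margin.
const parentRenderBound: Bound = { x1: 1950, y1: 0, x2: 2350, y2: 300 };
```

Here the old guard returns early for the parent, while the new guard lets it narrow the bounds for its children before it scrolls into view.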