HDDS-15651. Test case for DiskBalancer when markContainerForDelete fails #10593

Open
arunsarin85 wants to merge 2 commits into apache:master from arunsarin85:HDDS-15651

@arunsarin85 arunsarin85 commented Jun 23, 2026

What changes were proposed in this pull request?

Test-only PR for HDDS-15651. Adds two unit tests in TestDiskBalancerTask to document the intended DiskBalancer move/cleanup behavior when markContainerForDelete() fails or when lazy deletion fails.

Please describe your PR in detail:
DiskBalancer treats the container move and the source cleanup as separate phases. Once the import and the ContainerSet update succeed, the move is reported as a success even if marking the old source replica for deletion fails. The old replica is queued in pendingDeletionContainers and removed after replica.deletion.delay.

This PR adds tests to lock in that behavior and document a known gap when lazy deletion fails.

Test 1: moveSucceedsWhenMarkContainerForDeleteFails

  • Simulates markContainerForDelete() failure on the source replica after a successful move.
  • Verifies the move is still reported as success (success metrics updated, no rollback).
  • Verifies ContainerSet points to the destination replica.
  • Verifies the source replica stays on disk temporarily and is queued for lazy deletion.
  • After the delay, verifies the source replica is removed via cleanupPendingDeletionContainers().
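
The scenario in Test 1 can be pictured with a small toy model. This is only a sketch under stated assumptions, not the Ozone implementation: `DiskBalancerMoveSketch`, its fields, `markForDelete`, `cleanupPending`, and the 5-second delay are all hypothetical stand-ins for the DiskBalancer task, `markContainerForDelete()`, `cleanupPendingDeletionContainers()`, and `replica.deletion.delay`. The point at which the success counter is incremented is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: every name below is hypothetical, not the Ozone API.
class DiskBalancerMoveSketch {
  static final long DELETION_DELAY_MS = 5_000L; // stands in for replica.deletion.delay

  static final class Pending {
    final String replicaPath;
    final long deleteAfterMs;

    Pending(String replicaPath, long deleteAfterMs) {
      this.replicaPath = replicaPath;
      this.deleteAfterMs = deleteAfterMs;
    }
  }

  final Map<Long, String> containerSet = new HashMap<>(); // containerId -> serving volume
  final Set<String> replicasOnDisk = new HashSet<>();     // "volume/containerId" entries
  final List<Pending> pendingDeletion = new ArrayList<>();
  int successCount = 0;
  boolean failMarkForDelete = false;                      // test hook: simulate the failure

  static String path(String volume, long id) {
    return volume + "/" + id;
  }

  void addContainer(long id, String volume) {
    containerSet.put(id, volume);
    replicasOnDisk.add(path(volume, id));
  }

  // Phase 1: import on the destination and repoint the container set.
  // Phase 2: best-effort marking of the old replica, then queue it for lazy deletion.
  boolean move(long id, String destVolume, long nowMs) {
    String sourceVolume = containerSet.get(id);
    if (sourceVolume == null || sourceVolume.equals(destVolume)) {
      return false;
    }
    replicasOnDisk.add(path(destVolume, id));
    containerSet.put(id, destVolume);
    successCount++; // assumption: success is decided once phase 1 completes

    String oldReplica = path(sourceVolume, id);
    try {
      markForDelete(oldReplica);
    } catch (IllegalStateException e) {
      // Tolerated in this model: no rollback, the move stays successful.
    }
    pendingDeletion.add(new Pending(oldReplica, nowMs + DELETION_DELAY_MS));
    return true;
  }

  void markForDelete(String replicaPath) {
    if (failMarkForDelete) {
      throw new IllegalStateException("simulated mark failure for " + replicaPath);
    }
  }

  // Removes queued replicas whose delay has elapsed.
  void cleanupPending(long nowMs) {
    Iterator<Pending> it = pendingDeletion.iterator();
    while (it.hasNext()) {
      Pending p = it.next();
      if (nowMs >= p.deleteAfterMs) {
        replicasOnDisk.remove(p.replicaPath);
        it.remove();
      }
    }
  }

  public static void main(String[] args) {
    DiskBalancerMoveSketch s = new DiskBalancerMoveSketch();
    s.addContainer(1L, "vol-a");
    s.failMarkForDelete = true;
    System.out.println("move succeeded: " + s.move(1L, "vol-b", 0L));
    System.out.println("serving volume: " + s.containerSet.get(1L));
    s.cleanupPending(DELETION_DELAY_MS);
    System.out.println("old replica on disk: " + s.replicasOnDisk.contains("vol-a/1"));
  }
}
```

In this model the old replica survives any cleanup call made before the delay expires and disappears on the first call at or after it, which mirrors the sequence of assertions listed for Test 1.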

Test 2: lazyDeletionFailureDoesNotRetry

  • Runs a successful move and advances the clock past the deletion delay.
  • Mocks KeyValueContainerUtil.removeContainer() to fail during lazy deletion.
  • Verifies the source replica remains on disk, the pending queue entry is dropped, and deletion is not retried on a second cleanup attempt.
  • Documents current behavior when lazy deletion fails (recovery depends on other paths such as DN restart for Ratis).
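
The gap that Test 2 documents can be illustrated with another toy model. Again this is a sketch under assumptions, not the real code: `LazyDeletionSketch` and its `Remover` interface are hypothetical, with `Remover` standing in for the mocked `KeyValueContainerUtil.removeContainer()` call, and all queued entries are assumed to be past the deletion delay already.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every name below is hypothetical, not the Ozone API.
class LazyDeletionSketch {
  // Stand-in for the removal call that the test mocks to fail.
  interface Remover {
    void remove(String replicaPath) throws IOException;
  }

  final Set<String> replicasOnDisk = new HashSet<>();
  final List<String> pendingDeletion = new ArrayList<>(); // assumed already past the delay
  int removeAttempts = 0;

  void cleanupPending(Remover remover) {
    List<String> due = new ArrayList<>(pendingDeletion);
    pendingDeletion.clear(); // entries leave the queue whether or not removal succeeds
    for (String replica : due) {
      removeAttempts++;
      try {
        remover.remove(replica);
        replicasOnDisk.remove(replica);
      } catch (IOException e) {
        // No re-queue in this model: the replica stays on disk until
        // some other path cleans it up.
      }
    }
  }

  public static void main(String[] args) {
    LazyDeletionSketch s = new LazyDeletionSketch();
    s.replicasOnDisk.add("vol-a/1");
    s.pendingDeletion.add("vol-a/1");
    Remover failing = path -> {
      throw new IOException("simulated removal failure for " + path);
    };
    s.cleanupPending(failing);
    s.cleanupPending(failing); // second pass finds an empty queue
    System.out.println("attempts: " + s.removeAttempts);
    System.out.println("still on disk: " + s.replicasOnDisk.contains("vol-a/1"));
  }
}
```

Because the queue is drained before removal is attempted, a failed removal leaves the replica on disk with nothing scheduled to try again, which is the behavior the test pins down.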

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15651

How was this patch tested?

mvn test -pl hadoop-hdds/container-service -am \
  -Dtest=TestDiskBalancerTask#moveSucceedsWhenMarkContainerForDeleteFails,TestDiskBalancerTask#lazyDeletionFailureDoesNotRetry \
  -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false

@arunsarin85 arunsarin85 marked this pull request as draft June 24, 2026 15:40

@Gargi-jais11 Gargi-jais11 left a comment

Thanks @arunsarin85 for raising the concern. I have left comments below to discuss this.

@arunsarin85 arunsarin85 marked this pull request as ready for review June 25, 2026 08:10
@arunsarin85

Thanks @Gargi-jais11 for the comments!
As per the design and feature flow explained in the Jira (https://issues.apache.org/jira/browse/HDDS-15651), I have changed this PR to be test-only (added 2 additional tests). I will update the description accordingly.

@adoroszlai adoroszlai changed the title from "HDDS-15651. Roll back DiskBalancer move when markContainerForDelete fails" to "HDDS-15651. Test case for DiskBalancer when markContainerForDelete fails" Jun 25, 2026
@adoroszlai adoroszlai requested a review from Gargi-jais11 June 25, 2026 09:43
3 participants