[CI] Add Java Format CI check for pull requests#2527
Conversation
Add a workflow that enforces dev/CodeStyle_eclipse.xml on the Java files changed by a pull request. It applies the Eclipse formatter (via formatter-maven-plugin) to only the changed src/main/java and src/test/java files and fails if that produces any diff, printing the required changes. Also add dev/format-changed.sh, a helper that runs the same scoped formatting locally on the files you changed relative to the base branch (plus staged/unstaged/untracked edits). This avoids a bare `mvn formatter:format`, which would reformat the entire tree. The check is scoped to the PR diff rather than the whole tree because the existing sources are not yet fully formatter-clean; this lets the codebase converge file-by-file without failing unrelated PRs.
Not for merge. Three scenarios to observe the check: 1. DMLRuntimeException.java: pre-existing file that is not formatter-clean, with a correctly-formatted method appended. 2. FormatProbeBroken.java: correctly-formatted file with one method that has blatant style violations. 3. FormatProbeClean.java: fully correctly-formatted new file.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2527 +/- ##
=========================================
Coverage 71.66% 71.66%
- Complexity 49338 49343 +5
=========================================
Files 1580 1580
Lines 190516 190517 +1
Branches 37364 37364
=========================================
+ Hits 136525 136536 +11
+ Misses 43464 43456 -8
+ Partials 10527 10525 -2 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
I like this approach, but I'm wondering if it is too strict. It would imply that as a contributor, any slightly modified file must be completely reformatted by the PR. Second, different IDEs may apply slightly different formatting under the same xml so the only reliable way to reformat is using the exact same formatter. |
Ya, the current solution is a bit too strict, i wanted to do something where we only fail if the there is a diff in conflict with the changes proposed, and then the CI job here would give a diff back that only edits those lines, not anything else in the file. I will try some more to make it streamlined. About different IDEs, I think it should just be consistent and one source of truth, this seems Reliable with maven and this proposed solution. |
The Eclipse formatter only operates on whole files, so checking changed files reformatted their pre-existing (not-yet-clean) lines and produced an over-aggressive diff on lines the PR never touched. Add dev/format_changed.py, which formats each changed file but keeps only the formatting changes that fall on the lines the PR actually edited (via git diff line ranges intersected with difflib opcodes). It supports: --check: fail only if an edited line is mis-formatted (used by CI) --fix: apply formatting to only the edited-line regions Wire javaFormat.yml to run the --check mode and make dev/format-changed.sh delegate to --fix, so both CI and the local helper share identical line-scoped behavior.
Add varied scenarios to verify the Java Format check flags only PR-edited lines: - FormatProbeErrors: new file with four distinct code-style violations - FormatProbeOk2: new fully-compliant file (must pass) - FormatProbeClean: append one clean and one broken method (only the broken one must be flagged) - DMLRuntimeException: break only the previously-clean describe line so its pre-existing unformatted lines must remain unflagged
Introduce dev/format-exclude.txt, a list of glob patterns for files whose non-standard layout is intentional and must not be reformatted (e.g. DMLConfig.java's hand-aligned constants). dev/format_changed.py loads the list and skips matching files in both --check and --fix, matching each pattern against the repo-relative path and the bare file name.
Address review findings on the line-scoped format check: - Always restore working-tree files via try/finally so an mvn failure or interrupt never leaves whole-file reformatting on disk (previously the restore only ran on the happy path, risking loss of uncommitted/untracked content in --check and --fix). - Fail loudly on git errors instead of swallowing stderr and reporting a false "clean" pass; validate the resolved base ref up front. - Reject unknown flags and extra positional args instead of silently treating them as the base ref. - Extract pure helpers (parse_hunks, line_scoped_result, overlaps) and add dev/tests/test_format_changed.py; run it in CI via a pytest step. Batch the per-file git diff into a single call. - Byte-exact file IO (newline="") so restores never rewrite line endings; use context-managed reads/writes; dedupe the src-root prefix; emit a friendly message when mvn is unavailable. - Add permissions: contents: read and expand the workflow trigger paths to cover the checker and its tests.
Consolidate the two Java style gates into one workflow file with two jobs so each still reports its own status: - java_checkstyle: whole-tree checkstyle, on push and pull_request - java_format: line-scoped Eclipse-formatter check plus its pytest, gated to pull_request since it needs the PR base commit Remove the standalone javaFormat.yml.
|
@janniklinde want to take a look? now it is targeting the exact diff you have as subparts of the files you edited. |
The Apache RAT license check flagged dev/format-exclude.txt for a missing license header. Add the standard ASF header as comment lines; the exclude parser already ignores '#' lines, so behavior is unchanged.
Add a workflow that enforces dev/CodeStyle_eclipse.xml on the Java files changed by a pull request. It applies the Eclipse formatter (via formatter-maven-plugin) to only the changed src/main/java and src/test/java files and fails if that produces any diff, printing the required changes.
Also add dev/format-changed.sh, a helper that runs the same scoped formatting locally on the files you changed relative to the base branch (plus staged/unstaged/untracked edits). This avoids a bare
mvn formatter:format, which would reformat the entire tree.The check is scoped to the PR diff rather than the whole tree because the existing sources are not yet fully formatter-clean; this lets the codebase converge file-by-file without failing unrelated PRs.