perf: add fast path for simple recursive globs#2884
Draft
Napolitain wants to merge 5 commits into
Draft
Conversation
Add Issue go-task#2853 benchmarks comparing checksum, timestamp, and uncached tasks across many-small and few-large sparse YAML source sets. Baseline on Intel i7-14700K, go test -run '^$' -bench 'BenchmarkIssue2853.*SparseYAMLFiles' -benchtime=3x -count=3 ./ Many small sparse YAML files (20,000 x 5 bytes): checksum 440-451 ms/op, timestamp 140-148 ms/op, none 1.1-1.3 ms/op. Few large sparse YAML files (4 x 128 MiB): checksum 60-61 ms/op, timestamp 213-239 us/op, none 1.1-1.3 ms/op. Sparse files avoid bulk data writes while preserving logical file size for checksum/timestamp comparisons.
Add an OS-native mtime reference point for the Issue go-task#2853 filesystem benchmarks. The reference walks the same sparse YAML source tree with filepath.WalkDir, stats YAML files through DirEntry.Info, and compares mtimes against a generated output file. The benchmark is available under the fsbench build tag alongside the Task checksum, timestamp, and uncached cases.
Rename the fsbench benchmark entry points from Issue-2853-specific names to BenchmarkManySmallFiles and BenchmarkFewLargeFiles. The benchmark output is now easier to scan while the PR and commit history still carry the issue context. Helper and constant names were updated to match; benchmark behavior is unchanged.
Recognize simple literal-root recursive glob patterns such as path/to/folder/**/*.yaml and expand them with filepath.WalkDir plus filepath.Match. Patterns with complex shell features continue to use the existing mvdan shell expander, and symlinked directories explicitly fall back so existing recursive symlink behavior is preserved. Many-small benchmark before this branch was about 420-450 ms/op checksum and 137 ms/op timestamp for 20,000 tiny files. After this change, repeated local runs measured about 386-391 ms/op checksum and 88-89 ms/op timestamp, with timestamp allocs dropping to about 311k/op. Verification: go test ./...; go test -tags fsbench -run '^$' -bench 'BenchmarkManySmallFiles/(checksum|timestamp)$' -benchtime=5x -count=3 -benchmem ./
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a stacked optimization PR on top of the filesystem benchmark work in #2881. Please review or merge #2881 first; this branch intentionally depends on those benchmarks so the performance impact can be evaluated with the same many-small and few-large fixtures.
This is independent from #2883.
This PR optimizes expansion for simple recursive source globs like
path/to/folder/**/*.yaml.The fast path only handles narrow literal-root recursive globs. More complex shell-style patterns continue to use the existing expander, and symlinked directories fall back to the existing path to preserve current behavior.