Add csv_utilise.sh, sample CSVs, and arbitrary-shape tests #1
Merged
The existing csv_loader.sh already accepts any CSV file generically, but nothing in the repo helps a user query, list, or export what they loaded; they had to know psql and the auto-created schema. This change closes that gap and proves the path end-to-end:

- build/csv_utilise.sh: list / describe / peek / export / drop subcommands scoped to tables carrying the loader's _csv_row_id + _loaded_at markers, so the te_core_schema tables can never be touched by accident
- build/csv/samples/: three off-domain CSVs (customers, orders, inventory) so the loader has something to demo against
- Makefile: csv-load, csv-list, csv-demo targets
- tests/test_csv_utilise.py: 9 unit tests for arg parsing and engine guard
- tests/test_csv_loader_arbitrary_shapes.py: parameterised regression (2x3, 10x50, 1x100) covering CSV -> validator -> Postgres -> query, skipped cleanly when PostgreSQL is unreachable
- README.md / ARCHITECTURE.md: 'Load any CSV' subsection and new build/ rows

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VBrxqChRJtxdvSpFhiUWUy
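The marker-based scoping described for csv_utilise.sh can be illustrated with a small Python sketch. This is not the script's actual implementation (which is a shell script whose internals are not shown here); the function name `loader_tables` and the dictionary input are hypothetical, and a real version would presumably read column names from the database catalog.

```python
# Hypothetical sketch: select only tables that carry both loader markers.
# The marker names come from the PR description; everything else is assumed.
MARKER_COLUMNS = frozenset({"_csv_row_id", "_loaded_at"})


def loader_tables(columns_by_table):
    """Return the sorted names of tables whose columns include every marker.

    columns_by_table maps a table name to an iterable of its column names.
    """
    return sorted(
        table
        for table, columns in columns_by_table.items()
        if MARKER_COLUMNS <= set(columns)
    )


if __name__ == "__main__":
    catalog = {
        "customers": ["_csv_row_id", "_loaded_at", "name", "email"],
        "core_table": ["id", "payload"],         # no markers: never selected
        "half_marked": ["_csv_row_id", "name"],  # only one marker: not selected
    }
    print(loader_tables(catalog))
```

Requiring both markers, rather than either one, keeps a table that merely happens to have a column named `_loaded_at` out of reach of a destructive subcommand such as drop.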
The Windows CI runner uses `python -m unittest discover` to find test files, but every test module imports pytest at top-level for `pytestmark` markers. Without pytest installed, all tests error at import time with ModuleNotFoundError. `requirements-dev.txt` already pins pytest + pytest-cov + flake8 + bandit; install it before the test step runs. Also: scripts/test.sh and scripts/test.ps1 used to treat a missing pytest as a silent PASS (`SKIP`), which masks real failures during local development. Both now fail with a one-line install hint pointing at requirements-dev.txt — consistent with what CI now does.
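The change from a silent SKIP to a hard failure can be sketched as follows. The real scripts are scripts/test.sh and scripts/test.ps1; this Python version only illustrates the behaviour, and the function name `require_pytest` and the exact hint wording are assumptions.

```python
# Hypothetical sketch: treat a missing pytest as a failure, not a skip.
import importlib.util
import sys


def require_pytest(find_spec=importlib.util.find_spec):
    """Return 0 if pytest is importable, otherwise print a hint and return 1.

    find_spec is injectable so the check can be exercised without
    installing or uninstalling pytest.
    """
    if find_spec("pytest") is None:
        print(
            "pytest is not installed; run: pip install -r requirements-dev.txt",
            file=sys.stderr,
        )
        return 1
    return 0


if __name__ == "__main__":
    # A non-zero exit code makes the calling script or CI step fail.
    sys.exit(require_pytest())
```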
On windows-latest GitHub Actions runners, `subprocess.run(["bash", ...])` resolves to C:\Windows\System32\bash.exe — the WSL shim — which fails with "Windows Subsystem for Linux has no installed distributions." Git Bash is preinstalled at C:\Program Files\Git\bin\bash.exe; prefer it explicitly and skip the test class only if no real bash is found. Same helper added to test_csv_loader_arbitrary_shapes.py for consistency (those tests already skip without Postgres, so the impact is preemptive).
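A helper along these lines could implement the lookup described above. It is a sketch, not the helper that was actually added to the test modules: the name `find_real_bash`, the injectable parameters, and the System32 filter on the PATH fallback are assumptions made for illustration.

```python
# Hypothetical sketch: prefer Git Bash on Windows and avoid the WSL shim.
import os
import shutil
import sys

GIT_BASH = r"C:\Program Files\Git\bin\bash.exe"


def find_real_bash(platform=sys.platform, exists=os.path.isfile, which=shutil.which):
    """Return a path to a usable bash, or None if only the WSL shim is found."""
    if platform != "win32":
        return which("bash")
    if exists(GIT_BASH):
        return GIT_BASH
    found = which("bash")
    # C:\Windows\System32\bash.exe is the WSL shim, which fails when no
    # distribution is installed, so it is rejected here.
    if found and "system32" not in found.lower():
        return found
    return None
```

A test class could then be guarded with `unittest.skipIf(find_real_bash() is None, "no real bash found")`, and the returned path passed as the first element of the `subprocess.run` argument list in place of the bare `"bash"`.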