Merged
26 commits, all by AndrewSazonov on Jun 17, 2026:

- `23cc82b` Clear superseded dataset-driven-fit-modes review history
- `6446994` Refine dataset-driven-fit-modes ADR from review cycle
- `59a5fed` Add dataset-driven-fit-modes implementation plan
- `2cb7d33` Offer fit modes by project applicability
- `0c22b7b` Validate fit mode against loaded experiments
- `0b8fcbe` Fit a single experiment in single mode
- `0c9ad84` Remove single-mode parameter snapshot fallback
- `5bdcfc0` Add copy_data and clearer sequential data resolution
- `99311af` Confirm category visibility already matches fit modes
- `133c503` Confirm no tutorial relies on single-with-N
- `68a60c7` Close issue 85 and accept dataset-driven fit modes ADR
- `566bc3b` Reach Phase 1 review gate
- `2588560` Raise ValueError for unresolvable sequential data dir
- `354e96d` Refresh copy_data archive to current matched files
- `c040a1d` Clarify ADR applicability excludes measured-data
- `be8c598` Fix plan Phase 1 status and issue 85 link
- `8e1a714` Reject sequential data_dir overlapping the copy archive
- `08ef820` Satisfy ruff lint in fit-mode availability code
- `08fa430` Apply pixi run fix auto-fixes
- `6595be2` Remove completed adp-beta-tensor and beba-rename plans
- `c1a57d5` Order mode-applicability check after request validation
- `b189be7` Update and add unit tests for dataset-driven fit modes
- `28767a0` Update category-support integration test for fit-mode availability
- `6865178` Update sequential and joint-fit integration tests for new fit-mode rules
- `89e1461` Mark Phase 2 verification complete in plan
- `6f2b8c6` Add ADR for Notebook-Owned Verification Regression Gating
@@ -2,7 +2,7 @@

## Status

Proposed.
Accepted.

## Date

@@ -16,9 +16,9 @@ Analysis and fitting.

The analysis layer offers three fit modes through the `fitting_mode`
switchable category established by
[`fit-mode-categories`](accepted/fit-mode-categories.md): `single`,
`joint`, and `sequential`. Two problems make the current surface
confusing and partly incorrect.
[`fit-mode-categories`](fit-mode-categories.md): `single`, `joint`, and
`sequential`. Two problems make the current surface confusing and partly
incorrect.

**`single` is overloaded.** It is the friendly name for the one-dataset
case, but it _also_ silently loops over multiple loaded experiments,
@@ -53,7 +53,7 @@ one-dataset case — which removes the buggy multi-loop and closes issue
85 — (3) keeps `sequential` as the folder sweep it already is, and (4)
tidies the `sequential` data-source configuration (sensible defaults, an
optional copy-into-project flag, and room for a future remote source).
It extends [`fit-mode-categories`](accepted/fit-mode-categories.md).
It extends [`fit-mode-categories`](fit-mode-categories.md).

An earlier draft of this ADR proposed redefining `sequential` to fit the
loaded datasets in turn; that direction was dropped (see Alternatives
@@ -64,39 +64,62 @@ behaviour and solving issue 85 by restricting `single`.

### 1. Mode availability is precondition-based, not a static list

Each fit mode declares a **precondition predicate** — "can I run on this
project right now?" — and `fitting_mode.show_supported()` lists exactly
the modes whose preconditions the current project satisfies. This is
wired through the existing switchable-category selector
Two distinct concepts are separated explicitly so that offering a mode
and being able to run it do not collapse into one rule:

- **Applicability** — "could this mode apply to the project as loaded?"
This drives `fitting_mode.show_supported()`.
- **Readiness** — "is this mode fully configured to run right now?" This
is checked only at `fit()` time and produces the Decision 6 errors.

`fitting_mode.show_supported()` lists exactly the modes whose
**applicability** predicate the current project satisfies. This is wired
through the existing switchable-category selector
(`FittingMode._supported_types(filters)`, which today ignores its
`filters`); it now consumes project state. No new owner-level setter is
added — the category-owned-selector contract from
[`switchable-category-owned-selectors`](accepted/switchable-category-owned-selectors.md)
[`switchable-category-owned-selectors`](switchable-category-owned-selectors.md)
is preserved.

Preconditions:

- `single` → exactly one experiment with measured data.
- `joint` → two or more experiments with measured data.
- `sequential` → exactly one experiment with measured data (the
template) plus a resolvable data source (checked fully at fit time;
see Decision 4).

The availability table is a **consequence** of these predicates, not a
hard-coded rule:
**Applicability** predicates (drive `show_supported()`) are by **total
loaded-experiment count**:

- `single` → exactly one loaded experiment.
- `joint` → two or more loaded experiments.
- `sequential` → exactly one loaded experiment (the template). It
deliberately does **not** require a configured data source, so
`sequential` is offered as soon as one dataset is loaded — preserving
the intended workflow of switching to it and _then_ pointing it at a
folder.

**Readiness** (checked at `fit()` time, see Decisions 4 and 6) covers
everything beyond the count: each scheduled experiment must have
measured data (enforced by the existing `Fitter._require_measured_data`
guard — a calculated-only experiment yields a clear fit-time error, not
a hidden mode), and `sequential` additionally needs a resolvable
`data_dir` that matches at least one file. An unconfigured or empty
source is a clear fit-time error, **not** a reason to hide the mode.
Counting _loaded_ (not _measured_) experiments for applicability keeps
`show_supported()` and `fit()` consistent for mixed measured/calculated
projects without any "schedule only the measured subset" filtering.
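A minimal sketch of the fit-time readiness check described above, assuming simplified stand-ins for a loaded experiment and for the `sequential_fit` fields. `Experiment`, `SequentialConfig`, and `check_readiness` are hypothetical names, not the project's API; the real measured-data guard is `Fitter._require_measured_data`, whose signature is not reproduced here. Raising `ValueError` follows the commit message "Raise ValueError for unresolvable sequential data dir".

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical stand-in for a loaded experiment.
    name: str
    has_measured_data: bool


@dataclass
class SequentialConfig:
    # Hypothetical stand-in for the `sequential_fit` category fields.
    data_dir: Optional[str] = None  # unset by default: no guessed folder
    file_pattern: str = '*'         # first-step default


def check_readiness(mode: str, experiments: list[Experiment],
                    seq: Optional[SequentialConfig] = None) -> list[Path]:
    """Raise ValueError if `mode` cannot run now; return sequential inputs."""
    # Every scheduled experiment must carry measured data.
    for exp in experiments:
        if not exp.has_measured_data:
            raise ValueError(f"Experiment '{exp.name}' has no measured data.")
    if mode != 'sequential':
        return []
    # Sequential additionally needs a resolvable, non-empty data source.
    if seq is None or seq.data_dir is None:
        raise ValueError("sequential: 'data_dir' is not set.")
    folder = Path(seq.data_dir)
    if not folder.is_dir():
        raise ValueError(f"sequential: data_dir '{folder}' does not exist.")
    files = sorted(p for p in folder.glob(seq.file_pattern) if p.is_file())
    if not files:
        raise ValueError(
            f"sequential: no files in '{folder}' match '{seq.file_pattern}'.")
    return files
```

Readiness deliberately never re-checks the experiment count; that belongs to the applicability predicate, which is why a mode can be offered and still fail here with a clear error.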

The availability table is a **consequence** of the applicability
predicates, not a hard-coded rule:

| Experiments loaded | Available modes |
| ------------------ | ---------------------- |
| 0 | — (nothing fittable) |
| 1 | `single`, `sequential` |
| ≥ 2 | `joint` |

Predicate-based detection is preferred over a central `if count >= 2`
switch because it is **honest** (it can also reflect, e.g., an
experiment with no measured data, not just a count), **extensible** (a
future remote data source simply becomes another way `sequential`'s
"resolvable data source" precondition is met — see Deferred Work), and
keeps the selector contract clean.
Per-mode predicates are preferred over a central `if count >= 2` switch
because they keep each mode's rule next to the mode and are
**extensible** (a future remote data source becomes another way
`sequential`'s readiness is satisfied — see Deferred Work) without
touching a shared branch. Measured-data presence is deliberately **not**
an applicability input — it is a fit-time readiness check — so
`show_supported()` never hides a mode because an experiment lacks
measured data.
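The per-mode applicability predicates and the availability table they induce can be sketched as a mapping from mode name to a predicate over the total loaded-experiment count. `APPLICABILITY` and `supported_modes` are hypothetical names; in the project the listing goes through `FittingMode._supported_types(filters)`, which this sketch does not model.

```python
from typing import Callable

# Hypothetical per-mode applicability predicates keyed by mode name.
# Each takes the total number of *loaded* experiments (measured or not).
APPLICABILITY: dict[str, Callable[[int], bool]] = {
    'single': lambda n: n == 1,
    'joint': lambda n: n >= 2,
    # Offered with one template loaded, even before data_dir is set.
    'sequential': lambda n: n == 1,
}


def supported_modes(n_loaded: int) -> list[str]:
    """Modes a show_supported()-style listing would offer."""
    return [mode for mode, ok in APPLICABILITY.items() if ok(n_loaded)]


for n in (0, 1, 2, 5):
    print(n, supported_modes(n))
# 0 []
# 1 ['single', 'sequential']
# 2 ['joint']
# 5 ['joint']
```

Adding a mode, or changing one mode's rule, touches a single entry rather than a shared `if count >= 2` branch.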

### 2. Restrict `single` to exactly one loaded experiment

@@ -118,17 +141,53 @@ sweep itself.
The `sequential_fit` category keeps `data_dir`, `file_pattern`,
`max_workers`, `chunk_size`, and `reverse`, with these refinements:

- **`file_pattern` default derived from the template.** Default the glob
to the loaded template experiment's own data-file extension (load
`.xye` → default `*.xye`); fall back to `*` only when the extension is
unknown. Zero-config for the common case.
- **`file_pattern` keeps its `'*'` default (first step).** Deriving the
glob from the loaded template experiment's data-file extension is
**deferred** (see Deferred Work): the experiment model does not retain
its source data-file path today, so the extension is unavailable
without new source-path metadata. The shipped first-step default is
therefore `'*'` (the explicit fallback); the derived default is a
tracked follow-up.
- **No smart default for `data_dir`.** It stays unset by default; a
silent auto-pickup of files from a guessed folder would be surprising.
An unset `data_dir` is a clear fit-time error (Decision 6).
- **`copy_data` (new boolean, default `False`).** When `False`
(default), matched files are referenced in place; when `True`, the
matched files are copied into the project so it is self-contained.
Default-off avoids surprising large copies for thousand-file series,
while letting users opt into a portable, archived project.
while letting users opt into a portable, archived project. To avoid
shipping a decided field with an undefined contract, the **minimal
first-step behaviour is fully specified here**:
- **Timing.** The copy happens at `fit()` time, during `sequential`
readiness resolution, _before_ the sweep begins — not at config time
(so it always reflects the `data_dir`/`file_pattern` in effect for
that run).
- **Destination.** A fixed project-relative folder
(`<project>/data/sequential/`); the run then reads its inputs from
there. The destination is derived, not separately configurable.
- **Conflict policy.** Idempotent overwrite: a destination file of the
same name is overwritten so the in-project copy always matches the
current source. (Users with very large series leave `copy_data`
off.)
- **Round-trip / portability contract.** Once a copy succeeds, the
persisted `data_dir` is **rewritten to the project-relative copy
destination** (`data/sequential/`). The saved project is therefore
self-contained: on reload — even moved or shared, with the original
external source gone — `sequential` readiness resolves against the
in-project copy. `file_pattern` persists as set (the copied files
keep their names, so it still matches). The copy is **idempotent**:
when the resolved source directory is already the copy destination
(the post-reload case, where `data_dir` already points at
`data/sequential/`), the copy is skipped and the run uses the
archived files. Re-running `fit()` against a fresh external
`data_dir` re-copies and refreshes the archive.
- **What is serialized.** The `sequential_fit` fields — including
`copy_data` and the (possibly rewritten) `data_dir` / `file_pattern`
— are written to CIF as for any category. When `copy_data=False`,
`data_dir` persists exactly as the user set it (reference in place);
when `copy_data=True`, it persists as the in-project destination per
the round-trip contract above. The copied data files themselves are
project artifacts, not CIF content.
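The first-step `copy_data` contract (copy at `fit()` time, fixed `data/sequential/` destination, overwrite on conflict, rewritten `data_dir`, skip when the source already is the archive) can be sketched as below. `resolve_sequential_inputs` is a hypothetical helper, and resolving a relative `data_dir` against the project directory is an assumption of this sketch rather than something the ADR specifies.

```python
import shutil
from pathlib import Path

# Fixed project-relative destination from the first-step contract.
COPY_SUBDIR = Path('data') / 'sequential'


def resolve_sequential_inputs(project_dir: Path, data_dir: str,
                              file_pattern: str = '*',
                              copy_data: bool = False) -> tuple[str, list[Path]]:
    """Return (data_dir to persist, files the sweep should read)."""
    source = Path(data_dir)
    if not source.is_absolute():
        # Assumption: relative paths resolve against the project directory.
        source = project_dir / source
    files = sorted(p for p in source.glob(file_pattern) if p.is_file())
    if not copy_data:
        # Reference in place; data_dir persists exactly as the user set it.
        return data_dir, files
    dest = project_dir / COPY_SUBDIR
    if source.resolve() == dest.resolve():
        # Post-reload case: already reading the archive, so skip the copy.
        return COPY_SUBDIR.as_posix(), files
    dest.mkdir(parents=True, exist_ok=True)
    # Idempotent overwrite: same-named files are replaced by the current source.
    copied = [Path(shutil.copy2(f, dest / f.name)) for f in files]
    return COPY_SUBDIR.as_posix(), copied
```

The returned `data_dir` is the value that would be persisted: the user's own value when `copy_data=False`, the project-relative archive path otherwise.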

### 5. Close issue 85 by removing `single`-with-N

@@ -235,11 +294,6 @@ safer and more discoverable.

## Open Questions

- **`copy_data` mechanics.** When the copy happens (at config time vs at
fit time), what is stored after a copy (the in-project path vs the
original reference), and the overwrite/dedup policy. The field and its
default-off behaviour are decided here; the copy mechanism may be a
small follow-up.
- **Resume.** Resume is currently "single mode only"
(`_validate_fit_request`). Confirm `single` (one dataset) keeps resume
as today; any per-point resume for `sequential` is out of scope.
@@ -254,6 +308,14 @@ safer and more discoverable.
satisfy the `sequential` "resolvable data source" precondition,
reusing the same mode and `results.csv` evolution output without
reopening the mode design.
- The `copy_data` copy mechanism, if not implemented in the first step.
- Advanced `copy_data` policies beyond the first-step contract in
Decision 4 (e.g. content-hash dedup, incremental sync, a configurable
destination) — the default-off flag and minimal overwrite contract
ship in the first step.
- **Template-derived `file_pattern` default.** Deferred from Decision 4:
retain the template experiment's source data-file path as new
experiment metadata (with persistence and tests), then default the
glob to its extension (`.xye` → `*.xye`). The first step ships the
`'*'` default.
- Detailed result-file/export layout remains governed by
[`fit-output-files-and-data-exports`](suggestions/fit-output-files-and-data-exports.md).
[`fit-output-files-and-data-exports`](../suggestions/fit-output-files-and-data-exports.md).
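Once the template's source data-file path is retained, the deferred template-derived `file_pattern` default could reduce to a suffix lookup with the `'*'` fallback. `default_file_pattern` is a hypothetical helper, and its `source_path` argument stands in for metadata the experiment model does not store today.

```python
from pathlib import Path
from typing import Optional


def default_file_pattern(source_path: Optional[str]) -> str:
    """Glob for sibling data files, e.g. 'run_001.xye' -> '*.xye'."""
    # Hypothetical: assumes the template's source data-file path is retained.
    if not source_path:
        return '*'  # first-step fallback while the path is not stored
    suffix = Path(source_path).suffix
    return f'*{suffix}' if suffix else '*'
```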
`docs/dev/adrs/accepted/type-neutral-adp-parameters.md` (2 changes: 0 additions & 2 deletions)
@@ -83,5 +83,3 @@ from `beta`. Implications specific to β:
(`U_ij = β_ij/(2π²·a*_i·a*_j)`) during structure-CIF generation, then
β is restored. The round-trip is mathematically exact, so the net
behaviour is β-in/β-out.
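A numeric check of the β-to-U relation quoted above, using plain nested lists for the 3×3 tensors and the reciprocal-cell lengths `(a*, b*, c*)` as inputs. The helper names are hypothetical and this is not the library's conversion code; it only illustrates that the forward and inverse maps compose to the identity up to floating-point rounding.

```python
import math


def beta_to_u(beta: list[list[float]],
              recip: tuple[float, float, float]) -> list[list[float]]:
    """U_ij = beta_ij / (2 * pi**2 * a*_i * a*_j)."""
    return [[beta[i][j] / (2 * math.pi ** 2 * recip[i] * recip[j])
             for j in range(3)] for i in range(3)]


def u_to_beta(u: list[list[float]],
              recip: tuple[float, float, float]) -> list[list[float]]:
    """Inverse map: beta_ij = 2 * pi**2 * a*_i * a*_j * U_ij."""
    return [[2 * math.pi ** 2 * recip[i] * recip[j] * u[i][j]
             for j in range(3)] for i in range(3)]
```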

Plan: [`adp-beta-tensor.md`](../../plans/adp-beta-tensor.md).
`docs/dev/adrs/index.md` (3 changes: 2 additions & 1 deletion)
@@ -20,7 +20,7 @@ folders.
| Analysis and fitting | Accepted | Analysis CIF Fit State | Defines the persisted fit-state projection in `analysis/analysis.cif` and `analysis/mcmc.h5`. | [`analysis-cif-fit-state.md`](accepted/analysis-cif-fit-state.md) |
| Analysis and fitting | Accepted | Parameter Correlation Persistence | Persists deterministic and posterior correlation summaries in `_fit_parameter_correlation` | [`parameter-correlation-persistence.md`](accepted/parameter-correlation-persistence.md) |
| Analysis and fitting | Suggestion | Fit Output Files and Data Exports | Narrows remaining archive/export questions after adopting `results.csv` and `mcmc.h5`. | [`fit-output-files-and-data-exports.md`](suggestions/fit-output-files-and-data-exports.md) |
| Analysis and fitting | Suggestion | Dataset-Driven Fit Mode Availability | Offers fit modes by per-mode preconditions, restricts `single` to one dataset (closing issue 85), and keeps `sequential` as the folder sweep. | [`dataset-driven-fit-modes.md`](suggestions/dataset-driven-fit-modes.md) |
| Analysis and fitting | Accepted | Dataset-Driven Fit Mode Availability | Offers fit modes by per-mode preconditions, restricts `single` to one dataset (closing issue 85), and keeps `sequential` as the folder sweep. | [`dataset-driven-fit-modes.md`](accepted/dataset-driven-fit-modes.md) |
| Analysis and fitting | Accepted | Minimizer Category Consolidation | Collapses the seven Bayesian categories into one owner-level switchable `minimizer` category with HDF5 sidecar. | [`minimizer-category-consolidation.md`](accepted/minimizer-category-consolidation.md) |
| Analysis and fitting | Accepted | Minimizer Input/Output Split | Keeps `analysis.minimizer` input-only and moves scalar fit outputs to paired `analysis.fit_result` classes. | [`minimizer-input-output-split.md`](accepted/minimizer-input-output-split.md) |
| Analysis and fitting | Superseded | Parameter-Level Posterior Projection | Superseded by minimizer-category consolidation; kept as historical context for `parameter.posterior`. | [`parameter-posterior-summary.md`](suggestions/parameter-posterior-summary.md) |
@@ -55,6 +55,7 @@ folders.
| Quality | Accepted | Lint Rule Scope and Test-File Exceptions | Records the standing tests/\*\* PLR/N812 ignores and CIF-aligned `id`/`type` builtin exception from the lint audit. | [`lint-rule-exceptions.md`](accepted/lint-rule-exceptions.md) |
| Quality | Accepted | Test Strategy | Defines layered unit, functional, integration, script, and notebook testing. | [`test-strategy.md`](accepted/test-strategy.md) |
| Quality | Accepted | Test Suite and Validation Strategy | Strict test layers, cost tiers, coverage/codecov policy, cross-engine verification docs, and a nightly validation harness. | [`test-suite-and-validation.md`](accepted/test-suite-and-validation.md) |
| Quality | Suggestion | Notebook-Owned Verification Regression Gating | Replaces the external `ci_skip.txt` list with a single in-notebook `regression=False` flag (plus a cell tag for pre-flag crashes). | [`verification-regression-flag.md`](suggestions/verification-regression-flag.md) |
| Structure model | Accepted | Type-Neutral ADP Parameters | Keeps ADP parameter object identities stable across B/U and iso/ani switches. | [`type-neutral-adp-parameters.md`](accepted/type-neutral-adp-parameters.md) |
| Structure model | Accepted | Automatic Wyckoff Position Detection | Detects Wyckoff letter, multiplicity, and site symmetry from space group and coordinates; calculators consume them. | [`wyckoff-letter-detection.md`](accepted/wyckoff-letter-detection.md) |
| Structure model | Accepted | Complete Space-Group Reference Database | One-time build of a complete space_groups.json.gz (all 230 groups) from cctbx, verified against multiple sources. | [`space-group-database.md`](accepted/space-group-database.md) |
`docs/dev/adrs/suggestions/dataset-driven-fit-modes_reply-1.md` (80 changes: 0 additions & 80 deletions)

This file was deleted.
