Ship Material facet hierarchy tree to isamples.org (#281/#282) + PR4b + #285#289

Merged

rdhyee merged 6 commits into isamplesorg:main from rdhyee:promote/facet-tree-prod on Jun 17, 2026.

@rdhyee commented Jun 17, 2026

Ship the Material facet hierarchy to isamples.org (#281/#282)

Promotes the accumulated, reviewed explorer work to production. The headline change: the Material facet is now an expandable concept tree (on by default), so users can explore by broad category and drill down, as Eric asked in #281/#282.

This PR carries (all previously Codex-reviewed + merged to the fork, verified live):

Data

The two hierarchy parquet files are already published to R2 (data.isamples.org). They are additive and don't touch the existing flat experience; the flat path is fully retained and is the fallback if the tree data is ever unreachable.

Verified

  • Live on rdhyee.github.io/isamplesorg.github.io (the same R2 data that prod uses): an 18-node tree; selecting "Natural Solid Material" filters to 4,091,133 samples (its subtree).
  • 5 facet-tree specs + URL round-trip + flat-fallback + flag-off smoke green; quarto render clean.
  • Codex: facet-hierarchy data (2 rounds), tree UI polish (3 rounds), ship-flip — all LGTM.

Known caveats (accepted; fast-follow, in succession)

  • Material legend counts are static (global totals) while filtering — the map/table filtering is fully live + correct; live per-viewport tree counts are the next increment.
  • Material only — Sampled Feature / Specimen Type stay flat (same machinery; fast-follow).
  • A few [data]-tagged characterization/facet-viewport tests assume flat material and need updating for tree-default (they're workflow_dispatch, NOT on the deploy smoke gate).

🤖 Generated with Claude Code

rdhyee and others added 6 commits June 17, 2026 16:27
…Camera() (#14)

Extract the shared settled-camera reconciliation both globe listeners run
once the camera comes to rest — the cluster "Samples in View" stat refresh
+ the URL-hash write (writeGlobeHash) — into reconcileSettledCamera(v), and
call it from BOTH camera.changed and moveEnd (isamplesorg#208 smell 1b).

This gives moveEnd the same cluster-stat refresh camera.changed already did,
closing the sub-10%-pan gap: a small cluster-mode drag fired moveEnd (which
updated the URL via isamplesorg#204) but NOT camera.changed (debounced away by
percentageChanged=0.1), leaving the "Samples in View" count stale.

Scope is deliberately minimal (Codex Q3 / REFACTOR_PR4_PLAN.md §3): NOT the
full handler merge. Mode-transition + resolution-reload stays camera.changed-
only; facet/heatmap/point-exit stays moveEnd-only; isamplesorg#262 stays a separate
tracked sibling. reconcileSettledCamera is a local fn (closes over
getMode/currentRes/countInViewport), not top-level like writeGlobeHash.

Behavior:
- camera.changed: behavior-neutral (same order — cluster-stat then hash).
- moveEnd: adds the cluster-stat refresh (point mode skips it via the
  getMode()==='cluster' guard, so point mode is unchanged). The stat read is
  synchronous, guarded by _clusterData, and writes no mode/selection/URL/
  facet/heatmap state.
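The listener split above can be modelled in a few lines. This is a minimal Python sketch, not the explorer's OJS code. `GlobeState`, `on_camera_changed`, `on_move_end`, the `(south, west, north, east)` view tuple and the `#v=` hash format are hypothetical stand-ins; only `reconcile_settled_camera` mirrors a name from the commit (`reconcileSettledCamera`). The sketch shows why a sub-10% pan used to leave the count stale: the debounced camera.changed never fires, so only moveEnd can run the shared reconciliation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

View = Tuple[float, float, float, float]  # hypothetical (south, west, north, east)


@dataclass
class GlobeState:
    mode: str = "cluster"                                      # "cluster" or "point"
    cluster_data: Optional[List[Tuple[float, float]]] = None  # stands in for _clusterData
    samples_in_view: Optional[int] = None                      # the "Samples in View" stat
    url_hash: str = ""


def count_in_viewport(points: List[Tuple[float, float]], view: View) -> int:
    south, west, north, east = view
    return sum(1 for lat, lon in points if south <= lat <= north and west <= lon <= east)


def write_globe_hash(view: View) -> str:
    # Hypothetical hash format; the real writeGlobeHash encodes more state.
    return "#v=" + ",".join(f"{x:g}" for x in view)


def reconcile_settled_camera(state: GlobeState, view: View) -> None:
    # 1. Cluster-stat refresh: point mode, or missing cluster data, skips it.
    if state.mode == "cluster" and state.cluster_data is not None:
        state.samples_in_view = count_in_viewport(state.cluster_data, view)
    # 2. URL-hash write, in the same order camera.changed already used.
    state.url_hash = write_globe_hash(view)


def on_camera_changed(state: GlobeState, view: View, fraction_changed: float) -> None:
    if fraction_changed < 0.1:  # percentageChanged=0.1 debounces small pans away
        return
    reconcile_settled_camera(state, view)
    # Mode transitions / resolution reloads would remain here only.


def on_move_end(state: GlobeState, view: View) -> None:
    reconcile_settled_camera(state, view)
    # Facet / heatmap / point-exit work would remain here only.
```

A 5% pan in cluster mode returns early from `on_camera_changed`, but `on_move_end` still refreshes `samples_in_view` and the hash. In point mode, only the hash changes.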

Verification: smoke 4 + characterization 7 + url-roundtrip 5 all green
(behavior-neutral URL contract from both handlers preserved); render clean;
Codex review of the diff found no blocking issues. A dedicated headless
regression for the moveEnd cluster-stat refresh proved unreliable (OJS cell
re-evaluation yields multiple viewer instances in the harness; the one
reachable at interaction time often has no camera listeners) — documented
inline in url-roundtrip.spec.js rather than shipped flaky.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty with row-click) (#15)

A `#pid=` deep-link (cold boot OR back/forward hashchange) populated only the
sidebar card (updateSampleCard → #clusterSection) but never opened the floating
in-map detail card (#inMapCard) a table row-click shows. Same pid, two paths,
two UI states (isamplesorg#239-family divergence).

Fix: shared openInMapCardForSample(meta, isStale) helper in the zoomWatcher
cell, mirroring activateRow's tail — showInMapCard at canvas centre + the
identical rich wide-table detail query (material/specimen/thumbnail) →
updateSampleDetail + populateInMapCardDetail, with isStale guards. Wired into
both the boot pid path and the hashchange pid path. The boot path's old
description-only sidebar query is superseded by the helper's richer query.
activateRow is in a different OJS cell and is intentionally left untouched.

No camera flyTo here (deliberate, documented): both callers already frame the
camera to the URL coords (boot setView / hashchange flyTo), and a #pid= link
settles on the sample view (activateRow's pushState is replaced by the
post-flight moveEnd replaceState). Card anchors at canvas centre — the
isamplesorg#226-correct anchor that dodges the lazy-load race.

Codex review (2 rounds) → no blocking findings. Addressed:
- boot path re-checks isStale() after the helper (don't continue stale boot
  hydration into mode/heatmap);
- hideInMapCard() on the hashchange h3 / no-selection / pid-not-found branches
  so navigating away from a pid doesn't strand the floating card.
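The isStale guard the helper relies on can be sketched as follows. This is a hypothetical Python/asyncio model, not the zoomWatcher cell. `CardController`, `begin_navigation`, `fetch_detail` and the card dict are illustrative names, and the real helper also positions the card and calls updateSampleDetail/populateInMapCardDetail. Each navigation captures a generation number. The helper checks staleness before painting and again after every await, so a slow query for an earlier pid cannot repaint the card after the user has moved on.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class CardController:
    def __init__(self, fetch_detail: Callable[[str], Awaitable[str]]):
        self._generation = 0
        self._fetch_detail = fetch_detail
        self.card: Optional[dict] = None  # what the floating in-map card shows

    def begin_navigation(self) -> Callable[[], bool]:
        """Start a navigation (boot or hashchange) and return its is_stale check."""
        self._generation += 1
        mine = self._generation
        return lambda: mine != self._generation

    async def open_in_map_card_for_sample(
        self, meta: dict, is_stale: Callable[[], bool]
    ) -> bool:
        if is_stale():                                     # superseded before we started
            return False
        self.card = {"pid": meta["pid"], "detail": None}  # show at canvas centre now
        detail = await self._fetch_detail(meta["pid"])    # the rich detail query
        if is_stale():                                     # a newer navigation won
            return False
        self.card["detail"] = detail
        return True

    def hide_in_map_card(self) -> None:
        # Called on the non-pid branches so a stale card is not stranded.
        self.card = None
```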

Tests: new characterization (d3) asserts a pid deep-link opens #inMapCard with
the exact known material AND that a hashchange to a bare view re-hides it.
Full smoke(4)+url-roundtrip(5)+characterization(8) = 17 green; render clean.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e (tree + membership + counts) (#16)

* plan: facet hierarchy (isamplesorg#281/isamplesorg#282/isamplesorg#276) — design + proven PoC

Implementation plan for the tree facet display, grounded in the actual 202608
data + pipeline (not the "ancestry is free" folklore) and Codex-reviewed.

Key grounding findings:
- The facet UI is entirely flat; sample_facets_v3 stores a single "first
  non-root" URI per dim; the wide arrays are a SET of asserted concepts (NOT a
  clean ancestry path) — full ancestry must be COMPUTED from SKOS broader.
- The canonical tree is derivable from the SKOS TTLs build_vocab_labels.py
  already fetches (but drops broader from its output). Trees are small/shallow
  (≤21 core concepts, depth 3).
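The wide arrays hold a set of asserted concepts rather than a path. A sample's full membership is therefore the union of each asserted concept's ancestor closure over SKOS broader edges. Here is a minimal Python sketch that uses a made-up vocabulary whose short names stand in for the real vocabulary URIs:

```python
from typing import Dict, Iterable, List, Set

# Toy child -> parents map (hypothetical short names, not real iSamples URIs).
BROADER: Dict[str, List[str]] = {
    "earthmaterial": ["material"],
    "rock": ["earthmaterial"],
    "sediment": ["earthmaterial"],
    "organicmaterial": ["material"],
}


def ancestor_closure(concept: str, broader: Dict[str, List[str]]) -> Set[str]:
    """The concept itself plus everything reachable via broader edges."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:  # also keeps the walk finite on accidental cycles
            continue
        seen.add(current)
        stack.extend(broader.get(current, []))
    return seen


def expand_sample(asserted: Iterable[str], broader: Dict[str, List[str]]) -> Set[str]:
    """Union of the closures of every asserted concept on one sample."""
    members: Set[str] = set()
    for concept in asserted:
        members |= ancestor_closure(concept, broader)
    return members
```

A sample asserting `{"rock", "organicmaterial"}` becomes a member of `rock`, `earthmaterial`, `organicmaterial` and `material`. That spans two branches at once, which is why the asserted array cannot be read as a single ancestry path.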

scripts/poc_facet_hierarchy.py proves Half (a) on live 202608 data (material):
membership 15.08M rows over 5.83M located samples; parent>=child PASS;
root==located-with-material PASS; non-additive confirmed.

Codex corrections folded in: distinct-pid-UNION counting (not additive),
located-universe (samp_geo) membership, data-form URI normalization (TTLs are
un-versioned, data is /1.0/), DAG/multi-parent handling, extract a selected-facet
state model + sql-builders.js helpers, closure-table option, material-first.

Two halves: (a) data/pipeline is independent of the isamplesorg#249 refactor and can start
now; (b) tree UI rides on the PR4a/PR4b/isamplesorg#285 merges. No production code changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* isamplesorg#281/isamplesorg#282 Half(a): facet hierarchy pipeline (tree + membership + counts)

Backend half of the facet-hierarchy feature — no explorer.qmd/UI changes (Half b
rides on the isamplesorg#249 refactor). Derives the SKOS concept tree, per-sample membership
over the ancestry, and hierarchical counts, all validated by algebra.

build_vocab_labels.py
  - Emit `broader` (canonical primary parent) + `broader_count` per concept, in
    BOTH vocab-form and data-form (/1.0/) rows (reusing _data_form_uris) so
    uri↔broader join within each uri_form. Surface multi-parent (DAG) count as a
    lossy-projection note.
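The sketch below projects a multi-parent (DAG) vocabulary onto one canonical parent per concept. It keeps the parent count so the loss stays visible. The tie-break (lexicographically smallest parent) is an assumption for illustration, because the commit does not say how build_vocab_labels.py picks the primary parent. The vocab-form/data-form row duplication is also omitted, and the concept `mixture` used in the check is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def project_broader(
    edges: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Tuple[str, int]], int]:
    """edges are (concept, broader) pairs read from SKOS.

    Returns {concept: (primary_broader, broader_count)} and the number of
    concepts whose extra parents the projection dropped."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for concept, broader in edges:
        parents[concept].add(broader)
    # Assumed tie-break: keep the lexicographically smallest parent URI.
    projected = {c: (min(ps), len(ps)) for c, ps in parents.items()}
    multi_parent = sum(1 for _, count in projected.values() if count > 1)
    return projected, multi_parent
```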

build_frontend_derived.py
  - concept_tree (uri, parent_uri, depth) + concept_closure (recursive) from
    vocab_labels' data-form broader edges.
  - node_dim: assign each concept to the dim whose canonical root it reaches —
    drop ONLY the explicit per-dim root (not every parentless concept), and keep
    exactly one root per dim.
  - sample_facet_membership(pid, facet_type, concept_uri, depth): located universe
    (samp_geo), full wide arrays expanded to ancestors, restricted to each dim's
    tree. Concepts with no path to their dim root are EXCLUDED + reported (flat
    facet_summaries still counts them).
  - facet_tree_summaries(facet_type, concept_uri, parent_uri, depth, count):
    COUNT(DISTINCT pid) per node — distinct-pid UNION, NOT additive.
  - --vocab-labels arg; fail-loud when a hierarchy artifact is requested without
    it; deterministic ORDER BY tie-breakers.
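The closure, membership and distinct-count steps above can be illustrated end to end in SQL. This sketch has several simplifications:

- It uses Python's built-in sqlite3 rather than the real pipeline.
- It collapses everything to a single dim and omits the depth column.
- It assumes every input sample is already in the located universe.
- It uses toy concept names.

The table names echo the commit, but the column layout is simplified. The point to notice is the final GROUP BY. A sample asserting both `rock` and `sediment` is counted once under `earthmaterial`, so a parent's count is a distinct-pid union and can be smaller than the sum of its children. Asserted concepts with no path to the root are excluded and reported.

```python
import sqlite3
from typing import Dict, Iterable, Optional, Tuple

CLOSURE_SQL = """
WITH RECURSIVE c(descendant, ancestor, distance) AS (
    SELECT uri, uri, 0 FROM concept_tree
    UNION ALL
    SELECT c.descendant, t.parent_uri, c.distance + 1
    FROM c JOIN concept_tree t ON t.uri = c.ancestor
    WHERE t.parent_uri IS NOT NULL
)
SELECT descendant, ancestor, distance FROM c
"""


def build_tree_summaries(
    tree: Iterable[Tuple[str, Optional[str]]],  # (uri, parent_uri); the root has None
    asserted: Iterable[Tuple[str, str]],        # (pid, concept_uri) from the wide arrays
    root: str,
) -> Tuple[Dict[str, int], int]:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE concept_tree (uri TEXT PRIMARY KEY, parent_uri TEXT);
        CREATE TABLE asserted (pid TEXT, concept_uri TEXT);
        CREATE TABLE concept_closure (descendant TEXT, ancestor TEXT, distance INTEGER);
        CREATE TABLE sample_facet_membership (pid TEXT, concept_uri TEXT);
    """)
    con.executemany("INSERT INTO concept_tree VALUES (?, ?)", tree)
    con.executemany("INSERT INTO asserted VALUES (?, ?)", asserted)
    # concept_closure: every (descendant, ancestor) pair, self included at distance 0.
    con.executemany("INSERT INTO concept_closure VALUES (?, ?, ?)",
                    con.execute(CLOSURE_SQL).fetchall())
    # Membership: expand asserted concepts to all ancestors, keeping only
    # concepts that actually reach this dim's root.
    con.execute("""
        INSERT INTO sample_facet_membership
        SELECT DISTINCT a.pid, cc.ancestor
        FROM asserted a
        JOIN concept_closure cc ON cc.descendant = a.concept_uri
        WHERE a.concept_uri IN (SELECT descendant FROM concept_closure WHERE ancestor = ?)
    """, (root,))
    excluded = con.execute("""
        SELECT COUNT(*) FROM asserted
        WHERE concept_uri NOT IN (SELECT descendant FROM concept_closure WHERE ancestor = ?)
    """, (root,)).fetchone()[0]
    # facet_tree_summaries: distinct pids per node (a union, not a sum of children).
    counts = dict(con.execute("""
        SELECT t.uri, COUNT(DISTINCT m.pid)
        FROM concept_tree t
        LEFT JOIN sample_facet_membership m ON m.concept_uri = t.uri
        GROUP BY t.uri
    """).fetchall())
    con.close()
    return counts, excluded
```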

validate_frontend_derived.py
  - Tree gate: parent>=child, every parent resolves, one root per dim, all 3 dims
    present, cross-file algebra (material root == facets_v2 non-root material),
    membership grain unique, symmetric tree==GROUP BY(membership).
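The structural gates above are simple to state over in-memory rows. This is a hypothetical Python rendering, not validate_frontend_derived.py itself. Rows are `(dim, uri, parent_uri, count)` and `(pid, dim, uri)` tuples. The dim names `material`/`context`/`object_type` are assumed from the explorer's facet names, and the cross-file facets_v2 check is left out.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple

Summary = Tuple[str, str, Optional[str], int]  # (dim, uri, parent_uri, count)
Member = Tuple[str, str, str]                  # (pid, dim, uri)


def validate_tree(summaries: Sequence[Summary], membership: Sequence[Member],
                  dims: Iterable[str] = ("material", "context", "object_type")) -> List[str]:
    errors: List[str] = []
    nodes = {(d, u): (p, n) for d, u, p, n in summaries}

    # Every parent resolves, and parent >= child.
    for (d, u), (p, n) in nodes.items():
        if p is None:
            continue
        if (d, p) not in nodes:
            errors.append(f"{d}: parent {p} of {u} does not resolve")
        elif nodes[(d, p)][1] < n:
            errors.append(f"{d}: parent {p} has fewer samples than child {u}")

    # Exactly one root per dim, and every expected dim present.
    roots = Counter(d for d, _, p, _ in summaries if p is None)
    for d in dims:
        if roots[d] != 1:
            errors.append(f"{d}: expected exactly one root, found {roots[d]}")

    if len(set(membership)) != len(membership):
        errors.append("membership grain (pid, dim, uri) is not unique")

    # Symmetric check: tree counts == GROUP BY over membership, in both directions.
    grouped = Counter((d, u) for _, d, u in membership)
    for key in set(nodes) | set(grouped):
        expected = nodes[key][1] if key in nodes else None
        if grouped.get(key, 0) != expected:
            errors.append(f"{key}: tree count {expected} != membership count {grouped.get(key, 0)}")
    return errors
```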

Verified on live 202608: 209-node tree, 38.9M membership rows; material root
5,829,436 == facets_v2 non-root material; all validator checks PASS; 53 unit
tests green. PoC (scripts/poc_facet_hierarchy.py) + design (FACET_HIERARCHY_PLAN.md)
included. Codex-reviewed (2 rounds); HIGH root/orphan findings fixed.

Deferred (documented): per-dim DAG paths (we keep one canonical parent), a
SQL-literal helper for path interpolation, and materializing the wide-array
projection once (3× reread) — all follow-ups, not blockers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* isamplesorg#281/isamplesorg#282 Half(a): close Codex r2 residuals (closure cycle guard + validator note)

- concept_closure recursive CTE: cap distance < 64 so a future broader-cycle in
  the vocab can't recurse forever (today's projection is acyclic; this is a guard,
  not a behavior change — output byte-identical, live max depth is 3).
- document the material-root cross-file check's current-data invariant
  (excluded material = 0): if a future vintage adds a material concept absent from
  the SKOS tree, the check correctly fails — revisit then.
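To see why the `distance < 64` cap matters, the same recursive-CTE shape can be run on a deliberately cyclic broader graph. This sqlite3 sketch is illustrative only, with a toy two-concept cycle and simplified columns. Without the distance condition the recursion would not terminate. With it, each concept on the cycle yields one row per distance from 0 to the cap. On acyclic input the cap changes nothing, matching the "byte-identical" note above.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

CAPPED_CLOSURE_SQL = """
WITH RECURSIVE c(descendant, ancestor, distance) AS (
    SELECT uri, uri, 0 FROM concept_tree
    UNION ALL
    SELECT c.descendant, t.parent_uri, c.distance + 1
    FROM c JOIN concept_tree t ON t.uri = c.ancestor
    WHERE t.parent_uri IS NOT NULL
      AND c.distance < ?   -- cycle guard: stop expanding past the cap
)
SELECT descendant, ancestor, distance FROM c
ORDER BY descendant, distance, ancestor
"""


def closure(tree: Iterable[Tuple[str, Optional[str]]],
            max_distance: int = 64) -> List[Tuple[str, str, int]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_tree (uri TEXT PRIMARY KEY, parent_uri TEXT)")
    con.executemany("INSERT INTO concept_tree VALUES (?, ?)", tree)
    rows = con.execute(CAPPED_CLOSURE_SQL, (max_distance,)).fetchall()
    con.close()
    return rows
```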

Re-verified: build + validator ALL CHECKS PASS; 14 unit tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ree (preview flag) (#17)

* isamplesorg#281/isamplesorg#282 Half(b) increment 1: Material facet tree behind ?facets=tree preview

The user-facing tree, gated behind a preview flag (default OFF → the flat Material
list is byte-identical for everyone; ?facets=tree opts in, ?facets=flat forces off,
localStorage ISAMPLES_FACET_TREE=1 is sticky). Reversible = one switch.
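The flag resolution described above amounts to a small precedence rule. Here is a minimal Python sketch, with plain dicts standing in for the browser's URL query and localStorage. The function name and the exact precedence (URL over localStorage over the built-in default) are assumptions, not read from explorer.qmd. Passing `default=True` models the later ship-flip, where ?facets=flat and ISAMPLES_FACET_TREE=0 act as kill-switches.

```python
from typing import Mapping


def facet_tree_enabled(query: Mapping[str, str], storage: Mapping[str, str],
                       default: bool) -> bool:
    """Resolve the FACET_TREE flag (hypothetical helper)."""
    # 1. An explicit ?facets= value in the URL wins.
    facets = query.get("facets")
    if facets == "tree":
        return True
    if facets == "flat":
        return False
    # 2. Otherwise, a sticky localStorage setting if present.
    stored = storage.get("ISAMPLES_FACET_TREE")
    if stored == "1":
        return True
    if stored == "0":
        return False
    # 3. Otherwise, the built-in default (OFF in the preview, ON after the ship-flip).
    return default
```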

What works (verified headless on the 202608 data):
- Material renders as an expandable tree from facet_tree_summaries: non-selectable
  root group ("All materials"), first two levels unfolded (isamplesorg#281), deeper collapsed
  behind carets, alphabetical within level (isamplesorg#282).
- Material baseline counts come from facet_tree_summaries (not the flat summaries).
- Subtree FILTERING via membership: selecting a parent node filters the table/map to
  its whole subtree by filtering on the parent URI alone (membership encodes every
  ancestor — no client-side descendant expansion). Verified: selecting `earthmaterial`
  → table = 4,091,133 (exactly its subtree count).
- facetFilterSQL is the shared predicate (table + map + point-mode all route through
  it); material → membership subquery when the flag is on, AND-combined with the flat
  context/object_type subquery. context/object_type/source stay flat.
- Flag OFF path is unchanged: facetFilterSQL emits the identical single facets_v3
  subquery; describeCrossFilters reads material as before. Smoke gate green.
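The shared predicate can be sketched as a small SQL builder. This Python version is hypothetical: `facet_filter_sql`, `sql_literal`, `in_list` and the flat `sample_facets_v3` column names (`material`, `context`, `object_type`) are assumptions. The `pid`, `facet_type` and `concept_uri` columns of `sample_facet_membership` follow the schema given in the Half(a) commit. With the tree active, a parent URI is matched against membership, which already lists every ancestor, so no descendant expansion happens on the client. With the tree off, material goes back into the single flat subquery, which cannot match a parent node.

```python
from typing import Dict, List, Sequence

FLAT_DIMS = ("material", "context", "object_type")  # hypothetical facets_v3 columns


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def in_list(values: Sequence[str]) -> str:
    return "(" + ", ".join(sql_literal(v) for v in values) + ")"


def facet_filter_sql(selected: Dict[str, List[str]], tree_active: bool) -> str:
    """Build one predicate on pid, shared by table, map and point mode."""
    clauses: List[str] = []
    flat_dims = [d for d in FLAT_DIMS if not (tree_active and d == "material")]
    flat = [f"{d} IN {in_list(selected[d])}" for d in flat_dims if selected.get(d)]
    if flat:
        clauses.append("pid IN (SELECT pid FROM sample_facets_v3 WHERE "
                       + " AND ".join(flat) + ")")
    if tree_active and selected.get("material"):
        # Membership lists every ancestor, so a parent URI covers its subtree.
        clauses.append("pid IN (SELECT pid FROM sample_facet_membership "
                       "WHERE facet_type = 'material' AND concept_uri IN "
                       + in_list(selected["material"]) + ")")
    return " AND ".join(clauses) if clauses else "1 = 1"
```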

Scope / deferred to increment 2 (documented):
- Live viewport- & cross-filtered Material counts: in tree mode Material is excluded
  from the live count engine and shows STATIC tree baseline counts (facets_v3 can't
  answer parent-node counts). Table/map filtering is fully live + correct.
- Tri-state parent display + auto-check descendants; accessibility (role=tree/aria);
  context/object_type trees; latency probe + optional cube; R2 publish of the 3 files.

Tests: tests/playwright/facet-tree.spec.js (flag-off flat, flag-on tree + subtree
filter). Gated on FACET_TREE_LOCAL=1 + the docs/data mirror until the hierarchy files
are on R2 (skipped in CI so it stays green). Render clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* isamplesorg#281/isamplesorg#282 Half(b): compact the Material tree rows (RY feedback)

Override .filter-body label{display:block;padding:2px 0} for tree nodes so
caret+label+count sit on one tight line; smaller carets/indent. Flag-gated
(tree only); flat list spacing unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* isamplesorg#281/isamplesorg#282 Half(b) increment 2: Material tree selection polish (tri-state, inherit, OR)

Polish so the preview is feedback-ready before it goes to Eric (UI-only; no data/query/pipeline
changes; flag-gated; flat path unchanged).

- materialTreeActive() — single shared predicate (FACET_TREE && a tree actually
  rendered) used by materialSelection / facetFilterSQL / describeCrossFilters /
  syncMaterialTreeVisual, so a degraded flat fallback behaves FULLY flat (Codex r2/r3).
- materialSelection() — the MINIMAL selection (top-most checked nodes); a checked node
  under a checked ancestor is redundant. Used for filtering + URL so the membership
  filter on a parent covers its subtree with no client-side expansion.
- syncMaterialTreeVisual() — checking a parent inherits descendants (checked+disabled);
  unchecking reverts them; a node with checked descendants but unchecked itself shows
  the indeterminate "–". Multi-peer selection = OR/union (already native to the IN()
  membership filter).
- URL: writeQueryState serializes the minimal nodes; applyQueryToFacetFilters restores
  + re-syncs inherited/indeterminate state.
- Compact tree row spacing (RY feedback).
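The minimal-selection and tri-state rules above fit in two small functions. This is a hypothetical Python sketch over a child-to-parent map with toy concept names. `checked` holds only the nodes the user ticked, and the real materialSelection/syncMaterialTreeVisual operate on DOM checkboxes rather than dicts.

```python
from typing import Dict, Iterable, List, Optional, Set


def ancestors_of(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    # Assumes the parent map is an acyclic tree.
    chain: List[str] = []
    current = parent.get(node)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def material_selection(checked: Set[str], parent: Dict[str, Optional[str]]) -> List[str]:
    """Top-most checked nodes only: a node under a checked ancestor is redundant."""
    return sorted(n for n in checked
                  if not any(a in checked for a in ancestors_of(n, parent)))


def sync_tree_visual(nodes: Iterable[str], checked: Set[str],
                     parent: Dict[str, Optional[str]]) -> Dict[str, Dict[str, bool]]:
    state: Dict[str, Dict[str, bool]] = {}
    for n in nodes:
        inherited = any(a in checked for a in ancestors_of(n, parent))
        state[n] = {"checked": n in checked or inherited,
                    "disabled": inherited,  # inherited descendants are locked on
                    "indeterminate": False}
    # An unchecked node with a checked descendant shows the indeterminate state.
    for n in checked:
        for a in ancestors_of(n, parent):
            if not state[a]["checked"]:
                state[a]["indeterminate"] = True
    return state
```

Because the membership filter uses IN (...), the peers returned by `material_selection` combine as an OR/union with no extra work.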

Verified (202608, headless): parent→child {checked,disabled}, table=4,091,133 subtree;
two peers→parent indeterminate, table=333,253 (OR union); URL carries only the minimal
node and round-trips; ?facets=tree with the tree data 404'd → flat fallback still
filters. 5 facet-tree specs + flag-off smoke green; render clean. Codex: 3-round LGTM.

Deferred (next): live viewport/cross-filtered Material counts (static tree baseline
today); accessibility (role=tree/aria); context/object_type trees.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lt ON (?facets=flat kill-switch)

The hierarchy preview is good enough to ship (RY). Flip FACET_TREE default to true
so all users get the expandable Material tree; ?facets=flat (or localStorage
ISAMPLES_FACET_TREE=0) reverts for a user without a redeploy. context/object_type
stay flat. Verified: default→tree (18 nodes), ?facets=flat→flat, smoke green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ream fix)

fixture-tests (pipeline-tests.yml) has been red on upstream main since 2026-06-11:
conftest.py hard-imports playwright.sync_api, which the data-only pipeline CI job
doesn't install. The fork already fixed this (try/except → skip browser fixtures);
bring it upstream so the pipeline gate is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rdhyee merged commit 89c6982 into isamplesorg:main on Jun 17, 2026
3 checks passed