Skip to content

feat(centroids): add return_area + SpatialData persist to get_centroids#1150

Open
timtreis wants to merge 7 commits into
chore/python-3.12-3.14from
feature/get-centroids-area-persist
Open

feat(centroids): add return_area + SpatialData persist to get_centroids#1150
timtreis wants to merge 7 commits into
chore/python-3.12-3.14from
feature/get-centroids-area-persist

Conversation

@timtreis

@timtreis timtreis commented Jun 19, 2026

Copy link
Copy Markdown
Member

Motivation

get_centroids already computes per-label pixel counts (np.bincount) when finding label centroids, then discards them, we could store as area. This PR surfaces that area into an element's annotating table, squidpy-style (obsm["spatial"] / obs["area"]), so downstream tools (e.g. squidpy) and repeated renders can reuse them instead of recomputing.

stacked on #1151

timtreis and others added 2 commits June 21, 2026 14:34
Surface the per-instance area that the labels bincount already computes
(and geometric area for shapes), and add a SpatialData dispatch that
writes centroids/area into the element's annotating table, squidpy-style.

- `return_area` on all element overloads: labels return the pixel/voxel
  count (free, already computed); shapes return `pi*r**2` for circles and
  `geometry.area` for polygons; points raise. Carried as a feature column
  on the returned Points element.
- Harmonise the three element overload signatures (they previously
  disagreed on `return_background`).
- New `get_centroids(sdata, element_name, ..., persist_as="adata")`
  dispatch: resolves the element's annotating table and writes
  `obsm["spatial"]` (x, y[, z]) and `obs["area"]` at the element's rows,
  in place. Raises (pointing to `persist_as="Points"`) when no table
  annotates the element. No standalone AnnData is ever created.
- Element-level `persist_as="adata"` raises, directing to the SpatialData
  form (which can resolve the table).
- Fast path: the table-writing path applies the coordinate transform to
  the tiny per-instance centroid array in-memory and short-circuits when
  it is an identity, avoiding the dask `transform()` round-trip.
  `coordinate_system=None` returns intrinsic coordinates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis force-pushed the feature/get-centroids-area-persist branch from 13bde10 to a784941 Compare June 21, 2026 12:35
@timtreis timtreis changed the base branch from main to chore/python-3.12-3.14 June 21, 2026 12:35
@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.24590% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.29%. Comparing base (f84be26) to head (a784941).

Files with missing lines Patch % Lines
src/spatialdata/_core/centroids.py 85.24% 18 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                     @@
##           chore/python-3.12-3.14    #1150      +/-   ##
==========================================================
- Coverage                   92.43%   92.29%   -0.14%     
==========================================================
  Files                          51       51              
  Lines                        7814     7906      +92     
==========================================================
+ Hits                         7223     7297      +74     
- Misses                        591      609      +18     
Files with missing lines Coverage Δ
src/spatialdata/_core/centroids.py 89.15% <85.24%> (-10.85%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…turn type

- narrow transformation to BaseTransformation before to_affine_matrix
- assert coordinate_system is not None after _validate_persist_args in the
  element overloads (None is rejected there for persist_as != 'adata')
- cast element-dispatch get_centroids results to DaskDataFrame at the two
  callers (vectorize, datasets); only the SpatialData overload returns SpatialData
@timtreis timtreis marked this pull request as ready for review June 21, 2026 12:41
timtreis added 4 commits June 21, 2026 15:16
…trim docs

Correctness:
- _write_centroids_into_table: align instance ids via a single get_indexer;
  raise on a total miss (instance_key dtype != element index) instead of
  silently writing an all-NaN obsm['spatial']; partial misses (background,
  filtered instances) still NaN-fill as documented.
- refuse to overwrite obsm['spatial'] when an existing array has a different
  width (would wipe coordinates of other regions sharing the table).
- validate coordinate_system on the persist_as='adata' path too.

Cleanup (no behavior change):
- collapse the three near-identical element overloads into one stacked
  singledispatch registration delegating to _intrinsic_centroid_frame.
- trim verbose docstrings/comments.

Tests: instance_key dtype mismatch raises; dimension mismatch refuses to
overwrite; invalid-element message updated.
…ds method

- persist_as='adata' gains inplace: bool = True (mirrors sanitize_table):
  inplace=True mutates the resolved table and returns None; inplace=False
  copies only that AnnData, writes into the copy, returns it (sdata untouched).
  No whole-SpatialData copy.
- add SpatialData.get_centroids(element_name, ...) method delegating to the
  module function (mirrors sdata.aggregate), so both sd.get_centroids(sdata, ...)
  and sdata.get_centroids(...) work. Calls the named _get_centroids_sdata
  overload directly so the full signature type-checks.
- widen return type to DaskDataFrame | AnnData | None; update tests.
…edup scatter

Correctness hardening (review round 2):
- raise a clear error when the element index is non-unique (was a cryptic
  pandas InvalidIndexError from get_indexer on e.g. duplicate-index shapes).
- obsm width-mismatch guard is now 1-D-safe (compare full .shape instead of
  indexing .shape[1], which raised IndexError on a 1-D obsm['spatial']).
- reword the total-miss error (it's 'no shared instances', not only a dtype bug).

Cleanup:
- reuse the existing _affine_matrix_multiplication helper in
  _transform_centroid_coords (drops the inline matmul; gives the dead helper a caller).
- collapse the duplicated coords/area NaN-fill into a local _scatter().
- _points_from_centroids: df.assign instead of copy-then-mutate.
- name the squidpy storage keys (_SPATIAL_KEY/_AREA_KEY) instead of magic strings.
…test assertion

- _resolve_annotating_table returns annotators[0] directly (get_element_annotators
  already yields set[str]; the str() wrap was noise).
- extract _assert_obsm_matches_points test helper to remove two near-identical
  written-obsm vs element-Points comparison blocks.
@LucaMarconato LucaMarconato self-requested a review June 21, 2026 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant