Forward and reverse Enzyme tests and rules for linalg #449

Open
kshyatt wants to merge 18 commits into main from ksh/enz_linalg

kshyatt commented Jun 10, 2026

Member

Trying to make these a little more manageable and pick up the fwd rules where possible

kshyatt requested review from Jutho and lkdvos June 10, 2026 13:30

codecov Bot commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 53.97490% with 110 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ext/TensorKitEnzymeExt/linalg.jl | 64.17% | 48 Missing ⚠️ |
| ext/TensorKitEnzymeExt/utility.jl | 15.00% | 17 Missing ⚠️ |
| ext/TensorKitMooncakeExt/tensoroperations.jl | 0.00% | 11 Missing ⚠️ |
| ext/TensorKitEnzymeTestUtilsExt.jl | 74.35% | 10 Missing ⚠️ |
| src/auxiliary/ad.jl | 25.00% | 9 Missing ⚠️ |
| ext/TensorKitMooncakeExt/linalg.jl | 0.00% | 7 Missing ⚠️ |
| ext/TensorKitMooncakeExt/indexmanipulations.jl | 0.00% | 4 Missing ⚠️ |
| ext/TensorKitMooncakeExt/vectorinterface.jl | 0.00% | 4 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ext/TensorKitChainRulesCoreExt/tensoroperations.jl | 88.99% <100.00%> (+0.91%) ⬆️ |
| ext/TensorKitChainRulesCoreExt/utility.jl | 100.00% <ø> (+20.00%) ⬆️ |
| ext/TensorKitEnzymeExt/TensorKitEnzymeExt.jl | 100.00% <100.00%> (ø) |
| ext/TensorKitMooncakeExt/utility.jl | 0.00% <ø> (-71.43%) ⬇️ |
| src/TensorKit.jl | 13.79% <ø> (ø) |
| ext/TensorKitMooncakeExt/indexmanipulations.jl | 0.00% <0.00%> (-96.12%) ⬇️ |
| ext/TensorKitMooncakeExt/vectorinterface.jl | 0.00% <0.00%> (-100.00%) ⬇️ |
| ext/TensorKitMooncakeExt/linalg.jl | 0.00% <0.00%> (-99.11%) ⬇️ |
| src/auxiliary/ad.jl | 25.00% <25.00%> (ø) |
| ext/TensorKitEnzymeTestUtilsExt.jl | 74.35% <74.35%> (ø) |

... and 3 more

... and 8 files with indirect coverage changes

Comment thread ext/TensorKitEnzymeExt/linalg.jl
github-actions Bot commented Jun 11, 2026

Contributor

Your PR no longer requires formatting changes. Thank you for your contribution!

kshyatt marked this pull request as draft June 11, 2026 07:18
kshyatt marked this pull request as ready for review June 11, 2026 09:26

kshyatt commented Jun 12, 2026

Member Author

The test on 1.12 is passing locally for me! I assume it's getting OOMed or something...

kshyatt commented Jun 12, 2026

Member Author

OK, everything looks happy now except the GPU stuff, which is unrelated. Are we good to go?

Comment thread ext/TensorKitEnzymeExt/utility.jl Outdated
Comment thread ext/TensorKitEnzymeExt/utility.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
lkdvos commented Jun 16, 2026

Member

Do we think the test failure is a problem with how LRU interacts with Enzyme?

From the stacktrace, it looks like a key is not being found, even though the lookup sits inside an if-clause that explicitly checks for it: https://github.com/JuliaCollections/LRUCache.jl/blob/1dad9fef75fef51ea1b7e984e5850ad4e374a7e0/src/LRUCache.jl#L172-L175

The really confusing part to me is that it seems to originate from a forward call, which should just be a regular function call, so I'm not sure what is really going on there. I also don't think this can really be a race condition, since 1) I don't think we are multithreading, 2) LRU protects against this?

kshyatt commented Jun 16, 2026

Member Author

It also seems to only happen in the CompatCheck tests, not the main ones

kshyatt commented Jun 16, 2026

Member Author

Let me just see if bumping the LRUCache compat helps at all...

kshyatt commented Jun 16, 2026

Member Author

OK that makes CompatCheck pass and the min test fail. It seems the failures only happen on 1.10 regardless, but they are intermittent. Really annoying.

kshyatt commented Jun 16, 2026

Member Author

Also locally I can see it happen in reverse calls so I think it's not to do with fwd mode really

kshyatt commented Jun 16, 2026

Member Author

Removing the @cached in front of the definition of degeneracystructure does seem to fix this. Maybe there's a way to fill the cache before the Enzyme tests run? I'll take a look.

kshyatt commented Jun 16, 2026

Member Author

OK so after more digging, it looks like the problem here is 1.10 + Enzyme + the @cached macro. I'm ok with disabling this set of tests on 1.10 for now while I try to work with the Enzyme people to figure out what's going on. Does that sound reasonable?

Comment thread ext/TensorKitEnzymeExt/linalg.jl
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
ΔAr = NoRData()

- Δαr = isnothing(Ap) ? NoRData() : project_scalar(α, inner(Ap, ΔC))
+ Δαr = isnothing(Ap) ? NoRData() : TO.project_scalar(α, inner(Ap, ΔC))

Member

I lost track of where our new project_ methods live. Is TO a logical place for project_scalar? We probably also have it in VI? Or is TO just importing the VI version? Where is project_mul!, is that in TO? Or TK?

Member Author

Lukas suggested pointing to TO. We have duplicated versions of these in both Mooncake and Enzyme extensions over at VI 🫠

Member Author

project_mul lives here at TK

Member Author

@Jutho did you ever read those "Where's Waldo?" books? We need that for AD helpers

Member

I think I originally just wrote these as small helper functions, but then it turned out we kind of need them everywhere, and then they just kind of proliferated

Jutho Jun 16, 2026

Member

Can you read "Where's Waldo"? Maybe we can just create one super package VIMAKTOTK.jl, and this solves all of our problems. And then we can keep up with the QMC guys in terms of acronyms 😄 .

Member Author

All depends on your definition of "reading" 😉

Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
Comment thread ext/TensorKitEnzymeExt/linalg.jl Outdated
kshyatt commented Jun 17, 2026

Member Author

Some more strangeness: the missing key error only ever occurs with Vtr, I cannot trigger it with VRepU₁ or any of the other AD spaces. This makes me wonder if the @cached stuff is kind of a red herring -- I'll dig into this more today.

3 participants