
Add decommission command for repos transferred out of the org #58

Open
dev-milos wants to merge 6 commits into main from decommission-transferred-out-repos


@dev-milos

Collaborator

What

Adds a decommission subcommand to the importer CLI that produces the exact terraform state rm commands needed to stop managing a repository without destroying it, plus a runbook (docs/decommissioning.md).

Refs #15.

Why

When a repository is transferred out of the org but its repos/<repo>.yaml stays in GCSS, Terraform still tracks it in state. On the next run the provider can no longer read the repo (it belongs to another org), so plan/apply fails — and because the workspace state is shared, one orphaned repo blocks PRs for every repository.

Deleting the YAML on its own does not fix this: the repo is then in state but not in config, so Terraform plans to destroy it — either erroring against a repo the org no longer owns, or acting on a repo that now belongs to someone else. The correct action is to remove the resources from state (forget) rather than destroy them.

What it does

For each --repo, the command reads the YAML and derives every state address the repo owns:

  • module.repository["<repo>"] — the repo and everything nested in it
  • github_repository_ruleset.ruleset["<sha256("<repo>/<name>")>"] — one per ruleset
  • github_repository_custom_property.custom_property["<repo>/<name>"] — one per custom property

The ruleset key is a hash, so grepping terraform state list for the repo name cannot find it; deriving the key from the YAML is the point. The command then emits a ready-to-run terraform state rm script (optionally --output <file> and --delete-yaml).
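A minimal Go sketch of the derivation described above. It assumes a hypothetical repoConfig type holding only the ruleset and custom-property names read from the YAML; the importer's real types and function names are not shown in this PR. The ruleset key is the lowercase hex SHA-256 of "<repo>/<name>", which is what Terraform's sha256() returns.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// repoConfig is a hypothetical, trimmed-down view of repos/<repo>.yaml that
// keeps only the names needed to derive state addresses.
type repoConfig struct {
	Rulesets         []string // ruleset names
	CustomProperties []string // custom property names
}

// rulesetKey mirrors Terraform's sha256("<repo>/<name>"): the SHA-256 digest
// of the string, hex-encoded in lowercase.
func rulesetKey(repo, name string) string {
	sum := sha256.Sum256([]byte(repo + "/" + name))
	return hex.EncodeToString(sum[:])
}

// stateAddresses returns every state address the repo owns. The order used
// here (module, rulesets, custom properties) is an assumption of this sketch.
func stateAddresses(repo string, cfg repoConfig) []string {
	addrs := []string{fmt.Sprintf(`module.repository["%s"]`, repo)}
	for _, name := range cfg.Rulesets {
		addrs = append(addrs, fmt.Sprintf(`github_repository_ruleset.ruleset["%s"]`, rulesetKey(repo, name)))
	}
	for _, name := range cfg.CustomProperties {
		addrs = append(addrs, fmt.Sprintf(`github_repository_custom_property.custom_property["%s/%s"]`, repo, name))
	}
	return addrs
}
```

For example, stateAddresses("demo", repoConfig{Rulesets: []string{"main"}}) yields module.repository["demo"] followed by one ruleset address whose 64-character key contains no trace of "demo", which is why grepping state for the repo name misses it.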

Because the commands mutate state only and never Terraform configuration, they cannot break plans for other repositories.

Why state-only instead of a removed block

A removed { … destroy = false } block is the declarative equivalent, but it must live in the Terraform root (sourced from this repo) while the YAML lives in the config repo. A removed block for an address still declared in config (i.e. while the YAML exists) is a hard error for every plan, and there is no atomic way to land the two changes across two repos. A state-only operation sidesteps that.

Testing

  • Unit tests cover address derivation (including that the ruleset key matches Terraform's sha256() hex output), ordering, and YAML resolution.
  • Verified the mechanism end-to-end against a live org: applied a for_each module instance + repo, ran terraform state rm 'module.repository["<name>"]', and confirmed the resource left state while the GitHub repo still existed (no destroy).

Follow-up (separate, needs prod state access)

The original incident was cleaned up manually with terraform state list | grep "Pulp-manager", which would not have matched hash-keyed rulesets. Worth checking the live state for an orphaned github_repository_ruleset still pointing at Pulp-manager.

When a repository leaves GCSS management (e.g. transferred to another org),
its YAML must be removed without Terraform destroying the now-foreign repo.
Deleting the YAML alone makes Terraform plan a destroy, which fails against
a repo the org no longer owns and blocks the shared workspace's plans for
every repository.

Add a `decommission` subcommand that derives every Terraform state address a
repository owns from its YAML — the module instance, rulesets (keyed by
sha256("<repo>/<name>"), which a `terraform state list | grep <repo>` cannot
find) and custom properties — and emits the `terraform state rm` commands to
forget them without destroying the underlying GitHub objects. State-only
operations do not touch Terraform configuration, so they never break plans
for other repositories. Includes unit tests and a runbook
(docs/decommissioning.md).

Refs #15
@dev-milos dev-milos marked this pull request as draft June 29, 2026 10:49

Remove --output (redundant with shell redirection) and --delete-yaml (the
only file-mutating behavior). The command now only prints terraform state rm
commands and makes no changes itself.
- Generated script now uses an idempotent rm_state helper (skips addresses
  already absent from state), so it is safe to re-run or resume after a
  partial run; set -euo pipefail still aborts on genuine errors.
- Runbook adds an explicit workspace-lock step covering the inconsistent
  window between state rm and YAML deletion, where a triggered run would
  otherwise plan to CREATE (and could recreate an empty repo in the org).
- Clarify that decommissioning is for repos that have left the org
  (transferred out / deleted), not for archiving a repo you still own.

The generated script's rm_state helper used `grep -qxF` (whole-line match)
to test whether an address is in state. `terraform state list` never prints
a bare `module.repository["<repo>"]` line — only the resource instances
nested inside it — so the whole-line match never found the module address,
the guard reported it "already absent", and `terraform state rm` was never
run on it. The script silently no-opped on the repository and all its nested
resources (the exact thing #15 is about).

Switch the guard to `grep -qF -- "$1"` (fixed-string substring match), which
matches both the nested module lines and the verbatim top-level
ruleset/custom-property lines. No false positives: every emitted address
ends in `"]`, so `module.repository["foo"]` cannot match
`module.repository["foobar"]...`.

Factor the match semantics into stateAddressPresent in Go and add a
regression test exercising it (module-via-nested-lines, verbatim top-level,
genuinely absent, and the foo/foobar false-positive guard), since the
existing tests only covered address derivation.
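A minimal Go sketch of the match semantics described above, assuming stateAddressPresent takes the raw terraform state list output and an address (the real signature is not shown in this PR). Like grep -qF, it is a fixed-string substring test on each line, so a module address is found through the resource lines nested under it.

```go
package main

import "strings"

// stateAddressPresent reports whether addr occurs as a fixed substring of
// any line of stateList (the output of "terraform state list"). This finds
// module.repository["demo"] through lines such as
// module.repository["demo"].<nested resource>, and top-level addresses verbatim.
func stateAddressPresent(stateList, addr string) bool {
	for _, line := range strings.Split(stateList, "\n") {
		if strings.Contains(line, addr) {
			return true
		}
	}
	return false
}
```

With a state list containing module.repository["foobar"].github_repository.repo (the nested resource name is hypothetical), checking for module.repository["foo"] returns false, because the `"]` that ends every address never follows "foo" in that line.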

cobra's cmd.Print/Printf/Println default to OutOrStderr(), so the generated
script went to stderr: `decommission ... > script.sh` produced an empty file,
and piping the output into a shell executed nothing, which made the script
unusable for its primary purpose. Write it to cmd.OutOrStdout() explicitly,
and add a test asserting that the script lands on stdout (and nothing on
stderr).
@dev-milos dev-milos marked this pull request as ready for review June 29, 2026 13:31
@dev-milos dev-milos requested a review from pavlovic-ivan June 29, 2026 13:32