Add decommission command for repos transferred out of the org #58
Open
dev-milos wants to merge 6 commits into
When a repository leaves GCSS management (e.g. transferred to another org),
its YAML must be removed without Terraform destroying the now-foreign repo.
Deleting the YAML alone makes Terraform plan a destroy, which fails against
a repo the org no longer owns and blocks the shared workspace's plans for
every repository.
Add a `decommission` subcommand that derives every Terraform state address a
repository owns from its YAML — the module instance, rulesets (keyed by
sha256("<repo>/<name>"), which a `terraform state list | grep <repo>` cannot
find) and custom properties — and emits the `terraform state rm` commands to
forget them without destroying the underlying GitHub objects. State-only
operations do not touch Terraform configuration, so they never break plans
for other repositories. Includes unit tests and a runbook
(docs/decommissioning.md).
Refs #15
Remove --output (redundant with shell redirection) and --delete-yaml (the only file-mutating behavior). The command now only prints terraform state rm commands and makes no changes itself.
- Generated script now uses an idempotent `rm_state` helper (skips addresses already absent from state), so it is safe to re-run or resume after a partial run; `set -euo pipefail` still aborts on genuine errors.
- Runbook adds an explicit workspace-lock step covering the inconsistent window between `state rm` and YAML deletion, where a triggered run would otherwise plan to CREATE (and could recreate an empty repo in the org).
- Clarify that decommissioning is for repos that have left the org (transferred out / deleted), not for archiving a repo you still own.
The generated script's `rm_state` helper used `grep -qxF` (whole-line match) to test whether an address is in state. `terraform state list` never prints a bare `module.repository["<repo>"]` line (only the resource instances nested inside it), so the whole-line match never found the module address, the guard reported it "already absent", and `terraform state rm` was never run on it. The script silently no-opped on the repository and all its nested resources (the exact thing #15 is about).
Switch the guard to `grep -qF -- "$1"` (fixed-string substring match), which matches both the nested module lines and the verbatim top-level ruleset/custom-property lines. No false positives: every emitted address ends in `"]`, so `module.repository["foo"]` cannot match `module.repository["foobar"]...`.
Factor the match semantics into `stateAddressPresent` in Go and add a regression test exercising it (module-via-nested-lines, verbatim top-level, genuinely absent, and the foo/foobar false-positive guard), since the existing tests only covered address derivation.
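For reviewers who want the match semantics in one place, here is a minimal sketch of what a `stateAddressPresent` helper could look like. The function name comes from the commit message; the signature, the line-by-line loop, and the sample state lines (including the nested resource name `github_repository.this`) are assumptions made for illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// stateAddressPresent reports whether addr occurs in any line of the
// `terraform state list` output, using the same fixed-string substring
// semantics as `grep -qF -- "$addr"`.
// NOTE: the signature is an assumption; only the name appears in the PR.
func stateAddressPresent(stateList, addr string) bool {
	for _, line := range strings.Split(stateList, "\n") {
		if strings.Contains(line, addr) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical `terraform state list` output; the nested resource
	// name and the ruleset key are illustrative only.
	state := strings.Join([]string{
		`module.repository["foobar"].github_repository.this`,
		`github_repository_ruleset.ruleset["abc123"]`,
	}, "\n")

	// true: the module address is matched via a nested line
	fmt.Println(stateAddressPresent(state, `module.repository["foobar"]`))
	// true: top-level addresses appear verbatim
	fmt.Println(stateAddressPresent(state, `github_repository_ruleset.ruleset["abc123"]`))
	// false: the trailing `"]` keeps "foo" from matching "foobar"
	fmt.Println(stateAddressPresent(state, `module.repository["foo"]`))
	// false: genuinely absent
	fmt.Println(stateAddressPresent(state, `module.repository["gone"]`))
}
```

The four calls in `main` line up with the four regression cases named above.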
cobra's cmd.Print/Printf/Println default to OutOrStderr(), so the generated script went to stderr. `decommission ... > script.sh` produced an empty file and piping to a shell got nothing — the script was unusable for its primary purpose. Write it to cmd.OutOrStdout() explicitly, and add a test asserting the script lands on stdout (and nothing on stderr).
What
Adds a `decommission` subcommand to the importer CLI that produces the exact `terraform state rm` commands needed to stop managing a repository without destroying it, plus a runbook (docs/decommissioning.md).
Refs #15.
Why
When a repository is transferred out of the org but its `repos/<repo>.yaml` stays in GCSS, Terraform still tracks it in state. On the next run the provider can no longer read the repo (it belongs to another org), so plan/apply fails. Because the workspace state is shared, one orphaned repo blocks PRs for every repository.
Deleting the YAML on its own does not fix this: the repo is then in state but not in config, so Terraform plans to destroy it, either erroring against a repo the org no longer owns or acting on a repo that now belongs to someone else. The correct action is to remove the resources from state (forget) rather than destroy them.
What it does
For each `--repo`, the command reads the YAML and derives every state address the repo owns:
- `module.repository["<repo>"]`: the repo and everything nested in it
- `github_repository_ruleset.ruleset["<sha256("<repo>/<name>")>"]`: one per ruleset
- `github_repository_custom_property.custom_property["<repo>/<name>"]`: one per custom property
The ruleset key is a hash, so it can't be found by grepping the repo name in `terraform state list`; deriving it from the YAML is the point. It then emits a ready-to-run `terraform state rm` script (optionally `--output <file>` and `--delete-yaml`).
Because the commands mutate state only and never Terraform configuration, they cannot break plans for other repositories.
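The derivation above can be sketched roughly as follows. `RepoConfig`, `rulesetKey`, and `stateAddresses` are hypothetical names, the struct is a trimmed-down stand-in for whatever the YAML actually decodes into, and sorting by name is an assumption about the ordering. The hex encoding matches what Terraform's `sha256()` function returns.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// RepoConfig is a hypothetical, minimal view of repos/<repo>.yaml:
// only the fields that contribute state addresses.
type RepoConfig struct {
	Name             string
	Rulesets         []string // ruleset names
	CustomProperties []string // custom property names
}

// rulesetKey computes sha256("<repo>/<name>") as lowercase hex.
func rulesetKey(repo, name string) string {
	sum := sha256.Sum256([]byte(repo + "/" + name))
	return hex.EncodeToString(sum[:])
}

// sortedCopy returns a sorted copy so the caller's slice is not mutated.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

// stateAddresses lists the module instance first, then one address per
// ruleset and per custom property, each group sorted by name.
func stateAddresses(c RepoConfig) []string {
	addrs := []string{fmt.Sprintf(`module.repository["%s"]`, c.Name)}
	for _, name := range sortedCopy(c.Rulesets) {
		addrs = append(addrs, fmt.Sprintf(
			`github_repository_ruleset.ruleset["%s"]`, rulesetKey(c.Name, name)))
	}
	for _, name := range sortedCopy(c.CustomProperties) {
		addrs = append(addrs, fmt.Sprintf(
			`github_repository_custom_property.custom_property["%s/%s"]`, c.Name, name))
	}
	return addrs
}

func main() {
	// Example values only; not taken from any real repository.
	cfg := RepoConfig{
		Name:             "example-repo",
		Rulesets:         []string{"main-protection"},
		CustomProperties: []string{"team"},
	}
	for _, a := range stateAddresses(cfg) {
		fmt.Println(a)
	}
}
```

The printed ruleset line contains a 64-character hex key with no trace of the repo name, which is why it has to be derived from the YAML and cannot be found by grepping the state list.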
Why state-only instead of a `removed` block
A `removed { … destroy = false }` block is the declarative equivalent, but it must live in the Terraform root (sourced from this repo) while the YAML lives in the config repo. A `removed` block for an address still declared in config (i.e. while the YAML exists) is a hard error for every plan, and there is no atomic way to land the two changes across two repos. A state-only operation sidesteps that.
Testing
- `sha256()` hex output, ordering, and YAML resolution.
- `for_each` module instance + repo, ran `terraform state rm 'module.repository["<name>"]'`, and confirmed the resource left state while the GitHub repo still existed (no destroy).
Follow-up (separate, needs prod state access)
The original incident was cleaned up manually with `terraform state list | grep "Pulp-manager"`, which would not have matched hash-keyed rulesets. Worth checking the live state for an orphaned `github_repository_ruleset` still pointing at `Pulp-manager`.