Skip to content

fix: security hardening and stability fixes across orchestrator, transport, and services#13

Merged
mkumatag merged 3 commits into
IBM:mainfrom
sudeeshjohn:fix/security-and-stability-hardening
Jul 1, 2026
Merged

fix: security hardening and stability fixes across orchestrator, transport, and services#13
mkumatag merged 3 commits into
IBM:mainfrom
sudeeshjohn:fix/security-and-stability-hardening

Conversation

@sudeeshjohn

@sudeeshjohn sudeeshjohn commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

This PR addresses a set of critical security flaws, concurrency bugs, and logical issues identified across the ShiftLaunch codebase.


Changes

🔒 Security

infra/hmclog.go — Session token leak in HMC debug transport

maskSession only checked the request header for X-API-Session. During client.Login() the token is returned in the response header and XML body, so it was written to deployment.log in plain text.

  • Added response-header scrubbing via resp.Header.Get("X-API-Session")
  • Added regex catch-all scrubbing of <X-API-Session>…</X-API-Session> in the XML body
  • Added package-level sync.Mutex (logMu) to gate raw *os.File writes against concurrent goroutine interleaving

services/registry.go — Bash injection in pull-secret update

username and password were interpolated directly into the shell command string. A password containing ', ", or $ would silently corrupt pull-secret-updated.json.

  • Credentials wrapped in shellQuote helper (POSIX single-quote escaping)
  • Token passed to jq via env.REG_TOKEN — never appears in the filter string
  • registryURL passed via jq --arg host as an opaque bound variable ($host)
  • Filesystem paths wrapped in shellQuote to handle spaces in workspace paths
  • Fixed error string casing and trailing punctuation (staticcheck ST1005)

🚨 Critical Bugs

validation/validator.go, orchestrator/orchestrator.go — Nil pointer dereference panic

LoadBalancer is *ServiceLoadBalancer. Omitting load_balancer: from config leaves it nil. Direct .VIP field access panics immediately. Replaced all accesses with the existing nil-safe .GetVIP() getter from types.go.

localexec/client.go — Cross-cluster race condition in WriteFile

Temp file path was filename.tmp — identical across concurrent processes. Two simultaneous shiftlaunch create runs overwrite each other's staged configs. Fixed by embedding os.Getpid() in the temp path.

cmd/create.go — File descriptor leak on logger reassignment

defer orch.GetLogger().Close() evaluates the pointer at registration, not at exit. If orch is replaced later (deleted workspace path), the new logger's file descriptor leaks. Fixed with defer func() { orch.GetLogger().Close() }().


⚠️ Logical Fixes

services/downloader.go — Partial download corruption

test -s accepted truncated files left by an interrupted download (Ctrl+C), causing kernel panics on LPAR boot. Removed the skip guard — curl -C - handles the already-complete case natively. Also wired the previously dead extractHashFromManifest into both download functions.

infra/controller/network.go — Fragile awk column parsing

awk '{print $4}' breaks when ip -o output columns shift due to interface labels or routing flags, orphaning VIP aliases on teardown. Replaced with grep -oP '\d+\.\d+\.\d+\.\d+/\d+'.


Dependency

Depends on IBM/infra-go-sdk#12 (now merged). go.mod is pinned to the merged commit v0.0.0-20260630034014-da6ea8cba64d with no replace directive.

Signed-off-by: SUDEESH JOHN sudeeshjohn@in.ibm.com

@sudeeshjohn sudeeshjohn requested a review from mkumatag June 29, 2026 15:15
Comment thread go.mod Outdated

// TODO: remove this replace directive once IBM/infra-go-sdk#12 is merged.
// Depends on: https://github.com/IBM/infra-go-sdk/pull/12
replace github.com/IBM/infra-go-sdk => ../infra-go-sdk

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge after merging IBM/infra-go-sdk#12

…sport, and services

- infra/hmclog.go: add sync.Mutex to gate raw file writes against concurrent
  goroutine interleaving; add response-header and regex scrubbing to maskSession
  so X-API-Session tokens are redacted on login responses and in XML bodies

- validation/validator.go, orchestrator/orchestrator.go: replace raw
  .LoadBalancer.VIP field access with nil-safe .GetVIP() calls in
  findClusterUsingVIP and all call-sites to prevent nil pointer dereference
  panic when load_balancer block is omitted from config.yaml

- localexec/client.go: embed os.Getpid() in the WriteFile temp path to prevent
  cross-cluster race condition when two shiftlaunch processes run concurrently

- cmd/create.go: wrap deferred logger Close in anonymous func so it evaluates
  the current orchestrator pointer at exit time rather than at defer registration

- services/downloader.go: wire extractHashFromManifest into DownloadRHCOSImages
  and DownloadOpenShiftTools; remove test -s skip guard so curl -C - resumes
  partial downloads instead of accepting truncated files after Ctrl+C

- services/registry.go: rewrite pull-secret update command to pass credentials
  via env variable and jq --arg to prevent bash injection from special chars;
  add shellQuote helper; fix error string casing and punctuation

- infra/controller/network.go: replace awk column-split with grep -oP regex
  in RemoveVIPAlias to be robust against shifted ip -o output columns

- go.mod: add replace directive for IBM/infra-go-sdk pending IBM/infra-go-sdk#12

Signed-off-by: SUDEESH JOHN <sudeeshjohn@in.ibm.com>
@sudeeshjohn sudeeshjohn force-pushed the fix/security-and-stability-hardening branch from 3c58edf to c6cd9d8 Compare June 30, 2026 03:46
@sudeeshjohn sudeeshjohn marked this pull request as ready for review June 30, 2026 03:47
Both CVEs are HIGH severity fixes in Go stdlib 1.26.4:
- CVE-2026-27145: x509 VerifyHostname hostname matching flaw
- CVE-2026-42504: MIME header decoding DoS via malformed encoded-words

Signed-off-by: SUDEESH JOHN <sudeeshjohn@in.ibm.com>

@mkumatag mkumatag left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Overall this is a well-structured hardening and stability pass. The transport middleware refactor, session-token masking, shell-injection fix in registry.go, nil-guard on LoadBalancer, and the defer func(){}() fix are all clean and correct.

I've left inline comments on 4 items that need attention before merge: 2 correctness bugs and 2 nits.

Comment thread infra/hmclog.go
Comment thread infra/hmclog.go Outdated
Comment thread services/downloader.go
Comment thread logger/logger.go
@mkumatag

Copy link
Copy Markdown
Member

@sudeeshjohn these are the bob's review comments, please take a look..

@sudeeshjohn sudeeshjohn requested a review from mkumatag June 30, 2026 11:37
@sudeeshjohn sudeeshjohn force-pushed the fix/security-and-stability-hardening branch from 2a53fc3 to 5d66757 Compare June 30, 2026 11:40
logger/logger.go:
- New() no longer propagates log-file open error to caller; file log
  is optional — warn to stderr and continue rather than aborting
- Remove duplicate UpdateText() call in Info() spinner path (first call
  was dead code, immediately overwritten by the second)
- Simplify 'err == nil && file != nil' guard to 'err == nil' (the nil
  file check is always redundant when err is nil)
- TerminalOnly() returned os.Stdout; corrected to os.Stderr to match
  every other output path in the logger
- Fix gofmt indentation on fileOpts struct literal (fields were
  double-indented relative to their enclosing block)

services/downloader.go:
- Replace silent exec.Execute(mkdir -p) with os.MkdirAll so directory
  creation failures are surfaced immediately with a clear error
- Quote all paths and URLs passed to shell via shellQuote() to prevent
  shell injection and handle paths with spaces or special characters;
  add -- end-of-options to curl invocations
- Replace shell-out sha256sum|awk with pure-Go crypto/sha256+io.Copy
  in verifyFileHash — no injection surface, portable, faster
- Replace shell-out test -f + grep|awk in extractHashFromManifest with
  os.Stat + os.ReadFile + strings.Fields — eliminates regex metachar
  injection risk from filenames
- Replace exec rm -f for stale manifest with os.Remove
- Fix extractOpenShiftTools to use 'tar -xzf <archive> -C <dir>'
  instead of 'cd <dir> && tar -xzf <name>' — safer with quoted paths
- Reuse rhcosDir in manifestPath join (was duplicating the base path)
- Remove stale commit-message comment from DownloadAll
- Fix missing space after // in two inline comments (golint)
- Drop 'FATAL:' prefix from checksum mismatch error (non-idiomatic)

infra/hmclog.go:
- Check io.ReadAll error before restoring resp.Body; previously a
  partial read would hand a corrupt body back to the caller before the
  error was eventually checked and returned

Signed-off-by: SUDEESH JOHN <sudeeshjohn@in.ibm.com>
@sudeeshjohn sudeeshjohn force-pushed the fix/security-and-stability-hardening branch from 5d66757 to 4944c42 Compare June 30, 2026 11:44
@mkumatag mkumatag merged commit b859342 into IBM:main Jul 1, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants