Backup registry auth secret must not be owned by any workspace #1631
akurinnoy wants to merge 7 commits
The backup registry auth secret (devworkspace-backup-registry-auth) is a namespace singleton shared by all workspaces. Setting a controller ownerReference to a single workspace caused Kubernetes garbage collection to delete the secret when that workspace was deleted, breaking backup/restore for all remaining workspaces in the namespace.

Remove the SetControllerReference call so the secret persists independently of any workspace lifecycle. The secret is cleaned up naturally when the namespace is deleted.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
When the backup registry auth secret is missing from the workspace namespace (e.g. after GC on upgrade), the restore path now resolves the operator namespace via infrastructure.GetNamespace() and copies the secret from there, matching the backup path behavior.

Previously the restore path returned nil when the secret was missing, causing restore init containers to fail on private registries.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
📝 Walkthrough
The PR fixes backup registry auth secret lifecycle management by removing workspace-level ownership references, adding operator-namespace fallback resolution during restore, and validating the changes through updated and new test suites. An ADR documents the architectural decision and problem statement.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 279-287: BeforeEach calls
os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without checking the
error, and AfterEach unconditionally calls os.Unsetenv. In BeforeEach, capture
the prior value with os.LookupEnv, set the variable with os.Setenv, and fail the
test through the test framework if that returns an error. In AfterEach, restore
the original state: if a prior value existed, call os.Setenv with the original
key and value and check the error; otherwise call os.Unsetenv and check the
error. Use the BeforeEach/AfterEach blocks and the references to
infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to add the
lookup, error checks, and restoration logic.
In `@pkg/secrets/backup.go`:
- Around line 64-69: The code currently swallows namespace resolution failures
by returning nil, nil when infrastructure.GetNamespace() returns an error;
update the error path in pkg/secrets/backup.go so that when nsErr != nil you
return the error (or a wrapped error) instead of nil, nil, and ensure
operatorConfigNamespace is only set after a successful GetNamespace() call;
reference GetNamespace(), nsErr, and operatorConfigNamespace to locate and fix
the failing branch so restore fails fast with a clear cause.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4ce1a03-5397-4980-a568-430bf14ef1e2
📒 Files selected for processing (3)
- docs/adr-backup-auth-secret-lifecycle.md
- pkg/secrets/backup.go
- pkg/secrets/backup_test.go
Return an error instead of silently returning nil when infrastructure.GetNamespace() fails on the restore path. This makes auth failures visible immediately rather than causing a confusing image pull error later.

Also properly save and restore the WATCH_NAMESPACE env var in tests.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/secrets/backup_test.go (1)
286-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Handle and assert errors from env mutation calls.
os.Setenv/os.Unsetenv errors are still ignored in setup/teardown, which breaks errcheck and weakens test isolation guarantees.
    #!/bin/bash
    # Verify unchecked env mutation calls in this test file
    rg -n -C2 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go
As per coding guidelines, "Don't ignore errors. Always handle or propagate errors explicitly."
Also applies to: 290-294
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/secrets/backup_test.go` around lines 286 - 287, The test currently ignores errors from os.Setenv/os.Unsetenv (e.g., the call setting infrastructure.WatchNamespaceEnvVar to operatorNS), which fails errcheck; update the test to either use t.Setenv(...) (preferred) or check the returned error and call t.Fatalf/require.NoError to fail the test on failure, and do the same for the corresponding Unsetenv calls (and other occurrences around the same block at the 290-294 region). Ensure you reference the environment variable symbol infrastructure.WatchNamespaceEnvVar and the operatorNS value when updating the setup/teardown so errors are handled/asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 120-129: The test is flaky because it assumes WATCH_NAMESPACE is
unset and real infrastructure detection; make it deterministic by explicitly
setting the env var to an empty string (or saving and restoring it) within the
test and by initializing test infrastructure via
infrastructure.InitializeForTesting() so the test does not consult real
environment/infrastructure; update the spec around the call to
secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if
needed) to call infrastructure.InitializeForTesting() at start and ensure
WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test,
restoring prior state afterwards.
In `@pkg/secrets/backup.go`:
- Around line 63-69: The code resolves operatorConfigNamespace unconditionally
which causes failures even when no auth is needed; change the logic so
infrastructure.GetNamespace() is only called when AuthSecret is non-empty: wrap
the operatorConfigNamespace resolution inside the branch that checks
cfg.AuthSecret (or AuthSecret variable) and only attempt to resolve/set
operatorConfigNamespace when AuthSecret != ""; apply the same change for the
later block that currently resolves namespace (the code around
operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth)
flows skip namespace resolution entirely.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d993c0c2-89ec-4642-997c-d9ed96e27aa5
📒 Files selected for processing (3)
- docs/adr-backup-auth-secret-lifecycle.md
- pkg/secrets/backup.go
- pkg/secrets/backup_test.go
✅ Files skipped from review due to trivial changes (1)
- docs/adr-backup-auth-secret-lifecycle.md
@@ -0,0 +1,97 @@
# Backup Auth Secret Must Not Be Owned by Any Workspace
This seems like an implementation proposal document. Is it meant to be committed?
Umm, I think it's okay to keep it, but not in the docs/ directory. I think docs/ should only contain user-facing documentation.
Right, but what about docs/adr/? Or do you prefer adr/ in the project root?
I'm okay with adr/ in project root, but let's wait for others to comment too.
ADR moved to a separate PR: #1633. The discussion about ADR directory placement continues there.
I tested it and it works as expected: the Secret doesn't get cleaned up after DevWorkspace deletion.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: akurinnoy, rohanKanojia
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
New changes are detected. LGTM label has been removed.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@adr/adr-backup-auth-secret-lifecycle.md`:
- Around line 89-91: Update the Neutral section to remove the incorrect claim
that CopySecret (and SyncObjectWithCluster) will overwrite existing secrets;
instead state that CopySecret explicitly avoids modifying existing secrets and
will fetch and return them as-is, so secrets created by older DWO versions will
keep their stale ownerReference until explicitly replaced or deleted by other
logic. Mention the CopySecret function name and that there is no
SyncObjectWithCluster call modifying existing secrets to make the correction
precise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 62ea655f-5662-4e9d-bdb0-9cac39ce3a6a
📒 Files selected for processing (1)
adr/adr-backup-auth-secret-lifecycle.md
The ADR framework is introduced in a separate PR to keep the bugfix focused.

Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
The ADR discussion from this PR continues in #1633, which introduces the ADR framework to the project.
Move the AuthSecret check before the infrastructure.GetNamespace() call so anonymous registries (no AuthSecret configured) don't fail with a namespace resolution error.

Also explicitly unset WATCH_NAMESPACE in the restore-path test suite to avoid environment-dependent flakiness.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
/che-ai-assistant ok-pr-review
What does this PR do?
This PR removes the controller ownerReference from the backup registry auth secret so that it is not garbage-collected when a workspace is deleted. It also makes the restore path fall back to copying the secret from the operator namespace when it is missing from the workspace namespace.
The ADR documenting why the auth secret must not be owned by any workspace was moved to a separate PR, #1633.
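The failure mode behind this change can be illustrated with a toy model of owner-reference garbage collection. This is a simplified sketch, not Kubernetes or operator code: `object` and `deleteAndCollect` are hypothetical names, and the model makes a single pass, removing any object whose listed owners have all disappeared.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes object: a name plus the names of
// the objects listed in its ownerReferences.
type object struct {
	name   string
	owners []string
}

// deleteAndCollect removes `deleted` and then drops every object that has
// owners, none of which exist any longer. Objects without owners are kept.
func deleteAndCollect(objs []object, deleted string) []object {
	alive := make(map[string]bool)
	for _, o := range objs {
		if o.name != deleted {
			alive[o.name] = true
		}
	}
	var kept []object
	for _, o := range objs {
		if o.name == deleted {
			continue
		}
		if len(o.owners) > 0 && !anyAlive(o.owners, alive) {
			continue // all owners are gone: garbage-collected
		}
		kept = append(kept, o)
	}
	return kept
}

// anyAlive reports whether at least one owner still exists.
func anyAlive(owners []string, alive map[string]bool) bool {
	for _, owner := range owners {
		if alive[owner] {
			return true
		}
	}
	return false
}

func main() {
	const secret = "devworkspace-backup-registry-auth"

	// Before the fix: the shared secret is owned by one workspace.
	owned := []object{{name: "ws-a"}, {name: "ws-b"}, {name: secret, owners: []string{"ws-a"}}}
	fmt.Println("owned secret, after deleting ws-a:", deleteAndCollect(owned, "ws-a"))

	// After the fix: no owner, so deleting ws-a leaves the secret for ws-b.
	unowned := []object{{name: "ws-a"}, {name: "ws-b"}, {name: secret}}
	fmt.Println("unowned secret, after deleting ws-a:", deleteAndCollect(unowned, "ws-a"))
}
```

In the first case the secret disappears along with ws-a even though ws-b still needs it; in the second it survives until something else removes it, such as namespace deletion.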
What issues does this PR fix or reference?
Fixes https://redhat.atlassian.net/browse/CRW-10760
Is it tested? How?
New unit tests added. Validated manually on a CRC cluster (DWO 0.40.1, quay.io private registry).
PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-path (to trigger)
- v8-devworkspace-operator-e2e: DevWorkspace e2e test
- v8-che-happy-path: Happy path for verification integration with Che