96 changes: 96 additions & 0 deletions docs/adr-backup-auth-secret-lifecycle.md
@@ -0,0 +1,96 @@
# Backup Auth Secret Must Not Be Owned by Any Workspace

**Status**: Accepted
**Date**: 2026-05-11
**Deciders**: DevWorkspace Operator maintainers
**Related Issue**: [CRW-10760](https://redhat.atlassian.net/browse/CRW-10760)

## Context

The backup system copies the registry authentication secret (e.g., quay.io credentials) from the operator namespace into each workspace namespace as `devworkspace-backup-registry-auth`. The original implementation set a Kubernetes controller ownerReference from this secret to the DevWorkspace that triggered the copy:

```go
controllerutil.SetControllerReference(workspace, desiredSecret, scheme)
```

This was likely intended to clean up the secret once no workspace needs it anymore, following the standard Kubernetes garbage-collection pattern.

However, two properties of this secret make ownerReference-based lifecycle management incorrect:

1. **The secret is a namespace singleton**: All workspaces in a namespace share the same `devworkspace-backup-registry-auth` secret, but a Kubernetes object can have only one controller owner. Whichever workspace's backup job ran last "wins" ownership.

2. **The secret is needed after all workspaces are deleted**: The primary restore use case is creating a new workspace from a backup of a deleted one. If the auth secret is garbage-collected when the last workspace is deleted, the user cannot authenticate to the private registry to pull the backup image.
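The interaction of these two properties can be illustrated with a deliberately simplified, in-memory model. This is a sketch, not the real Kubernetes garbage collector: `object`, `cluster`, `copyAuthSecret`, and `deleteWorkspace` are hypothetical names, and GC is reduced to "delete every object whose controller owner is the deleted workspace". Each copy overwrites the controller owner, so the workspace that ran its backup last owns the shared secret, and deleting that workspace removes the secret for every other workspace too.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes object with at most one controller owner.
type object struct {
	name            string
	controllerOwner string // empty means "no controller ownerReference"
}

// cluster is a toy namespace: objects keyed by name.
type cluster map[string]*object

const authSecretName = "devworkspace-backup-registry-auth"

// copyAuthSecret mimics CopySecret. With setOwner=true (the original behaviour),
// every backup rewrites the shared secret with the triggering workspace as controller owner.
func (c cluster) copyAuthSecret(workspace string, setOwner bool) {
	owner := ""
	if setOwner {
		owner = workspace
	}
	c[authSecretName] = &object{name: authSecretName, controllerOwner: owner}
}

// deleteWorkspace models GC: every object controlled by the deleted workspace is removed.
func (c cluster) deleteWorkspace(workspace string) {
	for name, obj := range c {
		if obj.controllerOwner == workspace {
			delete(c, name)
		}
	}
}

func (c cluster) hasAuthSecret() bool {
	_, ok := c[authSecretName]
	return ok
}

func main() {
	for _, setOwner := range []bool{true, false} {
		ns := cluster{}
		// Two workspaces share the namespace; "nodejs" runs its backup last.
		ns.copyAuthSecret("python", setOwner)
		ns.copyAuthSecret("nodejs", setOwner)
		ns.deleteWorkspace("nodejs")
		// "python" still exists, but with setOwner=true it has lost the shared secret.
		fmt.Printf("setOwner=%v secretPresent=%v\n", setOwner, ns.hasAuthSecret())
	}
}
```

With the ownerReference the model reports the secret missing after `nodejs` is deleted; without it the secret survives, which is the behaviour this ADR adopts.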

**Bug observed (CRW-10760)**: When using quay.io (private registry) for backups, deleting a workspace caused backup entries to disappear from the Dashboard backup list for ALL workspaces in the namespace. The auth secret was garbage-collected, and the Dashboard could no longer query the registry.

**Validated on a CRC cluster** (DWO 0.40.1, quay.io/okurinny):
- The `devworkspace-backup-registry-auth` secret was confirmed to have ownerReference to a single workspace (`nodejs`)
- Deleting that workspace triggered K8s GC, removing the secret
- The secret was not re-created by subsequent backup cycles (remaining workspaces already had recent backups)
- Backup listing in the Dashboard failed for all workspaces

## Decision

**Remove the ownerReference from the backup registry auth secret.** The secret becomes a namespace-scoped resource with no ownership tie to any workspace.

### What Changes

- `pkg/secrets/backup.go`: Remove the `controllerutil.SetControllerReference()` call in `CopySecret()`
- The secret is still created and synced via `SyncObjectWithCluster`, just without an ownerReference

### What Doesn't Change

- Per-workspace resources (job runner ServiceAccount, image-builder RoleBinding) retain their ownerReferences — their GC on workspace deletion is correct and expected
- The backup Job itself retains its ownerReference (short-lived, TTL-cleaned)
- The `CopySecret` function signature stays the same

## Considered Alternatives

### Alternative 1: Multi-owner references (non-controller)

Add each workspace as a non-controller owner of the secret. K8s GC would only delete the secret when ALL owning workspaces are gone.

**Rejected because**:
- The secret must survive deletion of ALL workspaces (for restore)
- Adds complexity to track and merge owner lists
- Non-controller ownerReferences have subtle GC semantics
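A toy model of the multi-owner variant makes the first rejection reason concrete. The types and methods here (`sharedSecret`, `namespace`, `backup`, `deleteWorkspace`) are hypothetical, and GC is simplified to "drop the dangling owner, and delete the object once no owner remains". The secret survives deletion of one workspace, but it is still collected when the last workspace is deleted, which is exactly when a restore needs it.

```go
package main

import "fmt"

// sharedSecret is a toy object carrying a set of non-controller owners.
type sharedSecret struct {
	owners map[string]bool
}

// namespace is a toy model holding the (possibly garbage-collected) shared secret.
type namespace struct {
	secret *sharedSecret
}

// backup registers the workspace as one more owner, creating the secret if needed.
func (n *namespace) backup(workspace string) {
	if n.secret == nil {
		n.secret = &sharedSecret{owners: map[string]bool{}}
	}
	n.secret.owners[workspace] = true
}

// deleteWorkspace models GC: the dangling owner is dropped, and the secret is
// deleted only once no owner remains.
func (n *namespace) deleteWorkspace(workspace string) {
	if n.secret == nil {
		return
	}
	delete(n.secret.owners, workspace)
	if len(n.secret.owners) == 0 {
		n.secret = nil
	}
}

func main() {
	n := &namespace{}
	n.backup("python")
	n.backup("nodejs")

	n.deleteWorkspace("nodejs")
	fmt.Println("secret present after deleting nodejs:", n.secret != nil)

	n.deleteWorkspace("python")
	// No workspace is left, so a restore has no credentials to pull the backup image.
	fmt.Println("secret present after deleting python:", n.secret != nil)
}
```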

### Alternative 2: Finalizer-based cleanup

Remove the ownerReference but add a cleanup mechanism (e.g., a controller that deletes the secret when the namespace has zero DevWorkspaces).

**Rejected because**:
- Adds complexity for a marginal benefit (one small secret in an otherwise-empty namespace)
- Could race with restore operations (user creates a workspace from backup right after the last one is deleted)
- Namespace deletion already cleans up all resources

### Alternative 3: Per-workspace auth secrets

Create a unique auth secret per workspace (e.g., `devworkspace-backup-registry-auth-{workspace-id}`).

**Rejected because**:
- Multiplies secrets unnecessarily (all contain the same credentials)
- The restore path expects the predefined name `devworkspace-backup-registry-auth`
- If each secret were owned by its workspace, it still wouldn't survive workspace deletion, which breaks the restore use case

## Consequences

### Positive

1. **Backups survive workspace deletion**: Users can delete all workspaces and still restore from backups
2. **No cross-workspace interference**: Deleting one workspace no longer affects other workspaces' backup capabilities
3. **Simpler lifecycle**: No ownership tracking needed for a namespace-scoped singleton

### Negative

1. **Secret persists in empty namespaces**: If all workspaces are deleted and the user never restores, the auth secret remains until the namespace is deleted. This is a minor leak — one small secret per namespace.

### Neutral

1. **Existing secrets on upgraded clusters**: Secrets created by older DWO versions will retain their stale ownerReference until the next `CopySecret` call overwrites them (via `SyncObjectWithCluster`). In the worst case, one more GC event occurs before the fix takes effect.
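The upgrade note relies on the sync writing the desired secret over the existing one. The toy model below (hypothetical `ownerRef`, `secretMeta`, `syncReplace`, and `syncMerge`; not the actual `SyncObjectWithCluster` code) shows why that matters: a full replace drops the stale entry because the desired secret now carries no ownerReferences, whereas a merge-style write that skips empty fields would leave it in place.

```go
package main

import "fmt"

// ownerRef is a hypothetical, trimmed-down stand-in for metav1.OwnerReference.
type ownerRef struct {
	Kind, Name string
	Controller bool
}

// secretMeta holds only the metadata field relevant here.
type secretMeta struct {
	OwnerReferences []ownerRef
}

// syncReplace models a full-object update: the cluster copy takes the desired
// ownerReferences verbatim, so an empty desired list clears stale entries.
func syncReplace(cluster *secretMeta, desired secretMeta) {
	cluster.OwnerReferences = desired.OwnerReferences
}

// syncMerge models a merge-style update that skips empty fields; it is included
// only to show the case in which a stale ownerReference would survive.
func syncMerge(cluster *secretMeta, desired secretMeta) {
	if len(desired.OwnerReferences) > 0 {
		cluster.OwnerReferences = desired.OwnerReferences
	}
}

func main() {
	stale := []ownerRef{{Kind: "DevWorkspace", Name: "nodejs", Controller: true}}
	desired := secretMeta{} // the fixed CopySecret sets no ownerReferences

	a := secretMeta{OwnerReferences: stale}
	syncReplace(&a, desired)
	fmt.Println("replace leaves", len(a.OwnerReferences), "ownerReferences")

	b := secretMeta{OwnerReferences: append([]ownerRef(nil), stale...)}
	syncMerge(&b, desired)
	fmt.Println("merge leaves", len(b.OwnerReferences), "ownerReferences")
}
```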

## References

- `pkg/secrets/backup.go` — `CopySecret()` function
- `controllers/backupcronjob/rbac.go` — Per-workspace SA/RoleBinding (unchanged)
- `pkg/constants/metadata.go:204` — `DevWorkspaceBackupAuthSecretName`
14 changes: 7 additions & 7 deletions pkg/secrets/backup.go
@@ -22,13 +22,13 @@ import (
dw "github.com/devfile/api/v2/pkg/apis/workspaces/v1alpha2"
controllerv1alpha1 "github.com/devfile/devworkspace-operator/apis/controller/v1alpha1"
"github.com/devfile/devworkspace-operator/pkg/constants"
"github.com/devfile/devworkspace-operator/pkg/infrastructure"
"github.com/go-logr/logr"
corev1 "k8s.io/api/core/v1"
k8sErrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// GetRegistryAuthSecret retrieves the registry authentication secret for accessing backup images
@@ -60,9 +60,13 @@ func HandleRegistryAuthSecret(ctx context.Context, c client.Client, workspace *d
if client.IgnoreNotFound(err) != nil {
return nil, err
}
// If we don't provide an operator namespace, don't attempt to look there.
if operatorConfigNamespace == "" {
return nil, nil
resolvedNS, nsErr := infrastructure.GetNamespace()
if nsErr != nil {
log.Info("Cannot resolve operator namespace for auth secret fallback", "error", nsErr)
return nil, nil
}
operatorConfigNamespace = resolvedNS
}

// Check if AuthSecret is configured in operator config
@@ -111,10 +115,6 @@ func CopySecret(ctx context.Context, c client.Client, workspace *dw.DevWorkspace
Type: sourceSecret.Type,
}

if err := controllerutil.SetControllerReference(workspace, desiredSecret, scheme); err != nil {
return nil, err
}

err = c.Create(ctx, desiredSecret)
if err != nil {
if k8sErrors.IsAlreadyExists(err) {
117 changes: 110 additions & 7 deletions pkg/secrets/backup_test.go
@@ -18,6 +18,7 @@ package secrets_test
import (
"context"
"errors"
"os"
"testing"

. "github.com/onsi/ginkgo/v2"
@@ -34,6 +35,7 @@ import (
"sigs.k8s.io/controller-runtime/pkg/log/zap"

"github.com/devfile/devworkspace-operator/pkg/constants"
"github.com/devfile/devworkspace-operator/pkg/infrastructure"
"github.com/devfile/devworkspace-operator/pkg/secrets"
)

@@ -188,7 +190,7 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace
Expect(result).To(BeNil())
})

It("copies secret from operator namespace when AuthSecret is configured and secret not found in workspace namespace", func() {
It("copies secret from operator namespace without ownerReferences", func() {
By("creating a secret in the operator namespace")
operatorSecret := makeSecret(constants.DevWorkspaceBackupAuthSecretName, operatorNS)
operatorSecret.Data = map[string][]byte{"auth": []byte("operator-credentials")}
@@ -216,12 +218,8 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace
By("verifying the copied secret has the watch-secret label")
Expect(copiedSecret.Labels).To(HaveKeyWithValue(constants.DevWorkspaceWatchSecretLabel, "true"))

By("verifying the copied secret has an owner reference to the workspace")
Expect(copiedSecret.OwnerReferences).To(HaveLen(1))
Expect(copiedSecret.OwnerReferences[0].Name).To(Equal(workspace.Name))
Expect(copiedSecret.OwnerReferences[0].Kind).To(Equal("DevWorkspace"))
Expect(copiedSecret.OwnerReferences[0].Controller).NotTo(BeNil())
Expect(*copiedSecret.OwnerReferences[0].Controller).To(BeTrue())
By("verifying the copied secret has no ownerReferences")
Expect(copiedSecret.OwnerReferences).To(BeEmpty())
})

It("NEVER overwrites user-provided secret even if operator has different credentials", func() {
@@ -266,6 +264,52 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace
})
})

var _ = Describe("HandleRegistryAuthSecret (restore path: fallback to operator namespace)", func() {
const (
workspaceNS = "user-namespace"
operatorNS = "operator-namespace"
)

var (
ctx context.Context
scheme *runtime.Scheme
log = zap.New(zap.UseDevMode(true)).WithName("SecretsTest")
)

BeforeEach(func() {
ctx = context.Background()
scheme = buildScheme()
os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS)
})

AfterEach(func() {
os.Unsetenv(infrastructure.WatchNamespaceEnvVar)
})

It("copies the secret from operator namespace when missing in workspace namespace", func() {
By("creating the auth secret only in the operator namespace")
operatorSecret := makeSecret("quay-backup-auth", operatorNS)

fakeClient := fake.NewClientBuilder().WithScheme(scheme).WithObjects(operatorSecret).Build()
workspace := makeWorkspace(workspaceNS)
config := makeConfig("quay-backup-auth")

result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(result).NotTo(BeNil())
Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName))
Expect(result.Namespace).To(Equal(workspaceNS))

By("verifying the secret was copied to the workspace namespace")
copied := &corev1.Secret{}
err = fakeClient.Get(ctx, client.ObjectKey{
Name: constants.DevWorkspaceBackupAuthSecretName,
Namespace: workspaceNS,
}, copied)
Expect(err).NotTo(HaveOccurred())
})
})

// errorOnNameClient is a thin client wrapper that injects an error for a specific secret name.
type errorOnNameClient struct {
client.Client
@@ -285,3 +329,62 @@ func (e *errorOnNameClient) Get(ctx context.Context, key client.ObjectKey, obj c

// Ensure errorOnNameClient satisfies client.Client at compile time.
var _ client.Client = &errorOnNameClient{}

var _ = Describe("CopySecret", func() {
const (
workspaceNS = "user-namespace"
operatorNS = "operator-namespace"
)

var (
ctx context.Context
scheme *runtime.Scheme
log = zap.New(zap.UseDevMode(true)).WithName("SecretsTest")
)

BeforeEach(func() {
ctx = context.Background()
scheme = buildScheme()
})

It("creates the secret without ownerReferences", func() {
By("copying a source secret into the workspace namespace")
sourceSecret := makeSecret("quay-push-secret", operatorNS)
workspace := makeWorkspace(workspaceNS)

fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()

result, err := secrets.CopySecret(ctx, fakeClient, workspace, sourceSecret, scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(result).NotTo(BeNil())
Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName))
Expect(result.Namespace).To(Equal(workspaceNS))

By("verifying the created secret has no ownerReferences")
created := &corev1.Secret{}
err = fakeClient.Get(ctx, client.ObjectKey{
Name: constants.DevWorkspaceBackupAuthSecretName,
Namespace: workspaceNS,
}, created)
Expect(err).NotTo(HaveOccurred())
Expect(created.OwnerReferences).To(BeEmpty())
})

It("preserves the secret data and type from the source", func() {
sourceSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: "quay-push-secret",
Namespace: operatorNS,
},
Data: map[string][]byte{".dockerconfigjson": []byte(`{"auths":{}}`)},
Type: corev1.SecretTypeDockerConfigJson,
}
workspace := makeWorkspace(workspaceNS)
fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()

result, err := secrets.CopySecret(ctx, fakeClient, workspace, sourceSecret, scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(result.Data).To(HaveKey(".dockerconfigjson"))
Expect(result.Type).To(Equal(corev1.SecretTypeDockerConfigJson))
})
})