Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .claude/CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,7 @@
See @../AGENTS.md for project guidelines.

See @../redhat-compliance-and-responsible-ai.md for Red Hat compliance and responsible AI rules.

## Architecture Decision Records

Before marking work complete, check: did you reject an alternative or accept a trade-off? If yes, suggest creating an ADR in `adr/`. Before proposing changes, check existing ADRs for prior decisions that might be affected. See [adr/README.md](../adr/README.md) for conventions.
22 changes: 22 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -748,11 +748,33 @@ Use this to trigger a reconciliation without changing the spec.
kubectl patch dw <name> --type merge -p "{\"metadata\": {\"annotations\": {\"force-update\": \"$(date +%s)\"}}}"
```

## Architecture Decision Records (ADRs)

ADRs live in the [`adr/`](adr/) directory. They capture the **why** behind non-trivial design decisions — information that is lost in code and PR descriptions over time.

### When to Create an ADR

Before marking work complete, ask: **did I reject an alternative or accept a trade-off?** If yes, create an ADR.

Specifically, write an ADR when your change involves any of these:

1. **You rejected an alternative** — There were 2+ reasonable approaches and you picked one
2. **You accepted a trade-off** — Something got worse in exchange for something more important
3. **You changed a resource lifecycle or ownership model** — Who creates, owns, or cleans up a Kubernetes resource
4. **You changed an external contract** — API shape, CRD fields, Secret/ConfigMap naming, image paths

**Don't** write an ADR for: bug fixes, dependency bumps, refactors preserving behavior, test additions, docs, or performance optimizations with no trade-offs.

### How to Create an ADR

Copy [`adr/TEMPLATE.md`](adr/TEMPLATE.md) to `adr/YYYY-MM-DD-short-slug.md` and fill in all sections. See [`adr/AGENTS.md`](adr/AGENTS.md) for maintenance conventions.

## See Also

- **[README.md](README.md)** - Project overview, installation, what is DevWorkspace Operator
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Development setup, debugging, testing, contribution guidelines
- **[redhat-compliance-and-responsible-ai.md](redhat-compliance-and-responsible-ai.md)** - Red Hat compliance & responsible AI rules
- **[adr/](adr/)** - Architecture Decision Records
- **[docs/additional-configuration.adoc](docs/additional-configuration.adoc)** - DevWorkspace configuration options
- **[docs/unsupported-devfile-api.adoc](docs/unsupported-devfile-api.adoc)** - Unsupported Devfile API features
- **[Devfile Documentation](https://devfile.io/)** - Devfile specification
Expand Down
97 changes: 97 additions & 0 deletions adr/2026-05-11-backup-auth-secret-lifecycle.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# Backup Auth Secret Must Not Be Owned by Any Workspace

**Status**: Accepted
**Date**: 2026-05-11
**Deciders**: DevWorkspace Operator maintainers
**Related Issue**: [CRW-10760](https://redhat.atlassian.net/browse/CRW-10760)

## Context

The backup system copies the registry authentication secret (e.g., quay.io credentials) from the operator namespace into each workspace namespace as `devworkspace-backup-registry-auth`. The original implementation set a Kubernetes controller ownerReference from this secret to the DevWorkspace that triggered the copy:

```go
controllerutil.SetControllerReference(workspace, desiredSecret, scheme)
```

This was likely intended to clean up the secret when no workspaces need it anymore — standard Kubernetes garbage collection pattern.

However, two properties of this secret make ownerReference-based lifecycle management incorrect:

1. **The secret is a namespace singleton**: All workspaces in a namespace share the same `devworkspace-backup-registry-auth` secret, but a Kubernetes object can have only one controller owner. Whichever workspace's backup job ran last "wins" ownership.

2. **The secret is needed after all workspaces are deleted**: The primary restore use case is creating a new workspace from a backup of a deleted one. If the auth secret is garbage-collected when the last workspace is deleted, the user cannot authenticate to the private registry to pull the backup image.

**Bug observed (CRW-10760)**: When using quay.io (private registry) for backups, deleting a workspace caused backup entries to disappear from the Dashboard backup list for ALL workspaces in the namespace. The auth secret was garbage-collected, and the Dashboard could no longer query the registry.

**Validated on CRC cluster** (DWO 0.40.1, quay.io/okurinny):
- The `devworkspace-backup-registry-auth` secret was confirmed to have ownerReference to a single workspace (`nodejs`)
- Deleting that workspace triggered K8s GC, removing the secret
- The secret was not re-created by subsequent backup cycles (remaining workspaces already had recent backups)
- Backup listing in the Dashboard failed for all workspaces

## Decision

**Remove the ownerReference from the backup registry auth secret.** The secret becomes a namespace-scoped resource with no ownership tie to any workspace.

### What Changes

- `pkg/secrets/backup.go`: Remove the `controllerutil.SetControllerReference()` call in `CopySecret()`
- The secret is still created via `c.Create()` with `AlreadyExists` handling, just without an ownerReference
- The restore path now resolves the operator namespace via `infrastructure.GetNamespace()` to copy the secret on demand when it is missing

### What Doesn't Change

- Per-workspace resources (job runner ServiceAccount, image-builder RoleBinding) retain their ownerReferences — their GC on workspace deletion is correct and expected
- The backup Job itself retains its ownerReference (short-lived, TTL-cleaned)
- The `CopySecret` function signature stays the same

## Considered Alternatives

### Alternative 1: Multi-owner references (non-controller)

Add each workspace as a non-controller owner of the secret. K8s GC would only delete the secret when ALL owning workspaces are gone.

**Rejected because**:
- The secret must survive deletion of ALL workspaces (for restore)
- Adds complexity to track and merge owner lists
- Non-controller ownerReferences have subtle GC semantics

### Alternative 2: Finalizer-based cleanup

Remove ownerReference but add a cleanup mechanism (e.g., a controller that deletes the secret when the namespace has zero DevWorkspaces).

**Rejected because**:
- Adds complexity for a marginal benefit (one small secret in an otherwise-empty namespace)
- Could race with restore operations (user creates a workspace from backup right after the last one is deleted)
- Namespace deletion already cleans up all resources

### Alternative 3: Per-workspace auth secrets

Create a unique auth secret per workspace (e.g., `devworkspace-backup-registry-auth-{workspace-id}`).

**Rejected because**:
- Multiplies secrets unnecessarily (all contain the same credentials)
- The restore path expects the predefined name `devworkspace-backup-registry-auth`
- Still wouldn't survive workspace deletion for restore use case

## Consequences

### Positive

1. **Backups survive workspace deletion**: Users can delete all workspaces and still restore from backups
2. **No cross-workspace interference**: Deleting one workspace no longer affects other workspaces' backup capabilities
3. **Simpler lifecycle**: No ownership tracking needed for a namespace-scoped singleton

### Negative

1. **Secret persists in empty namespaces**: If all workspaces are deleted and the user never restores, the auth secret remains until the namespace is deleted. This is a minor leak — one small secret per namespace.

### Neutral

1. **Existing secrets on upgraded clusters**: Secrets created by older DWO versions will retain their stale ownerReference until the next `CopySecret` call overwrites them (via `SyncObjectWithCluster`). In the worst case, one more GC event occurs before the fix takes effect.

## References

- `pkg/secrets/backup.go` — `CopySecret()` function
- `controllers/backupcronjob/rbac.go` — Per-workspace SA/RoleBinding (unchanged)
- `pkg/constants/metadata.go:204` — `DevWorkspaceBackupAuthSecretName`
32 changes: 32 additions & 0 deletions adr/AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# ADR Maintenance Guide for AI Agents

This guide covers how to create and maintain ADRs. For when to create an ADR, see the root [AGENTS.md](../AGENTS.md).

## Creating a New ADR

1. Copy [TEMPLATE.md](TEMPLATE.md) to a new file named `YYYY-MM-DD-short-slug.md`
2. Use today's date and a descriptive lowercase slug with hyphens
3. Fill in all sections — the "Considered Alternatives" section is the most valuable part
4. Set status to **Proposed** if seeking feedback, or **Accepted** if the decision is final

## Naming Convention

`YYYY-MM-DD-short-description.md`

- Date prefix for chronological ordering
- Lowercase, hyphens only, no special characters
- Keep the slug short but descriptive

## Superseding an ADR

When a decision is replaced:

1. Update the old ADR's status line: `**Status**: Superseded by [New Title](YYYY-MM-DD-new-slug.md)`
2. Create the new ADR with its own context explaining why the previous decision changed
3. Reference the old ADR in the new one's Context section

## Format Rules

- Follow the structure in [TEMPLATE.md](TEMPLATE.md)
- End every file with a trailing newline
- Use the existing ADRs as examples for tone and level of detail
39 changes: 39 additions & 0 deletions adr/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Architecture Decision Records (ADRs)

This directory contains Architecture Decision Records for the DevWorkspace Operator.

## What is an ADR?

An ADR captures a significant design decision along with its context, alternatives considered, and trade-offs accepted. ADRs preserve the **why** behind decisions — information that is lost in code, commit messages, and PR descriptions over time.

## When to Write an ADR

Write an ADR when your change involves any of these:

1. **You rejected an alternative** — There were 2+ reasonable approaches and you picked one. The code shows what you chose but not what you didn't.
2. **You accepted a trade-off** — Something got worse (performance, complexity, a minor leak) in exchange for something more important.
3. **You changed a resource lifecycle or ownership model** — Who creates, owns, or cleans up a Kubernetes resource.
4. **You changed an external contract** — API shape, CRD fields, Secret/ConfigMap naming conventions, image paths.

**Don't** write an ADR for: bug fixes, dependency bumps, refactors preserving behavior, test additions, docs updates, or performance optimizations with no trade-offs.

## Lifecycle

| Status | Meaning |
|--------|---------|
| **Proposed** | Under discussion, not yet accepted |
| **Accepted** | Decision is in effect |
| **Deprecated** | Decision is outdated but not yet replaced |
| **Superseded** | Replaced by a newer ADR (link to it in the status line) |

To supersede an ADR, update its status to `Superseded by [new ADR title](new-adr-file.md)` and create a new ADR.

## Naming Convention

Files use date-prefixed slugs: `YYYY-MM-DD-short-description.md`

Example: `2026-05-11-backup-auth-secret-lifecycle.md`

## Creating a New ADR

Copy [TEMPLATE.md](TEMPLATE.md) and fill in the sections. See existing ADRs for examples.
51 changes: 51 additions & 0 deletions adr/TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# [Title of Decision]

**Status**: Proposed | Accepted | Deprecated | Superseded by [title](link)
**Date**: YYYY-MM-DD
**Deciders**: DevWorkspace Operator maintainers
**Related Issue**: [ISSUE-ID](link) *(optional)*

## Context

What is the problem or need that prompted this decision? Include enough background for someone unfamiliar with the issue to understand why a decision was needed.

## Decision

What did we decide to do? Be specific about what changes and what stays the same.

## Considered Alternatives

### Alternative 1: [Name]

Brief description.

**Rejected because**:
- Reason 1
- Reason 2

### Alternative 2: [Name]

Brief description.

**Rejected because**:
- Reason 1
- Reason 2

## Consequences

### Positive

1. Benefit 1
2. Benefit 2

### Negative

1. Downside or accepted trade-off

### Neutral

1. Side effects that are neither positive nor negative

## References

- Links to relevant code, docs, PRs, or external resources
Loading