Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .claude/CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,7 @@
See @../AGENTS.md for project guidelines.

See @../redhat-compliance-and-responsible-ai.md for Red Hat compliance and responsible AI rules.

## Architecture Decision Records

Before marking work complete, check: did you reject an alternative or accept a trade-off? If yes, suggest creating an ADR in `adr/`. Before proposing changes, check existing ADRs for prior decisions that might be affected. See [adr/README.md](../adr/README.md) for conventions.
24 changes: 23 additions & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -748,16 +748,38 @@ Use this to trigger a reconciliation without changing the spec.
kubectl patch dw <name> --type merge -p "{\"metadata\": {\"annotations\": {\"force-update\": \"$(date +%s)\"}}}"
```

## Architecture Decision Records (ADRs)

ADRs live in the [`adr/`](adr/) directory. They capture the **why** behind non-trivial design decisions — information that is lost in code and PR descriptions over time.

### When to Create an ADR

Before marking work complete, ask: **did I reject an alternative or accept a trade-off?** If yes, create an ADR.

Specifically, write an ADR when your change involves any of these:

1. **You rejected an alternative** — There were 2+ reasonable approaches and you picked one
2. **You accepted a trade-off** — Something got worse in exchange for something more important
3. **You changed a resource lifecycle or ownership model** — Who creates, owns, or cleans up a Kubernetes resource
4. **You changed an external contract** — API shape, CRD fields, Secret/ConfigMap naming, image paths

**Don't** write an ADR for: bug fixes, dependency bumps, refactors preserving behavior, test additions, docs, or performance optimizations with no trade-offs.

### How to Create an ADR

Copy [`adr/TEMPLATE.md`](adr/TEMPLATE.md) to `adr/YYYY-MM-DD-short-slug.md` and fill in all sections. See [`adr/AGENTS.md`](adr/AGENTS.md) for maintenance conventions.

## See Also

- **[README.md](README.md)** - Project overview, installation, what is DevWorkspace Operator
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Development setup, debugging, testing, contribution guidelines
- **[redhat-compliance-and-responsible-ai.md](redhat-compliance-and-responsible-ai.md)** - Red Hat compliance & responsible AI rules
- **[adr/](adr/)** - Architecture Decision Records
- **[docs/additional-configuration.adoc](docs/additional-configuration.adoc)** - DevWorkspace configuration options
- **[docs/unsupported-devfile-api.adoc](docs/unsupported-devfile-api.adoc)** - Unsupported Devfile API features
- **[Devfile Documentation](https://devfile.io/)** - Devfile specification
- **[Kubebuilder Book](https://book.kubebuilder.io/)** - Controller patterns and best practices

---

**Last Updated**: 2025-12-24
**Last Updated**: 2026-05-15
97 changes: 97 additions & 0 deletions adr/2026-05-11-backup-auth-secret-lifecycle.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# Backup Auth Secret Must Not Be Owned by Any Workspace

**Status**: Accepted
**Date**: 2026-05-11
**Deciders**: DevWorkspace Operator maintainers
**Related Issue**: [CRW-10760](https://redhat.atlassian.net/browse/CRW-10760)

## Context

The backup system copies the registry authentication secret (e.g., quay.io credentials) from the operator namespace into each workspace namespace as `devworkspace-backup-registry-auth`. The original implementation set a Kubernetes controller ownerReference from this secret to the DevWorkspace that triggered the copy:

```go
controllerutil.SetControllerReference(workspace, desiredSecret, scheme)
```

This was likely intended to clean up the secret when no workspaces need it anymore — standard Kubernetes garbage collection pattern.

However, two properties of this secret make ownerReference-based lifecycle management incorrect:

1. **The secret is a namespace singleton**: All workspaces in a namespace share the same `devworkspace-backup-registry-auth` secret, but a Kubernetes object can have only one controller owner. Whichever workspace's backup job ran last "wins" ownership.

2. **The secret is needed after all workspaces are deleted**: The primary restore use case is creating a new workspace from a backup of a deleted one. If the auth secret is garbage-collected when the last workspace is deleted, the user cannot authenticate to the private registry to pull the backup image.

**Bug observed (CRW-10760)**: When using quay.io (private registry) for backups, deleting a workspace caused backup entries to disappear from the Dashboard backup list for ALL workspaces in the namespace. The auth secret was garbage-collected, and the Dashboard could no longer query the registry.

**Validated on CRC cluster** (DWO 0.40.1, quay.io/okurinny):
- The `devworkspace-backup-registry-auth` secret was confirmed to have ownerReference to a single workspace (`nodejs`)
- Deleting that workspace triggered K8s GC, removing the secret
- The secret was not re-created by subsequent backup cycles (remaining workspaces already had recent backups)
- Backup listing in the Dashboard failed for all workspaces

## Decision

**Remove the ownerReference from the backup registry auth secret.** The secret becomes a namespace-scoped resource with no ownership tie to any workspace.

### What Changes

- `pkg/secrets/backup.go`: Remove the `controllerutil.SetControllerReference()` call in `CopySecret()`
- The secret is still created via `c.Create()` with `AlreadyExists` handling, just without an ownerReference
- The restore path now resolves the operator namespace via `infrastructure.GetNamespace()` to copy the secret on demand when it is missing

### What Doesn't Change

- Per-workspace resources (job runner ServiceAccount, image-builder RoleBinding) retain their ownerReferences — their GC on workspace deletion is correct and expected
- The backup Job itself retains its ownerReference (short-lived, TTL-cleaned)
- The `CopySecret` function signature stays the same

## Considered Alternatives

### Alternative 1: Multi-owner references (non-controller)

Add each workspace as a non-controller owner of the secret. K8s GC would only delete the secret when ALL owning workspaces are gone.

**Rejected because**:
- The secret must survive deletion of ALL workspaces (for restore)
- Adds complexity to track and merge owner lists
- Non-controller ownerReferences have subtle GC semantics

### Alternative 2: Finalizer-based cleanup

Remove ownerReference but add a cleanup mechanism (e.g., a controller that deletes the secret when the namespace has zero DevWorkspaces).

**Rejected because**:
- Adds complexity for a marginal benefit (one small secret in an otherwise-empty namespace)
- Could race with restore operations (user creates a workspace from backup right after the last one is deleted)
- Namespace deletion already cleans up all resources

### Alternative 3: Per-workspace auth secrets

Create a unique auth secret per workspace (e.g., `devworkspace-backup-registry-auth-{workspace-id}`).

**Rejected because**:
- Multiplies secrets unnecessarily (all contain the same credentials)
- The restore path expects the predefined name `devworkspace-backup-registry-auth`
- Still wouldn't survive workspace deletion for restore use case

## Consequences

### Positive

1. **Backups survive workspace deletion**: Users can delete all workspaces and still restore from backups
2. **No cross-workspace interference**: Deleting one workspace no longer affects other workspaces' backup capabilities
3. **Simpler lifecycle**: No ownership tracking needed for a namespace-scoped singleton

### Negative

1. **Secret persists in empty namespaces**: If all workspaces are deleted and the user never restores, the auth secret remains until the namespace is deleted. This is a minor leak — one small secret per namespace.

### Neutral

1. **Existing secrets on upgraded clusters**: Secrets created by older DWO versions will retain their stale ownerReference until the next `CopySecret` call overwrites them (via `SyncObjectWithCluster`). In the worst case, one more GC event occurs before the fix takes effect.

## References

- `pkg/secrets/backup.go` — `CopySecret()` function
- `controllers/backupcronjob/rbac.go` — Per-workspace SA/RoleBinding (unchanged)
- `pkg/constants/metadata.go:204` — `DevWorkspaceBackupAuthSecretName`
32 changes: 32 additions & 0 deletions adr/AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# ADR Maintenance Guide for AI Agents

This guide covers how to create and maintain ADRs. For when to create an ADR, see the root [AGENTS.md](../AGENTS.md).

## Creating a New ADR

1. Copy [TEMPLATE.md](TEMPLATE.md) to a new file named `YYYY-MM-DD-short-slug.md`
2. Use today's date and a descriptive lowercase slug with hyphens
3. Fill in all sections — the "Considered Alternatives" section is the most valuable part
4. Set status to **Proposed** if seeking feedback, or **Accepted** if the decision is final

## Naming Convention

`YYYY-MM-DD-short-description.md`

- Date prefix for chronological ordering
- Lowercase, hyphens only, no special characters
- Keep the slug short but descriptive

## Superseding an ADR

When a decision is replaced:

1. Update the old ADR's status line: `**Status**: Superseded by [New Title](YYYY-MM-DD-new-slug.md)`
2. Create the new ADR with its own context explaining why the previous decision changed
3. Reference the old ADR in the new one's Context section

## Format Rules

- Follow the structure in [TEMPLATE.md](TEMPLATE.md)
- End every file with a trailing newline
- Use the existing ADRs as examples for tone and level of detail
39 changes: 39 additions & 0 deletions adr/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Architecture Decision Records (ADRs)

This directory contains Architecture Decision Records for the DevWorkspace Operator.

## What is an ADR?

An ADR captures a significant design decision along with its context, alternatives considered, and trade-offs accepted. ADRs preserve the **why** behind decisions — information that is lost in code, commit messages, and PR descriptions over time.

## When to Write an ADR

Write an ADR when your change involves any of these:

1. **You rejected an alternative** — There were 2+ reasonable approaches and you picked one. The code shows what you chose but not what you didn't.
2. **You accepted a trade-off** — Something got worse (performance, complexity, a minor leak) in exchange for something more important.
3. **You changed a resource lifecycle or ownership model** — Who creates, owns, or cleans up a Kubernetes resource.
4. **You changed an external contract** — API shape, CRD fields, Secret/ConfigMap naming conventions, image paths.

**Don't** write an ADR for: bug fixes, dependency bumps, refactors preserving behavior, test additions, docs updates, or performance optimizations with no trade-offs.

## Lifecycle

| Status | Meaning |
|--------|---------|
| **Proposed** | Under discussion, not yet accepted |
| **Accepted** | Decision is in effect |
| **Deprecated** | Decision is outdated but not yet replaced |
| **Superseded** | Replaced by a newer ADR (link to it in the status line) |

To supersede an ADR, update its status to `Superseded by [new ADR title](new-adr-file.md)` and create a new ADR.

## Naming Convention

Files use date-prefixed slugs: `YYYY-MM-DD-short-description.md`

Example: `2026-05-11-backup-auth-secret-lifecycle.md`

## Creating a New ADR

Copy [TEMPLATE.md](TEMPLATE.md) and fill in the sections. See existing ADRs for examples.
51 changes: 51 additions & 0 deletions adr/TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# [Title of Decision]

**Status**: Proposed | Accepted | Deprecated | Superseded by [title](link)
**Date**: YYYY-MM-DD
**Deciders**: DevWorkspace Operator maintainers
**Related Issue**: [ISSUE-ID](link) *(optional)*

## Context

What is the problem or need that prompted this decision? Include enough background for someone unfamiliar with the issue to understand why a decision was needed.

## Decision

What did we decide to do? Be specific about what changes and what stays the same.

## Considered Alternatives

### Alternative 1: [Name]

Brief description.

**Rejected because**:
- Reason 1
- Reason 2

### Alternative 2: [Name]

Brief description.

**Rejected because**:
- Reason 1
- Reason 2

## Consequences

### Positive

1. Benefit 1
2. Benefit 2

### Negative

1. Downside or accepted trade-off

### Neutral

1. Side effects that are neither positive nor negative

## References

- Links to relevant code, docs, PRs, or external resources
Loading