Add cleanup job support for pre-deletion hooks via Kubernetes Jobs #48

Merged
ullbergm merged 10 commits into main from copilot/support-custom-cleanup-scripts
Nov 16, 2025

Copilot AI commented Oct 29, 2025

Enables custom cleanup scripts to run before expired lease objects are deleted. Users can configure Jobs via annotations to back up data, notify webhooks, or clean up related resources.

Changes

Core Implementation

  • New annotations (7 total): on-delete-job (ConfigMap/script reference), job-service-account, job-image, job-wait, job-timeout, job-ttl, job-backoff-limit
  • pkg/util/cleanup_job.go: Job creation, config parsing, and completion polling with timeout
  • pkg/controllers/lease_controller.go: Modified handleExpired() to check for cleanup config and execute jobs before deletion
    • Best-effort semantics: cleanup failures never block deletion
    • Supports sync (wait) and async (fire-and-forget) modes
  • RBAC: Added batch/jobs permissions to ClusterRole

Observability

  • Metrics: cleanup_jobs_created_total, cleanup_jobs_failed_total, cleanup_jobs_completed_total, cleanup_job_duration_seconds
  • Events: CleanupJobCreated, CleanupJobCompleted, CleanupJobFailed, CleanupJobTimeout

Environment Variables

Jobs receive 11 env vars: OBJECT_NAME, OBJECT_NAMESPACE, OBJECT_KIND, OBJECT_GROUP, OBJECT_VERSION, OBJECT_UID, OBJECT_RESOURCE_VERSION, LEASE_STARTED_AT, LEASE_EXPIRED_AT, OBJECT_LABELS (JSON), OBJECT_ANNOTATIONS (JSON)

Example Usage

apiVersion: startpunkt.ullberg.us/v1alpha2
kind: Application
metadata:
  name: demo-app
  annotations:
    object-lease-controller.ullberg.io/ttl: "2h"
    object-lease-controller.ullberg.io/on-delete-job: "cleanup-scripts/backup.sh"
    object-lease-controller.ullberg.io/job-service-account: "backup-sa"
    object-lease-controller.ullberg.io/job-image: "amazon/aws-cli:latest"
    object-lease-controller.ullberg.io/job-wait: "true"
    object-lease-controller.ullberg.io/job-timeout: "10m"
spec:
  name: Demo Application

The ConfigMap script has access to all object metadata via env vars. Jobs are cleaned up automatically via ttlSecondsAfterFinished. See examples/cleanup/ for S3 backup, webhook, and multi-resource cleanup examples.

Testing

  • 10 unit tests covering config parsing, job creation, and completion polling
  • Zero regressions, CodeQL clean

Original prompt

This section details the original issue you should resolve

<issue_title>Feature: Support custom cleanup scripts via Kubernetes Jobs before object deletion</issue_title>
<issue_description>## Summary

Add support for executing custom cleanup scripts before deleting expired lease objects. Scripts would run as Kubernetes Jobs with proper RBAC and secret access via ServiceAccount bindings.

Motivation

Users often need to perform cleanup actions before an object is deleted, such as:

  • Backing up data to external storage (S3, GCS, etc.)
  • Notifying external systems or webhooks
  • Cleaning up dependent resources not covered by owner references
  • Archiving logs or metrics
  • Deregistering from external registries or service meshes
  • Graceful shutdown procedures

Currently, when a lease expires, the object is immediately deleted without any opportunity for custom cleanup logic.

Proposed Solution

User Experience

1. Create a cleanup script in a ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: cleanup-scripts
  namespace: my-namespace
data:
  backup-to-s3.sh: |
    #!/bin/bash
    set -e
    echo "Backing up $OBJECT_NAME from namespace $OBJECT_NAMESPACE"
    
    # Fetch the object
    kubectl get $OBJECT_KIND $OBJECT_NAME -n $OBJECT_NAMESPACE -o yaml > /tmp/backup.yaml
    
    # Upload to S3 (credentials from ServiceAccount)
    aws s3 cp /tmp/backup.yaml s3://backups/$OBJECT_NAMESPACE/$OBJECT_KIND/$OBJECT_NAME-$(date +%s).yaml
    
    echo "Backup complete"

2. Create a ServiceAccount with necessary permissions and secrets

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleanup-sa
  namespace: my-namespace
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: my-namespace
  annotations:
    kubernetes.io/service-account.name: cleanup-sa
type: Opaque
data:
  aws-access-key-id: <base64-encoded>
  aws-secret-access-key: <base64-encoded>
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cleanup-role
  namespace: my-namespace
rules:
- apiGroups: ["startpunkt.ullberg.us"]
  resources: ["applications"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["configmaps", "secrets", "pods"]
  verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cleanup-binding
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cleanup-role
subjects:
- kind: ServiceAccount
  name: cleanup-sa
  namespace: my-namespace

3. Annotate the resource with cleanup configuration

apiVersion: startpunkt.ullberg.us/v1alpha2
kind: Application
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    # Existing lease configuration
    object-lease-controller.ullberg.io/ttl: "2h"
    
    # New cleanup job configuration
    object-lease-controller.ullberg.io/on-delete-job: "cleanup-scripts/backup-to-s3.sh"
    object-lease-controller.ullberg.io/job-image: "amazon/aws-cli:latest"
    object-lease-controller.ullberg.io/job-service-account: "cleanup-sa"
    object-lease-controller.ullberg.io/job-wait: "true"
    object-lease-controller.ullberg.io/job-timeout: "5m"
    object-lease-controller.ullberg.io/job-ttl: "300"
spec:
  name: My Application
  url: https://example.com

Behavior

When a lease expires:

  1. Controller checks for on-delete-job annotation
  2. If present, creates a Kubernetes Job:
    • Mounts the script from the specified ConfigMap
    • Runs with the specified ServiceAccount (secrets bound via SA)
    • Passes object metadata as environment variables
    • Optionally waits for completion (configurable)
  3. After Job completes (or if no hook configured), deletes the object
  4. Job auto-cleans up after job-ttl seconds (via ttlSecondsAfterFinished)

New Annotations

| Annotation | Required | Default | Description |
| --- | --- | --- | --- |
| object-lease-controller.ullberg.io/on-delete-job | Yes* | - | ConfigMap reference in format configmap-name/script-key |
| object-lease-controller.ullberg.io/job-service-account | No | default | ServiceAccount to run the Job as (with bound secrets) |
| object-lease-controller.ullberg.io/job-image | No | bitnami/kubectl:latest | Container image for running the script |
| object-lease-controller.ullberg.io/job-wait | No | false | Wait for Job completion before deleting object |
| object-lease-controller.ullberg.io/job-timeout | No | 5m | Maximum time to wait for Job completion |
| object-lease-controller.ullberg.io/job-ttl | No | 300 | TTL in seconds for Job cleanup (ttlSecondsAfterFinished) |
| object-lease-controller.ullberg.io/job-backoff-limit | No | 3 | Number of retries for failed Jobs |

* Only if cleanup hook is desired

Environment Variables Available in Script

The cleanup script receives these environment variables:

  • OBJECT_NAME - Name of the object being deleted
  • `OBJEC...

@github-actions github-actions Bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 29, 2025
@ullbergm ullbergm added the enhancement Enhancement of the code, not introducing new features. label Oct 29, 2025
@github-actions github-actions Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/source Changes to the source. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/docs Changes made to the documentation. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 29, 2025
Copilot AI changed the title [WIP] Add support for custom cleanup scripts via Kubernetes Jobs Add cleanup job support for pre-deletion hooks via Kubernetes Jobs Oct 29, 2025
Copilot AI requested a review from ullbergm October 29, 2025 12:42

codecov Bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 98.44358% with 4 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cmd/main.go | 80.95% | 4 Missing ⚠️ |

@ullbergm ullbergm requested a review from Copilot October 29, 2025 14:59
Copilot AI left a comment

Pull Request Overview

This PR adds cleanup job functionality to the object-lease-controller, enabling custom scripts to run via Kubernetes Jobs before expired objects are deleted. This feature allows users to perform tasks like backing up data, notifying external systems, or cleaning up related resources.

  • Introduces a new CleanupJobConfig struct and utility functions for parsing cleanup job annotations and creating Kubernetes Jobs
  • Integrates cleanup job execution into the lease expiration workflow with optional synchronous/asynchronous modes
  • Adds new Prometheus metrics for tracking cleanup job lifecycle (created, completed, failed, duration)

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/util/cleanup_job.go | Core implementation for parsing cleanup job config, creating Jobs, and waiting for completion |
| pkg/util/cleanup_job_test.go | Comprehensive test coverage for cleanup job utilities |
| pkg/controllers/lease_controller.go | Integrates cleanup job execution into lease expiration handling |
| pkg/metrics/metrics.go | Adds four new Prometheus metrics for cleanup job monitoring |
| cmd/main.go | Registers cleanup job annotations and adds batch/v1 scheme |
| object-lease-operator/helm-charts/leasecontroller/templates/role.yaml | Adds RBAC permissions for Job operations |
| examples/cleanup/*.yaml | Three complete example scenarios with documentation |
| README.md | Documents cleanup job feature, annotations, and environment variables |
| go.mod, go.sum | Dependency updates |

Comment thread pkg/util/cleanup_job.go Outdated
Comment thread pkg/controllers/lease_controller.go
Comment thread examples/cleanup/cleanup-related-resources.yaml
Comment thread pkg/util/cleanup_job.go
@ullbergm ullbergm marked this pull request as ready for review October 29, 2025 20:00
@ullbergm ullbergm added the no-stale This is exempt from the stale bot. label Nov 13, 2025
@ullbergm ullbergm force-pushed the main branch 3 times, most recently from cb07a8d to 7f99612 Compare November 16, 2025 04:26
Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
@ullbergm ullbergm force-pushed the copilot/support-custom-cleanup-scripts branch from e074391 to 98d6de8 Compare November 16, 2025 12:14
…tion

Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
… for consistency

Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
…allation

Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
@github-actions github-actions Bot added the area/github Changes made in the github directory. label Nov 16, 2025
…missions

Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
Signed-off-by: Magnus Ullberg <magnus@ullberg.us>
@ullbergm ullbergm merged commit 0bb705d into main Nov 16, 2025
7 checks passed
@ullbergm ullbergm deleted the copilot/support-custom-cleanup-scripts branch November 16, 2025 14:38
@github-actions

Pull Request closed and locked due to lack of activity.
If you'd like to build on this closed PR, you can clone it using this method: https://stackoverflow.com/a/14969986
Then open a new PR, referencing this closed PR in your message.

@github-actions github-actions Bot locked and limited conversation to collaborators Nov 24, 2025
Labels

  • area/docs: Changes made to the documentation.
  • area/github: Changes made in the github directory.
  • area/source: Changes to the source.
  • enhancement: Enhancement of the code, not introducing new features.
  • no-stale: This is exempt from the stale bot.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Support custom cleanup scripts via Kubernetes Jobs before object deletion

3 participants