feat: add PostgreSQL replication health auto rebuild #40

Open
im0x0ing wants to merge 1 commit into labring:fix/v0.9.3 from im0x0ing:feat/postgres-replication-health-guard
@im0x0ing

Collaborator

Check PostgreSQL leader and replica replication state with read-only SQL, report replication readiness on Component status, and create a guarded RebuildInstance OpsRequest after a replica failure persists beyond the configured window.

The health guard finds the PostgreSQL primary and running replicas from the Component workload, then verifies primary-side replication slots and replica-side WAL receiver state. It reports the result through the PostgreSQLReplicationReady condition so users can see whether replication is healthy without manually execing into pods.
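As a rough illustration of the check described above, the sketch below classifies one replica from the two signals the description names: whether its replication slot is active on the primary, and what the replica's WAL receiver reports. The queries use the standard `pg_replication_slots` and `pg_stat_wal_receiver` views, but the exact SQL, the `Failure` type, and `ClassifyReplica` are assumptions made for this example, not code from the PR.

```go
package main

// Read-only queries a check like this might run. These are assumed for
// illustration; the PR's actual SQL may differ.
const (
	// Run on the primary: one row per replication slot.
	slotQuery = `SELECT slot_name, active FROM pg_replication_slots`
	// Run on a replica: zero rows means no WAL receiver process exists.
	walReceiverQuery = `SELECT status FROM pg_stat_wal_receiver`
)

// Failure is a hypothetical name for the reason a replica is unhealthy.
type Failure string

const (
	None                    Failure = ""
	InactiveSlot            Failure = "InactiveSlot"
	MissingWALReceiver      Failure = "MissingWALReceiver"
	NonStreamingWALReceiver Failure = "NonStreamingWALReceiver"
)

// ClassifyReplica maps the query results for one replica to a Failure.
// slotActive is the `active` column of the replica's slot on the primary.
// walReceiverStatus is nil when pg_stat_wal_receiver returned no row.
func ClassifyReplica(slotActive bool, walReceiverStatus *string) Failure {
	switch {
	case !slotActive:
		return InactiveSlot
	case walReceiverStatus == nil:
		return MissingWALReceiver
	case *walReceiverStatus != "streaming":
		return NonStreamingWALReceiver
	default:
		return None
	}
}
```

Keeping the classification a pure function of the query results would let it be unit-tested without a database; a caller could then translate a `None` result into a true `PostgreSQLReplicationReady` condition.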

When a rebuildable replica failure such as an inactive slot, missing WAL receiver, or non-streaming WAL receiver remains unhealthy long enough, the controller creates a RebuildInstance OpsRequest for that replica. The auto-rebuild path is protected by the failure window, cooldown, running OpsRequest detection, duplicate DAG checks, and leader-pod exclusion to avoid repeated or conflicting rebuilds.

Also add the OpsRequest RBAC marker and focused replication health tests.
