
feat: incident detection automation #92

Merged
waltergalvao merged 9 commits into main from feat/incident-detection-implementation
Feb 27, 2026

Conversation

waltergalvao (Contributor) commented Feb 23, 2026

Greptile Summary

This PR implements automated incident detection by analyzing deployments for rollbacks, reverts, and hotfixes. The implementation detects incidents through three strategies: version-based rollback detection, GitHub revert PR pattern matching, and configurable hotfix pattern matching (via PR title, branch name, or labels).
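The three-strategy dispatch described above can be sketched roughly as follows. The Deployment shape, detector signatures, and the placeholder heuristics inside each detector are illustrative assumptions, not the actual service code; only the strategy order (rollback, then revert, then hotfix) is taken from the summary.

```typescript
// Illustrative sketch of the three detection strategies and their dispatch
// order. Shapes and heuristics are assumptions for demonstration only.
interface Deployment {
  id: number;
  version: string;
  prTitles: string[];
  sourceBranches: string[];
}

type Detector = (d: Deployment) => string | null; // returns detection method or null

// Placeholder heuristic: real rollback detection compares deployment versions.
const detectRollback: Detector = (d) =>
  d.version.startsWith("rollback-") ? "rollback" : null;

// GitHub's default revert PR title format (the hardcoded pattern noted below).
const detectRevert: Detector = (d) =>
  d.prTitles.some((t) => /^Revert "/.test(t)) ? "revert" : null;

// Placeholder: real hotfix detection uses configurable patterns.
const detectHotfix: Detector = (d) =>
  d.sourceBranches.some((b) => /^hotfix\//i.test(b)) ? "hotfix" : null;

// First matching strategy wins: rollback, then revert, then hotfix.
function detectIncident(d: Deployment): string | null {
  for (const detect of [detectRollback, detectRevert, detectHotfix]) {
    const method = detect(d);
    if (method) return method;
  }
  return null;
}
```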

Key changes:

  • Extended PullRequest schema with sourceBranch, body, and labels for incident detection
  • Implemented detection service using RE2 library for safe regex matching with 255-char limit
  • Created worker to process incident detection jobs triggered after deployment creation/linking
  • Refactored BullMQ queue initialization to support lazy loading and testing disablement
  • Added comprehensive integration tests with 1400+ lines covering all detection scenarios

Issues found:

  • Missing unique constraint on Incident model creates race condition risk for duplicate incidents (critical)
  • Hardcoded revert pattern only matches GitHub's default format, limiting detection coverage
  • Minor type safety concern with labels array casting

Test environment changes:

  • Changed integration test NODE_ENV from test to production - verify environment-specific logic still behaves correctly
  • Added Redis/Dragonfly container and disabled BullMQ during tests

Confidence Score: 3/5

  • This PR has solid implementation but requires addressing the database constraint issue before merge
  • Score reflects well-tested detection logic with proper security measures (RE2 for regex), but the missing unique constraint on Incident model creates a real risk of duplicate incidents under concurrent deployments. The hardcoded revert pattern and NODE_ENV change are lower priority concerns.
  • Pay close attention to apps/api/prisma/schema.prisma (missing unique constraint) and verify apps/api/package.json NODE_ENV change doesn't affect environment-specific behavior
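For reference, the constraint the review asks for would look roughly like the fragment below in Prisma schema syntax. The field names are assumptions taken from the review comments, not the actual Incident model:

```prisma
model Incident {
  id                Int      @id @default(autoincrement())
  workspaceId       Int
  causeDeploymentId Int
  fixDeploymentId   Int
  detectedAt        DateTime

  // One incident per cause deployment per workspace; a concurrent worker
  // that loses the race gets a P2002 unique-constraint error instead of
  // silently inserting a duplicate incident.
  @@unique([workspaceId, causeDeploymentId])
}
```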

Important Files Changed

  • apps/api/prisma/schema.prisma: Added sourceBranch, body, and labels columns to PullRequest model for incident detection. Missing unique constraint on Incident model creates race condition risk for duplicate incidents.
  • apps/api/src/app/incidents/services/incident-detection.service.ts: New service implementing three incident detection methods (rollback, revert, hotfix). Uses safe regex with the RE2 library. Hardcoded revert pattern and minor type safety concerns with labels.
  • apps/api/src/bull-mq/queues.ts: Refactored queue initialization to be lazy-loaded via initQueues() function. Added new AUTOMATION_INCIDENT_DETECTION queue. Safe implementation with proper initialization guards.
  • apps/api/src/lib/string.ts: New safeRegex() utility using the RE2 library with a 255-char limit and error handling. Prevents ReDoS attacks and handles invalid patterns gracefully.
  • apps/api/src/app/incidents/services/incident-detection.integration.test.ts: Comprehensive integration tests covering all detection methods (rollback, revert, hotfix) with positive and negative test cases. Well-structured test setup.
  • apps/api/package.json: Changed integration test NODE_ENV from test to production and added LOG_LEVEL=warn. Added re2 dependency for safe regex. Verify environment-specific logic still works correctly.
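As a rough sketch, a safeRegex-style helper along the lines described above could look like this. Native RegExp stands in for RE2 here (the re2 package is a native dependency); the case-insensitive flag is an assumption, and one later review comment notes the RE2 version at one point dropped it.

```typescript
// Sketch of a safeRegex-style helper: cap the pattern length and return null
// instead of throwing on invalid syntax. The real utility compiles with the
// RE2 engine, which rules out catastrophic backtracking by construction;
// the built-in RegExp is used here only as a stand-in.
const MAX_PATTERN_LENGTH = 255;

function safeRegex(pattern: string): RegExp | null {
  if (pattern.length > MAX_PATTERN_LENGTH) return null;
  try {
    // Case-insensitive, assuming user-facing hotfix settings should match
    // "Hotfix:" and "hotfix:" alike.
    return new RegExp(pattern, "i");
  } catch {
    return null; // invalid syntax: callers treat null as "no match"
  }
}
```

Callers can then guard on null rather than wrapping every `.test()` call in try/catch.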

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start[Deployment Created/Linked] --> Queue[Add to AUTOMATION_INCIDENT_DETECTION Queue]
    Queue --> Worker[Worker: automation-incident-detection]
    Worker --> CheckAutomation{Automation<br/>Enabled?}
    CheckAutomation -->|No| End[End]
    CheckAutomation -->|Yes| CheckCustomer{Active<br/>Customer?}
    CheckCustomer -->|No| End
    CheckCustomer -->|Yes| FetchDeployment[Fetch Deployment<br/>with PRs]
    
    FetchDeployment --> DetectRollback{Detect<br/>Rollback?}
    DetectRollback -->|Yes| ValidateSame{Cause == Fix<br/>Deployment?}
    DetectRollback -->|No| DetectRevert{Detect<br/>Revert?}
    
    DetectRevert -->|Yes| ValidateSame
    DetectRevert -->|No| DetectHotfix{Detect<br/>Hotfix?}
    
    DetectHotfix -->|Yes| ValidateSame
    DetectHotfix -->|No| End
    
    ValidateSame -->|Yes| Error[Throw DataIntegrityException]
    ValidateSame -->|No| CheckDuplicate{Incident<br/>Exists?}
    
    CheckDuplicate -->|Yes| Log[Log: Already Exists]
    CheckDuplicate -->|No| CreateIncident[Create Incident Record]
    
    Log --> End
    CreateIncident --> End
    Error --> End
    
    subgraph Detection Methods
        RollbackLogic[Rollback: Find previous<br/>deployment with same version]
        RevertLogic[Revert: Match PR title<br/>with revert pattern]
        HotfixLogic[Hotfix: Match configurable<br/>regex patterns]
    end

Last reviewed commit: c9d7a08

sweetr-dev Bot added the 'large' label (Large PR - Consider splitting up into smaller PRs to reduce risk and review time) on Feb 23, 2026

coderabbitai Bot commented Feb 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds sourceBranch, body, and labels to PullRequest; wires them through GitHub sync, tests, and seeds; implements incident-detection automation with detectors, incident creation, worker, and queue; enqueues detection jobs from deployment flows; switches BullMQ to lazy init and adds safeRegex utility.

Changes

Cohort / File(s) Summary
Database schema & migration
apps/api/prisma/schema.prisma, apps/api/prisma/migrations/.../migration.sql
Adds PullRequest columns: sourceBranch (text, default ""), body (text, default ""), labels (jsonb, default []).
GitHub PR sync
apps/api/src/app/github/services/github-pull-request.service.ts
Fetches headRefName, body, labels from GitHub and upserts into Prisma fields sourceBranch, body, labels.
Deployment flows
apps/api/src/app/deployment/services/deployment-create-from-merge.service.ts, apps/api/src/app/deployment/services/deployment-pr-linking.service.ts
Enqueues AUTOMATION_INCIDENT_DETECTION jobs (deploymentId, workspaceId) after deployment creation or PR linking.
Incident detection service & types
apps/api/src/app/incidents/services/incident-detection.service.ts, apps/api/src/app/incidents/services/incident-detection.types.ts
New automation entry handleIncidentDetectionAutomation({workspaceId, deploymentId}) with detectors: detectRollback, detectRevert, detectHotfix; creates incidents and deduplicates by cause/fix.
Worker & queue infra
apps/api/src/app/incidents/workers/automation-incident-detection.worker.ts, apps/api/src/bull-mq/queues.ts, apps/api/src/bull-mq/init-bull-mq.ts
Adds AUTOMATION_INCIDENT_DETECTION queue enum, introduces initQueues() lazy initialization, updates addJob helpers, and registers the new worker.
Bull board router
apps/api/src/bull-mq/bull-board.router.ts
Now calls initQueues() when constructing BullMQ adapters instead of using an eagerly-initialized queues constant.
Utilities
apps/api/src/lib/string.ts
Adds `safeRegex(pattern: string): RE2 | null` utility with a 255-char pattern limit and error handling.
Tests & seeds
apps/api/src/app/activity-events/services/work-log.unit.test.ts, apps/api/src/app/incidents/services/incident-detection.integration.test.ts, apps/api/test/seed/index.ts
Updated test builders to include new PR fields; added comprehensive incident-detection integration tests; extended seed helpers (seedPullRequest fields and new seedAutomation).
Config & CI
apps/api/package.json, apps/api/src/env.ts, .github/workflows/test-integration.yml
Added re2 dependency; adjusted integration test env and scripts; added BULLMQ_ENABLED flag and allowed empty values for several env vars.
Misc wiring
apps/api/src/bull-mq/bull-board.router.ts, apps/api/src/bull-mq/init-bull-mq.ts
Init flow now calls initQueues() early and guards startup with BULLMQ_ENABLED.
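The lazy-initialization pattern referenced in the queue changes typically looks something like the sketch below. The Queue shape, queue name, and the disable switch are simplified assumptions; the real code wires these to BullMQ and the BULLMQ_ENABLED env flag.

```typescript
// Simplified sketch of lazy queue initialization: queues are created on first
// use rather than at module load, so test runs (with BullMQ disabled) never
// need a Redis connection. Names and shapes are assumptions.
type Queue = { name: string };

enum SweetQueue {
  AUTOMATION_INCIDENT_DETECTION = "automation-incident-detection",
}

let queues: Map<SweetQueue, Queue> | null = null;

function initQueues(): Map<SweetQueue, Queue> {
  if (queues) return queues; // initialization guard: safe to call repeatedly
  queues = new Map(
    Object.values(SweetQueue).map((name) => [name as SweetQueue, { name }])
  );
  return queues;
}

function addJob(queue: SweetQueue, payload: object, bullMqEnabled = true): void {
  if (!bullMqEnabled) return; // stands in for the BULLMQ_ENABLED=false test path
  const q = initQueues().get(queue); // queues materialize on first job
  void q;
  void payload; // a real implementation would enqueue the payload on q here
}
```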

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • sweetrdev
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title 'feat: incident detection automation' clearly and concisely summarizes the main change: adding automated incident detection with three detection methods (rollback, revert, hotfix).
  • Description check: ✅ Passed. The PR description comprehensively describes the changeset, detailing the incident detection automation feature with rollback, revert, and hotfix detection methods, database schema changes, BullMQ integration, and testing.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/api/src/app/incidents/services/incident-detection.service.ts">

<violation number="1" location="apps/api/src/app/incidents/services/incident-detection.service.ts:213">
P2: Invalid regex strings in hotfix settings will throw in `new RegExp(...)` and crash incident detection. Guard regex compilation (e.g., try/catch or pre-validate) so a bad pattern doesn't break the automation run.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts Outdated

@greptile-apps greptile-apps Bot left a comment


19 files reviewed, 10 comments

Edit Code Review Agent Settings | Greptile

Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts Outdated

@greptile-apps greptile-apps Bot left a comment


10 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts Outdated
Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/api/src/app/deployment/services/deployment-pr-linking.service.ts (1)

108-165: ⚠️ Potential issue | 🟡 Minor

detectRollback may be bypassed for API-triggered deployments.

The AUTOMATION_INCIDENT_DETECTION job is only enqueued when filteredPullRequests is non-empty (the empty-list early-return at line 133 exits before line 162). For an API-triggered rollback, GitHub's compareCommits returns an empty commits array when the head is behind the base (status: "behind", which maps to changeType: ROLLBACK), so findMergedPullRequestsByCommitHashes finds no PRs, filteredPullRequests is empty, and the function returns before enqueuing the job. Since detectRollback operates solely on deployment versions (no PR dependency), this silently skips rollback detection for all API-triggered rollback deployments.

Consider unconditionally enqueueing the job at the end of handleDeploymentPullRequestAutoLinking after updateDeploymentChangeType is called (and skipping only for BASELINE), or at least enqueuing it even when no PRs are found so that detectRollback can still run.
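The suggested fix could be sketched as below. The function and enum names are taken from the review comment, the surrounding linking logic is elided, and the in-memory addJob is a stand-in for the BullMQ helper:

```typescript
// Sketch of the reviewer's suggestion: always enqueue incident detection after
// the change type is known, skipping only BASELINE deployments, so rollback
// detection runs even when no PRs were linked. Names are assumptions.
enum DeploymentChangeType {
  BASELINE = "BASELINE",
  ROLLBACK = "ROLLBACK",
  FEATURE = "FEATURE",
}

type JobPayload = { deploymentId: number; workspaceId: number };
const enqueued: Array<{ queue: string } & JobPayload> = [];

function addJob(queue: string, payload: JobPayload): void {
  enqueued.push({ queue, ...payload }); // stand-in for the real BullMQ addJob
}

function finishAutoLinking(
  deploymentId: number,
  workspaceId: number,
  changeType: DeploymentChangeType,
  linkedPullRequestIds: number[]
): void {
  // ...PR lookup and linkPullRequestsToDeployment would run here when
  // linkedPullRequestIds is non-empty, but no longer return early...
  void linkedPullRequestIds;
  if (changeType !== DeploymentChangeType.BASELINE) {
    addJob("AUTOMATION_INCIDENT_DETECTION", { deploymentId, workspaceId });
  }
}
```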

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/app/deployment/services/deployment-pr-linking.service.ts` around
lines 108 - 165, The code currently returns early when filteredPullRequests is
empty which prevents enqueuing the AUTOMATION_INCIDENT_DETECTION job and thus
skips detectRollback for API-triggered rollbacks; update
handleDeploymentPullRequestAutoLinking so that after computing changeType (or
after calling updateDeploymentChangeType) you always call
addJob(SweetQueue.AUTOMATION_INCIDENT_DETECTION, { deploymentId, workspaceId })
unless changeType === DeploymentChangeType.BASELINE (or other explicit skip),
instead of only calling addJob when filteredPullRequests is non-empty—ensure you
keep the existing linking logic (findMergedPullRequestsByCommitHashes,
filterPullRequestsBySubdirectory, linkPullRequestsToDeployment) but remove the
early return that prevents the job enqueue for empty filteredPullRequests so
detectRollback can run for API-triggered rollbacks.
🧹 Nitpick comments (1)
apps/api/src/app/github/services/github-pull-request.service.ts (1)

130-134: Consider using GITHUB_MAX_PAGE_LIMIT instead of the hardcoded 100 for consistency.

The rest of the file uses the GITHUB_MAX_PAGE_LIMIT constant (e.g., files(first: ${GITHUB_MAX_PAGE_LIMIT})). Using the same constant here keeps the limit in one place and avoids a silent mismatch if the constant ever changes.

♻️ Proposed change
-            labels(first: 100) {
+            labels(first: ${GITHUB_MAX_PAGE_LIMIT}) {
               nodes {
                 name
               }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/app/github/services/github-pull-request.service.ts` around lines
130 - 134, Replace the hardcoded labels(first: 100) limit with the shared
constant GITHUB_MAX_PAGE_LIMIT to match the rest of the file; locate the GraphQL
labels selection in github-pull-request.service.ts (the labels(...) block used
when building the pull request query) and change the literal 100 to use the
GITHUB_MAX_PAGE_LIMIT constant (same interpolation/style used for files(first:
${GITHUB_MAX_PAGE_LIMIT})), ensuring consistency and a single source of truth
for page limits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/src/app/incidents/services/incident-detection.service.ts`:
- Around line 175-191: In detectRevert, originalPr.deploymentEvents is currently
searched with de => de.deploymentId !== deployment.id which can pick a
deployment from the wrong environment; change the search to filter
deploymentEvents to those whose applicationId and environmentId match the
current deployment (i.e., de.applicationId === deployment.applicationId &&
de.environmentId === deployment.environmentId && de.deploymentId !==
deployment.id) before selecting a cause, and ensure the Prisma query that loads
originalPr.deploymentEvents includes applicationId and environmentId (or the
full deployment relation) so those fields are available for this check; update
the variable name deploymentLink accordingly.
- Around line 63-82: Add a DB-level uniqueness constraint on Incident
(@@unique([workspaceId, causeDeploymentId])) in the schema via a migration, then
change the non-atomic check-then-create in handleDeploymentIncidentDetection to
a safe operation: either use Prisma's upsert on
getPrisma(workspaceId).incident.upsert(...) keyed by
workspaceId+causeDeploymentId, or call
getPrisma(workspaceId).incident.create(...) and catch & ignore the
unique-constraint violation for causeDeploymentId; reference the existing
getPrisma(...).incident.findFirst/check and getPrisma(...).incident.create calls
to locate the code to replace and handle the conflict.
- Around line 208-228: The detectHotfix logic currently compiles user-supplied
regexes inside the pullRequests.some() callback, risking ReDoS and repeated
compilation; hoist and validate/compile prTitleRegEx, branchRegEx, and
prLabelRegEx once before the .some loop (e.g., create compiledTitleRegex,
compiledBranchRegex, compiledLabelRegex), wrap each new RegExp(...) in a
try/catch to reject or skip invalid/unsafe patterns, and if possible switch to a
safe engine like re2 or implement a timeout/bounded-execution wrapper for regex
tests; then use the precompiled regex variables inside the some() callback
instead of reconstructing them per PR.


Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts
Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts
Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@apps/api/prisma/migrations/20260223044415_alter_table_pull_requests_add_columns/migration.sql`:
- Around line 1-4: This PR includes a schema migration that should be split into
a dedicated migrations PR: remove the migration file that alters the
"PullRequest" table (the migration adding columns "body", "labels", and
"sourceBranch") from this feature branch/commit, then create a new branch only
containing the generated migration SQL (the ALTER TABLE ... ADD COLUMN
statements) and open a separate PR for that migration; once the migration PR is
merged and applied, rebase this feature branch onto main so the feature code
(which references PullRequest.body, PullRequest.labels,
PullRequest.sourceBranch) depends on the already-merged migration.


@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
apps/api/src/app/incidents/services/incident-detection.service.ts (1)

64-87: ⚠️ Potential issue | 🟡 Minor

Non-atomic dedup check still allows duplicate incident inserts under concurrency.

findFirst then create is race-prone when two workers process the same payload concurrently.

Use a DB unique constraint for the dedup key and switch to conflict-safe write (upsert or create+unique-violation handling).

#!/bin/bash
set -euo pipefail

echo "== Check current incident dedup logic =="
rg -n -C3 'incident\.findFirst|incident\.create|incident\.upsert' apps/api/src/app/incidents/services/incident-detection.service.ts

echo
echo "== Check Incident model uniqueness constraints in Prisma schema files =="
fd 'schema\.prisma$' -x sh -c '
  echo "-- {} --"
  rg -n -C8 "model Incident|@@unique|workspaceId|causeDeploymentId|fixDeploymentId" "{}"
'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/app/incidents/services/incident-detection.service.ts` around
lines 64 - 87, Current findFirst then create flow
(getPrisma(...).incident.findFirst / .create) is racy; add a DB-level unique
constraint on the dedup key (e.g. Incident model @@unique on workspaceId,
causeDeploymentId, fixDeploymentId) and change the service to a conflict-safe
write: either use Prisma's upsert on the incident model (referencing
incident.upsert) or perform create inside a try/catch and swallow/handle
unique-constraint errors (detect Prisma unique constraint error code) so
concurrent workers cannot insert duplicates; keep references to
getPrisma(workspaceId), incident.create, and incident.upsert when making the
change.
🧹 Nitpick comments (1)
apps/api/test/seed/index.ts (1)

330-341: Type settings as Prisma.InputJsonValue at the API boundary instead of casting.

Using settings?: object + as Prisma.InputJsonValue cast defers type validation to runtime and weakens compile-time safety. All callsites pass JSON-compatible objects, so the signature should declare the intended type directly.

♻️ Proposed fix
 export async function seedAutomation(
   ctx: SeedWorkspace,
   input: {
     type: AutomationType;
     enabled?: boolean;
-    settings?: object;
+    settings?: Prisma.InputJsonValue;
   }
 ): Promise<{ automationId: number }> {
   const automation = await getPrisma(ctx.workspaceId).automation.create({
     data: {
       type: input.type,
       enabled: input.enabled ?? true,
-      settings: (input.settings ?? {}) as Prisma.InputJsonValue,
+      settings: input.settings ?? {},
       workspaceId: ctx.workspaceId,
     },
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/test/seed/index.ts` around lines 330 - 341, Change the input
parameter type so settings is declared as Prisma.InputJsonValue instead of a
plain object and remove the runtime cast where the value is passed to
getPrisma(...).automation.create; update the function signature (the input param
in the function that calls getPrisma(...).automation.create) from settings?:
object to settings?: Prisma.InputJsonValue so TypeScript enforces
JSON-compatible types for settings and you can pass settings directly to the
create data (used in the automation.create call) without using "as
Prisma.InputJsonValue".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/package.json`:
- Line 23: The "test:integration" npm script currently sets NODE_ENV=production
causing integration env validation failures; update the "test:integration"
script (the npm script named "test:integration" in package.json) to set
NODE_ENV=test instead of NODE_ENV=production so integration runs use the test
environment and avoid production-only behavior during tests.

In `@apps/api/src/app/incidents/services/incident-detection.service.ts`:
- Around line 124-133: Update the query that finds causeDeployment in
detectRollback so it cannot pick deployments that occurred after the current
deployment: modify the where.deployedAt clause in
getPrisma(...).deployment.findFirst to include an upper bound (e.g., deployedAt:
{ gt: rolledBackTo.deployedAt, lte: deployment.deployedAt } or deployedAt: { gt:
rolledBackTo.deployedAt, lt: deployment.deployedAt }) so the result is between
rolledBackTo.deployedAt and the current deployment's deployedAt; keep the
existing id not: deployment.id and other filters intact.

In `@apps/api/src/lib/string.ts`:
- Around line 3-8: safeRegex currently only catches syntax errors but allows
catastrophic backtracking patterns; update safeRegex to run a complexity check
with safe-regex2 (or equivalent) before constructing the RegExp: import the
checker, call it on the incoming pattern and if it flags unsafe, logger.warn
with the pattern and return null; only if the checker passes, proceed to new
RegExp(pattern, "i") and return it. Keep the function name safeRegex and its
return behavior (RegExp|null) so callers in incident-detection service (where
.test() is used against PR titles/branches/labels) get null for unsafe patterns
and avoid ReDoS; ensure the case-insensitive flag is preserved when compiling.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17460ad and 0bb7b72.

📒 Files selected for processing (10)
  • apps/api/package.json
  • apps/api/src/app/deployment/services/deployment-pr-linking.service.ts
  • apps/api/src/app/github/services/github-pull-request.service.ts
  • apps/api/src/app/incidents/services/incident-detection.integration.test.ts
  • apps/api/src/app/incidents/services/incident-detection.service.ts
  • apps/api/src/bull-mq/bull-board.router.ts
  • apps/api/src/bull-mq/init-bull-mq.ts
  • apps/api/src/bull-mq/queues.ts
  • apps/api/src/lib/string.ts
  • apps/api/test/seed/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/src/app/deployment/services/deployment-pr-linking.service.ts

Comment thread apps/api/package.json Outdated
Comment thread apps/api/src/app/incidents/services/incident-detection.service.ts
Comment thread apps/api/src/lib/string.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 9 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/api/src/lib/string.ts">

<violation number="1" location="apps/api/src/lib/string.ts:11">
P2: safeRegex now builds RE2 without the previous case-insensitive flag, so hotfix detection becomes case-sensitive and can miss existing patterns that relied on `i`. Preserve the original `i` behavior by passing the flag when constructing RE2.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread apps/api/src/lib/string.ts

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
apps/api/package.json (1)

23-23: ⚠️ Potential issue | 🔴 Critical

Set integration runs back to test environment.

Line 23 still runs integration tests with NODE_ENV=production, which can trigger prod-only paths and is consistent with the current integration pipeline failure.

🔧 Proposed fix
-    "test:integration": "cross-env LOG_LEVEL=warn NODE_ENV=production DATABASE_URL=postgresql://app_user:app_user@localhost:5433/sweetr_test SUPERUSER_DATABASE_URL=postgresql://postgres:postgres@localhost:5433/sweetr_test vitest run --project integration"
+    "test:integration": "cross-env LOG_LEVEL=warn NODE_ENV=test DATABASE_URL=postgresql://app_user:app_user@localhost:5433/sweetr_test SUPERUSER_DATABASE_URL=postgresql://postgres:postgres@localhost:5433/sweetr_test vitest run --project integration"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/package.json` at line 23, The "test:integration" npm script is
setting NODE_ENV=production which triggers prod-only behavior; update the script
(the "test:integration" entry in package.json) to use NODE_ENV=test (or remove
NODE_ENV so it defaults to test) instead of NODE_ENV=production so integration
tests run in the test environment; keep the other env vars (LOG_LEVEL,
DATABASE_URL, SUPERUSER_DATABASE_URL) and the vitest command unchanged.
🧹 Nitpick comments (2)
apps/api/src/app/incidents/services/incident-detection.service.ts (2)

64-87: Duplicate check may conflict with the schema's unique constraint.

The code checks for duplicates using both causeDeploymentId AND fixDeploymentId, but the schema has @@unique([workspaceId, causeDeploymentId]). If a unique constraint violation occurs (due to stricter DB constraint), this code path won't handle it gracefully—it would throw an unhandled exception.

Consider wrapping the create in a try-catch to handle constraint violations, or align the constraint with the code logic as noted in the schema review.

🛡️ Suggested defensive handling
+  try {
     await getPrisma(workspaceId).incident.create({
       data: {
         causeDeploymentId: result.causeDeploymentId,
         fixDeploymentId: result.fixDeploymentId,
         detectedAt: deployment.deployedAt,
         workspaceId,
       },
     });
+  } catch (error) {
+    if (error.code === 'P2002') {
+      logger.info("handleDeploymentIncidentDetection: Incident already exists (constraint)", {
+        causeDeploymentId: result.causeDeploymentId,
+      });
+      return;
+    }
+    throw error;
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/app/incidents/services/incident-detection.service.ts` around
lines 64 - 87, The current duplicate check uses incident.findFirst by
causeDeploymentId+fixDeploymentId but the DB enforces @@unique([workspaceId,
causeDeploymentId]), so incident.create(...) can still raise a unique-constraint
error; wrap the create call in a try-catch around
getPrisma(workspaceId).incident.create(...) (the block that writes
causeDeploymentId, fixDeploymentId, detectedAt: deployment.deployedAt,
workspaceId), catch PrismaClient known request errors
(Prisma.PrismaClientKnownRequestError) and handle code 'P2002' by logging a
clear message via logger.info (including causeDeploymentId, workspaceId and the
existingIncident context) and returning gracefully; for other errors rethrow or
log as error. Ensure you import/identify Prisma error class or check error.code
=== 'P2002' so the race-condition duplicate is handled safely.

236-239: Type cast for pr.labels could silently fail on malformed data.

The cast (pr.labels as string[]) assumes the JSON field always contains a string array. The cast itself cannot fail at runtime, so malformed data (e.g., a value stored as an object) passes through silently and then throws once .some is called on it during label iteration.

Consider adding a runtime guard for additional safety, though this is low risk if data ingestion is controlled:

🛡️ Optional defensive check
     if (compiledLabelRegex) {
-      const labels = (pr.labels as string[]) ?? [];
+      const labels = Array.isArray(pr.labels) ? (pr.labels as string[]) : [];
       if (labels.some((label) => compiledLabelRegex.test(label))) return true;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/app/incidents/services/incident-detection.service.ts` around
lines 236 - 239, The current cast (pr.labels as string[]) can hide malformed
data; update the check around compiledLabelRegex to first validate pr.labels is
an array of strings (e.g., Array.isArray(pr.labels) and typeof each element ===
'string') before iterating, default to an empty array if validation fails, and
optionally log or metric a warning; apply this change where compiledLabelRegex
and pr.labels are used in incident-detection.service.ts so the labels variable
is built from a safe, validated array.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test-integration.yml:
- Around line 70-82: The CI is missing required env vars declared in env.ts
(specifically PORT and FRONTEND_URL) causing envsafe failures; update the
test-integration workflow to export values for PORT and FRONTEND_URL (e.g.,
PORT=3000 and FRONTEND_URL=http://localhost:3000) alongside the existing env
entries so envsafe finds defaults, and ensure any other required variables from
env.ts are also present in the env block to satisfy validation.

In `@apps/api/prisma/schema.prisma`:
- Line 772: The database unique constraint on the Incident model currently
enforces @@unique([workspaceId, causeDeploymentId]) but the incident detection
logic checks duplicates using (workspaceId, causeDeploymentId, fixDeploymentId);
update one side to match: either change the Prisma schema unique to
@@unique([workspaceId, causeDeploymentId, fixDeploymentId]) so it allows
multiple fixDeploymentId values for the same cause, or modify the
duplicate-check in the incident detection service (the code that queries by
workspaceId, causeDeploymentId, fixDeploymentId) to only check by workspaceId
and causeDeploymentId; pick the option that matches the intended domain rule and
make the corresponding change to the Incident model or the duplicate-checking
logic.

In `@apps/api/src/app/incidents/services/incident-detection.integration.test.ts`:
- Around line 1-25: The integration tests fail due to missing required
environment variables used by the test setup (triggered during
createTestContextWithGitProfile in incident-detection.integration.test.ts);
update the CI/test setup to provide the envsafe-required variables or modify the
test harness to inject safe defaults/mocks: either add the missing environment
variables to the CI pipeline (repository secrets or workflow env) or update the
test bootstrap (createTestContextWithGitProfile / test/integration-setup env
loader) to load a test-specific .env or provide fallback values so tests can run
in CI without manual env configuration.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb7b72 and 36057e4.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (8)
  • .github/workflows/test-integration.yml
  • apps/api/package.json
  • apps/api/prisma/schema.prisma
  • apps/api/src/app/incidents/services/incident-detection.integration.test.ts
  • apps/api/src/app/incidents/services/incident-detection.service.ts
  • apps/api/src/bull-mq/init-bull-mq.ts
  • apps/api/src/env.ts
  • apps/api/src/lib/string.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/src/bull-mq/init-bull-mq.ts


greptile-apps Bot commented Feb 27, 2026

Additional Comments (1)

apps/api/prisma/schema.prisma
Missing unique constraint allows duplicate incidents from race conditions. If two incident detection jobs run concurrently for the same deployment, both can pass the findFirst check (line 64-70 in incident-detection.service.ts) and create duplicate incidents.

Add unique constraint:

model Incident {
  id Int @id @default(autoincrement())

  teamId Int?
  team   Team? @relation(fields: [teamId], references: [id], onDelete: SetNull)

  leaderId Int?
  leader   GitProfile? @relation(fields: [leaderId], references: [id], onDelete: SetNull)

  detectedAt DateTime
  resolvedAt DateTime?

  causeDeploymentId Int
  causeDeployment   Deployment @relation("CauseDeployment", fields: [causeDeploymentId], references: [id], onDelete: Cascade)

  fixDeploymentId Int?
  fixDeployment   Deployment? @relation("FixDeployment", fields: [fixDeploymentId], references: [id], onDelete: SetNull)

  postmortemUrl String?

  createdAt DateTime @default(now())
  updatedAt DateTime @default(now()) @updatedAt

  workspaceId Int
  workspace   Workspace @relation(fields: [workspaceId], references: [id], onDelete: Cascade)

  archivedAt DateTime?

  @@unique([workspaceId, causeDeploymentId, fixDeploymentId])
  @@index([causeDeploymentId])
  @@index([fixDeploymentId])
  @@index([teamId])
  @@index([leaderId])
  @@index([workspaceId, detectedAt])
}

coderabbitai[bot]
coderabbitai Bot previously approved these changes Feb 27, 2026
@waltergalvao waltergalvao merged commit e3dabeb into main Feb 27, 2026
10 of 12 checks passed
@waltergalvao waltergalvao deleted the feat/incident-detection-implementation branch February 27, 2026 07:50
): Promise<DetectionResult | null> => {
if (!settings.revert?.enabled) return null;

const revertPattern = /^Revert "(.+)"$/;

Hardcoded revert pattern only matches GitHub's exact `Revert "..."` title format. Won't detect reverts using conventional commit style (revert:), different casing, or manual revert messages. Unlike hotfix detection (which has configurable regex), revert detection lacks configurability.
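A configurable alternative could iterate over a pattern list, sketched here with plain RegExp for brevity (the service uses RE2 in production; the pattern list and function name below are illustrative, not the project's API):

```typescript
// Illustrative sketch only: broaden revert detection beyond GitHub's
// default `Revert "..."` title. Patterns and names are hypothetical.
const revertPatterns: RegExp[] = [
  /^Revert "(.+)"$/, // GitHub UI default
  /^revert:\s*(.+)$/i, // conventional-commit style
  /^\[revert\]\s*(.+)$/i, // bracketed prefix some teams use
];

const extractRevertedTitle = (title: string): string | null => {
  for (const pattern of revertPatterns) {
    const match = title.match(pattern);
    if (match) return match[1];
  }
  return null;
};
```

Making the list workspace-configurable (as hotfix detection already is) would keep the two strategies symmetric.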

if (compiledBranchRegex?.test(pr.sourceBranch)) return true;

if (compiledLabelRegex) {
const labels = (pr.labels as string[]) ?? [];

Type cast to string[] without runtime validation. If pr.labels contains malformed data (non-array), .some() on line 251 will throw. Consider Array.isArray() check:

const labels = Array.isArray(pr.labels) ? pr.labels : [];
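A slightly stronger runtime guard also rejects arrays containing non-string elements; the helper name below is hypothetical, not part of the codebase:

```typescript
// Hypothetical helper: validate an unknown JSON value into a string
// array, defaulting to [] on any malformed shape (object, string,
// or an array with non-string elements).
const toStringArray = (value: unknown): string[] =>
  Array.isArray(value) && value.every((v) => typeof v === "string")
    ? value
    : [];

// Usage at the call site:
// const labels = toStringArray(pr.labels);
```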


greptile-apps Bot commented Feb 27, 2026

Additional Comments (1)

apps/api/prisma/schema.prisma
Missing unique constraint to prevent duplicate incidents. Race condition between duplicate check (line 75-82 in incident-detection.service.ts) and insert (line 91-98) could create duplicate incidents for the same cause/fix deployment pair.

Add:

  @@index([causeDeploymentId])
  @@index([fixDeploymentId])
  @@index([teamId])
  @@index([leaderId])
  @@index([workspaceId, detectedAt])
  @@unique([workspaceId, causeDeploymentId, fixDeploymentId])

@coderabbitai coderabbitai Bot mentioned this pull request Apr 9, 2026

Labels

large Large PR - Consider splitting up into smaller PRs to reduce risk and review time
