fix: Adds retry on GCP "No Capacity" error during endpoint creation for privatelinkendpoint #4221
EspenAlbert wants to merge 1 commit into master from
Conversation
… endpoint creation
Pull request overview
This pull request adds retry logic to handle GCP "No Capacity" errors during private endpoint creation. The change addresses transient capacity issues by implementing a retry mechanism with up to 5 attempts and 10-second delays between retries.
Changes:
- Added custom retry loop with 5 attempts and 10-second sleep intervals for handling GCP "No Capacity" errors during private endpoint service creation
```go
	}
	if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
		time.Sleep(retrySleep)
```
The retry logic doesn't respect context cancellation or timeouts. The loop uses time.Sleep(retrySleep) without checking if the context has been cancelled or if the configured timeout (1 hour, as defined in the resource timeouts at line 128) has been exceeded. This could result in the function continuing to retry even after the Terraform operation timeout has been reached, leading to a poor user experience and potential resource leaks.
Suggested change:

```diff
-	time.Sleep(retrySleep)
+	select {
+	case <-ctx.Done():
+		return diag.FromErr(ctx.Err())
+	case <-time.After(retrySleep):
+	}
```
```go
const maxRetries = 5
const retrySleep = 10 * time.Second
var privateEndpoint *admin.EndpointService
for attempt := range maxRetries {
	var err error
	privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
	if err == nil {
		break
	}
	if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
		time.Sleep(retrySleep)
		continue
	}
	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
}
```
The new retry logic for handling "No Capacity" errors lacks test coverage. The existing tests in resource_test.go don't cover this retry scenario. Given that this file has comprehensive test coverage for other scenarios (basicAWS, basicAzure, basicGCP, deleteOnCreateTimeout, etc.), the new retry behavior should also have test coverage to ensure it works correctly and doesn't introduce regressions.
```go
const maxRetries = 5
const retrySleep = 10 * time.Second
```
The retry configuration uses magic numbers (5 for maxRetries and 10 seconds for retrySleep) instead of following the pattern of defining constants at the package level. The existing code defines timeout constants at the package level (see lines 24-30 where delayAndMinTimeout is defined). Consider defining these retry parameters as package-level constants with descriptive names for better maintainability and consistency.
```go
for attempt := range maxRetries {
	var err error
	privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
	if err == nil {
		break
	}
	if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
		time.Sleep(retrySleep)
		continue
	}
	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
}
```
There's a potential nil pointer dereference bug. If all retries are exhausted and the last error is a "No Capacity" error that passes the retry condition check (line 161), the loop will exit normally without breaking or returning. This means privateEndpoint will remain nil, causing a panic at line 169 when calling privateEndpoint.GetId().
The condition attempt < maxRetries-1 prevents retrying on the last attempt, but doesn't handle the case where the last attempt also returns a "No Capacity" error. When attempt equals 4 (the last iteration), the condition on line 161 will be false, so the code will skip both the retry logic (line 162-164) and the error return (line 166), allowing the loop to complete with privateEndpoint still nil.
Suggested change:

```diff
-for attempt := range maxRetries {
-	var err error
-	privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
-	if err == nil {
-		break
-	}
-	if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
-		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
-		time.Sleep(retrySleep)
-		continue
-	}
-	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
-}
+var lastErr error
+for attempt := range maxRetries {
+	privateEndpoint, _, lastErr = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
+	if lastErr == nil {
+		break
+	}
+	if admin.IsErrorCode(lastErr, "ATLAS_GENERAL_ERROR") && strings.Contains(lastErr.Error(), "No Capacity") && attempt < maxRetries-1 {
+		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
+		time.Sleep(retrySleep)
+		continue
+	}
+	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, lastErr))
+}
+if privateEndpoint == nil {
+	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, lastErr))
+}
```
```go
const maxRetries = 5
const retrySleep = 10 * time.Second
var privateEndpoint *admin.EndpointService
for attempt := range maxRetries {
	var err error
	privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
	if err == nil {
		break
	}
	if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
		time.Sleep(retrySleep)
		continue
	}
	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
```
This custom retry implementation is inconsistent with the established codebase pattern. The repository extensively uses retry.RetryContext from the Terraform SDK's helper package for retry logic (see internal/service/globalclusterconfig/resource_global_cluster_config.go:129, internal/service/streamconnection/state_transition.go:27, internal/service/team/resource_team.go:177, among others).
Using retry.RetryContext provides several advantages: it respects context cancellation, handles timeouts properly, integrates with Terraform's logging, and provides exponential backoff. The current implementation with time.Sleep doesn't check for context cancellation and uses a fixed delay, which could cause the operation to exceed configured timeouts.
Suggested change:

```diff
-const maxRetries = 5
-const retrySleep = 10 * time.Second
-var privateEndpoint *admin.EndpointService
-for attempt := range maxRetries {
-	var err error
-	privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
-	if err == nil {
-		break
-	}
-	if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
-		log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
-		time.Sleep(retrySleep)
-		continue
-	}
-	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
+var privateEndpoint *admin.EndpointService
+attempts := 0
+retryErr := retry.RetryContext(ctx, d.Timeout(schema.TimeoutCreate), func() *retry.RetryError {
+	attempts++
+	pe, _, err := connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
+	if err != nil {
+		if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") {
+			log.Printf("[DEBUG] Attempt %d: GCP private endpoint creation returned 'No Capacity', retrying...", attempts)
+			return retry.RetryableError(err)
+		}
+		return retry.NonRetryableError(err)
+	}
+	privateEndpoint = pe
+	return nil
+})
+if retryErr != nil {
+	return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, retryErr))
+}
```
This PR has gone 7 days without any activity and meets the project's definition of "stale". This will be auto-closed if there is no new activity over the next 7 days. If the issue is still relevant and active, you can simply comment with a "bump" to keep it open, or add the label "not_stale". Thanks for keeping our repository healthy!
Still no action from upstream, keeping this open |
Description
Please include a summary of the fix/feature/change, including any relevant motivation and context.
Link to any related issue(s):
Type of change:
Required Checklist:
Further comments