internal/service/privatelinkendpoint/resource.go (16 changes: 14 additions & 2 deletions)
@@ -149,8 +149,20 @@ func resourceCreate(ctx context.Context, d *schema.ResourceData, meta any) diag.
         }
     }

-    privateEndpoint, _, err := connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
-    if err != nil {
+    const maxRetries = 5
+    const retrySleep = 10 * time.Second

Comment on lines +152 to +153
Copilot AI (Feb 24, 2026):

The retry configuration uses magic numbers (5 for maxRetries and 10 seconds for retrySleep) instead of following the pattern of defining constants at the package level. The existing code defines timeout constants at the package level (see lines 24-30 where delayAndMinTimeout is defined). Consider defining these retry parameters as package-level constants with descriptive names for better maintainability and consistency.
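
For illustration, a minimal sketch of what such package-level constants could look like next to the existing timeout definitions (the names below are hypothetical, not taken from the file):

    // Retry parameters for GCP "No Capacity" errors during private endpoint
    // service creation; the values match the inline constants introduced above.
    const (
        createNoCapacityMaxRetries = 5
        createNoCapacityRetrySleep = 10 * time.Second
    )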

+    var privateEndpoint *admin.EndpointService
+    for attempt := range maxRetries {
+        var err error
+        privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
+        if err == nil {
+            break
+        }
+        if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
+            log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
+            time.Sleep(retrySleep)

Copilot AI (Feb 24, 2026):

The retry logic doesn't respect context cancellation or timeouts. The loop uses time.Sleep(retrySleep) without checking if the context has been cancelled or if the configured timeout (1 hour, as defined in the resource timeouts at line 128) has been exceeded. This could result in the function continuing to retry even after the Terraform operation timeout has been reached, leading to a poor user experience and potential resource leaks.

Suggested change:
-            time.Sleep(retrySleep)
+            select {
+            case <-ctx.Done():
+                return diag.FromErr(ctx.Err())
+            case <-time.After(retrySleep):
+            }
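
For reference, the same guard can be factored into a small helper so every retry site respects cancellation; a minimal sketch under that assumption (sleepCtx is a hypothetical name, not part of this file):

    import (
        "context"
        "time"
    )

    // sleepCtx blocks for d, returning early with ctx.Err() if the context
    // is cancelled or times out first.
    func sleepCtx(ctx context.Context, d time.Duration) error {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(d):
            return nil
        }
    }

One design note: time.After keeps its underlying timer alive until it fires even when the context wins the select, which is harmless in a bounded retry loop; a time.NewTimer with a deferred Stop avoids that if it ever matters.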

+            continue
+        }
         return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))

Comment on lines +152 to 166
Copilot AI (Feb 24, 2026):

This custom retry implementation is inconsistent with the established codebase pattern. The repository extensively uses retry.RetryContext from the Terraform SDK's helper package for retry logic (see internal/service/globalclusterconfig/resource_global_cluster_config.go:129, internal/service/streamconnection/state_transition.go:27, internal/service/team/resource_team.go:177, among others).

Using retry.RetryContext provides several advantages: it respects context cancellation, handles timeouts properly, integrates with Terraform's logging, and provides exponential backoff. The current implementation with time.Sleep doesn't check for context cancellation and uses a fixed delay, which could cause the operation to exceed configured timeouts.

Suggested change:
-    const maxRetries = 5
-    const retrySleep = 10 * time.Second
-    var privateEndpoint *admin.EndpointService
-    for attempt := range maxRetries {
-        var err error
-        privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
-        if err == nil {
-            break
-        }
-        if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
-            log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
-            time.Sleep(retrySleep)
-            continue
-        }
-        return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
+    var privateEndpoint *admin.EndpointService
+    attempts := 0
+    retryErr := retry.RetryContext(ctx, d.Timeout(schema.TimeoutCreate), func() *retry.RetryError {
+        attempts++
+        pe, _, err := connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
+        if err != nil {
+            if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") {
+                log.Printf("[DEBUG] Attempt %d: GCP private endpoint creation returned 'No Capacity', retrying...", attempts)
+                return retry.RetryableError(err)
+            }
+            return retry.NonRetryableError(err)
+        }
+        privateEndpoint = pe
+        return nil
+    })
+    if retryErr != nil {
+        return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, retryErr))
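
Note: adopting this suggestion also means importing github.com/hashicorp/terraform-plugin-sdk/v2/helper/retry, which defines retry.RetryContext, retry.RetryableError, and retry.NonRetryableError.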

     }

Comment on lines +152 to 167
Copilot AI (Feb 24, 2026):

The new retry logic for handling "No Capacity" errors lacks test coverage. The existing tests in resource_test.go don't cover this retry scenario. Given that this file has comprehensive test coverage for other scenarios (basicAWS, basicAzure, basicGCP, deleteOnCreateTimeout, etc.), the new retry behavior should also have test coverage to ensure it works correctly and doesn't introduce regressions.
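
As a rough sketch of the kind of unit test that could cover the retry path if the loop were extracted into a helper (createWithRetry and its signature are hypothetical, not from the repository; the existing acceptance tests would still cover the end-to-end path):

    import (
        "context"
        "errors"
        "testing"
        "time"
    )

    // createWithRetry invokes call up to maxRetries times, sleeping between
    // attempts whose error the retryable predicate accepts. It returns the
    // first success, the first non-retryable error, or the last error once
    // all attempts are exhausted.
    func createWithRetry(ctx context.Context, maxRetries int, sleep time.Duration,
        retryable func(error) bool, call func() error) error {
        var err error
        for attempt := 0; attempt < maxRetries; attempt++ {
            if err = call(); err == nil || !retryable(err) {
                return err
            }
            if attempt == maxRetries-1 {
                break // out of attempts; fall through with the last error
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(sleep):
            }
        }
        return err
    }

    func TestCreateWithRetryRecoversFromNoCapacity(t *testing.T) {
        errNoCapacity := errors.New("ATLAS_GENERAL_ERROR: No Capacity")
        calls := 0
        err := createWithRetry(context.Background(), 5, time.Millisecond,
            func(err error) bool { return errors.Is(err, errNoCapacity) },
            func() error {
                calls++
                if calls < 3 {
                    return errNoCapacity // fail the first two attempts
                }
                return nil // succeed on the third
            })
        if err != nil {
            t.Fatalf("expected success after retries, got %v", err)
        }
        if calls != 3 {
            t.Fatalf("expected 3 attempts, got %d", calls)
        }
    }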


Comment on lines +155 to 168
Copilot AI (Feb 24, 2026):

There's a potential nil pointer dereference bug. If all retries are exhausted and the last error is a "No Capacity" error that passes the retry condition check (line 161), the loop will exit normally without breaking or returning. This means privateEndpoint will remain nil, causing a panic at line 169 when calling privateEndpoint.GetId().

The condition attempt < maxRetries-1 prevents retrying on the last attempt, but doesn't handle the case where the last attempt also returns a "No Capacity" error. When attempt equals 4 (the last iteration), the condition on line 161 will be false, so the code will skip both the retry logic (line 162-164) and the error return (line 166), allowing the loop to complete with privateEndpoint still nil.

Suggested change:
-    for attempt := range maxRetries {
-        var err error
-        privateEndpoint, _, err = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
-        if err == nil {
-            break
-        }
-        if admin.IsErrorCode(err, "ATLAS_GENERAL_ERROR") && strings.Contains(err.Error(), "No Capacity") && attempt < maxRetries-1 {
-            log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
-            time.Sleep(retrySleep)
-            continue
-        }
-        return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, err))
-    }
+    var lastErr error
+    for attempt := range maxRetries {
+        privateEndpoint, _, lastErr = connV2.PrivateEndpointServicesApi.CreatePrivateEndpointService(ctx, projectID, request).Execute()
+        if lastErr == nil {
+            break
+        }
+        if admin.IsErrorCode(lastErr, "ATLAS_GENERAL_ERROR") && strings.Contains(lastErr.Error(), "No Capacity") && attempt < maxRetries-1 {
+            log.Printf("[DEBUG] Attempt %d/%d: GCP private endpoint creation returned 'No Capacity', retrying in %s...", attempt+1, maxRetries, retrySleep)
+            time.Sleep(retrySleep)
+            continue
+        }
+        return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, lastErr))
+    }
+    if privateEndpoint == nil {
+        return diag.FromErr(fmt.Errorf(errorPrivateLinkEndpointsCreate, lastErr))
+    }
