
Long-running Azure test reliability and test-code ref override#11932

Draft
brooke-hamilton wants to merge 5 commits into
mainfrom
brooke-hamilton/fix-lrt-release

Conversation

@brooke-hamilton
Member

Description

Improvements to the long-running Azure test workflow to prevent timeouts and to allow test code to be patched without cutting a product patch release.

Three independent fixes:

  1. Optional test_code_ref override for checkout-release-codebase.sh and the long-running Azure workflow. When set (via positional arg, TEST_CODE_REF env var, or workflow_dispatch input), the script clones the supplied git ref into current_release/ instead of the tag matching the installed CLI. The product under test is still the installed release; only the on-disk test/infrastructure code changes. This lets us iterate on test fixes without releasing a new patch.
  2. Pre-restore the UDT usertypealpha-recipe.bicep types in the workflow. Restoring them lazily later in the run raced with Azure CLI token expiry during the long-running tests.
  3. Robust namespace and DeploymentTemplate teardown in the Kubernetes/Flux noncloud tests.
    • deleteNamespace now waits for normal deletion with assert.Eventually and falls back to clearing namespace finalizers (test-only escape hatch) when a namespace is stuck Terminating.
    • testFluxIntegration now deletes DeploymentTemplate resources first and waits for the Radius finalizer to drain before tearing down namespaces, so namespaces are not stuck on radapp.io/deployment-template-finalizer and radius-rp does not recreate application namespaces while their backing Applications.Core resources still exist.
    • Teardown uses assert.* instead of require.* so one stuck resource does not skip cleanup of the rest.
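
A minimal Go sketch of how the ref in item 1 might be resolved. The precedence shown (positional argument over TEST_CODE_REF, falling back to the tag for the installed CLI), the `v` tag prefix, and the name resolveTestCodeRef are assumptions for illustration; the real logic lives in checkout-release-codebase.sh and is not reproduced here. The workflow_dispatch input is assumed to reach the script through TEST_CODE_REF.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveTestCodeRef returns the git ref to clone into current_release/.
// Hypothetical: the precedence (positional argument, then TEST_CODE_REF, then
// the tag for the installed CLI) and the "v" tag prefix are assumptions.
func resolveTestCodeRef(args []string, getenv func(string) string, cliVersion string) string {
	if len(args) > 0 {
		if ref := strings.TrimSpace(args[0]); ref != "" {
			return ref
		}
	}
	if ref := strings.TrimSpace(getenv("TEST_CODE_REF")); ref != "" {
		return ref
	}
	// No override: use the release tag that matches the installed CLI.
	tag := strings.TrimSpace(cliVersion)
	if !strings.HasPrefix(tag, "v") {
		tag = "v" + tag
	}
	return tag
}

func main() {
	// "0.50.0" is a placeholder version chosen for illustration.
	ref := resolveTestCodeRef(os.Args[1:], os.Getenv, "0.50.0")
	fmt.Println("cloning test code at", ref, "into current_release/")
}
```

In this sketch, passing a branch name as the first argument takes priority over both the environment variable and the CLI-derived tag; only the checkout ref changes, not the installed product under test.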

Type of change

  • This pull request is a minor refactor, code cleanup, test improvement, or other maintenance task and doesn't change the functionality of Radius (issue link optional).

Fixes: #issue_number

Contributor checklist

Please verify that the PR meets the following requirements, where applicable:

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Yes
    • Not applicable
  • A design document is added or updated under eng/design-notes/ in this repository, if new APIs are being introduced.
    • Yes
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Yes
    • Not applicable
  • A PR for resource-types-contrib is created, if resource types or recipes are affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for dashboard is created, if the Radius Dashboard is affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for the documentation repository is created, if the changes in this PR affect the documentation or any user facing updates are made.
    • Yes
    • Not applicable

Copilot AI review requested due to automatic review settings May 18, 2026 21:37
@brooke-hamilton brooke-hamilton requested review from a team as code owners May 18, 2026 21:37
@github-actions

github-actions Bot commented May 18, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@brooke-hamilton brooke-hamilton marked this pull request as draft May 18, 2026 21:37
Contributor

Copilot AI left a comment

Pull request overview

This PR improves long-running Azure test reliability by allowing test-code checkout overrides, pre-restoring UDT Bicep types, and making Kubernetes/Flux test teardown more robust.

Changes:

  • Adds optional test_code_ref support to the release-code checkout script and workflow.
  • Restores UDT testresources Bicep artifacts before functional tests.
  • Updates Flux/DeploymentTemplate cleanup to delete templates before namespaces and add namespace finalizer fallback handling.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

  • .github/scripts/checkout-release-codebase.sh: Adds configurable release/test-code ref checkout and workflow outputs.
  • .github/workflows/long-running-azure.yaml: Adds manual input for test-code ref and restores UDT Bicep artifacts.
  • test/functional-portable/kubernetes/noncloud/deploymenttemplate_test.go: Enhances namespace deletion waiting and finalizer fallback cleanup.
  • test/functional-portable/kubernetes/noncloud/flux_test.go: Tracks and deletes DeploymentTemplates before namespace teardown.

Comment thread .github/scripts/checkout-release-codebase.sh
Comment thread test/functional-portable/kubernetes/noncloud/deploymenttemplate_test.go Outdated
Comment thread test/functional-portable/kubernetes/noncloud/flux_test.go
@github-actions

github-actions Bot commented May 18, 2026

Unit Tests

    2 files  ±0    423 suites  ±0   7m 15s ⏱️ -8s
5 137 tests ±0  5 135 ✅ ±0  2 💤 ±0  0 ❌ ±0 
6 175 runs  ±0  6 173 ✅ ±0  2 💤 ±0  0 ❌ ±0 

Results for commit 6d2612a. ± Comparison against base commit 9017ff2.

♻️ This comment has been updated with latest results.

@codecov

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.69%. Comparing base (9017ff2) to head (6d2612a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11932   +/-   ##
=======================================
  Coverage   51.69%   51.69%           
=======================================
  Files         724      724           
  Lines       45508    45508           
=======================================
  Hits        23525    23525           
  Misses      19763    19763           
  Partials     2220     2220           

Copilot AI and others added 2 commits May 18, 2026 17:58
fix: add testresources Bicep extension pre-restore to prevent token expiry failures

The long-running-azure.yaml workflow pre-restores Bicep extension artifacts
before running functional tests, but was missing the testresources extension.
The comment explicitly noted all three extensions needed to be restored
(Radius, AWS, UDT testresources), but only two were implemented.

During long-running tests (~1hr), Bicep tries to restore the testresources
extension from the Azure Container Registry using Azure CLI credentials.
The OIDC token from the "Re-login to Azure" step only lasts 5 minutes, so
any build happening after that fails with:
  AADSTS700024: Client assertion is not within its valid time range

By pre-restoring the testresources extension (br:crradfunctest9lhu.azurecr.io/testresources:latest)
while the Azure CLI token is still valid, Bicep caches
the artifact locally. Subsequent builds use the cached version without
re-authenticating to the registry.

Agent-Logs-Url: https://github.com/radius-project/radius/sessions/98d8a698-3cbb-44ef-a1d8-f5315fb82e4b

Co-authored-by: brooke-hamilton <45323234+brooke-hamilton@users.noreply.github.com>

improve flux test namespace delete logic

Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>

improve flux test

Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>

run tests on selected branch

Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>
(cherry picked from commit 0ca765527959b5ef3bdb28aa12e4638a2f8de2a9)
Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/long-running-azure.yaml
Comment thread test/functional-portable/kubernetes/noncloud/flux_test.go
Comment thread .github/workflows/long-running-azure.yaml Outdated
Comment thread test/functional-portable/kubernetes/noncloud/deploymenttemplate_test.go Outdated
Comment thread test/functional-portable/kubernetes/noncloud/deploymenttemplate_test.go Outdated
if len(ns.Spec.Finalizers) > 0 {
	t.Logf("Namespace %s stuck terminating with finalizers: %v; clearing them", namespace, ns.Spec.Finalizers)
	ns.Spec.Finalizers = nil
	_, err = opts.K8sClient.CoreV1().Namespaces().Finalize(ctx, ns, metav1.UpdateOptions{})
Contributor

I think having a finalizer is a symptom that there are still hanging resources. IMO we should never force delete finalizers in this case, since it can result in the next run having unexpected behavior (existing but untracked resources).

Member Author

Removed


// Track DeploymentTemplates created across steps so we can delete them and wait for
// the Radius finalizer to drain before tearing down namespaces.
var deploymentTemplates []*radappiov1alpha3.DeploymentTemplate
Contributor

where is this slice being populated?

Member Author

Populated inside the per-step loop at test/functional-portable/kubernetes/noncloud/flux_test.go#L359-L361:

deploymentTemplate, err := waitForDeploymentTemplateToBeReadyWithGeneration(t, ctx, types.NamespacedName{Name: name, Namespace: namespace}, stepNumber, opts.Client)
require.NoError(t, err)
deploymentTemplates = append(deploymentTemplates, deploymentTemplate)

For every step, after each DeploymentTemplate becomes Ready we append it to the slice the t.Cleanup closes over, so teardown deletes every DT created across all steps before tearing down namespaces.

Comment thread test/functional-portable/kubernetes/noncloud/flux_test.go
Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>
Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>
@radius-functional-tests

radius-functional-tests Bot commented May 18, 2026

Radius functional test overview

🔍 Go to test action run

Click here to see the test run details
Repository: radius-project/radius
Commit ref: 6d2612a
Unique ID: func471ca7d5d2
Image tag: pr-func471ca7d5d2
  • gotestsum 1.13.0
  • KinD: v0.29.0
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func471ca7d5d2
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func471ca7d5d2
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-func471ca7d5d2
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func471ca7d5d2
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func471ca7d5d2
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting corerp-cloud functional tests...
⌛ Starting ucp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

@DariuszPorowski
Member

@brooke-hamilton @willdavsmith any progress on this fix?

5 participants