
WIP - Adding lseval job and added wait for testing #79585

Open
JoaoFula wants to merge 1 commit into openshift:main from JoaoFula:ols-gpu-for-rhoai

@JoaoFula JoaoFula commented May 21, 2026

Summary by CodeRabbit

This PR updates the OpenShift CI configuration for the lightspeed-service repository by adding a new periodic test job and a wait step for testing.

What changed in practical terms:

  • Adds a new periodic CI job "lseval-periodic" under ci-operator config for the lightspeed-service component (targets ocp 4.20).
  • The job schedules weekly runs (Sundays at 10:00 UTC) on GPU-enabled infrastructure (amd64, AWS region us-east-2, variant: gpu).
  • Cluster claim timeout is 2h; the job also sets an environment override TIMEOUT: +6 hours.
  • Inserts a wait step before the test run.
  • The job runs tests/scripts/test-lseval-periodic.sh using the lightspeed-service-api image (env OLS_IMAGE) built from src.
  • Only the openai-apitoken credential is mounted at /var/run/openai for the job.
  • Job resource requests include cpu: 100m.

This change enables a weekly GPU-based evaluation run for Lightspeed Service model evaluation in the existing OpenShift CI pipeline.
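
For orientation, the job described above might look roughly like the following ci-operator test entry. This is a sketch reconstructed from the summary, not the PR's actual diff. The `owner`, `workflow`, inner step name, and credential `namespace` values are placeholders; expressing `region` and `variant` as claim labels and the position of `- ref: wait` relative to the test step are also assumptions.

```yaml
tests:
- as: lseval-periodic
  cron: 0 10 * * 0                    # Sundays at 10:00 UTC
  cluster_claim:
    architecture: amd64
    cloud: aws
    labels:                           # assumed to be expressed as claim labels
      region: us-east-2
      variant: gpu
    owner: example-owner              # placeholder, not stated in the summary
    product: ocp
    timeout: 2h0m0s
    version: "4.20"
  steps:
    env:
      TIMEOUT: +6 hours               # override consumed by the wait step
    test:
    - ref: wait                       # placement relative to the test step is assumed
    - as: lseval                      # placeholder step name
      commands: tests/scripts/test-lseval-periodic.sh
      credentials:
      - mount_path: /var/run/openai
        name: openai-apitoken
        namespace: test-credentials   # assumed namespace
      dependencies:
      - env: OLS_IMAGE
        name: lightspeed-service-api
      from: src
      resources:
        requests:
          cpu: 100m
    workflow: example-claim-workflow  # placeholder workflow name
```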

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2026

coderabbitai Bot commented May 21, 2026

Warning

Rate limit exceeded

@JoaoFula has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 7 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 49c06940-9fb8-455a-8aad-4a92b24940c9

📥 Commits

Reviewing files that changed from the base of the PR and between 405b3ee and d92ae84.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/lightspeed-service/openshift-lightspeed-service-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (1)
  • ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml

Walkthrough

Adds a new periodic OpenShift CI test job lseval-periodic to the lightspeed-service 4.20 configuration that runs weekly, targets an amd64 AWS GPU cluster in us-east-2, sets TIMEOUT: +6 hours, executes tests/scripts/test-lseval-periodic.sh, mounts only the openai-apitoken credential at /var/run/openai, depends on lightspeed-service-api (OLS_IMAGE), and requests 100m CPU.

Changes

LSEval Periodic Test Job

| Layer / File(s) | Summary |
|---|---|
| LSEval periodic test job configuration `ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml` | New lseval-periodic test job added to run weekly (0 10 * * 0) on OCP 4.20 with GPU variant (amd64, aws, region us-east-2), sets TIMEOUT: +6 hours environment override, executes tests/scripts/test-lseval-periodic.sh with OLS_IMAGE dependency (lightspeed-service-api), mounts only the openai-apitoken credential to /var/run/openai, and requests 100m CPU. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels: rehearsals-ack

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title references 'Adding lseval job' which matches the main change (new lseval-periodic test job), but 'added wait for testing' is vague and not clearly reflected in the changeset summary. | Clarify the title to focus on the primary change: 'Add lseval-periodic CI job for lightspeed-service' or similar. Remove vague references and explain what 'wait for testing' means. |

✅ Passed checks (11 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | This PR contains only CI operator YAML configuration changes with no Ginkgo test code. The check about stable test names is not applicable to CI configuration files. |
| Test Structure And Quality | ✅ Passed | Check not applicable: PR adds CI YAML config, not Ginkgo test code. Custom check requires reviewing Go Ginkgo tests, which don't exist in this PR. |
| Microshift Test Compatibility | ✅ Passed | This PR only adds CI configuration (YAML), not Ginkgo e2e test code. The custom check applies only when new Ginkgo tests (It/Describe/Context/When) are added, which is not the case here. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR adds CI YAML job configuration only, not Ginkgo e2e tests. The custom check applies only when new Ginkgo test declarations are added; no such tests are present here. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This change adds only a CI test job configuration, not deployment manifests or operator code. No scheduling constraints are introduced. |
| Ote Binary Stdout Contract | ✅ Passed | PR only modifies CI configuration YAML. OTE Stdout Contract applies to process-level test code, not CI config files. No source code violations present. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR adds CI job configuration, not Ginkgo e2e tests. The custom check applies only to Ginkgo test additions (It(), Describe(), etc.), which are not present in this CI configuration change. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: JoaoFula
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from joshuawilson and tisnik May 21, 2026 08:37

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml (1)

162-162: ⚡ Quick win

Clarify the purpose of the - ref: wait step.

In ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml (line 162), - ref: wait maps to ci-operator/step-registry/wait/wait-ref.yaml, which is a time-based wait gate (wait-commands.sh) that proceeds only after the TIMEOUT is reached—here overridden to TIMEOUT: +6 hours.

Add a short config comment explaining why this additional 6-hour wait is required for lseval-periodic (and whether/when it can be reduced/removed).
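
One possible shape for such a comment is sketched below. The surrounding keys follow the job as described in this review; the text in angle brackets is a placeholder for the author to fill in, since neither the reason for the 6-hour hold nor a tracking issue is stated anywhere in this thread.

```yaml
steps:
  env:
    # Consumed by the wait step below; overrides that step's default TIMEOUT.
    TIMEOUT: +6 hours
  test:
  # Time-based gate (wait-commands.sh from the step registry).
  # Why: <reason lseval-periodic needs a 6-hour hold>
  # Revisit: <condition under which this can be shortened or removed>
  # Tracking: <issue or contact reference>
  - ref: wait
```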

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml`
at line 162, Add a short inline comment next to the `- ref: wait` step (the wait
step that maps to `ci-operator/step-registry/wait/wait-ref.yaml`) explaining
that this is a time-based gate using `wait-commands.sh` and that `TIMEOUT` has
been overridden to `+6 hours` specifically to allow `lseval-periodic` to
complete long-running evaluation jobs; also state under what conditions this
duration can be reduced or removed (e.g., when `lseval` runtime improvements are
implemented or flakiness/queueing is reduced) and include who to contact or a
JIRA/issue reference for future removal/tuning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0437aa5f-724f-4e13-b5b0-c1ea1ef4a463

📥 Commits

Reviewing files that changed from the base of the PR and between 01dcbd8 and 10e3054.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/lightspeed-service/openshift-lightspeed-service-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (1)
  • ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml

@JoaoFula

/pj-rehearse periodic-ci-openshift-lightspeed-service-main-4.20-lseval-periodic

@openshift-merge-bot

@JoaoFula: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@JoaoFula force-pushed the ols-gpu-for-rhoai branch from 10e3054 to 405b3ee on May 21, 2026 08:54

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml`:
- Around line 152-157: The cluster claim lease (`timeout:` under the cluster
claim block) is too short (2h00m0s) for the test's runtime (`env.TIMEOUT` is +6
hours); update the `timeout:` value to at least 7h0m0s (6h test + 1h buffer) so
the cluster lease covers the full `TIMEOUT` duration and teardown overhead.
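
If the suggestion is taken as written, the change would amount to something like the fragment below, which shows only fields mentioned in this thread, with the other claim fields left unchanged. It may be worth checking the ci-operator cluster-claim documentation before applying it: as far as I recall, `timeout` there is described as how long ci-operator waits for the claimed cluster to become ready, in which case raising it would not by itself extend how long the test may run. Treat that reading as an assumption to verify.

```yaml
cluster_claim:
  architecture: amd64
  cloud: aws
  product: ocp
  version: "4.20"
  # Reviewer's proposal: 6h TIMEOUT plus 1h buffer (previously 2h0m0s).
  timeout: 7h0m0s
```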
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 71e9c78c-578f-461e-904b-2e0628a69500

📥 Commits

Reviewing files that changed from the base of the PR and between 10e3054 and 405b3ee.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/lightspeed-service/openshift-lightspeed-service-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (1)
  • ci-operator/config/openshift/lightspeed-service/openshift-lightspeed-service-main__4.20.yaml

Adding lseval job and added wait for testing

@JoaoFula force-pushed the ols-gpu-for-rhoai branch from 405b3ee to d92ae84 on May 21, 2026 09:03
@JoaoFula

/pj-rehearse periodic-ci-openshift-lightspeed-service-main-4.20-lseval-periodic

@openshift-merge-bot

@JoaoFula: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci

openshift-ci Bot commented May 21, 2026

@JoaoFula: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/rehearse/periodic-ci-openshift-lightspeed-service-main-4.20-lseval-periodic | d92ae84 | link | unknown | /pj-rehearse periodic-ci-openshift-lightspeed-service-main-4.20-lseval-periodic |
| ci/prow/ci-operator-config-metadata | d92ae84 | link | true | /test ci-operator-config-metadata |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
