USHIFT-6979: Add PCP metrics collection to test scenarios #6658
pacevedom wants to merge 4 commits into
@pacevedom: This pull request references USHIFT-6902 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
|
Skipping CI for Draft Pull Request. |
|
/test ? |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: pacevedom
|
Walkthrough
Adds optional PCP collection to test scenarios (gated by SKIP_PCP). New helper functions in
Changes
PCP collection infrastructure
Images, blueprints, and CI build flag
sequenceDiagram
participant Host as "Scenario script (Host)"
participant VM as "Test VM"
participant QGA as "qemu-guest-agent"
participant Artifacts as "Host scenario dir (pcp/)"
Host->>Host: check SKIP_PCP
alt SKIP_PCP=false
Host->>VM: read VM ip property
alt VM has IP (online)
Host->>VM: stop pmlogger via SSH
Host->>VM: tar /var/log/pcp/pmlogger -> /tmp/pcp-archives.tar.gz
VM->>Artifacts: scp /tmp/pcp-archives.tar.gz -> ${vmdir}/pcp/
else VM offline
Host->>QGA: request guest command to stop pmlogger and create archive
QGA->>Host: make archive available
Host->>Artifacts: download pcp-archives.tar.gz into ${vmdir}/pcp/
end
end
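The flow in the diagram can be sketched in shell roughly as follows. This is a minimal illustration, not the code from the PR: `get_vm_ip`, `collect_pcp_online`, `collect_pcp_offline`, and `collect_pcp_for_vm` are hypothetical names, and the first three are stubbed so that only the dispatch logic is exercised. In the real script the online branch would go over SSH and scp, and the offline branch through qemu-guest-agent.

```shell
#!/usr/bin/env bash
# Sketch only: get_vm_ip, collect_pcp_online and collect_pcp_offline are
# hypothetical stand-ins, stubbed so the dispatch logic can run anywhere.
SKIP_PCP="${SKIP_PCP:-false}"

get_vm_ip() {            # stub: prints the VM's IP, or nothing if offline
    case "$1" in
        host1) echo "192.168.122.10" ;;
        *)     echo "" ;;
    esac
}
collect_pcp_online()  { echo "online:$1"; }    # would use SSH + scp
collect_pcp_offline() { echo "offline:$1"; }   # would use qemu-guest-agent

collect_pcp_for_vm() {
    local vmname="$1"
    # Collection is opt-out: SKIP_PCP=true disables it entirely.
    if [ "${SKIP_PCP}" = "true" ]; then
        echo "skipped:${vmname}"
        return 0
    fi
    local ip
    ip="$(get_vm_ip "${vmname}")"
    # A VM with an IP is treated as online; otherwise fall back to the
    # guest-agent path.
    if [ -n "${ip}" ]; then
        collect_pcp_online "${vmname}"
    else
        collect_pcp_offline "${vmname}"
    fi
}
```

With these stubs, `collect_pcp_for_vm host1` takes the online branch and any other VM name takes the offline branch.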
Estimated Code Review Effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 11 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
|
/test e2e-aws-tests-periodic |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/bin/scenario.sh`:
- Around line 296-323: The dnf install in start_pcp_on_all_vms can hang; update
the command executed by run_command_on_vm (inside function start_pcp_on_all_vms)
to wrap the package installation with a timeout (e.g., use the timeout utility
such as timeout 300s sudo dnf install -y pcp pcp-zeroconf) so a stalled install
will abort and return non-zero; ensure the wrapped command replaces the existing
"sudo dnf install -y pcp pcp-zeroconf" invocation so failures/timeouts propagate
and trigger the existing warning for that VM.
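A minimal sketch of the suggested timeout wrapper, under stated assumptions: `run_command_on_vm` is stubbed here to run the command locally (the real helper in scenario.sh runs it on the VM), and `install_pcp_with_timeout` is a hypothetical function name used only for illustration. It relies on the coreutils `timeout` utility, which exits with status 124 when the limit is reached.

```shell
#!/usr/bin/env bash
# Sketch only: run_command_on_vm is stubbed to run the command locally;
# the real helper in scenario.sh executes it on the VM.
run_command_on_vm() {
    local vmname="$1"; shift
    bash -c "$*"
}

# install_pcp_with_timeout is a hypothetical name for illustration.
install_pcp_with_timeout() {
    local vmname="$1"
    local install_cmd="${2:-sudo dnf install -y pcp pcp-zeroconf}"
    local limit="${3:-300s}"
    # timeout(1) exits with status 124 when the limit is hit, so a stalled
    # install surfaces as a non-zero status like any other failure.
    if ! run_command_on_vm "${vmname}" "timeout ${limit} ${install_cmd}"; then
        echo "WARNING: PCP install failed or timed out on ${vmname}"
        return 1
    fi
}
```

The install command and limit are parameters only so the sketch can be tried with a harmless command such as `sleep 5` and a short limit.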
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 9595bcb4-1d7c-476f-82df-5e49a6354b46
📒 Files selected for processing (1)
test/bin/scenario.sh
|
/test e2e-aws-tests-periodic |
1 similar comment
|
/test e2e-aws-tests-periodic |
|
/test e2e-aws-tests-periodic |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/bin/scenario.sh`:
- Around line 353-355: The tar invocation currently swallows failures via "||
true"; change the run_command_on_vm "${vmname}" "sudo tar czf
/tmp/pcp-archives.tar.gz -C /var/log/pcp/pmlogger ." call to capture its exit
status, and if non-zero emit a clear warning (e.g., printf or process logger)
including the vmname and the tar command output, mark/record that PCP packaging
for that VM failed (set a variable or append the vmname to a failures list) and
continue rather than silently ignoring the error; use the run_command_on_vm
call, vmname, and the tar command string to locate where to implement this
logic.
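One way to read this suggestion is sketched below. It is an illustration rather than the change made in the PR: `run_command_on_vm` is stubbed to simulate a failure, and `PCP_FAILED_VMS` and `package_pcp_archives` are hypothetical names introduced only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: run_command_on_vm is stubbed; PCP_FAILED_VMS is a
# hypothetical variable used to remember which VMs failed.
run_command_on_vm() {
    local vmname="$1"; shift
    # stub: pretend packaging fails on any VM whose name starts with "bad"
    case "${vmname}" in
        bad*) echo "tar: simulated failure" >&2; return 2 ;;
        *)    return 0 ;;
    esac
}

PCP_FAILED_VMS=""

package_pcp_archives() {
    local vmname output rc
    for vmname in "$@"; do
        rc=0
        # Capture both the output and the exit status instead of "|| true".
        output="$(run_command_on_vm "${vmname}" \
            "sudo tar czf /tmp/pcp-archives.tar.gz -C /var/log/pcp/pmlogger ." 2>&1)" || rc=$?
        if [ "${rc}" -ne 0 ]; then
            echo "WARNING: PCP packaging failed on ${vmname} (rc=${rc}): ${output}"
            PCP_FAILED_VMS="${PCP_FAILED_VMS} ${vmname}"
            continue
        fi
        echo "packaged:${vmname}"
    done
}
```

Recording the failed VM names lets a later step report which artifacts are missing without aborting collection for the remaining VMs.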
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 1e7ed5a3-3544-448c-822e-3f77b1e0f49e
📒 Files selected for processing (8)
- test/bin/scenario.sh
- test/image-blueprints-bootc/el10/layer1-base/group1/rhel102-test-agent.containerfile
- test/image-blueprints-bootc/el9/layer1-base/group1/rhel96-test-agent.containerfile
- test/image-blueprints-bootc/el9/layer1-base/group1/rhel98-test-agent.containerfile
- test/image-blueprints-bootc/upstream/group1/cos10-test-agent.containerfile
- test/image-blueprints-bootc/upstream/group1/cos9-test-agent.containerfile
- test/image-blueprints/layer1-base/group1/rhel96.toml
- test/image-blueprints/layer1-base/group1/rhel98.toml
run_command_on_vm "${vmname}" \
    "sudo tar czf /tmp/pcp-archives.tar.gz -C /var/log/pcp/pmlogger ." || true
Handle PCP archive creation failures explicitly.
Suppressing tar errors here can mask why artifacts are missing and make collection diagnostics weaker. Fail this VM’s packaging step with a warning and continue.
Proposed fix
-        run_command_on_vm "${vmname}" \
-            "sudo tar czf /tmp/pcp-archives.tar.gz -C /var/log/pcp/pmlogger ." || true
+        if ! run_command_on_vm "${vmname}" \
+            "sudo rm -f /tmp/pcp-archives.tar.gz && \
+             sudo tar czf /tmp/pcp-archives.tar.gz -C /var/log/pcp/pmlogger ."; then
+            echo "WARNING: Failed to package PCP data on ${vmname}"
+            continue
+        fi
|
/test e2e-aws-tests-periodic |
        "--filename" "*.log"
}

start_pcp_on_all_vms() {
I think we should enable services in the containerfile / toml as we install them.
Will do before publishing final version. Let me leave this comment open to remember. Thanks!
|
@pacevedom: This pull request references USHIFT-6979 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.
Bake pcp-zeroconf into all test VM images (ostree blueprints across all layers and bootc Containerfiles) so Performance Co-Pilot is available on every test VM.
At test time, scenario.sh starts pmcd/pmlogger before tests and collects the PCP archives as artifacts alongside SOS reports.
Controlled via SKIP_PCP environment variable (defaults to false).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/test e2e-aws-tests-periodic |
1 similar comment
|
/test e2e-aws-tests-periodic |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/bin/scenario.sh`:
- Around line 1614-1617: The trap currently always calls collect_pcp_reports
which stops pmlogger even on success; change the trap so collect_pcp_reports is
executed only when the exit code indicates failure (i.e., when rc != 0). Update
the trap block that invokes record_junit, collect_pcp_reports, and sos_report so
that collect_pcp_reports is guarded by the same failure conditional used for
record_junit (refer to the trap invocation and the symbols record_junit,
collect_pcp_reports, and sos_report to locate the code) and leave sos_report
behavior unchanged or adjust its rc handling as needed.
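The guard described in this comment might look like the sketch below. The actual trap block in scenario.sh is not shown in this thread, so everything here is an assumption: the three helpers are stubs that only print their names, `on_exit` and `run_scenario` are hypothetical wrappers, the `failed` argument to `record_junit` is invented for illustration, and `sos_report` is assumed to run unconditionally.

```shell
#!/usr/bin/env bash
# Sketch only: the three helpers are stubs that just print their name.
record_junit()        { echo "record_junit $*"; }
collect_pcp_reports() { echo "collect_pcp_reports"; }
sos_report()          { echo "sos_report"; }

# on_exit is a hypothetical handler; rc is the scenario's exit status.
on_exit() {
    local rc="$1"
    if [ "${rc}" -ne 0 ]; then
        record_junit "failed"
        collect_pcp_reports   # stop pmlogger and collect only on failure
    fi
    sos_report                # assumed unconditional in this sketch
}

run_scenario() {
    # Run in a subshell so the EXIT trap fires when the scenario ends.
    (
        trap 'on_exit $?' EXIT
        "$@"
    )
}
```

Passing `$?` into the handler at trap time keeps the original exit status available even after the handler runs other commands.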
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e28fd666-9399-4751-a915-ff1875556150
📒 Files selected for processing (17)
- test/bin/scenario.sh
- test/image-blueprints-bootc/el10/layer1-base/group1/rhel102-test-agent.containerfile
- test/image-blueprints-bootc/el9/layer1-base/group1/rhel96-test-agent.containerfile
- test/image-blueprints-bootc/el9/layer1-base/group1/rhel98-test-agent.containerfile
- test/image-blueprints-bootc/upstream/group1/cos10-test-agent.containerfile
- test/image-blueprints-bootc/upstream/group1/cos9-test-agent.containerfile
- test/image-blueprints/layer1-base/group1/rhel96.toml
- test/image-blueprints/layer1-base/group1/rhel98.toml
- test/image-blueprints/layer2-presubmit/group1/rhel98-source-base.toml
- test/image-blueprints/layer2-presubmit/group1/rhel98-source-fake-next-minor.toml
- test/image-blueprints/layer2-presubmit/group1/rhel98-source-with-optionals.toml
- test/image-blueprints/layer2-presubmit/group1/rhel98-source.toml
- test/image-blueprints/layer3-periodic/group1/rhel98-source-isolated.toml
- test/image-blueprints/layer4-release/group1/rhel96-brew-y2-with-optionals.toml
- test/image-blueprints/layer4-release/group2/rhel96-brew-y1-with-optionals.toml
- test/image-blueprints/layer4-release/group3/rhel98-brew-lrel-optional.toml
- test/image-blueprints/layer4-release/group3/rhel98-brew-nightly-with-optionals.toml
✅ Files skipped from review due to trivial changes (1)
- test/image-blueprints/layer4-release/group3/rhel98-brew-nightly-with-optionals.toml
🚧 Files skipped from review as they are similar to previous changes (5)
- test/image-blueprints/layer4-release/group1/rhel96-brew-y2-with-optionals.toml
- test/image-blueprints/layer2-presubmit/group1/rhel98-source-fake-next-minor.toml
- test/image-blueprints-bootc/el9/layer1-base/group1/rhel98-test-agent.containerfile
- test/image-blueprints/layer2-presubmit/group1/rhel98-source-base.toml
- test/image-blueprints-bootc/el10/layer1-base/group1/rhel102-test-agent.containerfile
Add pmcd and pmlogger to the enabled services list in all ostree image blueprints that include pcp-zeroconf. This ensures PCP starts logging from first boot, capturing metrics during the boot and greenboot phases rather than only during test execution.
Add pmcd and pmlogger to the systemctl enable command in all bootc test-agent containerfiles that install pcp-zeroconf. This ensures PCP starts logging from first boot in bootc-based test VMs.
Since PCP services are now enabled at the image level, remove the start_pcp_on_all_vms function that manually restarted pmcd/pmlogger before tests. Refactor collect_pcp_reports to dispatch between online and offline collection, mirroring the sos_report pattern. Offline VMs (no network) are now collected via invoke_qemu_script. Also add PCP collection to the action_create EXIT trap so archives are captured even when the boot phase fails.
|
/retest |
|
@pacevedom: The following test failed, say
Install and run Performance Co-Pilot on all online VMs during test execution, then collect the archives as artifacts alongside SOS reports. Controlled via SKIP_PCP environment variable (defaults to false).