Skip to content

LCORE-2035: Add logic for tunnel and interception proxies on konflux#1758

Merged
radofuchs merged 6 commits into
lightspeed-core:mainfrom
radofuchs:LCORE_2035_konflux_fix
May 21, 2026
Merged

LCORE-2035: Add logic for tunnel and interception proxies on konflux#1758
radofuchs merged 6 commits into
lightspeed-core:mainfrom
radofuchs:LCORE_2035_konflux_fix

Conversation

@radofuchs
Copy link
Copy Markdown
Contributor

@radofuchs radofuchs commented May 18, 2026

Description

Add logic for tunnel and interception proxies on konflux

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • In-cluster tunnel and TLS-interception proxies with HTTP stats endpoints and optional CA support for e2e runs; deployment manifests added.
  • Tests

    • Proxy-related e2e scenarios enabled to run in-cluster (tag adjustments and expected-response updates) and verification now uses in-cluster stats.
  • Chores

    • CLI/script commands to deploy/sync proxies and fetch stats added; improved pod restart error reporting and readiness handling.

Review Change Stack

@radofuchs radofuchs requested review from max-svistunov and tisnik and removed request for tisnik May 18, 2026 14:46
@radofuchs radofuchs marked this pull request as ready for review May 18, 2026 14:46
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Warning

Rate limit exceeded

@radofuchs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 45 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f13128cc-7f07-4c1e-9a99-be18ee14d01c

📥 Commits

Reviewing files that changed from the base of the PR and between 1478b0f and 1992e24.

📒 Files selected for processing (1)
  • tests/e2e/features/steps/proxy.py

Walkthrough

This PR enables proxy feature e2e tests to run inside Prow/Konflux clusters by introducing in-cluster tunnel and interception proxy pods, extending proxy implementations with in-cluster mode and stats endpoints, adding orchestration commands to deploy and manage proxies, and updating test steps to route through and verify in-cluster proxies.

Changes

In-Cluster Proxy for Prow E2E Execution

Layer / File(s) Summary
In-Cluster Pod Manifests
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml, tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml
Kubernetes Pod and Service definitions for tunnel proxy (8888 proxy, 8887 stats) and interception proxy (8889 proxy, 8886 stats), both with non-root security hardening, readiness/liveness probes on /stats, and ConfigMap volumes for script execution.
Llama manifest: optional interception CA mount
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
Adds optional Secret-backed volume/volumeMount for e2e-interception-proxy-ca to mount ca.pem into /tmp/interception-proxy-ca.pem in the Llama container.
Proxy Implementations with In-Cluster Mode
tests/e2e/proxy/tunnel_proxy.py, tests/e2e/proxy/interception_proxy.py
Both proxies extended with run_in_cluster() entry points, HTTP stats servers exposing /stats with CONNECT counts and intercepted host:port targets as JSON, and main() CLI functions for containerized execution.
E2E Operations: Deployment and Stats
tests/e2e-prow/rhoai/scripts/e2e-ops.sh, tests/e2e-prow/rhoai/pipeline-konflux.sh
New e2e-ops commands deploy-e2e-tunnel-proxy, deploy-e2e-interception-proxy, tunnel-proxy-stats, interception-proxy-stats, and sync-interception-proxy-ca-secret to deploy manifests, fetch stats, and manage CA certificates. cmd_restart_llama_stack now can sync the CA before recreating the Llama pod and waits/verifies CA mount. Pipeline comment updated to reference proxy deployment steps.
Test Feature Enablement for Prow
tests/e2e/features/proxy.feature, tests/e2e/features/steps/proxy.py, tests/e2e/utils/prow_utils.py
Removed @skip-in-prow tags from proxy scenarios and adjusted flaky tags. Step implementations detect Prow context, deploy in-cluster proxies when needed, route Llama Stack through in-cluster DNS/ports vs local proxies, fetch/parse in-cluster /stats JSON for verification, sync interception CA into a Secret, and scenario cleanup updated. restart_pod() now raises CalledProcessError with stderr-derived detail on failure.

Sequence Diagram(s)

sequenceDiagram
  participant Test as E2E Test (proxy.feature)
  participant StepImpl as Step Implementation (proxy.py)
  participant EzOps as e2e-ops.sh
  participant Cluster as Prow Cluster
  participant TProxy as e2e-tunnel-proxy Pod
  participant IProxy as e2e-interception-proxy Pod
  participant Llama as Llama Stack Pod

  Test->>StepImpl: Scenario starts (Prow context detected)
  StepImpl->>EzOps: run_e2e_ops deploy-e2e-tunnel-proxy
  EzOps->>Cluster: kubectl apply e2e-tunnel-proxy manifest
  Cluster->>TProxy: Create Pod + Service
  TProxy->>TProxy: Start CONNECT proxy + stats server
  StepImpl->>EzOps: run_e2e_ops deploy-e2e-interception-proxy
  EzOps->>Cluster: kubectl apply e2e-interception-proxy manifest
  Cluster->>IProxy: Create Pod + Service
  IProxy->>IProxy: Start TLS interception proxy + stats server (exports CA PEM)
  StepImpl->>Llama: Configure proxy env (in-cluster DNS/ports)
  Test->>StepImpl: Verify proxy behavior
  StepImpl->>TProxy: GET /stats -> JSON counters
  TProxy-->>StepImpl: JSON counters (connect_count, last_connect_target)
  StepImpl->>IProxy: GET /stats -> JSON counters/hosts
  IProxy-->>StepImpl: JSON counters (connect_count, intercepted hosts)
  StepImpl->>Test: Assert CONNECT counts and intercepted hosts
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding tunnel and interception proxy logic for Konflux/Prow in-cluster deployment. It is concise, specific, and directly related to the primary modifications across manifests, scripts, and test features.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@radofuchs radofuchs requested a review from tisnik May 18, 2026 14:47
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/proxy/interception_proxy.py (1)

165-167: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout for upstream TLS connect.

The interception path opens the remote TLS connection without a timeout, so a stalled upstream can hang this handler much longer than expected. Align with tunnel_proxy.py behavior and wrap this in asyncio.wait_for(...).

Suggested patch
-                remote_reader, remote_writer = await asyncio.open_connection(
-                    target_host, target_port, ssl=True
-                )
+                remote_reader, remote_writer = await asyncio.wait_for(
+                    asyncio.open_connection(target_host, target_port, ssl=True),
+                    timeout=10,
+                )
-            except (OSError, ConnectionRefusedError, ssl.SSLError) as e:
+            except (asyncio.TimeoutError, OSError, ConnectionRefusedError, ssl.SSLError) as e:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/proxy/interception_proxy.py` around lines 165 - 167, The remote TLS
connect call using asyncio.open_connection (the line assigning remote_reader,
remote_writer) lacks a timeout; wrap that await in asyncio.wait_for(...) using
the same upstream connect timeout constant/variable used by the tunnel proxy
(e.g., UPSTREAM_CONNECT_TIMEOUT or the existing upstream_connect_timeout
parameter) so the connect becomes: remote_reader, remote_writer = await
asyncio.wait_for(asyncio.open_connection(target_host, target_port, ssl=True),
timeout=UPSTREAM_CONNECT_TIMEOUT). Ensure you import or reference the same
timeout symbol used by tunnel_proxy behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml`:
- Around line 35-37: Remove the runtime pip install from the pod startup and
instead bake the trustme dependency into the container image used by the
interception proxy; specifically, eliminate the "pip install --quiet
--no-cache-dir --target /tmp/pydeps 'trustme>=1.2.1'" line in the startup
snippet that ultimately execs python /app/interception_proxy.py and update the
Dockerfile or image build pipeline that produces the image used by
tests/e2e-proxy to pip-install trustme (or add it to requirements) so
tests/e2e/proxy/interception_proxy.py can import trustme at runtime without
network access.

In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml`:
- Around line 204-208: The Secret mount for e2e-interception-proxy-ca is marked
optional and uses a subPath (volumeMount name interception-proxy-ca, mountPath
/tmp/interception-proxy-ca.pem, subPath ca.pem), which causes Kubernetes to
create an empty directory when the Secret is missing; either remove optional:
true to make the Secret required so the pod will fail fast, or remove subPath
and mount the whole secret directory (so the app can check for the file at
/tmp/interception-proxy-ca.pem at runtime); apply the same change for the other
occurrence referenced (lines 221-224).

---

Outside diff comments:
In `@tests/e2e/proxy/interception_proxy.py`:
- Around line 165-167: The remote TLS connect call using asyncio.open_connection
(the line assigning remote_reader, remote_writer) lacks a timeout; wrap that
await in asyncio.wait_for(...) using the same upstream connect timeout
constant/variable used by the tunnel proxy (e.g., UPSTREAM_CONNECT_TIMEOUT or
the existing upstream_connect_timeout parameter) so the connect becomes:
remote_reader, remote_writer = await
asyncio.wait_for(asyncio.open_connection(target_host, target_port, ssl=True),
timeout=UPSTREAM_CONNECT_TIMEOUT). Ensure you import or reference the same
timeout symbol used by tunnel_proxy behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 726f808a-48bb-4619-8976-b6595102ebd2

📥 Commits

Reviewing files that changed from the base of the PR and between 73637c6 and 176e358.

📒 Files selected for processing (10)
  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/pipeline-konflux.sh
  • tests/e2e-prow/rhoai/scripts/e2e-ops.sh
  • tests/e2e/features/proxy.feature
  • tests/e2e/features/steps/proxy.py
  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/utils/prow_utils.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/proxy.feature
  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/utils/prow_utils.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/features/steps/proxy.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/utils/prow_utils.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/features/steps/proxy.py
🧠 Learnings (3)
📚 Learning: 2026-02-19T10:06:50.647Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.

Applied to files:

  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/proxy.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/proxy.py
🪛 Checkov (3.2.528)
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml

[low] 3-63: CPU limits should be set

(CKV_K8S_11)


[low] 3-63: CPU requests should be set

(CKV_K8S_10)


[low] 3-63: The default namespace should not be used

(CKV_K8S_21)


[low] 3-63: Image should use digest

(CKV_K8S_43)


[low] 3-63: Image Pull Policy should be Always

(CKV_K8S_15)


[low] 3-63: Memory limits should be set

(CKV_K8S_13)


[low] 3-63: Memory requests should be set

(CKV_K8S_12)


[low] 3-63: Use read-only filesystem for containers where possible

(CKV_K8S_22)


[low] 3-63: Containers should run as a high UID to avoid host conflict

(CKV_K8S_40)


[low] 3-63: Ensure that Service Account Tokens are only mounted where necessary

(CKV_K8S_38)


[low] 64-77: The default namespace should not be used

(CKV_K8S_21)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml

[medium] 6-224: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[low] 6-224: CPU limits should be set

(CKV_K8S_11)


[low] 6-224: CPU requests should be set

(CKV_K8S_10)


[low] 6-224: The default namespace should not be used

(CKV_K8S_21)


[low] 6-224: Minimize the admission of containers with the NET_RAW capability

(CKV_K8S_28)


[low] 6-224: Image should use digest

(CKV_K8S_43)


[low] 6-224: Image Tag should be fixed - not latest or blank

(CKV_K8S_14)


[low] 6-224: Memory limits should be set

(CKV_K8S_13)


[low] 6-224: Memory requests should be set

(CKV_K8S_12)


[low] 6-224: Minimize the admission of containers with capabilities assigned

(CKV_K8S_37)


[low] 6-224: Use read-only filesystem for containers where possible

(CKV_K8S_22)


[low] 6-224: Containers should run as a high UID to avoid host conflict

(CKV_K8S_40)


[low] 6-224: Prefer using secrets as files over secrets as environment variables

(CKV_K8S_35)


[low] 6-224: Ensure that Service Account Tokens are only mounted where necessary

(CKV_K8S_38)

tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml

[low] 3-55: CPU limits should be set

(CKV_K8S_11)


[low] 3-55: CPU requests should be set

(CKV_K8S_10)


[low] 3-55: The default namespace should not be used

(CKV_K8S_21)


[low] 3-55: Image should use digest

(CKV_K8S_43)


[low] 3-55: Image Pull Policy should be Always

(CKV_K8S_15)


[low] 3-55: Memory limits should be set

(CKV_K8S_13)


[low] 3-55: Memory requests should be set

(CKV_K8S_12)


[low] 3-55: Use read-only filesystem for containers where possible

(CKV_K8S_22)


[low] 3-55: Containers should run as a high UID to avoid host conflict

(CKV_K8S_40)


[low] 3-55: Ensure that Service Account Tokens are only mounted where necessary

(CKV_K8S_38)


[low] 56-69: The default namespace should not be used

(CKV_K8S_21)

🪛 Trivy (0.69.3)
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml

[info] 15-58: CPU not limited

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[error] 15-58: Root file system is not read-only

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 15-58: CPU requests not specified

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 15-58: Memory requests not specified

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 15-58: Memory not limited

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 15-58: Runs with UID <= 10000

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 15-58: Runs with GID <= 10000

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[info] 5-8: Workloads in the default namespace

pod e2e-interception-proxy in default namespace should set metadata.namespace to a non-default namespace

Rule: KSV-0110

Learn more

(IaC/Kubernetes)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml

[info] 95-208: CPU not limited

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[warning] 95-208: Image tag ":latest" used

Container 'llama-stack-container' of Pod 'llama-stack-service' should specify an image tag

Rule: KSV-0013

Learn more

(IaC/Kubernetes)


[error] 95-208: Root file system is not read-only

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 95-208: CPU requests not specified

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 95-208: Memory requests not specified

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 95-208: Memory not limited

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 95-208: Runs with UID <= 10000

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 95-208: Runs with GID <= 10000

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[warning] 95-208: Restrict container images to trusted registries

Container llama-stack-container in pod llama-stack-service (namespace: default) uses an image from an untrusted registry.

Rule: KSV-0125

Learn more

(IaC/Kubernetes)

tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml

[info] 15-50: CPU not limited

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[error] 15-50: Root file system is not read-only

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 15-50: CPU requests not specified

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 15-50: Memory requests not specified

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 15-50: Memory not limited

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 15-50: Runs with UID <= 10000

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 15-50: Runs with GID <= 10000

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[info] 5-8: Workloads in the default namespace

pod e2e-tunnel-proxy in default namespace should set metadata.namespace to a non-default namespace

Rule: KSV-0110

Learn more

(IaC/Kubernetes)

🔇 Additional comments (7)
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml (1)

3-69: LGTM!

tests/e2e/proxy/tunnel_proxy.py (1)

22-29: LGTM!

Also applies to: 187-263

tests/e2e-prow/rhoai/scripts/e2e-ops.sh (1)

631-746: LGTM!

Also applies to: 800-836

tests/e2e-prow/rhoai/pipeline-konflux.sh (1)

138-139: LGTM!

tests/e2e/features/proxy.feature (1)

24-25: LGTM!

Also applies to: 56-57, 69-70

tests/e2e/features/steps/proxy.py (1)

99-193: LGTM!

Also applies to: 324-604

tests/e2e/utils/prow_utils.py (1)

113-118: LGTM!

Comment on lines +35 to +37
set -e
pip install --quiet --no-cache-dir --target /tmp/pydeps 'trustme>=1.2.1'
exec python /app/interception_proxy.py
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify runtime dependency fetch is in startup command and identify where trustme is used.
rg -n "pip install|trustme" tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml tests/e2e/proxy/interception_proxy.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 892


Avoid runtime package installation in pod startup path.

The pip install trustme command in the pod startup (line 36) introduces an external network dependency during pod initialization, which can cause flaky or slow e2e runs. Since trustme is required by the interception proxy logic (tests/e2e/proxy/interception_proxy.py imports and uses it), prebuild the container image with this dependency included so pod startup is deterministic and does not depend on external package indices.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml` around
lines 35 - 37, Remove the runtime pip install from the pod startup and instead
bake the trustme dependency into the container image used by the interception
proxy; specifically, eliminate the "pip install --quiet --no-cache-dir --target
/tmp/pydeps 'trustme>=1.2.1'" line in the startup snippet that ultimately execs
python /app/interception_proxy.py and update the Dockerfile or image build
pipeline that produces the image used by tests/e2e-proxy to pip-install trustme
(or add it to requirements) so tests/e2e/proxy/interception_proxy.py can import
trustme at runtime without network access.

Comment thread tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/features/proxy.feature`:
- Around line 79-80: The feature currently accepts either 404 or 500 as valid
outcomes for proxy errors which masks which subsystem failed; update the test(s)
to assert a single deterministic status per scenario by either (A) splitting the
ambiguous step in tests/e2e/features/proxy.feature into two explicit scenarios
each expecting a specific status, or (B) change the step usage to call the
specific step definition "The status code of the response is one of {first:d} or
{second:d}" in common_http.py and replace it with a new step (e.g., "The status
code of the response is {status:d}") and update scenario(s) to use the new step;
modify the step implementation in common_http.py to only accept one status and
fail clearly when other codes are returned so the tests precisely reflect
whether the main query or Llama Guard moderation path failed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4bf0739a-9ad0-42dd-af6a-2d84feebbdd5

📥 Commits

Reviewing files that changed from the base of the PR and between 176e358 and 0757895.

📒 Files selected for processing (1)
  • tests/e2e/features/proxy.feature
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/proxy.feature
🔇 Additional comments (2)
tests/e2e/features/proxy.feature (2)

24-24: Inconsistency between summary and code changes.

The AI-generated summary claims that @skip-in-prow tags were removed from the scenarios at lines 24, 41, and 69, but these lines are not marked as changed in the provided code. Either the summary is incorrect, or these changes occurred in a different commit not shown here.

Also applies to: 41-41, 69-69


56-56: 🏗️ Heavy lift

No action needed — @flaky is a designed feature with automatic retry logic.

The @flaky tag is not a workaround that masks issues. It's a documented, intentional feature implemented in tests/e2e/features/environment.py that automatically retries scenarios up to 5 times to handle transient infrastructure failures (timing issues, network instability, etc.). The hook applies patch_scenario_with_autoretry() to all scenarios tagged with @flaky, allowing the test suite to accommodate infrastructure flakiness without compromising test validity.

Comment on lines +79 to +80
#will be fixed in https://redhat.atlassian.net/browse/LCORE-2255
Then The status code of the response is one of 404 or 500
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check JIRA issue LCORE-2255 status and related error handling

# Search for other scenarios with relaxed error code expectations
echo "=== Scenarios accepting multiple status codes ==="
rg -n 'one of.*or' --glob '*.feature' -C2

# Check if there's error handling or retry logic that might affect status codes
echo -e "\n=== Error handling in proxy steps ==="
rg -n -A5 'status.*code.*(404|500)' tests/e2e/features/steps/proxy.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 813


🏁 Script executed:

# Find the step definition that checks status codes
rg -n 'The status code of the response is' tests/e2e/ -A3

# Get the full scenarios from the feature file
echo "=== Scenario context for both occurrences ==="
sed -n '40,82p' tests/e2e/features/proxy.feature

Repository: lightspeed-core/lightspeed-stack

Length of output: 50389


🏁 Script executed:

# Get the step implementation
sed -n '43,50p' tests/e2e/features/steps/common_http.py

# Get full proxy.feature scenarios for both occurrences
echo "=== Scenario at lines 40-52 ==="
sed -n '40,52p' tests/e2e/features/proxy.feature

echo -e "\n=== Scenario at lines 68-82 ==="
sed -n '68,82p' tests/e2e/features/proxy.feature

# Check if there's any documentation about why both codes are needed
echo -e "\n=== Proxy feature file header/comments ==="
head -40 tests/e2e/features/proxy.feature

Repository: lightspeed-core/lightspeed-stack

Length of output: 3500


🏁 Script executed:

# Look for comments or documentation about proxy error handling
rg -n 'proxy.*error|error.*proxy' tests/e2e/ -i -A2 -B2

# Check the full step implementation
sed -n '43,60p' tests/e2e/features/steps/common_http.py

# Search for any recent changes or comments in proxy-related code
rg -n '404|500' tests/e2e/features/proxy.feature

# Check if there's any configuration or setup that might affect error codes
grep -n "unreachable\|CA\|certificate" tests/e2e/features/proxy.feature

Repository: lightspeed-core/lightspeed-stack

Length of output: 1684


Accept both 404 and 500 status codes as temporary trade-off while LCORE-2255 is resolved.

The dual status codes (404 or 500) in both proxy error scenarios appear intentional. The step definition "The status code of the response is one of {first:d} or {second:d}" is explicitly defined in common_http.py (lines 43-60) with proper error reporting.

The feature file header explains that "Llama Guard moderation can issue separate provider calls inside Llama Stack that may not inherit the same proxy/TLS CA trust as the scenario's remote inference provider." This means both error scenarios can fail via different code paths—either the main query fails or the Llama Guard moderation calls fail—resulting in either 404 or 500 depending on where the failure occurs.

While this reduces test precision, it appears to be a pragmatic solution acknowledging multiple error paths in the system. The JIRA reference (LCORE-2255) confirms this is a known issue being tracked for resolution. Consider prioritizing the fix so that both scenarios specify exactly which code should be returned for each failure mode.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/features/proxy.feature` around lines 79 - 80, The feature currently
accepts either 404 or 500 as valid outcomes for proxy errors which masks which
subsystem failed; update the test(s) to assert a single deterministic status per
scenario by either (A) splitting the ambiguous step in
tests/e2e/features/proxy.feature into two explicit scenarios each expecting a
specific status, or (B) change the step usage to call the specific step
definition "The status code of the response is one of {first:d} or {second:d}"
in common_http.py and replace it with a new step (e.g., "The status code of the
response is {status:d}") and update scenario(s) to use the new step; modify the
step implementation in common_http.py to only accept one status and fail clearly
when other codes are returned so the tests precisely reflect whether the main
query or Llama Guard moderation path failed.

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in overall. I'd just avoid usage of magic port numbers in e2e, they can be added as named constants in e2e/features/steps pacakge. But it's just a nit.

Comment thread tests/e2e/features/steps/proxy.py Outdated

def _cluster_interception_proxy_port(requested_port: int) -> int:
"""Map feature-file port to the in-cluster interception proxy listener."""
if requested_port in (8889, 8890):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob. might be worth to document those two magic ports (as named constants perhaps?)

Comment thread tests/e2e/features/steps/proxy.py Outdated
"""Start a tunnel proxy in a background thread."""
"""Start a tunnel proxy locally, or verify the in-cluster proxy (Konflux/Prow)."""
if is_prow_environment():
if port != 8888:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto pls

line = request_line.decode("utf-8", errors="replace").strip()
method_path = line.split()
path = method_path[1] if len(method_path) > 1 else ""
while True:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's interesting way how to handle large payloads, notes taken ;)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/features/steps/proxy.py (1)

1-620: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix Black formatting issues before merge.

The CI pipeline reports that this file fails Black's formatting check. Run black tests/e2e/features/steps/proxy.py locally and commit the reformatted result.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/features/steps/proxy.py` around lines 1 - 620, Run Black to
reformat this file (tests/e2e/features/steps/proxy.py) and commit the changes so
CI passes; specifically format the file with `black
tests/e2e/features/steps/proxy.py` and include the resulting diff in your
PR—this will correctly fix spacing/line-break issues around functions like
_is_docker_mode, _get_proxy_host, start_tunnel_proxy, start_interception_proxy,
configure_llama_interception_with_ca, and verify_interception_proxy_used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 1-620: Run Black to reformat this file
(tests/e2e/features/steps/proxy.py) and commit the changes so CI passes;
specifically format the file with `black tests/e2e/features/steps/proxy.py` and
include the resulting diff in your PR—this will correctly fix spacing/line-break
issues around functions like _is_docker_mode, _get_proxy_host,
start_tunnel_proxy, start_interception_proxy,
configure_llama_interception_with_ca, and verify_interception_proxy_used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5615c323-f9c0-4721-9690-57fb61dd4b37

📥 Commits

Reviewing files that changed from the base of the PR and between 0757895 and 1478b0f.

📒 Files selected for processing (2)
  • tests/e2e/features/steps/proxy.py
  • tests/e2e/proxy/interception_proxy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: Pyright
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/features/steps/proxy.py
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/features/steps/proxy.py
🧠 Learnings (2)
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/proxy.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/proxy.py
🪛 GitHub Actions: Black / 0_black.txt
tests/e2e/features/steps/proxy.py

[error] 1-1: Black formatting check failed (black --check). would reformat this file; 1 file would be reformatted and the step exited with code 1.

🪛 GitHub Actions: Black / black
tests/e2e/features/steps/proxy.py

[error] Black check indicates formatting changes needed for this file (reported as the single file that would be reformatted).

🔇 Additional comments (8)
tests/e2e/proxy/interception_proxy.py (1)

37-38: LGTM!

tests/e2e/features/steps/proxy.py (7)

36-40: LGTM!


47-50: LGTM!


139-145: LGTM!


338-342: LGTM!


375-377: LGTM!


478-480: LGTM!


516-518: LGTM!

@radofuchs radofuchs merged commit 28b57cc into lightspeed-core:main May 21, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants