LCORE-1518: Skip model authentication against Azure#1766
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (22)
WalkthroughThis PR refactors Azure Entra ID token management from startup-time file generation to runtime in-memory caching. It moves token acquisition to request time via the ChangesAzure Token Handling Refactoring: In-Memory Caching & Runtime Acquisition
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/client.py (1)
257-263:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHarden provider-data JSON parsing before merge.
At Line 258, valid JSON that is not an object (for example, a list) will pass decoding and then fail at Line 262 on
.update(...).Proposed fix
- try: - provider_data = json.loads(provider_data_json) if provider_data_json else {} - except (json.JSONDecodeError, TypeError): - provider_data = {} + try: + parsed = json.loads(provider_data_json) if provider_data_json else {} + provider_data = parsed if isinstance(parsed, dict) else {} + except (json.JSONDecodeError, TypeError): + provider_data = {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.py` around lines 257 - 263, The JSON parsing for provider data currently assumes json.loads(provider_data_json) yields a dict and then calls provider_data.update(updates); to fix, after decoding (and in the existing except block) ensure the decoded value is a dict (use isinstance(provider_data, dict)) and if it's not, replace it with an empty dict before calling provider_data.update(updates); keep the existing exception handling for JSONDecodeError/TypeError and apply this check where provider_data and provider_data_json are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/providers.md`:
- Around line 95-107: The docs use two different config keys—`base_url` in the
prose and `api_base` in the YAML—causing inconsistency with the actual schema;
pick the correct schema key (whichever the code expects for the `remote::azure`
provider) and update both the descriptive text and the example so they match
(for example, replace `api_base` with `base_url` or vice versa) and keep
references to `azure_entra_id`, `model_validation: false`, `provider_id: azure`,
and `provider_type: remote::azure` unchanged.
- Around line 114-118: Update the lifecycle text under "Lightspeed startup
(library and service mode)" to remove the claim that Lightspeed "Acquires an
initial access token from Microsoft Entra ID" and "Caches the token in memory
(`SecretStr`, per worker)" at startup; instead state that Lightspeed reads Entra
ID configuration at startup but defers authentication until request time,
acquiring and caching access tokens on demand per-request (and still
initializing the Llama Stack client with `X-LlamaStack-Provider-Data` as
before). Ensure the revised step explicitly reflects the runtime-only,
deferred-auth flow implemented in the PR.
In `@src/client.py`:
- Around line 245-250: When initializing the AsyncLlamaStackAsLibraryClient
inside the token-refresh path, wrap the await client.initialize() call in a
try/except that catches initialization failures (broad Exception or the specific
initialization error type) and translate them into a service-unavailable error
(e.g., raise or return a ServiceUnavailableError / HTTP 503) instead of letting
the exception bubble; set self._lsc only after successful initialization (leave
self._lsc unchanged on failure) and include the client
variable/AsyncLlamaStackAsLibraryClient.initialize in your handling so callers
receive a 503-style backend-unavailable response when initialization fails.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Line 731: The test creates an unused mock "mock_updated_client" and sets
update_azure_token to return it but never asserts it's used; either remove the
unused mock_updated_client and its return configuration to simplify the test, or
if you intend to verify the updated client is applied, change the assertions to
check that get_client() (or subsequent operations that use the client)
returns/uses mock_updated_client after calling update_azure_token; update
references to AsyncLlamaStackClient, mock_updated_client, and update_azure_token
accordingly.
---
Outside diff comments:
In `@src/client.py`:
- Around line 257-263: The JSON parsing for provider data currently assumes
json.loads(provider_data_json) yields a dict and then calls
provider_data.update(updates); to fix, after decoding (and in the existing
except block) ensure the decoded value is a dict (use isinstance(provider_data,
dict)) and if it's not, replace it with an empty dict before calling
provider_data.update(updates); keep the existing exception handling for
JSONDecodeError/TypeError and apply this check where provider_data and
provider_data_json are used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 23accaba-033e-4848-bb78-61bb7f096894
📒 Files selected for processing (22)
Makefiledocs/providers.mddocs/rag_guide.mdexamples/azure-run.yamlscripts/llama-stack-entrypoint.shsrc/app/endpoints/query.pysrc/app/endpoints/responses.pysrc/app/endpoints/streaming_query.pysrc/app/main.pysrc/authorization/azure_token_manager.pysrc/client.pysrc/llama_stack_configuration.pysrc/utils/query.pytests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yamltests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yamltests/unit/app/endpoints/test_query.pytests/unit/app/endpoints/test_responses.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/authorization/test_azure_token_manager.pytests/unit/test_client.pytests/unit/test_llama_stack_configuration.pytests/unit/utils/test_query.py
💤 Files with no reviewable changes (4)
- Makefile
- tests/unit/utils/test_query.py
- docs/rag_guide.md
- src/utils/query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/app/endpoints/test_query.pytests/unit/app/endpoints/test_responses.pytests/unit/test_llama_stack_configuration.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/authorization/test_azure_token_manager.pytests/unit/test_client.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/responses.pysrc/app/main.pysrc/authorization/azure_token_manager.pysrc/llama_stack_configuration.pysrc/client.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/responses.pysrc/app/main.pysrc/app/endpoints/query.py
🧠 Learnings (4)
📚 Learning: 2026-05-12T15:14:34.788Z
Learnt from: syedriko
Repo: lightspeed-core/lightspeed-stack PR: 1727
File: scripts/konflux_requirements.sh:9-15
Timestamp: 2026-05-12T15:14:34.788Z
Learning: In this repo, the `.konflux/` directory is committed/tracked and is guaranteed to exist in a fresh clone. Therefore, shell scripts that write output under `.konflux/` (e.g., create files like `.konflux/<...>`) should not waste effort by calling `mkdir -p .konflux` first. Only add directory-creation logic if the script may run in an environment/repo state where `.konflux/` might not be present.
Applied to files:
scripts/llama-stack-entrypoint.sh
📚 Learning: 2026-02-19T10:06:50.647Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.
Applied to files:
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yamltests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/responses.pysrc/app/endpoints/query.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
🔇 Additional comments (17)
examples/azure-run.yaml (1)
27-27: LGTM!src/app/main.py (1)
98-102: LGTM!scripts/llama-stack-entrypoint.sh (1)
18-18: LGTM!tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml (1)
163-163: LGTM!tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml (1)
153-153: LGTM!src/authorization/azure_token_manager.py (2)
36-42: LGTM!Also applies to: 59-78
99-106: LGTM!tests/unit/authorization/test_azure_token_manager.py (1)
56-61: LGTM!Also applies to: 72-78, 80-92, 163-179
src/llama_stack_configuration.py (2)
46-78: LGTM!
582-583: LGTM!Also applies to: 632-632
tests/unit/test_llama_stack_configuration.py (1)
15-15: LGTM!Also applies to: 28-104, 527-527
tests/unit/test_client.py (1)
6-6: LGTM!Also applies to: 13-13, 16-19, 102-178
src/app/endpoints/query.py (1)
199-207: LGTM!src/app/endpoints/responses.py (1)
400-408: LGTM!src/app/endpoints/streaming_query.py (1)
257-265: LGTM!tests/unit/app/endpoints/test_query.py (1)
544-546: LGTM!Also applies to: 568-568
tests/unit/app/endpoints/test_responses.py (1)
469-470: LGTM!Also applies to: 485-485, 505-505
3324de6 to
ba728c8
Compare
| api_key: ${env.AZURE_API_KEY} # Must be exactly this - placeholder for Entra ID token | ||
| api_base: ${env.AZURE_API_BASE} | ||
| # api_key: ${env.AZURE_API_KEY} # Can be omitted when Entra ID configured in LCORE | ||
| base_url: ${env.AZURE_API_BASE} |
There was a problem hiding this comment.
Change in attribute name
| # shellcheck source=/dev/null | ||
| set -a && . "$ENV_FILE" && set +a | ||
| fi | ||
| -o "$ENRICHED_CONFIG" 2>&1 || ENRICHMENT_FAILED=1 |
There was a problem hiding this comment.
The API key is not passed by .env file anymore
| if azure_entra_id_config is not None: | ||
| AzureEntraIDManager().set_config(azure_entra_id_config) | ||
| azure_base_url = await AsyncLlamaStackClientHolder().get_azure_base_url() | ||
| AzureEntraIDManager().set_base_url(azure_base_url) |
There was a problem hiding this comment.
In FastAPI lifespan just setup manager's attributes and defer the token acquisition to inference requests.
| token = self.access_token.get_secret_value() | ||
| if not token or self.azure_base_url is None: | ||
| return None | ||
| return {"azure_api_key": token, "azure_api_base": self.azure_base_url} |
There was a problem hiding this comment.
Config and validator attribute discrepancy in LLS (api_base vs base_url)
| return _is_inout_shield(shield) or not is_output_shield(shield) | ||
|
|
||
|
|
||
| async def update_azure_token( |
There was a problem hiding this comment.
Replaced to client holder
Description
Optimizes Azure Entra ID token handling by skipping startup authentication for Azure models. This allows to deffer token acquisition to inference requests allowing to remove prestart token acquisition and passing by environment variables.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
.envfiles for Azure setup.