Skip to content

fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685)#686

Open
DvirDukhan wants to merge 2 commits into
stagingfrom
dvirdukhan/fix-analyzer-silent-failures
Open

fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685)#686
DvirDukhan wants to merge 2 commits into
stagingfrom
dvirdukhan/fix-analyzer-silent-failures

Conversation

@DvirDukhan
Copy link
Copy Markdown
Contributor

@DvirDukhan DvirDukhan commented May 27, 2026

Fixes #685.

Problem

The Python analyzer hardcoded environment_path="{path}/venv" when
starting jedi-language-server via multilspy. When the repo had no venv
(the common case for cloned codebases like sphinx, sympy, and every
SWE-bench worktree), jedi raised InvalidPythonEnvironment on every
request_definition() call. analyzer.resolve() then swallowed the
exception silently and the indexer produced a graph with DEFINES edges
only — zero CALLS, zero EXTENDS.

Discovered during benchmark validation: sphinx (5K functions) and sympy
(41K functions) had no resolved cross-references at all, while pytest
worked because… (still unexplained, see #685).

Fix

  1. api/analyzers/source_analyzer.py — prefer {repo}/venv, then
    {repo}/.venv, then fall back to the host interpreter's environment
    (sys.executable's prefix) so jedi always has a valid Python to
    introspect.
  2. api/analyzers/analyzer.py — log resolve() failures at WARN with
    file/line context instead of swallowing them silently, so the next
    regression is loud.

Verification

Re-indexed sphinx-doc/sphinx-9230 with the fix:

Edge type Before After
DEFINES 7064 5640
CALLS 0 4931
EXTENDS 0 484
PARAMETERS 0 552
RETURNS 0 143

(DEFINES count differs because the previous run included tests/ and
this re-run excluded it; the headline is CALLS going from 0 → 4931.)

The benchmark harness will now show code_graph's true value on medium /
large repos, where it was previously running against a half-empty graph
and producing artificial regressions.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error diagnostics: Analysis now logs detailed debugging information including file paths, line/column positions, and exception details when errors occur
    • Enhanced file handling: Better error handling for missing files during symbol resolution processing
  • New Features

    • Virtual environment support: Automatically detects and uses local Python virtual environments

Review Change Stack

The Python analyzer hardcoded `environment_path={path}/venv` when starting
jedi-language-server via multilspy. When the repo had no venv (the common
case for cloned codebases like sphinx, sympy, anything from SWE-bench),
jedi raised `InvalidPythonEnvironment` on every `request_definition()`
call. analyzer.resolve() then swallowed the exception silently and the
indexer produced a graph with DEFINES edges only — zero CALLS, zero
EXTENDS. Benchmark validation showed sphinx (5K functions) and sympy
(41K functions) had no resolved cross-references at all.

Fix:
- source_analyzer.py: prefer {repo}/venv, then {repo}/.venv, then fall
  back to the host interpreter's environment (sys.executable's prefix)
  so jedi always has a valid Python to introspect.
- analyzer.py: log resolve() failures at WARN with file/line context
  instead of swallowing them silently, so the next regression is loud.

Verified: re-indexed sphinx-doc/sphinx-9230 with the fix:
  DEFINES: 5640, CALLS: 4931, EXTENDS: 484 (was DEFINES-only).

Fixes #685.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

Walkthrough

Symbol resolution in the Python analyzer now logs exceptions with file and node context instead of silently failing. Second-pass analysis detects local Python virtual environments and safely handles missing files by skipping them with a warning rather than crashing.

Changes

Robust Symbol Resolution with Error Visibility

Layer / File(s) Summary
Exception logging in resolve()
api/analyzers/analyzer.py
AbstractAnalyzer.resolve() catches Exception as e and logs a structured warning with file_path, node start position (row/column), and exception details, instead of returning an empty list silently.
Python environment detection and safe file iteration in second_pass()
api/analyzers/source_analyzer.py
SourceAnalyzer.second_pass() detects project-local venv or .venv with bin/python and passes the interpreter path to MultilspyConfig; fallback to sys.executable when absent. File iteration uses self.files.get() instead of direct indexing, logging a warning and continuing when files are missing instead of raising KeyError.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Poem

🐰 From silent shadows, the errors now sing,
With logs that whisper each exception's sting,
A venv found, or Python declared,
Missing files handled—now nothing's scared,
The graph builds stronger, no ghosts in the night! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly references fixing LSP CALLS edge resolution on repos without venv, which directly matches the main changes in analyzer.py and source_analyzer.py.
Linked Issues check ✅ Passed The PR directly addresses issue #685 by logging resolve() failures with context and implementing virtualenv detection (venv → .venv → sys.executable fallback) to enable CALLS/EXTENDS edge resolution.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing #685: exception logging in analyzer.py and virtualenv path resolution in source_analyzer.py with defensive file tracking.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dvirdukhan/fix-analyzer-silent-failures

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@DvirDukhan
Copy link
Copy Markdown
Contributor Author

Smoke benchmark with fix (n=3, Sonnet 4.5, step-limit 50)

After re-indexing all 3 SWE-bench worktrees with the fix and re-running the smoke:

config total tokens vs baseline
baseline 2,143,778
lsp 1,844,540 −14.0%
code_graph 1,368,152 −36.2%

Per-instance, code_graph wins everywhere (was losing badly on sympy before):

instance baseline code_graph Δ
pytest-dev/pytest-6202 630,303 142,795 −77.3%
sphinx-doc/sphinx-9230 894,514 725,862 −18.9%
sympy/sympy-20154 618,961 499,495 −19.3%

Sympy went from +78.7% (regression) to −19.3% (savings) — straight-line attributable to this fix turning a half-empty graph into a usable one. The 98 pp swing is direct evidence #685 was the dominant noise source in earlier runs.

Edge counts before → after:

  • sphinx: DEFINES 7064, CALLS 0 → DEFINES 5640, CALLS 4931
  • sympy: DEFINES 41525, CALLS 0 → DEFINES 30002, CALLS 22803

In source_analyzer.second_pass, the list of files we iterate can include
paths that first_pass did not add to self.files (e.g. parse errors,
LSP-induced timeouts, or rare edge cases where a candidate file is
present in the input list but never makes it into the files map).
Previously this raised KeyError and aborted the entire index. Hit on
sympy/polys/distributedmodules.py during bench calibration of sympy-12481.

Skip with a WARN log instead so a single bad file no longer takes down
the whole index.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
api/analyzers/source_analyzer.py (1)

142-142: ⚡ Quick win

Move sys import to module level.

The sys import should be at the top of the file with other imports, not inside the method. This follows PEP 8 conventions and improves code readability.

♻️ Proposed fix

Add the import at the module level (after line 21):

 import logging
+import sys
 # Configure logging

Remove the import from the method:

     if any(path.rglob('*.py')):
-        import sys
         py_venv = path / "venv"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/analyzers/source_analyzer.py` at line 142, Remove the inline "import sys"
from inside the method in source_analyzer.py and add a single "import sys" at
the top of the module alongside the other imports (module-level). Ensure you
only have one import statement for sys and delete the in-method import so
PEP8-compliant, readable code is preserved; no other code changes required.
api/analyzers/analyzer.py (1)

60-60: ⚡ Quick win

Move logging import to module level.

Importing logging inside the exception handler is inefficient. Although Python caches module imports, the import statement executes on every exception. Move the import to the top of the file with the other imports.

♻️ Proposed fix

Add the import at the module level (after line 8):

 from multilspy import SyncLanguageServer
+import logging

Remove the import from the except block:

     except Exception as e:
-        import logging
         logging.getLogger(__name__).warning(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/analyzers/analyzer.py` at line 60, The import of logging currently inside
the exception handler should be moved to the module level: add "import logging"
with the other top-level imports near the top of api/analyzers/analyzer.py, and
remove the inline "import logging" from the except block (the exception handler
in the analyzer code where logging is currently imported). Ensure subsequent
uses of logging in that function/class (e.g., in the except block) refer to the
module-level logging object.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@api/analyzers/analyzer.py`:
- Line 60: The import of logging currently inside the exception handler should
be moved to the module level: add "import logging" with the other top-level
imports near the top of api/analyzers/analyzer.py, and remove the inline "import
logging" from the except block (the exception handler in the analyzer code where
logging is currently imported). Ensure subsequent uses of logging in that
function/class (e.g., in the except block) refer to the module-level logging
object.

In `@api/analyzers/source_analyzer.py`:
- Line 142: Remove the inline "import sys" from inside the method in
source_analyzer.py and add a single "import sys" at the top of the module
alongside the other imports (module-level). Ensure you only have one import
statement for sys and delete the in-method import so PEP8-compliant, readable
code is preserved; no other code changes required.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69f73010-8608-4864-8c63-c30c8ce94c0e

📥 Commits

Reviewing files that changed from the base of the PR and between b066064 and a112c6d.

📒 Files selected for processing (2)
  • api/analyzers/analyzer.py
  • api/analyzers/source_analyzer.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python analyzer silently swallows multilspy errors → empty CALLS edges on large codebases

1 participant