feat(graph): per-branch graph identity (T17 #651)#675
Conversation
Refactor FalkorDB graph naming so each (project, branch) pair gets
its own graph: 'code:{project}:{branch}'. This lets concurrent agents
working on different branches of the same repo index in parallel
without overwriting each other.
Changes:
- api/graph.py: add DEFAULT_BRANCH, compose_graph_name(),
parse_graph_name(); Graph and AsyncGraphQuery constructors now
accept (name, branch=None); Graph.from_raw_name() classmethod for
internal callers that need to bypass composition (e.g. clone());
get_repos()/async_get_repos() now return {project, branch, graph}
dicts.
- api/info.py: branch-aware Redis hash keys
('{repo}:{branch}_info'); reads fall back to legacy '{repo}_info'
for un-migrated graphs.
- api/git_utils: GitRepoName() and switch_commit() thread branch
through; LegacyGitRepoName() retained for the migration helper.
- api/project.py: detect_branch() via 'git rev-parse --abbrev-ref
HEAD'; Project.__init__ / from_git_repository /
from_local_repository accept branch.
- api/index.py: all Pydantic request models gain
'branch: Optional[str]'; endpoints thread it into
AsyncGraphQuery + info functions; responses include 'branch'.
- api/cli.py: --branch flag on index / index-repo / search /
neighbors / paths / info; new 'cgraph migrate' command.
- api/migrations/per_branch.py (NEW): idempotent migration that
renames legacy '<project>' graphs to 'code:<project>:_default',
'{<project>}_info' Redis keys to '{<project>}:_default_info',
and '{<project>}_git' graphs to '{<project>}:_default_git'.
Supports --dry-run.
Tests:
- tests/test_per_branch_graphs.py (NEW): 24 unit tests covering
compose/parse helpers, Graph constructor branch awareness,
AsyncGraphQuery, info-key shape, GitRepoName shape, and migration
idempotency (with mocked FalkorDB).
- tests/test_async_graph.py, tests/test_cli.py,
tests/endpoints/test_list_repos.py: updated assertions for the
new dict return shape from get_repos / async_get_repos.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughImplements per-(repo, branch) graph identity: compose graph names as code:{project}:{branch}, add branch-aware Graph/AsyncGraphQuery/Project/info/git utilities, propagate branch through CLI and REST endpoints, add one-shot migration, and update tests and coverage/analyzer/LLM callsites. ChangesPer-branch graph identity
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Implements per-(project, branch) graph identity (issue #651, T17) so concurrent agents indexing the same repo on different branches no longer overwrite each other. Graph names become code:{project}:{branch} (default branch _default), with matching {project}:{branch}_info Redis hashes and {project}:{branch}_git history graphs. Adds CLI --branch support, a cgraph migrate command for legacy graphs, branch-aware FastAPI request/response models, and 24 new unit tests.
Changes:
- New naming primitives (
compose_graph_name/parse_graph_name/DEFAULT_BRANCH) plus branch-awareGraph,AsyncGraphQuery,get_repos,set_/get_repo_*andGitRepoName, with legacy-key fallback on reads. - CLI gains
--branchonindex/index-repo/search/neighbors/paths/info(auto-detected viagit rev-parse --abbrev-ref HEAD) and a new idempotentcgraph migrate [--dry-run]powered byapi/migrations/per_branch.py. - REST endpoints accept optional
branchin models/query params and surface it in responses;get_reposreturns{project, branch, graph}dicts; tests updated accordingly.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| api/graph.py | Adds branch-aware naming helpers, updates Graph/AsyncGraphQuery, repo listing dict shape, from_raw_name/clone. |
| api/info.py | Branch-scoped Redis keys with legacy fallback for reads. |
| api/project.py | detect_branch via git, branch propagation through Project and analysis. |
| api/git_utils/git_utils.py | Branch-aware GitRepoName/build_commit_graph/switch_commit; legacy git key helper. |
| api/index.py | branch on all request models / read endpoints; responses include resolved branch. |
| api/cli.py | --branch option on commands; new migrate subcommand. |
| api/migrations/init.py / api/migrations/per_branch.py | One-shot legacy→_default graph/Redis migration with dry-run. |
| api/auto_complete.py | Threads branch into sync/async prefix search. |
| api/analyzers/source_analyzer.py | analyze_local_repository takes optional branch, auto-detects from path. |
| api/code_coverage/lcov/lcov.py | process_lcov accepts branch. |
| tests/test_per_branch_graphs.py | New tests for composition/parsing, Graph constructors, async repos, info/git key shapes, migration. |
| tests/test_async_graph.py / tests/test_cli.py / tests/endpoints/test_list_repos.py | Updated to new dict return shape of get_repos. |
| pyproject.toml / uv.lock | Adds [dependency-groups].dev with pytest-anyio. |
Comments suppressed due to low confidence (2)
api/index.py:232
ChatRequestexposes an optionalbranchfield, butchat()callsask(data.repo, data.msg)without it.ask()/_create_kg_agent()ultimately constructKnowledgeGraph(name=repo_name, ...)using the bare project name, which after T17 resolves to a different (and likely non-existent) FalkorDB key than the per-branch graph the user just indexed.
As a result, chat against a per-branch repo silently targets the wrong (legacy/missing) graph, and the branch parameter is misleadingly accepted but ignored. Either propagate data.branch through ask and _create_kg_agent (so the KnowledgeGraph is created with the composed code:{project}:{branch} name), or drop branch from ChatRequest until chat is wired up.
@app.post('/api/chat')
async def chat(data: ChatRequest, _=Depends(public_or_auth)):
"""Chat with the CodeGraph language model."""
try:
answer = await ask(data.repo, data.msg)
except Exception as e:
logging.exception("Chat error for repo '%s': %s", data.repo, e)
return JSONResponse({"status": "error", "response": "Internal server error"},
status_code=500)
return {"status": "success", "response": answer}
api/index.py:302
list_commitsnow selects a per-branch git graph viaGitRepoName(data.repo, data.branch), but unlike the other read endpoints in this PR (graph_entities,get_neighbors,auto_complete,repo_info,find_paths) the response does not include"branch". For consistency — and so the frontend can disambiguate which branch the commit list came from — please include"branch"(normalized via_normalize_branch/DEFAULT_BRANCH) in the response payload.
async def list_commits(data: RepoRequest, _=Depends(public_or_auth)):
"""List all commits of a specified repository."""
git_graph = AsyncGitGraph(git_utils.GitRepoName(data.repo, data.branch))
try:
commits = await git_graph.list_commits()
finally:
await git_graph.close()
return {"status": "success", "commits": commits}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
api/git_utils/git_utils.py (1)
303-318:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFail fast when the branch-scoped current commit is missing.
get_repo_commit()can returnNone, but this path continues intogit_graph.get_commits([current_hash, to])and only later raises a generic "Commits not found" error. A direct guard here makes the failure branch-specific and much easier to diagnose.Suggested change
current_hash = get_repo_commit(repo, branch) logging.info(f"Current graph commit: {current_hash}") + if current_hash is None: + raise ValueError(f'Missing current commit metadata for repository "{repo}"') if current_hash == to:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/git_utils/git_utils.py` around lines 303 - 318, Check for a missing branch-scoped commit immediately after calling get_repo_commit(repo, branch): if current_hash is None, log an error mentioning the branch and raise a ValueError with a descriptive message (e.g., "Current commit for branch X not found") instead of proceeding to git_graph.get_commits; update the existing debug log for current_hash to use an f-string (logging.debug(f"Current commit: {current_hash} is the requested commit")) and then keep the existing flow using git_graph.get_commits([current_hash, to]) only when current_hash is not None.api/cli.py (1)
229-243:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftExplicit
--branchdoes not actually select that branch.
Project.from_git_repository(url, branch=branch)only tags the graph withbranch; the factory inapi/project.pyclones the remote first and never checks out the requested branch beforeanalyze_sources(). That meanscgraph index-repo --branch feature-xcan index the default checkout into thefeature-xgraph, which breaks the per-branch isolation this PR is adding. Please either check out the requested branch immediately after cloning or fail fast until that path supports branch selection.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/cli.py` around lines 229 - 243, The branch option is never actually checked out after cloning — Project.from_git_repository(url, branch=branch) only tags the graph but does not change the working tree before analyze_sources(), so indexing uses the default checkout. After creating the Project (Project.from_git_repository) but before calling project.analyze_sources(), either perform an explicit checkout of the requested branch on the cloned repo (e.g., use the Project's repo handle to checkout branch) or raise an error if branch switching is unsupported; ensure the checkout occurs before calling project.analyze_sources(ignore=...) so analyze_sources() runs against the requested branch.
🧹 Nitpick comments (1)
api/auto_complete.py (1)
6-6: ⚡ Quick winFix
prefix_searchreturn type annotation to match actual return value.
prefix_searchappears to return completion collections (not a string), andasync_prefix_searchalready advertiseslist. Aligning these hints avoids type-checking drift.Suggested diff
-def prefix_search(repo: str, prefix: str, branch: Optional[str] = None) -> str: +def prefix_search(repo: str, prefix: str, branch: Optional[str] = None) -> list:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/auto_complete.py` at line 6, Update the return type annotation of prefix_search in api/auto_complete.py to match the actual returned completion collection (as async_prefix_search does) instead of str; change the annotation on def prefix_search(repo: str, prefix: str, branch: Optional[str] = None) -> str to a list-return type (e.g., -> list or a more specific -> list[dict] / -> List[Dict[str, Any]] or the actual Completion type used by async_prefix_search) so static typing aligns with the real return value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/git_utils/git_utils.py`:
- Around line 17-30: Rename the mixed-case helpers GitRepoName and
LegacyGitRepoName to snake_case (git_repo_name, legacy_git_repo_name) and update
all call sites and tests to use the new names; keep the same behavior/formatting
(including the "{repo}" hash-tag behavior) so callers aren't otherwise affected.
In addition, adjust get_repo_commit's signature/typing to return Optional[str]
and modify switch_commit to guard against a None current_hash (i.e., check if
current_hash is None before using it and handle the "missing commit" path
appropriately—fail fast or return a clear error), updating any logic that
assumes a non-null return. Ensure you change only symbol names and usages
(function names: GitRepoName/LegacyGitRepoName ->
git_repo_name/legacy_git_repo_name, get_repo_commit, and switch_commit) so
linters and type-checkers reflect the Optional return type.
In `@api/graph.py`:
- Around line 131-133: Normalize the branch value before storing it on the
wrapper: if the incoming branch is None or the empty string, set a normalized
value (use DEFAULT_BRANCH) and assign that to self.branch before calling
compose_graph_name so self.name matches the actual graph key; update the same
logic in both constructor sites (the block using self.project / self.branch /
self.name and the other occurrence around lines 798-800) so exposed self.branch
never remains "" and always reflects the key used by compose_graph_name.
- Around line 95-99: The loop over db.list_graphs() currently filters out names
using _is_internal_suffix(g) before parse_graph_name(g), causing valid composed
graph names like "code:repo:release_tmp" to be incorrectly skipped; change the
logic so you first call parse_graph_name(g) (check for None) and then, if parsed
indicates a branch/name to inspect, apply _is_internal_suffix to the branch
component (or the parsed full name) to decide whether to continue; update the
loop in the function that iterates db.list_graphs() so parse_graph_name(g) runs
first and only then use _is_internal_suffix on the appropriate parsed part.
In `@api/index.py`:
- Around line 78-81: ChatRequest.branch is collected but never used: update the
chat handling and ask flow so branch is respected or explicitly rejected. Either
(A) change ask(repo, msg) to ask(repo, msg, branch: Optional[str]) and propagate
data.branch from chat() into the ask(...) call and into the graph lookup logic
inside ask (use branch when resolving the repository/graph), and update any
other places that call ask(...) (search for other ask( calls) to pass an
optional branch; or (B) if branch-scoped graphs aren't supported yet, validate
in chat() that data.branch is None and return a 400/error for non-null branch.
Ensure the change references the ChatRequest.branch field, the chat() handler,
and the ask(...) function/graph lookup logic so branch is either threaded
through or explicitly rejected.
In `@api/migrations/per_branch.py`:
- Around line 70-80: The current rename flow aborts when
db.connection.exists(dst) is true which leaves stale src graphs if a prior run
copied dst but failed before deleting src; update the rename logic that uses
db.connection.exists, db.select_graph(src), g.copy(dst) and g.delete() so that
if dst exists but src still exists you detect this partial-run state and either
(a) verify the graphs are identical (e.g. compare checksums/metadata) and if so
delete src and log a completed-retry, or (b) if verification is not possible,
attempt an atomic safe-retry by copying src to a temp dst name then swapping or
else log a clear error and surface failure; ensure dry_run still only logs
actions and do not delete in dry_run.
In `@api/project.py`:
- Around line 68-71: In Project.__init__, normalize an explicit empty string
branch the same as missing by treating branch == "" like None before
autodetection: update the branch handling (the block using detect_branch(path)
and DEFAULT_BRANCH) to check if branch is None or branch == "" (or use branch =
branch or ... pattern) so autodetection runs for empty strings and then assign
the resolved value to self.branch; reference the Project.__init__ parameter
branch, detect_branch(path), DEFAULT_BRANCH, and self.branch when making the
change.
---
Outside diff comments:
In `@api/cli.py`:
- Around line 229-243: The branch option is never actually checked out after
cloning — Project.from_git_repository(url, branch=branch) only tags the graph
but does not change the working tree before analyze_sources(), so indexing uses
the default checkout. After creating the Project (Project.from_git_repository)
but before calling project.analyze_sources(), either perform an explicit
checkout of the requested branch on the cloned repo (e.g., use the Project's
repo handle to checkout branch) or raise an error if branch switching is
unsupported; ensure the checkout occurs before calling
project.analyze_sources(ignore=...) so analyze_sources() runs against the
requested branch.
In `@api/git_utils/git_utils.py`:
- Around line 303-318: Check for a missing branch-scoped commit immediately
after calling get_repo_commit(repo, branch): if current_hash is None, log an
error mentioning the branch and raise a ValueError with a descriptive message
(e.g., "Current commit for branch X not found") instead of proceeding to
git_graph.get_commits; update the existing debug log for current_hash to use an
f-string (logging.debug(f"Current commit: {current_hash} is the requested
commit")) and then keep the existing flow using
git_graph.get_commits([current_hash, to]) only when current_hash is not None.
---
Nitpick comments:
In `@api/auto_complete.py`:
- Line 6: Update the return type annotation of prefix_search in
api/auto_complete.py to match the actual returned completion collection (as
async_prefix_search does) instead of str; change the annotation on def
prefix_search(repo: str, prefix: str, branch: Optional[str] = None) -> str to a
list-return type (e.g., -> list or a more specific -> list[dict] / ->
List[Dict[str, Any]] or the actual Completion type used by async_prefix_search)
so static typing aligns with the real return value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a2741065-86c8-4bb0-86c2-c7ad45352562
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
api/analyzers/source_analyzer.pyapi/auto_complete.pyapi/cli.pyapi/code_coverage/lcov/lcov.pyapi/git_utils/git_utils.pyapi/graph.pyapi/index.pyapi/info.pyapi/migrations/__init__.pyapi/migrations/per_branch.pyapi/project.pypyproject.tomltests/endpoints/test_list_repos.pytests/test_async_graph.pytests/test_cli.pytests/test_per_branch_graphs.py
|
Actionable comments posted: 0 |
- pyproject + uv.lock: drop pytest-anyio 0.0.0 placeholder; anyio's pytest plugin (already required via tests extra) provides the @pytest.mark.anyio marker. (Copilot review) - api/graph.py: when listing graphs, apply the internal-suffix filter to the *branch* component for parsed names (e.g. code:repo:main_git is hidden, but code:repo:release_tmp stays visible). Legacy bare names still filtered by full name. (CodeRabbit graph.py:99) - api/graph.py: Graph/AsyncGraphQuery constructors now coerce branch="" to DEFAULT_BRANCH alongside branch=None, so self.branch matches the actual graph key produced by compose_graph_name. (CodeRabbit graph.py:133) - api/project.py: same coercion for Project.__init__; an explicit empty-string branch now autodetects (or falls back to DEFAULT_BRANCH) instead of staying "". (CodeRabbit project.py:71) - api/index.py + api/llm.py: ChatRequest.branch is now threaded through ask() / _ask_sync() / _create_kg_agent() and used to compose the FalkorDB graph key. Previously it was accepted on the request but silently ignored. (CodeRabbit index.py:81) - api/migrations/per_branch.py: when both src and dst graphs exist (interrupted prior migration) the rename helper now deletes the leftover legacy src and reports success, so reruns can complete cleanly instead of skipping forever. Dry-run path preserved. (CodeRabbit per_branch.py:80) - api/git_utils/git_utils.py: rename CamelCase helpers GitRepoName/LegacyGitRepoName -> git_repo_name/legacy_git_repo_name (snake_case per api guidelines); keep CamelCase as deprecated aliases so the public surface stays backwards compatible. Update the in-module callers. switch_commit now guards against a None current_hash and raises a clear ValueError instead of silently walking off the end. (CodeRabbit git_utils.py:30) - api/info.py: get_repo_commit return type tightened to Optional[str] to reflect the documented None return. (CodeRabbit git_utils.py:30) All 24 tests in tests/test_per_branch_graphs.py still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Actionable comments posted: 0 |
|
Reviewed #675 — great work, this is exactly what was needed. Migration story is solid, backwards-compat is real, hash-tag preservation is forward-thinking for cluster mode. Approving. Three small follow-ups worth tracking, not blockers:
|
Per @galshubeli's review on #675: single-arg GitRepoName('foo') used to return '{foo}_git' but now returns '{foo}:<DEFAULT_BRANCH>_git' — a silent shape change for anyone still on the alias. Make it loud with DeprecationWarning so leftover callers find out before they query a graph that no longer exists. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for the approval @galshubeli! Addressed #3 in For the other two:
PR's ready to merge from my side. |
|
Actionable comments posted: 0 |
Closes #651.
Stacked on #666 (
mcp/t1-scaffold).What
Each FalkorDB graph is now identified by
(project, branch)ratherthan just
project. Concurrent agents indexing the same repo ondifferent branches no longer overwrite each other.
New graph name format:
code:{project}:{branch}(default branch:_default). The companion Redis info hash becomes{project}:{branch}_infoand the git-history graph becomes{project}:{branch}_git.Highlights
Graph/AsyncGraphQueryconstructors acceptbranch=None./api/*endpoints accept an optionalbranchfield; responsesinclude
branchso the frontend can disambiguate.--branchon every read/index command and auto-detectsfrom
git rev-parse --abbrev-ref HEADwhen omitted.cgraph migratecommand (idempotent,--dry-runsupported)promotes legacy
<project>graphs tocode:<project>:_default.tests/test_per_branch_graphs.py. Existingtest_async_graph.py/test_cli.py/test_list_repos.pyupdated for the new dict return shape of
get_repos.Backward compatibility
Graph("foo")still works → composes tocode:foo:_default.{foo}_infokey when thenew key is absent, so un-migrated repos remain queryable until
the operator runs
cgraph migrate.parse_graph_namereturnsNonefor legacy bare names socallers can detect and special-case them.
Migration story
One-shot rename of legacy artifacts via
cgraph migrate:<project>→code:<project>:_default{<project>}_info→{<project>}:_default_info{<project>}_git→{<project>}:_default_gitSafe to re-run.
--dry-runpreviews actions without writing.Not in scope
process_git_historyalready does (it now writes to the per-branch git graph but does
not yet track multiple branches in a single run).
Test status
tests/test_per_branch_graphs.py: 24 ✅tests/test_async_graph.py: 6 ✅tests/test_cli.py: 10 ✅Pre-existing failures unrelated to T17 (missing
tests/git_repofixture, hard-coded port 6379 in
test_graph_ops.py, analyzertests requiring specific source builds) remain unchanged from
mcp/t1-scaffold.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com
Summary by CodeRabbit
New Features
Refactor
Tests
Chores