From 598b014456e90a4eabc4384c352add92f24811dd Mon Sep 17 00:00:00 2001
From: Chuck Lever
Date: Wed, 22 Apr 2026 09:58:16 -0400
Subject: [PATCH 1/4] docs/semcode-mcp: Compact per-tool descriptions

The MCP instructions are loaded into the LLM context at every
session start. Repeating the git_sha/branch and pagination
parameter blocks on every tool entry wastes roughly a third of the
tokens without adding information that the preamble has not already
established.

Factor the repeated conventions (git_sha/branch, pagination, date
filters, *_patterns array semantics) into a Common parameters
section and reduce each tool entry to the parameters unique to that
tool. Fold the reachable_sha caveat from the Recipes section into
find_commit where it applies.
---
 docs/semcode-mcp.md | 131 +++++++++++++++++++-------------------------
 1 file changed, 56 insertions(+), 75 deletions(-)

diff --git a/docs/semcode-mcp.md b/docs/semcode-mcp.md
index 1efa3b4..2780007 100644
--- a/docs/semcode-mcp.md
+++ b/docs/semcode-mcp.md
@@ -3,30 +3,36 @@
 All semcode functions are git aware and default to lookups on the current
 commit. You can also pass a specific commit you're interested in, or a branch name.

-**Note on Regex Patterns**: All regex patterns in semcode are **case-insensitive by default**. This applies to all pattern matching including function names, commit messages, symbols, and lore email searches. You don't need to use the `(?i)` flag.
+**Regex**: all patterns are case-insensitive; no `(?i)` needed. Applies to
+function names, commit messages, symbols, and lore email searches.

-**Branch Support**: Most query tools support a `branch` parameter as an alternative to `git_sha`. When you specify a branch name (e.g., "main", "develop"), it will be resolved to the current tip commit of that branch. Branch takes precedence over git_sha if both are provided.
+## Common parameters
+
+Most query tools accept these; they are not repeated per tool.
+
+- **git_sha**: commit to search (default: current)
+- **branch**: branch name, resolved to its tip (e.g., "main"); takes
+  precedence over git_sha if both are given
+- **page**: pagination (1-based), 50 lines per page; omit for full results
+- **since_date / until_date**: e.g., "yesterday", "2 weeks ago",
+  "2024-01-15"
+- **\*_patterns**: arrays of regex. `author_patterns`, `subject_patterns`,
+  `from_patterns`, `body_patterns`, `recipients_patterns`,
+  `symbols_patterns`, `path_patterns` are OR'd within an array.
+  `regex_patterns` and `symbol_patterns` are AND'd within an array.
+
+## Code lookup

 **find_function**: search for functions and macros
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - name: function/macro name, or a regex
   - also displays details on callers and callees
 **find_type**: search for types and typedefs
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - name: type/typedef name or regex
 **find_callers**: find all functions that call a function or macro
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - name: function to search
 **find_calls**: find all functions called by a function or macro
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - name: function to search
 **find_callchain**: search complete function/macro call chain (forward and reverse)
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - name: function or macro to search
   - up_levels: number of caller levels to show (default: 2, 0 = unlimited)
   - down_levels: number of callee levels to show (default: 3, 0 = unlimited)
@@ -34,9 +40,10 @@ commit. You can also pass a specific commit you're interested in, or a branch n
 **diff_functions**: extract functions and types from a unified diff
   - diff_content: the string to analyze
   - Use this to determine which symbols are involved in a given diff
+
+## Code search
+
 **grep_functions**: search function/macro bodies for a regex
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - pattern: the regex to search for
   - verbose: boolean, if true show full function bodies (default: false)
   - path_pattern: optional regex to filter results by path
@@ -44,87 +51,65 @@ commit. You can also pass a specific commit you're interested in, or a branch n
   - this only searches inside functions or macros, there's no need to escape
     your pattern to limit the search.
 **vgrep_functions**: vector embedding search on functions/macros/types
-  - git_sha: indicates which commit to search (default: current)
-  - branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
   - query_text: text describing the kind of functions to find (e.g., "memory allocation", "string comparison")
   - path_pattern: optional regex to filter results by path
   - limit: max number of results to return (default: 10, max: 100)
   - Embedding searches are only useful when you want to search for broad
     concepts that a regex won't find well.
   - The database might not have embeddings indexed
+
+## Commit search
+
 **find_commit**: search for changes, potentially in a range of commits
-  - This can return a large body of results. Use pagination to manage context
-  - git_ref: single commit ref to lookup (sha, short sha, branch, HEAD etc)
-  - git_range: optional git range to search multiple commits: HEAD~10..HEAD etc
+  - Can return a large body of results; use pagination to manage context.
+  - git_ref: single commit ref (sha, short sha, branch, HEAD, etc.)
+  - git_range: optional range for multiple commits, e.g., HEAD~10..HEAD;
     cannot be combined with git_ref
-  - author_patterns: optional array of regex to filter by author name/email (OR logic)
-  - subject_patterns: optional array of regex to filter by subject line (OR logic)
-  - regex_patterns: optional array of regex patterns to filter commits.
-    - All patterns are AND'd together
-    - Applied against the combination of commit message and unified diff
-  - symbol_patterns: optional array of regex of symbols to search for
-    - Use this to quickly find commits changing a function or type (w/regex)
-  - path_patterns: optional regex to filter commits based on which files they
-    change. Multiple regex can be passed and will be OR'd together
-  - page: optional page number for pagination (1-based). Each page contains
-    50 lines, results indicate current page and total pages. Default: full results
-  - reachable_sha: optional git sha, filter results to only those reachable from the
-    sha provided. Mutually exclusive with git_range
+  - reachable_sha: optional git sha; filter to results reachable from it.
+    Mutually exclusive with git_range. To search commits reachable from
+    HEAD, pass reachable_sha=HEAD alone -- do NOT also pass a git_range.
+  - regex_patterns: applied against commit message + unified diff
+  - symbol_patterns: find commits changing a function or type
   - verbose: show full diff in addition to metadata (default: false)
+  - accepts: author_patterns, subject_patterns, path_patterns
 **vcommit_similar_commits**: search commits based on vector embeddings
-  - git_range: optional git range to search multiple commits: HEAD~10..HEAD etc
   - query_text: search text
-  - author_patterns: optional array of regex to filter by author name/email (OR logic)
-  - subject_patterns: optional array of regex to filter by subject line (OR logic)
-  - regex_patterns: array of regex AND'd together to limit search results
-  - symbol_patterns: array of regex AND'd together to limit search results based
-    on symbols changed in the commit
-  - path_patterns: optional regex to filter commits based on which files they
-    change. Multiple regex can be passed and will be OR'd together
-  - limit: max results to return (default 10, max 50)
-  - reachable_sha: optional git sha, filter results to only those reachable from the
-    sha provided. Mutually exclusive with git_range
-  - page: optional page number for pagination (1-based). Each page contains
-    50 lines, results indicate current page and total pages. Default: full results
+  - git_range: optional range, e.g., HEAD~10..HEAD
+  - reachable_sha: optional git sha, reachable-from filter. Mutually
+    exclusive with git_range.
+  - regex_patterns: AND'd to limit results
+  - symbol_patterns: AND'd to limit results by symbols changed
+  - limit: max results (default 10, max 50)
+  - accepts: author_patterns, subject_patterns, path_patterns
+
+## Lore (kernel mailing list archive)
+
 **lore_search**: search lore.kernel.org email archives
-  - from_patterns: optional array of regex to filter by sender (OR logic)
-  - subject_patterns: optional array of regex to filter by subject (OR logic)
-  - body_patterns: optional array of regex to filter by message body (OR logic)
-  - symbols_patterns: optional array of regex to filter by symbols in patches (OR logic)
-  - recipients_patterns: optional array of regex to filter by recipients (OR logic)
   - message_id: optional exact message ID for direct lookup
   - verbose: show full message body (default: false)
   - show_thread: show full email thread for each match (default: false)
-  - show_replies: show replies/subthreads under each match (default: false, mutually exclusive with show_thread)
-  - limit: max number of results (default: 100, 0 = unlimited)
-  - since_date: filter emails from this date onwards (e.g., "yesterday", "2 weeks ago", "2024-01-15")
-  - until_date: filter emails up to this date
+  - show_replies: show replies/subthreads under each match
+    (default: false, mutually exclusive with show_thread)
   - mbox: output in MBOX format with full headers and body (default: false)
-  - page: optional page number for pagination (1-based). Each page contains
-    50 lines, results indicate current page and total pages. Default: full results
+  - limit: max number of results (default: 100, 0 = unlimited)
+  - accepts: from_patterns, subject_patterns, body_patterns,
+    symbols_patterns, recipients_patterns
 **dig**: find lore.kernel.org emails related to a git commit
   - commit: git commit reference (SHA, short SHA, HEAD, branch name, etc.)
   - verbose: show full message body (default: false)
   - show_all: show all duplicate results, not just most recent (default: false)
-  - show_thread: show full thread for each result (use with show_all, default: false)
-  - show_replies: show replies/subthreads under each result (use with show_all, mutually exclusive with show_thread)
-  - since_date: filter emails from this date onwards
-  - until_date: filter emails up to this date
-  - page: optional page number for pagination (1-based). Each page contains
-    50 lines, results indicate current page and total pages. Default: full results
+  - show_thread: show full thread for each result (use with show_all)
+  - show_replies: show replies/subthreads (use with show_all, mutually
+    exclusive with show_thread)
 **vlore_similar_emails**: semantic vector search over lore.kernel.org emails
   - query_text: text describing the kind of emails to find (e.g., "memory leak fix", "performance optimization")
-  - from_patterns: optional array of regex to filter by sender (OR logic)
-  - subject_patterns: optional array of regex to filter by subject (OR logic)
-  - body_patterns: optional array of regex to filter by message body (OR logic)
-  - symbols_patterns: optional array of regex to filter by symbols in patches (OR logic)
-  - recipients_patterns: optional array of regex to filter by recipients (OR logic)
   - limit: max number of results to return (default: 20, max: 100)
-  - since_date: filter emails from this date onwards
-  - until_date: filter emails up to this date
-  - page: optional page number for pagination (1-based). Each page contains
-    50 lines, results indicate current page and total pages. Default: full results
+  - accepts: from_patterns, subject_patterns, body_patterns,
+    symbols_patterns, recipients_patterns
   - The database might not have lore embeddings indexed
+
+## Branch / status
+
 **list_branches**: list all indexed branches with their status
   - No parameters required
   - Shows branch names, indexed commit SHAs, and freshness status
@@ -178,7 +163,3 @@ backported commit is searching by commit subject:
 semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown"
 semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown" --reachable HEAD
 ```
-
-❌ WRONG: reachable_sha=HEAD + git_range=HEAD~5000..HEAD
-❌ WRONG: git_range=HEAD~5000..HEAD
-✅ CORRECT: reachable_sha=HEAD only (no git_range)

From 927d79240d8d04ead681565860d5dc043c2863a9 Mon Sep 17 00:00:00 2001
From: Chuck Lever
Date: Wed, 22 Apr 2026 10:06:26 -0400
Subject: [PATCH 2/4] docs/semcode-mcp: Further tighten per-tool descriptions

Apply additional token-efficiency reductions to the MCP tool guide
beyond the initial compaction pass.

The lazy-loading section previously duplicated information that the
server's meta-tool schemas expose at runtime; collapse it to a single
pointer. Factor repeated '(default: false)' markers on boolean
parameters into one conventions line rather than restating the
default on every tool. Hoist the reachable_sha / git_range
mutual-exclusion rule to the Commit search section header so it is
not duplicated across two tool entries. Trim examples and filler
phrasings that the parameter names and prior sentences already
convey.

Net reduction is roughly 15 to 20 percent of the file, with no
load-bearing details removed.
---
 docs/semcode-mcp.md | 106 +++++++++++++++++---------------------------
 1 file changed, 41 insertions(+), 65 deletions(-)

diff --git a/docs/semcode-mcp.md b/docs/semcode-mcp.md
index 2780007..d93137d 100644
--- a/docs/semcode-mcp.md
+++ b/docs/semcode-mcp.md
@@ -8,8 +8,6 @@ function names, commit messages, symbols, and lore email searches.

 ## Common parameters

-Most query tools accept these; they are not repeated per tool.
-
 - **git_sha**: commit to search (default: current)
 - **branch**: branch name, resolved to its tip (e.g., "main"); takes
   precedence over git_sha if both are given
@@ -21,6 +19,9 @@ Most query tools accept these; they are not repeated per tool.
   `symbols_patterns`, `path_patterns` are OR'd within an array.
   `regex_patterns` and `symbol_patterns` are AND'd within an array.

+**Conventions**: boolean parameters default to `false`; `limit: 0`
+means unlimited unless a max is given.
+
 ## Code lookup

 **find_function**: search for functions and macros
@@ -45,38 +46,38 @@ Most query tools accept these; they are not repeated per tool.

 **grep_functions**: search function/macro bodies for a regex
   - pattern: the regex to search for
-  - verbose: boolean, if true show full function bodies (default: false)
+  - verbose: if true, show full function bodies
   - path_pattern: optional regex to filter results by path
-  - limit: max number of results to return (default: 100, 0 = unlimited)
-  - this only searches inside functions or macros, there's no need to escape
-    your pattern to limit the search.
+  - limit: max number of results (default: 100)
+  - only searches inside functions or macros; no need to escape
+    your pattern to limit the search
 **vgrep_functions**: vector embedding search on functions/macros/types
-  - query_text: text describing the kind of functions to find (e.g., "memory allocation", "string comparison")
+  - query_text: text describing the kind of functions to find
   - path_pattern: optional regex to filter results by path
-  - limit: max number of results to return (default: 10, max: 100)
-  - Embedding searches are only useful when you want to search for broad
-    concepts that a regex won't find well.
-  - The database might not have embeddings indexed
+  - limit: max number of results (default: 10, max: 100)
+  - only useful for broad concepts that a regex won't find well
+  - the database might not have embeddings indexed

 ## Commit search

+In both tools below, `reachable_sha` and `git_range` are mutually
+exclusive. To search commits reachable from HEAD, pass
+`reachable_sha=HEAD` alone.
+
 **find_commit**: search for changes, potentially in a range of commits
-  - Can return a large body of results; use pagination to manage context.
+  - can return a large body of results; use pagination to manage context
   - git_ref: single commit ref (sha, short sha, branch, HEAD, etc.)
   - git_range: optional range for multiple commits, e.g., HEAD~10..HEAD;
     cannot be combined with git_ref
-  - reachable_sha: optional git sha; filter to results reachable from it.
-    Mutually exclusive with git_range. To search commits reachable from
-    HEAD, pass reachable_sha=HEAD alone -- do NOT also pass a git_range.
+  - reachable_sha: optional git sha; filter to results reachable from it
   - regex_patterns: applied against commit message + unified diff
   - symbol_patterns: find commits changing a function or type
-  - verbose: show full diff in addition to metadata (default: false)
+  - verbose: show full diff in addition to metadata
   - accepts: author_patterns, subject_patterns, path_patterns
 **vcommit_similar_commits**: search commits based on vector embeddings
   - query_text: search text
   - git_range: optional range, e.g., HEAD~10..HEAD
-  - reachable_sha: optional git sha, reachable-from filter. Mutually
-    exclusive with git_range.
+  - reachable_sha: optional git sha, reachable-from filter
   - regex_patterns: AND'd to limit results
   - symbol_patterns: AND'd to limit results by symbols changed
   - limit: max results (default 10, max 50)
@@ -86,70 +87,45 @@ Most query tools accept these; they are not repeated per tool.

 **lore_search**: search lore.kernel.org email archives
   - message_id: optional exact message ID for direct lookup
-  - verbose: show full message body (default: false)
-  - show_thread: show full email thread for each match (default: false)
+  - verbose: show full message body
+  - show_thread: show full email thread for each match
   - show_replies: show replies/subthreads under each match
-    (default: false, mutually exclusive with show_thread)
-  - mbox: output in MBOX format with full headers and body (default: false)
-  - limit: max number of results (default: 100, 0 = unlimited)
+    (mutually exclusive with show_thread)
+  - mbox: output in MBOX format with full headers and body
+  - limit: max number of results (default: 100)
   - accepts: from_patterns, subject_patterns, body_patterns,
     symbols_patterns, recipients_patterns
 **dig**: find lore.kernel.org emails related to a git commit
   - commit: git commit reference (SHA, short SHA, HEAD, branch name, etc.)
-  - verbose: show full message body (default: false)
-  - show_all: show all duplicate results, not just most recent (default: false)
+  - verbose: show full message body
+  - show_all: show all duplicate results, not just most recent
   - show_thread: show full thread for each result (use with show_all)
   - show_replies: show replies/subthreads (use with show_all, mutually
     exclusive with show_thread)
 **vlore_similar_emails**: semantic vector search over lore.kernel.org emails
-  - query_text: text describing the kind of emails to find (e.g., "memory leak fix", "performance optimization")
-  - limit: max number of results to return (default: 20, max: 100)
+  - query_text: text describing the kind of emails to find
+  - limit: max number of results (default: 20, max: 100)
   - accepts: from_patterns, subject_patterns, body_patterns,
     symbols_patterns, recipients_patterns
-  - The database might not have lore embeddings indexed
+  - the database might not have lore embeddings indexed

 ## Branch / status

-**list_branches**: list all indexed branches with their status
-  - No parameters required
-  - Shows branch names, indexed commit SHAs, and freshness status
-    - **up-to-date**: indexed commit matches current branch tip
-    - **outdated**: branch has new commits since indexing (re-index to update)
-  - Useful for tracking multiple stable branches (e.g., linux-5.10.y, 6.1.y, 6.12.y)
-    and knowing when they need re-indexing after new releases
-**compare_branches**: compare two branches and show their relationship
-  - branch1: first branch name (e.g., "main")
-  - branch2: second branch name (e.g., "feature-branch")
-  - Shows merge base, ahead/behind status, and indexing status for both branches
-**indexing_status**: check the status of background indexing operation
-  - No parameters required
-  - Shows current indexing progress, errors, and timing
+**list_branches**: list indexed branches with indexed SHA and
+  freshness (up-to-date vs. outdated against current tip). No
+  parameters.
+**compare_branches**: compare two branches; shows merge base,
+  ahead/behind status, and indexing status for both
+  - branch1, branch2: branch names
+**indexing_status**: show background indexing progress, errors,
+  and timing. No parameters.

 ## Lazy Loading

-To reduce the initial context size consumed by the MCP server (saving ~96% of initial tokens), you can start the server in **lazy mode** using the `--lazy` flag.
-
-In lazy mode, the server initially exposes only 3 meta-tools:
-
-**list_categories**: List available tool categories
-  - No parameters required
-  - Returns a list of categories (e.g., `code_lookup`, `code_search`) and their descriptions
-  - Use this first to discover what semcode can do
-
-**get_tools**: Get full schemas for tools in a category
-  - category: The name of the category to inspect (from `list_categories`)
-  - Returns the full tool definitions for all tools in that category
-  - Use this to learn how to call specific tools
-
-**call_tool**: Execute a specific tool
-  - tool_name: Name of the tool to execute (e.g., `find_function`)
-  - arguments: Object containing the arguments for the tool
-  - Use this to run tools after you've discovered them
-
-**Workflow**:
-1. Call `list_categories` to see available functionality
-2. Call `get_tools` for a relevant category (e.g., `code_lookup`)
-3. Call `call_tool` to execute the desired tool (e.g., `find_function`)
+Start the server with `--lazy` to cut initial context ~96%. The
+server then exposes only three meta-tools (`list_categories`,
+`get_tools`, `call_tool`); call them in that order to discover
+and invoke full tools on demand.

 ## Recipes

From c7ac6655f0cfa916d086f98336a19786de793584 Mon Sep 17 00:00:00 2001
From: Chuck Lever
Date: Wed, 22 Apr 2026 10:13:19 -0400
Subject: [PATCH 3/4] docs/semcode-mcp: Resolve naming and required-parameter ambiguities

Four ambiguities in the tool guide could steer agents into wrong or
rejected tool calls.

First, the singular 'symbol_patterns' used by commit tools and the
plural 'symbols_patterns' used by lore tools have opposite grouping
semantics (AND vs OR) but near-identical names. The common-parameters
summary stated the rule once, but per-tool bullets did not repeat
which pattern arrays are AND'd or OR'd, so an agent scanning a single
tool entry could reasonably assume uniform behavior. Mark the
grouping inline on each pattern array in the per-tool descriptions.

Second, find_commit takes 'git_ref' rather than the common 'git_sha'.
The deviation was undocumented; an agent following the common
parameters section would pass 'git_sha' and receive a validation
error. Note the rename explicitly.

Third, the relationship between git_ref, git_range, and reachable_sha
in find_commit was underspecified. git_range and git_ref are mutually
exclusive as commit selectors; reachable_sha is a filter that may
accompany either, or stand alone to mean 'all indexed commits
reachable from this sha'. State the three-way relationship
explicitly.

Fourth, dig's 'commit' parameter is required in the JSON schema but
the doc did not mark it as such. Flag it so agents supply it.
---
 docs/semcode-mcp.md | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/docs/semcode-mcp.md b/docs/semcode-mcp.md
index d93137d..dd24d07 100644
--- a/docs/semcode-mcp.md
+++ b/docs/semcode-mcp.md
@@ -60,31 +60,39 @@ means unlimited unless a max is given.

 ## Commit search

-In both tools below, `reachable_sha` and `git_range` are mutually
-exclusive. To search commits reachable from HEAD, pass
-`reachable_sha=HEAD` alone.
+Note: commit tools use **`git_ref`** (not `git_sha` from the common
+parameters) and **`symbol_patterns`** (singular; AND'd -- distinct
+from lore's plural `symbols_patterns`, which is OR'd).
+
+Commit selection in `find_commit`: `git_ref` and `git_range` are
+mutually exclusive. `reachable_sha` is a filter that may accompany
+either, or stand alone (with no `git_ref` or `git_range`) to mean
+"all indexed commits reachable from this sha".

 **find_commit**: search for changes, potentially in a range of commits
   - can return a large body of results; use pagination to manage context
   - git_ref: single commit ref (sha, short sha, branch, HEAD, etc.)
-  - git_range: optional range for multiple commits, e.g., HEAD~10..HEAD;
-    cannot be combined with git_ref
+  - git_range: optional range for multiple commits, e.g., HEAD~10..HEAD
   - reachable_sha: optional git sha; filter to results reachable from it
-  - regex_patterns: applied against commit message + unified diff
-  - symbol_patterns: find commits changing a function or type
+  - regex_patterns (AND'd): applied against commit message + unified diff
+  - symbol_patterns (AND'd): find commits changing a function or type
+  - author_patterns, subject_patterns, path_patterns (each OR'd)
   - verbose: show full diff in addition to metadata
-  - accepts: author_patterns, subject_patterns, path_patterns
 **vcommit_similar_commits**: search commits based on vector embeddings
   - query_text: search text
   - git_range: optional range, e.g., HEAD~10..HEAD
-  - reachable_sha: optional git sha, reachable-from filter
-  - regex_patterns: AND'd to limit results
-  - symbol_patterns: AND'd to limit results by symbols changed
+  - reachable_sha: optional git sha, reachable-from filter (combinable
+    with git_range)
+  - regex_patterns (AND'd), symbol_patterns (AND'd)
+  - author_patterns, subject_patterns, path_patterns (each OR'd)
   - limit: max results (default 10, max 50)
-  - accepts: author_patterns, subject_patterns, path_patterns

 ## Lore (kernel mailing list archive)

+Lore tools use **`symbols_patterns`** (plural; OR'd within the array --
+distinct from commit tools' singular `symbol_patterns`, which is AND'd).
+All `*_patterns` arrays below are OR'd within the array.
+
 **lore_search**: search lore.kernel.org email archives
   - message_id: optional exact message ID for direct lookup
   - verbose: show full message body
@@ -96,7 +104,8 @@ exclusive. To search commits reachable from HEAD, pass
   - accepts: from_patterns, subject_patterns, body_patterns,
     symbols_patterns, recipients_patterns
 **dig**: find lore.kernel.org emails related to a git commit
-  - commit: git commit reference (SHA, short SHA, HEAD, branch name, etc.)
+  - commit (required): git commit reference (SHA, short SHA, HEAD,
+    branch name, etc.)
   - verbose: show full message body
   - show_all: show all duplicate results, not just most recent
   - show_thread: show full thread for each result (use with show_all)

From 946a4c1a05e4fa2536cfc3829ae17d336beb451e Mon Sep 17 00:00:00 2001
From: Chuck Lever
Date: Wed, 22 Apr 2026 10:16:24 -0400
Subject: [PATCH 4/4] docs/semcode-mcp: Address minor ambiguities and examples

A second review pass turned up several smaller doc issues that do
not cause validation errors on their own, but do send agents on
needless round-trips.

The page-size convention ('50 lines per page') did not say what
'lines' measures -- it is output lines of rendered text, not result
records.

The grep_functions hint 'no need to escape your pattern' reads
ambiguously, as if regex metacharacters were escaped automatically;
the intent is that the search is already scoped to function and
macro bodies, so anchors are unnecessary.

The 'limit: 0 = unlimited' convention collides silently with tools
that declare an explicit max (vgrep_functions,
vcommit_similar_commits, vlore_similar_emails); note that an
explicit max wins.

The find_callers, find_calls, and find_callchain descriptions
referred variously to 'functions' and 'functions or macros' on the
two sides of an edge; make clear that both sides of a call edge
include function-like macros.

The diff_functions parameter description 'the string to analyze'
becomes 'unified diff text (e.g., output of git diff)'.

Rewrite the commit-reachable recipe as an MCP tool call rather than
the query tool's CLI syntax, since this file is loaded as MCP server
instructions.
---
 docs/semcode-mcp.md | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/docs/semcode-mcp.md b/docs/semcode-mcp.md
index dd24d07..1ac97c3 100644
--- a/docs/semcode-mcp.md
+++ b/docs/semcode-mcp.md
@@ -11,7 +11,8 @@ function names, commit messages, symbols, and lore email searches.
 - **git_sha**: commit to search (default: current)
 - **branch**: branch name, resolved to its tip (e.g., "main"); takes
   precedence over git_sha if both are given
-- **page**: pagination (1-based), 50 lines per page; omit for full results
+- **page**: pagination (1-based); pages are 50 lines of the tool's
+  rendered text output, not 50 result records. Omit for full results.
 - **since_date / until_date**: e.g., "yesterday", "2 weeks ago",
   "2024-01-15"
 - **\*_patterns**: arrays of regex. `author_patterns`, `subject_patterns`,
@@ -20,27 +21,32 @@ function names, commit messages, symbols, and lore email searches.
   `regex_patterns` and `symbol_patterns` are AND'd within an array.

 **Conventions**: boolean parameters default to `false`; `limit: 0`
-means unlimited unless a max is given.
+means unlimited, except where the tool declares an explicit max --
+in that case the max wins and `limit: 0` is rejected.

 ## Code lookup

+In the call-graph tools below (`find_callers`, `find_calls`,
+`find_callchain`), both sides of a call edge include functions and
+function-like macros.
+
 **find_function**: search for functions and macros
   - name: function/macro name, or a regex
   - also displays details on callers and callees
 **find_type**: search for types and typedefs
   - name: type/typedef name or regex
-**find_callers**: find all functions that call a function or macro
-  - name: function to search
-**find_calls**: find all functions called by a function or macro
-  - name: function to search
-**find_callchain**: search complete function/macro call chain (forward and reverse)
+**find_callers**: find callers (functions or macros) of the named entity
+  - name: function or macro to search
+**find_calls**: find callees (functions or macros) of the named entity
+  - name: function or macro to search
+**find_callchain**: complete call chain (forward and reverse)
   - name: function or macro to search
   - up_levels: number of caller levels to show (default: 2, 0 = unlimited)
   - down_levels: number of callee levels to show (default: 3, 0 = unlimited)
   - calls_limit: max calls to show per level (default: 15, 0 = unlimited)
 **diff_functions**: extract functions and types from a unified diff
-  - diff_content: the string to analyze
-  - Use this to determine which symbols are involved in a given diff
+  - diff_content: unified diff text (e.g., output of `git diff`)
+  - use this to determine which symbols are involved in a given diff

 ## Code search

@@ -49,8 +55,9 @@ means unlimited unless a max is given.
   - verbose: if true, show full function bodies
   - path_pattern: optional regex to filter results by path
   - limit: max number of results (default: 100)
-  - only searches inside functions or macros; no need to escape
-    your pattern to limit the search
+  - the search is already scoped to function and macro bodies; no
+    need to anchor the pattern to constrain the search (regex
+    metacharacters are NOT auto-escaped)
 **vgrep_functions**: vector embedding search on functions/macros/types
   - query_text: text describing the kind of functions to find
   - path_pattern: optional regex to filter results by path
@@ -138,13 +145,15 @@ and invoke full tools on demand.

 ## Recipes

-### Searching for commits reachable from HEAD (or any other git sha)
+### Locating a backported commit reachable from HEAD (or any other sha)

-If a repository heavily cherry-picks patches, it might have a backported commit
-under a different git sha. This means the most effective way to find the
-backported commit is searching by commit subject:
+Repositories that heavily cherry-pick patches store the backport
+under a different git sha than the upstream commit. Search by
+commit subject to find it, then narrow to commits reachable from
+the branch tip with `reachable_sha`:

 ```
-semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown"
-semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown" --reachable HEAD
+find_commit(regex_patterns=["bnxt_en: Fix memory corruption when FW resources change during ifdown"])
+find_commit(regex_patterns=["bnxt_en: Fix memory corruption when FW resources change during ifdown"],
+            reachable_sha="HEAD")
 ```
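
Reviewer note, not part of the series: the rewritten recipe uses an abbreviated `find_commit(...)` notation. A minimal sketch of how those two calls might look on the wire follows, assuming the server speaks standard MCP JSON-RPC 2.0 with a `tools/call` method. The helper name `tools_call_request` and the request ids are hypothetical, and transport (stdio or HTTP) is left out.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request body for an MCP `tools/call` invocation.

    Hypothetical helper for illustration; it only assembles the payload
    and does not talk to a server.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


SUBJECT = "bnxt_en: Fix memory corruption when FW resources change during ifdown"

# Step 1: search all indexed commits for the subject text.
anywhere = tools_call_request(1, "find_commit", {"regex_patterns": [SUBJECT]})

# Step 2: same search, filtered to commits reachable from HEAD.
# reachable_sha stands alone here: no git_ref and no git_range.
reachable = tools_call_request(
    2,
    "find_commit",
    {"regex_patterns": [SUBJECT], "reachable_sha": "HEAD"},
)

print(json.dumps(reachable, indent=2))
```

Because the series notes that regex metacharacters are not auto-escaped, a subject containing characters such as `(` or `[` would need escaping before it goes into `regex_patterns`; the subject used here contains none.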