[AMD] add dsv4 sglang disagg#1818

Open
billishyahao wants to merge 12 commits into main from amd/dsv4_sgl_di

@billishyahao (Collaborator) commented Jun 18, 2026

cc @Duyi-Wang


Note

Medium Risk
Touches shared disagg launch paths (server_sglang.sh, models.yaml) for all models, not only DSv4; behavior changes when EP is disabled and MoE auto-sizing is partially commented out.

Overview
Adds dsv4-fp4-mi355x-sglang-disagg to the AMD master benchmark matrix (8k/1k, non-MTP) with sweeps over pure TP8, DEP8 (MoRI KV + MoE a2a), and dp-attention + TP-MoE, plus a new workflow runner dsv4_fp4_mi355x_sglang-disagg.sh and a perf-changelog entry.

The multi-node harness is extended for DSv4 PD: a DeepSeek-V4-Pro block in models.yaml (dsv4 attention backend, mori disagg, prefill disable_cuda_graph) and matching MoRI/kernel env overrides in env.sh; the bench client uses --dsv4 framing instead of chat templates.

server_sglang.sh / models.yaml refactor the MoE CLI so that ep_flags (mori a2a, deepep, fake dispatch) apply only when EP is on; ep=1 stays TP-MoE even with dp-attention. Prefill can now honor per-model disable_cuda_graph, context_length, and optional MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK_* overrides. submit.sh threads DRY_RUN through so that composed launch commands can be previewed on a real allocation.
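
The gating and dry-run behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in server_sglang.sh or submit.sh: the function names `compose_moe_flags` and `run_or_preview`, the `EP_FLAGS` variable, and the placeholder flag are all assumptions made for the example.

```shell
# Sketch only: names below are hypothetical, not the harness's real ones.

# Emit the EP-only flag string when expert parallelism is enabled.
# $1 = expert-parallel size, $2 = EP flag string supplied per model
compose_moe_flags() {
    local ep_size="$1" ep_flags="$2"
    if [ "$ep_size" -gt 1 ]; then
        # EP is on: pass the EP-only flags through unchanged.
        printf '%s' "$ep_flags"
    fi
    # ep_size == 1: emit nothing, so the server stays on TP-MoE
    # whether or not dp-attention is enabled.
}

# Run the given command, or only print it when DRY_RUN=1.
run_or_preview() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'DRY_RUN: %s\n' "$*"
    else
        "$@"
    fi
}

# Example: preview a composed command without launching anything.
EP_FLAGS="--example-ep-flag value"   # placeholder, not a real sglang flag
DRY_RUN=1
# The substitution is left unquoted on purpose so the flags split into words.
run_or_preview echo launch $(compose_moe_flags 8 "$EP_FLAGS")
```

With `DRY_RUN` unset or set to 0, `run_or_preview` executes the command instead of printing it, so the same call site serves both preview and real launches.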

Reviewed by Cursor Bugbot for commit f56f8de.

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit c22652b.

Comment thread benchmarks/multi_node/amd_utils/server_sglang.sh

@functionstackx added the all-evals label (Expand eval selection to every fixed-sequence config) on Jun 21, 2026

Comment thread perf-changelog.yaml
Comment on lines +4002 to +4003
description:
- "init submission of dsv4 sglang disagg "

@functionstackx (Collaborator) commented

hi @billishyahao, there seems to be an accuracy issue with TP8+TP8. codex has narrowed it down to conc=4; here is the bug report for when u wake up, please take a look

sgl-project/sglang#28851

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27896968169/job/82550287079?pr=1818

@functionstackx (Collaborator) left a comment

can u remove https://github.com/SemiAnalysisAI/InferenceX/blob/main/benchmarks/multi_node/amd_utils/patches/mori_conn.py too

this is no longer needed now that sgl-project/sglang#26525 is fixed in sgl-project/sglang#26539

_MORI_PATCH_FILE="$DI_REPO_DIR/benchmarks/multi_node/amd_utils/patches/mori_conn.py"
_MORI_PATCH_TARGET="/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py"
# Bind-mount the patched conn.py read-only over the stock file, unless opted out,
# the patch file is missing, the image tag differs, or the mount already exists.
if [[ "${MORI_CONN_PATCH:-auto}" != "skip" ]] \
   && [[ -f "$_MORI_PATCH_FILE" ]] \
   && [[ "${DOCKER_IMAGE_NAME:-}" == *"v0.5.12.post1"* ]] \
   && [[ "${EXTRA_DOCKER_MOUNTS:-}" != *"$_MORI_PATCH_TARGET"* ]]; then
    EXTRA_DOCKER_MOUNTS="${EXTRA_DOCKER_MOUNTS:-} -v ${_MORI_PATCH_FILE}:${_MORI_PATCH_TARGET}:ro"
    export EXTRA_DOCKER_MOUNTS
fi
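
For anyone checking which runs are still affected before the patch is removed, the four conditions in the guard above can be written as a standalone predicate. This is only a sketch: `should_mount_mori_patch` is a hypothetical helper that does not exist in the repo, and its arguments stand in for the variables used in the quoted snippet.

```shell
# Sketch: the guard from the quoted snippet as a standalone predicate.
# Returns 0 when the patch would be mounted, 1 otherwise.
# should_mount_mori_patch is a hypothetical name, not part of the repo.
should_mount_mori_patch() {
    local mode="$1" patch_file="$2" image="$3" mounts="$4" target="$5"
    [ "$mode" != "skip" ] || return 1     # MORI_CONN_PATCH=skip opts out
    [ -f "$patch_file" ] || return 1      # the patch file must exist
    case "$image" in                      # image name must contain the tag
        *"v0.5.12.post1"*) ;;
        *) return 1 ;;
    esac
    case "$mounts" in                     # do not add the same mount twice
        *"$target"*) return 1 ;;
    esac
    return 0
}
```

A call such as `should_mount_mori_patch "${MORI_CONN_PATCH:-auto}" "$_MORI_PATCH_FILE" "${DOCKER_IMAGE_NAME:-}" "${EXTRA_DOCKER_MOUNTS:-}" "$_MORI_PATCH_TARGET"` mirrors the original condition; the snippet itself shows that setting `MORI_CONN_PATCH=skip` disables the mount without touching the file.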

Labels

all-evals (Expand eval selection to every fixed-sequence config), AMD, full-sweep-enabled
