add a tool for ProfilerStep GPU timeline sanity checks by STwangyingrui · Pull Request #1155 · ModelTC/LightX2V

STwangyingrui · 2026-06-16T09:42:40Z

Adds a standalone trace linter that flags suspicious ProfilerStep idle gaps (ratio + per-launch budget); ALERT is a TensorBoard review hint, not a hard failure.

gemini-code-assist

Code Review

This pull request introduces a new utility script, analyze_torch_trace_gap.py, to check for idle gaps in PyTorch Profiler TensorBoard traces. The feedback focuses on improving the robustness and efficiency of the script. Key suggestions include handling cases where the trace JSON is parsed directly as a list to avoid an AttributeError, adding defensive checks for event types and required keys, simplifying the gpu_span calculation using the sorted order of merged intervals, and wrapping the file processing loop in a try-except block to gracefully handle errors when analyzing multiple traces.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-16T09:44:06Z

+    events = payload.get("traceEvents", payload)
+    if not isinstance(events, list):
+        raise ValueError(f"Unexpected trace format in {path}")


If the trace file is formatted as a JSON array directly (which is a common format for Chrome traces), payload will be parsed as a list. Calling payload.get(...) will then raise an AttributeError: 'list' object has no attribute 'get'. We should check the type of payload before attempting to access it as a dictionary.

if isinstance(payload, dict): events = payload.get("traceEvents", []) elif isinstance(payload, list): events = payload else: events = [] if not isinstance(events, list): raise ValueError(f"Unexpected trace format in {path}")

gemini-code-assist · 2026-06-16T09:44:06Z

+    for ev in events:
+        if ev.get("ph") != "X":
+            continue
+        if ev.get("cat") != "user_annotation":
+            continue
+        name = ev.get("name", "")
+        step_id = _parse_step_id(name)
+        if step_id is None:
+            continue
+        ts = float(ev["ts"])
+        dur = float(ev["dur"])
+        steps.append((step_id, Interval(ts, ts + dur), name))


To prevent potential KeyError or AttributeError if the trace contains malformed events or non-dictionary elements, we should add defensive checks to ensure each event is a dictionary and contains the required keys (ts and dur) before attempting to access or convert them.

Suggested change

for ev in events:

if ev.get("ph") != "X":

continue

if ev.get("cat") != "user_annotation":

continue

name = ev.get("name", "")

step_id = _parse_step_id(name)

if step_id is None:

continue

ts = float(ev["ts"])

dur = float(ev["dur"])

steps.append((step_id, Interval(ts, ts + dur), name))

for ev in events:

if not isinstance(ev, dict):

continue

if ev.get("ph") != "X":

continue

if ev.get("cat") != "user_annotation":

continue

name = ev.get("name", "")

step_id = _parse_step_id(name)

if step_id is None:

continue

ts = ev.get("ts")

dur = ev.get("dur")

if ts is None or dur is None:

continue

steps.append((step_id, Interval(float(ts), float(ts) + float(dur)), name))

gemini-code-assist · 2026-06-16T09:44:06Z

+    for ev in events:
+        if ev.get("ph") != "X":
+            continue
+        if ev.get("cat") not in _GPU_ACTIVITY_CATS:
+            continue
+        ts = float(ev["ts"])
+        dur = float(ev["dur"])
+        if dur <= 0:
+            continue
+        intervals.append(Interval(ts, ts + dur))


Similarly, we should add defensive checks here to ensure each event is a dictionary and contains the required keys (ts and dur) before attempting to access or convert them.

Suggested change

for ev in events:

if ev.get("ph") != "X":

continue

if ev.get("cat") not in _GPU_ACTIVITY_CATS:

continue

ts = float(ev["ts"])

dur = float(ev["dur"])

if dur <= 0:

continue

intervals.append(Interval(ts, ts + dur))

for ev in events:

if not isinstance(ev, dict):

continue

if ev.get("ph") != "X":

continue

if ev.get("cat") not in _GPU_ACTIVITY_CATS:

continue

ts = ev.get("ts")

dur = ev.get("dur")

if ts is None or dur is None:

continue

ts = float(ts)

dur = float(dur)

if dur <= 0:

continue

intervals.append(Interval(ts, ts + dur))

gemini-code-assist · 2026-06-16T09:44:06Z

+        if merged:
+            gpu_span = max(x.end for x in merged) - min(x.start for x in merged)
+        else:
+            gpu_span = 0.0


Since merged is already sorted by start and contains mutually disjoint (non-overlapping) intervals, the minimum start time is guaranteed to be merged[0].start and the maximum end time is guaranteed to be merged[-1].end. We can simplify the calculation to avoid scanning the entire list with min and max.

Suggested change

if merged:

gpu_span = max(x.end for x in merged) - min(x.start for x in merged)

else:

gpu_span = 0.0

if merged:

gpu_span = merged[-1].end - merged[0].start

else:

gpu_span = 0.0

gemini-code-assist · 2026-06-16T09:44:06Z

+    for path in args.traces:
+        stats = analyze_trace_file(path)
+        print(render_report(path, stats, gap_threshold=args.gap_threshold, gap_per_event_us=args.gap_per_event_us), end="")
+
+    return 0


If one of the trace files in a batch is corrupt or missing, the script will crash immediately and skip processing the remaining files. Wrapping the analysis loop in a try-except block allows the script to gracefully report errors for individual files and continue processing the rest, returning a non-zero exit code at the end if any errors occurred.

Suggested change

for path in args.traces:

stats = analyze_trace_file(path)

print(render_report(path, stats, gap_threshold=args.gap_threshold, gap_per_event_us=args.gap_per_event_us), end="")

return 0

exit_code = 0

for path in args.traces:

try:

stats = analyze_trace_file(path)

print(render_report(path, stats, gap_threshold=args.gap_threshold, gap_per_event_us=args.gap_per_event_us), end="")

except Exception as e:

print(f"Error analyzing {path}: {e}", file=sys.stderr)

exit_code = 1

return exit_code

add a tool to analyze torch trace gap

67d2580

STwangyingrui closed this Jun 16, 2026

gemini-code-assist Bot reviewed Jun 16, 2026

View reviewed changes

STwangyingrui deleted the yr/gap_prof_tool branch June 16, 2026 09:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add a tool for ProfilerStep GPU timeline sanity checks#1155

add a tool for ProfilerStep GPU timeline sanity checks#1155
STwangyingrui wants to merge 1 commit into
mainfrom
yr/gap_prof_tool

STwangyingrui commented Jun 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

STwangyingrui commented Jun 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant