[Mgmt] Profile Network generator performance #59541

Closed
live1206 wants to merge 2 commits into Azure:main from live1206:network-mpg-generator-profiling

live1206 commented May 29, 2026

Summary

Draft PR for Network MPG generator profiling. This PR adds env-gated management generator timing instrumentation and a small LRO operation-source fix needed for the Network code model to complete direct generation when an LRO final result is a primitive/framework type.

Set AZSDK_MGMT_GENERATOR_PROFILE=1 to emit [mgmt-generator-profile] timings.
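As an illustration of the env-gated pattern only, here is a minimal Python sketch. It is not the PR's C# instrumentation: the helper names `profiling_enabled` and `profile_phase` are hypothetical, and everything in the log line after the `[mgmt-generator-profile]` prefix is an assumed format.

```python
import os
import time
from contextlib import contextmanager


def profiling_enabled():
    # Checked at call time so the gate follows the current environment.
    return os.environ.get("AZSDK_MGMT_GENERATOR_PROFILE") == "1"


@contextmanager
def profile_phase(name, emit=print):
    """Time a phase and emit one line, but only when the env gate is set."""
    if not profiling_enabled():
        yield  # gate closed: run the phase with no timing overhead or output
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # Assumed line format; only the bracketed prefix comes from the PR text.
        emit(f"[mgmt-generator-profile] {name}: {elapsed:.3f}s")


with profile_phase("roslyn-post-processing"):
    pass  # the phase being measured would run here
```

With the variable unset the wrapper is a no-op, which is the property that lets this kind of instrumentation stay in the code path without affecting normal runs.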

Current conclusion

The latest profiling shows tspCodeModel.json generation is not the 10+ minute hotspot for Network. The TypeSpec + emitter/code-model path completes in about a minute end-to-end, and the code-model creation/serialization/write portion takes only seconds.

The dominant measured cost is in the .NET generator Roslyn post-processing path, specifically shared MTG simplification/name-reduction work after generated types have already been written into the in-memory Roslyn workspace.

High-level timing

Current manual Network regen comparison

The latest manual end-to-end Network regeneration timings are the cleanest current comparison between the migration branch and AutoRest.CSharp:

| Manual Network regen path | Elapsed |
| --- | --- |
| TypeSpec/MTG on the Network migration branch | 00:11:20.27 |
| AutoRest.CSharp | 00:08:19.39 |
| Difference | ~3m00.88s |

This means the current TypeSpec/MTG Network regen is still about 3 minutes slower than AutoRest.CSharp for the manual regen path, even after the narrow simplifier optimization. MTG is about 36% slower than the AutoRest.CSharp run (680.27s / 499.39s).
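The gap and the relative slowdown can be rechecked from the two elapsed stamps. This is a small sketch of that arithmetic; the helper name `to_seconds` is hypothetical.

```python
def to_seconds(stamp):
    """Convert an HH:MM:SS.ff elapsed stamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


mtg = to_seconds("00:11:20.27")       # TypeSpec/MTG manual regen
autorest = to_seconds("00:08:19.39")  # AutoRest.CSharp manual regen

gap = mtg - autorest             # absolute difference in seconds
slowdown = mtg / autorest - 1    # relative to the AutoRest.CSharp run

print(f"gap = {gap:.2f}s, slowdown = {slowdown:.1%}")
# prints: gap = 180.88s, slowdown = 36.2%
```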

Profiling phase isolation

The profiling data below comes from the latest-main/temp profiling setup. It should be read as phase isolation for the current generator hotspot, not as the normal end-to-end Network RegenSdkLocal time. In this setup, the temporary spec/project inputs had already been prepared on disk before the measured generator-focused runs.

| Measurement | Elapsed | What it means |
| --- | --- | --- |
| Prepared-temp profiling run: TypeSpec compile + emitter + .NET generator, baseline/no narrow-simplifier tweak | ~12m36s | Baseline profiling run before the Simplifier.ReduceAsync skip experiment. This is not the normal full Network RegenSdkLocal workflow timing. |
| Direct .NET generator from saved tspCodeModel.json/Configuration.json, baseline/no narrow-simplifier tweak | ~11m48s to ~12m12s | Isolates the .NET generator phase; almost all time is Roslyn post-processing. |
| TypeSpec + emitter/code-model only, generator skipped | ~51s | Isolates TypeSpec compile + emitter/code-model generation and input writing. |
| Direct .NET generator from fresh saved inputs with current narrow simplifier branch | ~8m13s | Fresh 2026-06-02 rerun for the .NET generator phase; Roslyn post-processing was ~7m53.9s of this total. |

The main conclusion is: the measured multi-minute hotspot is the .NET generator post-processing phase, not TypeSpec/emitter code-model creation.

The narrow simplifier experiment reduced direct-generator wall-clock from about 12m12s in the baseline profiling setup to ~8m13s in the fresh current-branch rerun. The current manual full-regeneration comparison still shows a remaining gap versus AutoRest.CSharp, so further work should focus on reducing the direct-generator post-processing cost and validating the win in full manual regen.

Emitter/code-model timing

I isolated the TypeSpec + emitter path with temporary local emitter instrumentation and AZSDK_CSHARP_EMITTER_SKIP_GENERATOR=1, so the run writes tspCodeModel.json/Configuration.json but does not invoke the .NET generator.

| Stage | Elapsed |
| --- | --- |
| RegenSdkLocal emitter-only worker wall-clock | ~51s |
| Base emitter createSdkContext | 5.158s |
| Base emitter createModel | 0.213s |
| Mgmt updateCodeModel callback | ~0.046s |
| serializeCodeModel | 0.404s |
| Write tspCodeModel.json + Configuration.json | 0.136s |
| Base emitter total from context creation through write-inputs | 5.957s |
| Mgmt emitter total around emitAzureCodeModel | 11.386s |

The generated tspCodeModel.json was 39.9 MiB and contained 1 client, 1,065 models, and 284 enums. This confirms code-model creation and writing are seconds, not minutes.

A full non-skip emitter run currently hits the known Network customization validation issue before reaching Roslyn post-processing (CodeGenSuppress on VirtualMachineScaleSetVmNetworkResource), so it is not useful as a full-generation timing. Direct-generator profiling from saved inputs is the useful measurement for the .NET generator phase.

Direct .NET generator timing

I reran the Network direct generator profile from fresh saved inputs with the current local MTG branch on 2026-06-02. This run used the narrow simplifier behavior, so it validates the current optimized shape rather than the older baseline.

Direct generator timing, shown as per-step elapsed cost:

| Step | Elapsed | Share of direct-generator time |
| --- | --- | --- |
| Build providers and write generated types into the in-memory workspace | ~17.7s | ~3.5% |
| Delete old generated files | ~0.002s | ~0.0% |
| Roslyn post-processing | ~7m53.9s | ~96.1% |
| Write generated files to disk | ~1.7s | ~0.3% |
| Total direct-generator wall-clock reported by generator | ~8m13.1s | 100% |
| Shell wall-clock for the command | ~8m15.8s | n/a |
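The share column can be recomputed from the per-step times and the generator-reported total. A minimal sketch, using the rounded values from the table (so recomputed shares may differ from the quoted ones in the last digit, and the steps need not sum exactly to the total):

```python
# Per-step elapsed times in seconds, taken from the table (already rounded).
steps = {
    "Build providers and write generated types": 17.7,
    "Delete old generated files": 0.002,
    "Roslyn post-processing": 7 * 60 + 53.9,
    "Write generated files to disk": 1.7,
}
total = 8 * 60 + 13.1  # generator-reported direct-generator wall-clock

# Share of the reported total, in percent.
shares = {name: 100 * secs / total for name, secs in steps.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```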

The direct generator wrote 3,542 generated files, including 1,192 .Serialization.cs files. This fresh run confirms that, in the isolated .NET generator phase, Roslyn post-processing still accounts for almost all elapsed time even though Network has many providers, visitors, models, and writers before post-processing.

Roslyn simplifier category breakdown

On 2026-06-02, I reran the isolated Network direct-generator baseline to check timing drift and compared broad/root simplifier output against a temporary no-simplifier run. The ~20 minute broad/root simplifier result is reproducible.

| Run | Internal generator total | Shell elapsed | Notes |
| --- | --- | --- | --- |
| No Roslyn simplifier | 00:07:36.29 | 458.751s | Fast, but generated output changes broadly. |
| Broad/root simplifier baseline rerun | 00:19:42.39 | 1185.422s | Consistent with prior ~20m baseline. |
| Prior broad/root simplifier baseline | ~00:19:57 | ~20m00s | Previous rerun before this drift check. |

The no-simplifier output differed from the baseline in 3,534 / 3,542 generated .cs files across 108,504 diff blocks. Categories overlap because one hunk can include several simplifications.

| Simplification category | Diff blocks | Files |
| --- | --- | --- |
| global:: / name simplification | 98,190 | 3,534 |
| this. qualification | 16,302 | 2,321 |
| XML cref simplification | 10,959 | 2,110 |
| Parentheses | 6,693 | 2,187 |
| Qualified member/name | 5,868 | 1,622 |
| Attribute suffix | 1,207 | 556 |
| Generic method inference | 1,156 | 675 |
| Predefined type keyword | 280 | 275 |
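To show why the category counts add up to more than the number of diff blocks, here is a rough sketch of tagging each hunk with every category it matches. The regexes are hypothetical heuristics covering only three of the categories, not the classifier actually used for the table.

```python
import re
from collections import Counter

# Hypothetical per-category heuristics, applied to the changed lines of a hunk.
CATEGORY_PATTERNS = {
    "global:: / name simplification": re.compile(r"global::"),
    "this. qualification": re.compile(r"\bthis\."),
    "XML cref simplification": re.compile(r'cref="'),
}


def changed_lines(hunk):
    """Return added/removed lines of a unified-diff hunk, without the +/- marker."""
    return [
        line[1:]
        for line in hunk.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def categorize(hunk):
    """A hunk belongs to every category whose pattern matches a changed line."""
    text = "\n".join(changed_lines(hunk))
    return {name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(text)}


def tally(hunks):
    counts = Counter()
    for hunk in hunks:
        counts.update(categorize(hunk))
    return counts


hunk_a = "@@ -10,1 +10,1 @@\n-    this.Name = global::System.String.Empty;\n+    Name = string.Empty;\n"
hunk_b = '@@ -3,1 +3,1 @@\n-    /// <see cref="global::System.Uri"/>\n+    /// <see cref="Uri"/>\n'
counts = tally([hunk_a, hunk_b])
```

Here two hunks produce four category hits, because each hunk falls into two categories at once.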

Conclusion: Roslyn is mostly doing semantic name/global-alias reduction across nearly every Network file. This explains why span filtering only produced a small win, while disabling simplification drops the isolated generator to ~7m36s but changes output everywhere. The next promising experiments should replace lower-risk generated patterns first (this. removal, XML cref, redundant parentheses/generic inference) before attempting broad global:: type-name emission.

CPU tracing points at Roslyn simplification/semantic work:

| Inclusive sample | Method/family |
| --- | --- |
| 30.38% | ReduceAsync task |
| 26.65% | CSharpNameReducer.SimplifyName |
| 18.61% | NameSimplifier.TrySimplify |
| 12.18% | CSharpSemanticModel.GetSymbolInfo |
| 10.84% | CSharpNameReducer.Rewriter.VisitMemberAccessExpression |

Exclusive samples are dominated by thread-pool wait/lock contention (LowLevelLifoSemaphore.WaitForSignal, Monitor.Enter_Slowpath), consistent with expensive concurrent Roslyn simplification and contention rather than management provider construction.

CodeWriter string/byte conversion check

I checked the shared MTG CodeWriter path after a teammate raised possible heavy string/byte conversion overhead. The generated C# source path is mostly char/string based:

  • CodeWriter writes chars into UnsafeBufferSequence : IBufferWriter<char>.
  • TypeProviderWriter.Write() materializes one final string per generated file via CodeWriter.ToString().
  • GeneratedCodeWorkspace.AddGeneratedFile adds that string to Roslyn.
  • GetGeneratedFilesAsync later calls Roslyn SourceText.ToString().
  • CSharpGen writes final files with File.WriteAllTextAsync, which encodes once at disk write time.

There is a char-to-byte conversion path in UnsafeBufferSequence.Reader.CopyTo(Stream) / CopyToAsync(Stream) via Encoding.UTF8.GetBytes(...), but that is not the normal generated .cs source path used by Network file generation. It is used by stream/BinaryData helper paths.

The Network phase timings also bound the possible impact: in the rebuilt direct-generator run, provider construction + visitors + CodeWriter + adding generated files to the workspace reached 00:00:17.62; visitors alone completed by 00:00:11.45, so generated source writing/add-to-workspace was about 6.18s. Final file write after Roslyn was about 1.95s. Roslyn post-processing was 00:19:19.58 after that point (00:19:37.21 cumulative from 00:00:17.62).
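The per-phase costs follow from subtracting consecutive cumulative markers. A minimal sketch, assuming the 00:19:39.16 internal total reported for the same rebuilt run is the final marker; the marker labels are hypothetical. Because the markers are already rounded to 0.01s, recomputed durations can differ from the quoted ones in the last digit.

```python
# Cumulative markers in seconds since generator start (rounded values from the text).
markers = [
    ("start", 0.0),
    ("visitors complete", 11.45),
    ("generated files added to workspace", 17.62),
    ("Roslyn post-processing complete", 19 * 60 + 37.21),
    ("generated files written to disk", 19 * 60 + 39.16),  # assumed final marker
]

# Each phase's cost is the gap between its marker and the previous one.
durations = {
    label: round(end - start, 2)
    for (_, start), (label, end) in zip(markers, markers[1:])
}

for label, secs in durations.items():
    print(f"{label}: {secs:.2f}s")

roslyn_share = durations["Roslyn post-processing complete"] / markers[-1][1]
```

On these numbers, everything before and after Roslyn post-processing together accounts for under 20 seconds of a run that lasts over 19 minutes.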

Conclusion: CodeWriter/string/byte conversion is worth keeping efficient, but it is not the current Network MTG hotspot. Even eliminating all pre/post Roslyn source materialization would save seconds, while the measured issue is roughly 19 minutes of Roslyn simplification work.

Follow-up easy category experiments

I also tried the lower-risk categories from the breakdown to see whether we could preserve exact generated output while reducing Roslyn work before tackling broad global:: name reduction.

| Experiment | Result | Decision |
| --- | --- | --- |
| Member-level Simplifier.ReduceAsync(document, spans) | Preserved generated test-project output and npm run test:generator passed. Rebuilt Network direct-generator run was 00:19:39.16 internal / 1182.394s shell, with Roslyn post-processing 00:19:37.21. | Not a meaningful win versus the broad/root baseline rerun (00:19:42.39 internal / 1185.422s shell). |
| Guarded this. emission from Snippet.This | Generated test-project output stayed unchanged after normal simplification, but many raw writer/unit expected baselines changed from this.Create... to Create.... | Not taking as-is; it would need an explicit generated-file/raw-render mode or broader baseline churn for a small/overlapping category. |
| Direct XML cref emission | A naive short-name form first broke generic cref output (IList vs IList{T}), and after fixing that still changed exact generated output (SampleTypeSpec.Thing -> Thing, Models.Custom.Friend -> Friend). | Not safe as-is because Roslyn's final cref form depends on namespace context/imports. |

One fast Network run (~7m22s) was discarded after validation because the output still contained unsimplified tokens (93,679 global::System occurrences, 12,448 this. occurrences, and 9,774 global:: crefs), meaning the profiling clone had used stale/no-simplifier assemblies. After rebuilding the profiling generator, the output was properly simplified (2 global::System, 31 this., 0 global:: crefs) and timing returned to the ~20 minute baseline.
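A validation pass of this kind can be sketched as a token count over the generated output. The function names, the regexes, and the threshold below are hypothetical and are not the check that was actually run.

```python
import re
from pathlib import Path

# Hypothetical patterns for tokens that Roslyn simplification normally removes.
PATTERNS = {
    "global::System": re.compile(r"global::System\b"),
    "this.": re.compile(r"\bthis\."),
    "global:: cref": re.compile(r'cref="global::'),
}


def count_unsimplified(root):
    """Count occurrences of each pattern across all .cs files under root."""
    counts = {name: 0 for name in PATTERNS}
    for path in Path(root).rglob("*.cs"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts


def looks_unsimplified(counts, threshold=1000):
    # Assumed threshold: a properly simplified run leaves only a handful of hits.
    return any(n > threshold for n in counts.values())
```

Running such a check before accepting any timing guards against an unexpectedly fast run that merely skipped simplification.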

Conclusion from these easy experiments: the overlapping low-risk categories do not materially reduce Network time on their own. The remaining meaningful target is still broad semantic name/global-alias simplification across nearly every file, or a more structural change that avoids asking Roslyn to rediscover reductions that the generator can emit safely.

Local MTG post-processing profile

I temporarily wired the management generator worktree to local TypeSpec/MTG source under /workspaces/typespec to instrument shared MTG post-processing directly. Those local project-reference and instrumentation edits were used only for profiling and are not intended as product changes in this PR.

Local MTG direct-generator run, shown as per-step elapsed cost rather than cumulative markers:

| Step | Elapsed |
| --- | --- |
| Build providers and write generated types into the in-memory workspace | ~16.4s |
| Delete old generated files | ~0.1s |
| Roslyn post-processing | ~11m51.4s |
| Write generated files to disk | ~1.6s |
| Remaining generator overhead | ~2.5s |
| Total direct-generator wall-clock | ~12m12s |

MTG post-processing instrumentation for 4,168 documents showed that the expensive portion was Simplifier.ReduceAsync; syntax-root/semantic-model lookup, member removal, library rewriters, and formatting were comparatively small.

Optimization experiments

Roslyn 4.1.0 comparison

I temporarily changed local MTG from Microsoft.CodeAnalysis.CSharp.Workspaces 4.8.0 to 4.1.0 to match AutoRest.CSharp's Roslyn version and reran the direct Network generator from saved inputs.

| Experiment | Roslyn post-processing wall-clock | Total direct generator wall-clock |
| --- | --- | --- |
| MTG local baseline, Roslyn 4.8.0 | ~12m07.9s | ~12m12s |
| MTG local with Roslyn 4.1.0 | 13m22.5s | 13m27s |

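The Roslyn downgrade was slower for this workload, so the Roslyn version difference does not look like the cause of AutoRest.CSharp being faster. Note that the ~12m07.9s baseline figure matches the cumulative marker at the end of post-processing in the local MTG profile (~16.4s + ~0.1s + ~11m51.4s), so the post-processing column here appears to be cumulative elapsed time rather than per-step cost.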

Narrow simplifier experiment

I tested a local MTG change that avoids blanket Simplifier.Annotation on every generated document root and skips Simplifier.ReduceAsync for documents that have no remaining Simplifier.Annotation after member removal/rewriters.

| Experiment | Documents | Skipped simplifier | Roslyn post-processing wall-clock | Total direct generator wall-clock |
| --- | --- | --- | --- | --- |
| MTG local baseline | 4,168 | 0 | ~12m07.9s | ~12m12s |
| Narrow simplifier | 4,171 | 3,273 | 7m59.5s | 8m03s |

This cut direct-generator wall-clock by about 4m09s versus the local baseline.

A focused MTG draft PR has been opened for this improvement: microsoft/typespec#10846.

AutoRest.CSharp comparison

The latest manual Network regen comparison is:

| Manual Network regen path | Elapsed |
| --- | --- |
| TypeSpec/MTG on the Network migration branch | 00:11:20.27 |
| AutoRest.CSharp | 00:08:19.39 |

So AutoRest.CSharp is still faster by about 3m00.88s for the manual Network regen path.

AutoRest.CSharp uses the same broad post-processing pattern at a high level: generated documents are added to a Roslyn workspace, generated document roots are annotated for simplification, then Simplifier.ReduceAsync runs during post-processing. The Network performance difference does not appear to be because AutoRest.CSharp avoids Roslyn simplification entirely.

The workloads are different:

| Comparison point | AutoRest.CSharp Network output | MTG Network migration output |
| --- | --- | --- |
| Generated source files under src/Generated | 2,538 | 3,518 |
| Generated source size under src/Generated | ~31.1 MiB | ~30.3 MiB |
| Generated source lines under src/Generated | ~597k | ~586k |
| Largest generated source hotspot | ArmNetworkModelFactory.cs ~0.9 MiB | ArmNetworkModelFactory.cs ~1.7 MiB |
| Profiling workspace generated documents | n/a | 4,171 |

The broad global:: fallback experiment also showed why simply matching AutoRest-style all-file simplification is not a good MTG fix: global:: appeared in 3,681 / 4,171 generated documents, so that fallback re-enabled simplification for almost every file and erased most of the narrow-simplifier win.

Conclusion: AutoRest.CSharp is still faster in the latest manual regen comparison. The current evidence still points to generated-code shape and Roslyn post-processing workload as the main difference, not to AutoRest.CSharp avoiding Roslyn simplification altogether. The better MTG direction remains targeted simplification plus direct valid code emission where possible, while continuing to validate improvements against full manual Network regen.

Next focus

Continue improving shared Microsoft.TypeSpec.Generator Roslyn post-processing, especially GeneratedCodeWorkspace.ProcessDocument. The current best target is to avoid full-document Simplifier.ReduceAsync unless a document has meaningful simplifier annotations, rather than forcing simplification across every generated file.

live1206 and others added 2 commits May 29, 2026 08:57
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot added the CodeGen (Issues that relate to code generation) and Mgmt (This issue is related to a management package.) labels May 29, 2026

live1206 commented Jun 3, 2026

Converted this profiling tracker to an issue: #59625


live1206 commented Jun 3, 2026

Closing in favor of the tracking issue: #59625

live1206 closed this Jun 3, 2026