[Mgmt] Profile Network generator performance by live1206 · Pull Request #59541 · Azure/azure-sdk-for-net

live1206 · 2026-05-29T09:39:18Z

Summary

Draft PR for Network MPG generator profiling. This PR adds env-gated management generator timing instrumentation and a small LRO operation-source fix needed for the Network code model to complete direct generation when an LRO final result is a primitive/framework type.

Set `AZSDK_MGMT_GENERATOR_PROFILE=1` to emit `[mgmt-generator-profile]` timings.
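Below is a minimal Python sketch of env-gated step timing. It assumes each timed step prints one prefixed line when profiling is on. Only the variable name `AZSDK_MGMT_GENERATOR_PROFILE` and the `[mgmt-generator-profile]` prefix come from this PR. The following are hypothetical: `profile_step`, the exact-`"1"` check, and the line format. The real instrumentation is C# inside the management generator.

```python
import os
import sys
import time
from contextlib import contextmanager

# Only these two strings come from the PR; everything else is illustrative.
PROFILE_ENV_VAR = "AZSDK_MGMT_GENERATOR_PROFILE"
PROFILE_PREFIX = "[mgmt-generator-profile]"


def profiling_enabled() -> bool:
    # Assumption: only the exact value "1" turns profiling on.
    return os.environ.get(PROFILE_ENV_VAR) == "1"


@contextmanager
def profile_step(name: str, out=None):
    """Time the wrapped block and print one line, only when profiling is enabled."""
    if not profiling_enabled():
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        stream = sys.stderr if out is None else out
        print(f"{PROFILE_PREFIX} {name}: {elapsed:.3f}s", file=stream)
```

Wrapping a phase is then `with profile_step("Roslyn post-processing"): ...`. When the variable is unset, the only cost is one environment lookup and no output is produced.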

Current conclusion

The latest profiling shows tspCodeModel.json generation is not the 10+ minute hotspot for Network. The TypeSpec + emitter/code-model path completes in about a minute end-to-end, and the code-model creation/serialization/write portion takes only seconds.

The dominant measured cost is in the .NET generator Roslyn post-processing path, specifically shared MTG simplification/name-reduction work after generated types have already been written into the in-memory Roslyn workspace.

High-level timing

Current manual Network regen comparison

The latest manual end-to-end Network regeneration timings are the cleanest current comparison between the migration branch and AutoRest.CSharp:

| Manual Network regen path | Elapsed |
| --- | --- |
| TypeSpec/MTG on the Network migration branch | `00:11:20.27` |
| AutoRest.CSharp | `00:08:19.39` |
| Difference | `00:03:00.88` |

This means the current TypeSpec/MTG Network regen is still about 3 minutes slower than AutoRest.CSharp for the manual regen path, even after the narrow simplifier optimization. MTG takes about 36% longer than the AutoRest.CSharp run (680.27s / 499.39s ≈ 1.36).
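The gap and ratio above follow directly from the two elapsed values. Here is a small Python check; the `parse_hms` helper is illustrative.

```python
def parse_hms(value: str) -> float:
    """Convert an 'HH:MM:SS.ff' elapsed string into seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


mtg = parse_hms("00:11:20.27")       # TypeSpec/MTG on the Network migration branch
autorest = parse_hms("00:08:19.39")  # AutoRest.CSharp

gap = mtg - autorest
slowdown = mtg / autorest - 1

print(f"gap = {gap:.2f}s")              # gap = 180.88s
print(f"MTG slower by {slowdown:.0%}")  # MTG slower by 36%
```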

Profiling phase isolation

The profiling data below comes from the latest-main/temp profiling setup. It should be read as phase isolation for the current generator hotspot, not as the normal end-to-end Network RegenSdkLocal time. In this setup, the temporary spec/project inputs had already been prepared on disk before the measured generator-focused runs.

| Measurement | Elapsed | What it means |
| --- | --- | --- |
| Prepared-temp profiling run: TypeSpec compile + emitter + .NET generator, baseline (no narrow-simplifier tweak) | ~12m36s | Baseline profiling run before the `Simplifier.ReduceAsync` skip experiment. This is not the normal full Network `RegenSdkLocal` workflow timing. |
| Direct .NET generator from saved `tspCodeModel.json`/`Configuration.json`, baseline (no narrow-simplifier tweak) | ~11m48s to ~12m12s | Isolates the .NET generator phase; almost all time is Roslyn post-processing. |
| TypeSpec + emitter/code-model only, generator skipped | ~51s | Isolates TypeSpec compile + emitter/code-model generation and input writing. |
| Direct .NET generator from fresh saved inputs with the current narrow-simplifier branch | ~8m13s | Fresh 2026-06-02 rerun of the .NET generator phase; Roslyn post-processing was ~7m53.9s of this total. |

The main conclusion is: the measured multi-minute hotspot is the .NET generator post-processing phase, not TypeSpec/emitter code-model creation.

The narrow simplifier experiment reduced direct-generator wall-clock from about 12m12s in the baseline profiling setup to ~8m13s in the fresh current-branch rerun. The current manual full-regeneration comparison still shows a remaining gap versus AutoRest.CSharp, so further work should focus on reducing the direct-generator post-processing cost and validating the win in full manual regen.

Emitter/code-model timing

I isolated the TypeSpec + emitter path with temporary local emitter instrumentation and AZSDK_CSHARP_EMITTER_SKIP_GENERATOR=1, so the run writes tspCodeModel.json/Configuration.json but does not invoke the .NET generator.

| Stage | Elapsed |
| --- | --- |
| `RegenSdkLocal` emitter-only worker wall-clock | ~51s |
| Base emitter `createSdkContext` | 5.158s |
| Base emitter `createModel` | 0.213s |
| Mgmt `updateCodeModel` callback | ~0.046s |
| `serializeCodeModel` | 0.404s |
| Write `tspCodeModel.json` + `Configuration.json` | 0.136s |
| Base emitter total from context creation through write-inputs | 5.957s |
| Mgmt emitter total around `emitAzureCodeModel` | 11.386s |

The generated tspCodeModel.json was 39.9 MiB and contained 1 client, 1,065 models, and 284 enums. This confirms code-model creation and writing are seconds, not minutes.
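As a quick consistency check, the base-emitter stage timings in the table add up exactly to the reported 5.957s total. This Python snippet uses `Decimal` to avoid float rounding noise. The dictionary keys are just labels copied from the table.

```python
from decimal import Decimal

# Base emitter stages from the emitter/code-model table (seconds).
base_emitter_stages = {
    "createSdkContext": Decimal("5.158"),
    "createModel": Decimal("0.213"),
    "updateCodeModel (mgmt callback, approximate)": Decimal("0.046"),
    "serializeCodeModel": Decimal("0.404"),
    "write tspCodeModel.json + Configuration.json": Decimal("0.136"),
}

total = sum(base_emitter_stages.values(), Decimal(0))
print(total)  # 5.957
```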

A full non-skip emitter run currently hits the known Network customization validation issue before reaching Roslyn post-processing (CodeGenSuppress on VirtualMachineScaleSetVmNetworkResource), so it is not useful as a full-generation timing. Direct-generator profiling from saved inputs is the useful measurement for the .NET generator phase.

Direct .NET generator timing

I reran the Network direct generator profile from fresh saved inputs with the current local MTG branch on 2026-06-02. This run used the narrow simplifier behavior, so it validates the current optimized shape rather than the older baseline.

Direct generator timing, shown as per-step elapsed cost:

| Step | Elapsed | Share of direct-generator time |
| --- | --- | --- |
| Build providers and write generated types into the in-memory workspace | ~17.7s | ~3.6% |
| Delete old generated files | ~0.002s | ~0.0% |
| Roslyn post-processing | ~7m53.9s | ~96.1% |
| Write generated files to disk | ~1.7s | ~0.3% |
| Total direct-generator wall-clock reported by generator | ~8m13.1s | 100% |
| Shell wall-clock for the command | ~8m15.8s | n/a |

The direct generator wrote 3,542 generated files, including 1,192 .Serialization.cs files. This fresh run confirms that, in the isolated .NET generator phase, Roslyn post-processing still accounts for almost all elapsed time even though Network has many providers, visitors, models, and writers before post-processing.
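The share column can be reproduced from the per-step values by dividing by the 8m13.1s total reported by the generator. Computed from the rounded step times, the build step comes out at about 3.6% and Roslyn post-processing at about 96.1%.

```python
# Per-step elapsed values from the fresh 2026-06-02 direct-generator run (seconds).
steps = {
    "build providers + write to in-memory workspace": 17.7,
    "delete old generated files": 0.002,
    "Roslyn post-processing": 7 * 60 + 53.9,
    "write generated files to disk": 1.7,
}
REPORTED_TOTAL = 8 * 60 + 13.1  # total wall-clock reported by the generator

shares = {name: seconds / REPORTED_TOTAL for name, seconds in steps.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# build providers + write to in-memory workspace: 3.6%
# delete old generated files: 0.0%
# Roslyn post-processing: 96.1%
# write generated files to disk: 0.3%
```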

Roslyn simplifier category breakdown

On 2026-06-02, I reran the isolated Network direct-generator baseline to check timing drift and compared broad/root simplifier output against a temporary no-simplifier run. The ~20 minute broad/root simplifier result is reproducible.

| Run | Internal generator total | Shell elapsed | Notes |
| --- | --- | --- | --- |
| No Roslyn simplifier | `00:07:36.29` | `458.751s` | Fast, but generated output changes broadly. |
| Broad/root simplifier baseline rerun | `00:19:42.39` | `1185.422s` | Consistent with the prior ~20m baseline. |
| Prior broad/root simplifier baseline | ~`00:19:57` | ~`20m00s` | Previous rerun before this drift check. |
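Subtracting the no-simplifier run from the broad/root rerun gives a rough estimate of how much isolated generator time is attributable to Roslyn simplification. This assumes the two runs differ only in whether the simplifier runs.

```python
def parse_hms(value: str) -> float:
    """Convert an 'HH:MM:SS.ff' elapsed string into seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


no_simplifier = parse_hms("00:07:36.29")     # internal total without simplification
broad_simplifier = parse_hms("00:19:42.39")  # broad/root simplifier rerun

overhead = broad_simplifier - no_simplifier
minutes, seconds = divmod(overhead, 60)
print(f"simplifier overhead ~ {int(minutes)}m{seconds:04.1f}s")  # 12m06.1s
print(f"share of broad run: {overhead / broad_simplifier:.1%}")   # 61.4%
```

The shell timings give a similar estimate: 1185.422s minus 458.751s is about 726.7s. Both clocks therefore agree on roughly 12 minutes.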

The no-simplifier output differed from the baseline in 3,534 / 3,542 generated .cs files across 108,504 diff blocks. Categories overlap because one hunk can include several simplifications.

| Simplification category | Diff blocks | Files |
| --- | --- | --- |
| `global::` / name simplification | 98,190 | 3,534 |
| `this.` qualification | 16,302 | 2,321 |
| XML `cref` simplification | 10,959 | 2,110 |
| Parentheses | 6,693 | 2,187 |
| Qualified member/name | 5,868 | 1,622 |
| Attribute suffix | 1,207 | 556 |
| Generic method inference | 1,156 | 675 |
| Predefined type keyword | 280 | 275 |
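Summing the category column shows that the table cannot be read as a partition. The per-category counts total 140,655, well above the 108,504 distinct diff blocks, because one hunk is counted once for each category it contains. The `global::`/name category alone appears in about 90% of the distinct blocks.

```python
# Diff-block counts per category, copied from the table above.
category_blocks = {
    "global:: / name simplification": 98_190,
    "this. qualification": 16_302,
    "XML cref simplification": 10_959,
    "parentheses": 6_693,
    "qualified member/name": 5_868,
    "attribute suffix": 1_207,
    "generic method inference": 1_156,
    "predefined type keyword": 280,
}
DISTINCT_DIFF_BLOCKS = 108_504

category_total = sum(category_blocks.values())
print(category_total)                          # 140655
print(category_total > DISTINCT_DIFF_BLOCKS)   # True: categories overlap within hunks

global_share = category_blocks["global:: / name simplification"] / DISTINCT_DIFF_BLOCKS
print(f"{global_share:.1%}")                   # 90.5%
```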

Conclusion: Roslyn is mostly doing semantic name/global-alias reduction across nearly every Network file. This explains why span filtering only produced a small win, while disabling simplification drops the isolated generator to ~7m36s but changes output everywhere. The next promising experiments should replace lower-risk generated patterns first (`this.` removal, XML `cref`, redundant parentheses/generic inference) before attempting broad `global::` type-name emission.

CPU tracing points at Roslyn simplification/semantic work:

| Inclusive samples | Method/family |
| --- | --- |
| 30.38% | `ReduceAsync` task |
| 26.65% | `CSharpNameReducer.SimplifyName` |
| 18.61% | `NameSimplifier.TrySimplify` |
| 12.18% | `CSharpSemanticModel.GetSymbolInfo` |
| 10.84% | `CSharpNameReducer.Rewriter.VisitMemberAccessExpression` |

Exclusive samples are dominated by thread-pool wait/lock contention (LowLevelLifoSemaphore.WaitForSignal, Monitor.Enter_Slowpath), consistent with expensive concurrent Roslyn simplification and contention rather than management provider construction.
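Inclusive and exclusive views can point at different methods. The toy Python model of sampled call stacks below shows why. A method's inclusive share counts every sample where it appears anywhere on the stack. Its exclusive share counts only the samples where it is the leaf frame. The stacks and shortened method names are invented for illustration and are not taken from the actual trace.

```python
from collections import Counter

# Toy samples: each is a call stack listed root first, leaf last.
samples = [
    ["ThreadPool.Worker", "ReduceAsync", "SimplifyName", "GetSymbolInfo"],
    ["ThreadPool.Worker", "ReduceAsync", "SimplifyName"],
    ["ThreadPool.Worker", "WaitForSignal"],
    ["ThreadPool.Worker", "WaitForSignal"],
]


def inclusive_exclusive(stacks):
    """Return (inclusive, exclusive) sample shares per method."""
    inclusive = Counter()
    exclusive = Counter()
    for stack in stacks:
        for method in set(stack):   # count a method once per sample, even if recursive
            inclusive[method] += 1
        exclusive[stack[-1]] += 1   # only the leaf frame was executing
    n = len(stacks)
    return ({m: c / n for m, c in inclusive.items()},
            {m: c / n for m, c in exclusive.items()})


incl, excl = inclusive_exclusive(samples)
print(incl["ReduceAsync"])            # 0.5
print(excl["WaitForSignal"])          # 0.5
print(excl.get("ReduceAsync", 0.0))   # 0.0
```

In this toy model, `ReduceAsync` carries a large inclusive share but no exclusive share, while waiting frames dominate the exclusive view. This is the same shape as the trace summarized above.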

CodeWriter string/byte conversion check

I checked the shared MTG CodeWriter path after a teammate raised possible heavy string/byte conversion overhead. The generated C# source path is mostly char/string based:

- `CodeWriter` writes chars into `UnsafeBufferSequence : IBufferWriter<char>`.
- `TypeProviderWriter.Write()` materializes one final string per generated file via `CodeWriter.ToString()`.
- `GeneratedCodeWorkspace.AddGeneratedFile` adds that string to Roslyn.
- `GetGeneratedFilesAsync` later calls Roslyn `SourceText.ToString()`.
- `CSharpGen` writes final files with `File.WriteAllTextAsync`, which encodes once at disk write time.

There is a char-to-byte conversion path in UnsafeBufferSequence.Reader.CopyTo(Stream) / CopyToAsync(Stream) via Encoding.UTF8.GetBytes(...), but that is not the normal generated .cs source path used by Network file generation. It is used by stream/BinaryData helper paths.
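The rough Python analog below makes the "encode once" point concrete. Text accumulates as characters, becomes one string per file, and is converted to UTF-8 bytes only when written to disk. `CharBufferWriter`, `write_generated_file`, and `demo_roundtrip` are hypothetical stand-ins, not MTG types, and the Roslyn workspace step is omitted.

```python
import os
import tempfile
from typing import List


class CharBufferWriter:
    """Toy analog of a char-based buffer writer: stores text, never bytes."""

    def __init__(self) -> None:
        self._chunks: List[str] = []

    def write(self, text: str) -> None:
        self._chunks.append(text)

    def to_string(self) -> str:
        # One string materialization per generated file.
        return "".join(self._chunks)


def write_generated_file(path: str, source: str) -> None:
    # The only str -> bytes conversion: UTF-8 encoding at disk-write time.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(source)


def demo_roundtrip() -> bool:
    """Write a tiny generated file and check the bytes on disk."""
    writer = CharBufferWriter()
    writer.write("namespace Demo\n{\n")
    writer.write("    public partial class Widget { }\n")
    writer.write("}\n")
    source = writer.to_string()
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "Widget.cs")
        write_generated_file(path, source)
        with open(path, "rb") as handle:
            return handle.read() == source.encode("utf-8")
```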

The Network phase timings also bound the possible impact: in the rebuilt direct-generator run, provider construction + visitors + CodeWriter + adding generated files to the workspace reached 00:00:17.62; visitors alone completed by 00:00:11.45, so generated source writing/add-to-workspace was about 6.18s. Final file write after Roslyn was about 1.95s. Roslyn post-processing was 00:19:19.58 after that point (00:19:37.21 cumulative from 00:00:17.62).
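The per-step figures in this paragraph come from differencing cumulative phase markers. The Python sketch below performs that conversion. It uses the markers quoted here plus the `00:19:39.16` internal total reported for the rebuilt run in the follow-up experiments, assumed to be the same run. The markers are themselves rounded to 0.01s, so the differences can land 0.01s away from the values in the text: 6.17s vs. 6.18s here, and 1159.59s (~19m19.6s) for post-processing.

```python
def parse_hms(value: str) -> float:
    """Convert an 'HH:MM:SS.ff' marker into seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Cumulative markers (time since generator start) for the rebuilt run.
markers = [
    ("provider construction + visitors", "00:00:11.45"),
    ("CodeWriter + add to workspace", "00:00:17.62"),
    ("Roslyn post-processing", "00:19:37.21"),
    ("final file write", "00:19:39.16"),  # assumed: internal total of the same run
]


def step_durations(marks):
    """Yield (step, seconds) by differencing consecutive cumulative markers."""
    previous = 0.0
    for label, stamp in marks:
        current = parse_hms(stamp)
        yield label, round(current - previous, 2)
        previous = current


for label, seconds in step_durations(markers):
    print(f"{label}: {seconds:.2f}s")
```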

Conclusion: CodeWriter/string/byte conversion is worth keeping efficient, but it is not the current Network MTG hotspot. Even eliminating all pre/post Roslyn source materialization would save seconds, while the measured issue is roughly 19 minutes of Roslyn simplification work.

Follow-up easy category experiments

I also tried the lower-risk categories from the breakdown to see whether we could preserve exact generated output while reducing Roslyn work before tackling broad global:: name reduction.

| Experiment | Result | Decision |
| --- | --- | --- |
| Member-level `Simplifier.ReduceAsync(document, spans)` | Preserved generated test-project output and `npm run test:generator` passed. The rebuilt Network direct-generator run took `00:19:39.16` internal / `1182.394s` shell, with Roslyn post-processing ending at the `00:19:37.21` cumulative marker (about 19m19.6s of post-processing). | Not a meaningful win versus the broad/root baseline rerun (`00:19:42.39` internal / `1185.422s` shell). |
| Guarded `this.` emission from `Snippet.This` | Generated test-project output stayed unchanged after normal simplification, but many raw writer/unit expected baselines changed from `this.Create...` to `Create...`. | Not taking as-is; it would need an explicit generated-file/raw-render mode or broader baseline churn for a small, overlapping category. |
| Direct XML `cref` emission | A naive short-name form first broke generic cref output (`IList` vs `IList{T}`), and after that was fixed it still changed exact generated output (`SampleTypeSpec.Thing` -> `Thing`, `Models.Custom.Friend` -> `Friend`). | Not safe as-is because Roslyn's final cref form depends on namespace context/imports. |

One fast Network run (~7m22s) was discarded after validation. Its output still contained unsimplified tokens (93,679 `global::System` occurrences, 12,448 `this.` occurrences, and 9,774 `global::` crefs), which meant the profiling clone had used stale/no-simplifier assemblies. After the profiling generator was rebuilt, the output was properly simplified (2 `global::System`, 31 `this.`, 0 `global::` crefs) and timing returned to the ~20 minute baseline.
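The validation step above amounts to counting a few tell-tale tokens in the generated output. The Python sketch below does that. It assumes generated files use the `.cs` extension and that crefs are written as `cref="global::...`. The 1,000-hit threshold is a hypothetical cutoff chosen between the stale run (thousands to tens of thousands of hits per token) and the rebuilt run (a few dozen at most).

```python
import re
from pathlib import Path

# Tokens that should be rare after Roslyn simplification.
PATTERNS = {
    "global::System": re.compile(r"global::System"),
    "this.": re.compile(r"\bthis\."),
    "global:: cref": re.compile(r'cref="global::'),
}


def count_in_text(text: str) -> dict:
    """Count each tell-tale token in one source string."""
    return {name: len(p.findall(text)) for name, p in PATTERNS.items()}


def count_unsimplified_tokens(root: str) -> dict:
    """Aggregate token counts over every generated .cs file under `root`."""
    totals = {name: 0 for name in PATTERNS}
    for path in Path(root).rglob("*.cs"):
        for name, count in count_in_text(path.read_text(encoding="utf-8")).items():
            totals[name] += count
    return totals


def looks_unsimplified(counts: dict, threshold: int = 1_000) -> bool:
    # Hypothetical cutoff between the stale run and the rebuilt run.
    return any(value >= threshold for value in counts.values())
```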

Conclusion from these easy experiments: the overlapping low-risk categories do not materially reduce Network time on their own. The remaining meaningful target is still broad semantic name/global-alias simplification across nearly every file, or a more structural change that avoids asking Roslyn to rediscover reductions that the generator can emit safely.

Local MTG post-processing profile

I temporarily wired the management generator worktree to local TypeSpec/MTG source under /workspaces/typespec to instrument shared MTG post-processing directly. Those local project-reference and instrumentation edits were used only for profiling and are not intended as product changes in this PR.

Local MTG direct-generator run, shown as per-step elapsed cost rather than cumulative markers:

| Step | Elapsed |
| --- | --- |
| Build providers and write generated types into the in-memory workspace | ~16.4s |
| Delete old generated files | ~0.1s |
| Roslyn post-processing | ~11m51.4s |
| Write generated files to disk | ~1.6s |
| Remaining generator overhead | ~2.5s |
| Total direct-generator wall-clock | ~12m12s |

MTG post-processing instrumentation for 4,168 documents showed that the expensive portion was Simplifier.ReduceAsync; syntax-root/semantic-model lookup, member removal, library rewriters, and formatting were comparatively small.
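Accumulating the per-step values shows two things. They add up to the 12m12s total, and Roslyn post-processing in this run ends at about 12m07.9s of cumulative time. That matches the local-baseline figure used in the Roslyn 4.1.0 and narrow-simplifier tables. This suggests those figures are end-of-post-processing markers rather than post-processing durations, which is an inference from the arithmetic, not something the profiler output states.

```python
# Per-step elapsed values from the local MTG profile (seconds).
steps = [
    ("build providers + write to in-memory workspace", 16.4),
    ("delete old generated files", 0.1),
    ("Roslyn post-processing", 11 * 60 + 51.4),
    ("write generated files to disk", 1.6),
    ("remaining generator overhead", 2.5),
]


def fmt(seconds: float) -> str:
    """Format seconds as e.g. '12m07.9s'."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}m{rest:04.1f}s"


def cumulative_markers(per_step):
    """Turn per-step durations into 'ends at' cumulative markers."""
    running = 0.0
    markers = []
    for name, seconds in per_step:
        running += seconds
        markers.append((name, fmt(running)))
    return markers


for name, ends_at in cumulative_markers(steps):
    print(f"{name}: ends at {ends_at}")
```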

Optimization experiments

Roslyn 4.1.0 comparison

I temporarily changed local MTG from Microsoft.CodeAnalysis.CSharp.Workspaces 4.8.0 to 4.1.0 to match AutoRest.CSharp's Roslyn version and reran the direct Network generator from saved inputs.

| Experiment | Elapsed at end of Roslyn post-processing (cumulative) | Total direct-generator wall-clock |
| --- | --- | --- |
| MTG local baseline, Roslyn 4.8.0 | ~12m07.9s | ~12m12s |
| MTG local with Roslyn 4.1.0 | 13m22.5s | 13m27s |

The Roslyn downgrade was slower for this workload, so the Roslyn version difference does not look like the cause of AutoRest.CSharp being faster.

Narrow simplifier experiment

I tested a local MTG change that avoids blanket Simplifier.Annotation on every generated document root and skips Simplifier.ReduceAsync for documents that have no remaining Simplifier.Annotation after member removal/rewriters.
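The skip rule can be modeled in a few lines. This Python sketch uses a hypothetical `GeneratedDocument` holding a list of spans that still carry a simplifier annotation. The real change operates on Roslyn documents and annotations in MTG's post-processing, and its exact checks may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GeneratedDocument:
    """Hypothetical stand-in for a generated Roslyn document."""
    name: str
    # (start, end) spans still carrying a simplifier annotation after
    # member removal and library rewriters have run.
    annotated_spans: List[Tuple[int, int]] = field(default_factory=list)


def post_process(documents: List[GeneratedDocument],
                 reduce: Callable[[GeneratedDocument], None]) -> int:
    """Call the expensive `reduce` step only where annotations remain.

    Returns the number of documents that skipped simplification.
    """
    skipped = 0
    for doc in documents:
        if not doc.annotated_spans:
            # Nothing is marked for simplification, so skip the call entirely.
            skipped += 1
            continue
        reduce(doc)
    return skipped
```

In the actual Network experiment, the equivalent check let 3,273 of 4,171 documents skip simplification.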

| Experiment | Documents | Skipped simplifier | Elapsed at end of Roslyn post-processing (cumulative) | Total direct-generator wall-clock |
| --- | --- | --- | --- | --- |
| MTG local baseline | 4,168 | 0 | ~12m07.9s | ~12m12s |
| Narrow simplifier | 4,171 | 3,273 | 7m59.5s | 8m03s |

This cut direct-generator wall-clock by about 4m09s versus the local baseline.

A focused MTG draft PR has been opened for this improvement: microsoft/typespec#10846.

AutoRest.CSharp comparison

In the latest manual Network regen comparison, TypeSpec/MTG on the Network migration branch took `00:11:20.27` and AutoRest.CSharp took `00:08:19.39`. AutoRest.CSharp is therefore still faster by about 3m00.88s for the manual Network regen path.

AutoRest.CSharp uses the same broad post-processing pattern at a high level: generated documents are added to a Roslyn workspace, generated document roots are annotated for simplification, then Simplifier.ReduceAsync runs during post-processing. The Network performance difference does not appear to be because AutoRest.CSharp avoids Roslyn simplification entirely.

The workloads are different:

| Comparison point | AutoRest.CSharp Network output | MTG Network migration output |
| --- | --- | --- |
| Generated source files under `src/Generated` | 2,538 | 3,518 |
| Generated source size under `src/Generated` | ~31.1 MiB | ~30.3 MiB |
| Generated source lines under `src/Generated` | ~597k | ~586k |
| Largest generated source hotspot | `ArmNetworkModelFactory.cs` ~0.9 MiB | `ArmNetworkModelFactory.cs` ~1.7 MiB |
| Profiling workspace generated documents | n/a | 4,171 |
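One way to read the table: total output volume is similar, but MTG spreads it across about 39% more files, so the average MTG file is smaller. Here is a quick Python calculation from the table values; line counts use the rounded ~597k/~586k figures.

```python
# Values copied from the workload comparison table; line counts are rounded.
outputs = {
    "AutoRest.CSharp": {"files": 2_538, "mib": 31.1, "lines": 597_000},
    "MTG": {"files": 3_518, "mib": 30.3, "lines": 586_000},
}


def per_file_averages(stats):
    """Return (average KiB per file, average lines per file)."""
    return stats["mib"] * 1024 / stats["files"], stats["lines"] / stats["files"]


for name, stats in outputs.items():
    kib, lines = per_file_averages(stats)
    print(f"{name}: ~{kib:.1f} KiB and ~{lines:.0f} lines per file")
# AutoRest.CSharp: ~12.5 KiB and ~235 lines per file
# MTG: ~8.8 KiB and ~167 lines per file

file_ratio = outputs["MTG"]["files"] / outputs["AutoRest.CSharp"]["files"]
print(f"MTG emits {file_ratio - 1:.0%} more files")  # MTG emits 39% more files
```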

The broad global:: fallback experiment also showed why simply matching AutoRest-style all-file simplification is not a good MTG fix: global:: appeared in 3,681 / 4,171 generated documents, so that fallback re-enabled simplification for almost every file and erased most of the narrow-simplifier win.
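Document counts make the trade-off concrete. In the narrow-simplifier run, 3,273 of 4,171 documents skipped `Simplifier.ReduceAsync`, leaving 898 to simplify. A `global::` fallback would send at least 3,681 documents through it. Document count is only a rough proxy, since per-document simplification cost varies.

```python
TOTAL_DOCUMENTS = 4_171

# Documents that would go through Simplifier.ReduceAsync under each policy.
simplified = {
    "narrow simplifier": TOTAL_DOCUMENTS - 3_273,  # 3,273 documents skipped
    "global:: fallback": 3_681,                    # at least: documents containing global::
}

for policy, count in simplified.items():
    print(f"{policy}: {count} / {TOTAL_DOCUMENTS} ({count / TOTAL_DOCUMENTS:.1%})")
# narrow simplifier: 898 / 4171 (21.5%)
# global:: fallback: 3681 / 4171 (88.3%)

ratio = simplified["global:: fallback"] / simplified["narrow simplifier"]
print(f"fallback simplifies {ratio:.1f}x as many documents")  # 4.1x
```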

Conclusion: AutoRest.CSharp is still faster in the latest manual regen comparison. The current evidence still points to generated-code shape and Roslyn post-processing workload as the main difference, not to AutoRest.CSharp avoiding Roslyn simplification altogether. The better MTG direction remains targeted simplification plus direct valid code emission where possible, while continuing to validate improvements against full manual Network regen.

Next focus

Continue improving shared Microsoft.TypeSpec.Generator Roslyn post-processing, especially GeneratedCodeWorkspace.ProcessDocument. The current best target is to avoid full-document Simplifier.ReduceAsync unless a document has meaningful simplifier annotations, rather than forcing simplification across every generated file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

live1206 · 2026-06-03T07:29:06Z

Converted this profiling tracker to an issue: #59625

live1206 · 2026-06-03T07:29:08Z

Closing in favor of the tracking issue: #59625

live1206 and others added 2 commits May 29, 2026 08:57

Add management generator profiling instrumentation

a1f9079

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Handle non-model LRO final results in management generator

b1621e6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions bot added the CodeGen and Mgmt labels on May 29, 2026

live1206 mentioned this pull request on Jun 3, 2026 in [Mgmt] Profile Network generator performance #59625 (open)

live1206 closed this Jun 3, 2026

live1206 wants to merge 2 commits into Azure:main from live1206:network-mpg-generator-profiling
