[Mgmt] Profile Network generator performance #59541
Closed
live1206 wants to merge 2 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Converted this profiling tracker to an issue: #59625
Closing in favor of the tracking issue: #59625
Summary
Draft PR for Network MPG generator profiling. This PR adds env-gated management generator timing instrumentation and a small LRO operation-source fix needed for the Network code model to complete direct generation when an LRO final result is a primitive/framework type.
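As a rough illustration of what "env-gated" timing means here, the sketch below times a block only when the gate variable is set. It is written in Python purely for illustration; the real instrumentation lives in the .NET generator and is not shown in this PR text. The variable name and the `[mgmt-generator-profile]` prefix come from the description, while `timed_phase`, `profiling_enabled`, and the line layout after the prefix are assumptions.

```python
import os
import sys
import time
from contextlib import contextmanager

PROFILE_ENV = "AZSDK_MGMT_GENERATOR_PROFILE"  # gate variable named in the PR text
PREFIX = "[mgmt-generator-profile]"            # log prefix named in the PR text


def profiling_enabled(env=None):
    """Return True only when the gate variable is exactly "1"."""
    env = os.environ if env is None else env
    return env.get(PROFILE_ENV) == "1"


def _default_emit(line):
    # Write to stderr so timing lines do not mix with normal generator output.
    print(line, file=sys.stderr)


@contextmanager
def timed_phase(name, env=None, emit=_default_emit):
    """Time the wrapped block; emit one line only when the gate is on."""
    if not profiling_enabled(env):
        yield  # gate off: run the block with no timing overhead or output
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # The layout after the prefix is an assumption for illustration.
        emit(f"{PREFIX} {name}: {elapsed:.3f}s")
```

With the variable unset the wrapper is a no-op, so the instrumentation can stay in place without changing normal regeneration output.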
Set AZSDK_MGMT_GENERATOR_PROFILE=1 to emit [mgmt-generator-profile] timings.
Current conclusion
The latest profiling shows tspCodeModel.json generation is not the 10+ minute hotspot for Network. The TypeSpec + emitter/code-model path completes in about a minute end-to-end, and the code-model creation/serialization/write portion takes only seconds.
The dominant measured cost is in the .NET generator Roslyn post-processing path, specifically shared MTG simplification/name-reduction work after generated types have already been written into the in-memory Roslyn workspace.
High-level timing
Current manual Network regen comparison
The latest manual end-to-end Network regeneration timings are the cleanest current comparison between the migration branch and AutoRest.CSharp:
- TypeSpec/MTG: 00:11:20.27
- AutoRest.CSharp: 00:08:19.39
- Difference: ~3m00.88
This means the current TypeSpec/MTG Network regen is still about 3 minutes slower than AutoRest.CSharp for the manual regen path, even after the narrow simplifier optimization. MTG is about 36% slower than the AutoRest.CSharp run (680.27s / 499.39s).
Profiling phase isolation
The profiling data below comes from the latest-main/temp profiling setup. It should be read as phase isolation for the current generator hotspot, not as the normal end-to-end Network RegenSdkLocal time. In this setup, the temporary spec/project inputs had already been prepared on disk before the measured generator-focused runs.
(Timing table; values not preserved. Surviving row notes:)
- [...] Simplifier.ReduceAsync skip experiment. This is not the normal full Network RegenSdkLocal workflow timing.
- [...] tspCodeModel.json / Configuration.json, baseline/no narrow-simplifier tweak
The main conclusion is: the measured multi-minute hotspot is the .NET generator post-processing phase, not TypeSpec/emitter code-model creation.
The narrow simplifier experiment reduced direct-generator wall-clock from about 12m12s in the baseline profiling setup to ~8m13s in the fresh current-branch rerun. The current manual full-regeneration comparison still shows a remaining gap versus AutoRest.CSharp, so further work should focus on reducing the direct-generator post-processing cost and validating the win in full manual regen.
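The gap and the "about 36% slower" figure for the manual regen comparison can be re-derived from the two wall-clock values quoted above. The short Python sketch below does that arithmetic; `to_seconds` is a helper introduced here for illustration, not something from the PR.

```python
def to_seconds(stamp):
    """Convert an "hh:mm:ss.ff" elapsed-time string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Manual end-to-end Network regen timings quoted in this description.
mtg = to_seconds("00:11:20.27")
autorest = to_seconds("00:08:19.39")

gap = mtg - autorest             # absolute difference in seconds
slowdown = mtg / autorest - 1.0  # relative slowdown of MTG vs AutoRest.CSharp

print(f"gap: {gap:.2f}s, slowdown: {slowdown:.1%}")
```

The result is a gap of about 180.88 s (3m00.88s) and a slowdown of roughly 36%, matching the figures stated in the comparison.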
Emitter/code-model timing
I isolated the TypeSpec + emitter path with temporary local emitter instrumentation and AZSDK_CSHARP_EMITTER_SKIP_GENERATOR=1, so the run writes tspCodeModel.json / Configuration.json but does not invoke the .NET generator.
(Timing table; values not preserved. Measured rows:)
- RegenSdkLocal emitter-only worker wall-clock
- createSdkContext
- createModel
- updateCodeModel callback
- serializeCodeModel
- tspCodeModel.json + Configuration.json
- emitAzureCodeModel
The generated tspCodeModel.json was 39.9 MiB and contained 1 client, 1,065 models, and 284 enums. This confirms that code-model creation and writing take seconds, not minutes.
A full non-skip emitter run currently hits the known Network customization validation issue before reaching Roslyn post-processing (CodeGenSuppress on VirtualMachineScaleSetVmNetworkResource), so it is not useful as a full-generation timing. Direct-generator profiling from saved inputs is the useful measurement for the .NET generator phase.
Direct .NET generator timing
I reran the Network direct generator profile from fresh saved inputs with the current local MTG branch on 2026-06-02. This run used the narrow simplifier behavior, so it validates the current optimized shape rather than the older baseline.
Direct generator timing, shown as per-step elapsed cost:
The direct generator wrote 3,542 generated files, including 1,192 .Serialization.cs files. This fresh run confirms that, in the isolated .NET generator phase, Roslyn post-processing still accounts for almost all elapsed time even though Network has many providers, visitors, models, and writers before post-processing.
Roslyn simplifier category breakdown
On 2026-06-02, I reran the isolated Network direct-generator baseline to check timing drift and compared broad/root simplifier output against a temporary no-simplifier run. The ~20 minute broad/root simplifier result is reproducible.
(Timing table; row labels not preserved. Surviving values, one row per line:)
- 00:07:36.29 / 458.751s
- 00:19:42.39 / 1185.422s
- 00:19:57 / 20m00s
The no-simplifier output differed from the baseline in 3,534 / 3,542 generated .cs files across 108,504 diff blocks. Categories overlap because one hunk can include several simplifications.
(Category table; counts not preserved. Surviving categories:)
- global:: / name simplification
- this. qualification
- cref simplification
Conclusion: Roslyn is mostly doing semantic name/global-alias reduction across nearly every Network file. This explains why span filtering only produced a small win, while disabling simplification drops the isolated generator to ~7m36s but changes output everywhere. The next promising experiments should replace lower-risk generated patterns first (this. removal, XML cref, redundant parentheses/generic inference) before attempting broad global:: type-name emission.
CPU tracing points at Roslyn simplification/semantic work:
- ReduceAsync task
- CSharpNameReducer.SimplifyName
- NameSimplifier.TrySimplify
- CSharpSemanticModel.GetSymbolInfo
- CSharpNameReducer.Rewriter.VisitMemberAccessExpression
Exclusive samples are dominated by thread-pool wait/lock contention (LowLevelLifoSemaphore.WaitForSignal, Monitor.Enter_Slowpath), consistent with expensive concurrent Roslyn simplification and contention rather than management provider construction.
CodeWriter string/byte conversion check
I checked the shared MTG CodeWriter path after a teammate raised possible heavy string/byte conversion overhead. The generated C# source path is mostly char/string based:
- CodeWriter writes chars into UnsafeBufferSequence : IBufferWriter<char>.
- TypeProviderWriter.Write() materializes one final string per generated file via CodeWriter.ToString().
- GeneratedCodeWorkspace.AddGeneratedFile adds that string to Roslyn.
- GetGeneratedFilesAsync later calls Roslyn SourceText.ToString().
- CSharpGen writes final files with File.WriteAllTextAsync, which encodes once at disk write time.
There is a char-to-byte conversion path in UnsafeBufferSequence.Reader.CopyTo(Stream) / CopyToAsync(Stream) via Encoding.UTF8.GetBytes(...), but that is not the normal generated .cs source path used by Network file generation. It is used by stream/BinaryData helper paths.
The Network phase timings also bound the possible impact: in the rebuilt direct-generator run, provider construction + visitors + CodeWriter + adding generated files to the workspace reached 00:00:17.62; visitors alone completed by 00:00:11.45, so generated source writing/add-to-workspace was about 6.18s. Final file write after Roslyn was about 1.95s. Roslyn post-processing was 00:19:19.58 after that point (00:19:37.21 cumulative from 00:00:17.62).
Conclusion: CodeWriter/string/byte conversion is worth keeping efficient, but it is not the current Network MTG hotspot. Even eliminating all pre/post Roslyn source materialization would save seconds, while the measured issue is roughly 19 minutes of Roslyn simplification work.
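The bounding argument above can be checked by treating everything outside Roslyn post-processing as an upper limit on what CodeWriter-side changes could save. The Python sketch below uses only the cumulative markers quoted in this section; the variable names are illustrative, and small differences from the stated 6.18s and 00:19:19.58 come from rounding in the reported markers.

```python
# Cumulative markers quoted in this section, in seconds.
pre_roslyn = 17.62                 # providers + visitors + CodeWriter + add-to-workspace
visitors_done = 11.45              # visitors completed by this point
after_roslyn = 19 * 60 + 37.21     # cumulative marker once post-processing finished
final_write = 1.95                 # writing final files to disk after Roslyn

source_write = pre_roslyn - visitors_done  # source writing + add-to-workspace
roslyn = after_roslyn - pre_roslyn         # Roslyn post-processing alone
non_roslyn = pre_roslyn + final_write      # upper bound on non-Roslyn savings
total = roslyn + non_roslyn

print(f"source write/add: {source_write:.2f}s")
print(f"Roslyn post-processing: {roslyn:.2f}s ({roslyn / total:.1%} of the run)")
print(f"everything else: {non_roslyn:.2f}s")
```

Even if all non-Roslyn work dropped to zero, the run would shorten by under 20 seconds, which is why the CodeWriter path is not treated as the hotspot.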
Follow-up easy category experiments
I also tried the lower-risk categories from the breakdown to see whether we could preserve exact generated output while reducing Roslyn work before tackling broad global:: name reduction.
(Experiment table; some cell text not preserved. Surviving fragments, by experiment:)
- Simplifier.ReduceAsync(document, spans): npm run test:generator passed. Rebuilt Network direct-generator run was 00:19:39.16 internal / 1182.394s shell, with Roslyn post-processing 00:19:37.21. [...] 00:19:42.39 internal / 1185.422s shell).
- this. emission from Snippet.This: [...] this.Create... to Create....
- cref emission: [...] IList vs IList{T}), and after fixing that still changed exact generated output (SampleTypeSpec.Thing -> Thing, Models.Custom.Friend -> Friend).
One fast Network run (~7m22) was discarded after validation because the output still contained unsimplified tokens (93,679 global::System occurrences, 12,448 this. occurrences, and 9,774 global:: crefs), meaning the profiling clone had used stale/no-simplifier assemblies. After rebuilding the profiling generator, the output was properly simplified (2 global::System, 31 this., 0 global:: crefs) and timing returned to the ~20 minute baseline.
Conclusion from these easy experiments: the overlapping low-risk categories do not materially reduce Network time on their own. The remaining meaningful target is still broad semantic name/global-alias simplification across nearly every file, or a more structural change that avoids asking Roslyn to rediscover reductions that the generator can emit safely.
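The stale-assembly incident suggests validating generated output before trusting a fast timing. A minimal Python sketch of such a check follows; it counts the three token kinds mentioned above in generated source text. How the counts in this description were actually produced is not stated, so the regular expressions, the function names, and the limits in the usage example are all assumptions.

```python
import re

# Token kinds named in the validation above; the patterns are illustrative guesses.
PATTERNS = {
    "global::System": re.compile(r"global::System\b"),
    "this.": re.compile(r"\bthis\."),
    "global:: cref": re.compile(r'cref="global::'),
}


def count_residue(sources):
    """Count unsimplified tokens across an iterable of source strings."""
    counts = {name: 0 for name in PATTERNS}
    for text in sources:
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts


def looks_unsimplified(counts, limits):
    """Return True when any count exceeds its limit (missing limits mean 0)."""
    return any(counts[name] > limits.get(name, 0) for name in counts)


# Usage with two tiny in-memory "files"; the limits are hypothetical.
raw = ['/// <see cref="global::Sample.Thing"/>\n'
       'return this.Create(global::System.String.Empty);']
clean = ['/// <see cref="Thing"/>\nreturn Create(string.Empty);']
limits = {"global::System": 100, "this.": 100, "global:: cref": 0}

print(looks_unsimplified(count_residue(raw), limits))
print(looks_unsimplified(count_residue(clean), limits))
```

Running a check like this right after a profiling run would flag an unsimplified output before its timing is compared against a baseline.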
Local MTG post-processing profile
I temporarily wired the management generator worktree to local TypeSpec/MTG source under /workspaces/typespec to instrument shared MTG post-processing directly. Those local project-reference and instrumentation edits were used only for profiling and are not intended as product changes in this PR.
Local MTG direct-generator run, shown as per-step elapsed cost rather than cumulative markers:
MTG post-processing instrumentation for 4,168 documents showed that the expensive portion was Simplifier.ReduceAsync; syntax-root/semantic-model lookup, member removal, library rewriters, and formatting were comparatively small.
Optimization experiments
Roslyn 4.1.0 comparison
I temporarily changed local MTG from Microsoft.CodeAnalysis.CSharp.Workspaces 4.8.0 to 4.1.0 to match AutoRest.CSharp's Roslyn version and reran the direct Network generator from saved inputs.
The Roslyn downgrade was slower for this workload, so the Roslyn version difference does not look like the cause of AutoRest.CSharp being faster.
Narrow simplifier experiment
I tested a local MTG change that avoids blanket Simplifier.Annotation on every generated document root and skips Simplifier.ReduceAsync for documents that have no remaining Simplifier.Annotation after member removal/rewriters.
This cut direct-generator wall-clock by about 4m09s versus the local baseline.
A focused MTG draft PR has been opened for this improvement: microsoft/typespec#10846.
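The control flow of the narrow simplifier experiment can be modeled in a few lines. The sketch below is a language-neutral model in Python, not the MTG change itself: the real code is C# and works with Roslyn's Simplifier.Annotation and Simplifier.ReduceAsync, neither of which is called here. `Document`, `annotated_spans`, and `post_process` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a generated document in the workspace."""
    name: str
    # Stand-in for the simplifier annotations left after member removal/rewriters.
    annotated_spans: list = field(default_factory=list)


def post_process(documents, reduce):
    """Run the expensive reduce step only for documents that still carry annotations."""
    reduced, skipped = [], []
    for doc in documents:
        if doc.annotated_spans:
            reduce(doc)  # the costly pass, paid only where it can change output
            reduced.append(doc.name)
        else:
            skipped.append(doc.name)  # nothing annotated, so nothing to simplify
    return reduced, skipped


# Usage: one annotated document and one without annotations.
docs = [Document("A.cs", [(0, 10)]), Document("B.cs")]
calls = []
reduced, skipped = post_process(docs, lambda d: calls.append(d.name))
print(reduced, skipped)
```

The saving depends entirely on how many documents end up with no annotations, which is why re-annotating nearly every file removes most of the benefit.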
AutoRest.CSharp comparison
The latest manual Network regen comparison is:
- TypeSpec/MTG: 00:11:20.27
- AutoRest.CSharp: 00:08:19.39
So AutoRest.CSharp is still faster by about 3m00.88s for the manual Network regen path.
AutoRest.CSharp uses the same broad post-processing pattern at a high level: generated documents are added to a Roslyn workspace, generated document roots are annotated for simplification, then Simplifier.ReduceAsync runs during post-processing. The Network performance difference does not appear to be because AutoRest.CSharp avoids Roslyn simplification entirely.
The workloads are different:
(Workload table; column labels and most values not preserved. Surviving cells:)
- src/Generated
- src/Generated
- src/Generated
- ArmNetworkModelFactory.cs ~0.9 MiB
- ArmNetworkModelFactory.cs ~1.7 MiB
The broad global:: fallback experiment also showed why simply matching AutoRest-style all-file simplification is not a good MTG fix: global:: appeared in 3,681 / 4,171 generated documents, so that fallback re-enabled simplification for almost every file and erased most of the narrow-simplifier win.
Conclusion: AutoRest.CSharp is still faster in the latest manual regen comparison. The current evidence still points to generated-code shape and Roslyn post-processing workload as the main difference, not to AutoRest.CSharp avoiding Roslyn simplification altogether. The better MTG direction remains targeted simplification plus direct valid code emission where possible, while continuing to validate improvements against full manual Network regen.
Next focus
Continue improving shared Microsoft.TypeSpec.Generator Roslyn post-processing, especially GeneratedCodeWorkspace.ProcessDocument. The current best target is to avoid full-document Simplifier.ReduceAsync unless a document has meaningful simplifier annotations, rather than forcing simplification across every generated file.