
Commit `52cfa4e`

Authored by: ChenhanYu, kevalmorabia97, coderabbitai[bot], AAnoosheh
fix: #981 (#983)
### What does this PR do?

**Type of change:** Bug fix

An issue was reported in #981 where `str(v)` on some `TransformerConfig` fields raises `TypeError`. We remove the YAML saving logic entirely, as it is unused and could still cause errors in the future.

### Before your PR is "*Ready for review*"

Make sure you read and follow the [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`). Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, using `torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).

- Is this change backward compatible?: ✅ / ❌ / N/A
- If you copied code from any other source, did you follow the IP policy in [CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?: ✅ / ❌ / N/A
- Did you write any new necessary tests?: ✅ / ❌ / N/A
- Did you update the [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅ / ❌ / N/A

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved checkpoint loading stability by handling unusual configuration values more gracefully; such values no longer cause failures and are skipped with a warning instead.
  * Reduced risk of crashes during configuration processing when encountering non-standard or unsupported objects.
* **Chores**
  * Checkpoints no longer include saved run configuration or tool-version metadata, yielding smaller, simpler checkpoint files.

---

Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
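For illustration, here is a minimal, hypothetical sketch of the failure mode (none of these names are from the repository; they only mimic the shape of the removed helper): the helper stringified every non-primitive config value, so a single field whose `__str__` raises takes down the entire checkpoint save.

```python
# Hypothetical stand-ins for illustration only; neither name exists in
# Model-Optimizer or Megatron.

class BrokenField:
    """Mimics a TransformerConfig field whose __str__ raises (see #981)."""

    def __str__(self):
        raise TypeError("unstringifiable config field")


def parse_config(config: dict) -> dict:
    """Simplified sketch of the removed _parse_transformer_config helper."""
    out = {}
    for k, v in config.items():
        if isinstance(v, (bool, int, str)):
            out[k] = v
        else:
            out[k] = str(v)  # raises TypeError for values like BrokenField
    return out


try:
    parse_config({"num_layers": 24, "bad_field": BrokenField()})
except TypeError as e:
    print(f"save would fail with: {e}")
```

One such field is enough: because the exception propagates out of the config dump, the whole `save_sharded_modelopt_state` call fails even though the YAML record was only a best-effort convenience.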
1 parent `7e2e85a` · commit `52cfa4e`

1 file changed: `modelopt/torch/opt/plugins/mcore_dist_checkpointing.py`

Lines changed: 0 additions & 39 deletions
```diff
@@ -22,7 +22,6 @@
 from typing import Any
 
 import torch
-import yaml
 from megatron.core import dist_checkpointing, mpu
 from megatron.core.dist_checkpointing.serialization import get_default_load_sharded_strategy
 from megatron.core.dist_checkpointing.strategies.common import COMMON_STATE_FNAME
@@ -36,21 +35,6 @@
 
 SUPPORTED_WRAPPERS[Float16Module] = "module"
 
-DROP_SUBSTRINGS = [
-    "fp4",
-    "fp8",
-    "tp_",
-    "parallel",
-    "cuda_graph",
-    "init_",
-    "cpu",
-    "recompute",
-    "inference",
-    "pipeline",
-    "comm",
-    "batch",
-]
-
 
 def remove_per_module_state(
     modelopt_state: dict[str, Any],
@@ -138,29 +122,6 @@ def save_sharded_modelopt_state(
         sharded_strategy: configures sharded tensors saving behavior and backend
         prefix: the prefix to add to the modelopt_state keys ("model." for NeMo)
     """
-
-    def _parse_transformer_config(transformer_config: dict) -> dict:
-        config = {}
-
-        for k, v in transformer_config.items():
-            if any(substring in k for substring in DROP_SUBSTRINGS):
-                continue
-            if isinstance(v, (bool, int, str)):
-                config[k] = v
-            else:
-                config[k] = str(v)
-
-        return config
-
-    # Save own version of run config, if not already saved by the framework.
-    if dist.is_master() and not os.path.exists(f"{checkpoint_name}/run_config.yaml"):
-        run_config_name = f"{checkpoint_name}/modelopt_run_config.yaml"
-        # We avoid deepcopy since some attributes in Megatron-Bridge config cannot be deepcopied.
-        config_dict = _parse_transformer_config(model[0].config.__dict__)
-        config_dict["nvidia_modelopt_version"] = modelopt.__version__
-        with open(run_config_name, "w") as f:
-            yaml.dump(config_dict, f, default_flow_style=False)
-
     if not mto.ModeloptStateManager.is_converted(model[0]):
         return
     if len(model) > 1:
```
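The merged fix deletes the YAML dump outright. Had the dump been kept, a tolerant variant in the spirit of the release-notes wording ("skipped with a warning") could look like the following sketch; this is a hypothetical alternative, not code from the PR, and `parse_config_safe` is an invented name.

```python
import warnings


def parse_config_safe(config: dict) -> dict:
    """Hypothetical tolerant variant of the removed _parse_transformer_config."""
    out = {}
    for k, v in config.items():
        if isinstance(v, (bool, int, str)):
            out[k] = v
            continue
        try:
            out[k] = str(v)
        except Exception as e:  # e.g. a __str__ that raises TypeError
            warnings.warn(f"Skipping config field {k!r}: {e}")
    return out
```

Skipped fields would simply be absent from the dumped YAML, which is acceptable for a best-effort run-config record; removing the unused dump entirely, as this PR does, avoids the maintenance burden altogether.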
