# [5750013][5591945][5360813]: AutoCast standalone implementation for type inference (#719)
## What does this PR do?
**Type of change:** New feature
**Overview:**
AutoCast runs full type inference to determine the new tensor types after inserting casts. ONNX does not expose a separate function for type inference; it is performed as part of shape inference. Shape inference is a much more complex task than type inference, especially when dynamic shapes are involved, and we are seeing shape-inference-related bugs in AutoCast. These can typically be worked around (WAR), but doing so is cumbersome. A standalone type inference implementation gives users a way to work around shape-inference-related issues directly. The feature is opt-in and marked as experimental.
## Usage
```bash
python -m modelopt.onnx.autocast --onnx_path /path/to/input.onnx [options] --use_standalone_type_inference
```
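For reference, the same option can be enabled through the Python API. A minimal sketch, assuming the `convert_to_mixed_precision` entry point and the parameter names shown in the documented API snippet in the docs diff further below:

```python
import onnx

# Sketch only: entry point and parameter names follow the documented API
# snippet in docs/source/guides/8_autocast.rst (see the diff below).
from modelopt.onnx.autocast import convert_to_mixed_precision

converted_model = convert_to_mixed_precision(
    onnx_path="/path/to/input.onnx",
    use_standalone_type_inference=True,  # experimental, defaults to False
)

# Save the converted model (assumed to return an onnx.ModelProto, per the docs snippet).
onnx.save(converted_model, "/path/to/output.onnx")
```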
## Testing
Added `use_standalone_type_inference=True` to all existing `PrecisionConverter` tests.
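As an illustration only, here is a hedged sketch of exercising both inference paths end-to-end via the CLI; the `tiny_onnx_model_path` fixture and the test name are hypothetical, not part of the actual test suite:

```python
import subprocess
import sys

import pytest


# Hypothetical smoke test: run the AutoCast CLI on a small model with and
# without the new flag. The tiny_onnx_model_path fixture is assumed to
# produce a valid FP32 ONNX file.
@pytest.mark.parametrize("extra_args", [[], ["--use_standalone_type_inference"]])
def test_autocast_both_inference_paths(tiny_onnx_model_path, extra_args):
    cmd = [
        sys.executable,
        "-m",
        "modelopt.onnx.autocast",
        "--onnx_path",
        str(tiny_onnx_model_path),
        *extra_args,
    ]
    subprocess.run(cmd, check=True)
```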
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes
## Additional Information
A more permanent fix would be to decouple type and shape inference in ONNX; we should invest in that when we have the resources (see onnx/onnx#7100). This PR is a quick fix, which is also why it is opt-in and not the default mode.
## Summary by CodeRabbit
* **New Features**
  * Added `--use_standalone_type_inference` flag to ONNX AutoCast, enabling type-only inference as an alternative to standard shape inference. Useful as a workaround when shape inference fails or to reduce computational overhead.
* **Documentation**
  * Added "Type Inference Control" section with usage examples and caveats for the new standalone type inference option.
* **Tests**
  * Extended test coverage to validate both standard and standalone type inference paths.
---------
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
**CHANGELOG.rst** (8 additions, 0 deletions):

```diff
@@ -1,6 +1,14 @@
 NVIDIA Model Optimizer Changelog (Linux)
 ========================================
 
+0.42 (TBD)
+^^^^^^^^^^^^^^^^^
+
+**Bug Fixes**
+
+**New Features**
+
+- Add standalone type inference option (``--use_standalone_type_inference``) in ONNX AutoCast as an alternative to ONNX's ``infer_shapes``. This experimental feature performs type-only inference without shape inference, useful as a workaround when shape inference fails or to avoid unnecessary shape inference overhead.
```
**docs/source/guides/8_autocast.rst** (18 additions, 0 deletions):

```diff
@@ -42,6 +42,7 @@ AutoCast can also be used programmatically through its Python API:
         trt_plugins=[], # list of TensorRT plugin library paths in .so format
         max_depth_of_reduction=None, # maximum depth of reduction allowed in low precision
         opset=None, # optional target ONNX opset version (default: 13 for fp16, 22 for bf16)
+        use_standalone_type_inference=False, # use standalone type inference instead of ONNX's infer_shapes (WAR)
     )
 
     # Save the converted model
@@ -82,6 +83,9 @@ AutoCast follows these steps to convert a model:
    - Converts eligible nodes to lower precision
    - Automatically inserts necessary cast operations
    - Automatically replaces initializers with lower precision values
+   - Performs type inference to propagate types through the graph
+     - By default, uses ONNX's ``infer_shapes``, which performs both shape and type inference.
+     - Use ``use_standalone_type_inference=True`` to use a standalone type-only inference implementation (experimental).
 
 #. **Validation and Export**:
@@ -145,6 +149,14 @@ Best Practices
    - A warning will be issued if you specify an opset lower than the original model's opset, as downgrading opset versions may cause compatibility issues.
    - The opset may be automatically increased beyond your specified value if certain operations require it (e.g., quantization nodes require opset >= 19).
 
+#. **Type Inference Control**
+
+   - By default, AutoCast uses ONNX's ``infer_shapes``, which performs both shape and type inference.
+   - Use ``--use_standalone_type_inference`` to enable a standalone type-only inference implementation.
+   - This is a workaround for cases where shape inference fails for any reason; it bypasses the dependency on ONNX's shape inference logic.
+   - The standalone implementation uses graphsurgeon for topological sorting and handles special operators such as Cast, QuantizeLinear, DequantizeLinear, Constant, and ConstantOfShape.
+   - Note: standalone type inference may be less robust than ONNX's implementation for edge cases, but it avoids unnecessary shape inference overhead and possible failures.
+
 Limitations and Restrictions
 ----------------------------
 - AutoCast does not yet support quantized models.
@@ -198,3 +210,9 @@ Convert to BF16 with a specific opset:
```
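To make the docs bullet concrete, here is a hedged sketch of what a standalone, type-only propagation pass can look like. This is not the PR's actual code: the function name, the default rule, and the per-operator handling are illustrative only, and the sketch assumes the node list is already topologically sorted (per the docs, the real implementation uses graphsurgeon for that step).

```python
# Illustrative sketch of standalone type-only propagation -- not the PR's code.
import onnx
from onnx import TensorProto


def propagate_types(model: onnx.ModelProto) -> dict[str, int]:
    """Map each tensor name to a TensorProto elem type, without shape inference."""
    dtype: dict[str, int] = {}

    # Seed known types from graph inputs, outputs, value_info, and initializers.
    for vi in list(model.graph.input) + list(model.graph.output) + list(model.graph.value_info):
        if vi.type.HasField("tensor_type"):
            dtype[vi.name] = vi.type.tensor_type.elem_type
    for init in model.graph.initializer:
        dtype[init.name] = init.data_type

    for node in model.graph.node:  # assumed to be topologically sorted
        if node.op_type == "Cast":
            # Output type is dictated by the 'to' attribute, not the input type.
            out_type = next(a.i for a in node.attribute if a.name == "to")
        elif node.op_type == "QuantizeLinear":
            # Output follows the optional zero-point input; uint8 by default.
            has_zp = len(node.input) > 2 and node.input[2]
            out_type = dtype.get(node.input[2], TensorProto.UINT8) if has_zp else TensorProto.UINT8
        elif node.op_type == "DequantizeLinear":
            # Output is a float type matching the scale input.
            out_type = dtype.get(node.input[1], TensorProto.FLOAT)
        elif node.op_type in ("Constant", "ConstantOfShape"):
            # Read the dtype from the 'value' tensor attribute when present.
            value = next((a.t for a in node.attribute if a.name == "value"), None)
            out_type = value.data_type if value is not None else TensorProto.FLOAT
        else:
            # Naive default: propagate the first known input type. A real
            # implementation needs per-op rules on top of this
            # (e.g. Shape -> int64, comparison ops -> bool).
            out_type = next((dtype[i] for i in node.input if i in dtype), TensorProto.FLOAT)

        for out in node.output:
            dtype[out] = out_type

    return dtype
```

Seeding from initializers and value_info first keeps the pass to a single forward sweep over the sorted node list, which is what makes type-only propagation cheap compared to full shape inference.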