Commit 9d2e608
[Fix]: $HOME in launcher eagle example (#1365)
### What does this PR do?
Type of change: Bug fix
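Fixes a launcher example bug raised by @cjluo-nv.
Before fix: task1 in
`tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml` fails.
Reason: because `HOME: /tmp` is set for the container, the enroot credentials in
`$HOME/.config/enroot/.credentials` are not found, so the image import authenticates as `<anonymous>` and is rejected with `401 Unauthorized`: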
```
GpuFreq=control_disabled
pyxis: importing docker image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10
Apr 28 13:35:59.491365 2515157 slurmstepd 0x155552c3b780: error: pyxis: child 2515158 failed with error code: 1
Apr 28 13:35:59.491415 2515157 slurmstepd 0x155552c3b780: error: pyxis: failed to import docker image
Apr 28 13:35:59.491433 2515157 slurmstepd 0x155552c3b780: error: pyxis: printing enroot log file:
Apr 28 13:35:59.491453 2515157 slurmstepd 0x155552c3b780: error: pyxis: [INFO] Querying registry for permission grant
Apr 28 13:35:59.491469 2515157 slurmstepd 0x155552c3b780: error: pyxis: [INFO] Authenticating with user: <anonymous>
Apr 28 13:35:59.491483 2515157 slurmstepd 0x155552c3b780: error: pyxis: [INFO] Authentication succeeded
Apr 28 13:35:59.491499 2515157 slurmstepd 0x155552c3b780: error: pyxis: [INFO] Fetching image manifest list
Apr 28 13:35:59.491512 2515157 slurmstepd 0x155552c3b780: error: pyxis: [INFO] Fetching image manifest
Apr 28 13:35:59.491524 2515157 slurmstepd 0x155552c3b780: error: pyxis: [ERROR] URL https://registry-1.docker.io/v2/nvcr.io/nvidia/tensorrt-llm/release/manifests/1.3.0rc10 returned error code: 401 Unauthorized
Apr 28 13:35:59.491564 2515157 slurmstepd 0x155552c3b780: error: pyxis: couldn't start container
Apr 28 13:35:59.491579 2515157 slurmstepd 0x155552c3b780: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
Apr 28 13:35:59.491593 2515157 slurmstepd 0x155552c3b780: error: Failed to invoke spank plugin stack
Apr 28 13:35:59.515523 2515146 slurmstepd 0x155552c3b780: error: pyxis: child 2515240 failed with error code: 1
```
After fix:
```
GpuFreq=control_disabled
pyxis: importing docker image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10
pyxis: imported docker image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10
```
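The failure mode above comes down to path resolution. A minimal sketch, assuming `ENROOT_CONFIG_PATH` is unset so that enroot's default config directory (`${XDG_CONFIG_HOME:-$HOME/.config}/enroot`) applies; `enroot_credentials_path` is a hypothetical helper for illustration and `/home/alice` is a made-up home directory:

```shell
#!/usr/bin/env bash
# Sketch: print where the credentials file is looked up for a given HOME.
# Assumption: ENROOT_CONFIG_PATH is unset, so the default
# ${XDG_CONFIG_HOME:-$HOME/.config}/enroot is used.
enroot_credentials_path() {
  local home="$1"                          # HOME seen by the process importing the image
  local xdg_config="${2:-$home/.config}"   # optional XDG_CONFIG_HOME override
  printf '%s\n' "$xdg_config/enroot/.credentials"
}

# /home/alice is a hypothetical user home, used only for illustration.
enroot_credentials_path "/home/alice"   # prints /home/alice/.config/enroot/.credentials
enroot_credentials_path "/tmp"          # prints /tmp/.config/enroot/.credentials
```

With `HOME` overridden to `/tmp`, the lookup points at a location where no credentials file is expected to exist, which is consistent with the anonymous authentication seen in the log.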
## Summary by CodeRabbit
* **Chores**
  * Updated example pipeline to use the standardized dataset example path.
  * Removed unnecessary per-task overrides of the process home and cache directory to simplify environment setup.
  * Preserved required model checkpoint environment setting for the relevant task so model resolution continues to work.
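
To check whether any task in a launcher YAML still overrides the process home, a quick search is enough. A minimal sketch, assuming the override is written as a `HOME: <value>` mapping entry (optionally as a list item); `find_home_overrides` is a hypothetical helper, and the real file layout may differ:

```shell
#!/usr/bin/env bash
# Sketch: list the lines of a YAML file that set HOME.
# Assumption: the override appears as "HOME: <value>", possibly prefixed by "- ".
# Keys that merely end in HOME (for example HF_HOME) are not matched.
find_home_overrides() {
  grep -nE '^[[:space:]]*(-[[:space:]]+)?HOME:' "$1"
}

# Usage (path taken from the PR description):
#   find_home_overrides tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml
```

The function prints matching lines with their line numbers and returns a non-zero status when nothing matches, so it can also be used as a guard in a script.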
---------
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

1 parent 50706d1 commit 9d2e608
1 file changed
Lines changed: 1 addition & 6 deletions
Diff hunks (line content not captured), by original file line number: line 27 replaced; lines 47-49 removed; lines 71-72 removed.