Commit eb07f72

SUSE AI: removed runTimeClassNames nvidia (#36)
* removed `runtimeClassName: nvidia` across all AI files
* fixed and removed 1 new instance of `runtimeClassName`
* added info about no need to set `runtimeClassName`
* improved a link
1 parent 8a29045 commit eb07f72

6 files changed

Lines changed: 10 additions & 16 deletions

DC-AI-deployment

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,6 @@ ADOC_POST=yes
 ADOC_TYPE=book
 ADOC_ATTRIBUTES=" --attribute env-daps=1"
 ADOC_ATTRIBUTES+=" --attribute PROF_PRODUCT=suseai"
-ADOC_ATTRIBUTES+=" --attribute PROF_PRODUCT=suseai"
 ADOC_ATTRIBUTES+=" --attribute PROF_DEPLOYMENT=standard"
 
 STYLEROOT="/usr/share/xml/docbook/stylesheet/suse2022-ns"
@@ -17,4 +16,5 @@ FALLBACK_STYLEROOT="/usr/share/xml/docbook/stylesheet/suse-ns"
 DOCBOOK5_RNG_URI="http://docbook.org/xml/5.2/rng/docbookxi.rng"
 
 #XSLTPARAM+=' --param toc.section.depth=2'
-#XSLTPARAM+=' --param bubbletoc.section.depth=3 --param bubbletoc.max.depth=3'
+#XSLTPARAM+=' --param bubbletoc.section.depth=3 --param bubbletoc.max.depth=3'
+#XSLTPARAM+=' --stringparam generate.toc="book title" '

references/ollama-helmchart.adoc

Lines changed: 0 additions & 1 deletion
@@ -36,7 +36,6 @@ global:
 ingress:
 enabled: false
 defaultModel: "gemma:2b"
-runtimeClassName: nvidia
 ollama:
 models:
 pull:

references/owui-helm-overrides.adoc

Lines changed: 0 additions & 1 deletion
@@ -450,7 +450,6 @@ Following is an example of the `open-webui-pipelines-values.yaml` override file.
 
 [source,yaml]
 ----
-runtimeClassName: nvidia
 global:
 imagePullSecrets:
 - application-collection

references/pytorch-helm-overrides.adoc

Lines changed: 0 additions & 5 deletions
@@ -11,7 +11,6 @@ include::../snippets/helm-chart-overrides-intro.adoc[]
 [source,yaml]
 ----
 # pytorch_custom_overrides.yaml
-runtimeClassName: nvidia
 global:
 imagePullSecrets:
 - application-collection <.>
@@ -48,7 +47,6 @@ To create a ConfigMap, run the following command:
 [source,yaml]
 ----
 # pytorch_custom_overrides.yaml
-runtimeClassName: nvidia
 global:
 imagePullSecrets:
 - application-collection
@@ -82,7 +80,6 @@ Move the `entrypoint.sh` file plus any helper files under the `scripts/` directo
 [source,yaml]
 ----
 # pytorch_custom_overrides.yaml
-runtimeClassName: nvidia
 global:
 imagePullSecrets:
 - application-collection <.>
@@ -117,7 +114,6 @@ For production use, we recommend using a storage solution suitable for persisten
 [source,yaml]
 ----
 # pytorch_custom_overrides.yaml
-runtimeClassName: nvidia
 global:
 imagePullSecrets:
 - application-collection <.>
@@ -153,7 +149,6 @@ For production use, we recommend using a storage solution suitable for persisten
 [source,yaml]
 ----
 # pytorch_custom_overrides.yaml
-runtimeClassName: nvidia
 global:
 imagePullSecrets:
 - application-collection <.>

references/vllm-helm-overrides.adoc

Lines changed: 0 additions & 5 deletions
@@ -111,7 +111,6 @@ The following {vllm} override file includes basic configuration options.
 * Access to a {huggingface} token (`HF_TOKEN`).
 * The model `meta-llama/Llama-3.1-8B-Instruct` from this example is a gated model that requires you to accept the agreement to access it.
 For more information, see link:https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct[].
-* The `runtimeClassName` specified here is `nvidia`.
 * Update the `storageClass:` entry for each `modelSpec`.
 
 [source,yaml]
@@ -121,7 +120,6 @@ global:
 imagePullSecrets:
 - application-collection
 servingEngineSpec:
-runtimeClassName: "nvidia"
 modelSpec:
 - name: "llama3" <.>
 registry: "dp.apps.rancher.io" <.>
@@ -263,7 +261,6 @@ global:
 imagePullSecrets:
 - application-collection
 servingEngineSpec:
-runtimeClassName: "nvidia"
 modelSpec:
 - name: "llama3"
 registry: "dp.apps.rancher.io"
@@ -383,7 +380,6 @@ global:
 imagePullSecrets:
 - application-collection
 servingEngineSpec:
-runtimeClassName: "nvidia"
 modelSpec:
 - name: "mistral"
 registry: "dp.apps.rancher.io"
@@ -432,7 +428,6 @@ global:
 imagePullSecrets:
 - application-collection
 servingEngineSpec:
-runtimeClassName: "nvidia"
 modelSpec:
 - name: "mistral"
 registry: "dp.apps.rancher.io"

tasks/NVIDIA-Operator-installation.adoc

Lines changed: 8 additions & 2 deletions
@@ -76,7 +76,14 @@ The NVIDIA operator restarts containerd with a hangup call which restarts RKE2.
 
 [IMPORTANT]
 ====
-The envvars `ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED`, `ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS` and `DEVICE_LIST_STRATEGY` are required to properly isolate GPU resources as explained in https://docs.google.com/document/d/1zy0key-EL6JH50MZgwg96RPYxxXXnVUdxLZwGiyqLd8/edit?tab=t.0[Preventing unprivileged access to GPUs in Kubernetes].
+The envvars `ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED`, `ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS` and `DEVICE_LIST_STRATEGY` are required to properly isolate GPU resources as explained in link:https://docs.google.com/document/d/1zy0key-EL6JH50MZgwg96RPYxxXXnVUdxLZwGiyqLd8/edit?tab=t.0[Preventing unprivileged access to GPUs in Kubernetes].
+====
+
+[IMPORTANT]
+====
+NVIDIA GPU Operator v25.10.x uses link:https://github.com/cncf-tags/container-device-interface/blob/main/SPEC.md[Container Device Interface (CDI) specification] which simplifies operations.
+It is recommended that you enable CDI (the default) and the NRI plug-in on RKE2.
+With both features enabled, you no longer need to pass extra environment variables for security requirements or set `runtimeClassName: nvidia` in your pod specifications.
 ====
 
 [,yaml]
@@ -164,7 +171,6 @@ metadata:
 namespace: default
 spec:
 restartPolicy: OnFailure
-runtimeClassName: nvidia
 containers:
 - name: cuda-container
 image: nvcr.io/nvidia/k8s/cuda-sample:nbody
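
Because this commit removes the same setting from many override examples, a small check can help confirm that no instance was missed. The following is a minimal sketch and not part of the commit: the function name `find_runtime_class_lines` and the sample text (including its indentation) are hypothetical, and the pattern only assumes that the setting appears on its own line as `runtimeClassName: nvidia`, with or without quotes around the value.

```python
import re

# Matches a YAML line that sets runtimeClassName to nvidia, quoted or not.
# Assumption: the key and value sit alone on one line, as in the removed lines above.
PATTERN = re.compile(r'^\s*runtimeClassName:\s*["\']?nvidia["\']?\s*$')


def find_runtime_class_lines(text):
    """Return the 1-based line numbers that still set runtimeClassName to nvidia."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.match(line)
    ]


# Hypothetical sample modeled on the vLLM override file before this commit.
sample = """\
global:
  imagePullSecrets:
  - application-collection
servingEngineSpec:
  runtimeClassName: "nvidia"
  modelSpec:
  - name: "llama3"
"""

print(find_runtime_class_lines(sample))  # prints [5]
```

Running the function over the text of each changed `.adoc` file should return an empty list once the removal is complete. Prose mentions such as the sentence in the new `[IMPORTANT]` block are not matched, because the pattern requires the setting to be the only content on the line.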
