
Provisioning on shared storage without topology.kubernetes.io/zone label #357

Open
Mrton0121 wants to merge 7 commits into sergelogvinov:main from Mrton0121:main

Conversation

@Mrton0121 Mrton0121 commented May 7, 2025

Pull Request

What? (description)

I modified the CSI so that you can set the SHARED_STORAGE environment variable in the pods; if it is set to "1", the topology.kubernetes.io/zone label is no longer required on the Kubernetes nodes.
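
For illustration, a minimal sketch of what this could look like in the controller pod spec; the fragment below is an assumption (the PR does not prescribe a particular manifest layout), only the SHARED_STORAGE variable itself comes from this change:

# Illustrative fragment for the plugin's controller container; layout is assumed.
env:
  - name: SHARED_STORAGE
    value: "1"   # "1" = do not require topology.kubernetes.io/zone on the nodes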

Why? (reasoning)

I have a load balancer in front of the Proxmox cluster and wanted to spread the API load across the nodes. With the topology.kubernetes.io/zone label set, I couldn't do that, because the plugin queried that specific node every time.

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you linted your code (make lint)
  • you ran the unit tests (make unit)

See make help for a description of the available targets.

@sergelogvinov
Owner

Hello. Thank you for the PR.

The CSI plugin should work with an external load balancer placed in front of the Proxmox API. Could you please share what issue you encountered with the load balancer setup?

Regarding env.SHARED_STORAGE, it introduces implicit logic and additional complexity, which can make the project harder to maintain. In most cases, the Proxmox API still requires the correct node name to be set, even when using shared disk storage. https://pve.proxmox.com/pve-docs/api-viewer/
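
For context, a couple of illustrative Proxmox VE API paths from the API viewer, showing that the node name is embedded in the URL (these are examples, not the full set the plugin uses):

# The node name appears in most Proxmox VE API paths, for example:
#   GET  /api2/json/nodes/{node}/storage/{storage}/content   # list volumes on a storage
#   POST /api2/json/nodes/{node}/qemu/{vmid}/config          # attach/detach a VM disk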

@Mrton0121
Author

Hi!
Thanks for your reply!

In an HA Proxmox cluster there is no guarantee that the Kubernetes node (VM) will stay in the same place every time.

I also have ProxLB installed, which migrates the VMs on a given schedule to the least used Proxmox node.

In an enterprise environment the CSI needs to resolve the correct node dynamically.
There were already functions to find the node that hosts a given storage; I just used those existing functions to make this possible.

We really need this functionality to make our Kubernetes clusters work.
Can we somehow find a middle ground?

@sergelogvinov
Owner

Let's get back to the issue.

What kind of error do you encounter when using an external load balancer on top of the Proxmox API?

@Mrton0121
Author

Sorry for being misleading with the LB issue.
I work with a big HA Proxmox cluster with several hundred machines. I would like the option to dynamically determine the API endpoint used for volume operations.

@sergelogvinov
Owner

Hello, here is a small example of how to use a sidecar as a load balancer for the Proxmox API:

https://github.com/sergelogvinov/proxmox-cloud-controller-manager/blob/main/docs/loadbalancer.md

Try this method with the proxmox-csi Helm chart:

initContainers:
  []
  # - name: loadbalancer
  #   restartPolicy: Always
  #   image: ghcr.io/sergelogvinov/haproxy:2.8.3-alpine3.18
  #   imagePullPolicy: IfNotPresent
  #   env:
  #     - name: SVC
  #       value: "proxmox.domain.com"
  #     - name: PORT
  #       value: "8006"
  #   securityContext:
  #     runAsUser: 99
  #     runAsGroup: 99
  #   resources:
  #     limits:
  #       cpu: 50m
  #       memory: 64Mi
  #     requests:
  #       cpu: 50m
  #       memory: 32Mi
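
Once the sidecar is in place, a minimal sketch of what the plugin's cluster config might look like when pointed at it instead of a fixed Proxmox node, assuming the haproxy sidecar listens on 127.0.0.1:8006 inside the pod (credentials below are placeholders):

# Sketch: point the plugin at the in-pod load balancer rather than a single node.
clusters:
  - url: https://127.0.0.1:8006/api2/json
    insecure: true                      # may be needed, since the URL no longer matches the API certificate's hostname
    token_id: "kubernetes-csi@pve!csi"  # placeholder credentials
    token_secret: "<secret>"
    region: cluster-1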

@miberecz

Let’s set aside the load balancer discussion — that was a misunderstanding in the earlier phrasing. The load balancer is working fine.

The core issue this PR addresses is the use of a fixed zone label for CSI operations. In dynamic HA clusters, this approach becomes problematic. A fixed zone label ties CSI API operations to a single node — but what happens when that node is down for maintenance or fails? This is a common scenario in larger Proxmox clusters where node-level maintenance (BIOS/OS/CEPH upgrades, disk swaps, etc.) is ongoing.

Additionally, using a fixed zone label creates a bottleneck for API operations. Even though requests are routed through a load balancer, the fixed host reference in the API endpoint causes all requests to be redirected to the same node. This leads to scalability and reliability issues — for instance, during high-load operations like provisioning 100 volumes for a Redis deployment with 100 replicas. The Proxmox API simply can't handle that volume of requests efficiently when funneled through a single node.

Our cluster uses an additional layer called proxLB to automatically migrate VMs based on load or availability. In such a setup, VMs regularly move between nodes, and assuming a static zone no longer reflects reality. For shared storage accessible from all nodes, the zone label isn’t required, and in fact causes the issues described above.

This PR introduces an improvement: the ability to handle dynamically changing clusters while preserving compatibility with static setups. We plan to use CSI extensively, and since we are contributing the feature, we are committed to supporting it as well. If we can align on this, it could open the door for more collaboration in the future.

@sergelogvinov
Owner

Thank you, I understand now.

The Proxmox API requires the node name for most API calls.
The issue here is that when using a shared block device, we need to specify the correct node name — either the destination node where we intend to mount the storage or the node through which we access the API.

I believe these changes do not require SHARED_STORAGE, which would otherwise add unnecessary complexity to the plugin bootstrap process.

@sergelogvinov
Owner

sergelogvinov commented May 15, 2025

For such a large cluster, we should consider implementing additional changes — for example, adding extra checks when modifying the VM configuration. However, it would be better to address this in a separate issue or pull request.

#358

@sergelogvinov
Owner

So, @miberecz, would you mind squashing the commits into one and signing off the changes (DCO: https://github.com/sergelogvinov/proxmox-csi-plugin/blob/main/CONTRIBUTING.md)? I'd be happy to help if you need any assistance with that.

Mrton0121 added 2 commits May 15, 2025 16:36
commit 1874eed
Author: Mrton0121 <gombimarci@gmail.com>
Date:   Wed May 7 13:43:17 2025 +0000

    finalizee

    Signed-off-by: Mrton0121 <gombimarci@gmail.com>

commit b9e2e0b
Author: root <gombimarci@gmail.com>
Date:   Wed May 7 13:41:02 2025 +0000

    finalize

    Signed-off-by: root <gombimarci@gmail.com>

commit cd229da
Author: root <gombimarci@gmail.com>
Date:   Wed May 7 13:33:35 2025 +0000

    feat: update README.md

commit 1f3bbf8
Author: root <gombimarci@gmail.com>
Date:   Wed May 7 13:30:50 2025 +0000

    feat: using shared storage without applying zone labels for the kubernetes nodes

Signed-off-by: Mrton0121 <gombimarci@gmail.com>
@miberecz

It's now signed; squashing can be done at merge. Or, if you want, we can open a new PR with only one commit.

@Mrton0121
Author

Mrton0121 commented May 20, 2025

Hey!
I created the squashed commits and signed them off. Is it okay like this?

BR,
Márton

@sergelogvinov
Owner

It's now signed; squashing can be done at merge. Or, if you want, we can open a new PR with only one commit.

Sorry for the delay. Yeah, GitHub Actions requires a single commit with a sign-off message. Squashing commits via GitHub changes the commit message and SHA.

sergelogvinov force-pushed the main branch 3 times, most recently from b5f9a02 to eb86759 on May 22, 2025 03:38
feat: using shared storage without applying zone labels for the kubernetes nodes

It can be accessed from multiple nodes regardless of their region assignment.
This simplifies configuration and eliminates the need for zone topology-based scheduling

Signed-off-by: Mrton0121 <gombimarci@gmail.com>
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
@sergelogvinov
Owner

Hello @Mrton0121, could you run the integration tests in your cluster using my latest changes?

The label topology.kubernetes.io/zone must exist on the node, because the node component of the plugin cannot determine the node name by itself — it does not have the credentials to call the Proxmox API. With only shared storage, it's probably not that important, and you can try using any value for this label (based on general topology considerations).
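
For example (a sketch based on the comment above; the node name and zone value are arbitrary placeholders), the label could be applied with kubectl label node worker-1 topology.kubernetes.io/zone=shared, which corresponds to node metadata like:

# Sketch: the label must exist on every node, but with shared-only storage its
# value can be an arbitrary placeholder. The node name "worker-1" is hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    topology.kubernetes.io/zone: shared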

@sergelogvinov
Owner

If you have only one storage class, gathering storage capacity information is probably unnecessary; you can disable it (#366).
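
For reference, capacity information is normally published by the csi-provisioner sidecar; a rough sketch of what disabling it could look like (the container layout and image tag below are illustrative, not necessarily how the chart wires it up):

# Rough sketch only: the csi-provisioner sidecar publishes CSIStorageCapacity
# objects when started with --enable-capacity; leaving the flag off (or setting
# it to false) disables capacity tracking. Layout and image tag are assumed.
containers:
  - name: csi-provisioner
    image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.0
    args:
      - --csi-address=/csi/csi.sock
      - --enable-capacity=false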

@Mrton0121
Author

Hey! I'll get back to you about the changes ASAP

Mrton0121 added 3 commits May 27, 2025 10:58
Signed-off-by: root <gombimarci@gmail.com>
Signed-off-by: Mrton0121 <gombimarci@gmail.com>
@Mrton0121
Author

Mrton0121 commented May 28, 2025

Hi!

Sorry for the delay.
I tried out the latest changes. It seems like it's working!

@sergelogvinov
Owner

I've been thinking about your ideas for a few weeks now. I believe the best way to solve this is by adding smart routing logic to the Proxmox API Go module. But right now, it's quite difficult to add this kind of feature to the current module.

I also considered using another Go module, but I didn't find one better than the current one.
I've already added some improvements in the edge version, but they are still not enough.

If you have any ideas or suggestions, feel free to contact me on the Kubernetes Slack. I'll be happy to talk more.

@github-actions (bot) added the Stale label Aug 14, 2025