
Provisioning on shared storage without topology.kubernetes.io/zone label #357

Open
Mrton0121 wants to merge 7 commits into sergelogvinov:main from Mrton0121:main

Conversation

@Mrton0121 Mrton0121 commented May 7, 2025

Pull Request

What? (description)

I modified the CSI so that you can set the SHARED_STORAGE environment variable in the pods; if it is set to "1", the topology.kubernetes.io/zone label is no longer required on the Kubernetes nodes.
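
For illustration, a minimal sketch of what this could look like in the controller pod spec; the fragment below is an assumption (the PR does not prescribe a particular manifest layout), only the SHARED_STORAGE variable itself comes from this change:

# Illustrative fragment for the plugin's controller container; layout is assumed.
env:
  - name: SHARED_STORAGE
    value: "1"   # "1" = do not require topology.kubernetes.io/zone on the nodes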

Why? (reasoning)

I have a load balancer in front of the Proxmox cluster and wanted to spread the API load across the nodes. With the topology.kubernetes.io/zone label set, I couldn't do that, because the plugin queried that specific node every time.

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you linted your code (make lint)
  • you ran the unit tests (make unit)

See make help for a description of the available targets.

@sergelogvinov
Owner

Hello. Thank you for the PR.

The CSI plugin should work with an external load balancer placed in front of the Proxmox API. Could you please share what issue you encountered with the load balancer setup?

Regarding env.SHARED_STORAGE, it introduces implicit logic and additional complexity, which can make the project harder to maintain. In most cases, the Proxmox API still requires the correct node name to be set, even when using shared disk storage. https://pve.proxmox.com/pve-docs/api-viewer/
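
For context, a couple of illustrative Proxmox VE API paths from the API viewer, showing that the node name is embedded in the URL (these are examples, not the full set the plugin uses):

# The node name appears in most Proxmox VE API paths, for example:
#   GET  /api2/json/nodes/{node}/storage/{storage}/content   # list volumes on a storage
#   POST /api2/json/nodes/{node}/qemu/{vmid}/config          # attach/detach a VM disk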

@Mrton0121
Author

Hi!
Thanks for your reply!

In an HA Proxmox cluster there is no guarantee that the Kubernetes node (VM) will stay in the same place every time.

I also have ProxLB installed, which migrates the VMs on a given schedule to the least used Proxmox node.

In an enterprise environment the CSI needs to resolve the correct node dynamically.
There were already functions to find the node that hosts a given storage; I just used those existing functions to make this possible.

We really need this functionality to make our Kubernetes clusters work.
Can we somehow find a middle ground?

@sergelogvinov
Owner

Let's get back to the issue.

What kind of error do you encounter when using an external load balancer on top of the Proxmox API?

@Mrton0121
Author

Sorry for being misleading with the LB issue.
I work with a big HA Proxmox cluster with several hundred machines. I would like the option to dynamically determine the API endpoint used for volume operations.

@sergelogvinov
Owner

Hello, here is a small example of how to use a sidecar as a load balancer for the Proxmox API:

https://github.com/sergelogvinov/proxmox-cloud-controller-manager/blob/main/docs/loadbalancer.md

Try this method with the proxmox-csi Helm chart:

initContainers:
  []
  # - name: loadbalancer
  #   restartPolicy: Always
  #   image: ghcr.io/sergelogvinov/haproxy:2.8.3-alpine3.18
  #   imagePullPolicy: IfNotPresent
  #   env:
  #     - name: SVC
  #       value: "proxmox.domain.com"
  #     - name: PORT
  #       value: "8006"
  #   securityContext:
  #     runAsUser: 99
  #     runAsGroup: 99
  #   resources:
  #     limits:
  #       cpu: 50m
  #       memory: 64Mi
  #     requests:
  #       cpu: 50m
  #       memory: 32Mi
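
Once the sidecar is in place, a minimal sketch of what the plugin's cluster config might look like when pointed at it instead of a fixed Proxmox node, assuming the haproxy sidecar listens on 127.0.0.1:8006 inside the pod (credentials below are placeholders):

# Sketch: point the plugin at the in-pod load balancer rather than a single node.
clusters:
  - url: https://127.0.0.1:8006/api2/json
    insecure: true                      # may be needed, since the URL no longer matches the API certificate's hostname
    token_id: "kubernetes-csi@pve!csi"  # placeholder credentials
    token_secret: "<secret>"
    region: cluster-1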

@miberecz

Let’s set aside the load balancer discussion — that was a misunderstanding in the earlier phrasing. The load balancer is working fine.

The core issue this PR addresses is the use of a fixed zone label for CSI operations. In dynamic HA clusters, this approach becomes problematic. A fixed zone label ties CSI API operations to a single node — but what happens when that node is down for maintenance or fails? This is a common scenario in larger Proxmox clusters where node-level maintenance (BIOS/OS/CEPH upgrades, disk swaps, etc.) is ongoing.

Additionally, using a fixed zone label creates a bottleneck for API operations. Even though requests are routed through a load balancer, the fixed host reference in the API endpoint causes all requests to be redirected to the same node. This leads to scalability and reliability issues — for instance, during high-load operations like provisioning 100 volumes for a Redis deployment with 100 replicas. The Proxmox API simply can't handle that volume of requests efficiently when funneled through a single node.

Our cluster uses an additional layer called proxLB to automatically migrate VMs based on load or availability. In such a setup, VMs regularly move between nodes, and assuming a static zone no longer reflects reality. For shared storage accessible from all nodes, the zone label isn’t required, and in fact causes the issues described above.

This PR introduces an improvement: the ability to handle dynamically changing clusters while preserving compatibility with static setups. We plan to use CSI extensively, and since we are contributing the feature, we are committed to supporting it as well. If we can align on this, it could open the door for more collaboration in the future.

@sergelogvinov
Owner

Thank you, I understand now.

The Proxmox API requires the node name for most API calls.
The issue here is that when using a shared block device, we need to specify the correct node name — either the destination node where we intend to mount the storage or the node through which we access the API.

I believe these changes do not require SHARED_STORAGE, which would otherwise add unnecessary complexity to the plugin bootstrap process.

@sergelogvinov
Owner

sergelogvinov commented May 15, 2025

For such a large cluster, we should consider implementing additional changes — for example, adding extra checks when modifying the VM configuration. However, it would be better to address this in a separate issue or pull request.

#358

@sergelogvinov
Owner

So, @miberecz, would you mind squashing the commits into one and signing off the changes (DCO: https://github.com/sergelogvinov/proxmox-csi-plugin/blob/main/CONTRIBUTING.md)? I'd be happy to help if you need any assistance with that.

Mrton0121 added 2 commits May 15, 2025 16:36
commit 1874eed
Author: Mrton0121 <gombimarci@gmail.com>
Date:   Wed May 7 13:43:17 2025 +0000

    finalizee

    Signed-off-by: Mrton0121 <gombimarci@gmail.com>

commit b9e2e0b
Author: root <gombimarci@gmail.com>
Date:   Wed May 7 13:41:02 2025 +0000

    finalize

    Signed-off-by: root <gombimarci@gmail.com>

commit cd229da
Author: root <gombimarci@gmail.com>
Date:   Wed May 7 13:33:35 2025 +0000

    feat: update README.md

commit 1f3bbf8
Author: root <gombimarci@gmail.com>
Date:   Wed May 7 13:30:50 2025 +0000

    feat: using shared storage without applying zone labels for the kubernetes nodes

Signed-off-by: Mrton0121 <gombimarci@gmail.com>
@miberecz

It's now signed; squashing can be done at merge. Or, if you want, we can open a new PR with only one commit.

@Mrton0121
Author

Mrton0121 commented May 20, 2025

Hey!
I created the squashed commits and signed them off. Is it okay like this?

BR,
Márton

@sergelogvinov
Owner

It's now signed; squashing can be done at merge. Or, if you want, we can open a new PR with only one commit.

Sorry for the delay. Yeah, GitHub Actions requires a single commit with a sign-off message. Squashing commits via GitHub changes the commit message and SHA.

sergelogvinov force-pushed the main branch 3 times, most recently from b5f9a02 to eb86759 on May 22, 2025 03:38
feat: using shared storage without applying zone labels for the kubernetes nodes

It can be accessed from multiple nodes regardless of their region assignment.
This simplifies configuration and eliminates the need for zone topology-based scheduling

Signed-off-by: Mrton0121 <gombimarci@gmail.com>
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
@sergelogvinov
Owner

Hello @Mrton0121, could you run the integration tests in your cluster using my latest changes?

The label topology.kubernetes.io/zone must exist on the node, because the node component of the plugin cannot determine the node name by itself — it does not have the credentials to call the Proxmox API. With only shared storage, it's probably not that important, and you can try using any value for this label (based on general topology considerations).
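
For example (a sketch based on the comment above; the node name and zone value are arbitrary placeholders), the label could be applied with kubectl label node worker-1 topology.kubernetes.io/zone=shared, which corresponds to node metadata like:

# Sketch: the label must exist on every node, but with shared-only storage its
# value can be an arbitrary placeholder. The node name "worker-1" is hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    topology.kubernetes.io/zone: shared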

@sergelogvinov
Owner

If you have only one storage class, gathering storage capacity information is probably unnecessary; you can disable it (#366).
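
For reference, capacity information is normally published by the csi-provisioner sidecar; a rough sketch of what disabling it could look like (the container layout and image tag below are illustrative, not necessarily how the chart wires it up):

# Rough sketch only: the csi-provisioner sidecar publishes CSIStorageCapacity
# objects when started with --enable-capacity; leaving the flag off (or setting
# it to false) disables capacity tracking. Layout and image tag are assumed.
containers:
  - name: csi-provisioner
    image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.0
    args:
      - --csi-address=/csi/csi.sock
      - --enable-capacity=false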

@Mrton0121
Author

Hey! I'll get back to you about the changes ASAP

Mrton0121 added 3 commits May 27, 2025 10:58
Signed-off-by: root <gombimarci@gmail.com>
Signed-off-by: Mrton0121 <gombimarci@gmail.com>
@Mrton0121
Author

Mrton0121 commented May 28, 2025

Hi!

Sorry for the delay.
I tried out the latest changes. It seems like it's working!

@sergelogvinov
Owner

I've been thinking about your ideas for a few weeks now. I believe the best way to solve this is by adding smart routing logic to the Proxmox API Go module. But right now, it's quite difficult to add this kind of feature to the current module.

I also considered using another Go module, but I didn't find one better than the current one.
I've already added some improvements in the edge version, but they are still not enough.

If you have any ideas or suggestions, feel free to contact me on the Kubernetes Slack. I'll be happy to talk more.

@github-actions (bot) added the Stale label Aug 14, 2025