Describe the feature you'd like to have
I would like to propose an enhancement to the existing topology-aware volume provisioning in ceph-csi to support multi-Ceph-cluster environments. Currently, topology-aware provisioning assumes all volumes are created within a single Ceph cluster. I'd like to extend this so that specific zones can be mapped to different Ceph clusters, allowing the provisioner to select the appropriate Ceph cluster based on the zone where a pod is scheduled.
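As a minimal sketch of how the zone-to-cluster mapping could be expressed, the existing ceph-csi-config ConfigMap might be extended with a per-cluster topology entry. The "topology" field shown below is hypothetical and only illustrates the proposal; cluster IDs and monitor addresses are placeholders:

# Hypothetical sketch: extends the ceph-csi-config ConfigMap with a
# per-cluster "topology" entry. This field does not exist in ceph-csi
# today; it only illustrates the proposed zone-to-cluster mapping.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
  namespace: ceph-csi
data:
  config.json: |-
    [
      {
        "clusterID": "cluster-zone-a",
        "monitors": ["10.0.1.1:6789", "10.0.1.2:6789"],
        "topology": {"topology.kubernetes.io/zone": "zone-a"}
      },
      {
        "clusterID": "cluster-zone-b",
        "monitors": ["10.0.2.1:6789", "10.0.2.2:6789"],
        "topology": {"topology.kubernetes.io/zone": "zone-b"}
      }
    ]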
What is the value to the end user? (why is it a priority?)
This feature would provide several benefits to end users:
Elimination of a single point of failure: By distributing storage across multiple Ceph clusters aligned with Kubernetes zones, we can avoid having the entire regional Kubernetes setup depend on a single Ceph cluster.
Improved data locality: Volumes would be created in the Ceph cluster that corresponds to the zone where pods are running, potentially reducing network latency.
Better isolation and fault tolerance: Storage failures would be contained within specific zones/clusters rather than affecting the entire environment.
Enhanced scalability: Organizations can scale their storage infrastructure horizontally by adding new Ceph clusters for new zones.
How will we know we have a good solution? (acceptance criteria)
The solution should meet the following criteria:
The StorageClass should support specifying multiple Ceph clusters with their corresponding topology information (zones); a rough sketch is shown after this list.
When a PVC is created, the provisioner should be able to identify the appropriate Ceph cluster based on the pod's scheduling constraints or node affinity rules.
The solution should seamlessly integrate with existing topology-aware scheduling in Kubernetes.
No changes should be required in applications using the PVCs.
The feature should include documentation on how to configure and use multi-cluster topology-aware provisioning.
Existing deployments using single-cluster topology should continue to work without modification.
The solution should provide clear error messages when no suitable Ceph cluster can be found for a given topology constraint.
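To illustrate the first criterion above, a StorageClass for this feature might look roughly like the sketch below. The topologyConstrainedClusters parameter is a hypothetical name used only to show the shape of a zone-to-clusterID mapping, not an existing ceph-csi API; volumeBindingMode: WaitForFirstConsumer and allowedTopologies are standard Kubernetes fields, and secrets and other required parameters are omitted for brevity:

# Sketch only: "topologyConstrainedClusters" is a hypothetical parameter
# name illustrating a zone-to-clusterID mapping; it is not an existing
# ceph-csi API. Secrets and other required parameters are omitted.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-multi-cluster
provisioner: rbd.csi.ceph.com
parameters:
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4
  topologyConstrainedClusters: |
    [
      {"clusterID": "cluster-zone-a",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-a"}]},
      {"clusterID": "cluster-zone-b",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-b"}]}
    ]
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values: ["zone-a", "zone-b"]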
Additional context
Here's a sequence diagram showing the proposed workflow:
sequenceDiagram
participant User
participant K8s as Kubernetes API
participant CM as ConfigMap
participant SC as StorageClass
participant CSI as CSI Controller
participant Scheduler as K8s Scheduler
participant Node as K8s Node
participant CephA as Ceph Cluster A (Zone A)
participant CephB as Ceph Cluster B (Zone B)
User->>K8s: Create cluster topology ConfigMap
K8s-->>User: ConfigMap created
User->>K8s: Create StorageClass with volumeBindingMode: WaitForFirstConsumer
K8s-->>User: StorageClass created
User->>K8s: Create StatefulSet with PVCs using StorageClass
K8s-->>User: StatefulSet created
K8s->>K8s: Create unbound PVCs
Note over K8s, Scheduler: For each pod in StatefulSet
K8s->>Scheduler: Schedule pod
Scheduler->>K8s: Pod assigned to specific node in Zone A
K8s->>CSI: CreateVolumeRequest with selected-node and zone info
CSI->>CSI: pickZoneFromNode() extracts zone from node
CSI->>CM: Get cluster topology configuration
CM-->>CSI: Return topology mapping
CSI->>CSI: Match zone to appropriate Ceph cluster
Note over CSI: Determine that Zone A maps to Ceph Cluster A
CSI->>CephA: Create volume
CephA-->>CSI: Volume created
CSI->>K8s: Create PV with node affinity for Zone A
K8s->>K8s: Bind PVC to PV
K8s->>Node: Start pod with bound volume
Node->>CSI: Stage and publish volume
CSI->>CM: Get cluster info for volume
CM-->>CSI: Return cluster A connection details
CSI->>CephA: Connect to volume
CephA-->>Node: Volume mounted
Note over User,K8s: Later - Update topology (no disruption to existing volumes)
User->>K8s: Update cluster topology ConfigMap
K8s-->>CM: ConfigMap updated
Note over CSI: New volumes use updated topology mapping
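For context on the "Create PV with node affinity for Zone A" step in the diagram, the resulting PersistentVolume would carry standard CSI topology node affinity, roughly as sketched below. The volume name, handle, and attributes are placeholders, and the topology label key shown is the standard Kubernetes zone label; the actual key depends on the domain labels the driver is configured with:

# Illustrative PV, roughly as the provisioner would create it for a
# volume placed on the zone-a cluster. Names, volume handle, and
# attributes are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123abcd-example
spec:
  storageClassName: csi-rbd-multi-cluster
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Delete
  csi:
    driver: rbd.csi.ceph.com
    volumeHandle: 0001-example-volume-id
    volumeAttributes:
      clusterID: cluster-zone-a   # assumption: selected via the zone mapping
      pool: replicapool
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["zone-a"]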