
Add annotations with ceph client address #4913

Closed
gman0 opened this issue Oct 18, 2024 · 10 comments · Fixed by #4944
Labels
dependency/csi-addons Related to a CSI-Addons issue or enhancement

Comments

@gman0
Contributor

gman0 commented Oct 18, 2024

Describe the feature you'd like to have

Add annotations with ceph client addresses to relevant k8s objects.

What is the value to the end user? (why is it a priority?)

To reliably blocklist a client during a node split-brain, we need to stash the client's address somewhere, so that an operator (e.g. Rook) can eventually pick it up.

How to get the client's address?

Depending on the network setup, the nodeplugin's node IP may be different from the address that actually connects to the Ceph cluster. rados_getaddrs (ceph/go-ceph#1037) could be used to reliably retrieve the IP that Ceph sees.
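A minimal sketch of that lookup, assuming the go-ceph version in use exposes rados_getaddrs as (*rados.Conn).GetAddrs(); the connection setup and credential handling here are illustrative only:

```go
// Minimal sketch, assuming go-ceph wraps rados_getaddrs as
// (*rados.Conn).GetAddrs() (string, error); verify against the go-ceph
// version pinned by ceph-csi before relying on it.
package main

import (
	"fmt"

	"github.com/ceph/go-ceph/rados"
)

// clientAddress connects to the cluster with the given credentials and
// returns the address(es) the Ceph cluster sees for this client, which may
// differ from the node IP.
func clientAddress(monitors, user, key string) (string, error) {
	conn, err := rados.NewConnWithUser(user)
	if err != nil {
		return "", err
	}
	defer conn.Shutdown()

	if err := conn.SetConfigOption("mon_host", monitors); err != nil {
		return "", err
	}
	if err := conn.SetConfigOption("key", key); err != nil {
		return "", err
	}
	if err := conn.Connect(); err != nil {
		return "", err
	}

	// GetAddrs is the go-ceph binding for rados_getaddrs (assumption).
	return conn.GetAddrs()
}

func main() {
	addrs, err := clientAddress("10.0.0.1:6789", "admin", "<cephx key>")
	if err != nil {
		panic(err)
	}
	fmt.Println("client address as seen by Ceph:", addrs)
}
```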

Where to store the annotation?

Annotating the nodeplugin's node should be sufficient if we're aiming for blocklisting the whole node.

When to annotate?

The nodeplugin receives cluster credentials only with the respective RPC calls, so we'll need to do it there. This unfortunately makes the driver Kubernetes-aware.

What to annotate?

To be figured out. In general we need a mapping between Ceph cluster ID and client IP. If the IP of clients on a single node is expected to change (e.g. what if the nodeplugin is deployed with hostNetwork: false ?), we may need to keep track of more IPs for one cluster ID entry.

Another option would be to keep track of the full entity address (IP + nonce) -- in that case it might make more sense to annotate the VolumeAttachment object instead?
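For the node-annotation variant, a rough sketch of what a nodeplugin-side update could look like with client-go; the annotation key and the JSON layout of the clusterID-to-address mapping are hypothetical, not anything ceph-csi defines today:

```go
// Hypothetical sketch of annotating the Node with a clusterID -> client
// address mapping from the nodeplugin; the annotation key is made up and is
// not something ceph-csi defines.
package fencing

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

const clientAddrAnnotation = "ceph.csi.example.com/client-addresses" // hypothetical key

// annotateNode merges {clusterID: addrs} into the node annotation so an
// operator (e.g. Rook) can later read it when fencing the whole node.
func annotateNode(ctx context.Context, cs kubernetes.Interface, nodeName, clusterID string, addrs []string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Keep previously recorded clusters; one node may serve several clusters.
	mapping := map[string][]string{}
	if existing, ok := node.Annotations[clientAddrAnnotation]; ok {
		_ = json.Unmarshal([]byte(existing), &mapping) // best effort
	}
	mapping[clusterID] = addrs

	value, err := json.Marshal(mapping)
	if err != nil {
		return err
	}

	patch := fmt.Sprintf(`{"metadata":{"annotations":{%q:%q}}}`, clientAddrAnnotation, string(value))
	_, err = cs.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType,
		[]byte(patch), metav1.PatchOptions{})
	return err
}
```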

How will we know we have a good solution? (acceptance criteria)

Node fencing / client blocklisting can be performed reliably.


If nobody else is working on this topic I can send a PR once we agree on all the details mentioned above.

@Madhu-1
Collaborator

Madhu-1 commented Oct 21, 2024

To be figured out. In general we need a mapping between Ceph cluster ID and client IP. If the IP of clients on a single node is expected to change (e.g. what if the nodeplugin is deployed with hostNetwork: false ?), we may need to keep track of more IPs for one cluster ID entry.

Yes, that's correct if we take the route of adding the annotation on the Node object.

Another option would be to keep track of the full entity address (IP + nonce) -- in that case it might make more sense to annotate the VolumeAttachment object instead?

This might not work because the nonce changes if there are node restarts, etc.

@gman0 I started looking at the design with csi-addons in mind, as below.

Problem:
It is very problematic to depend on the RBD watcher to get the IP that needs to be fenced.

Modified design as below:

Add a new RPC to the csi-addons spec to advertise the IPs that need to be fenced. The csi-addons sidecar will use these IPs to create or update the CSIAddonsNode object, and Rook/the admin will use the CSIAddonsNode object to find the IP of the node that needs to be fenced.
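To make the proposal a bit more concrete, an illustrative Go shape of such an RPC; the names and fields are placeholders for discussion, not the final csi-addons spec:

```go
// Illustrative Go shape of the proposed CSI-Addons RPC; names and fields
// are placeholders for discussion, not the final csi-addons spec.
package proposal

import "context"

// FenceClientsRequest identifies the Ceph cluster the addresses are
// requested for (parameters/secrets as in other CSI-Addons calls).
type FenceClientsRequest struct {
	Parameters map[string]string // e.g. clusterID
	Secrets    map[string]string
}

// FenceClient is one set of addresses the cluster sees for this node.
type FenceClient struct {
	Addresses []string // IPs (or IP+nonce) to be fenced for this node
}

type FenceClientsResponse struct {
	Clients []FenceClient
}

// FenceNodeServer would be implemented by the ceph-csi nodeplugin; the
// csi-addons sidecar calls it and mirrors the result into CSIAddonsNode.
type FenceNodeServer interface {
	GetFenceClients(ctx context.Context, req *FenceClientsRequest) (*FenceClientsResponse, error)
}
```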

Flow Diagram

cephcsi/csi-addons workflow:

(workflow diagram image)

Rook/User workflow:

  • The admin adds the taint on the Node
  • Rook is already watching the Node objects for events
  • Rook gets the CSIAddonsNode object
  • Rook creates the NetworkFence for the client IPs present in the CSIAddonsNode object (a rough sketch of this step follows below)
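A rough sketch of that last Rook-side step, written against the dynamic client so no typed API is assumed; the NetworkFence field names follow the csi-addons networkfence.md document and the secret name/namespace values are purely illustrative:

```go
// Rough sketch of the Rook-side step: create a NetworkFence CR for the
// addresses found on the CSIAddonsNode object. Uses the dynamic client so
// no typed API is assumed; field names follow the csi-addons NetworkFence
// docs and should be double-checked.
package fencing

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var networkFenceGVR = schema.GroupVersionResource{
	Group:    "csiaddons.openshift.io",
	Version:  "v1alpha1",
	Resource: "networkfences",
}

// fenceClientAddresses creates a NetworkFence CR that blocklists the given
// CIDRs (the client addresses read from the CSIAddonsNode object).
func fenceClientAddresses(ctx context.Context, dc dynamic.Interface, name, driver, clusterID string, cidrs []string) error {
	// unstructured objects require []interface{} rather than []string.
	cidrList := make([]interface{}, 0, len(cidrs))
	for _, c := range cidrs {
		cidrList = append(cidrList, c)
	}

	fence := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "csiaddons.openshift.io/v1alpha1",
		"kind":       "NetworkFence",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			"driver":     driver, // e.g. rbd.csi.ceph.com
			"fenceState": "Fenced",
			"cidrs":      cidrList,
			"parameters": map[string]interface{}{"clusterID": clusterID},
			"secret": map[string]interface{}{
				"name":      "csi-rbd-provisioner-secret", // illustrative
				"namespace": "rook-ceph",                  // illustrative
			},
		},
	}}

	_, err := dc.Resource(networkFenceGVR).Create(ctx, fence, metav1.CreateOptions{})
	return err
}
```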

Work Items:
Spec

  • A new RPC needs to be added to the csi-addons spec to get the client IP addresses

CSI-Addons sidecar

  • Functionality to fetch the client addresses and update the CSIAddonsNode object periodically

Rook

  • Fetch the CSIAddonsNode object to figure out the IP that needs to be fenced.

Open questions:

  • If the cephcsi pod is deleted from the node forcefully by the user, the CSIAddonsNode object will get deleted; can this create a problem?
    • In most cases we might not run into the problem, but it is possible if the user deletes the resources manually as a cleanup
      • Yes, it's possible, because Rook will not get the IP address.
      • Another solution is to not delete the CSIAddonsNode object, but rather add manual steps to garbage-collect the objects, which would avoid the problem.

@gman0 this is just a rough idea; we are still looking at the different possibilities and checking whether there is a better solution for this one.

@gman0
Contributor Author

gman0 commented Oct 21, 2024

Would it make sense to tie client address info to an object that already has its lifecycle defined? Why use CSIAddonsNode?

This might not work because the nonce changes if there is any node restarts etc.

If the node restarts, there are no more clients to be blocklisted on that node anyway, so that should be fine. The volumes would remount, at which point the client addresses would refresh. Does this sound ok?

@Madhu-1
Collaborator

Madhu-1 commented Oct 21, 2024

Would it make sense to tie client address info to an object that already has its lifecycle defined? Why use CSIAddonsNode?

Updating core objects from external entities is problematic. I have seen a few cases where we added an annotation on the node/namespace and the operator/entity managing the object removed it because it was not a native (core) Kubernetes annotation; that is why I was trying to avoid it, but we can try to update both the CSIAddonsNode and the Kubernetes Node object as well. With this I was also trying to keep CO-specific code out of cephcsi and rely on csi-addons.

This might not work because the nonce changes if there is any node restarts etc.

If the node restarts, there are no more clients to be blocklisted on that node anyway, so that should be fine. The volumes would remount, at which point the client addresses would refresh. Does this sound ok?

Yes, that's correct. I was thinking about a specific case that I don't recall exactly, but it is correct that NodeStage happens after a node restart.

Another option would be to keep track of the full entity address (IP + nonce) -- in that case it might make more sense to annotate the VolumeAttachment object instead?

This is another option we can think about, but we still need to think about avoiding Kubernetes-specific code in cephcsi.

Let me think about all the different possibilities for some time and get back on this one.

@gman0
Contributor Author

gman0 commented Oct 24, 2024

Updating core objects from external entities is problematic. I have seen a few cases where we added an annotation on the node/namespace and the operator/entity managing the object removed it because it was not a native (core) Kubernetes annotation.

Is there a tracker for that discussion, or a general discussion about the CSIAddonsNode CRD? I don't know the whole context, but I'm wondering if it wouldn't be simpler to document these ceph-csi "well-known" annotations (or CSI-Addons ones), and users would then configure their annotation-cleaning operators to make exemptions for them. There are a lot of other tools that already store their metadata in annotations; it seems like common practice.

@Madhu-1
Collaborator

Madhu-1 commented Oct 24, 2024

Updating core objects from external entities is problematic. I have seen a few cases where we added an annotation on the node/namespace and the operator/entity managing the object removed it because it was not a native (core) Kubernetes annotation.

Is there a tracker for that discussion, or a general discussion about the CSIAddonsNode CRD? I don't know the whole context, but I'm wondering if it wouldn't be simpler to document these ceph-csi "well-known" annotations (or CSI-Addons ones), and users would then configure their annotation-cleaning operators to make exemptions for them. There are a lot of other tools that already store their metadata in annotations; it seems like common practice.

The context is that annotating Node objects might not be suitable for two reasons: 1. It is always problematic to have update access to the Node object, as it can cause problems. 2. The annotation we are planning to add is cephcsi-specific; since it is not a standard annotation, it can easily be reconciled away, as it is not Kubernetes/OpenShift-specific. I am not sure the platform is going to make exceptions for this one, since it is not a standard approach.

CSI-Addons is a framework we have developed (to be used for storage-agnostic functionality): https://github.com/csi-addons/kubernetes-csi-addons. Each pod in the daemonsets runs a sidecar container which creates the CSIAddonsNode object; since this is specific to the extra functionality, we can add the addresses as a new field in the CSIAddonsNode object.

We already have a pre-defined CR for the network fence, https://github.com/csi-addons/kubernetes-csi-addons/blob/main/docs/networkfence.md, which is already implemented in cephcsi.

@nixpanic @ShyamsundarR thoughts?

@nixpanic
Member

Detecting the IP-address(es?) in a CSI-Addons call, and publishing it in the CSIAddonsNode CR would be pretty clean.

In addition to the current FenceController Service, a FenceNode Service could be added. The FenceNode Service should only have a call to identify itself, and the CSI-Addons sidecar that creates the CSIAddonsNode CR can inject the IP-address.

The difficulty, it seems to me, is that it is unknown which cluster/credentials are used for connecting, and therefore detecting the IP-address is not easy either. If there were a NetworkFenceClass CRD, things would be easier to trigger from the kubernetes-csi-addons operator/controller.

It is also possible that different clusters are used with a single Ceph-CSI deployment. The IP-address needs to be identifiable based on the cluster as well.

Maybe someone has a better idea that is simpler?

@gman0
Contributor Author

gman0 commented Oct 28, 2024

@nixpanic

The difficulty seems to me that it is unknown what cluster/credentials are used for connecting, and therefore detecting the IP-address is not easy either.

Yes, that's why I suggested retrieving the client address during NodeStageVolume RPC.

An alternative would be for the FenceNode Service to watch StorageClasses with a matching provisioner field, retrieve the Secrets based on the csi.storage.k8s.io/node-stage-secret-name{,space} reference, find out the client address using those credentials, and store the (node ID, cluster ID, client address) triple somewhere. This would leave ceph-csi completely unaware of this functionality, which seems nicer indeed.
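A sketch of that alternative, using only the standard csi.storage.k8s.io/node-stage-secret-name{,space} StorageClass parameters; the helper name and the clusterID parameter lookup are assumptions about how ceph-csi StorageClasses are usually laid out:

```go
// Sketch of the alternative above: resolve node-stage credentials from
// StorageClasses of a given provisioner so the address lookup can happen
// entirely outside ceph-csi. Only the standard CSI secret parameter keys
// are used; the clusterID parameter is an assumption about the ceph-csi
// StorageClass layout.
package fencing

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeStageCredentials returns the node-stage secret data keyed by
// clusterID for every StorageClass belonging to the given provisioner.
func nodeStageCredentials(ctx context.Context, cs kubernetes.Interface, provisioner string) (map[string]map[string][]byte, error) {
	classes, err := cs.StorageV1().StorageClasses().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	creds := map[string]map[string][]byte{} // clusterID -> secret data
	for _, sc := range classes.Items {
		if sc.Provisioner != provisioner {
			continue
		}
		name := sc.Parameters["csi.storage.k8s.io/node-stage-secret-name"]
		namespace := sc.Parameters["csi.storage.k8s.io/node-stage-secret-namespace"]
		clusterID := sc.Parameters["clusterID"]
		if name == "" || namespace == "" {
			continue
		}
		secret, err := cs.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return nil, err
		}
		// The secret data can then be fed to a go-ceph connection (see the
		// earlier sketch) to find the client address Ceph sees, producing
		// the (node ID, cluster ID, client address) triple.
		creds[clusterID] = secret.Data
	}
	return creds, nil
}
```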

@Madhu-1
Collaborator

Madhu-1 commented Oct 29, 2024

An alternative would be for the FenceNode Service to watch StorageClasses with a matching provisioner field, retrieve the Secrets based on the csi.storage.k8s.io/node-stage-secret-name{,space} reference, find out the client address using those credentials, and store the (node ID, cluster ID, client address) triple somewhere. This would leave ceph-csi completely unaware of this functionality, which seems nicer indeed.

@gman0 I think we have the same idea, except that instead of watching for StorageClasses we are going to have a csi-addons-specific CRD like NetworkFenceClass, which contains the provisioner and the secret, and also the clusterID to point to the monitors:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: NetworkFenceClass
metadata:
  name: ceph-nw-fence
spec:
  provisioner: rbd.ceph.csi.com
  parameters:
    clusterID: xyz
    csiaddons.openshift.io/network-fence-secret-name: secret
    csiaddons.openshift.io/network-fence-secret-namespace: default

The csi-addons controller will watch for this CR (one CR per Ceph cluster) and periodically send a request to all the registered csi-addons sidecars (drivers) to fetch the client addresses, then update the status of the CSIAddonsNode object with the clusterID and the client addresses. With this we will not check for the client address during any CSI-specific operation, but rather during csi-addons-specific operations. This keeps things simple: a StorageClass can get deleted right after the PVC is created, whereas this implementation keeps the NetworkFence coupled with this CR and only a few specific objects to watch.
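A sketch of that periodic loop; FenceNodeServer and FenceClientsRequest are the illustrative types sketched earlier in this thread, and updateStatus stands in for the real CSIAddonsNode status update:

```go
// Sketch of the periodic loop described above: on a timer, ask the driver
// for its fence clients and mirror the answer into the CSIAddonsNode
// status. FenceNodeServer/FenceClientsRequest are the illustrative types
// from the earlier sketch; updateStatus stands in for the real
// CSIAddonsNode status update.
package proposal

import (
	"context"
	"log"
	"time"
)

// syncFenceClients polls the driver at the given interval and records the
// clusterID -> addresses mapping until ctx is cancelled.
func syncFenceClients(ctx context.Context, driver FenceNodeServer, clusterID string,
	secrets map[string]string, interval time.Duration,
	updateStatus func(clusterID string, addresses []string) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			resp, err := driver.GetFenceClients(ctx, &FenceClientsRequest{
				Parameters: map[string]string{"clusterID": clusterID},
				Secrets:    secrets,
			})
			if err != nil {
				log.Printf("fetching fence clients for %s: %v", clusterID, err)
				continue
			}
			var addrs []string
			for _, c := range resp.Clients {
				addrs = append(addrs, c.Addresses...)
			}
			if err := updateStatus(clusterID, addrs); err != nil {
				log.Printf("updating CSIAddonsNode for %s: %v", clusterID, err)
			}
		}
	}
}
```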

@gman0
Contributor Author

gman0 commented Oct 29, 2024

Sounds good. One advantage of retrieving the address inside NodeStageVolume was that we could get the full entity address (i.e. IP+nonce), but AFAIK we have a consensus on blocklisting the whole node anyway, so that's not an issue. @Madhu-1 let me know if you'd like me to help with PRs or reviews, thanks.

@Madhu-1
Collaborator

Madhu-1 commented Oct 30, 2024

@gman0 Thanks for the offer; sure, I will tag you once I have the PR in cephcsi ready.
