
The snapshot with error status cannot be removed successfully from the Cinder backend #16

Open
freesky-edward opened this issue Aug 23, 2017 · 3 comments

@freesky-edward

What happened:
When a volume snapshot fails to create (the backend reports an error status for some reason), I attempted to delete that volume snapshot. The VolumeSnapshot object in Kubernetes was deleted successfully; however, the snapshot in Cinder was not.

What you expected to happen:
I expect the snapshot data in the backend to be deleted as well when a snapshot with error status is deleted, so that no useless residual data is left behind.

How to reproduce it (as minimally and precisely as possible):

  1. create snapshot "kc create -f ../snapshot.yaml" (a reconstructed example manifest is shown after this list)
  2. check snapshot status:
    root@kube-karbor:/opt/kube/kubernetes# kc describe volumesnapshot
    Name:          snapshot-demo
    Namespace:     default
    Labels:
    Annotations:
    API Version:   volume-snapshot-data.external-storage.k8s.io/v1
    Kind:          VolumeSnapshot
    Metadata:
      Cluster Name:
      Creation Timestamp:  2017-08-23T07:38:10Z
      Generation:          0
      Resource Version:    16203
      Self Link:           /apis/volume-snapshot-data.external-storage.k8s.io/v1/namespaces/default/volumesnapshots/snapshot-demo
      UID:                 0322489e-87d6-11e7-a4d5-fa163e1a1ced
    Spec:
      Persistent Volume Claim Name:  cinder-claim1
      Snapshot Data Name:            k8s-volume-snapshot-03601c96-87d6-11e7-a4c8-fa163e1a1ced
    Status:
      Conditions:
        Last Transition Time:  2017-08-23T07:38:10Z
        Message:               Snapshot created succsessfully
        Reason:
        Status:                True
        Type:                  Ready
      Creation Timestamp:
    Events:
  3. check cinder "openstack volume snapshot list"
    +--------------------------------------+-------------------------------------------------------------+---------------------+--------+------+
    | ID                                   | Name                                                        | Description         | Status | Size |
    +--------------------------------------+-------------------------------------------------------------+---------------------+--------+------+
    | 904bb9d1-00ea-4004-9899-a1c111b7a970 | pvc-1f93e356-87b7-11e7-a4d5-fa163e1a1ced1503473890359873214 | kubernetes snapshot | error  |    1 |
    +--------------------------------------+-------------------------------------------------------------+---------------------+--------+------+
  4. delete the snapshot "kc delete volumesnapshot snapshot-demo"
  5. check snapshot in Kubernetes "kc get volumesnapshot"
    No resources found.
  6. check snapshot in Cinder "openstack volume snapshot list"
    +--------------------------------------+-------------------------------------------------------------+---------------------+--------+------+
    | ID                                   | Name                                                        | Description         | Status | Size |
    +--------------------------------------+-------------------------------------------------------------+---------------------+--------+------+
    | 904bb9d1-00ea-4004-9899-a1c111b7a970 | pvc-1f93e356-87b7-11e7-a4d5-fa163e1a1ced1503473890359873214 | kubernetes snapshot | error  |    1 |
    +--------------------------------------+-------------------------------------------------------------+---------------------+--------+------+
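
For reference, here is a minimal snapshot.yaml consistent with the describe output in step 2 (a reconstructed sketch; the field values come from that output, not from the reporter's actual file):

    apiVersion: volume-snapshot-data.external-storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: snapshot-demo
      namespace: default
    spec:
      persistentVolumeClaimName: cinder-claim1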

Anything else we need to know?:
logs when creating snapshot:
E0823 07:44:33.472653 15606 snapshotter.go:381] Failed to schedule the operation "createdefault/snapshot-democinder-claim1": Failed to create operation with name "createdefault/snapshot-democinder-claim1". An operation with that name failed at 2017-08-23 07:44:18.310211022 +0000 UTC m=+592.603224177. No retries permitted until 2017-08-23 07:46:18.310211022 +0000 UTC m=+712.603224177 (2m0s). Last error: "snapshot is not completed yet: current snapshot status is: error".
E0823 07:44:33.572948 15606 snapshotter.go:381] Failed to schedule the operation "createdefault/snapshot-democinder-claim1": Failed to create operation with name "createdefault/snapshot-democinder-claim1". An operation with that name failed at 2017-08-23 07:44:18.310211022 +0000 UTC m=+592.603224177. No retries permitted until 2017-08-23 07:46:18.310211022 +0000 UTC m=+712.603224177 (2m0s). Last error: "snapshot is not completed yet: current snapshot status is: error".

logs when deleting snapshot:
I0823 07:49:53.722128 15606 snapshot-controller.go:240] [CONTROLLER] OnDelete /apis/volume-snapshot-data.external-storage.k8s.io/v1/namespaces/default/volumesnapshots/snapshot-demo, snapshot name: default/snapshot-demo
I0823 07:49:53.722300 15606 desired_state_of_world.go:83] Deleting snapshot from desired state of world: default/snapshot-demo

Environment:

  • Kubernetes version (use kubectl version): built from master
  • Cloud provider or hardware configuration: openstack
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): Linux kube-karbor 4.4.0-81-generic #104-Ubuntu SMP Wed Jun 14 08:17:06 UTC 2017 x86_64 x86_64 x86_64
  • Install tools: N/A
  • snapshot: built from master
  • openstack: built from master
@rootfs (Owner) commented Aug 23, 2017

This is a very interesting test case. It points to a case not addressed in the snapshot controller.

During create:

  • kubectl create triggers creation of a VolumeSnapshot object, which is added to dsw (the desired state of world)
  • the reconciler sees a new VolumeSnapshot object that is in dsw but missing from asw (the actual state of world); it then calls the snapshotter to create a VolumeSnapshotData object for it, which represents a snapshot in the backend storage
  • when the backend successfully creates the snapshot, the VolumeSnapshotData is bound to the VolumeSnapshot, and asw records the object

During delete:

  • the snapshot object is deleted from dsw
  • the reconciler sees a snapshot that is in asw but not in dsw, so it calls the snapshotter to delete the snapshot in the backend storage

Now in this test case the snapshot creation failed, so the snapshot was never added to asw. As a result, the reconciler never called the snapshotter to delete the snapshot in the backend storage.
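
To make the gap concrete, here is a minimal sketch of the reconcile logic described above (hypothetical names and structure, not the controller's actual code):

    // reconcile.go: toy model of the snapshot reconciler's create/delete paths.
    package main

    import "fmt"

    // world maps a snapshot name to whether it is present in that state.
    type world map[string]bool

    func reconcile(dsw, asw world) {
        // Create path: in dsw but missing from asw -> create in the backend.
        for name := range dsw {
            if !asw[name] {
                fmt.Println("create backend snapshot for", name)
                // If the backend reports an error status, the snapshot is never
                // recorded in asw, even though a failed snapshot object may
                // still exist in Cinder.
            }
        }
        // Delete path: in asw but missing from dsw -> delete from the backend.
        for name := range asw {
            if !dsw[name] {
                fmt.Println("delete backend snapshot for", name)
            }
        }
        // Gap: a snapshot in neither dsw nor asw (creation failed, then the
        // user deleted the VolumeSnapshot) is never cleaned up in the backend.
    }

    func main() {
        dsw := world{}      // user already deleted the VolumeSnapshot
        asw := world{}      // creation failed, so asw never recorded the snapshot
        reconcile(dsw, asw) // prints nothing: the Cinder snapshot leaks
    }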

@xing-yang has made changes to address some error paths. We'll fix this.

@freesky-edward thanks for providing the info

@xing-yang (Collaborator)

Thanks @freesky-edward for reporting this bug. I'm working on improving the error handling and will take this test case into account.

@freesky-edward (Author)

@rootfs
Yeah, that's the cause: the snapshot data was not found in asw, so the reconciler never issued a delete request. Cinder itself can delete that snapshot manually via the CLI, as shown below.
@xing-yang thanks.
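
For reference, the manual cleanup, using the snapshot ID from the listing above:

    openstack volume snapshot delete 904bb9d1-00ea-4004-9899-a1c111b7a970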
