- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Users can leverage the VolumeSnapshot feature, which GA'd in Kubernetes 1.20, to
create a PersistentVolumeClaim (PVC) from a previously taken VolumeSnapshot.
This is done by pointing the Spec.dataSource parameter of the PVC to an existing
VolumeSnapshot instance. There is no logic that validates whether the original
volume mode of the PVC, whose snapshot was taken, matches the volume mode of
the PVC that is being created from the existing VolumeSnapshot.
This KEP proposes a solution to prevent unauthorized conversion of the volume
mode during such an operation.
Malicious users may expose a vulnerability in the kernel by exploiting this gap. Here is an example of how a malicious user can exploit this gap to crash the kernel.
- User creates a PVC with volumeMode: Block and runs a pod with it.
- User writes malformed ext4 data to it (simple dd)
- User takes snapshot of this volume.
- User creates a PVC with volumeMode: Filesystem from the above snapshot.
- User uses this PVC in a pod.
  - kubelet tries to mount it during pod creation. If there is a CVE in the kernel, the user can crash it.
Note that, as of this writing, there is no known CVE in the kernel that a malicious user can exploit. However, CVEs that affect filesystems are regularly discovered. For example, https://access.redhat.com/security/cve/cve-2020-12655 allows an attacker to trigger a DoS attack on the kernel. This proposal aims to prevent a security vulnerability in the event that such a CVE is discovered.
We cannot simply block this operation as some backup vendors try to create a volume with the exact same mode as the original volume but may need to do the conversion for efficiency. An example workflow of a backup vendor could look like:
- Assume the original PVC is created with volumeMode: Filesystem.
- During backup, the backup software will create a PVC from a VolumeSnapshot with volumeMode: Block. This step needs volume mode conversion. The purpose of creating this PVC with block mode is to be able to copy data efficiently and save it to a backup target.
- The PVC created in the previous step is temporary and will be deleted after data is copied.
- Finally, at restore time, another PVC will be created with volumeMode: Filesystem.
Define a mechanism to mitigate the vulnerability of restoring volumes without hampering valid use cases.
Design that is generic and can be extended to other storage related security aspects.
The proposal aims to mitigate this issue by modifying the VolumeSnapshotContent
API spec as well as the control flows of snapshot-controller and
external-provisioner.
The VolumeSnapshotContent API will include a field that denotes the volume mode
of the volume that the snapshot was created from.
This proposal also introduces a new annotation on the VolumeSnapshotContent
resource, which a trusted user (such as backup software) needs to apply to the
VolumeSnapshotContent bound to a VolumeSnapshot.
By introducing these changes, we will leverage existing user access rights to
determine whether the volume mode of a volume can be altered when a PVC is being
created from a VolumeSnapshot.
When a VolumeSnapshot is created from an existing PVC, a corresponding
VolumeSnapshotContent is created by the snapshot-controller.
Alternatively, a VolumeSnapshotContent can be manually created by an admin
if the Spec.Source.SnapshotHandle refers to a pre-existing snapshot on the
underlying storage system. In either case, VolumeSnapshots and
VolumeSnapshotContents maintain a 1:1 mapping.
Backup vendors that need to convert the volume mode when creating a PVC
need to identify the VolumeSnapshotContent mapped to the VolumeSnapshot
from which the PVC is being created.
Either through software or via manual intervention, the annotation
snapshot.storage.kubernetes.io/allow-volume-mode-change: true needs to be
applied to the VolumeSnapshotContent. If the backup software is a privileged
user, it will have Update and Patch permissions on VolumeSnapshotContents.
Then the backup software can continue with the operation by creating a PVC
with Spec.DataSource pointing to the VolumeSnapshot instance.
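As an illustration of the annotation step, the following is a minimal sketch of the JSON merge patch body a privileged backup tool might send for the VolumeSnapshotContent. Only the annotation key and value come from this proposal; the helper name `buildAllowConversionPatch` is hypothetical, and the client call that submits the patch is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowVolumeModeChangeAnnotation is the annotation key named in this proposal.
const allowVolumeModeChangeAnnotation = "snapshot.storage.kubernetes.io/allow-volume-mode-change"

// buildAllowConversionPatch (hypothetical helper) returns a JSON merge patch
// body that sets the annotation to "true" and leaves other fields untouched.
func buildAllowConversionPatch() ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				allowVolumeModeChangeAnnotation: "true",
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := buildAllowConversionPatch()
	if err != nil {
		panic(err)
	}
	// The body would be sent as a merge patch against the VolumeSnapshotContent.
	fmt.Println(string(body))
}
```

Because the request is a patch on a cluster-scoped resource, it only succeeds for callers that already hold Patch permission on VolumeSnapshotContents, which is the access check this design relies on.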
Here is an example of how this change prevents a malicious user from exploiting this vulnerability.
- User creates a PVC with volumeMode: Block and runs a pod with it.
- User writes malformed ext4 data to it (simple dd)
- User takes snapshot of this volume.
- User attempts to create a PVC with volumeMode: Filesystem from the snapshot.
  - This is blocked as the user does not have Update or Patch permissions on VolumeSnapshotContent resources.
A new out-of-tree flag named PreventVolumeModeConversion will be introduced on
snapshot-controller and csi-provisioner. Both of these components are
out-of-tree so this proposal will not require any in-tree feature gates.
With this design, we will introduce two new changes to the VolumeSnapshotContent API:
- A new optional field, called SourceVolumeMode, will be added to the Spec of VolumeSnapshotContents. This field will be immutable.
type VolumeSnapshotContentSpec struct {
	// ...
	// SourceVolumeMode is the mode of the volume whose snapshot is taken.
	// Can be either "Filesystem" or "Block".
	// If left empty, will be treated as "Unknown".
	// +optional
	SourceVolumeMode *SourceVolumeMode
	// ...
}
- A new annotation to VolumeSnapshotContent objects. The onus is on the backup vendor (via software or manually) to add this annotation to the VolumeSnapshotContent if they intend to alter the volume mode. The VolumeSnapshotContent must look like below after this change:
kind: VolumeSnapshotContent
metadata:
  annotations:
    snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
...
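To make the optional-field semantics concrete, here is a small sketch of how a nil SourceVolumeMode could be interpreted and serialized. The `contentSpec` type is a hypothetical, trimmed-down stand-in for VolumeSnapshotContentSpec, and the JSON tag is an assumption rather than something stated in this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SourceVolumeMode mirrors the type name used in this proposal.
type SourceVolumeMode string

const (
	SourceVolumeModeFilesystem SourceVolumeMode = "Filesystem"
	SourceVolumeModeBlock      SourceVolumeMode = "Block"
)

// contentSpec is a hypothetical stand-in that keeps only the new field.
// The JSON tag below is an assumption made for this example.
type contentSpec struct {
	SourceVolumeMode *SourceVolumeMode `json:"sourceVolumeMode,omitempty"`
}

// describeMode reports how a consumer would read the field:
// a nil pointer is treated as "Unknown", as the field comment describes.
func describeMode(spec contentSpec) string {
	if spec.SourceVolumeMode == nil {
		return "Unknown"
	}
	return string(*spec.SourceVolumeMode)
}

func main() {
	block := SourceVolumeModeBlock
	for _, spec := range []contentSpec{{}, {SourceVolumeMode: &block}} {
		raw, err := json.Marshal(spec)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", raw, describeMode(spec))
	}
}
```

Using a pointer lets objects created before the feature existed stay unchanged on the wire, since the unset field is simply omitted.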
There are two cases to consider:
- Dynamic Provisioning
  - VolumeSnapshot is created by the user, with VolumeSnapshotClass optionally specified in the spec.
  - VolumeSnapshotContent is created by the snapshot-controller in response to the VolumeSnapshot created in the previous step.
  - snapshot-controller populates the Spec of the given VolumeSnapshotContent.
    - With this change, the controller will fetch the Spec.PersistentVolumeMode of the PV and add that to the newly introduced Spec.SourceVolumeMode field of the VolumeSnapshotContent to be created.
- Static Provisioning
  - VolumeSnapshotContent is created by the admin. With this change, the admin will be expected to fill the Spec.SourceVolumeMode field appropriately. If left nil, Unknown mode will be assumed to preserve existing behavior.
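The dynamic provisioning case can be sketched as a simple copy of the PV's volume mode into the new content object. All types and the function name `newDynamicContent` below are hypothetical stand-ins, not the snapshot-controller's actual code.

```go
package main

import "fmt"

// VolumeMode is a stand-in for the Kubernetes volume mode type.
type VolumeMode string

const (
	Filesystem VolumeMode = "Filesystem"
	Block      VolumeMode = "Block"
)

// persistentVolume and snapshotContent are hypothetical, trimmed-down types.
type persistentVolume struct {
	VolumeMode *VolumeMode
}

type snapshotContent struct {
	SourceVolumeMode *VolumeMode
}

// newDynamicContent copies the PV's volume mode into the content being
// created. If the PV carries no mode, the field stays nil ("Unknown").
func newDynamicContent(pv persistentVolume) snapshotContent {
	var content snapshotContent
	if pv.VolumeMode != nil {
		mode := *pv.VolumeMode // copy so the content does not alias the PV
		content.SourceVolumeMode = &mode
	}
	return content
}

func main() {
	mode := Block
	content := newDynamicContent(persistentVolume{VolumeMode: &mode})
	fmt.Println(*content.SourceVolumeMode)
}
```

In the static case there is no PV to read from, which is why the admin is expected to set the field by hand.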
This design leverages the access rights of a user on VolumeSnapshotContents to
determine whether the volume mode can be modified when a PVC is being created
with a VolumeSnapshot as the source.
The volume mode can be altered if the requesting user has Update and Patch
rights on VolumeSnapshotContents (which is a cluster scoped resource).
The control flow for creating a PVC from a VolumeSnapshot will look like below:
- A user attempts to create a PVC from a VolumeSnapshot by specifying the Spec.DataSource parameter of the PVC YAML.
- external-provisioner receives a callback to dynamically create the volume. As part of the preprocessing steps, it will:
  - Get the Spec.SourceVolumeMode of the VolumeSnapshotContent.
    - If Spec.SourceVolumeMode doesn't exist or is nil, then continue with volume provisioning to preserve existing behavior.
  - Get the Spec.VolumeMode of the PVC being created. If they do not match:
    - Get all annotations on the VolumeSnapshotContent and verify if snapshot.storage.kubernetes.io/allow-volume-mode-change: true exists. If it does not exist, block volume provisioning by returning an error.
- In all other cases, let volume provisioning continue.
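The steps above can be sketched as a single decision function. This is an illustration under stated assumptions, not the external-provisioner's code: the function name `checkVolumeModeConversion` and the error value are hypothetical, the PVC's volume mode is passed as a plain value for brevity, and only the annotation value "true" is treated as permission.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeMode is a stand-in for the Kubernetes volume mode type.
type VolumeMode string

const (
	Filesystem VolumeMode = "Filesystem"
	Block      VolumeMode = "Block"
)

// allowAnnotation is the annotation key named in this proposal.
const allowAnnotation = "snapshot.storage.kubernetes.io/allow-volume-mode-change"

// errConversionNotAllowed is a hypothetical error for a blocked request.
var errConversionNotAllowed = errors.New(
	"requested volume modifies the mode of the source volume but " +
		allowAnnotation + " is not set on the VolumeSnapshotContent")

// checkVolumeModeConversion returns nil when provisioning may continue.
func checkVolumeModeConversion(source *VolumeMode, pvcMode VolumeMode, annotations map[string]string) error {
	if source == nil {
		return nil // unknown source mode: preserve existing behavior
	}
	if *source == pvcMode {
		return nil // no conversion requested
	}
	if annotations[allowAnnotation] == "true" {
		return nil // conversion explicitly allowed by a privileged user
	}
	return errConversionNotAllowed
}

func main() {
	block := Block
	allowed := map[string]string{allowAnnotation: "true"}
	fmt.Println(checkVolumeModeConversion(&block, Filesystem, nil))
	fmt.Println(checkVolumeModeConversion(&block, Filesystem, allowed))
}
```

The check needs no extra API calls, because both the PVC and the VolumeSnapshotContent are already available to the caller at this point.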
NOTE: external-provisioner maintains a reference to PVC and VolumeSnapshotContent
during volume creation. This proposal leverages those references to make
additional decisions.
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
None. New E2E tests will be added for the transition to beta.
The unit tests were added to the CSI external-provisioner repo.
- No integration tests added.
The feature flag will be enabled for e2e tests. The tests will attempt to convert volume
mode when creating a PVC from a VolumeSnapshot:
- With Spec.SourceVolumeMode populated and snapshot.storage.kubernetes.io/allow-volume-mode-change: true annotation present - https://github.com/kubernetes-csi/external-provisioner/pull/867/files: https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
- With Spec.SourceVolumeMode populated but no snapshot.storage.kubernetes.io/allow-volume-mode-change: true annotation - kubernetes-csi/external-provisioner#832: https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
- With Spec.SourceVolumeMode set to nil - https://github.com/kubernetes-csi/external-provisioner/pull/867/files: https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
- Feature implemented behind an out-of-tree feature flag.
- Feedback from users.
- Implementation of unit and e2e tests.
- One release with positive feedback from users.
- Deployed in production and in use by backup software.
- Gone through one Kubernetes upgrade.
- Upgrading external-snapshotter and external-provisioner with PreventVolumeModeConversion enabled:
  VolumeSnapshots created after the upgrade will maintain a reference to the source volume mode. Newly created PVCs will undergo an additional check before the provisioning is performed on the storage backend. VolumeSnapshots created before the upgrade will leave the new API field unpopulated.
- Downgrading external-snapshotter and external-provisioner with PreventVolumeModeConversion disabled:
  VolumeSnapshots created prior to the downgrade will still maintain a reference to the source volume mode, but PVCs can be created from them without the additional check.
This proposal requires changes to three components: the VolumeSnapshotContent
API, external-snapshotter and external-provisioner.
If any of these components is not upgraded to a version supporting this feature, the feature will not work as expected. From an end user perspective, the existing behavior will continue, i.e., there will be no check to prevent unauthorized conversion of the volume mode.
- Other
  - Describe the mechanism: Out-of-tree flag named PreventVolumeModeConversion, which will be enabled in external-provisioner and external-snapshotter.
  - Will enabling / disabling the feature require downtime of the control plane? external-provisioner and external-snapshotter will need to be restarted for the changes to take effect. This means that there will be a few seconds of downtime until the newer Pods are Running. There will not be any effect on the previously running applications.
  - Will enabling / disabling the feature require downtime or reprovisioning of a node? No
Yes. Users without requisite privileges cannot alter the volume mode of a
VolumeSnapshot when it is being used to create a PVC. Users with privileges
need to add an annotation to the corresponding VolumeSnapshotContent instance
if they require the volume mode to be converted.
The default behavior does not make any validations prior to provisioning a volume.
The volume mode can be converted by any user when a PVC is created from a
VolumeSnapshot.
Yes. Disabling the feature is supported and will fall back to the existing behavior.
The new behavior will be re-enabled. VolumeSnapshots created when the feature
was disabled will not have the new capabilities.
We will add unit tests with and without the feature flag enabled. The expectation
is for new fields in VolumeSnapshotContent to be dropped when the feature flag
is disabled.
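The expectation stated above, that the new field is dropped while the flag is off, could be expressed roughly as follows. This is a sketch only: `contentSpec` and `dropDisabledFields` are hypothetical names, and the real components may enforce this differently.

```go
package main

import "fmt"

// contentSpec is a hypothetical, trimmed-down stand-in for the
// VolumeSnapshotContent spec.
type contentSpec struct {
	SnapshotHandle   string
	SourceVolumeMode *string
}

// dropDisabledFields clears fields that belong to the feature when the
// PreventVolumeModeConversion flag is disabled, leaving the rest untouched.
func dropDisabledFields(spec *contentSpec, preventVolumeModeConversion bool) {
	if preventVolumeModeConversion {
		return
	}
	spec.SourceVolumeMode = nil
}

func main() {
	mode := "Block"
	spec := contentSpec{SnapshotHandle: "snap-1", SourceVolumeMode: &mode}
	dropDisabledFields(&spec, false)
	fmt.Println(spec.SourceVolumeMode == nil)
}
```

A unit test for each flag state would then assert that the field is preserved when the flag is enabled and nil when it is disabled.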
Due to the feature gate on the external-provisioner, rolling out this feature does not affect existing Pods that use PVCs. It also does not affect VolumeSnapshots that were created prior to rolling out the feature, i.e., the volume mode of an existing VolumeSnapshot can still be modified while creating a PVC.
- persistentvolumeclaim_provision_failed_total
Yes. The feature flag was enabled and disabled separately in the csi-provisioner and snapshot-controller.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
If the feature gate is enabled in the external-provisioner and snapshot-controller, this feature will always be in use when creating a PVC from a VolumeSnapshot.
- Events
- Event Reason: ProvisioningFailed
- Event Message: Failed to provision volume with StorageClass "csi-hostpath-sc": error getting handle for DataSource Type VolumeSnapshot by Name new-snapshot-demo: requested volume default/hpvc-restore modifies the mode of the source volume but does not have permission to do so. snapshot.storage.kubernetes.io/allow-volume-mode-change annotation is not present on snapshotcontent snapcontent-8d709f2e-db04-444f-aae2-e17d6c5398dd
We will add new labels to the existing persistentvolumeclaim_provision_failed_total metric for the volume data source and status code. The per-day percentage of calls with an error status code should be <= 1. However, provisioning will always fail, by design, as long as the feature is correctly enabled and the annotation has not been applied to the relevant VolumeSnapshotContent objects.
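As a worked example of the objective above, the following sketch computes a per-day error percentage from two counters and compares it against the threshold. It assumes the "<= 1" bound is meant as one percent; the function names and the inputs are hypothetical and are not tied to any metrics library.

```go
package main

import "fmt"

// errorPercentage returns the share of failed calls as a percentage.
// With no calls at all, the error percentage is reported as zero.
func errorPercentage(failed, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(failed) / float64(total)
}

// meetsSLO reports whether the error percentage is within the assumed
// one percent bound.
func meetsSLO(failed, total uint64) bool {
	return errorPercentage(failed, total) <= 1.0
}

func main() {
	// Hypothetical daily counts: 3 failed provisioning calls out of 1000.
	fmt.Println(errorPercentage(3, 1000), meetsSLO(3, 1000))
}
```

Failures caused by a missing annotation are expected behavior, so an operator would likely want to separate them by the new status code label before applying such a threshold.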
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name: persistentvolumeclaim_provision_failed_total
- [Optional] Aggregation method:
- Components exposing the metric: external-provisioner
Are there any missing metrics that would be useful to have to improve observability of this feature?
There are no metrics for persistentvolumeclaims created from volumesnapshots. This KEP aims to add those metrics to the external-provisioner.
- [external-provisioner]
  - Usage description: Failure events are emitted by the external-provisioner.
  - Impact of its outage on the feature: Outage of this component will prevent error reporting to users.
  - Impact of its degraded performance or high-error rates on the feature: Outage of this component will prevent error reporting to users.
This feature adds an event write to the API server when PVC creation is blocked.
This feature adds a new field to the existing VolumeSnapshotContent API.
No.
The size of VolumeSnapshotContents will increase as we will introduce a new
field to the API. Also, users will be adding an annotation to individual
objects as needed.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
The latency of CSI's CreateVolume may increase due to this change, when the
Spec.DataSource field points to a VolumeSnapshot instance. This is because
there is an additional check to determine whether volume provisioning must
continue. However, this increase is expected to be minimal as there are no new
API calls and the volume spec has already been loaded into memory of the external-provisioner.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No. This feature does not introduce any resource exhaustive operations.
If the API server is unavailable when this feature blocks PVC creation, the failure event will not be emitted. Users will need to refer to the external-provisioner logs to determine why PVC creation is failing.
There are no other known failure modes.
The user needs to read the logs of the external-provisioner to determine why PVC creation is failing.
- 2023-02-06: KEP updated to mark transition to beta off-by-default
- 2023-12-23: KEP updated to mark transition to beta on-by-default
- 2024-01-24: KEP updated to mark transition to stable
Proposal to create a new policy called VolumeSecurityPolicy, which will be used
to control access for creation of PVCs.
This proposal also includes an admission controller that prevents PVCs from being
restored with the wrong volume mode, unless the user that attempts to do so is a
privileged user (as defined by the VolumeSecurityPolicy).
As part of this proposal, there will be only a single field in the Spec -
allowVolumeModeModification, which can be set to true or false.
Once a VolumeSecurityPolicy is created, it must be tied to a user or a service
account, similar to tying a PSP to a user/service account.
An admission controller will be introduced that intercepts requests to create a
PVC. In case the PVC is being restored from a snapshot and is modifying the
volumeMode, it validates that the user requesting the PVC has the allowed
privileges. If not, the admission controller rejects the PVC create request.
Rejected as PSP was recently deprecated in favor of PodSecurityStandards. If we need a standard for storage security, we should follow that approach.
Introduce VolumeSecurityStandards that are enforceable by any mechanism, including
webhooks, similar to PodSecurityStandards.
We will define two policies as part of this design:
- Privileged - least restrictive policy that allows the widest level of permissions.
- Restricted - most restrictive policy that follows security best practices.
A Mode defines how a violation of the given security policy is handled.
There are three modes:
- Enforce: violations of the policy are not allowed.
- Audit: violations trigger an audit annotation, but are otherwise allowed.
- Warn: violations trigger a user-facing warning, but are otherwise allowed.
A VolumeSecurityStandard is applied on a per-namespace basis. This gives an
admin the ability to apply different standards based on the users of a namespace.
An admission controller will be introduced that intercepts requests to create a PVC. The VolumeSecurityStandards will be hardcoded into this admission controller.
Rejected as the solution was too generic for a very specific use case. If and when there are more storage related security aspects that need a generic solution, we can reconsider this approach.
This proposal introduces a new annotation on the VolumeSnapshotClass object:
allowModeConversionForUsers: <comma separated list of allowed users>.
The comma separated list of users is set by the admin. These users will be
allowed to modify the volume mode when restoring a PVC from a Snapshot.
The annotation allowModeConversionForUsers will be copied to the
VolumeSnapshotContent by the snapshot-controller from the VolumeSnapshotClass.
VolumeSnapshotClass is cluster-scoped, therefore applying this annotation is
restricted to privileged users only.
An admission controller will be introduced that intercepts requests to create a PVC.
Rejected due to issues with immutability of these lists. For example, if a user's access is revoked, does the admin need to modify all existing resources that allow this user to modify the volume mode? There were also concerns with introducing a new mechanism for access control.