Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DFBUGS-1656: [release-4.18] core: add back the CSI_DISABLE_HOLDER_PODS key on the cm #3050

Merged

Conversation

parth-gr
Copy link
Member

@parth-gr parth-gr commented Feb 20, 2025

If the OCS operator is upgraded before the rook operator, It will go and remove the CSI_DISABLE_HOLDER_PODS key from the ocs-configmap, Removing the key from the cm will re-start the rook operator pods,

As Rook has not been upgraded and is still on 4.17, it looks for the env variable value which was coming from ocs-configmap key, And as the key is deleted rook operator got stuck.

Revert of the pr #2609

And Revert the fix in 4.19.

Signed-off-by: parth-gr [email protected]
(cherry picked from commit 843bd4c)

@malayparida2000 malayparida2000 changed the title core: add back the CSI_DISABLE_HOLDER_PODS key on the cm [release-4.18] core: add back the CSI_DISABLE_HOLDER_PODS key on the cm Feb 20, 2025
@malayparida2000 malayparida2000 changed the title [release-4.18] core: add back the CSI_DISABLE_HOLDER_PODS key on the cm DFBUGS-1656: [release-4.18] core: add back the CSI_DISABLE_HOLDER_PODS key on the cm Feb 20, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid jira ticket of any type jira/valid-bug Indicates that the referenced jira bug is valid for the branch this PR is targeting labels Feb 20, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 20, 2025

@parth-gr: This pull request references [Jira Issue DFBUGS-1656](https://issues.redhat.com//browse/DFBUGS-1656), which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (odf-4.18) matches configured target version for branch (odf-4.18)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

If the OCS operator is upgraded before the rook operator, It will go and remove the CSI_DISABLE_HOLDER_PODS key from the ocs-configmap, Removing the key from the ocs-cm will re-start the rook operator pods,

As Rook has not been upgraded and is still on 4.17, it looks for the env variable value which was coming from ocs-configmap key, And as the key is deleted rook operator got stuck.

Revert of the pr #2609

And Revert the fix in 4.19.

Signed-off-by: parth-gr [email protected]
(cherry picked from commit 843bd4c)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

If the OCS operator is upgraded before the rook operator,
It will go and remove the CSI_DISABLE_HOLDER_PODS key
from the ocs-configmap,
Removing the key from the ocs-cm will re-start
the rook operator pods,

As Rook has not been upgraded and is still on 4.17,
it looks for the env variable
value which was coming from ocs-configmap key,
And as the key is deleted
rook operator got stuck.

Revert of the pr red-hat-storage#2609

And Revert the fix in 4.19.

Signed-off-by: parth-gr <[email protected]>
(cherry picked from commit 843bd4c)
Signed-off-by: parth-gr <[email protected]>
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 20, 2025

@parth-gr: This pull request references [Jira Issue DFBUGS-1656](https://issues.redhat.com//browse/DFBUGS-1656), which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (odf-4.18) matches configured target version for branch (odf-4.18)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

If the OCS operator is upgraded before the rook operator, It will go and remove the CSI_DISABLE_HOLDER_PODS key from the ocs-configmap, Removing the key from the cm will re-start the rook operator pods,

As Rook has not been upgraded and is still on 4.17, it looks for the env variable value which was coming from ocs-configmap key, And as the key is deleted rook operator got stuck.

Revert of the pr #2609

And Revert the fix in 4.19.

Signed-off-by: parth-gr [email protected]
(cherry picked from commit 843bd4c)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@malayparida2000 malayparida2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 20, 2025
@agarwal-mudit
Copy link
Member

/cc @subhamkrai

@openshift-ci openshift-ci bot requested a review from subhamkrai February 20, 2025 10:59
@agarwal-mudit
Copy link
Member

@parth-gr
Did you test the fresh install also or only upgrade?
Also, it will not impact the holder pod functionality?

@agarwal-mudit
Copy link
Member

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2025
@subhamkrai
Copy link
Contributor

@parth-gr changes looks good to me but please test with upgrade case. Thanks

@malayparida2000
Copy link
Contributor

@parth-gr Also, it will not impact the holder pod functionality?

No as in 4.18 rook CSV doesn’t reference to this Key after upgrade, so it just stays in the ocs-operator-config CM as a placeholder with no suage after upgrade to 4.18 is complete.

@parth-gr
Copy link
Member Author

Did you test the fresh install also or only upgrade?
Also, it will not impact the holder pod functionality?

@agarwal-mudit the holder pod is already deprecated rook/rook#14819 that's why this key was removed,

For testing:

  1. On a 4.18 cluster -> try adding this empty key -> it worked
  2. Tried a 4.17 to 4.18 upgrade

changes looks good to me but please test with upgrade case

We cant control the OLM to upgrade the ocs operator first, so did 2 upgrades with @malayparida2000 but in those cases rook operator only upgraded first

@malayparida2000
Copy link
Contributor

malayparida2000 commented Feb 20, 2025

Did you test the fresh install also or only upgrade?

We have tested the upgrade scenarior by replacing our fix image in a affected cluster.
And we have not directly tested the fresh install but we did some testing by scaling down deployments & editing the CM to see how it would behave upon fresh install

@malayparida2000
Copy link
Contributor

We cant control the OLM to upgrade the ocs operator first, so did 2 upgrades with @malayparida2000 but in those cases rook operator only upgraded first

Yeah it's very difficult to reproduce this issue we had 2 extra clusters and in both rook operator upgraded first so we couldn't hit the issue. We did test our image in the affected cluster we had from QE & that worked there.

@parth-gr
Copy link
Member Author

parth-gr commented Feb 20, 2025

Upgrade testing(mirrored the scenario where upgrade was failing):

  1. We scale down the ocs -operator
  2. changed the ocs cm to set CSI_DISABLE_HOLDER_PODS: ""
  3. Restarted the rook operator, 4.17 Rook operator was still running fine, with reconcile failures(unable to parse "" empty string )
  4. UPgraded the cluster, csv was successfully updated to 4.18 as the 4.17 rook operator was up and running.

@agarwal-mudit agarwal-mudit removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2025
@malayparida2000 malayparida2000 added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2025
Copy link
Contributor

openshift-ci bot commented Feb 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: malayparida2000, parth-gr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 1f99ab6 into red-hat-storage:release-4.18 Feb 20, 2025
11 checks passed
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 20, 2025

@parth-gr: [Jira Issue DFBUGS-1656](https://issues.redhat.com//browse/DFBUGS-1656): All pull requests linked via external trackers have merged:

[Jira Issue DFBUGS-1656](https://issues.redhat.com//browse/DFBUGS-1656) has been moved to the MODIFIED state.

In response to this:

If the OCS operator is upgraded before the rook operator, It will go and remove the CSI_DISABLE_HOLDER_PODS key from the ocs-configmap, Removing the key from the cm will re-start the rook operator pods,

As Rook has not been upgraded and is still on 4.17, it looks for the env variable value which was coming from ocs-configmap key, And as the key is deleted rook operator got stuck.

Revert of the pr #2609

And Revert the fix in 4.19.

Signed-off-by: parth-gr [email protected]
(cherry picked from commit 843bd4c)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that the referenced jira bug is valid for the branch this PR is targeting jira/valid-reference Indicates that this PR references a valid jira ticket of any type lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants