DFBUGS-1654: [release-4.18] Delete the succeeded pods with duplicate tolerations to avoid alert #3056


malayparida2000
Contributor

The PrometheusDuplicateTimestamps alert is triggered by duplicate tolerations on the osd-prepare job pods and osd-key-rotation cronjob pods. Although the root cause has been fixed in a separate change, the existing succeeded pods still need to be cleaned up to stop the alert.
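
For readers who want the cleanup criterion spelled out, here is a minimal sketch of the selection logic, not the code merged in this PR. It assumes a pod qualifies when its phase is Succeeded and its toleration list contains at least two identical entries. `Toleration` and `Pod` are simplified, hypothetical stand-ins for the `corev1` types, and `hasDuplicateTolerations` and `podsToDelete` are illustrative names, as are the pod names and toleration values in `main`.

```go
package main

import "fmt"

// Toleration and Pod are simplified, hypothetical stand-ins for
// corev1.Toleration and corev1.Pod from k8s.io/api/core/v1.
type Toleration struct {
	Key               string
	Operator          string
	Value             string
	Effect            string
	TolerationSeconds *int64
}

type Pod struct {
	Name        string
	Phase       string // for example "Succeeded" or "Running"
	Tolerations []Toleration
}

// tolerationKey flattens a toleration into a comparable value so that
// it can be used as a map key (the pointer field is dereferenced).
type tolerationKey struct {
	key, operator, value, effect string
	hasSeconds                   bool
	seconds                      int64
}

// hasDuplicateTolerations reports whether any toleration appears more
// than once in the list.
func hasDuplicateTolerations(tols []Toleration) bool {
	seen := make(map[tolerationKey]struct{}, len(tols))
	for _, t := range tols {
		k := tolerationKey{key: t.Key, operator: t.Operator, value: t.Value, effect: t.Effect}
		if t.TolerationSeconds != nil {
			k.hasSeconds = true
			k.seconds = *t.TolerationSeconds
		}
		if _, ok := seen[k]; ok {
			return true
		}
		seen[k] = struct{}{}
	}
	return false
}

// podsToDelete returns the names of succeeded pods that carry duplicate
// tolerations. Running or pending pods are left alone.
func podsToDelete(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if p.Phase == "Succeeded" && hasDuplicateTolerations(p.Tolerations) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	// Illustrative values only; they are not taken from the PR.
	tol := Toleration{Key: "example.com/storage", Operator: "Equal", Value: "true", Effect: "NoSchedule"}
	pods := []Pod{
		{Name: "prepare-a", Phase: "Succeeded", Tolerations: []Toleration{tol, tol}},
		{Name: "prepare-b", Phase: "Running", Tolerations: []Toleration{tol, tol}},
		{Name: "prepare-c", Phase: "Succeeded", Tolerations: []Toleration{tol}},
	}
	fmt.Println(podsToDelete(pods))
}
```

In a real controller the same check would run over pods listed from the API server, and each selected pod would be deleted through the client; that part is omitted here because it depends on the operator's own helpers.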

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid jira ticket of any type) and jira/invalid-bug (Indicates that the referenced jira bug is invalid for the branch this PR is targeting) labels Feb 24, 2025
@openshift-ci-robot

openshift-ci-robot commented Feb 24, 2025

@malayparida2000: This pull request references [Jira Issue DFBUGS-1654](https://issues.redhat.com//browse/DFBUGS-1654), which is invalid:

  • expected the bug to target the "odf-4.18" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

PrometheusDuplicateTimestamps alert is generated due to the presence of duplicate tolerations on the osd-prepare job pods & osd-key-rotation cronjob pods. Although the root of the issue has been fixed with another fix, the existing succeeded pods need to be cleaned up to stop the alert.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@malayparida2000
Contributor Author

/retest

@malayparida2000 malayparida2000 force-pushed the succeded_pod_delete_4_18 branch 2 times, most recently from 58df7b6 to c79ab1c Compare February 25, 2025 05:39
@malayparida2000
Contributor Author

/hold for testing

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2025
@malayparida2000 malayparida2000 force-pushed the succeded_pod_delete_4_18 branch 3 times, most recently from aeb74fb to ac4b95f Compare February 25, 2025 06:52
PrometheusDuplicateTimestamps alert is generated due to the presence of
duplicate tolerations on the osd-prepare job pods & osd-key-rotation
cronjob pods. Although the root of the issue has been fixed with another
fix, the existing succeeded pods need to be cleaned up to stop the
alert.

Signed-off-by: Malay Kumar Parida <[email protected]>
@malayparida2000 malayparida2000 force-pushed the succeded_pod_delete_4_18 branch from ac4b95f to 197133b Compare February 25, 2025 07:11
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2025
Contributor

openshift-ci bot commented Feb 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamniting, malayparida2000

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2025
@malayparida2000
Contributor Author

Testing result:

{"level":"info","ts":"2025-02-25T09:21:08Z","logger":"controllers.StorageCluster","msg":"Deleting pod with duplicate tolerations","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","pod":"rook-ceph-osd-prepare-ocs-deviceset-0-data-0xb5s5-bfgv7"}
{"level":"info","ts":"2025-02-25T09:21:08Z","logger":"controllers.StorageCluster","msg":"Deleting pod with duplicate tolerations","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","pod":"rook-ceph-osd-prepare-ocs-deviceset-2-data-07bc4n-tfdgl"}
{"level":"info","ts":"2025-02-25T09:21:08Z","logger":"controllers.StorageCluster","msg":"Deleting pod with duplicate tolerations","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","pod":"rook-ceph-osd-prepare-ocs-deviceset-1-data-05cdqr-h859p"}

osd-prepare pods with duplicate tolerations have been cleaned up.
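
To confirm the cleanup from operator logs like the ones above, the deleted pod names can be pulled out of the JSON lines. The following is a rough sketch under the assumption that each log line is a standalone JSON object with the `msg` and `pod` fields shown in the test output; `deletedPods` and `logEntry` are hypothetical names, not part of the operator.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry captures only the fields needed from each JSON log line.
type logEntry struct {
	Msg string `json:"msg"`
	Pod string `json:"pod"`
}

// deletedPods scans newline-delimited JSON logs and returns the pod
// names from "Deleting pod with duplicate tolerations" entries.
// Lines that are not valid JSON are skipped.
func deletedPods(logs string) []string {
	var pods []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Msg == "Deleting pod with duplicate tolerations" && e.Pod != "" {
			pods = append(pods, e.Pod)
		}
	}
	return pods
}

func main() {
	// One line copied from the test output above.
	logs := `{"level":"info","ts":"2025-02-25T09:21:08Z","logger":"controllers.StorageCluster","msg":"Deleting pod with duplicate tolerations","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","pod":"rook-ceph-osd-prepare-ocs-deviceset-0-data-0xb5s5-bfgv7"}`
	for _, name := range deletedPods(logs) {
		fmt.Println(name)
	}
}
```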

@malayparida2000
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2025
@malayparida2000
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that the referenced jira bug is valid for the branch this PR is targeting label Feb 26, 2025
@openshift-ci-robot

openshift-ci-robot commented Feb 26, 2025

@malayparida2000: This pull request references [Jira Issue DFBUGS-1654](https://issues.redhat.com//browse/DFBUGS-1654), which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (odf-4.18) matches configured target version for branch (odf-4.18)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that the referenced jira bug is invalid for the branch this PR is targeting label Feb 26, 2025
@iamniting
Member

/override ci/prow/ci-bundle-ocs-operator-bundle
/override ci/prow/images
/override ci/prow/ocs-operator-bundle-e2e-aws

Contributor

openshift-ci bot commented Feb 26, 2025

@iamniting: Overrode contexts on behalf of iamniting: ci/prow/ci-bundle-ocs-operator-bundle, ci/prow/images, ci/prow/ocs-operator-bundle-e2e-aws

In response to this:

/override ci/prow/ci-bundle-ocs-operator-bundle
/override ci/prow/images
/override ci/prow/ocs-operator-bundle-e2e-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 46c97ac into red-hat-storage:release-4.18 Feb 26, 2025
11 checks passed
@openshift-ci-robot

openshift-ci-robot commented Feb 26, 2025

@malayparida2000: [Jira Issue DFBUGS-1654](https://issues.redhat.com//browse/DFBUGS-1654): All pull requests linked via external trackers have merged:

[Jira Issue DFBUGS-1654](https://issues.redhat.com//browse/DFBUGS-1654) has been moved to the MODIFIED state.

In response to this:

PrometheusDuplicateTimestamps alert is generated due to the presence of duplicate tolerations on the osd-prepare job pods & osd-key-rotation cronjob pods. Although the root of the issue has been fixed with another fix, the existing succeeded pods need to be cleaned up to stop the alert.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@malayparida2000
Contributor Author

/cherry-pick release-4.17

@openshift-cherrypick-robot

@malayparida2000: #3056 failed to apply on top of branch "release-4.17":

Applying: Delete the succeeded pods with duplicate tolerations to avoid alert
Using index info to reconstruct a base tree...
M	controllers/storagecluster/cephcluster.go
M	controllers/util/k8sutil.go
A	metrics/vendor/github.com/red-hat-storage/ocs-operator/v4/controllers/util/k8sutil.go
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): metrics/vendor/github.com/red-hat-storage/ocs-operator/v4/controllers/util/k8sutil.go deleted in HEAD and modified in Delete the succeeded pods with duplicate tolerations to avoid alert. Version Delete the succeeded pods with duplicate tolerations to avoid alert of metrics/vendor/github.com/red-hat-storage/ocs-operator/v4/controllers/util/k8sutil.go left in tree.
Auto-merging controllers/util/k8sutil.go
Auto-merging controllers/storagecluster/cephcluster.go
CONFLICT (content): Merge conflict in controllers/storagecluster/cephcluster.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Delete the succeeded pods with duplicate tolerations to avoid alert

In response to this:

/cherry-pick release-4.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
