Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding an alert mechanism into lvm-operator #103

Merged

Conversation

aruniiird
Copy link
Contributor

Adding an alert mixins framework into LVM Operator.
Makefile is modified to generate the prometheus alert yaml file.

Added 'VolumeGroupUsageAtThreshold' alert which will be triggered if
volumegroup usage goes beyond the threshold percentage (default set to 75%)

Signed-off-by: Arun Kumar Mohan [email protected]

nbalacha
nbalacha previously approved these changes Feb 9, 2022
Copy link
Contributor

@nbalacha nbalacha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to just create a static YAML file for this?

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 9, 2022
@nbalacha nbalacha dismissed their stale review February 9, 2022 14:15

Hit approve by mistake.

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2022
@nbalacha
Copy link
Contributor

Please update the commit msg to match the commitlint requirements

Copy link
Member

@iamniting iamniting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing This with @aruniiird I understand that we can create/update the YAML files using the jsonnet tool via just changing the values in the monitoring/config.libsonnet file instead of changing the YAML file directly if we require any change in the future. Also, we can create different YAML files too using this tool for different use cases eg. Managed Services, snow, etc.

So the question to @nbalacha is do we really want to have that flexibility? if not we can create a static file too as you suggested.

If we are gonna stick to the same format we need to make sure that

  • Make target also download jsonnet tool if not already present on the machine.
  • We should use kustomize to use metadata from a different file and spec from a different file as this tool just create only spec and not metadata at all.
  • We should also upload the generated YAML file.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 21, 2022
@aruniiird aruniiird force-pushed the add-alert-mixins-framework branch from ddb5d9e to dcceb04 Compare February 23, 2022 09:37
@iamniting iamniting force-pushed the add-alert-mixins-framework branch 7 times, most recently from b1a071e to c9b6f94 Compare February 24, 2022 09:12
@aruniiird aruniiird force-pushed the add-alert-mixins-framework branch from c9b6f94 to 00b8875 Compare February 24, 2022 10:19
@iamniting iamniting force-pushed the add-alert-mixins-framework branch 4 times, most recently from 3a5e73a to 61aaa69 Compare February 25, 2022 13:01
},
annotations: {
description: 'VolumeGroup is nearing full. Data deletion or VolumeGroup expansion is required.',
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed 75% percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed 75% percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup.',
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed %(vgUsageThresholdNearFull)0.2f percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup.',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

},
annotations: {
description: 'VolumeGroup is critically full. Data deletion or VolumeGroup expansion is required.',
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed 85% percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup immediately.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed 85% percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup immediately.',
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed %(vgUsageThresholdCritical)0.2f percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup immediately.',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

annotations: {
description: 'VolumeGroup is critically full. Data deletion or VolumeGroup expansion is required.',
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed 85% percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup immediately.',
severity_level: 'error',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
severity_level: 'error',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

annotations: {
description: 'VolumeGroup is nearing full. Data deletion or VolumeGroup expansion is required.',
message: 'VolumeGroup {{ $labels.device_class }} utilization has crossed 75% percent on node {{ $labels.node }}. Free up some space or expand the VolumeGroup.',
severity_level: 'warning',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
severity_level: 'warning',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@umangachapagain umangachapagain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complexity of generating alerts using mixin seems to be more than the complexity of maintaining a single YAML file. So, for now I'd suggest using just a single static YAML file.
Once it gets fairly complex to manage, you can move to mixins or some other solutions.

@aruniiird aruniiird force-pushed the add-alert-mixins-framework branch from 61aaa69 to 868d797 Compare February 28, 2022 11:34
@aruniiird aruniiird force-pushed the add-alert-mixins-framework branch 2 times, most recently from f3ee12c to 38108b9 Compare February 28, 2022 11:52
@agarwal-mudit
Copy link
Contributor

/cherry-pick release-4.10

@openshift-cherrypick-robot

@agarwal-mudit: once the present PR merges, I will cherry-pick it on top of release-4.10 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aruniiird and others added 3 commits March 1, 2022 11:27
Adding an alert mixins framework into LVM Operator.
Makefile is modified to generate the prometheus alert yaml file.

Added 'VolumeGroupUsageAtThreshold' alert which will be triggered if
volumegroup usage goes beyond the threshold percentage (default set to 75%)

Signed-off-by: Arun Kumar Mohan <[email protected]>
@iamniting iamniting force-pushed the add-alert-mixins-framework branch from 38108b9 to 56f94ed Compare March 1, 2022 05:57
@nbalacha
Copy link
Contributor

nbalacha commented Mar 1, 2022

Complexity of generating alerts using mixin seems to be more than the complexity of maintaining a single YAML file. So, for now I'd suggest using just a single static YAML file. Once it gets fairly complex to manage, you can move to mixins or some other solutions.

We did discuss whether we should just use a static YAML and decided to put that effort in now in order to make it easier in the future.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 1, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aruniiird, nbalacha, umangachapagain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 1, 2022
@openshift-merge-robot openshift-merge-robot merged commit 53b2b41 into openshift:main Mar 1, 2022
@openshift-cherrypick-robot

@agarwal-mudit: new pull request created: #121

In response to this:

/cherry-pick release-4.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants