Alerting: Do not store series values from past evaluations in state manager for no reason #87525
Merged
yuri-tceretian approved these changes on May 9, 2024:
Nice improvement. It's long overdue! LGTM
JacobsonMT approved these changes on May 9, 2024:
LGTM 🚀
alexweav added a commit that referenced this pull request on May 14, 2024:
…in state manager for no reason (#87845)
Alerting: Do not store series values from past evaluations in state manager for no reason (#87525)
Do not store previous execution results on states
(cherry picked from commit a6a9ab4)
Co-authored-by: Alexander Weaver
Labels: add to changelog, area/alerting (Grafana Alerting), area/backend, backport v11.0.x (Mark PR for automatic backport to v11.0.x), type/bug, type/performance
What is this feature?
States in the state manager store the results of several prior evaluations. These results include Values, the measured dataframe results for that series:
grafana/pkg/services/ngalert/state/manager.go
Lines 361 to 368 in 6e4d35e
However, the only code that ever reads this field looks only at the most recent evaluation. Every entry in that slice before the last one is stored for no reason:
grafana/pkg/services/ngalert/state/state.go
Lines 467 to 471 in 6e4d35e
The state manager stores every measured value for every series in every query, including intermediate dataframes. In effect, it needlessly keeps all queried dataframes for all rules, across multiple past evaluations.
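To make the problem concrete, here is a minimal Go sketch of the pre-change pattern. `Evaluation`, `State`, `addResult`, and `latestValues` are hypothetical simplifications, not Grafana's actual types or the code referenced above: a state appends each evaluation to a bounded slice, yet the only reader looks at the last element.

```go
package main

import (
	"fmt"
	"time"
)

// Evaluation is a hypothetical, simplified stand-in for one evaluation result
// stored on a state. Values holds one measured number per query node (refID).
type Evaluation struct {
	EvaluationTime time.Time
	Values         map[string]float64
}

// State is a hypothetical, simplified alert-instance state that keeps a
// running slice of past evaluations, as described before this change.
type State struct {
	Results []Evaluation
}

// addResult appends an evaluation and trims the history to at most maxLen
// entries. Copying into a fresh slice lets trimmed entries be garbage collected.
func (s *State) addResult(e Evaluation, maxLen int) {
	s.Results = append(s.Results, e)
	if len(s.Results) > maxLen {
		s.Results = append([]Evaluation(nil), s.Results[len(s.Results)-maxLen:]...)
	}
}

// latestValues mimics the only reader: it looks at the last element only,
// so every earlier entry in Results is dead weight.
func (s *State) latestValues() map[string]float64 {
	if len(s.Results) == 0 {
		return nil
	}
	return s.Results[len(s.Results)-1].Values
}

func main() {
	s := &State{}
	for i := 0; i < 25; i++ {
		s.addResult(Evaluation{
			EvaluationTime: time.Unix(int64(i*60), 0),
			Values:         map[string]float64{"A": float64(i), "B": float64(2 * i), "C": float64(3 * i)},
		}, 10)
	}
	// 10 evaluations (30 values) are retained, but only the latest 3 are ever read.
	fmt.Println(len(s.Results), s.latestValues()["A"])
}
```

Running it prints `10 24`: ten evaluations with three values each stay in memory, while only the final three values are ever consulted.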
The length of the history is based on the rule's For duration, and it can be raised arbitrarily high with a long For. Even for rules with a For of 0, the length is hardcoded to 10 evaluations. For a typical rule with 3 query nodes, this results in 30 frames' worth of series kept in memory (spread across states) when we only use 3 of them.
By creating a rule with a long For and many dimensions, you can write a rule that eventually consumes arbitrary memory and OOMs Grafana, provided the process does not restart. In some cases this is a truly incredible amount of data, and even on instances with simpler alerting usage it can quickly dominate the entire memory space of unified alerting.
Why do we need this feature?
Most of the memory consumed by unified alerting, across the board, is likely occupied by this throwaway data.
This PR simply stores the most recent single evaluation, rather than a running slice of them. The result is transparent to users, but we stop hanging on to data that we won't use.
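A minimal sketch of the post-change shape, under the same simplifying assumptions: the state holds a single pointer to its latest evaluation instead of a slice. The names `State`, `LatestResult`, `setResult`, and `latestValues` are hypothetical and may not match the merged code.

```go
package main

import "fmt"

// Evaluation is a hypothetical, simplified evaluation result with one
// measured number per query node (refID).
type Evaluation struct {
	Values map[string]float64
}

// State keeps only the most recent evaluation rather than a running slice.
type State struct {
	LatestResult *Evaluation
}

// setResult replaces the previous evaluation; the old one becomes unreachable
// and can be garbage collected.
func (s *State) setResult(e Evaluation) {
	s.LatestResult = &e
}

// latestValues returns the values of the most recent evaluation, or nil if
// the state has not been evaluated yet.
func (s *State) latestValues() map[string]float64 {
	if s.LatestResult == nil {
		return nil
	}
	return s.LatestResult.Values
}

func main() {
	s := &State{}
	for i := 0; i < 25; i++ {
		s.setResult(Evaluation{Values: map[string]float64{"A": float64(i)}})
	}
	fmt.Println(s.latestValues()["A"])
}
```

Running it prints `24`, the same value the slice-based reader returned, while memory per state stays constant regardless of the rule's For duration.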
Which issue(s) does this PR fix?:
n/a
Special notes for your reviewer:
Please check that: