Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerting: Do not store series values from past evaluations in state manager for no reason #87525

Merged
merged 1 commit into from
May 9, 2024

Conversation

alexweav
Copy link
Contributor

@alexweav alexweav commented May 8, 2024

What is this feature?

States in the state manager store the results of several prior evaluations. This includes values, which is the measured dataframe results for that series:

currentState.Results = append(currentState.Results, Evaluation{
EvaluationTime: result.EvaluatedAt,
EvaluationState: result.State,
Values: NewEvaluationValues(result.Values),
Condition: alertRule.Condition,
})
currentState.LastEvaluationString = result.EvaluationString
currentState.TrimResults(alertRule)

But, the only code that ever reads this field, only looks at the most recent 1 evaluation. Any values in that slice beyond the last one are stored for no reason:

if len(a.Results) <= 0 {
return nil
}
lastResult := a.Results[len(a.Results)-1]

The state manager is storing every measured value for every series in every query, including intermediate dataframes. Basically, all queried dataframes for all rules, for multiplpe evaluations in the past, needlessly.

The length of the history is based on For - and can be raised arbitrarily high with a long for. Even for rules with a for of 0, the length is hardcoded to 10 evaluations. For a typical rule with 3 query nodes, this results in 30 frames worth of series kept in memory (spread across states) when we only use 3 of them.

By creating a long For with many dimensions, you can write a rule that eventually consumes arbitrary memory and OOMs grafana provided the process does not restart.

This is a truly incredible amount of data in some cases on instances even with simpler alerting usage it can quickly dominate the entire memory space of unified alerting.

Why do we need this feature?

Likely, most memory consumption of unified alerting across the board is occupied by this throwaway data.

This PR simply stores the most recent single evaluation, rather than a running slice of them. The result is transparent to users, but we stop hanging on to data that we won't use.

Which issue(s) does this PR fix?:
n/a

Special notes for your reviewer:

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

@alexweav alexweav added this to the 11.1.x milestone May 8, 2024
@alexweav alexweav force-pushed the alexweav/do-not-store-old-results branch from c47b011 to ad83b8c Compare May 8, 2024 18:30
@alexweav alexweav changed the title Alerting: Do not store stale intermediate results in state manager Alerting: Do not store stale values from past evaluations in state manager May 8, 2024
@alexweav alexweav force-pushed the alexweav/do-not-store-old-results branch from ad83b8c to 142ac24 Compare May 8, 2024 19:07
@alexweav
Copy link
Contributor Author

alexweav commented May 8, 2024

/deploy-to-hg

@ephemeral-instances-bot
Copy link

  • Preparing your instance. A comment containing your instance's url will be added to this PR when the instance is ready.
  • Your instance will be ready in ~10 minutes. Follow the workflow progress
  • Slack channel: #proj-ephemeral-hg-instances
  • Building instance with alexweav/do-not-store-old-results oss branch and main enterprise branch. How to choose a branch

@ephemeral-instances-bot
Copy link

@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from PoorlyDefinedBehaviour May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from PoorlyDefinedBehaviour May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@grafana grafana deleted a comment from ephemeral-instances-bot bot May 8, 2024
@alexweav alexweav changed the title Alerting: Do not store stale values from past evaluations in state manager Alerting: Do not store series values from past evaluations in state manager for no reason May 8, 2024
@alexweav alexweav marked this pull request as ready for review May 9, 2024 13:47
@alexweav alexweav requested review from a team and rwwiv and removed request for a team May 9, 2024 13:47
Copy link
Contributor

@yuri-tceretian yuri-tceretian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement. It's long overdue! LGTM

Copy link
Member

@JacobsonMT JacobsonMT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@alexweav alexweav merged commit a6a9ab4 into main May 9, 2024
19 checks passed
@alexweav alexweav deleted the alexweav/do-not-store-old-results branch May 9, 2024 20:51
@alexweav alexweav added the backport v11.0.x Mark PR for automatic backport to v11.0.x label May 14, 2024
grafana-delivery-bot bot pushed a commit that referenced this pull request May 14, 2024
…anager for no reason (#87525)

Do not store previous execution results on states

(cherry picked from commit a6a9ab4)
alexweav added a commit that referenced this pull request May 14, 2024
…in state manager for no reason (#87845)

Alerting: Do not store series values from past evaluations in state manager for no reason (#87525)

Do not store previous execution results on states

(cherry picked from commit a6a9ab4)

Co-authored-by: Alexander Weaver <[email protected]>
@kevinwcyu kevinwcyu modified the milestones: 11.1.x, 11.1.0 Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants