Which component are you using?:
/area cluster-autoscaler
What version of the component are you using?:
Component version: v1.26.2
What k8s version are you using (kubectl version)?:
1.30.0
What environment is this in?:
AWS
What did you expect to happen?:
Cluster autoscaler to scale down nodes when no longer needed
What happened instead?:
Cluster Autoscaler was getting OOMKilled (we were running about ~340 nodes with a 0.6 GiB memory limit).
Whenever a new Cluster Autoscaler pod is created, the scale-down cooldown timer resets to 0s. Cluster Autoscaler was continuously getting OOMKilled after one loop and thus never managed to scale down nodes.
How to reproduce it (as minimally and precisely as possible):
Install Cluster Autoscaler with a memory limit, then scale up nodes within the existing node groups until it gets OOMKilled.
Anything else we need to know?:
Getting OOMKilled at ~340 nodes with a 0.6 GiB memory limit is very surprising, but what made this bug truly devastating is that each new pod was able to run one loop, which could have scaled down the nodes, except the cooldown timer had restarted. This makes me question the high-availability story of Cluster Autoscaler. I'd like to submit a fix where this data is added as an annotation to the node itself, making the deployment stateless in that regard.
The relevant configuration: the memory limit was 600 MiB and the scale-down cooldown was 10 minutes.
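To make the proposal concrete, here is a minimal sketch (assuming client-go; this is not the actual Cluster Autoscaler code, and the annotation key and function names are made up for illustration) of persisting the "unneeded since" timestamp on the node, so a freshly restarted autoscaler pod can continue the cooldown instead of starting it over:

```go
// Package scaledownstate sketches persisting scale-down cooldown state as a
// node annotation so it survives autoscaler pod restarts (e.g. OOMKills).
package scaledownstate

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// unneededSinceAnnotation is a hypothetical annotation key.
const unneededSinceAnnotation = "cluster-autoscaler.example.com/unneeded-since"

// markUnneeded records the time a node was first considered unneeded,
// keeping the existing timestamp if one is already present.
func markUnneeded(ctx context.Context, client kubernetes.Interface, nodeName string, now time.Time) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if _, ok := node.Annotations[unneededSinceAnnotation]; ok {
		return nil // timer already running; don't reset it
	}
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations[unneededSinceAnnotation] = now.UTC().Format(time.RFC3339)
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

// cooldownElapsed reports whether the scale-down cooldown (e.g. 10m) has
// passed for a node, based on the persisted annotation rather than in-memory
// state, so a restarted autoscaler pod reaches the same conclusion.
func cooldownElapsed(node *corev1.Node, cooldown time.Duration, now time.Time) bool {
	raw, ok := node.Annotations[unneededSinceAnnotation]
	if !ok {
		return false
	}
	since, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false
	}
	return now.Sub(since) >= cooldown
}
```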