Ruler: v0.25.2 no query API server unreachable #5321
The configuration of Thanos Ruler:
There was no change in the config. Only downgrading from v0.25.2 to v0.24.0 fixed the problem.
Hello 👋 Looks like there was no activity on this issue for the last two months.
I'm seeing the same issue after upgrading to v0.29.0, but here are a couple of findings I have. Additional info from the logs:
Also, can someone help me understand whether all rules are executed simultaneously?
Hi team, as per the suggestion in 5903, we upgraded to v0.29.0, and since then we have been seeing this issue. Is there any workaround, or could you please advise how to deal with it?
@bwplotka Is this issue addressed in v0.30.0? Any pointers on this issue would be helpful.
@bwplotka We have the same problem with the v0.30.0 Ruler.
@bwplotka it looks like that
Hey @Cellebyte, I'm having the same issue. Could you clarify how you fixed it? As per the Thanos documentation: "It is recommended to keep partial response as abort for alerts and that is the default as well." What exactly did you enable, and how? I'm using the ThanosRuler CRD, if that helps.
@Migueljfs you need to set it to
We cover the problem mentioned above with an additional alert that checks whether our remote Query is reachable, using vector(0) or the up metric for the remote cluster.
We have identified the issue. In our case, one of the Prometheus shards had used up all its memory and was not responding. After cleaning up its data (removing the WAL, head_chunks, and TSDB, which may cause data loss) and bringing the shards up clean, it started working.
Did anyone get a fix for the above issue? I have set Can someone help with this issue, or is there another Thanos version that handles this error?
Having a similar issue running Thanos
I changed
@bwplotka hey 👋 I ran into this issue today as well. I can resolve the Thanos Query address normally from within the Thanos Ruler container. I am using Thanos v0.29 via the Prometheus Operator as well, and the configuration seems to be passed to Thanos correctly. Any clues or hints on what might be the issue? Thanks!
Any leads on which signals to check? I'm seeing this on v0.37.2 as well.
One user shared that their Rulers were having hiccups finding the right Querier endpoints, resulting in gaps:
Apparently reverting to v0.24.0 resolved the issue. This seems to be a stateful Ruler.
We will need more information, e.g.: