
Not getting debug output for "failed to sync cache: timed out waiting for the condition" errors #1983

Closed
red8888 opened this issue Mar 2, 2021 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

red8888 commented Mar 2, 2021

I set --log-level=debug, but the only error I see is "failed to sync cache: timed out waiting for the condition".

Looking at the many other issues with this error, it might be the cluster role, but I want to know how to turn on debug-level logging.

I do see level=debug in the log output, yet the only error shown is still "failed to sync cache: timed out waiting for the condition".

That can't be the debug log level, because that error provides no information and I get no other errors explaining what is happening. I'm assuming I haven't configured debug logging correctly, or there is a bug?

@red8888 red8888 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 2, 2021

SamMousa commented Mar 5, 2021

@red8888 are you using cluster roles or not?


Raffo commented Mar 15, 2021

This is documented in the FAQ: https://github.com/kubernetes-sigs/external-dns/blob/7569fa4ae537be6e46ddcb2607ea6527be994c7f/docs/faq.md#why-am-i-seeing-time-out-errors-even-though-i-have-connectivity-to-my-cluster. Closing.

@Raffo Raffo closed this as completed Mar 15, 2021

atsai1220 commented

In my RKE2 cluster, I added this subject to my ClusterRoleBinding and the error went away.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: external-dns
# New entry below <----------------------
- kind: ServiceAccount
  name: default
  namespace: default
```
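One way to verify that a binding actually grants what the informers need is to impersonate the service account with `kubectl auth can-i` (a generic verification sketch; the account and namespace names below are assumptions taken from the manifest above):

```shell
# Impersonate the service account and probe individual verbs
kubectl auth can-i list services \
  --as=system:serviceaccount:external-dns:external-dns
kubectl auth can-i watch ingresses.networking.k8s.io \
  --as=system:serviceaccount:external-dns:external-dns

# Or dump everything the account is allowed to do
kubectl auth can-i --list \
  --as=system:serviceaccount:external-dns:external-dns
```

Each probe prints `yes` or `no`, which narrows down exactly which resource/verb pair the informer cache sync is stuck on.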

@SGStino

SGStino commented Sep 21, 2021

@Raffo, can it at least log what is going wrong somehow, ideally which specific call failed? get nodes? list services? watch pods?
Because "something is wrong with the permissions" doesn't give a lot of useful information.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: [""]
  resources: ["services","endpoints","pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: dns
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: dns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: dns
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: k8s.gcr.io/external-dns/external-dns:v0.9.0
        args:
        - --source=ingress
        - --source=service
        - --provider=coredns
        - --log-level=debug # debug only
        env:
        - name: ETCD_URLS
          value: http://etcd.dns.svc.cluster.local:2379
```

The extra note about the namespaces being wrong doesn't apply here, since it's a ClusterRole, I'd figure?

Perhaps logging the API calls being made at log-level=debug wouldn't be such a bad suggestion?

Especially since the above configuration works on our local test cluster, but not our production cluster, even with @atsai1220's suggestion of adding:

```yaml
- kind: ServiceAccount
  name: default
  namespace: default
```

which looks like it shouldn't do much, since I'd expect external-dns to use its configured service account, not the default one.
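For what it's worth, a quick sanity check is to confirm which service account the pod is actually running under (label selector and namespace assumed from the Deployment above):

```shell
kubectl -n dns get pods -l app=external-dns \
  -o jsonpath='{.items[0].spec.serviceAccountName}'
```

If this prints `default` instead of `external-dns`, the subject added to the binding above would matter after all.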

Both logs look like this:

```
time="2021-09-21T12:01:28Z" level=info msg="config: { ... }"
time="2021-09-21T12:01:28Z" level=info msg="Instantiating new Kubernetes client"
time="2021-09-21T12:01:28Z" level=debug msg="apiServerURL: "
time="2021-09-21T12:01:28Z" level=debug msg="kubeConfig: "
time="2021-09-21T12:01:28Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2021-09-21T12:01:28Z" level=info msg="Created Kubernetes client https://10.96.0.1:443"
```

One continues with:

```
time="2021-09-21T12:02:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
```

The other continues with:

```
time="2021-09-21T12:01:34Z" level=debug msg="Getting service (&{172.24.2.52 0 10 0 \"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/dashboards/grafana\" false 0 1  /skydns/local/k8s/grafana/2be8baf8}) with service host (172.24.2.52)"
```

```
$ kubectl auth can-i --as=system:serviceaccount:dns:external-dns --list
Resources                                       Non-Resource URLs                     Resource Names   Verbs
selfsubjectaccessreviews.authorization.k8s.io   []                                    []               [create]
selfsubjectrulesreviews.authorization.k8s.io    []                                    []               [create]
endpoints                                       []                                    []               [get watch list]
pods                                            []                                    []               [get watch list]
services                                        []                                    []               [get watch list]
ingresses.extensions                            []                                    []               [get watch list]
ingresses.networking.k8s.io                     []                                    []               [get watch list]
                                                [/.well-known/openid-configuration]   []               [get]
                                                [/api/*]                              []               [get]
                                                [/api]                                []               [get]
                                                [/apis/*]                             []               [get]
                                                [/apis]                               []               [get]
                                                [/healthz]                            []               [get]
                                                [/healthz]                            []               [get]
                                                [/livez]                              []               [get]
                                                [/livez]                              []               [get]
                                                [/openapi/*]                          []               [get]
                                                [/openapi]                            []               [get]
                                                [/openid/v1/jwks]                     []               [get]
                                                [/readyz]                             []               [get]
                                                [/readyz]                             []               [get]
                                                [/version/]                           []               [get]
                                                [/version/]                           []               [get]
                                                [/version]                            []               [get]
                                                [/version]                            []               [get]
nodes                                           []                                    []               [list]
```

Considering it's probably #2168 I'm running into, having some more debug logging would probably help.
Since using smartdeploy/external-dns:latest from #2281 suddenly makes it work, I'd guess the failing ingress list calls were the reason for the sync timeout.
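If the missing verb really is on one of the watched resources, a hedged fix would be widening the corresponding rule. For example, the ClusterRole above grants only list on nodes, so extending it might look like the fragment below; whether nodes actually need get/watch here is an assumption, not something confirmed in this thread:

```yaml
# Hypothetical widening of the nodes rule in the external-dns ClusterRole
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]  # widened from ["list"]; assumption
```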
