You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I set the heartbeat interval to 3 seconds instead of default 5 seconds.
We can see the most time it will send heatbeat every 3 seconds as configured. But I also find that between 2025-02-19 19:42:50.947 and 2025-02-19 19:43:51.502, it is more than 60 seconds.
Arthas debug
I trace the nacos client hearbeat method. And find sometime it takes tens of seconds to get a response from nacos server
after changing above configuration, issue occur less frequently. But it is not gone.
Any suggestions?? Thanks.
Expected behavior
A clear and concise description of what you expected to happen.
Actually behavior
A clear and concise description of what you actually to happen.
How to Reproduce
Steps to reproduce the behavior:
Go to '...'
Click on '....'
Scroll down to '....'
See errors
Desktop (please complete the following information):
OS: [centos]
Version [e.g. nacos-server 2.2.3, nacos-client 1.4.2]
Module [e.g. naming]
SDK [e.g. original]
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered:
ThePeterQin
changed the title
nacos hearbeat issue: microservice health state is constantly changing while I am sure it is health
nacos heartbeat issue: microservice health state is constantly changing while I am sure it is health
Feb 19, 2025
Describe the bug
issue description
Hi nacos experts,
I am using nacos server 2.23。 standalone mode. nacos client version is 1.4.2
We have many microservcies but only one of them has this issue.
I get 503 from the webpage for the API of this microsevices.
But I am sure the microserivces is heath.
nacos server log
I check the nacos logs ,two log
naming-server.log
2025-02-19 19:33:29,507 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:33:29,621 INFO Client connection 10.2.14.217:33030#true connect
below is the grep output
[root@gvadmin logs]# grep 33030 naming-server.log |grep dis
2025-02-19 19:07:04,447 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:08:34,450 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:09:24,450 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:10:04,451 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:11:24,456 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:14:24,465 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:15:19,466 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:17:29,468 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:19:49,476 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:22:44,484 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:24:09,485 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:25:59,491 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:29:09,496 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:30:49,502 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:32:49,506 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:33:29,507 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:35:39,522 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:36:49,523 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
2025-02-19 19:38:34,527 INFO Client connection 10.2.14.217:33030#true disconnect, remove instances and subscribers
naming-event.log
I also check the naming-event.log
2025-02-19 13:55:39,277 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739944509276
2025-02-19 13:57:06,169 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739944595156
2025-02-19 13:58:29,181 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739944675508
2025-02-19 13:59:33,846 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739944743845
2025-02-19 14:01:11,792 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739944838972
2025-02-19 14:03:21,752 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739944968408
2025-02-19 14:04:27,376 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945037374
2025-02-19 14:05:42,372 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945110905
2025-02-19 14:06:20,943 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945150942
2025-02-19 14:06:58,701 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945188700
2025-02-19 14:07:48,971 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945237444
2025-02-19 14:08:58,091 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945308090
2025-02-19 14:09:56,895 INFO {POS} {IP-DISABLED} valid: 10.2.14.217:33030@DEFAULT@acx-collector, region: unknown, msg: client last beat: 1739945366894
nacos client log
I enable nacos client log
2025-02-19 19:42:41.874 [TraceId:,SpanId:][DEBUG][com.alibaba.nacos.naming.beat.sender][com.alibaba.nacos.client.naming:421] com.alibaba.nacos.client.naming - [BEAT] gridview sending beat to server: BeatInfo{port=33089, ip='10.10.10.10', weight=1.0, serviceName='DEFAULT_GROUP@@acx-collector', cluster='DEFAULT', metadata={preserved.heart.beat.timeout=60000, preserved.ip.delete.timeout=120000, preserved.register.source=SPRING_CLOUD, preserved.heart.beat.interval=3000}, scheduled=false, period=3000, stopped=false}
2025-02-19 19:42:44.944 [TraceId:,SpanId:][DEBUG][com.alibaba.nacos.naming.beat.sender][com.alibaba.nacos.client.naming:421] com.alibaba.nacos.client.naming - [BEAT] gridview sending beat to server: BeatInfo{port=33089, ip='10.10.10.10', weight=1.0, serviceName='DEFAULT_GROUP@@acx-collector', cluster='DEFAULT', metadata={preserved.heart.beat.timeout=60000, preserved.ip.delete.timeout=120000, preserved.register.source=SPRING_CLOUD, preserved.heart.beat.interval=3000}, scheduled=false, period=3000, stopped=false}
2025-02-19 19:42:47.945 [TraceId:,SpanId:][DEBUG][com.alibaba.nacos.naming.beat.sender][com.alibaba.nacos.client.naming:421] com.alibaba.nacos.client.naming - [BEAT] gridview sending beat to server: BeatInfo{port=33089, ip='10.10.10.10', weight=1.0, serviceName='DEFAULT_GROUP@@acx-collector', cluster='DEFAULT', metadata={preserved.heart.beat.timeout=60000, preserved.ip.delete.timeout=120000, preserved.register.source=SPRING_CLOUD, preserved.heart.beat.interval=3000}, scheduled=false, period=3000, stopped=false}
2025-02-19 19:42:50.947 [TraceId:,SpanId:][DEBUG][com.alibaba.nacos.naming.beat.sender][com.alibaba.nacos.client.naming:421] com.alibaba.nacos.client.naming - [BEAT] gridview sending beat to server: BeatInfo{port=33089, ip='10.10.10.10', weight=1.0, serviceName='DEFAULT_GROUP@@acx-collector', cluster='DEFAULT', metadata={preserved.heart.beat.timeout=60000, preserved.ip.delete.timeout=120000, preserved.register.source=SPRING_CLOUD, preserved.heart.beat.interval=3000}, scheduled=false, period=3000, stopped=false}
2025-02-19 19:43:51.502 [TraceId:,SpanId:][INFO][com.alibaba.nacos.client.naming.updater][com.alibaba.nacos.client.naming:81] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=33089, ip='10.10.10.10', weight=1.0, serviceName='DEFAULT_GROUP@@acx-collector', cluster='DEFAULT', metadata={preserved.heart.beat.timeout=60000, preserved.ip.delete.timeout=120000, preserved.register.source=SPRING_CLOUD, preserved.heart.beat.interval=3000}, scheduled=false, period=3000, stopped=false} to beat map.
I set the heartbeat interval to 3 seconds instead of default 5 seconds.
We can see the most time it will send heatbeat every 3 seconds as configured. But I also find that between 2025-02-19 19:42:50.947 and 2025-02-19 19:43:51.502, it is more than 60 seconds.
Arthas debug
I trace the nacos client hearbeat method. And find sometime it takes tens of seconds to get a response from nacos server
---ts=2025-02-19 19:56:22;thread_name=com.alibaba.nacos.naming.beat.sender;id=1981;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@5c7fa833
---[1.686438ms] com.alibaba.nacos.client.naming.net.NamingProxy:callServer()+---[1.44% 0.02433ms ] com.alibaba.nacos.client.naming.net.NamingProxy:injectSecurityInfo() #588
+---[1.92% 0.032399ms ] com.alibaba.nacos.client.naming.net.NamingProxy:builderHeader() #589
+---[0.52% 0.008784ms ] com.alibaba.nacos.common.utils.IPUtil:containsPort() #595
+---[0.31% 0.005271ms ] com.alibaba.nacos.client.naming.net.NamingHttpClientManager:getInstance() #598
+---[0.28% 0.004714ms ] com.alibaba.nacos.client.naming.net.NamingHttpClientManager:getPrefix() #598
+---[0.30% 0.005135ms ] com.alibaba.nacos.common.http.param.Query:newInstance() #603
+---[0.47% 0.008003ms ] com.alibaba.nacos.common.http.param.Query:initParams() #603
+---[82.76% 1.395717ms ] com.alibaba.nacos.common.http.client.NacosRestTemplate:exchangeForm() #603
+---[0.41% 0.006901ms ] com.alibaba.nacos.common.http.HttpRestResult:getCode() #606
+---[0.56% 0.009461ms ] com.alibaba.nacos.client.monitor.MetricsMonitor:getNamingRequestMonitor() #606
+---[0.60% 0.01019ms ] io.prometheus.client.Histogram$Child:observe() #607
+---[0.30% 0.005074ms ] com.alibaba.nacos.common.http.HttpRestResult:ok() #609
`---[0.30% 0.005124ms ] com.alibaba.nacos.common.http.HttpRestResult:getData() #610
---ts=2025-02-19 19:56:25;thread_name=com.alibaba.nacos.naming.beat.sender;id=1981;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@5c7fa833
---[1.527801ms] com.alibaba.nacos.client.naming.net.NamingProxy:callServer()+---[2.56% 0.039069ms ] com.alibaba.nacos.client.naming.net.NamingProxy:injectSecurityInfo() #588
+---[1.90% 0.028974ms ] com.alibaba.nacos.client.naming.net.NamingProxy:builderHeader() #589
+---[0.51% 0.007851ms ] com.alibaba.nacos.common.utils.IPUtil:containsPort() #595
+---[0.65% 0.009856ms ] com.alibaba.nacos.client.naming.net.NamingHttpClientManager:getInstance() #598
+---[0.42% 0.006376ms ] com.alibaba.nacos.client.naming.net.NamingHttpClientManager:getPrefix() #598
+---[0.34% 0.005146ms ] com.alibaba.nacos.common.http.param.Query:newInstance() #603
+---[0.53% 0.008146ms ] com.alibaba.nacos.common.http.param.Query:initParams() #603
+---[80.94% 1.236647ms ] com.alibaba.nacos.common.http.client.NacosRestTemplate:exchangeForm() #603
+---[0.41% 0.006272ms ] com.alibaba.nacos.common.http.HttpRestResult:getCode() #606
+---[0.51% 0.007815ms ] com.alibaba.nacos.client.monitor.MetricsMonitor:getNamingRequestMonitor() #606
+---[0.60% 0.009191ms ] io.prometheus.client.Histogram$Child:observe() #607
+---[0.31% 0.004791ms ] com.alibaba.nacos.common.http.HttpRestResult:ok() #609
`---[0.27% 0.004196ms ] com.alibaba.nacos.common.http.HttpRestResult:getData() #610
---ts=2025-02-19 19:56:28;thread_name=com.alibaba.nacos.naming.beat.sender;id=1981;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@5c7fa833
---[29588.063595ms] com.alibaba.nacos.client.naming.net.NamingProxy:callServer()+---[0.00% 0.037133ms ] com.alibaba.nacos.client.naming.net.NamingProxy:injectSecurityInfo() #588
+---[0.00% 0.048256ms ] com.alibaba.nacos.client.naming.net.NamingProxy:builderHeader() #589
+---[0.00% 0.008791ms ] com.alibaba.nacos.common.utils.IPUtil:containsPort() #595
---ts=2025-02-19 19:56:27;thread_name=com.alibaba.nacos.client.naming.updater;id=1de2;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@5c7fa833
---[31279.890666ms] com.alibaba.nacos.client.naming.net.NamingProxy:callServer()+---[0.00% 0.026814ms ] com.alibaba.nacos.client.naming.net.NamingProxy:injectSecurityInfo() #588
+---[0.00% 0.031838ms ] com.alibaba.nacos.client.naming.net.NamingProxy:builderHeader() #589
+---[0.00% 0.008217ms ] com.alibaba.nacos.common.utils.IPUtil:containsPort() #595
+---[0.00% 0.004847ms ] com.alibaba.nacos.client.naming.net.NamingHttpClientManager:getInstance() #598
+---[0.00% 0.00501ms ] com.alibaba.nacos.client.naming.net.NamingHttpClientManager:getPrefix() #598
+---[0.00% 0.00509ms ] com.alibaba.nacos.common.http.param.Query:newInstance() #603
+---[0.00% 0.008567ms ] com.alibaba.nacos.common.http.param.Query:initParams() #603
+---[100.00% 31279.576339ms ] com.alibaba.nacos.common.http.client.NacosRestTemplate:exchangeForm() #603
+---[0.00% 0.006175ms ] com.alibaba.nacos.common.http.HttpRestResult:getCode() #606
+---[0.00% 0.011123ms ] com.alibaba.nacos.client.monitor.MetricsMonitor:getNamingRequestMonitor() #606
+---[0.00% 0.011019ms ] io.prometheus.client.Histogram$Child:observe() #607
+---[0.00% 0.010112ms ] com.alibaba.nacos.common.http.HttpRestResult:ok() #609
`---[0.00% 0.005058ms ] com.alibaba.nacos.common.http.HttpRestResult:getData() #610
What we have tried:
we changed below configurations,
discovery:
enabled: true
namespace: ${nacos_namespace}
ip: 10.10.10.10
register-enabled: true
heart-beat-interval: 3000
heart-beat-timeout: 60000
ip-delete-timeout: 120000
after changing above configuration, issue occur less frequently. But it is not gone.
Any suggestions?? Thanks.
Expected behavior
A clear and concise description of what you expected to happen.
Actually behavior
A clear and concise description of what you actually to happen.
How to Reproduce
Steps to reproduce the behavior:
Desktop (please complete the following information):
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: