
issue with calico 3.29 - nodeport only accessible from the node which the pod is located on and internal connections between pods located on different nodes also fail #11940

Open
sinawic opened this issue Feb 3, 2025 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

sinawic commented Feb 3, 2025

What happened?

I had a completely standard, default installation with 3 nodes:
1 control-plane node
2 worker nodes

I didn't change any configuration. I'm reusing the inventory files from my previous (working) cluster, which were generated with python3 contrib/inventory_builder/inventory.py on the previous release, and I commented out the kubeadm_patches entry in group_vars/k8s_cluster/k8s-cluster.yml.

I also get the following errors during the installation, although the installation itself succeeds:

fatal: [node2]: FAILED! => {"msg": "The conditional check '(modprobe_conntrack_module|default({'rc': 1})).rc != 0' failed. The error was: error while evaluating conditional ((modprobe_conntrack_module|default({'rc': 1})).rc != 0): 'dict object' has no attribute 'rc'. 'dict object' has no attribute 'rc'\n\nThe error appears to be in '/home/ubuntu/kubespray/roles/kubernetes/node/tasks/main.yml': line 121, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Modprobe conntrack module\n  ^ here\n"}
...ignoring
fatal: [node1]: FAILED! => {"msg": "The conditional check '(modprobe_conntrack_module|default({'rc': 1})).rc != 0' failed. The error was: error while evaluating conditional ((modprobe_conntrack_module|default({'rc': 1})).rc != 0): 'dict object' has no attribute 'rc'. 'dict object' has no attribute 'rc'\n\nThe error appears to be in '/home/ubuntu/kubespray/roles/kubernetes/node/tasks/main.yml': line 121, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Modprobe conntrack module\n  ^ here\n"}
...ignoring
fatal: [node3]: FAILED! => {"msg": "The conditional check '(modprobe_conntrack_module|default({'rc': 1})).rc != 0' failed. The error was: error while evaluating conditional ((modprobe_conntrack_module|default({'rc': 1})).rc != 0): 'dict object' has no attribute 'rc'. 'dict object' has no attribute 'rc'\n\nThe error appears to be in '/home/ubuntu/kubespray/roles/kubernetes/node/tasks/main.yml': line 121, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Modprobe conntrack module\n  ^ here\n"}
...ignoring

and

fatal: [node1]: FAILED! => {"changed": false, "cmd": ["/usr/local/bin/calicoctl.sh", "get", "felixconfig", "default", "-o", "json"], "delta": "0:00:00.157751", "end": "2025-02-03 10:02:16.714554", "msg": "non-zero return code", "rc": 1, "start": "2025-02-03 10:02:16.556803", "stderr": "resource does not exist: FelixConfiguration(default) with error: felixconfigurations.crd.projectcalico.org \"default\" not found", "stderr_lines": ["resource does not exist: FelixConfiguration(default) with error: felixconfigurations.crd.projectcalico.org \"default\" not found"], "stdout": "null", "stdout_lines": ["null"]}
...ignoring

fatal: [node1]: FAILED! => {"changed": false, "cmd": ["/usr/local/bin/calicoctl.sh", "get", "ippool", "default-pool", "-o", "json"], "delta": "0:00:00.151831", "end": "2025-02-03 10:02:18.738497", "msg": "non-zero return code", "rc": 1, "start": "2025-02-03 10:02:18.586666", "stderr": "resource does not exist: IPPool(default-pool) with error: ippools.crd.projectcalico.org \"default-pool\" not found", "stderr_lines": ["resource does not exist: IPPool(default-pool) with error: ippools.crd.projectcalico.org \"default-pool\" not found"], "stdout": "null", "stdout_lines": ["null"]}
...ignoring

fatal: [node1]: FAILED! => {"changed": false, "cmd": ["/usr/local/bin/calicoctl.sh", "get", "bgpconfig", "default", "-o", "json"], "delta": "0:00:00.112038", "end": "2025-02-03 10:02:22.153068", "msg": "non-zero return code", "rc": 1, "start": "2025-02-03 10:02:22.041030", "stderr": "resource does not exist: BGPConfiguration(default) with error: bgpconfigurations.crd.projectcalico.org \"default\" not found", "stderr_lines": ["resource does not exist: BGPConfiguration(default) with error: bgpconfigurations.crd.projectcalico.org \"default\" not found"], "stdout": "null", "stdout_lines": ["null"]}
...ignoring
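These calicoctl failures seem to just mean the default Calico resources did not exist yet when the playbook queried them. As a rough sketch (resource names taken from the errors above), this is how I check what actually ended up on the cluster afterwards:

# were the Calico CRDs installed at all?
kubectl get crd | grep projectcalico.org

# re-run the same queries the playbook ran, once the cluster has settled
/usr/local/bin/calicoctl.sh get felixconfig default -o yaml
/usr/local/bin/calicoctl.sh get ippool default-pool -o yaml
/usr/local/bin/calicoctl.sh get bgpconfig default -o yaml

# the conntrack module the other failing task checks for
lsmod | grep nf_conntrack || sudo modprobe nf_conntrack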

What did you expect to happen?

The installation completes successfully, a NodePort service is reachable on all three nodes, and pods running on different nodes can communicate with each other.
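
As a concrete check, this is a minimal sketch of what I expect to pass (the nginx deployment and the name web are just placeholders I use for testing, not part of the original setup):

kubectl create deployment web --image=nginx
kubectl expose deployment web --type=NodePort --port=80
NODEPORT=$(kubectl get svc web -o jsonpath='{.spec.ports[0].nodePort}')
# every node should answer, regardless of where the pod was scheduled
curl -m 5 http://10.10.10.100:$NODEPORT
curl -m 5 http://10.10.10.101:$NODEPORT
curl -m 5 http://10.10.10.102:$NODEPORT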

How can we reproduce it (as minimally and precisely as possible)?

Because I used everything at its defaults, the steps are:

git clone https://github.com/kubernetes-sigs/kubespray.git
VENVDIR=kubespray-venv
KUBESPRAYDIR=kubespray
sudo apt install python3-pip
sudo apt install python3.12-venv
python3 -m venv $VENVDIR
source $VENVDIR/bin/activate
cd $KUBESPRAYDIR

# I ran the following three commands on the previous version, 'release-2.26':
# cp -rfp inventory/sample inventory/mycluster
# declare -a IPS=( 10.10.10.100 10.10.10.101 10.10.10.102 )
# CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

# add the following to inventory/mycluster/hosts.yaml
# all:
#   vars:
#     ansible_user: root

# comment out the kubeadm_patches entry

ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
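
For reference, the hosts.yaml I reuse looks roughly like the sketch below (node names and IPs as above; the group layout is my approximation of the old inventory_builder output, so treat the details as assumptions, the attached ans.json is authoritative):

cat > inventory/mycluster/hosts.yaml <<'EOF'
all:
  vars:
    ansible_user: root
  hosts:
    node1: {ansible_host: 10.10.10.100, ip: 10.10.10.100, access_ip: 10.10.10.100}
    node2: {ansible_host: 10.10.10.101, ip: 10.10.10.101, access_ip: 10.10.10.101}
    node3: {ansible_host: 10.10.10.102, ip: 10.10.10.102, access_ip: 10.10.10.102}
  children:
    kube_control_plane:
      hosts: {node1: }
    kube_node:
      hosts: {node2: , node3: }
    etcd:
      hosts: {node1: }
    k8s_cluster:
      children: {kube_control_plane: , kube_node: }
EOF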

OS

Linux node1 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Version of Ansible

ansible [core 2.16.14]
config file = /home/ubuntu/kubespray/ansible.cfg
configured module search path = ['/home/ubuntu/kubespray/library']
ansible python module location = /home/ubuntu/kubespray-venv/lib/python3.12/site-packages/ansible
ansible collection location = /home/ubuntu/.ansible/collections:/usr/share/ansible/collections
executable location = /home/ubuntu/kubespray-venv/bin/ansible
python version = 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] (/home/ubuntu/kubespray-venv/bin/python3)
jinja version = 3.1.5
libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

59e1638

Network plugin used

calico

Full inventory with variables

ans.json

Command used to invoke ansible

ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml

Output of ansible run

The installation completes successfully. I can interact with the cluster using kubectl, and I can deploy pods, services, etc.

Anything else we need to know

No response

@sinawic sinawic added the kind/bug Categorizes issue or PR as related to a bug. label Feb 3, 2025

sinawic commented Feb 3, 2025

I also tried the inventory/sample with the latest changes. I still get the same errors during installation, and the installation succeeds, but again I can only reach the NodePort from the node the pod is running on, not from the other nodes, even though the port is open on all three nodes.
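
A sketch of how I double-check this (service and pod names in angle brackets are placeholders):

# which node is actually running the backend pod
kubectl get pods -o wide

# a NodePort only answers on every node when externalTrafficPolicy is Cluster (the default)
kubectl get svc -o custom-columns=NAME:.metadata.name,TYPE:.spec.type,ETP:.spec.externalTrafficPolicy

# pod-to-pod across nodes
kubectl exec -it <pod-on-node2> -- curl -m 5 http://<pod-ip-on-node3>:80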


sinawic commented Feb 4, 2025

Here are the nmap port scan results for my three nodes:

sudo nmap 10.10.10.102 -p 30000-33000
Starting Nmap 7.94SVN ( https://nmap.org ) at 2025-02-04 12:13 +0330
Nmap scan report for 10.10.10.102
Host is up (0.00030s latency).
Not shown: 3000 closed tcp ports (reset)
PORT      STATE    SERVICE
32317/tcp filtered unknown
MAC Address: 00:0C:29:74:5F:AA (VMware)
Nmap done: 1 IP address (1 host up) scanned in 1.55 seconds


sudo nmap 10.10.10.101 -p 30000-33000
Starting Nmap 7.94SVN ( https://nmap.org ) at 2025-02-04 12:14 +0330
Nmap scan report for 10.10.10.101
Host is up (0.00026s latency).
Not shown: 3000 closed tcp ports (reset)
PORT      STATE SERVICE
32317/tcp open  unknown
MAC Address: 00:0C:29:7C:9D:59 (VMware)
Nmap done: 1 IP address (1 host up) scanned in 0.36 seconds


sudo nmap 10.10.10.100 -p 30000-33000
Starting Nmap 7.94SVN ( https://nmap.org ) at 2025-02-04 12:14 +0330
Nmap scan report for 10.10.10.100
Host is up (0.00031s latency).
Not shown: 3000 closed tcp ports (reset)
PORT      STATE    SERVICE
32317/tcp filtered unknown
MAC Address: 00:0C:29:0B:72:35 (VMware)
Nmap done: 1 IP address (1 host up) scanned in 1.55 seconds
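
Only 10.10.10.101 reports the NodePort (32317) as open, which is presumably the node the pod is scheduled on; on the other two nodes the same port shows up as filtered. A quick way to correlate the two (output omitted):

kubectl get pods -o wide   # shows which node hosts the backend pod
kubectl get svc            # shows the NodePort assigned to the service (32317 here)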


sinawic commented Feb 5, 2025

I can confirm that Calico v3.28.1 works correctly, and any version above it does not.

I tried everything and every fix I could find; none of them worked.

My Kubernetes knowledge is far from perfect, but at least it got a little better through this.
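
For now I pin Calico back as a workaround. A rough sketch, assuming calico_version is still the override variable at this Kubespray commit and that v3.28.1 is still in its supported/checksummed versions:

# inventory/mycluster/group_vars/k8s_cluster/k8s-net-calico.yml
echo 'calico_version: v3.28.1' >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-calico.yml
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml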

@sinawic sinawic changed the title nodeport only accessible from the node which the pod is located on and internal connections between pods located on different nodes also fail issue with calico 3.29 - nodeport only accessible from the node which the pod is located on and internal connections between pods located on different nodes also fail Feb 11, 2025