[BUG] MTU larger than 1500 bytes does not work over Linux bridges #1985

Open
ipspace opened this issue Feb 26, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@ipspace
Owner

ipspace commented Feb 26, 2025

We're wasting tons of time arguing about MTU settings and ignoring the elephant in the room: a larger MTU won't work unless we also adjust the underlying virtualization infrastructure.

FWIW, it would be "great fun" troubleshooting BGP sessions or large OSPF networks :((

Lab topology

You can use any two devices as long as one of them is a host (forcing the Linux bridge to be used).

nodes:
  r:
    device: iosv
  h:
    device: linux

links:
- r:
  h:
  mtu: 1600

To Reproduce

Start the lab and try to run ping -s 1500 r from h :(
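For anyone reproducing this: the -s value is the ICMP payload, not the packet size. A quick sketch of the arithmetic, assuming a plain IPv4 header without options (20 bytes) and the 8-byte ICMP echo header; the function names are made up for illustration:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_packet_size(payload: int) -> int:
    """Size of the IPv4 packet produced by `ping -s <payload>`."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that still fits in one packet for an IP MTU."""
    return mtu - ICMP_HEADER - IPV4_HEADER


if __name__ == "__main__":
    size = ping_packet_size(1500)
    # The test packet exceeds the default 1500-byte MTU but fits in 1600.
    print(size, size > 1500, size <= 1600)
    # Largest payload that should pass unfragmented with mtu: 1600.
    print(max_ping_payload(1600))
```

So ping -s 1500 sends a 1528-byte IP packet, which fits the configured 1600-byte link MTU but not a 1500-byte one.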

The "oversized" packets get dropped at the ingress interface of the Linux bridge. Once that MTU is adjusted, ping works.
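A rough way to spot the offending interface, assuming the standard Linux sysfs layout (/sys/class/net/<if>/mtu and /sys/class/net/<bridge>/brif/). The helper names and the sysfs parameter are hypothetical and only there so the sketch can be pointed at a test directory:

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def read_mtu(ifname, sysfs=SYSFS_NET):
    """Read the MTU of a single interface from sysfs."""
    return int((Path(sysfs) / ifname / "mtu").read_text().strip())


def bridge_port_mtus(bridge, sysfs=SYSFS_NET):
    """Return {port_name: mtu} for every port attached to a Linux bridge."""
    brif = Path(sysfs) / bridge / "brif"
    return {port.name: read_mtu(port.name, sysfs) for port in sorted(brif.iterdir())}


def undersized_ports(bridge, required_mtu, sysfs=SYSFS_NET):
    """Return the bridge ports whose MTU is below the MTU requested on the link."""
    return {
        port: mtu
        for port, mtu in bridge_port_mtus(bridge, sysfs).items()
        if mtu < required_mtu
    }
```

On the lab host, something like undersized_ports("<bridge name>", 1600) should list the ports that still sit at 1500; the bridge name depends on the lab and is not given in this issue.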

It looks like we have to set the ":libvirt__mtu" setting in Vagrantfile. I have no idea what happens with vrnetlab-based containers.

@ipspace ipspace added the bug Something isn't working label Feb 26, 2025
@jbemmel
Collaborator

jbemmel commented Feb 26, 2025

> It looks like we have to set the ":libvirt__mtu" setting in Vagrantfile. I have no idea what happens with vrnetlab-based containers.

See https://github.com/srl-labs/containerlab/blob/main/docs/manual/network.md#link-mtu

The containerlab link MTU defaults to 9500, and the Linux bridge "should" inherit the minimum MTU of its member ports.

@ipspace
Owner Author

ipspace commented Feb 26, 2025

Update(s):

  • Libvirt LAN links definitely have a problem
  • UDP tunnels (libvirt P2P links) seem to be OK
  • Pure containers are OK as the change in MTU changes the MTU of the underlying Linux interface
  • vrnetlab containers are OK -- containerlab changes the MTU to 9500, and the corresponding QEMU tap interface has an MTU of 65000

The truly bizarre part: it seems like the Linux bridge is doing IPv4 fragmentation while bridging packets. I can't decide whether to be amazed or disgusted.
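To make the fragmentation observation concrete, here is a small model of textbook IPv4 fragmentation (fragment payloads are multiples of 8 bytes, except the last one). It is only a sketch of what the fragments would look like on the wire, not the kernel's code path, and the function name is hypothetical:

```python
def ipv4_fragments(packet_len: int, mtu: int, header_len: int = 20) -> list:
    """Return the total length of each fragment of an IPv4 packet for a given MTU."""
    if packet_len <= mtu:
        return [packet_len]  # fits, no fragmentation needed
    # Payload carried by every fragment except the last: a multiple of 8 bytes.
    chunk = (mtu - header_len) // 8 * 8
    if chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = packet_len - header_len
    sizes = []
    while payload > 0:
        part = min(chunk, payload)
        sizes.append(part + header_len)  # each fragment gets its own IP header
        payload -= part
    return sizes


if __name__ == "__main__":
    # 1528 bytes is the packet generated by `ping -s 1500`.
    print(ipv4_fragments(1528, 1500))
    print(ipv4_fragments(1528, 1600))
```

With a 1500-byte MTU in the path, the 1528-byte ping packet would show up as a 1500-byte fragment followed by a 48-byte one, which is something to look for in a packet capture taken behind the bridge.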

@jbemmel
Collaborator

jbemmel commented Feb 26, 2025

> The truly bizarre part: it seems like the Linux bridge is doing IPv4 fragmentation while bridging packets. I can't decide whether to be amazed or disgusted.

https://www.spinics.net/lists/netdev/msg596072.html

> When the "/proc/sys/net/bridge/bridge-nf-call-iptables" is on, bridge will do defragment at PREROUTING and re-fragment at POSTROUTING. At the re-fragment bridge will check if the max frag size is larger than the bridge's MTU in br_nf_ip_fragment(), if it is true packets will be dropped. And this patch use the outdev's MTU instead of the bridge's MTU to do the br_nf_ip_fragment.

Could it be br_netfilter doing the (de)fragmentation, in support of firewall filters? See https://github.com/torvalds/linux/blob/master/net/bridge/br_netfilter_hooks.c#L807
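One way to test that theory is to check whether the bridge netfilter hooks are enabled on the lab host. A minimal sketch, assuming the bridge-nf-call-* sysctls under /proc/sys/net/bridge (they only exist once br_netfilter is loaded); the function name and the base parameter are hypothetical:

```python
from pathlib import Path

BRIDGE_NF_DIR = Path("/proc/sys/net/bridge")


def bridge_nf_settings(base=BRIDGE_NF_DIR):
    """Return {sysctl_name: enabled} for the bridge-nf-call-* knobs.

    An empty dict means the directory is missing, which usually indicates
    that the br_netfilter module is not loaded.
    """
    base = Path(base)
    if not base.is_dir():
        return {}
    return {
        knob.name: knob.read_text().strip() == "1"
        for knob in sorted(base.glob("bridge-nf-call-*"))
    }
```

If bridge-nf-call-iptables turns out to be enabled, that would be consistent with br_netfilter reassembling and re-fragmenting the packets; whether turning it off changes the behavior in this lab is untested here.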
