Bug #247

Kernel BUG in __netdev_adjacent_dev_remove in complex bridge/VLAN setups

Added by Matthias Schiffer about 1 year ago. Updated about 1 year ago.

Status: Rejected
Start date: 03/18/2016
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

We've gotten the report that batman-adv will crash the kernel in the following setup:

  • 5 VLANs (eth0.2, eth0.3, eth0.100, eth0.101, eth0.102)
  • bat0 on eth0.100, eth0.101, eth0.102
  • br-wan on eth0.2
  • br-client on bat0, eth0.3
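
To make the stacking in the list above explicit, here is a small Python sketch that models the devices as a graph of lower-to-upper links and counts how many distinct stacking paths lead from eth0 to each device above it. It is only a model of the reported setup, not kernel code; the `UPPERS` table and the `count_paths` helper are illustrative names, and treating bat0 as an upper device of its eth0.10x interfaces is an assumption about how batman-adv links them.

```python
from collections import defaultdict

# Direct "upper" links (lower device -> devices stacked on top of it),
# transcribed from the setup list. A model only, not kernel code.
UPPERS = {
    "eth0": ["eth0.2", "eth0.3", "eth0.100", "eth0.101", "eth0.102"],
    "eth0.100": ["bat0"],
    "eth0.101": ["bat0"],
    "eth0.102": ["bat0"],
    "eth0.2": ["br-wan"],
    "eth0.3": ["br-client"],
    "bat0": ["br-client"],
}


def count_paths(lower):
    """Return {upper: number of distinct stacking paths from `lower`}."""
    paths = defaultdict(int)

    def walk(dev):
        # Every time an upper is reached, one more path to it exists.
        for upper in UPPERS.get(dev, []):
            paths[upper] += 1
            walk(upper)

    walk(lower)
    return dict(paths)


if __name__ == "__main__":
    for dev, n in sorted(count_paths("eth0").items()):
        print(f"{dev}: {n} path(s) from eth0")
```

In this model br-client sits above eth0 along four separate paths (once via eth0.3 and three times via bat0), even though eth0 itself is never a bridge port.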

When eth0 goes down and OpenWrt's netifd subsequently removes eth0.* from bat0, the crash in the attached kernel log occurs: the kernel tries to remove eth0 from br-client, but eth0 isn't a port of br-client.

The issue was reported for the OpenWrt sunxi target; I was not able to reproduce it using the same version and setup on x86, therefore I'm not sure which parts of the setup are relevant.

Kernel: 3.18.27 (current OpenWrt CC HEAD)
batman-adv: 2016.0

Gluon issue tracker reference: https://github.com/freifunk-gluon/gluon/issues/680

dmesg.txt (23.4 KB) Matthias Schiffer, 03/18/2016 01:08 PM

dmesg.txt (46.5 KB) Matthias Schiffer, 03/25/2016 08:20 PM

History

#1 Updated by Sven Eckelmann about 1 year ago

Sounds like this one: https://patchwork.ozlabs.org/patch/587118/. The CC kernel and the upstream kernel report both hit this BUG: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/net/core/dev.c?h=v4.5#n5531

The other patch I found (mostly unrelated to this problem, but it also shows output from this BUG): https://patchwork.ozlabs.org/patch/525329/

#2 Updated by Matthias Schiffer about 1 year ago

Indeed there is also a macvlan device on top of br-client, which I forgot in my report.

The reporter of the Gluon ticket has tested that

#3 Updated by Sven Eckelmann about 1 year ago

Interesting. Can you or the original reporter reply to the patch from this guy? (You can find the mail in mbox format on the patchwork page.)

#4 Updated by Matthias Schiffer about 1 year ago

Which patch are you talking about?

#5 Updated by Sven Eckelmann about 1 year ago

https://patchwork.ozlabs.org/patch/587118/ - because he knows that it is not the right way to fix it ("incomplete") but asked for comments.

#6 Updated by Matthias Schiffer about 1 year ago

https://patchwork.ozlabs.org/patch/587118/ changes the log output of the crash: instead of trying to remove eth0 from br-client, it is now trying to remove eth0 from local-node (local-node is the macvlan device on top of br-client.) I've attached the new dmesg sent by the original reporter.

#7 Updated by Sven Eckelmann about 1 year ago

Did you contact Andrew Collins + David Miller?

#8 Updated by Matthias Schiffer about 1 year ago

I just answered Andrew's mail. I forgot to CC David though.

#9 Updated by Sven Eckelmann about 1 year ago

Thanks :)

#10 Updated by Sven Eckelmann about 1 year ago

  • Status changed from New to Rejected

I am marking this bug as rejected because it looks like it was fixed by https://patchwork.ozlabs.org/patch/602124/ in the kernel's core networking code (see https://github.com/freifunk-gluon/gluon/issues/680#issuecomment-202438927). But please feel free to reopen it in case you find more information about required changes in batman-adv.
