Bug #247

closed

Kernel BUG in __netdev_adjacent_dev_remove in complex bridge/VLAN setups

Added by Anonymous about 8 years ago. Updated about 8 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
03/18/2016
Due date:
% Done:

0%

Estimated time:

Description

We've received a report that batman-adv crashes the kernel in the following setup:

  • 5 VLANs (eth0.2, eth0.3, eth0.100, eth0.101, eth0.102)
  • bat0 on eth0.100, eth0.101, eth0.102
  • br-wan on eth0.2
  • br-client on bat0, eth0.3

When eth0 goes down and OpenWrt's netifd subsequently removes eth0.* from bat0, the crash in the attached kernel log occurs: the kernel tries to remove eth0 from br-client, but eth0 isn't a port of br-client.
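
To make the topology easier to reason about, here is a small Python sketch of the setup as a graph of "upper device -> direct lower devices". It is only a toy model for illustration, not the kernel's adjacency bookkeeping. The names `LOWER`, `paths_down` and `is_direct_port` are made up for this example, and treating "bat0 on eth0.100" as an upper/lower link is an assumption. It shows that eth0 is never a direct port of br-client, yet it sits below br-client through several indirect paths.

```python
# Toy model of the reported setup: each device maps to its direct lower devices.
# Assumption: "bat0 on eth0.100" is modeled as eth0.100 being a lower device of bat0.
LOWER = {
    "eth0.2": ["eth0"],
    "eth0.3": ["eth0"],
    "eth0.100": ["eth0"],
    "eth0.101": ["eth0"],
    "eth0.102": ["eth0"],
    "bat0": ["eth0.100", "eth0.101", "eth0.102"],
    "br-wan": ["eth0.2"],
    "br-client": ["bat0", "eth0.3"],
}


def paths_down(upper, target):
    """Return every path from `upper` to `target` along lower-device links."""
    if upper == target:
        return [[upper]]
    found = []
    for lower in LOWER.get(upper, []):
        for tail in paths_down(lower, target):
            found.append([upper] + tail)
    return found


def is_direct_port(bridge, dev):
    """True only if `dev` is attached directly to `bridge`."""
    return dev in LOWER.get(bridge, [])


if __name__ == "__main__":
    print("eth0 direct port of br-client:", is_direct_port("br-client", "eth0"))
    for path in paths_down("br-client", "eth0"):
        print(" -> ".join(path))
```

In this model br-client reaches eth0 once via eth0.3 and three times via bat0, which may be why the setup is sensitive to how indirect links are tracked when the VLAN devices are removed from bat0.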

The issue was reported for the OpenWrt sunxi target; I was not able to reproduce it using the same version and setup on x86, therefore I'm not sure which parts of the setup are relevant.

Kernel: 3.18.27 (current OpenWrt CC HEAD)
batman-adv: 2016.0

Gluon issue tracker reference: https://github.com/freifunk-gluon/gluon/issues/680


Files

dmesg.txt (23.4 KB) Anonymous, 03/18/2016 01:08 PM
dmesg.txt (46.5 KB) Anonymous, 03/25/2016 08:20 PM
Actions #1

Updated by Sven Eckelmann about 8 years ago

Sounds like this one: https://patchwork.ozlabs.org/project/netdev/patch/56CCDD4D.4080303@cradlepoint.com/. The CC kernel and the upstream kernel report both hit this BUG: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/net/core/dev.c?h=v4.5#n5531

The other patch I found is mostly unrelated to this problem, but its report still shows output from the same BUG: https://patchwork.ozlabs.org/patch/525329/

Actions #2

Updated by Anonymous about 8 years ago

Indeed there is also a macvlan device on top of br-client, which I forgot in my report.

The reporter of the Gluon ticket has tested that
Actions #3

Updated by Sven Eckelmann about 8 years ago

Interesting. Can you or the original reporter reply to the patch from this person? You can find the mail in mbox format on the patchwork page.

Actions #4

Updated by Anonymous about 8 years ago

Which patch are you talking about?

Actions #5

Updated by Sven Eckelmann about 8 years ago

https://patchwork.ozlabs.org/project/netdev/patch/56CCDD4D.4080303@cradlepoint.com/ - because the author knows that it is not the right way to fix it ("incomplete") but asked for comments.

Actions #6

Updated by Anonymous about 8 years ago

https://patchwork.ozlabs.org/patch/587118/ changes the log output of the crash: instead of trying to remove eth0 from br-client, it is now trying to remove eth0 from local-node (local-node is the macvlan device on top of br-client.) I've attached the new dmesg sent by the original reporter.

Actions #7

Updated by Sven Eckelmann about 8 years ago

Did you contact Andrew Collins + David Miller?

Actions #8

Updated by Anonymous about 8 years ago

I just answered Andrew's mail. I forgot to CC David though.

Actions #9

Updated by Sven Eckelmann about 8 years ago

Thanks :)

Actions #10

Updated by Sven Eckelmann about 8 years ago

  • Status changed from New to Rejected

I am marking this bug as rejected because it looks like it was fixed in the kernel's core networking code by https://patchwork.ozlabs.org/project/netdev/patch/56F5B746.9050401@cradlepoint.com/ (see https://github.com/freifunk-gluon/gluon/issues/680#issuecomment-202438927). Please feel free to reopen it if you find more information about changes required in batman-adv.
