Bug #294

closed

batman-adv panic on brcm47xx with ethernet + wifi in bat0

Added by Russell Senior over 7 years ago. Updated over 7 years ago.

Status: Rejected
Priority: Normal
Target version: -
Start date: 08/26/2016
Due date:
% Done: 0%
Estimated time:

Description

I have a small mesh consisting of nine netgear wgt634u's and three ubiquiti bullet m5's. Two of the three bullet m5's run batman-adv over ethernet to an ethernet port (vlan 2) on two corresponding wgt634u's. The wgt634u's all mesh over an adhoc wifi interface and provide an AP on the same radio.

I have noticed instability on the two wgt634u's that run batman-adv over their ethernet interfaces. The observable symptom is that the device hangs: it isn't pingable and doesn't pass traffic. A power cycle fixes it. The panics seem more frequent on the wgt634u that appears to carry more traffic. Today I rigged up a raspberry pi to capture the serial console of the most problematic device and managed to capture an oops.

Because these two wgt634u's send data over ethernet, where the MTU is 1500, packets going over ethernet are often fragmented: batman-adv adds its own headers to each frame, and unlike the adhoc mesh interface (configured with mtu 1532), the ethernet port has no headroom for them.
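A rough sketch of the MTU arithmetic behind that fragmentation, in Python. It assumes (these numbers are not in the report) that bat0 carries 1500-byte payloads and that each unicast frame is wrapped with a 14-byte inner Ethernet header plus a 10-byte batman-adv unicast header; other batman-adv packet types carry different header sizes, so treat the constants as illustrative.

```python
# Sketch: does a full-size bat0 frame fit in one packet on each batman-adv hard interface?
# ASSUMPTIONS (not stated in the report): bat0 carries 1500-byte payloads, and each
# unicast frame gains a 14-byte inner Ethernet header plus a 10-byte batman-adv
# unicast header. Other batman-adv packet types use different header sizes.
BAT0_MTU = 1500
ETH_HLEN = 14             # inner Ethernet header, encapsulated by batman-adv
BATADV_UNICAST_HLEN = 10  # assumed batman-adv unicast header size


def encapsulated_size(payload: int) -> int:
    """Bytes one bat0 frame of `payload` bytes occupies on the underlying link."""
    return payload + ETH_HLEN + BATADV_UNICAST_HLEN


def needs_fragmentation(payload: int, link_mtu: int) -> bool:
    """True if the encapsulated frame is larger than the hard interface's MTU."""
    return encapsulated_size(payload) > link_mtu


# MTUs from the report: mesh0 sits in the 'mesh' network (mtu 1532); eth0.2 is 1500.
HARD_IFACES = {"mesh0": 1532, "eth0.2": 1500}

for ifname, link_mtu in HARD_IFACES.items():
    size = encapsulated_size(BAT0_MTU)
    verdict = "fragmented" if needs_fragmentation(BAT0_MTU, link_mtu) else "fits"
    print(f"{ifname}: {size} bytes on a {link_mtu}-byte link -> {verdict}")
```

Under these assumptions a full-size frame fits within mesh0's 1532-byte MTU but not within eth0.2's 1500 bytes, which matches the report's observation that only the ethernet path fragments.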

The devices are all running recent LEDE-Project firmware with batman-adv 2016.2 (release 2). Most of the devices are running lede-1365-g27f47f6; the wgt634u where I captured the oops is running lede-1439-g6fdc527 (kernel v4.1.20).

The network config looks like this:

config interface 'loopback'
    option ifname 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config interface 'pub'
    option type 'bridge'
    option proto 'static'
    option ip6assign '60'
    option ifname 'eth0.1 bat0'
    option ipaddr '10.11.80.11'
    option netmask '255.255.252.0'
    option gateway '10.11.80.1'
    option dns '10.11.80.1'

config switch
    option name 'switch0'
    option reset '1'
    option enable_vlan '1'

config switch_vlan
    option device 'switch0'
    option vlan '1'
    option ports '0 1 2 3 5t'

config switch_vlan
    option device 'switch0'
    option vlan '2'
    option ports '4 5t'

config interface 'mesh'
    option mtu '1532'
    option proto 'batadv'
    option mesh 'bat0'

config interface 'meshwire'
    option proto 'batadv'
    option mesh 'bat0'
    option ifname 'eth0.2'
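To cross-check which logical interfaces join bat0 and what MTU each is configured with, the network config above can be read with a minimal Python sketch. This is a hypothetical helper, not libuci: it understands only the config/option lines used in this report (no list entries or comments). The 1500-byte default for sections without an mtu option is an assumption matching the ethernet MTU mentioned in the description. The 'mesh' section has no ifname here because mesh0 is attached to it from the wireless config.

```python
import shlex

# The two batadv sections from the report's network config.
NETWORK = """
config interface 'mesh'
    option mtu '1532'
    option proto 'batadv'
    option mesh 'bat0'

config interface 'meshwire'
    option proto 'batadv'
    option mesh 'bat0'
    option ifname 'eth0.2'
"""

DEFAULT_MTU = 1500  # assumption: default Ethernet MTU when no 'mtu' option is set


def parse_uci(text):
    """Return a list of (section_type, section_name, options) from UCI-style text."""
    sections = []
    for line in text.splitlines():
        tokens = shlex.split(line)  # handles the single-quoted values
        if not tokens:
            continue
        if tokens[0] == "config":
            name = tokens[2] if len(tokens) > 2 else None  # anonymous sections have no name
            sections.append((tokens[1], name, {}))
        elif tokens[0] == "option" and sections:
            sections[-1][2][tokens[1]] = tokens[2]
    return sections


def batadv_members(sections, mesh="bat0"):
    """Map each logical interface attached to `mesh` to its configured (or default) MTU."""
    return {
        name: int(opts.get("mtu", DEFAULT_MTU))
        for kind, name, opts in sections
        if kind == "interface" and opts.get("proto") == "batadv" and opts.get("mesh") == mesh
    }


members = batadv_members(parse_uci(NETWORK))
print(members)
```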

The batman-adv config is minimal; it just turns off bridge loop avoidance:

config mesh 'bat0'
    option bridge_loop_avoidance '0'

The wireless config is as follows:

config wifi-device 'radio0'
    option type 'mac80211'
    option hwmode '11g'
    option path 'pci0000:00/0000:00:01.0'
    option disabled '0'
    option channel '6'

config wifi-iface
    option device 'radio0'
    option ifname 'mesh0'
    option network 'mesh'
    option mode 'adhoc'
    option ssid 'ptp-mesh'
    option encryption 'none'

config wifi-iface
    option device 'radio0'
    option ifname 'wlan0'
    option network 'pub'
    option mode 'ap'
    option ssid 'www.personaltelco.net/mesh1'
    option encryption 'none'

The neighbor table looks like this:

# batctl n
[B.A.T.M.A.N. adv 2016.2, MainIF/MAC: eth0.2/00:0f:b5:0f:2b:cb (bat0 BATMAN_IV)]
           IF        Neighbor      last-seen
       eth0.2   morrison-roof_eth0    0.690s
        mesh0         mesh8_mesh0    0.310s
        mesh0    jerry-mesh_mesh0    0.740s
        mesh0         mesh7_mesh0    7.670s
        mesh0     pete-mesh_mesh0    0.980s
        mesh0      pam-mesh_mesh0    1.690s
        mesh0   michael-mesh_mesh0    1.110s
        mesh0   howard-mesh_mesh0    0.460s
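To see at a glance which neighbors are reached over eth0.2 and which over the adhoc mesh0 interface, the batctl n output can be grouped by interface. The Python sketch below parses the table exactly as printed above; the column layout is inferred from this one sample, not from a batctl format specification. It reports each interface's neighbor count and its least recently seen neighbor.

```python
from collections import defaultdict

# "batctl n" output copied from the report.
BATCTL_N = """\
[B.A.T.M.A.N. adv 2016.2, MainIF/MAC: eth0.2/00:0f:b5:0f:2b:cb (bat0 BATMAN_IV)]
           IF        Neighbor      last-seen
       eth0.2   morrison-roof_eth0    0.690s
        mesh0         mesh8_mesh0    0.310s
        mesh0    jerry-mesh_mesh0    0.740s
        mesh0         mesh7_mesh0    7.670s
        mesh0     pete-mesh_mesh0    0.980s
        mesh0      pam-mesh_mesh0    1.690s
        mesh0   michael-mesh_mesh0    1.110s
        mesh0   howard-mesh_mesh0    0.460s
"""


def neighbors_by_interface(text: str) -> dict:
    """Group (neighbor, last_seen_seconds) pairs by outgoing interface."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields; skip the banner and the column header.
        if len(fields) != 3 or fields[0] == "IF":
            continue
        ifname, neighbor, last_seen = fields
        groups[ifname].append((neighbor, float(last_seen.rstrip("s"))))
    return dict(groups)


table = neighbors_by_interface(BATCTL_N)
for ifname, rows in table.items():
    name, age = max(rows, key=lambda row: row[1])  # least recently seen neighbor
    print(f"{ifname}: {len(rows)} neighbor(s), oldest {name} ({age:.2f}s)")
```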

The oops is as follows:

[13900.016381] Unhandled kernel unaligned access[#1]:
[13900.021285] CPU: 0 PID: 8 Comm: kworker/u2:1 Not tainted 4.1.20 #0
[13900.027740] Workqueue: phy0 ieee80211_ibss_leave [mac80211]
[13900.033427] task: 81841550 ti: 81880000 task.ti: 81880000
[13900.038875] $ 0   : 00000000 1000b800 00000001 00200000
[13900.044336] $ 4   : 4a86b583 00010000 00000000 00000000
[13900.049787] $ 8   : 041a6e1f 815d6e54 819c1289 5a16f70e
[13900.055258] $12   : 193d1c2a 00000fb5 00000000 fde176f8
[13900.060710] $16   : 816b3560 00000001 816b3588 81a99320
[13900.066179] $20   : 81bf7320 818819b8 8169dcd4 80cc18c4
[13900.071639] $24   : 00000000 8001ea88                  
[13900.077074] $28   : 81880000 818818e0 00000000 801d85bc
[13900.082546] Hi    : 00000004
[13900.085491] Lo    : 0037eeda
[13900.088496] epc   : 8007842c put_page+0x0/0x4c
[13900.093064] ra    : 801d85bc skb_release_data+0xa8/0x10c
[13900.098434] Status: 1000b803 KERNEL EXL IE
[13900.102773] Cause : 00800010
[13900.105719] BadVA : 4a86b583
[13900.108674] PrId  : 00029007 (Broadcom BMIPS3300)
[13900.113424] Modules linked in: ath5k mac80211 ath batman_adv libcrc32c cfg80211 compat crypto_null leds_gpio ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common crc16 ssb_hcd aead crc32c_generic crypto_hash
[13900.134142] Process kworker/u2:1 (pid: 8, threadinfo=81880000, task=81841550, tls=00000000)
[13900.142524] Stack : 00000020 00000000 801db418 801dabec 81bb4000 81a99320 00000000 00100100
          00200200 801d8648 00000000 00000001 81bf7320 801db498 80d75500 81b49a74
          0000000a 81a374a0 81bf7320 80cc1850 80cc1800 81bf7320 00000000 00000564
          00000001 81b49ec4 80360000 81bb8800 81bf7320 81bb4400 81bb4000 00000000
          80c8985c 0000009c 81bb8800 81bb4000 81bb4400 80cc1800 8169dcde 00000000
          ...
[13900.179481] Call Trace:
[13900.182065] [<8007842c>] put_page+0x0/0x4c
[13900.186289] [<801d85bc>] skb_release_data+0xa8/0x10c
[13900.191375] [<801d8648>] __kfree_skb+0x28/0xb4
[13900.196032] [<81b49a74>] batadv_dat_drop_broadcast_packet+0x10c/0x138 [batman_adv]
[13900.203827] [<81b49ec4>] batadv_frag_skb_buffer+0x394/0x3d8 [batman_adv]
[13900.210757] [<81b54144>] batadv_recv_frag_packet+0x244/0x2c4 [batman_adv]
[13900.217773] [<81b4e518>] batadv_batman_skb_recv+0x180/0x1f4 [batman_adv]
[13900.224673] [<801e7168>] __netif_receive_skb_core+0x620/0x6e0
[13900.230548] [<801e7ddc>] netif_receive_skb_internal+0x60/0x70
[13900.236426] [<801c0ff0>] b44_poll+0x384/0x44c
[13900.240917] [<801e81c0>] net_rx_action+0x124/0x2f0
[13900.245875] [<80024fa4>] __do_softirq+0x184/0x2b0
[13900.250735] [<80025198>] do_softirq+0x48/0x68
[13900.255247] [<80025248>] __local_bh_enable_ip+0x90/0xb0
[13900.260793] [<80c12084>] ieee80211_get_vht_mask_from_cap+0x1900/0x1b8c [mac80211]
[13900.268516]
[13900.270075]
Code: 00003021  0801e0bc  24a57840 <8c820000> 3042c000  10400003  00801821  0801df83  00000000
[13900.280781] ---[ end trace af00a3cf0771ea56 ]---
[13900.294268] Kernel panic - not syncing: Fatal exception in interrupt
[13900.306101] Rebooting in 3 seconds..
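The faulting instruction in the Code: line (the word in angle brackets) can be decoded by hand. The Python sketch below handles only a handful of MIPS I-type load/store opcodes, which is enough for this one word. It recovers lw v0, 0(a0) and checks that the effective address equals the reported BadVA. Since epc is put_page+0x0 and $a0 holds the first argument under the MIPS o32 calling convention, this suggests put_page() was handed 0x4a86b583 as its page pointer. That value is neither word-aligned nor in the 0x80000000-and-up range used by kernel addresses on 32-bit MIPS.

```python
# Faulting instruction from the oops' "Code:" line (the word in <...>).
CODE_LINE = ("00003021  0801e0bc  24a57840 <8c820000> 3042c000  "
             "10400003  00801821  0801df83  00000000")
BAD_VA = 0x4A86B583           # "BadVA" from the oops
REGS = {"a0": 0x4A86B583}     # "$ 4" (a0) from the register dump; only a0 is needed here

REG_NAMES = ["zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
             "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
             "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
             "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]

# Subset of MIPS I-type load/store opcodes: opcode -> (mnemonic, access size in bytes).
LOADS_STORES = {0x20: ("lb", 1), 0x21: ("lh", 2), 0x23: ("lw", 4), 0x24: ("lbu", 1),
                0x25: ("lhu", 2), 0x28: ("sb", 1), 0x29: ("sh", 2), 0x2B: ("sw", 4)}


def faulting_word(code_line: str) -> int:
    """Return the instruction word marked with <...> in a kernel 'Code:' line."""
    for tok in code_line.split():
        if tok.startswith("<") and tok.endswith(">"):
            return int(tok[1:-1], 16)
    raise ValueError("no <...> marker found")


def decode_load_store(word: int):
    """Decode a MIPS I-type load/store into (mnemonic, size, rt, offset, base)."""
    opcode = word >> 26
    base = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    offset = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend the 16-bit immediate
    mnemonic, size = LOADS_STORES[opcode]  # KeyError if the opcode is not in the table
    return mnemonic, size, REG_NAMES[rt], offset, REG_NAMES[base]


mnemonic, size, rt, offset, base = decode_load_store(faulting_word(CODE_LINE))
addr = (REGS[base] + offset) & 0xFFFFFFFF  # effective address of the load
print(f"{mnemonic} {rt}, {offset}({base}) -> address 0x{addr:08x}")
print("matches BadVA:", addr == BAD_VA, "| misaligned by", addr % size, "byte(s)")
```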