Open Mesh: Issues (https://www.open-mesh.org/), updated 2021-04-20T09:49:13Z
batman-adv - Bug #422 (New): General protection fault in batadv_orig_router_get (https://www.open-mesh.org/issues/422), 2021-04-20T09:49:13Z, Linus Lüssing (linus.luessing@c0d3.blue)
<p>In a KVM virtual machine running a 5.11.9 kernel and a recent batman-adv from the master branch, I get a general protection fault after putting the VM host to sleep and waking it up again. The VM guest runs several mesh instances (here bat1 to bat8).</p>
<p>This looks like a race condition: the originator node is deleted after a timeout while an OGM from that node is still queued for further processing. The faulting address 0x6b6b6b6b6b6b6b6b is the slab allocator's free-poison pattern (POISON_FREE, 0x6b), which supports the use-after-free hypothesis. Without putting a node into standby this seems unlikely to happen, because by the time a node times out there is typically no OGM from it left in the queue.</p>
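<p>To illustrate the suspected failure mode, here is a minimal user-space sketch of the usual reference-counting rule: anything that keeps a pointer to an originator, such as a queued OGM, holds its own reference, so a timeout purge only drops the hash table's reference. All names are hypothetical and do not reflect batman-adv's actual structures; whether the OGM path really lacks such a reference is an assumption, not a confirmed diagnosis.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an originator node; it only
 * models a reference count and is not batman-adv's struct batadv_orig_node. */
struct orig_node {
	int refcount;	/* references held by the hash table and by queued OGMs */
	bool freed;	/* set instead of calling free() so the effect is observable */
};

/* Hypothetical queued OGM that remembers the originator it belongs to. */
struct queued_ogm {
	struct orig_node *orig;
};

static void orig_get(struct orig_node *o)
{
	o->refcount++;
}

static void orig_put(struct orig_node *o)
{
	if (--o->refcount == 0)
		o->freed = true;	/* real code would free the memory here */
}

/* Queueing takes its own reference, so the node outlives the queue entry. */
static void queue_ogm(struct queued_ogm *q, struct orig_node *o)
{
	orig_get(o);
	q->orig = o;
}

/* A timeout purge only drops the reference owned by the hash table. */
static void purge_orig(struct orig_node *o)
{
	orig_put(o);
}

/* Processes the queued OGM later; returns false if its originator had
 * already been released, i.e. the use-after-free case. */
static bool process_ogm(struct queued_ogm *q)
{
	bool still_valid = !q->orig->freed;

	orig_put(q->orig);	/* drop the queue's reference */
	q->orig = NULL;
	return still_valid;
}
```

<p>Starting from a node whose only reference belongs to the hash table, queueing an OGM and then purging the node leaves it alive until process_ogm() releases the last reference. Without the orig_get() in queue_ogm(), purge_orig() would drop the count to zero and process_ogm() would return false.</p>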
<pre><code>
[308421.793525] batman_adv: bat3: IGMP Querier disappeared - multicast optimizations disabled
[308421.795414] batman_adv: bat3: MLD Querier disappeared - multicast optimizations disabled
[308421.801542] batman_adv: bat6: IGMP Querier disappeared - multicast optimizations disabled
[308421.802905] batman_adv: bat6: MLD Querier disappeared - multicast optimizations disabled
[308421.804257] batman_adv: bat5: IGMP Querier disappeared - multicast optimizations disabled
[308421.805761] batman_adv: bat5: MLD Querier disappeared - multicast optimizations disabled
[308421.813031] batman_adv: bat4: IGMP Querier disappeared - multicast optimizations disabled
[308421.814303] batman_adv: bat4: MLD Querier disappeared - multicast optimizations disabled
[308421.815716] batman_adv: bat2: IGMP Querier disappeared - multicast optimizations disabled
[308421.816779] batman_adv: bat2: MLD Querier disappeared - multicast optimizations disabled
[308421.819384] batman_adv: bat8: IGMP Querier disappeared - multicast optimizations disabled
[308421.820670] batman_adv: bat8: MLD Querier disappeared - multicast optimizations disabled
[308421.821942] batman_adv: bat7: IGMP Querier disappeared - multicast optimizations disabled
[308421.823706] batman_adv: bat7: MLD Querier disappeared - multicast optimizations disabled
[308422.813967] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP PTI
[308422.816150] CPU: 0 PID: 12563 Comm: kworker/u2:1 Tainted: G OE 5.11.9 #41
[308422.818045] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
[308422.819949] Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet [batman_adv]
[308422.821797] RIP: 0010:batadv_orig_router_get+0x10/0x70 [batman_adv]
[308422.823032] Code: 03 00 00 00 4c 89 c7 e9 de d9 0d ea 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 41 54 48 8b 47 08 48 85 c0 74 0e <48> 39 70 10 74 16 48 8b 00 48 85 c0 75 f2 45 31 e4 e8 0a f2 d9
[308422.826658] RSP: 0018:ffffa0d140003d50 EFLAGS: 00010202
[308422.827700] RAX: 6b6b6b6b6b6b6b6b RBX: ffff90a4c7b2104e RCX: 000000000000000b
[308422.829049] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff90a4c4e68400
[308422.830400] RBP: ffff90a4c0763bd8 R08: ffff90a4c008e8c0 R09: 00000000000002c0
[308422.831759] R10: ffff90a4c7b21000 R11: 0000000000000001 R12: ffff90a4c008e878
[308422.833103] R13: ffff90a4c7b21040 R14: 0000000000000000 R15: 0000000000000001
[308422.834425] FS: 0000000000000000(0000) GS:ffff90a4cde00000(0000) knlGS:0000000000000000
[308422.835926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308422.836984] CR2: 00007ffced2d6000 CR3: 0000000009cde000 CR4: 00000000000006f0
[308422.838301] Call Trace:
[308422.838789] <IRQ>
[308422.839318] batadv_iv_ogm_process_per_outif+0x261/0xff0 [batman_adv]
[308422.840332] ? enqueue_entity+0x163/0x760
[308422.841867] batadv_iv_ogm_receive+0x26a/0x4a0 [batman_adv]
[308422.842560] batadv_batman_skb_recv+0x117/0x1d0 [batman_adv]
[308422.843357] __netif_receive_skb_one_core+0x8e/0xa0
[308422.844492] process_backlog+0x96/0x160
[308422.845036] net_rx_action+0x146/0x430
[308422.845594] __do_softirq+0xc5/0x275
[308422.846510] asm_call_irq_on_stack+0x12/0x20
[308422.847059] </IRQ>
[308422.847348] do_softirq_own_stack+0x37/0x40
[308422.848588] do_softirq+0x5e/0x70
[308422.849420] __local_bh_enable_ip+0x4b/0x50
[308422.850164] __dev_queue_xmit+0x376/0x8b0
[308422.850989] batadv_send_skb_packet+0xcc/0xf0 [batman_adv]
[308422.851950] batadv_iv_send_outstanding_bat_ogm_packet+0x18d/0x1b0 [batman_adv]
[308422.853154] process_one_work+0x1ec/0x380
[308422.853946] worker_thread+0x53/0x3e0
[308422.854566] ? process_one_work+0x380/0x380
[308422.855276] kthread+0x11b/0x140
[308422.855827] ? __kthread_bind_mask+0x60/0x60
[308422.856577] ret_from_fork+0x22/0x30
[308422.857199] Modules linked in: batman_adv(OE) bridge(OE) veth(E) dummy(E) libcrc32c(E) crc32c_generic(E) crc32_generic(E) crc16(E) mac80211(E) cfg80211(E) rfkill(E) libarc4(E) stp(E) llc(E) rpcsec_gss_krb5(E]
[308422.867669] ---[ end trace 3d57397987128d5a ]---
[308422.868423] RIP: 0010:batadv_orig_router_get+0x10/0x70 [batman_adv]
[308422.869429] Code: 03 00 00 00 4c 89 c7 e9 de d9 0d ea 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 41 54 48 8b 47 08 48 85 c0 74 0e <48> 39 70 10 74 16 48 8b 00 48 85 c0 75 f2 45 31 e4 e8 0a f2 d9
[308422.872321] RSP: 0018:ffffa0d140003d50 EFLAGS: 00010202
[308422.873143] RAX: 6b6b6b6b6b6b6b6b RBX: ffff90a4c7b2104e RCX: 000000000000000b
[308422.874270] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff90a4c4e68400
[308422.875433] RBP: ffff90a4c0763bd8 R08: ffff90a4c008e8c0 R09: 00000000000002c0
[308422.876548] R10: ffff90a4c7b21000 R11: 0000000000000001 R12: ffff90a4c008e878
[308422.877735] R13: ffff90a4c7b21040 R14: 0000000000000000 R15: 0000000000000001
[308422.878778] FS: 0000000000000000(0000) GS:ffff90a4cde00000(0000) knlGS:0000000000000000
[308422.880257] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308422.881197] CR2: 00007ffced2d6000 CR3: 0000000009cde000 CR4: 00000000000006f0
[308422.882512] Kernel panic - not syncing: Fatal exception in interrupt
[308422.884049] Kernel Offset: 0x29600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[308422.886580] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
</code></pre>
batman-adv - Bug #421 (New): Misconfig or bug: received packet on bat0 with own address as source... (https://www.open-mesh.org/issues/421), 2020-10-28T15:06:47Z, Adrian Schmutzler
<a name="General-setup"></a>
<h2 >General setup:<a href="#General-setup" class="wiki-anchor">¶</a></h2>
<p>A fork of the Freifunk Franken firmware, where batman-adv is used on a distributed layer-2 network connected to gateways via fastd tunnels.</p>
<p>Each node offers client access and mesh via Ethernet (e.g. via VLANs: eth0.1 for client and eth0.3 for mesh) and via Wi-Fi (e.g. w2ap for the 2.4 GHz AP, w2mesh for the 2.4 GHz mesh (802.11s), w5ap for the 5 GHz AP, etc.).<br />We make sure that all "mesh" interfaces (e.g. eth0.3, w2mesh, w5mesh, i.e. what "batctl if" shows) have distinct MAC addresses.<br />The same holds for all "client" interfaces, i.e. the members of the bridge br-mesh alongside bat0 (e.g. eth0.1, w2ap, w5ap).</p>
<p>MAC addresses are allowed to overlap <em>between</em> those groups, though, e.g. eth0.3 (="mesh") could have the same address as w2ap (="client/ap").</p>
<a name="Test-setup"></a>
<h2 >Test setup:<a href="#Test-setup" class="wiki-anchor">¶</a></h2>
<p>Isolated device configured as above and connected to the Freifunk network via layer 3 (WAN) only, i.e. without batman-adv neighbors ("batctl o" and "batctl n" are empty).<br />The device acts as a batman-adv gateway server (gw_mode server), but similar behavior can be reproduced with batman client nodes. BLA is active (the default).<br />TP-Link TL-WDR4900 v1<br />OpenWrt 19.07 (tested with 19.07.3 on this device; the problem itself is present across all point releases, including 19.07.4 as observed on other devices)<br />batman-adv openwrt-2019.2-7 (openwrt-routing 19.07 branch; I also tested the recent 2019.2-10, including a recent BLA patch, on a different device)</p>
<a name="Problem"></a>
<h2 >Problem:<a href="#Problem" class="wiki-anchor">¶</a></h2>
<p>dmesg (and logread) show the following every 10 seconds:</p>
<pre>
[ 179.939430] br-mesh: received packet on bat0 with own address as source address (addr:fa:1a:67:xx:xx:fb, vlan:0)
</pre>
<a name="Discussion"></a>
<h2 >Discussion:<a href="#Discussion" class="wiki-anchor">¶</a></h2>
<p>I can remove the warning via one of two measures:</p>
<ol>
<li>Remove the MAC address collision of eth0.3 ("mesh") and w5ap ("client") by giving an arbitrary unique MAC address to eth0.3</li>
<li>Disable BLA via uci set network.bat0.bridge_loop_avoidance='0'</li>
</ol>
<a name="Actual-question"></a>
<h2 >Actual question:<a href="#Actual-question" class="wiki-anchor">¶</a></h2>
<p>From my conceptual understanding, I see no reason why an overlap between "client" and "mesh" MAC addresses should be forbidden.<br />It is also strange that specifically the overlap of eth0.3 ("mesh") and w5ap ("client") causes the warning, while the remaining overlap between eth0.1 ("client") and w5mesh ("mesh") is <em>not</em> harmful.</p>
<p>Therefore, my actual question is: is this intended behavior, i.e. is this MAC overlap actually forbidden? Or is this a bug (possibly caused by BLA)?<br />Keep in mind that this happens on an isolated device.</p>
<p>Consequently: since disabling BLA removes the warning, would disabling BLA "solve" the problem for the moment, because the packets sent by BLA are the root cause? Or would it merely remove a tool that detects a misconfiguration which still exists?</p>
<a name="Further-info"></a>
<h2 >Further info:<a href="#Further-info" class="wiki-anchor">¶</a></h2>
<p>MAC addresses:</p>
<pre>
bat0: random
br-mesh: f8:...:fb
eth0: f8:...:fb (same as eth0.1)
eth0.2: f8:...:fc
eth0.3: fa:...:fb
w2ap: fa:...:fa
w2mesh: f8:...:fa
w5ap: fa:...:fb
w5mesh: f8:...:fb
</pre>
<p>(There are additional AP networks configured, but those have separate addresses and also are completely separate from batman)</p>
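<p>The two overlaps discussed above can be made explicit with a small check over the table. This is a minimal sketch that compares the addresses as the opaque, partially elided strings shown in the table (assuming the hidden octets match where the report says they do) and treats eth0.1, w2ap and w5ap as the bridged "client" group. It only lists cross-group collisions; it does not explain why only one of them triggers the warning.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* An interface name with its MAC address as written in the table; the
 * elided middle octets are kept as-is and compared as opaque strings. */
struct iface {
	const char *name;
	const char *mac;
};

/* "Mesh" interfaces (batman-adv hard interfaces). */
static const struct iface mesh_ifaces[] = {
	{ "eth0.3", "fa:...:fb" },
	{ "w2mesh", "f8:...:fa" },
	{ "w5mesh", "f8:...:fb" },
};

/* "Client" interfaces bridged into br-mesh alongside bat0; eth0.1 uses
 * eth0's address, as the table notes they are the same. */
static const struct iface client_ifaces[] = {
	{ "eth0.1", "f8:...:fb" },
	{ "w2ap",   "fa:...:fa" },
	{ "w5ap",   "fa:...:fb" },
};

/* Counts (mesh, client) pairs sharing a MAC address and prints each pair. */
static size_t count_cross_group_collisions(const struct iface *mesh, size_t n_mesh,
					   const struct iface *client, size_t n_client)
{
	size_t hits = 0;

	for (size_t i = 0; i < n_mesh; i++)
		for (size_t j = 0; j < n_client; j++)
			if (strcmp(mesh[i].mac, client[j].mac) == 0) {
				printf("%s (mesh) and %s (client) share %s\n",
				       mesh[i].name, client[j].name, mesh[i].mac);
				hits++;
			}
	return hits;
}
```

<p>Called with both tables, the function reports two pairs: eth0.3/w5ap and w5mesh/eth0.1, matching the two overlaps named in the question.</p>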
<p>OpenWrt network config:</p>
<pre>
config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fdff:0::/64'

config interface 'wan'
	option ifname 'eth0.2'
	option proto 'dhcp'

config device 'wan_eth0_2_dev'
	option name 'eth0.2'
	option macaddr 'f8:1a:67:xx:xx:fc'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan 'vlan1'
	option device 'switch0'
	option vlan '1'
	option ports '0t 1t 4 5'

config switch_vlan 'vlan2'
	option device 'switch0'
	option vlan '2'
	option ports '0t 1t'

config interface 'eth0_3'
	option proto 'batadv_hardif'
	option master 'bat0'
	option ifname 'eth0.3'

config interface 'mesh'
	option type 'bridge'
	option auto '1'
	option ifname 'bat0 eth0.1'
	option macaddr 'f8:1a:67:xx:xx:fb'
	list ip6addr 'fdff:0::0:f81a:67xx:xxfb/64'
	...
	option proto 'static'
	list ipaddr '10.xx.xx.1/24'
	option ip4table 'fff'
	option ip6table 'fff'

config switch_vlan 'vlan3'
	option device 'switch0'
	option vlan '3'
	option ports '0t 1t 2 3'

config device 'ethmesh_dev'
	option name 'eth0.3'
	option macaddr 'fa:1a:67:xx:xx:fb'

config interface 'w5mesh'
	option mtu '1560'
	option proto 'batadv_hardif'
	option master 'bat0'

config interface 'configap5'
	option proto 'static'
	option ip6addr 'fe80::1/64'

config interface 'w2mesh'
	option mtu '1560'
	option proto 'batadv_hardif'
	option master 'bat0'

config interface 'configap2'
	option proto 'static'
	option ip6addr 'fe80::1/64'

config interface 'bat0'
	option proto 'batadv'
	option gw_mode 'server'
	option gw_sel_class '200000'
	option network_coding '0'
	option network_coding '0'
	option aggregated_ogms '1'
	option ap_isolation '0'
	option bonding '0'
	option fragmentation '1'
	option orig_interval '1000'
	option distributed_arp_table '1'
	option hop_penalty '30'

# followed by various rules and wireguard interfaces
</pre>
batman-adv - Bug #420 (New): KMSAN: uninit-value in batadv_nc_worker (https://www.open-mesh.org/issues/420), 2020-10-01T11:49:33Z, Sven Eckelmann
<pre>
Hello,
syzbot found the following issue on:
HEAD commit: 5edb1df2 kmsan: drop the _nosanitize string functions
git tree: https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=10cc55a7900000
kernel config: https://syzkaller.appspot.com/x/.config?x=4991d22eb136035c
dashboard link: https://syzkaller.appspot.com/bug?extid=da9194708de785081f11
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+da9194708de785081f11@syzkaller.appspotmail.com
=====================================================
BUG: KMSAN: uninit-value in batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:408 [inline]
BUG: KMSAN: uninit-value in batadv_nc_worker+0x1c0/0x1d70 net/batman-adv/network-coding.c:718
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_nc_worker
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x21c/0x280 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:201
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:408 [inline]
batadv_nc_worker+0x1c0/0x1d70 net/batman-adv/network-coding.c:718
process_one_work+0x1688/0x2140 kernel/workqueue.c:2269
worker_thread+0x10bc/0x2730 kernel/workqueue.c:2415
kthread+0x551/0x590 kernel/kthread.c:293
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80
slab_alloc_node mm/slub.c:2907 [inline]
slab_alloc mm/slub.c:2916 [inline]
__kmalloc+0x2bb/0x4b0 mm/slub.c:3982
kmalloc_array+0x90/0x140 include/linux/slab.h:594
batadv_hash_new+0x129/0x530 net/batman-adv/hash.c:52
batadv_originator_init+0x9b/0x370 net/batman-adv/originator.c:211
batadv_mesh_init+0x4dc/0x9d0 net/batman-adv/main.c:204
batadv_softif_init_late+0x6d8/0xa30 net/batman-adv/soft-interface.c:857
register_netdevice+0xbbc/0x37d0 net/core/dev.c:9760
__rtnl_newlink net/core/rtnetlink.c:3454 [inline]
rtnl_newlink+0x2e77/0x3ed0 net/core/rtnetlink.c:3500
rtnetlink_rcv_msg+0x142b/0x18c0 net/core/rtnetlink.c:5563
netlink_rcv_skb+0x6d7/0x7e0 net/netlink/af_netlink.c:2470
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5581
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x11c8/0x1490 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x173a/0x1840 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
__sys_sendto+0x9dc/0xc80 net/socket.c:1992
__do_sys_sendto net/socket.c:2004 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:2000
__x64_sys_sendto+0x6e/0x90 net/socket.c:2000
do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
entry_SYSCALL_64_after_hwframe+0x44/0xa9
=====================================================
</pre>
<p>See also:</p>
<ul>
<li><a class="external" href="https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/TFZFXLUH5GYL5NCR4CCAANDB2IPUPIYU/">https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/TFZFXLUH5GYL5NCR4CCAANDB2IPUPIYU/</a></li>
<li><a class="external" href="https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/HZN6NKEIY6JRCOFXE3O7OGPPUXGBVC3U/">https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/HZN6NKEIY6JRCOFXE3O7OGPPUXGBVC3U/</a></li>
</ul>
batman-adv - Feature #419 (New): BLA: redundant and superficial GW check (https://www.open-mesh.org/issues/419), 2020-09-14T09:24:08Z, Linus Lüssing (linus.luessing@c0d3.blue)
<p>The source address check in batadv_recv_unicast_packet() here is both superficial and redundant:</p>
<pre><code>	/* packet for me */
	if (batadv_is_my_mac(bat_priv, unicast_packet->dest)) {
		/* If this is a unicast packet from another backbone gw,
		 * drop it.
		 */
		orig_addr_gw = eth_hdr(skb)->h_source;
		orig_node_gw = batadv_orig_hash_find(bat_priv, orig_addr_gw);
		if (orig_node_gw) {
			is_gw = batadv_bla_is_backbone_gw(skb, orig_node_gw,
							  hdr_size);
			batadv_orig_node_put(orig_node_gw);
			if (is_gw) {
				batadv_dbg(BATADV_DBG_BLA, bat_priv,
					   "%s(): Dropped unicast pkt received from another backbone gw %pM.\n",
					   __func__, orig_addr_gw);
				goto free_skb;
			}
		}

		/* ... */
</code></pre>
<p><a class="external" href="https://git.open-mesh.org/batman-adv.git/blob/f2a2e0310dc1c570bdd1439553e897649b000292:/net/batman-adv/routing.c#l1000">https://git.open-mesh.org/batman-adv.git/blob/f2a2e0310dc1c570bdd1439553e897649b000292:/net/batman-adv/routing.c#l1000</a></p>
<p>Redundant, because the sender is already supposed to perform this check, so there is no need to repeat it on reception.</p>
Superficial, because it only works if:
<ul>
<li>The BLA backbone gateway we share a LAN with is a direct neighbor of us.</li>
<li>The BLA backbone gateway we share a LAN with transmits the packet via its primary interface to us.</li>
</ul>
<p>In all other cases, for example when the packet is received via multiple hops or via a secondary interface of the other BLA gateway, the check does not work.</p>
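<p>The limitation follows from what the check looks up: the Ethernet source of the received frame (the last hop's transmitting interface) is searched in a table keyed by originator (primary) addresses. The following user-space model sketches this under that assumption; the types, functions and MAC addresses are hypothetical and only mirror the logic of the excerpt above.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical originator entry: like the originator hash, it is keyed by a
 * node's primary (originator) address only. */
struct originator {
	const char *primary_mac;
	bool is_backbone_gw;	/* BLA backbone gateway on our LAN */
};

/* Hypothetical example nodes; the addresses are made up. */
static const struct originator example_origs[] = {
	{ "02:00:00:00:00:01", true },	/* other BLA backbone gateway */
	{ "02:00:00:00:00:03", false },	/* ordinary mesh node acting as relay */
};

/* Stand-in for an originator lookup by MAC address. */
static const struct originator *orig_find(const struct originator *tbl, size_t n,
					  const char *mac)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].primary_mac, mac) == 0)
			return &tbl[i];
	return NULL;
}

/* Models the receive-side check: only the Ethernet source is inspected, i.e.
 * the transmitting interface of the last hop, not the original sender. */
static bool rx_check_drops(const struct originator *tbl, size_t n, const char *eth_src)
{
	const struct originator *o = orig_find(tbl, n, eth_src);

	return o != NULL && o->is_backbone_gw;
}
```

<p>Only a frame whose Ethernet source is the other gateway's primary address ("02:00:00:00:00:01") is dropped. The same gateway sending via a hypothetical secondary interface ("06:00:00:00:00:01") is not found at all, and a frame relayed over multiple hops carries the relay's address ("02:00:00:00:00:03"), which is not a backbone gateway.</p>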
Suggestion:
<ul>
<li>Either remove this check.</li>
<li>Or turn the corresponding batadv_dbg() into a pr_warn_ratelimited() to help spot potential bugs.</li>
</ul>
<p>(This check initially made it hard to reproduce the issue that this patch is supposed to fix: <a class="external" href="https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/20200914012136.5278-2-linus.luessing@c0d3.blue/">https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/20200914012136.5278-2-linus.luessing@c0d3.blue/</a>. The issue was easy to reproduce in a physical setup but difficult in a virtual one, because the two setups differed in which interfaces were primary and which were secondary.)</p>
batman-adv - Bug #418 (New): BLA: claiming race condition with multicast from mesh (https://www.open-mesh.org/issues/418), 2020-09-14T08:54:59Z, Linus Lüssing (linus.luessing@c0d3.blue)
Scenario:
<ul>
<li>Two BLA backbone gateways sharing the same LAN, receiving a multicast packet from the mesh.</li>
</ul>
Issue:
<ul>
<li>Both BLA backbone gateways race for a claim and can end up each believing that the other one claimed the client. This results in packet loss for traffic from the mesh into the BLA backbone.</li>
</ul>
<p>This actually seems to be acknowledged in the code:</p>
<pre><code>	if (!claim) {
		/* possible optimization: race for a claim */
		/* No claim exists yet, claim it for us!
		 */
</code></pre>
<p><a class="external" href="https://git.open-mesh.org/batman-adv.git/blob/f2a2e0310dc1c570bdd1439553e897649b000292:/net/batman-adv/bridge_loop_avoidance.c#l1858">https://git.open-mesh.org/batman-adv.git/blob/f2a2e0310dc1c570bdd1439553e897649b000292:/net/batman-adv/bridge_loop_avoidance.c#l1858</a></p>
<p>Typically this resolves itself once the claim times out slightly earlier on one of the two BLA backbone gateways. Unfortunately, the state seems quite persistent when both nodes were set up at the same time, for instance by a script on the same host, and it is probably similarly persistent when physically similar devices are booted at the same time, for instance after a power outage.</p>
<p>This should be reproducible with the attached script. It creates a fully meshed topology, with nodes 1 and 2 bridged via a LAN, like the following:</p>
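<p>The race can be sketched as a toy model of two gateways and one client MAC: a gateway without a claim claims the client for itself and announces this on the LAN, and a received claim frame from the other gateway overrides the local owner. This mirrors the behavior described in this issue, not the actual batman-adv code; all names are hypothetical.</p>

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical model of one BLA backbone gateway's claim for one client MAC. */
struct gw {
	int id;
	int claim_owner;	/* id of the gateway believed to own the claim */
	bool sent_claim;	/* whether this gateway announced a claim on the LAN */
};

/* A multicast frame from the mesh arrives: with no claim yet, claim it for us. */
static void rx_from_mesh(struct gw *g)
{
	if (g->claim_owner == NO_OWNER) {
		g->claim_owner = g->id;
		g->sent_claim = true;
	}
}

/* A claim frame from the other gateway arrives via the LAN and is accepted. */
static void rx_claim_frame(struct gw *g, const struct gw *sender)
{
	g->claim_owner = sender->id;
}

/* Only the claim owner forwards mesh traffic for this client into the LAN. */
static bool forwards_into_lan(const struct gw *g)
{
	return g->claim_owner == g->id;
}

/* Both gateways see the frame before either one receives the other's claim.
 * Returns how many gateways forward the traffic into the LAN. */
static int simulate_simultaneous_claims(struct gw *a, struct gw *b)
{
	rx_from_mesh(a);
	rx_from_mesh(b);
	if (a->sent_claim)
		rx_claim_frame(b, a);
	if (b->sent_claim)
		rx_claim_frame(a, b);
	return forwards_into_lan(a) + forwards_into_lan(b);
}

/* Gateway b only sees the frame after a's claim frame has arrived. */
static int simulate_staggered_claims(struct gw *a, struct gw *b)
{
	rx_from_mesh(a);
	if (a->sent_claim)
		rx_claim_frame(b, a);
	rx_from_mesh(b);
	if (b->sent_claim)
		rx_claim_frame(a, b);
	return forwards_into_lan(a) + forwards_into_lan(b);
}
```

<p>In the simultaneous case each gateway ends up recording the other as owner and neither forwards (result 0). In the staggered case, which roughly corresponds to forwarding jitter separating the two receptions, b never sends its own claim and exactly one gateway forwards (result 1).</p>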
<pre><code> --[LAN/br0]--
| |
(1) (2)
| |
---[mesh]----
/ \
(3) ... (8)
</code></pre>
<p>To reproduce, run:</p>
<pre><code>$ ./test-mcast-bla.sh setup
$ ping6 ff12::123%br8
</code></pre>
<p>Then compare <i>"batctl meshif bat1 cl"</i> and <i>"batctl meshif bat2 cl"</i>. You should see that each of the two nodes assumes that the other one claimed the MAC address of node 8. Furthermore, <i>"tcpdump -i br0 'icmp6 and dst ff12::123'"</i> should stay silent, showing that the multicast ICMPv6 Echo Requests are wrongly dropped by both nodes 1 and 2 instead of being forwarded into the BLA backbone.</p>
<p>You can tear down the test mesh via "$ ./test-mcast-bla.sh teardown" (or restart it via "reload").</p>
<p>Tested in an x86 Debian VM running Linux 5.8.7.</p>
<p>Further notes:</p>
<ul>
<li>Probably more likely to trigger via multicast-to-unicast than via classic flooding, as the latter adds some jitter on forwarding, which makes a race less likely.</li>
<li>More likely to trigger if the two BLA backbone gateways are direct neighbors so that they receive the multicast packets at the same time.</li>
</ul>
batman-adv - Bug #417 (New): BLA, crash: null pointer dereference in batadv_bla_loopdetect_report... (https://www.open-mesh.org/issues/417), 2020-08-27T08:07:08Z, Linus Lüssing (linus.luessing@c0d3.blue)
<p>Version:</p>
<pre><code>
$ batctl -v
batctl 2020.2-openwrt-1 [batman-adv: 2020.2-openwrt-1]
</code></pre>
<p>Setup:</p>
<p>8 nodes, 3 of which are interconnected via the LAN with BLA. Two of the three LAN devices crashed after about 15 hours with the following trace:</p>
<pre><code>
Time: 1598432655.394152
Modules: pppoe@bf627000+5000 ppp_async@bf61e000+5000 batman_adv@bf5e6000+2c000 ath10k_pci@bf5d5000+b000 ath10k_core@bf55b000+67000 ath@bf552000+6000 pppox@bf54a000+4000 ppp_generic@bf53d000+8000 nft_set_rbtree@bf535000+4000 nft_set_hash@bf52b000+6000 nft_reject_ipv6@bf523000+4000 nft_reject_ipv4@bf51b000+4000 nft_reject_inet@bf513000+4000 nft_reject_bridge@bf50b000+4000 nft_reject@bf504000+4000 nft_redir@bf4fd000+4000 nft_quota@bf4f5000+4000 nft_numgen@bf4ed000+4000 nft_meta_bridge@bf4e5000+4000 nft_meta@bf4dd000+4000 nft_log@bf4d5000+4000 nft_limit@bf4cd000+4000 nft_fwd_netdev@bf4c5000+4000 nft_exthdr@bf4bd000+4000 nft_dup_netdev@bf4b5000+4000 nft_ct@bf4ad000+4000 nft_counter@bf4a5000+4000 nft_chain_route_ipv6@bf49d000+4000 nft_chain_route_ipv4@bf495000+4000 nf_tables_netdev@bf48d000+4000 nf_tables_ipv6@bf485000+4000 nf_tables_ipv4@bf47d000+4000 nf_tables_inet@bf475000+4000 nf_tables_bridge@bf46d000+4000 nf_tables@bf453000+14000 mac80211@bf3ae000+7e000 iptable_nat@bf3a6000+4000 ipt_REJECT@bf39e000+4000 ipt_MASQUERADE@bf396000+4000 cfg80211@bf345000+44000 xt_time@bf33d000+4000 xt_tcpudp@bf335000+4000 xt_tcpmss@bf32d000+4000 xt_statistic@bf325000+4000 xt_state@bf31d000+4000 xt_nat@bf315000+4000 xt_multiport@bf30d000+4000 xt_mark@bf305000+4000 xt_mac@bf2fd000+4000 xt_limit@bf2f5000+4000 xt_length@bf2ed000+4000 xt_hl@bf2e5000+4000 xt_ecn@bf2dd000+4000 xt_dscp@bf2d5000+4000 xt_conntrack@bf2cd000+4000 xt_comment@bf2c5000+4000 xt_TCPMSS@bf2bd000+4000 xt_REDIRECT@bf2b5000+4000 xt_LOG@bf2ad000+4000 xt_HL@bf2a5000+4000 xt_FLOWOFFLOAD@bf29d000+4000 xt_DSCP@bf295000+4000 xt_CLASSIFY@bf28d000+4000 slhc@bf286000+4000 openvswitch@bf265000+1a000 nfnetlink@bf25c000+4000 nf_reject_ipv4@bf255000+4000 nf_nat_redirect@bf24e000+4000 nf_nat_masquerade_ipv6@bf247000+4000 nf_nat_masquerade_ipv4@bf240000+4000 nf_conntrack_ipv6@bf238000+4000 nf_nat_ipv6@bf230000+4000 nf_conntrack_ipv4@bf228000+4000 nf_nat_ipv4@bf220000+4000 nf_nat@bf215000+6000 nf_log_ipv4@bf20d000+4000 
nf_flow_table_hw@bf205000+4000 nf_flow_table@bf1fa000+6000 nf_dup_netdev@bf1f3000+4000 nf_defrag_ipv6@bf1eb000+4000 nf_defrag_ipv4@bf1e3000+4000 nf_conntrack_rtcache@bf1db000+4000 nf_conntrack@bf1c2000+11000 libcrc32c@bf1ba000+4000 iptable_mangle@bf1b2000+4000 iptable_filter@bf1aa000+4000 ipt_ECN@bf1a2000+4000 ip_tables@bf198000+6000 crc_ccitt@bf191000+4000 compat@bf188000+5000 ledtrig_usbport@bf180000+4000 nf_log_ipv6@bf178000+4000 nf_log_common@bf170000+4000 ip6table_mangle@bf168000+4000 ip6table_filter@bf160000+4000 ip6_tables@bf156000+6000 ip6t_REJECT@bf14e000+4000 x_tables@bf143000+6000 nf_reject_ipv6@bf13c000+4000 mpls_iptunnel@bf134000+4000 mpls_router@bf128000+7000 mpls_gso@bf120000+4000 leds_gpio@bf118000+4000 xhci_plat_hcd@bf10f000+4000 xhci_pci@bf106000+4000 xhci_hcd@bf0e7000+18000 dwc3@bf0db000+7000 dwc3_of_simple@bf0d3000+4000 ohci_platform@bf0ca000+4000 ohci_hcd@bf0bd000+8000 phy_qcom_dwc3@bf0b5000+4000 ahci@bf0ab000+6000 ehci_platform@bf0a2000+4000 sd_mod@bf094000+9000 ahci_platform@bf08c000+4000 libahci_platform@bf085000+4000 libahci@bf07a000+7000 libata@bf046000+27000 scsi_mod@bf020000+1a000 ehci_hcd@bf010000+b000 gpio_button_hotplug@bf008000+4000 crc32c_generic@bf000000+4000
<3>[ 21.105247] ath10k_pci 0002:01:00.0: DANGER! You're overriding EEPROM-defined regulatory domain
<3>[ 21.105294] ath10k_pci 0002:01:00.0: from: 0x0 to 0x348 (svc-ready-work)
<3>[ 21.112758] ath10k_pci 0002:01:00.0: Your card was not certified to operate in the domain you chose.
<3>[ 21.119792] ath10k_pci 0002:01:00.0: This might result in a violation of your local regulatory rules.
<3>[ 21.128906] ath10k_pci 0002:01:00.0: Do not ever do this unless you really know what you are doing!
<4>[ 21.139581] ath10k_pci 0002:01:00.0: 10.4 wmi init: vdevs: 8 peers: 180 tid: 450
<6>[ 21.146922] ath10k_pci 0002:01:00.0: using 7 firmware rate-ctrl objects
<4>[ 21.154531] ath10k_pci 0002:01:00.0: msdu-desc: 2200 skid: 360
<6>[ 21.237597] ath10k_pci 0002:01:00.0: wmi print 'P 180/180 V 8 K 540 PH 556 T 656 msdu-desc: 2200 sw-crypt: 0 ct-sta: 0'
<6>[ 21.238894] ath10k_pci 0002:01:00.0: wmi print 'free: 11368 iram: 8424 sram: 512'
<6>[ 21.405609] ath10k_pci 0002:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 180 raw 0 hwcrypto 1
<7>[ 21.626332] ath: EEPROM regdomain: 0x8348
<7>[ 21.626345] ath: EEPROM indicates we should expect a country code
<7>[ 21.626364] ath: doing EEPROM country->regdmn map search
<7>[ 21.626377] ath: country maps to regdmn code: 0x3a
<7>[ 21.626391] ath: Country alpha2 being used: US
<7>[ 21.626401] ath: Regpair used: 0x3a
<6>[ 21.638889] batman_adv: B.A.T.M.A.N. advanced 2020.2-openwrt-1 (compatibility version 15) loaded
<14>[ 21.641464] kmodloader: done loading kernel modules from /etc/modules.d/*
<6>[ 24.336245] Atheros 8031 ethernet gpio-0:00: attached PHY driver [Atheros 8031 ethernet] (mii_bus:phy_addr=gpio-0:00, irq=POLL)
<6>[ 24.337418] dwmac1000: Master AXI performs any burst length
<6>[ 24.346656] ipq806x-gmac-dwmac 37600000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
<6>[ 24.352226] ipq806x-gmac-dwmac 37600000.ethernet eth1: registered PTP clock
<6>[ 24.365549] br-lan: port 1(eth1) entered blocking state
<6>[ 24.367972] br-lan: port 1(eth1) entered disabled state
<6>[ 24.374348] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
<6>[ 24.506123] Atheros 8031 ethernet gpio-0:01: attached PHY driver [Atheros 8031 ethernet] (mii_bus:phy_addr=gpio-0:01, irq=POLL)
<6>[ 24.513244] dwmac1000: Master AXI performs any burst length
<6>[ 24.516426] ipq806x-gmac-dwmac 37400000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
<6>[ 24.522078] ipq806x-gmac-dwmac 37400000.ethernet eth0: registered PTP clock
<6>[ 24.534347] br-wan: port 1(eth0) entered blocking state
<6>[ 24.537849] br-wan: port 1(eth0) entered disabled state
<6>[ 24.544469] br-wan: port 1(eth0) entered blocking state
<6>[ 24.548271] br-wan: port 1(eth0) entered forwarding state
<6>[ 24.751737] 8021q: adding VLAN 0 to HW filter on device bat0
<6>[ 24.752006] br-lan: port 2(bat0) entered blocking state
<6>[ 24.756560] br-lan: port 2(bat0) entered disabled state
<6>[ 24.761574] device bat0 entered promiscuous mode
<6>[ 24.766771] device eth1 entered promiscuous mode
<6>[ 24.771559] br-lan: port 2(bat0) entered blocking state
<6>[ 24.776138] br-lan: port 2(bat0) entered forwarding state
<6>[ 24.798361] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
<6>[ 25.263219] batman_adv: bat0: No IGMP Querier present - multicast optimizations disabled
<6>[ 25.263251] batman_adv: bat0: No MLD Querier present - multicast optimizations disabled
<6>[ 25.382134] br-wan: port 1(eth0) entered disabled state
<6>[ 27.516892] ipq806x-gmac-dwmac 37600000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
<4>[ 31.710745] ath10k_pci 0002:01:00.0: 10.4 wmi init: vdevs: 8 peers: 180 tid: 450
<6>[ 31.710774] ath10k_pci 0002:01:00.0: using 7 firmware rate-ctrl objects
<4>[ 31.717361] ath10k_pci 0002:01:00.0: msdu-desc: 2200 skid: 360
<6>[ 31.800345] ath10k_pci 0002:01:00.0: wmi print 'P 180/180 V 8 K 540 PH 556 T 656 msdu-desc: 2200 sw-crypt: 0 ct-sta: 0'
<6>[ 31.801611] ath10k_pci 0002:01:00.0: wmi print 'free: 11368 iram: 8424 sram: 512'
<4>[ 32.193725] ath10k_pci 0002:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
<6>[ 32.193942] IPv6: ADDRCONF(NETDEV_UP): client1: link is not ready
<6>[ 32.208336] br-lan: port 1(eth1) entered blocking state
<6>[ 32.209836] br-lan: port 1(eth1) entered forwarding state
<6>[ 32.237464] br-lan: port 3(client1) entered blocking state
<6>[ 32.237491] br-lan: port 3(client1) entered disabled state
<6>[ 32.242616] device client1 entered promiscuous mode
<6>[ 32.394857] br-wan: port 1(eth0) entered disabled state
<6>[ 32.457821] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
<6>[ 32.762860] ath10k_pci 0002:01:00.0: NOTE: Firmware DBGLOG output disabled in debug_mask: 0x10000000
<6>[ 32.770474] IPv6: ADDRCONF(NETDEV_CHANGE): client1: link becomes ready
<6>[ 32.771583] br-lan: port 3(client1) entered blocking state
<6>[ 32.777580] br-lan: port 3(client1) entered forwarding state
<6>[ 32.886573] Atheros 8031 ethernet gpio-0:01: attached PHY driver [Atheros 8031 ethernet] (mii_bus:phy_addr=gpio-0:01, irq=POLL)
<6>[ 32.893172] dwmac1000: Master AXI performs any burst length
<6>[ 32.897375] ipq806x-gmac-dwmac 37400000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
<6>[ 32.902530] ipq806x-gmac-dwmac 37400000.ethernet eth0: registered PTP clock
<6>[ 32.916682] br-wan: port 1(eth0) entered blocking state
<6>[ 32.918477] br-wan: port 1(eth0) entered disabled state
<6>[ 32.928463] IPv6: ADDRCONF(NETDEV_UP): br-wan: link is not ready
<4>[ 38.723594] ath10k_pci 0001:01:00.0: 10.4 wmi init: vdevs: 8 peers: 180 tid: 450
<6>[ 38.723623] ath10k_pci 0001:01:00.0: using 7 firmware rate-ctrl objects
<4>[ 38.731093] ath10k_pci 0001:01:00.0: msdu-desc: 2200 skid: 360
<6>[ 38.811599] ath10k_pci 0001:01:00.0: wmi print 'P 180/180 V 8 K 540 PH 556 T 656 msdu-desc: 2200 sw-crypt: 0 ct-sta: 0'
<6>[ 38.812877] ath10k_pci 0001:01:00.0: wmi print 'free: 11368 iram: 8424 sram: 512'
<4>[ 39.201170] ath10k_pci 0001:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
<6>[ 39.201339] IPv6: ADDRCONF(NETDEV_UP): client0: link is not ready
<6>[ 39.216274] br-lan: port 4(client0) entered blocking state
<6>[ 39.217340] br-lan: port 4(client0) entered disabled state
<6>[ 39.222873] device client0 entered promiscuous mode
<6>[ 39.528761] IPv6: ADDRCONF(NETDEV_CHANGE): client0: link becomes ready
<6>[ 39.529012] br-lan: port 4(client0) entered blocking state
<6>[ 39.534302] br-lan: port 4(client0) entered forwarding state
<6>[ 40.691860] IPv6: ADDRCONF(NETDEV_UP): mesh0: link is not ready
<6>[ 40.731037] IPv6: ADDRCONF(NETDEV_UP): mesh1: link is not ready
<6>[ 40.823371] ath10k_pci 0001:01:00.0: mac flush null vif, drop 0 queues 0xffff
<6>[ 40.898665] ath10k_pci 0002:01:00.0: mac flush null vif, drop 0 queues 0xffff
<6>[ 41.760998] br-lan: port 4(client0) entered disabled state
<6>[ 41.762528] br-lan: port 3(client1) entered disabled state
<6>[ 44.641968] br-lan: port 3(client1) entered blocking state
<6>[ 44.642010] br-lan: port 3(client1) entered forwarding state
<6>[ 44.649723] batman_adv: bat0: Adding interface: mesh1
<6>[ 44.652216] batman_adv: bat0: Interface activated: mesh1
<6>[ 44.658352] IPv6: ADDRCONF(NETDEV_CHANGE): mesh1: link becomes ready
<4>[ 44.891497] ath10k_pci 0002:01:00.0: Invalid peer id 0 or peer stats buffer, peer: cb915000 sta: (null)
<6>[ 47.250647] br-lan: port 4(client0) entered blocking state
<6>[ 47.250682] br-lan: port 4(client0) entered forwarding state
<6>[ 47.257308] batman_adv: bat0: Adding interface: mesh0
<6>[ 47.260838] batman_adv: bat0: Interface activated: mesh0
<6>[ 47.308240] IPv6: ADDRCONF(NETDEV_CHANGE): mesh0: link becomes ready
<4>[ 75.993607] NOHZ: local_softirq_pending 08
<4>[ 219.354201] NOHZ: local_softirq_pending 08
<4>[ 444.633812] NOHZ: local_softirq_pending 08
<4>[ 506.073732] NOHZ: local_softirq_pending 08
<4>[ 526.553749] NOHZ: local_softirq_pending 08
<4>[ 669.913737] NOHZ: local_softirq_pending 08
<4>[ 751.833940] NOHZ: local_softirq_pending 08
<4>[ 772.313676] NOHZ: local_softirq_pending 08
<4>[ 792.793586] NOHZ: local_softirq_pending 08
<4>[ 813.273624] NOHZ: local_softirq_pending 08
<4>[ 848.273737] ath10k_pci 0002:01:00.0: peer-unmap-event: unknown peer id 8
<4>[ 850.884255] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 5
<1>[ 1263.833736] Unable to handle kernel NULL pointer dereference at virtual address 00000038
<1>[ 1263.833770] pgd = c0204000
<1>[ 1263.840947] [00000038] *pgd=00000000
<0>[ 1263.843483] Internal error: Oops: 17 [#1] SMP ARM
<4>[ 1263.847138] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nft_set_rbtree nft_set_hash nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject_bridge nft_reject nft_redir nft_quota nft_numgen nft_meta_bridge nft_meta nft_log nft_limit nft_fwd_netdev nft_exthdr nft_dup_netdev nft_ct nft_counter nft_chain_route_ipv6 nft_chain_route_ipv4 nf_tables_netdev nf_tables_ipv6 nf_tables_ipv4 nf_tables_inet nf_tables_bridge nf_tables mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CLASSIFY slhc openvswitch nfnetlink nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv6
<4>[ 1263.901905] nf_nat_masquerade_ipv4 nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_dup_netdev nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 mpls_iptunnel mpls_router mpls_gso leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug crc32c_generic
<4>[ 1263.961027] CPU: 0 PID: 2470 Comm: kworker/u4:2 Not tainted 4.14.187 #0
<4>[ 1263.983254] Hardware name: Generic DT based system
<4>[ 1263.989826] Workqueue: bat_events batadv_bla_periodic_work [batman_adv]
<4>[ 1263.994629] task: cbcbed00 task.stack: ca46e000
<4>[ 1264.001153] PC is at batadv_bit_get_packet+0xc4/0xf4 [batman_adv]
<4>[ 1264.005669] LR is at batadv_bit_get_packet+0xb8/0xf4 [batman_adv]
<4>[ 1264.011901] pc : [<bf5ebe30>] lr : [<bf5ebe24>] psr: 60000013
<4>[ 1264.017976] sp : ca46fec0 ip : 000000c8 fp : cd804200
<4>[ 1264.024050] r10: ccd38000 r9 : 00000007 r8 : 00000000
<4>[ 1264.029260] r7 : cb8a64c0 r6 : cb90e300 r5 : cc6f7e8c r4 : 00000000
<4>[ 1264.034471] r3 : 00000038 r2 : 00000002 r1 : 00000000 r0 : cc6f7e8c
<4>[ 1264.041069] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
<4>[ 1264.047578] Control: 10c5787d Table: 4dd7406a DAC: 00000051
<0>[ 1264.054784] Process kworker/u4:2 (pid: 2470, stack limit = 0xca46e210)
<0>[ 1264.060513] Stack: (0xca46fec0 to 0xca470000)
<0>[ 1264.066960] fec0: cc6f7e80 cb90e300 cb90e300 bf5ecbf4 cb95c0c0 00000200 c0b02d00 00000007
<0>[ 1264.071393] fee0: cb8a678c cb8a669c cb90e300 cb8a64c0 cb8a669c 00000000 00000000 00000080
<0>[ 1264.079554] ff00: cd804200 bf5ecd70 cb8a677c cb8a677c 60000013 cb8a669c cb93b080 cd804200
<0>[ 1264.087713] ff20: cbd6c600 00000000 00000000 00000080 cd804200 c0337120 cd804218 ffffe000
<0>[ 1264.095873] ff40: cb93b080 cd804200 cb93b098 cd804218 ffffe000 c0b02d00 00000088 c033761c
<0>[ 1264.104032] ff60: cc72febc ccdd6400 ca46e000 c8fe6880 cc72febc ccdd641c cb93b080 c03372d8
<0>[ 1264.112192] ff80: 00000000 c033d2bc 00000000 c8fe6880 c033d174 00000000 00000000 00000000
<0>[ 1264.120351] ffa0: 00000000 00000000 00000000 c0307d28 00000000 00000000 00000000 00000000
<0>[ 1264.128511] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<0>[ 1264.136670] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
<4>[ 1264.144917] [<bf5ebe30>] (batadv_bit_get_packet [batman_adv]) from [<bf5ecbf4>] (batadv_bla_loopdetect_report+0xa48/0xb44 [batman_adv])
<4>[ 1264.153012] [<bf5ecbf4>] (batadv_bla_loopdetect_report [batman_adv]) from [<bf5ecd70>] (batadv_bla_periodic_work+0x80/0xb04 [batman_adv])
<4>[ 1264.164984] [<bf5ecd70>] (batadv_bla_periodic_work [batman_adv]) from [<c0337120>] (process_one_work+0x28c/0x444)
<4>[ 1264.177456] [<c0337120>] (process_one_work) from [<c033761c>] (worker_thread+0x344/0x58c)
<4>[ 1264.187695] [<c033761c>] (worker_thread) from [<c033d2bc>] (kthread+0x148/0x150)
<4>[ 1264.195860] [<c033d2bc>] (kthread) from [<c0307d28>] (ret_from_fork+0x14/0x2c)
<0>[ 1264.203329] Code: eb4724e3 e5944008 e2843038 f593f000 (e1932f9f)
<4>[ 1264.210463] ---[ end trace 54361f4755dee328 ]---
===========================%
</code></pre>
batman-adv - Feature #414 (New): Replace usage of word slave/master https://www.open-mesh.org/issues/414 2020-07-24T06:29:56Z Sven Eckelmann
<p>The code uses the word "slave" in various places. These <a href="https://www.kernel.org/doc/html/v5.8-rc6/process/coding-style.html#naming" class="external">terms are considered deprecated</a> by (parts of) the kernel community.</p>
<p>I agree that there might be better words to describe the relationship between the batadv interface and its attached (lower) devices. But the network subsystem has to be changed to use the new terms first, before we can switch to the new functions (and the associated terminology). Because of this, I have disabled the DEPRECATED_TERM checks in the daily build_test for now.</p>
<p>This ticket should therefore only be worked on after the related code in net/core/rtnetlink.c has been adjusted.</p>
batman-adv - Bug #404 (New): KCSAN: data-race in batadv_tt_local_add / batadv_tt_local_add https://www.open-mesh.org/issues/404 2019-11-08T14:41:24Z Sven Eckelmann
<p>The new KCSAN (concurrency sanitizer) reported a problem with the TT code: <a class="external" href="https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/Z44URGZT3NKZP5273KQEMW27WHGNJEUP/">https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/Z44URGZT3NKZP5273KQEMW27WHGNJEUP/</a></p>
<pre>Hello,
syzbot found the following crash on:
HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
git tree: https://github.com/google/ktsan.git kcsan
console output: https://syzkaller.appspot.com/x/log.txt?x=1195a0d4e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
dashboard link: https://syzkaller.appspot.com/bug?extid=1d5dadec56d9e87f0aac
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1d5dadec56d9e87f0aac@syzkaller.appspotmail.com
==================================================================
BUG: KCSAN: data-race in batadv_tt_local_add / batadv_tt_local_add
write to 0xffff8880a8e19698 of 2 bytes by task 10064 on cpu 0:
batadv_tt_local_add+0x21b/0x1020 net/batman-adv/translation-table.c:799
batadv_interface_tx+0x398/0xae0 net/batman-adv/soft-interface.c:249
__netdev_start_xmit include/linux/netdevice.h:4420 [inline]
netdev_start_xmit include/linux/netdevice.h:4434 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
__dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
__bpf_tx_skb net/core/filter.c:2060 [inline]
__bpf_redirect_common net/core/filter.c:2099 [inline]
__bpf_redirect+0x4b4/0x710 net/core/filter.c:2106
____bpf_clone_redirect net/core/filter.c:2139 [inline]
bpf_clone_redirect+0x1a5/0x1f0 net/core/filter.c:2111
bpf_prog_bb15b996d00816f9+0x71c/0x1000
bpf_test_run+0x1c3/0x490 net/bpf/test_run.c:44
bpf_prog_test_run_skb+0x4da/0x840 net/bpf/test_run.c:310
bpf_prog_test_run kernel/bpf/syscall.c:2108 [inline]
__do_sys_bpf+0x1664/0x2b90 kernel/bpf/syscall.c:2884
__se_sys_bpf kernel/bpf/syscall.c:2825 [inline]
__x64_sys_bpf+0x4c/0x60 kernel/bpf/syscall.c:2825
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
read to 0xffff8880a8e19698 of 2 bytes by task 9969 on cpu 1:
batadv_tt_local_add+0x3d1/0x1020 net/batman-adv/translation-table.c:801
batadv_interface_tx+0x398/0xae0 net/batman-adv/soft-interface.c:249
__netdev_start_xmit include/linux/netdevice.h:4420 [inline]
netdev_start_xmit include/linux/netdevice.h:4434 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
__dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
__bpf_tx_skb net/core/filter.c:2060 [inline]
__bpf_redirect_common net/core/filter.c:2099 [inline]
__bpf_redirect+0x4b4/0x710 net/core/filter.c:2106
____bpf_clone_redirect net/core/filter.c:2139 [inline]
bpf_clone_redirect+0x1a5/0x1f0 net/core/filter.c:2111
bpf_prog_bb15b996d00816f9+0x312/0x1000
bpf_test_run+0x1c3/0x490 net/bpf/test_run.c:44
bpf_prog_test_run_skb+0x4da/0x840 net/bpf/test_run.c:310
bpf_prog_test_run kernel/bpf/syscall.c:2108 [inline]
__do_sys_bpf+0x1664/0x2b90 kernel/bpf/syscall.c:2884
__se_sys_bpf kernel/bpf/syscall.c:2825 [inline]
__x64_sys_bpf+0x4c/0x60 kernel/bpf/syscall.c:2825
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 9969 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.</pre>
batman-adv - Bug #397 (New): BATMAN_V throughput on bridge, vxlan and veth https://www.open-mesh.org/issues/397 2019-07-29T14:51:36Z Linus Lüssing linus.luessing@c0d3.blue
<p>For bridge, vxlan and veth interfaces, batman-adv currently uses the default throughput of 1 Mbit/s. Also see:</p>
<p><a class="external" href="https://github.com/freifunk-gluon/gluon/issues/1728">https://github.com/freifunk-gluon/gluon/issues/1728</a></p>
<p>For vxlan, Matthias is currently working on a patch to inherit the properties from its parent device (similar to what vlan does).</p>
<p>For veth, ethtool reports 10 Gbit/s, which is a far more reasonable value for an in-kernel connection than our default of 1 Mbit/s:</p>
<pre><code>$ ethtool veth0
Settings for veth0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown
Link detected: yes
</code></pre>
<p>However, batman-adv uses the default throughput of 1 Mbit/s because auto-negotiation is disabled. We could add an exception in batman-adv that ignores the auto-negotiation property for veth, but that would not be sufficient for setups that, for instance, stack VLANs or VXLANs on top of veth.</p>
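<p>To make the limitation concrete, here is a minimal C sketch of the fallback behaviour described above. It is a simplified model, not the batman-adv code: struct link_info, model_throughput_mbit() and the veth_exception flag are hypothetical, and throughput is expressed in Mbit/s for readability. It assumes, as stated in this ticket, that the ethtool speed is only trusted when auto-negotiation is enabled.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of what ethtool reports for a device. */
struct link_info {
	const char *kind;    /* device type, e.g. "veth", "vlan", "bridge" */
	uint32_t speed_mbit; /* reported speed in Mbit/s, 0 if unknown */
	bool autoneg;        /* auto-negotiation enabled? */
};

/* Fallback value mentioned in the ticket: 1 Mbit/s. */
#define MODEL_DEFAULT_THROUGHPUT_MBIT 1U

/*
 * Model of the described behaviour: the reported speed is only used if
 * auto-negotiation is enabled. The optional veth_exception sketches the
 * proposed special case that ignores auto-negotiation for veth devices.
 */
static uint32_t model_throughput_mbit(const struct link_info *li,
				      bool veth_exception)
{
	bool trust_speed;

	assert(li != NULL && li->kind != NULL);

	trust_speed = li->autoneg;
	if (veth_exception && strcmp(li->kind, "veth") == 0)
		trust_speed = true;

	if (trust_speed && li->speed_mbit != 0)
		return li->speed_mbit;

	return MODEL_DEFAULT_THROUGHPUT_MBIT;
}
```

<p>With veth0 as reported above (10000 Mb/s, auto-negotiation off), the model returns 1 without the exception and 10000 with it. A VLAN on top of veth0, assuming it reports the same settings as its lower device, is not of kind "veth", so the model still falls back to 1 Mbit/s for it even with the exception enabled.</p>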
<p>For bridge interfaces, it is even trickier.</p>
batman-adv - Feature #365 (New): Support Jumbo frames via batman-adv https://www.open-mesh.org/issues/365 2018-11-17T16:03:43Z Sven Eckelmann
<p>The MTU of the batadv interface is currently limited to 1500 bytes. There are two reasons for this:</p>
<ul>
<li>batadv_softif_init_early doesn't set max_mtu to 0
<ul>
<li>required since Linux 4.10
<ul>
<li><a class="external" href="https://patchwork.ozlabs.org/project/netdev/patch/20161008020434.9691-2-jarod@redhat.com/">https://patchwork.ozlabs.org/project/netdev/patch/20161008020434.9691-2-jarod@redhat.com/</a></li>
<li><a class="external" href="https://patchwork.ozlabs.org/project/netdev/patch/20161008020434.9691-3-jarod@redhat.com/">https://patchwork.ozlabs.org/project/netdev/patch/20161008020434.9691-3-jarod@redhat.com/</a></li>
<li><a class="external" href="https://patchwork.ozlabs.org/project/netdev/patch/20161020175524.6184-8-jarod@redhat.com/">https://patchwork.ozlabs.org/project/netdev/patch/20161020175524.6184-8-jarod@redhat.com/</a></li>
</ul>
</li>
</ul>
</li>
<li>batadv_hardif_min_mtu limits it to ETH_DATA_LEN (reason unknown)
<ul>
<li><pre><code class="c syntaxhl" data-language="c"> <span class="cm">/* the real soft-interface MTU is computed by removing the payload
* overhead from the maximum amount of bytes that was just computed.
*
* However batman-adv does not support MTUs bigger than ETH_DATA_LEN
*/</span>
<span class="k">return</span> <span class="nf">min_t</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="n">min_mtu</span> <span class="o">-</span> <span class="n">batadv_max_header_len</span><span class="p">(),</span> <span class="n">ETH_DATA_LEN</span><span class="p">);</span>
</code></pre></li>
</ul></li>
</ul>
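<p>The effect of the clamp can be sketched in a few lines of C. This is a hypothetical model rather than the batman-adv implementation: model_softif_mtu(), its parameters and the 60-byte header overhead used in the example are assumptions; only ETH_DATA_LEN (1500) corresponds to the constant quoted above. It takes the smallest MTU of the attached hard interfaces, subtracts the header overhead and optionally applies the ETH_DATA_LEN limit.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_DATA_LEN 1500 /* same value as the kernel's ETH_DATA_LEN */

/*
 * Hypothetical model of the soft-interface MTU calculation:
 * smallest hard-interface MTU minus the batman-adv header overhead,
 * optionally clamped to ETH_DATA_LEN like in batadv_hardif_min_mtu().
 * Returns -1 if no hard interface is attached.
 */
static int model_softif_mtu(const int *hardif_mtus, size_t count,
			    int header_overhead, bool clamp_to_eth_data_len)
{
	int min_mtu;
	int mtu;
	size_t i;

	if (count == 0)
		return -1;

	assert(hardif_mtus != NULL);

	/* the smallest lower-device MTU limits the whole mesh interface */
	min_mtu = hardif_mtus[0];
	for (i = 1; i < count; i++)
		if (hardif_mtus[i] < min_mtu)
			min_mtu = hardif_mtus[i];

	mtu = min_mtu - header_overhead;
	if (clamp_to_eth_data_len && mtu > MODEL_ETH_DATA_LEN)
		mtu = MODEL_ETH_DATA_LEN;

	return mtu;
}
```

<p>With two hard interfaces of MTU 9000 and the assumed overhead of 60 bytes, the clamped variant returns 1500 while the unclamped one returns 8940; with MTUs of 9000 and 1560, both return 1500, so removing the clamp would only matter when every lower device supports jumbo frames.</p>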
<p>It has to be checked why this limit was added in the first place and whether it can be removed now; if so, both functions have to be modified. For kernels < 4.10, an appropriate compat helper has to be added to compat.h.</p>
batman-adv - Bug #363 (New): Broadcast ELP smaller than specified in documention https://www.open-mesh.org/issues/363 2018-08-31T10:33:46Z Sven Eckelmann
<p>Commit a4b88af77e28 ("batman-adv: ELP - adding basic infrastructure") added the ELP broadcast code. It broadcasts 16-byte ELP packets plus a 14-byte Ethernet header to announce itself. The actual <a class="wiki-page" href="https://www.open-mesh.org/projects/batman-adv/wiki/ELP#section-9">specification</a>, however, describes extra padding that increases the size significantly (to 300 bytes).</p>
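<p>The 16 bytes can be reproduced from the ELP packet layout. The struct below is an assumption: it is meant to mirror the fields that struct batadv_elp_packet is believed to contain (packet type, version, originator address, sequence number, ELP interval), written with plain fixed-width types. The helper model_elp_padding() is hypothetical and shows how much padding would be needed to reach the 300 bytes, depending on whether those 300 bytes are meant to include the Ethernet header (this ticket does not say).</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6
#define MODEL_ETH_HLEN 14 /* destination MAC + source MAC + EtherType */

/* Assumed to mirror struct batadv_elp_packet (multi-byte fields are big endian on the wire). */
struct model_elp_packet {
	uint8_t packet_type;
	uint8_t version;
	uint8_t orig[MODEL_ETH_ALEN];
	uint32_t seqno;
	uint32_t elp_interval;
};

/* 1 + 1 + 6 + 4 + 4 bytes; the 32-bit fields start at offset 8, so no padding is inserted. */
_Static_assert(sizeof(struct model_elp_packet) == 16, "ELP packet should be 16 bytes");

/* Hypothetical helper: padding needed so that the frame reaches target_len bytes. */
static size_t model_elp_padding(size_t target_len, bool target_includes_eth_header)
{
	size_t len = sizeof(struct model_elp_packet);

	if (target_includes_eth_header)
		len += MODEL_ETH_HLEN;

	assert(target_len >= len);
	return target_len - len;
}
```

<p>So a 300-byte target would require 270 bytes of padding if it includes the Ethernet header, or 284 bytes if it only covers the ELP payload.</p>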
<p>Either the code or the documentation has to be adjusted.</p>
batman-adv - Bug #351 (New): Issues with batadv_gw_out_of_range https://www.open-mesh.org/issues/351 2018-03-13T09:25:42Z Linus Lüssing linus.luessing@c0d3.blue
<p>I have the impression that batadv_gw_out_of_range() is broken, or perhaps never worked as intended. gw_out_of_range() is only called from interface_tx() if the packet was classified as DHCP_TO_SERVER, and interface_tx() only sets DHCP_TO_SERVER in the is_multicast_ether_addr(ethhdr->h_dest) branch. However, the kerneldoc of gw_out_of_range() says that it should always return false for multicast destinations. This means that DHCP packets to a server would never be dropped in interface_tx() for being "out of range", so clients may have stuck to DHCP servers longer than they should have.</p>
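<p>The argument can be checked with a small truth-table model. This C sketch is a hypothetical simplification, not the batman-adv code: model_gw_out_of_range() only encodes the kerneldoc statement that multicast destinations always yield false, and model_tx_drops_dhcp() only encodes the observation that interface_tx() sets DHCP_TO_SERVER in the multicast branch. Under these two assumptions, the drop path is unreachable.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the DHCP recipient classification. */
enum model_dhcp_recipient {
	MODEL_DHCP_NO,
	MODEL_DHCP_TO_SERVER,
	MODEL_DHCP_TO_CLIENT,
};

/* Per the kerneldoc: always false for multicast destinations. */
static bool model_gw_out_of_range(bool dest_is_multicast, bool gw_too_far)
{
	if (dest_is_multicast)
		return false;
	return gw_too_far;
}

/* Returns true if the modelled interface_tx() would drop the packet. */
static bool model_tx_drops_dhcp(bool dest_is_multicast, bool is_dhcp_request,
				bool gw_too_far)
{
	enum model_dhcp_recipient rcp = MODEL_DHCP_NO;

	/* DHCP_TO_SERVER is only set in the multicast branch */
	if (dest_is_multicast && is_dhcp_request)
		rcp = MODEL_DHCP_TO_SERVER;

	return rcp == MODEL_DHCP_TO_SERVER &&
	       model_gw_out_of_range(dest_is_multicast, gw_too_far);
}

/* Counts how many of the 8 input combinations lead to a drop. */
static int model_count_drops(void)
{
	int drops = 0;

	for (int m = 0; m < 2; m++)
		for (int d = 0; d < 2; d++)
			for (int g = 0; g < 2; g++)
				drops += model_tx_drops_dhcp(m, d, g);
	return drops;
}
```

<p>In this model, model_count_drops() returns 0: the only inputs that reach the range check are multicast ones, for which the check always answers "in range".</p>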
<p>And with multicast TT entries, things might get worse: I think there could be DHCPv4 packet loss if some node were to claim FF:FF:FF:FF:FF:FF via TT (the current multicast code does not announce this, but a broken or malicious node might). For DHCPv6, the multicast code currently announces 33:33:00:01:00:02 and 33:33:00:01:00:03, so I suspect that DHCPv6 might have been broken by the added multicast code.</p>
batman-adv - Feature #339 (New): Make "batctl log" usable with network namespaces https://www.open-mesh.org/issues/339 2017-07-13T03:09:55Z Linus Lüssing linus.luessing@c0d3.blue
<p>Currently, this fails because the log is only available via debugfs, and debugfs has no namespace support.</p>
batman-adv - Feature #291 (New): Reduce DAT Cache misses https://www.open-mesh.org/issues/291 2016-07-11T08:35:39Z Linus Lüssing linus.luessing@c0d3.blue
<p>While the overall ARP overhead is greatly reduced, we generally still see many ARP Requests from gateway nodes / routers. In a 1000-node setup, this amounts to about 30 kbit/s.</p>
<p>In a minimal setup with just two hosts (Linux 4.6-rc6, no batman-adv involved), one a DHCP server and the other a DHCP client, with one persistent TCP connection between them, I noticed that ARP packets are sent only rarely. This seems to break the initial assumption that at least one ARP exchange takes place within the 5-minute DAT cache timeout.</p>
<p>In the test setup, during an interval of ~37000 seconds (~10 h), these were the only ARP packets that showed up:</p>
<pre>
5 106.241867 02:04:64:a4:39:d3 -> ff:ff:ff:ff:ff:ff ARP 60 Who has 192.168.123.1? Tell 192.168.123.50
6 106.241958 02:04:64:a4:39:f2 -> 02:04:64:a4:39:d3 ARP 42 192.168.123.1 is at 02:04:64:a4:39:f2
14 111.246595 02:04:64:a4:39:f2 -> 02:04:64:a4:39:d3 ARP 42 Who has 192.168.123.50? Tell 192.168.123.1
15 111.247439 02:04:64:a4:39:d3 -> 02:04:64:a4:39:f2 ARP 60 192.168.123.50 is at 02:04:64:a4:39:d3
2092 5217.550877 02:04:64:a4:39:d3 -> 02:04:64:a4:39:f2 ARP 60 Who has 192.168.123.1? Tell 192.168.123.50
2093 5217.550911 02:04:64:a4:39:f2 -> 02:04:64:a4:39:d3 ARP 42 192.168.123.1 is at 02:04:64:a4:39:f2
</pre>
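<p>How little of the interval these packets cover can be estimated with a short C sketch. It assumes, as a simplification, that every ARP packet in the trace refreshes the corresponding DAT cache entry for the full 5-minute (300 s) timeout; model_covered_seconds() is a hypothetical helper that merges the resulting [t, t + 300] intervals and sums their length.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: total time covered by the union of the intervals
 * [t, t + timeout] for the sorted timestamps ts[0..n-1].
 */
static double model_covered_seconds(const double *ts, size_t n, double timeout)
{
	double total = 0.0;
	double start, end;
	size_t i;

	if (n == 0)
		return 0.0;

	assert(ts != NULL && timeout >= 0.0);

	start = ts[0];
	end = ts[0] + timeout;
	for (i = 1; i < n; i++) {
		assert(ts[i] >= ts[i - 1]); /* input must be sorted */
		if (ts[i] <= end) {
			/* overlaps the current interval: extend it */
			if (ts[i] + timeout > end)
				end = ts[i] + timeout;
		} else {
			/* gap: close the current interval, start a new one */
			total += end - start;
			start = ts[i];
			end = ts[i] + timeout;
		}
	}
	total += end - start;
	return total;
}
```

<p>Fed with the six timestamps from the trace and a 300 s timeout, it returns about 605 s (roughly 305 s around the first two exchanges and 300 s after the third), i.e. well under 2 % of the ~37000 s interval.</p>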
<p>These few packets would, of course, be insufficient to keep the DAT cache fully up to date while a client is connected.</p>
batman-adv - Feature #206 (New): Distributed IPv6-NDP cache to reduce overhead https://www.open-mesh.org/issues/206 2015-03-12T15:46:18Z Ruben Kelevra cyrond@gmail.com
<p>Currently, the Neighbor Discovery Protocol consumes a lot of airtime and idle bandwidth because of the broadcasts that are sent through the network.</p>
<p>It would be nice if query results could be cached on the nodes in a distributed way, putting some of the nodes' RAM to good use and reducing network overhead.</p>
<p>One possible solution would be:</p>
<ul>
<li>If an IPv6 address is queried by a local client, the node computes three hashes, matches each of them to the nearest MAC address of another node, and queries those nodes.
<ul>
<li>If they all reply with NX, send the query as a normal broadcast.
<ul>
<li>If the broadcast gets an answer, send an update to the three nodes.</li>
</ul></li>
<li>If they do not return any answer within 20 seconds, do a normal broadcast (and redo the queries for each Neighbor Discovery query the node receives).</li>
</ul></li>
<li>If a node receives no query for an entry for 2 h, it deletes the entry.</li>
<li>If a node holds more than $StoreLimit entries, it deletes the oldest one.</li>
</ul>
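<p>The first step of the proposal (three hashes, each matched to the nearest node) could look roughly like the C sketch below. Everything in it is an assumption for illustration: the ticket does not specify a hash function or a distance metric, so the sketch uses 32-bit FNV-1a over a one-byte seed followed by the data, and picks, for each hash, the node whose hashed MAC address follows it most closely on a 32-bit ring. model_pick_candidates() and the other names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_HASHES 3

/* 32-bit FNV-1a over a seed byte followed by the data. */
static uint32_t model_fnv1a(uint8_t seed, const uint8_t *data, size_t len)
{
	uint32_t h = 2166136261u; /* FNV offset basis */
	size_t i;

	h = (h ^ seed) * 16777619u; /* FNV prime */
	for (i = 0; i < len; i++)
		h = (h ^ data[i]) * 16777619u;
	return h;
}

/*
 * macs holds num_nodes MAC addresses of 6 bytes each, back to back.
 * For each of the MODEL_NUM_HASHES hashes of the IPv6 address, out[k]
 * receives the index of the node whose MAC hash is the closest successor
 * on the 32-bit ring. The same node may be chosen more than once.
 */
static void model_pick_candidates(const uint8_t ipv6[16], const uint8_t *macs,
				  size_t num_nodes, size_t out[MODEL_NUM_HASHES])
{
	uint8_t k;
	size_t n;

	assert(ipv6 != NULL && macs != NULL && num_nodes > 0);

	for (k = 0; k < MODEL_NUM_HASHES; k++) {
		uint32_t key = model_fnv1a(k, ipv6, 16);
		uint32_t best_dist = UINT32_MAX;

		out[k] = 0;
		for (n = 0; n < num_nodes; n++) {
			/* forward distance on the ring; unsigned subtraction wraps */
			uint32_t dist = model_fnv1a(0, macs + 6 * n, 6) - key;

			if (dist < best_dist) {
				best_dist = dist;
				out[k] = n;
			}
		}
	}
}
```

<p>Every node computes the same candidates for the same address and node list, which is what lets a querying node find the nodes that are expected to hold the entry. The expiry after 2 h and the $StoreLimit eviction from the list above are not covered by this sketch.</p>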