Open Mesh: Issues
https://www.open-mesh.org/
2015-08-20T11:08:48Z
Open Mesh
Redmine
batman-adv - Bug #223 (Closed): Kernel Crash when using more than one interface in bat0
https://www.open-mesh.org/issues/223
2015-08-20T11:08:48Z
Simon Wunderlich
sw@simonwunderlich.de
<p>Adding this issue from the mailing list:</p>
<p><a class="external" href="https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/3PN6NDV6JXKKX7CXWMRM2O3N2NKFOE2D/">https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-mesh.org/message/3PN6NDV6JXKKX7CXWMRM2O3N2NKFOE2D/</a></p>
<pre>
[ 879.532837] BUG: unable to handle kernel paging request at 0000000100022d60
[ 879.532863] IP: [<ffffffffa04beaa5>] batadv_frag_clear_chain+0x55/0x90 [batman_adv]
[ 879.532891] PGD 0
[ 879.532900] Oops: 0002 [#1] SMP
[ 879.532911] Modules linked in: ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables tun bridge stp llc batman_adv(O) crc32c_generic libcrc32c ip_gre ip_tunnel gre evdev kvm_amd amd64_edac_mod kvm edac_mce_amd tpm_infineon radeon ttm drm_kms_helper pcspkr drm i2c_algo_bit edac_core k10temp shpchp sp5100_tco i2c_piix4 i2c_core tpm_tis tpm button acpi_cpufreq processor thermal_sys autofs4 ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic ohci_pci pata_atiixp ahci libahci ehci_pci ohci_hcd ehci_hcd libata scsi_mod tg3 ptp pps_core libphy usbcore usb_common
[ 879.533106] CPU: 1 PID: 4215 Comm: kworker/u8:0 Tainted: G O 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[ 879.533122] Hardware name: HP ProLiant MicroServer, BIOS O41 07/29/2011
[ 879.533143] Workqueue: bat_events batadv_purge_orig [batman_adv]
[ 879.533155] task: ffff8800d2ff8ae0 ti: ffff8800d7980000 task.ti: ffff8800d7980000
[ 879.533167] RIP: 0010:[<ffffffffa04beaa5>] [<ffffffffa04beaa5>] batadv_frag_clear_chain+0x55/0x90 [batman_adv]
[ 879.533196] RSP: 0018:ffff8800d7983d78 EFLAGS: 00010206
[ 879.533208] RAX: 0000000100022d60 RBX: ffff8800d2a43ce0 RCX: 0000000000000357
[ 879.533221] RDX: 0000000100023599 RSI: ffffffffa04c4440 RDI: ffff8800d2f92ce8
[ 879.533234] RBP: 00005e1c0e1f72c6 R08: 00000000000000c3 R09: 0000000000000101
[ 879.533247] R10: 0000000000002b67 R11: 03fffffffe062e74 R12: ffff8800d2f92ce8
[ 879.533260] R13: ffffffffa04c4440 R14: 0000000000000000 R15: ffff8800d3601940
[ 879.533274] FS: 00007f4c7795b700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[ 879.533290] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 879.533302] CR2: 0000000100022d60 CR3: 00000000d2de7000 CR4: 00000000000007e0
[ 879.533314] Stack:
[ 879.533322] ffff8800d2f92cf0 0000000000000008 ffffffffa04beb23 0000000000000000
[ 879.533344] ffff8800d3601968 ffff8800d2f92c00 0000000000000001 ffffffffa04c5144
[ 879.533366] 0000000000000000 ffff8800d2f92cb0 ffff8800d2fec8c0 0000033501000000
[ 879.533387] Call Trace:
[ 879.533407] [<ffffffffa04beb23>] ? batadv_frag_purge_orig+0x43/0x70 [batman_adv]
[ 879.533433] [<ffffffffa04c5144>] ? _batadv_purge_orig+0x294/0x470 [batman_adv]
[ 879.533458] [<ffffffffa04c5335>] ? batadv_purge_orig+0x15/0x40 [batman_adv]
[ 879.533475] [<ffffffff81081692>] ? process_one_work+0x172/0x420
[ 879.533490] [<ffffffff81081d23>] ? worker_thread+0x113/0x4f0
[ 879.533505] [<ffffffff8150d921>] ? __schedule+0x2b1/0x710
[ 879.533519] [<ffffffff81081c10>] ? rescuer_thread+0x2d0/0x2d0
[ 879.533534] [<ffffffff81087fad>] ? kthread+0xbd/0xe0
[ 879.533550] [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
[ 879.533565] [<ffffffff81511518>] ? ret_from_fork+0x58/0x90
[ 879.533580] [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
[ 879.533592] Code: 48 89 03 48 b8 00 02 20 00 00 00 ad de 48 89 43 08 e8 f0 e6 f4 e0 48 89 df 48 89 eb e8 75 f7 cc e0 48 8b 2b 48 8b 43 08 48 85 ed <48> 89 28 75 be 48 8b 7b 10 48 b8 00 01 10 00 00 00 ad de 48 89
[ 879.533729] RIP [<ffffffffa04beaa5>] batadv_frag_clear_chain+0x55/0x90 [batman_adv]
[ 879.533754] RSP <ffff8800d7983d78>
[ 879.533763] CR2: 0000000100022d60
</pre>
<p>The crash in batadv_frag_clear_chain (called from batadv_frag_purge_orig) suggests a problem in the fragmentation implementation.</p>
batman-adv - Bug #136 (Closed): check hash_add fail
https://www.open-mesh.org/issues/136
2009-12-30T01:30:13Z
Simon Wunderlich
sw@simonwunderlich.de
<p>If hash_add fails (e.g. because kmalloc() for the bucket fails), the reference to the data that was to be hashed may be lost, which can leak memory. Some call sites still do not handle this failure, e.g. in translation-table.c (current revision: r1518).</p>
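<p>A minimal sketch of the pattern this ticket asks for, assuming an add function that returns a negative value on failure and does not take ownership of the data in that case. All names here (struct entry, struct table, table_add, add_entry, fail_next_add) are hypothetical stand-ins, not the batman-adv hash API:</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hashed element; not the batman-adv type. */
struct entry {
    char key[16];
};

/* Hypothetical one-slot "table", just enough to make the sketch runnable. */
struct table {
    struct entry *slot;
    int fail_next_add; /* test hook: simulate a failed bucket allocation */
};

/* Assumed contract: returns 0 on success, -1 on failure. On failure the
 * table does NOT keep a reference to data. */
static int table_add(struct table *t, struct entry *data)
{
    if (t->fail_next_add || t->slot != NULL)
        return -1;
    t->slot = data;
    return 0;
}

/* Allocate an entry and insert it. If the insert fails, free the entry
 * here, because nobody else holds a reference to it. Returns 0 or -1. */
static int add_entry(struct table *t, const char *key)
{
    struct entry *e;

    assert(t != NULL && key != NULL);

    e = malloc(sizeof(*e));
    if (!e)
        return -1;

    strncpy(e->key, key, sizeof(e->key) - 1);
    e->key[sizeof(e->key) - 1] = '\0';

    if (table_add(t, e) < 0) {
        free(e); /* without this, e would be leaked */
        return -1;
    }
    return 0;
}
```

<p>The point is only that the caller checks the return value and releases the element itself on failure; how the real code should recover is a separate question.</p>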
batman-adv - Bug #135 (Closed): redzone problem (rev 1517)
https://www.open-mesh.org/issues/135
2009-12-30T00:50:10Z
Simon Wunderlich
sw@simonwunderlich.de
<p>I had 2 QEMU OpenWrt instances, set the orig interval to 50, and added (nonexistent) interfaces as a stress test with this script:</p>
<pre>
for i in $(seq 0 100); do echo eth$i > /proc/net/batman-adv/interfaces ; done
</pre>
<p>I repeated this a few times and cleared the interface list again with "echo > /proc/net/batman-adv/interfaces".</p>
<p>Suddenly, I got this error message:</p>
<pre>
=============================================================================
BUG kmalloc-16: Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xc13762d0-0xc13762d3. First byte 0x98 instead of 0xcc
INFO: Allocated in get_orig_node+0xeb/0x1c9 [batman_adv] age=1 cpu=0 pid=515
INFO: Freed in hardif_add_interface+0x23d/0x3e0 [batman_adv] age=321 cpu=0 pid=515
INFO: Slab 0xc1975ec0 objects=64 used=12 fp=0xc1376300 flags=0x400000c3
INFO: Object 0xc13762c0 @offset=704 fp=0x826ec601
Bytes b4 0xc13762b0: 03 02 00 00 1f 0d 02 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object 0xc13762c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Redzone 0xc13762d0: 98 99 99 99 ....
Padding 0xc13762f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 515, comm: ash Not tainted 2.6.31.1 #53
Call Trace:
[<c1376010>] ? mtrr_trim_uncached_memory+0x270/0x530
[<c1376010>] ? mtrr_trim_uncached_memory+0x270/0x530
[<c10a1f12>] print_trailer+0x122/0x130
[<c13762c0>] ? mtrr_trim_uncached_memory+0x520/0x530
[<c13762c0>] ? mtrr_trim_uncached_memory+0x520/0x530
[<c13762d4>] ? amd_init_mtrr+0x4/0x20
[<c13762d3>] ? amd_init_mtrr+0x3/0x20
[<c13762d0>] ? amd_init_mtrr+0x0/0x20
[<c10a1fc2>] check_bytes_and_report+0xa2/0xd0
[<c13762d0>] ? amd_init_mtrr+0x0/0x20
[<c13762d3>] ? amd_init_mtrr+0x3/0x20
[<c13762c0>] ? mtrr_trim_uncached_memory+0x520/0x530
[<c13762d0>] ? amd_init_mtrr+0x0/0x20
[<c13762c0>] ? mtrr_trim_uncached_memory+0x520/0x530
[<c10a22a4>] check_object+0x54/0x200
[<c13762d0>] ? amd_init_mtrr+0x0/0x20
[<c13762c0>] ? mtrr_trim_uncached_memory+0x520/0x530
[<c10a29c7>] __slab_free+0x177/0x2d0
[<c10a2ef9>] kfree+0x119/0x150
[<c2af557e>] ? hardif_add_interface+0x1fe/0x3e0 [batman_adv]
[<c13762c0>] ? mtrr_trim_uncached_memory+0x520/0x530
[<c2af557e>] ? hardif_add_interface+0x1fe/0x3e0 [batman_adv]
[<c13762d8>] ? amd_init_mtrr+0x8/0x20
[<c2af557e>] hardif_add_interface+0x1fe/0x3e0 [batman_adv]
[<c2af0035>] setup_procfs+0x7b5/0xcf0 [batman_adv]
[<c2aeffb0>] ? setup_procfs+0x730/0xcf0 [batman_adv]
[<c2aeff10>] ? setup_procfs+0x690/0xcf0 [batman_adv]
[<c10e1d0f>] proc_reg_write+0x6f/0x90
[<c10aa6fe>] vfs_write+0x9e/0x120
[<c10e1ca0>] ? proc_reg_write+0x0/0x90
[<c10aac52>] sys_write+0x42/0x70
[<c1003259>] syscall_call+0x7/0xb
FIX kmalloc-16: Restoring 0xc13762d0-0xc13762d3=0xcc
</pre>
batman-adv - Bug #134 (Closed): race lockup
https://www.open-mesh.org/issues/134
2009-12-29T23:23:09Z
Simon Wunderlich
sw@simonwunderlich.de
<p>These are probably multiple things in one. :)</p>
<p>The revision used was 1517.</p>
<p>Directly after startup, I got:</p>
<pre>
BUG: unable to handle kernel NULL pointer dereference at 00000014
IP: [<c2af37db>] bit_shift+0x3b/0x90 [batman_adv]
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in: via_velocity via_rhine tg3 sis900 r8169 pcnet32 ne2k_pci 8390 e1000 e100 batman_adv 8139too 3c59x nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox libphy ipt_REJECT xt_TCPMSS ipt_LOG xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc natsemi crc_ccitt ipv6
Pid: 619, comm: bat_events Not tainted (2.6.31.1 #50)
EIP: 0060:[<c2af37db>] EFLAGS: 00010046 CPU: 0
EIP is at bit_shift+0x3b/0x90 [batman_adv]
EAX: 00000010 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: 00000001 EDI: 0000001f EBP: c0ed7e7c ESP: c0ed7e64
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process bat_events (pid: 619, ti=c0ed7000 task=c1f8a440 task.ti=c0ed7000)
Stack:
00000010 00000000 00000014 00000000 00000010 00000010 c0ed7e8c c2af3897
<0> 00000010 c0fe2c60 c0ed7eb8 c2af1826 c1d04770 00000286 00000014 c0d0c150
<0> 00000000 c09bad70 c09c4640 c1d04770 c09c4654 c0ed7eec c2af09f9 c09ba000
Call Trace:
[<c2af3897>] ? bit_get_packet+0x67/0xa0 [batman_adv]
[<c2af1826>] ? slide_own_bcast_window+0x66/0xb0 [batman_adv]
[<c2af09f9>] ? schedule_own_packet+0xc9/0x140 [batman_adv]
[<c2af0dc8>] ? send_outstanding_bat_packet+0x1f8/0x210 [batman_adv]
[<c2af0c8d>] ? send_outstanding_bat_packet+0xbd/0x210 [batman_adv]
[<c1050031>] ? tick_notify+0x21/0x320
[<c1042326>] ? worker_thread+0x186/0x270
[<c10422c2>] ? worker_thread+0x122/0x270
[<c2af0bd0>] ? send_outstanding_bat_packet+0x0/0x210 [batman_adv]
[<c1045c00>] ? autoremove_wake_function+0x0/0x50
[<c10421a0>] ? worker_thread+0x0/0x270
[<c1045b1a>] ? kthread+0x7a/0x90
[<c1045aa0>] ? kthread+0x0/0x90
[<c1003d37>] ? kernel_thread_helper+0x7/0x10
Code: 89 45 e8 7e 70 89 d6 bf 20 00 00 00 83 e6 1f 83 c0 04 c1 ea 05 bb 01 00 00 00 29 f7 89 55 ec 89 45 f0 eb 1e 8b 45 e8 89 f1 31 db <8b> 50 04 d3 e2 8b 4d f0 8b 41 fc 89 f9 d3 e8 01 c2 8b 45 e8 89
EIP: [<c2af37db>] bit_shift+0x3b/0x90 [batman_adv] SS:ESP 0068:c0ed7e64
CR2: 0000000000000014
---[ end trace a88f2a8ad302170c ]---
BUG: spinlock lockup on CPU#0, bat_events/619, c2af9700
Pid: 619, comm: bat_events Tainted: G D 2.6.31.1 #50
Call Trace:
[<c113c66b>] _raw_spin_lock+0x11b/0x150
[<c12515cb>] _spin_lock_irqsave+0x4b/0x60
[<c2af195c>] ? recv_bat_packet+0x4c/0xa0 [batman_adv]
[<c2af195c>] recv_bat_packet+0x4c/0xa0 [batman_adv]
[<c2af5242>] batman_skb_recv+0xa2/0x130 [batman_adv]
[<c2af51a0>] ? batman_skb_recv+0x0/0x130 [batman_adv]
[<c11df851>] netif_receive_skb+0x461/0x4c0
[<c11df530>] ? netif_receive_skb+0x140/0x4c0
[<c2b3df48>] e1000_reinit_locked+0x2a78/0x2e10 [e1000]
[<c102441e>] ? enqueue_task_fair+0xbe/0xd0
[<c2b3dbb0>] ? e1000_reinit_locked+0x26e0/0x2e10 [e1000]
[<c2b3cfbf>] e1000_reinit_locked+0x1aef/0x2e10 [e1000]
[<c1065909>] ? __rcu_process_callbacks+0x119/0x200
[<c1039d52>] ? run_timer_softirq+0x22/0x1d0
[<c11e27c8>] net_rx_action+0x78/0x160
[<c10359d8>] __do_softirq+0xd8/0x1c0
[<c1035900>] ? __do_softirq+0x0/0x1c0
<IRQ> [<c10358ea>] ? irq_exit+0x3a/0x50
[<c1251af5>] ? do_IRQ+0xa5/0xc0
[<c1048ac8>] ? lock_hrtimer_base+0x28/0x60
[<c100386e>] ? common_interrupt+0x2e/0x34
[<c1251549>] ? _spin_unlock_irq+0x29/0x30
[<c1060c4f>] ? acct_collect+0x13f/0x160
[<c1032c08>] ? do_exit+0x148/0x590
[<c1030b10>] ? printk+0x20/0x30
[<c102fa5f>] ? print_oops_end_marker+0x2f/0x40
[<c1006ce5>] ? oops_end+0xa5/0xc0
[<c1030b10>] ? printk+0x20/0x30
[<c101b265>] ? no_context+0x135/0x150
[<c10a7c88>] ? create_object+0x28/0x230
[<c101b49c>] ? __bad_area_nosemaphore+0x15c/0x170
[<c105c66d>] ? is_module_text_address+0xd/0x20
[<c104388b>] ? __kernel_text_address+0x1b/0x40
[<c101b557>] ? bad_area_nosemaphore+0x17/0x20
[<c101b74c>] ? do_page_fault+0x10c/0x240
[<c101b640>] ? do_page_fault+0x0/0x240
[<c1251893>] ? error_code+0x6b/0x70
[<c101b640>] ? do_page_fault+0x0/0x240
[<c2af37db>] ? bit_shift+0x3b/0x90 [batman_adv]
[<c2af3897>] ? bit_get_packet+0x67/0xa0 [batman_adv]
[<c2af1826>] ? slide_own_bcast_window+0x66/0xb0 [batman_adv]
[<c2af09f9>] ? schedule_own_packet+0xc9/0x140 [batman_adv]
[<c2af0dc8>] ? send_outstanding_bat_packet+0x1f8/0x210 [batman_adv]
[<c2af0c8d>] ? send_outstanding_bat_packet+0xbd/0x210 [batman_adv]
[<c1050031>] ? tick_notify+0x21/0x320
[<c1042326>] ? worker_thread+0x186/0x270
[<c10422c2>] ? worker_thread+0x122/0x270
[<c2af0bd0>] ? send_outstanding_bat_packet+0x0/0x210 [batman_adv]
[<c1045c00>] ? autoremove_wake_function+0x0/0x50
[<c10421a0>] ? worker_thread+0x0/0x270
[<c1045b1a>] ? kthread+0x7a/0x90
[<c1045aa0>] ? kthread+0x0/0x90
[<c1003d37>] ? kernel_thread_helper+0x7/0x10
</pre>
batman-adv - Bug #130 (Closed): batman-adv lock-up
https://www.open-mesh.org/issues/130
2009-08-19T19:30:22Z
Simon Wunderlich
sw@simonwunderlich.de
<p>Marek found this bug while testing:</p>
<pre>
BUG: spinlock lockup on CPU#0, bat_events/516, c0200280
Call Trace:
[<80048f34>] dump_stack+0x8/0x34
[<80154700>] _raw_spin_lock+0x104/0x12c
[<80223350>] _spin_lock+0x64/0x80
[<c01f4420>] purge_orig+0x30/0x2f4 [batman_adv]
[<80078098>] run_workqueue+0x170/0x270
[<80078e48>] worker_thread+0xac/0xd4
[<8007c918>] kthread+0x58/0xa0
[<800455d4>] kernel_thread_helper+0x10/0x18
</pre>
<p>This seems to be present in r1397. It occurs rarely, about 1 second after loading the module.</p>
<p>It seems the spinlock is held elsewhere.</p>
<p>Another similar occurrence has been found here:</p>
<pre>
BUG: spinlock lockup on CPU#0, bat_events/1346, c01ea280
Call Trace:
[<80048f34>] dump_stack+0x8/0x34
[<80154d20>] _raw_spin_lock+0x104/0x12c
[<80223970>] _spin_lock+0x64/0x80
[<c01e1fa0>] purge_vis_packets+0x138/0x83c [batman_adv]
</pre>
batman-adv - Bug #124 (Closed): batman-adv 0.2-alpha: possible[tm] regression in packet aggregation
https://www.open-mesh.org/issues/124
2009-02-13T17:46:06Z
Simon Wunderlich
sw@simonwunderlich.de
<p>Assume we have 2 "threads":</p>
<a name="Thread1"></a>
<h2 >Thread1:<a href="#Thread1" class="wiki-anchor">¶</a></h2>
<p>add_packet_list()</p>
<p>-> set_outstanding_packets_timer()</p>
<p>calls spin_trylock(&packets_timer_lock), acquires the lock, and calls</p>
<p>-> cancel_delayed_work_sync(&send_outstanding_packets_wq);</p>
<a name="Thread-2"></a>
<h2 >Thread 2:<a href="#Thread-2" class="wiki-anchor">¶</a></h2>
<p>Workqueue thread, calls</p>
<p>-> set_oustanding_packets()</p>
<a name="Assumption"></a>
<h2 >Assumption:<a href="#Assumption" class="wiki-anchor">¶</a></h2>
<p>Thread 2 gets scheduled between the spin_trylock() and the cancel_delayed_work_sync() call in Thread 1. Thread 1 then holds the lock, while Thread 2 tries to acquire it and waits. Thread 1 in turn waits inside cancel_delayed_work_sync() for Thread 2 to complete. This is a deadlock/livelock.</p>
<p>It's not very likely to happen, but it might, so I'm opening this ticket. ;)</p>
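<p>The circular wait described above can be checked with a small wait-for graph: a deadlock exists when following "is blocked on" edges leads back to the starting thread. The sketch below is a plain userspace C model, not kernel code. The names (has_deadlock, waits_for, THREAD1, THREAD2) are made up for illustration, and dropping the lock before the synchronous cancel is shown only as one possible way to break the cycle, not necessarily what batman-adv did:</p>

```c
#include <assert.h>
#include <stdbool.h>

#define NO_WAIT (-1)

enum { THREAD1 = 0, THREAD2 = 1, NUM_THREADS = 2 };

/* waits_for[i] is the index of the thread that thread i is blocked on,
 * or NO_WAIT if thread i can make progress. Returns true if following
 * the edges from some thread leads back to that same thread. */
static bool has_deadlock(const int waits_for[], int n)
{
    assert(n > 0);

    for (int start = 0; start < n; start++) {
        int cur = waits_for[start];

        /* A cycle through start, if any, is found within n steps. */
        for (int steps = 0; steps < n && cur != NO_WAIT; steps++) {
            if (cur == start)
                return true;
            cur = waits_for[cur];
        }
    }
    return false;
}

/* Interleaving from the ticket: Thread 1 holds the lock and waits for the
 * work item to finish; the work item (Thread 2) waits for the lock. */
static bool ticket_interleaving_deadlocks(void)
{
    int waits_for[NUM_THREADS];

    waits_for[THREAD1] = THREAD2; /* cancel_delayed_work_sync() */
    waits_for[THREAD2] = THREAD1; /* lock held by Thread 1 */
    return has_deadlock(waits_for, NUM_THREADS);
}

/* Hypothetical variant: Thread 1 releases the lock before the synchronous
 * cancel, so Thread 2 is not blocked and the work item can complete. */
static bool unlock_before_cancel_deadlocks(void)
{
    int waits_for[NUM_THREADS];

    waits_for[THREAD1] = THREAD2;
    waits_for[THREAD2] = NO_WAIT;
    return has_deadlock(waits_for, NUM_THREADS);
}
```

<p>In this model the first function reports a cycle and the second does not, which matches the reasoning in the ticket: the problem is the combination of holding the lock and waiting synchronously for the work item.</p>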
batman-adv - Feature #123 (Closed): batman-adv: broadcast vs unicast
https://www.open-mesh.org/issues/123
2009-01-25T14:22:40Z
Simon Wunderlich
sw@simonwunderlich.de
<p>Unlike unicast packets, which have an ARQ mechanism on the 802.11 layer to retransmit lost packets, broadcast packets are only sent once. Consequently, the unicast packet loss might be very low and the network will appear pretty stable, but broadcast packets (like DHCP, ARP) will hardly reach their destination in worst-case scenarios.</p>
<p>One option is to simply send broadcast packets multiple times (batman nodes will notice and drop the duplicates anyway) to increase the probability that they are received.</p>
<p>(this is just a reminder for ourselves to fix this problem ;)</p>