Bug #183

Batman-adv 2014.1 does not close handle on interface

Added by Ruben Kelevra over 7 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version:
Start date: 04/19/2014
Due date:
% Done: 0%
Estimated time:
Description

We're using fastd here to tunnel the batman-adv packets over the internet.

Because we are trying to update to 2014.1, we set up a server with batman-adv on Arch Linux running kernel 3.14.1.

batman-adv is compiled from source via the AUR [1].

When fastd starts, we run:

batctl -m mesh-gro if add ffgro-mesh-vpn
ip link set up dev mesh-gro
brctl addif freifunk-gro mesh-gro

On pre-down we run:

brctl delif freifunk-gro mesh-gro
batctl -m mesh-gro if del ffgro-mesh-vpn
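
For anyone trying to reproduce this, the start and pre-down steps above could be wrapped in one small helper, sketched below. The interface and bridge names are the ones from this report; the function names (`run`, `mesh_up`, `mesh_down`) and the `DRY_RUN` switch are assumptions made for illustration and are not part of fastd or batctl.

```shell
#!/bin/sh
# Sketch of the hook commands from the report.
# Names below are taken from the description; adjust them for other setups.
MESH_IF=mesh-gro
VPN_IF=ffgro-mesh-vpn
BRIDGE=freifunk-gro

# Hypothetical helper: print the command when DRY_RUN is set, else execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Commands run when fastd starts.
mesh_up() {
    run batctl -m "$MESH_IF" if add "$VPN_IF"
    run ip link set up dev "$MESH_IF"
    run brctl addif "$BRIDGE" "$MESH_IF"
}

# Commands run on pre-down, in the reverse order of mesh_up.
mesh_down() {
    run brctl delif "$BRIDGE" "$MESH_IF"
    run batctl -m "$MESH_IF" if del "$VPN_IF"
}
```

With `DRY_RUN=1` set, `mesh_up` and `mesh_down` only print the commands, which makes it easy to check the order before running them as root.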

Killing fastd is then delayed considerably, with this kernel message:

unregister_netdevice: waiting for ffgro-mesh-vpn to become free. Usage count = 1

I tried downgrading to 2013.4; the problem does not occur there.

[1] https://aur.archlinux.org/packages/batman-adv/

History

#1

Updated by Antonio Quartulli over 7 years ago

Hi Ruben,
This is a known problem on 2014.1.0 for which we have a fix that is not yet part of any release.

You can get the fix either by cloning the maint branch of our git repo, or by using the openwrt-routing feed (but since you are not compiling batman-adv on openwrt, I think that the second option is not applicable).

#2

Updated by Ruben Kelevra over 7 years ago

I could send the maintainer of my package a patch file.

Can you provide a link to, or the checksum of, the commit that fixes this?

Thanks in advance

Best regards Ruben

#3

Updated by Ruben Kelevra about 7 years ago

The commit 548c938bfef2dfad63d999fc9005f1d387e3d15a looks like it would fix this problem, but it's still there, so which commit fixes it?

Best regards

Ruben

#4

Updated by Antonio Quartulli about 7 years ago

Actually, there is more than one commit fixing this problem. Have you tried using the maint branch? That will contain all the needed patches.

Otherwise you can also wait a few days for the new batman-adv-2014.2.0 which will contain all the fixes as well.

#5

Updated by Ruben Kelevra about 7 years ago

I think I will wait those few days, since only reboots are affected by this bug in our setup.

Best regards

Ruben

#6

Updated by André Gaul about 7 years ago

We're also affected here at Freifunk Berlin with batman-adv-2014.2.0 on openwrt. So, unfortunately, this issue is not fixed by the release mentioned in comment 4.

I experienced the problem after ~2 days of uptime when the wifi device (one of batman's transport interfaces) suddenly became unable to send. After issuing /sbin/wifi the following appears in the log and the last lines are repeated for a long time (forever?).

[213108.520000] device wlan0-1 left promiscuous mode
[213108.520000] br-batmesh: port 2(wlan0-1) entered disabled state
[213108.700000] batman_adv: bat0: Interface deactivated: wlan0-adhoc-5
[213108.970000] batman_adv: bat0: Removing interface: wlan0-adhoc-5
[213119.130000] unregister_netdevice: waiting for wlan0-adhoc-5 to become free. Usage count = 1
[213129.270000] unregister_netdevice: waiting for wlan0-adhoc-5 to become free. Usage count = 1
[213139.410000] unregister_netdevice: waiting for wlan0-adhoc-5 to become free. Usage count = 1
...

Seems like rebooting is the only option when this happens. Please let me know if and how I can help to investigate the problem.
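
To see at a glance which interfaces are stuck, the repeated kernel warnings can be summarised with a short filter. This is only a sketch: the function name `stuck_netdevs` is made up for illustration, and it assumes the message format shown in the log above.

```shell
# Summarise "unregister_netdevice: waiting for ..." lines read from stdin.
# Prints one line per interface: name, number of warnings, last usage count.
stuck_netdevs() {
    awk '
        /unregister_netdevice: waiting for .* to become free/ {
            # The interface name is the word right after "for".
            for (i = 1; i <= NF; i++) {
                if ($i == "for") dev = $(i + 1)
            }
            count[dev]++
            usage[dev] = $NF   # last field is the usage count
        }
        END {
            for (d in count)
                printf "%s warnings=%d usage_count=%s\n", d, count[d], usage[d]
        }
    ' | sort
}
```

Usage would be `dmesg | stuck_netdevs`. Fed the three warning lines quoted above, it prints `wlan0-adhoc-5 warnings=3 usage_count=1`.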

#7

Updated by Marek Lindner about 7 years ago

2014.2.0 has not officially been released yet. How did you test this version?

#8

Updated by André Gaul about 7 years ago

Oh, there is a tag for this version (2014.2.0), cf. https://git.open-mesh.org/batman-adv.git/tag/32c8319fb5590f12fbac0364d9a36fb2766c1b1d . That's what I used for testing. Does this version contain the mentioned fixes for this bug?

#9

Updated by Antonio Quartulli about 7 years ago

Yes, it does.
However, we know that this problem is not 100% solved, but we are no longer able to reproduce it, so we haven't been able to investigate it any further. Do you know how to consistently trigger this issue with 2014.2.0?

#10

Updated by André Gaul about 7 years ago

Thanks for the clarification! :) So far I don't know how to reproduce this consistently, but I observed the behavior again last night on one of my openwrt routers (see the dmesg output). The router runs openwrt trunk (r40839) and runs olsrd as well as batman-adv on multiple interfaces (wifi and eth, see the full configuration). All transport interfaces of batman-adv seem to be unusable after this happens.

I'll set up another router with only batman-adv and investigate how I can reproduce the bug.

#11

Updated by Ruben Kelevra about 7 years ago

In my setup this problem seems fixed.

#12

Updated by André Gaul about 7 years ago

Hey Ruben, what have you changed since then?

#13

Updated by Ruben Kelevra about 7 years ago

Updated from 2014.1 to 2014.2

#14

Updated by Marek Lindner about 7 years ago

Can you reliably reproduce the issue on the latest batman-adv? If so, we can supply some debug patches to track it down.

#15

Updated by Marek Lindner over 6 years ago

Any update, or shall we close the ticket?

#16

Updated by Ruben Kelevra over 6 years ago

I'm sorry, I never read your last response from 7 months ago. The issue was completely fixed in 2014.2 in my setup! Good work, thank you for your time and effort! (:

#17

Updated by Marek Lindner over 6 years ago

  • Status changed from New to Closed

Ok, thanks for the info!

#18

Updated by Bjoern Franke about 6 years ago

I have a similar issue with 2015.0 running Kernel 4.0.0-1 under Debian Jessie.
[93792.512073] unregister_netdevice: waiting for bat0 to become free. Usage count = 3
[93797.393717] unregister_netdevice: waiting for fftransit to become free. Usage count = 1

#19

Updated by Ruben Kelevra about 6 years ago

Does this issue still occur with 4.0.4 Kernel? This is the one we run atm.

#20

Updated by Milan Pässler about 6 years ago

I have this issue with kernel version 4.0.7 and batman-adv v2015.0 from the AUR.

#21

Updated by Ruben Kelevra about 6 years ago

Since it does not occur here anymore, it might be setup-related. Please provide the output of "batctl if", "ip link" and "brctl show", as well as the setup scripts.
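
A minimal sketch for collecting the requested output in one go; the helper names `section` and `collect_mesh_diag` are illustrative assumptions, and only the three commands named above are run.

```shell
# Print a header, then run the given command; note it if the command fails.
section() {
    printf '### %s\n' "$*"
    "$@" 2>&1 || printf '(failed or not installed: %s)\n' "$1"
}

# Gather the three dumps requested in this comment.
collect_mesh_diag() {
    section batctl if
    section ip link
    section brctl show
}
```

Running `collect_mesh_diag > mesh-diag.txt` writes everything to one file that can be attached to the ticket together with the setup scripts.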

#22

Updated by Sven Eckelmann about 6 years ago

Please first try the maint branch (it is 2015.0 + bug fixes at the moment). Especially commit:3c92b633715b7eca80dc7a2347e0e4dbcce1f018 ("batman-adv: initialize up/down values when adding a gateway").

Btw. this ticket is closed since 5 months.

#23

Updated by Ruben Kelevra almost 6 years ago

Sven Eckelmann wrote:

Btw. this ticket is closed since 5 months.

Thanks for the hint, so please reopen it. We can't.

Please first try the maint branch (it is 2015.0 + bug fixes at the moment). Especially commit:3c92b633715b7eca80dc7a2347e0e4dbcce1f018 ("batman-adv: initialize up/down values when adding a gateway").

I don't think this commit fixes the issue. Since we no longer see it on our gateway while others do, I think we fixed it with some config changes. So we need the configs or documentation from Milan.

Best regards

Ruben

#24

Updated by Marek Lindner almost 6 years ago

Ruben Kelevra wrote:

Please first try the maint branch (it is 2015.0 + bug fixes at the moment). Especially commit:3c92b633715b7eca80dc7a2347e0e4dbcce1f018 ("batman-adv: initialize up/down values when adding a gateway").

I don't think this commit fixes the issue. Since we no longer see it on our gateway while others do, I think we fixed it with some config changes. So we need the configs or documentation from Milan.

The point was: Try the latest version. Nobody here has any spare time to waste on hunting an already fixed bug.

Feel free to open a new ticket with the latest stable version tested.

Thanks!

#25

Updated by Milan Pässler almost 6 years ago

Seems to be some kind of ghost bug :(
I can't remember that I changed anything at all, but I can't reproduce it anymore.

#26

Updated by Sven Eckelmann over 4 years ago

  • Target version set to 2014.2.0
