Bug #341
65% packet loss after node disconnection
Status: open
Description
Hi,
my configuration is the following :
+-------+      +---------------+
|laptop |<---->|batman GateWay |<----> batman nodes(A,B,C)
+-------+      +---------------+
- the laptop is not a part of the batman network. it is connected to the GW via ethernet
- all the batman nodes are RocketM5 running batman 2017.1 BATMAN_V
scenario :
- All nodes are connected to batman network.
- Node A is shut down
the issue:
Ping to node B and C from laptop has about 65% packet loss
Thanks a lot!
Updated by Sven Eckelmann about 7 years ago
- Description updated (diff)
- Status changed from New to Feedback
- Assignee changed from batman-adv developers to Moshe Hoori
Same complaints as I had in #340#note-1
The bug description is also quite odd. Why is it expected to have lower than 65% packet loss when you remove the nearest [1] node which had a good connection [2] to the batman-adv gateway? A bad connection [3] will result in packet loss - so nothing unexpected here.
The bug also doesn't describe whether this is a temporary problem (which could be expected until a node times out in the originator table) or is a stable problem over multiple hours. The latter requires also a test which must shut everything down and then only starting B+C (and never A).
[1] at least I would assume that A is the nearest. Bug description is missing any information about that
[2] at least I would guess that it had a good connection. Bug description is missing any information about that
[3] at least I would guess that the connection to B and C from the gateway is bad. Bug description is missing any information about that
Updated by Sven Eckelmann about 7 years ago
Btw. make sure that you've also applied the BATMAN_V fixes from batman-adv 2017.2 in your tests:
Updated by Moshe Hoori about 7 years ago
1. A isn't the nearest . all the nodes are with great proximity to one another.
2. the problem is temporary, the ping gets better about 2 minutes after the issue occurs.
3. ping from the laptop provides same results as from the gateway.
Updated by Sven Eckelmann about 7 years ago
What about the patches? Can the BATMAN_V developers please get the originator + neighbor table output from each device (beside the laptop) for
- node A is on and ping is fine
- node A is turned off and ping is bad
- node A is turned off and ping is good again
The output of
iw dev XXXXX station dump
would also be nice.
Good question would also be whether you see this problem with BATMAN_IV.
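A minimal sketch of how the three requested snapshots could be collected so that every table is tagged with its stage and a timestamp. It assumes Python is available on the node (or that the commands are wrapped in ssh), that "batctl o" and "batctl n" print the originator and neighbor tables on this batctl version, and that the wifi interface is called mesh0. The helper names (snapshot, format_snapshot) are made up for illustration.

```python
import subprocess
import time

# Commands to snapshot on each node. "mesh0" is an assumption; replace it
# with the actual wifi interface name.
COMMANDS = {
    "originators": ["batctl", "o"],
    "neighbors": ["batctl", "n"],
    "stations": ["iw", "dev", "mesh0", "station", "dump"],
}


def run_command(argv):
    """Run one command and return its stdout as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


def snapshot(stage, runner=run_command, clock=time.time):
    """Collect all tables for one test stage, tagged with stage and time."""
    record = {"stage": stage, "timestamp": clock()}
    for name, argv in COMMANDS.items():
        record[name] = runner(argv)
    return record


def format_snapshot(record):
    """Render a snapshot as plain text that can be attached to the ticket."""
    lines = ["=== stage: %s (t=%.0f) ===" % (record["stage"], record["timestamp"])]
    for name in COMMANDS:
        lines.append("--- %s ---" % name)
        lines.append(record[name].rstrip())
    return "\n".join(lines)
```

Calling print(format_snapshot(snapshot("node A on, ping fine"))) once per stage on each device would give output that can be matched to the three stages listed above.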
Updated by Sven Eckelmann about 7 years ago
The nodeA (which goes down) doesn't seem to be the best next hop for anything but itself and there is only a single interface involved. Just to be sure, what is your build and runtime configuration for batman-adv? Do you see the packet loss with ipv4 and batctl ping? Or what kind of traffic are you using to detect the packet loss. Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control? Did you do a capture on a wifi monitor interface and the mesh0 interface to detect where the traffic is routed and where it is dropped?
Did you check
iw dev mesh0 station dump?
Has somebody else a good idea for what to look for? Here are the logs but with mac address replaced with human readable names.
start
nodeGW
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.120s  ( 24.0)        nodeC   [ mesh0]:      nodeA  (  9.4)  nodeB  (  6.3)  nodeC  ( 24.0)
nodeB          0.480s  ( 24.0)        nodeB   [ mesh0]:      nodeA  (  8.8)  nodeC  (  9.0)  nodeB  ( 24.0)
nodeA          0.360s  ( 20.4)        nodeA   [ mesh0]:      nodeB  (  8.7)  nodeC  (  9.6)  nodeA  ( 20.4)
nodeA
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.470s  ( 23.9)        nodeC   [ mesh0]:      nodeGW ( 15.0)  nodeB  (  7.5)  nodeC  ( 23.9)
nodeB          0.910s  ( 17.4)        nodeB   [ mesh0]:      nodeC  (  9.4)  nodeGW ( 16.9)  nodeB  ( 17.4)
nodeGW         0.210s  ( 17.4)        nodeGW  [ mesh0]:      nodeB  ( 12.0)  nodeC  ( 12.0)  nodeGW ( 17.4)
nodeB
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.170s  ( 17.9)        nodeGW  [ mesh0]:      nodeA  (  9.5)  nodeGW ( 17.9)  nodeC  ( 12.7)
nodeGW         0.840s  ( 24.0)        nodeGW  [ mesh0]:      nodeA  (  9.2)  nodeC  ( 12.0)  nodeGW ( 24.0)
nodeA          0.320s  ( 17.3)        nodeA   [ mesh0]:      nodeC  (  9.7)  nodeGW ( 11.7)  nodeA  ( 17.3)
nodeC
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeB          0.360s  ( 18.1)        nodeB   [ mesh0]:      nodeA  (  8.7)  nodeGW ( 13.4)  nodeB  ( 18.1)
nodeGW         0.530s  ( 23.9)        nodeGW  [ mesh0]:      nodeA  (  9.2)  nodeB  ( 12.0)  nodeGW ( 23.9)
nodeA          0.060s  ( 22.6)        nodeA   [ mesh0]:      nodeB  (  8.6)  nodeGW (  9.5)  nodeA  ( 22.6)
node A turned off - high packet loss
nodeGW
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.350s  ( 27.3)        nodeC   [ mesh0]:      nodeA  ( 11.8)  nodeB  (  6.8)  nodeC  ( 27.3)
nodeB          0.900s  ( 24.0)        nodeB   [ mesh0]:      nodeA  (  8.7)  nodeC  ( 11.3)  nodeB  ( 24.0)
nodeA         11.490s  ( 18.6)        nodeA   [ mesh0]:      nodeB  (  8.6)  nodeC  ( 11.5)  nodeA  ( 18.6)
nodeA
This was the disconnected node.
nodeB
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.270s  ( 13.3)        nodeGW  [ mesh0]:      nodeA  ( 11.8)  nodeGW ( 13.3)  nodeC  ( 12.7)
nodeGW         0.010s  ( 25.5)        nodeGW  [ mesh0]:      nodeA  (  9.5)  nodeC  ( 12.7)  nodeGW ( 25.5)
nodeA         12.370s  ( 17.3)        nodeA   [ mesh0]:      nodeC  ( 11.5)  nodeGW (  9.3)  nodeA  ( 17.3)
nodeC
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeB          0.670s  ( 20.7)        nodeB   [ mesh0]:      nodeA  (  8.7)  nodeGW ( 12.0)  nodeB  ( 20.7)
nodeGW         0.130s  ( 25.7)        nodeGW  [ mesh0]:      nodeA  (  9.5)  nodeB  ( 12.5)  nodeGW ( 25.7)
nodeA         14.410s  ( 23.1)        nodeA   [ mesh0]:      nodeB  (  8.6)  nodeGW (  9.3)  nodeA  ( 23.1)
node A turned off - ping good again
nodeGW
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.110s  ( 24.0)        nodeC   [ mesh0]:      nodeA  ( 11.8)  nodeB  (  6.3)  nodeC  ( 24.0)
nodeB          0.630s  ( 24.0)        nodeB   [ mesh0]:      nodeA  (  8.7)  nodeC  (  9.5)  nodeB  ( 24.0)
nodeA        128.700s  ( 18.6)        nodeA   [ mesh0]:      nodeB  (  8.6)  nodeC  ( 11.5)  nodeA  ( 18.6)
nodeA
Not connected.
nodeB
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeC          0.930s  ( 12.7)        nodeC   [ mesh0]:      nodeA  ( 11.8)  nodeGW ( 12.0)  nodeC  ( 12.7)
nodeGW         0.850s  ( 24.0)        nodeGW  [ mesh0]:      nodeA  (  9.5)  nodeC  ( 12.0)  nodeGW ( 24.0)
nodeA        131.530s  ( 17.3)        nodeA   [ mesh0]:      nodeC  ( 11.5)  nodeGW (  9.3)  nodeA  ( 17.3)
nodeC
Originator  last-seen  ( throughput)  Nexthop [outgoingIF]:  Potential nexthops ...
nodeB          0.640s  ( 19.1)        nodeB   [ mesh0]:      nodeA  (  8.7)  nodeGW ( 12.1)  nodeB  ( 19.1)
nodeGW         0.080s  ( 24.0)        nodeGW  [ mesh0]:      nodeA  (  9.5)  nodeB  ( 12.0)  nodeGW ( 24.0)
nodeA        177.920s  ( 23.1)        nodeA   [ mesh0]:      nodeB  (  8.6)  nodeGW (  9.3)  nodeA  ( 23.1)
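To compare the tables between stages without reading them by eye, something like the following could be used. It is only a sketch: the regular expression is written against the originator table layout pasted in this ticket (with an optional leading "*" in case the tool marks the best route), and the function names are hypothetical.

```python
import re

# One originator row: name, last-seen, (throughput), selected nexthop, [iface]:
ROW = re.compile(
    r"^\s*(?:\*\s+)?(?P<orig>\S+)\s+(?P<seen>[\d.]+)s\s+"
    r"\(\s*(?P<tput>[\d.]+)\)\s+(?P<nexthop>\S+)\s+\[\s*(?P<iface>\S+)\]:"
)


def parse_originators(text):
    """Map originator name -> last-seen, throughput and selected next hop."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line does not match and is skipped
            table[match.group("orig")] = {
                "last_seen": float(match.group("seen")),
                "throughput": float(match.group("tput")),
                "nexthop": match.group("nexthop"),
            }
    return table


def changed_nexthops(before, after):
    """Return originators whose selected next hop differs between two tables."""
    return {
        orig: (before[orig]["nexthop"], after[orig]["nexthop"])
        for orig in before
        if orig in after and before[orig]["nexthop"] != after[orig]["nexthop"]
    }


def stale_originators(table, limit_seconds=10.0):
    """Return originators that were not seen within limit_seconds."""
    return sorted(o for o, row in table.items() if row["last_seen"] > limit_seconds)
```

Fed with the nodeB tables from the "high packet loss" and "ping good again" stages, changed_nexthops reports that the route to nodeC moved from nodeGW to nodeC, and stale_originators lists nodeA in both stages. Whether that route change is related to the problem is not something this sketch can tell.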
Updated by david lichterov about 7 years ago
- File cop2_to_ox1.ping added
- File ox1_to_cop2.ping added
Sven Eckelmann wrote:
The nodeA (which goes down) doesn't seem to be the best next hop for anything but itself and there is only a single interface involved. Just to be sure, what is your build and runtime configuration for batman-adv? Do you see the packet loss with ipv4 and batctl ping? Or what kind of traffic are you using to detect the packet loss. Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control? Did you do a capture on a wifi monitor interface and the mesh0 interface to detect where the traffic is routed and where it is dropped?
Did you check [...]?
Has somebody else a good idea for what to look for? Here are the logs but with mac address replaced with human readable names.
Hey, I work with Moshe on this problem and I have some of the details that you requested.
Can you explain what you mean by "build and runtime configuration for batman-adv" ?
Did you mean this :
root@OpenWrt:~# cat /etc/config/network
config interface 'loopback'
option ifname 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals 'globals'
option ula_prefix 'fd35:991f:6257::/48'
config interface 'lan'
option force_link '1'
option type 'bridge'
option proto 'static'
option netmask '255.255.255.0'
option ip6assign '60'
option _orig_bridge 'true'
option ifname 'bat0 eth0'
option ipaddr '192.168.1.17'
config interface 'mesh'
option mtu '1532'
option proto 'batadv'
option mesh 'bat0'
option routing_algo 'BATMAN_V'
config interface 'bat'
option ifname 'bat0'
option proto 'static'
option mtu '1500'
option ipaddr '10.0.0.20'
option netmask '255.255.255.0'
root@OpenWrt:~# cat /etc/config/wireless
config wifi-device 'radio0'
option type 'mac80211'
option path 'platform/ar934x_wmac'
option htmode 'HT20'
option hwmode '11a'
option txpower '22'
option country 'IL'
option channel '36'
config wifi-iface
option device 'radio0'
option ssid 'OpenWrt'
option ifname 'mesh0'
option network 'mesh'
option mode 'adhoc'
option bssid '02:CA:FE:CA:CA:40'
option mcast_rate '18000'
option encryption 'none'
we tried to do pings from batctl and we are seeing the same thing as above (I am attaching files with ping results).
And below is the output of
iw dev mesh0 station dump
root@OpenWrt:~# iw dev mesh0 station dump
Station 44:d9:e7:5c:e1:f7 (on mesh0)
inactive time: 0 ms
rx bytes: 198449132
rx packets: 447840
tx bytes: 16424007
tx packets: 98882
tx retries: 6193
tx failed: 1208
signal: -35 [-36, -41] dBm
signal avg: -36 [-37, -43] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 87.798Mbps
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
connected time: 7893 seconds
Station 04:18:d6:f6:49:f4 (on mesh0)
inactive time: 10 ms
rx bytes: 88169529
rx packets: 153829
tx bytes: 1878035
tx packets: 14428
tx retries: 2281
tx failed: 255
signal: -39 [-40, -46] dBm
signal avg: -40 [-41, -46] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 130.0 MBit/s MCS 14 short GI
expected throughput: 89.355Mbps
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
connected time: 372 seconds
root@OpenWrt:~# iw dev mesh0 station dump
Station 04:18:d6:cc:93:9b (on mesh0)
inactive time: 0 ms
rx bytes: 56219995
rx packets: 178963
tx bytes: 58290787
tx packets: 44390
tx retries: 4080
tx failed: 0
signal: -40 [-48, -41] dBm
signal avg: -41 [-49, -42] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 47.57Mbps
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
connected time: 399 seconds
Station 44:d9:e7:5c:e1:f7 (on mesh0)
inactive time: 10 ms
rx bytes: 52802022
rx packets: 147025
tx bytes: 416432
tx packets: 1467
tx retries: 67
tx failed: 0
signal: -27 [-29, -31] dBm
signal avg: -27 [-30, -31] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 47.57Mbps
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
connected time: 339 seconds
root@OpenWrt:~# iw dev mesh0 station dump
Station 04:18:d6:cc:93:9b (on mesh0)
inactive time: 0 ms
rx bytes: 117922138
rx packets: 365641
tx bytes: 461079664
tx packets: 322163
tx retries: 19107
tx failed: 0
signal: -43 [-45, -47] dBm
signal avg: -39 [-41, -43] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 80.108Mbps
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
connected time: 779 seconds
Station 04:18:d6:f6:49:f4 (on mesh0)
inactive time: 10 ms
rx bytes: 114449063
rx packets: 315425
tx bytes: 964221
tx packets: 3338
tx retries: 79
tx failed: 0
signal: -26 [-36, -27] dBm
signal avg: -26 [-35, -26] dBm
tx bitrate: 130.0 MBit/s MCS 14 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 45.43Mbps
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
connected time: 779 seconds
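A rough sketch for turning one such station dump into numbers that can be compared between stages, for example the share of transmissions that needed a retry. It assumes the "key: value" layout shown above and should be given one dump at a time (without the shell prompt line); parse_station_dump and retry_ratio are illustrative names, not part of iw.

```python
def parse_station_dump(text):
    """Parse one 'iw dev <if> station dump' output into {mac: {field: value}}."""
    stations = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Station "):
            # "Station 44:d9:e7:5c:e1:f7 (on mesh0)" -> second word is the MAC
            current = stations.setdefault(line.split()[1], {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return stations


def retry_ratio(station):
    """Fraction of transmitted packets that were retried at least once."""
    packets = int(station["tx packets"])
    retries = int(station["tx retries"])
    return retries / packets if packets else 0.0


def failed_ratio(station):
    """Fraction of transmitted packets that were reported as failed."""
    packets = int(station["tx packets"])
    failed = int(station["tx failed"])
    return failed / packets if packets else 0.0
```

The counters are cumulative since the station connected, so the difference between two dumps taken at known stages is more informative than a single dump.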
Updated by Sven Eckelmann about 7 years ago
Hey, i work with Moshe on this problem i have some of the details that you requested .
Can you explain what you mean by "build and runtime configuration for batman-adv" ?
I meant with "build configuration" the options which you've enabled during build time like:
- CONFIG_BATMAN_ADV_DEBUG
- CONFIG_BATMAN_ADV_DEBUGFS
- CONFIG_BATMAN_ADV_BLA
- CONFIG_BATMAN_ADV_DAT
- CONFIG_BATMAN_ADV_NC
- CONFIG_BATMAN_ADV_MCAST
- CONFIG_BATMAN_ADV_BATMAN_V
The network and wireless options are interesting - but you've missed /etc/config/batman-adv and anything which you change manually during runtime.
The output of iw dev mesh0 station dump
which you gave us is unfortunately meaningless at the moment. It is not known when it was taken, and it doesn't look like it was taken during each of the previously suggested stages (all connected; nodeA turned off with packet loss; nodeA turned off with ping good again).
What about the other questions:
- Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control?
- Did you do a capture on a wifi monitor interface and the mesh0 interface to detect where the traffic is routed and where it is dropped?
Right now it just looks like the latency increases a lot when nodeA gets turned off. This could be (but doesn't have to be) related to packets which get retransmitted quite often by the wifi driver/hw when nodeA disappears (and therefore cannot ACK packets anymore). It would therefore be interesting to know whether this problem disappears when the wifi driver drops this station from its neighbor list. And it would of course be interesting to see what is actually being transmitted by the wifi device (hence the wifi monitor dumps).
Updated by Marek Lindner about 7 years ago
My 2 cents:
It might be interesting to configure the adhoc interface with IP addresses and repeat the same test on that interface. Since this will bypass batman-adv (which is not needed in this simple scenario) it would tell us whether this is a problem created by batman-adv.
Assuming the adhoc-ping-test does not show the same timeout behavior, you could also play with batctl ping / batctl traceroute. The layer2 ping / traceroute might tell us if this is a layer 2 or layer 3 issue and can also show route changes (if any).
Updated by david lichterov about 7 years ago
- File br-lan.monitor added
Sven Eckelmann wrote:
Hey, i work with Moshe on this problem i have some of the details that you requested .
Can you explain what you mean by "build and runtime configuration for batman-adv" ?
I meant with "build configuration" the options which you've enabled during build time like:
- CONFIG_BATMAN_ADV_DEBUG
- CONFIG_BATMAN_ADV_DEBUGFS
- CONFIG_BATMAN_ADV_BLA
- CONFIG_BATMAN_ADV_DAT
- CONFIG_BATMAN_ADV_NC
- CONFIG_BATMAN_ADV_MCAST
- CONFIG_BATMAN_ADV_BATMAN_V
Our config is :
CONFIG_PACKAGE_kmod-batman-adv=y
CONFIG_KMOD_BATMAN_ADV_DEBUG_LOG=y
CONFIG_KMOD_BATMAN_ADV_BLA=y
CONFIG_KMOD_BATMAN_ADV_DAT=y
CONFIG_KMOD_BATMAN_ADV_DEBUGFS=y
CONFIG_KMOD_BATMAN_ADV_MCAST=y
CONFIG_KMOD_BATMAN_ADV_NC=y
CONFIG_KMOD_BATMAN_ADV_BATMAN_V=y
The network and wireless options are interesting - but you've missed /etc/config/batman-adv and anything which you change manually during runtime.
root@OpenWrt:~# cat /etc/config/batman-adv
config mesh 'bat0'
option gw_mode 'server'
The output of
iw dev mesh0 station dump
which you gave us is unfortunately meaningless at the moment. It is not known when you've taken it. And it doesn't look like you've taken it during each of the previously suggested stages (all connected, nodeA turned off and packet loss, nodeA turned off and good packet loss)
What about the other questions:
- Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control?
The dumps below should be without any additional traffic towards the removed node.
- Did you do a capture on a wifi monitor interface and the mesh0 interface to detect where the traffic is routed and where it is dropped?
I am adding the dumps of a capture on the wifi interface and mesh0 interface . We start the monitor when there are 3 nodes connected and a laptop that's connected with ethernet cable (not on the mesh) to the node that we monitor. After approximately 60 seconds we disconnect node (MAC 04:18:D6:F6:49:F4 , IP 192.168.1.15). The capture was done on the interfaces of mesh=04:18:D6:CC:93:9B, br-lan=04:18:D6:CD:93:9B, ip=192.168.1.42.
Updated by david lichterov about 7 years ago
Attaching the monitor files again.
Updated by Sven Eckelmann about 7 years ago
There is no monitor capture in the bz2. Please refer to https://wireless.wiki.kernel.org/en/users/documentation/iw#modifying_monitor_interface_flags to see how to create a monitor interface. And please also create pcaps with "tcpdump -w /tmp/blabla.pcap ...."
The only thing which I saw in your captures is that there is traffic towards 04:18:d6:f6:49:f4 (which is the one which is offline). Most of it is ELP messages. But you told us that it also happens with BATMAN_IV and ELP doesn't exist in BATMAN_IV. So these should not be the culprit.
Did you try the test from #341#note-9? You can configure an IP manually on mesh0 using
node1 $ ifconfig mesh0 192.168.25.1
node2 $ ifconfig mesh0 192.168.25.2
node2 $ ping -c 20 192.168.25.1
Updated by david lichterov about 7 years ago
- File tests.tar.bz2 added
Sven Eckelmann wrote:
There is no monitor capture in the bz2.. Please refer to https://wireless.wiki.kernel.org/en/users/documentation/iw#modifying_monitor_interface_flags to see how to create a monitor interface. And please also create pcaps with "tcpdump -w /tmp/blabla.pcap ...."
Trying again... attaching the output files.
The only thing which I saw in you captures is that there is traffic towards 04:18:d6:f6:49:f4 (which is the one which is offline). Most of it are ELP messages. But you told us that it also happens with BATMAN_IV and ELP doesn't exist in BATMAN_IV. So these should not be the culprit.
Did you try the test from #341#note-9? You can configure an ip manually on mesh0 using using
[...]
I tried; here are the results of the ping:
root@OpenWrt:~# ping 192.168.25.3
PING 192.168.25.3 (192.168.25.3): 56 data bytes
64 bytes from 192.168.25.3: seq=0 ttl=64 time=1.581 ms
64 bytes from 192.168.25.3: seq=1 ttl=64 time=1.416 ms
64 bytes from 192.168.25.3: seq=2 ttl=64 time=1.407 ms
64 bytes from 192.168.25.3: seq=3 ttl=64 time=1.397 ms
64 bytes from 192.168.25.3: seq=4 ttl=64 time=1.413 ms
64 bytes from 192.168.25.3: seq=5 ttl=64 time=1.507 ms
64 bytes from 192.168.25.3: seq=6 ttl=64 time=1.440 ms
64 bytes from 192.168.25.3: seq=7 ttl=64 time=1.676 ms
64 bytes from 192.168.25.3: seq=8 ttl=64 time=2.269 ms
64 bytes from 192.168.25.3: seq=9 ttl=64 time=1.407 ms
64 bytes from 192.168.25.3: seq=10 ttl=64 time=1.823 ms
64 bytes from 192.168.25.3: seq=11 ttl=64 time=1.401 ms
64 bytes from 192.168.25.3: seq=12 ttl=64 time=1.399 ms
64 bytes from 192.168.25.3: seq=13 ttl=64 time=1.389 ms
64 bytes from 192.168.25.3: seq=14 ttl=64 time=1.426 ms
64 bytes from 192.168.25.3: seq=15 ttl=64 time=1.384 ms
64 bytes from 192.168.25.3: seq=16 ttl=64 time=1.385 ms
64 bytes from 192.168.25.3: seq=17 ttl=64 time=2.075 ms
64 bytes from 192.168.25.3: seq=18 ttl=64 time=1.394 ms
64 bytes from 192.168.25.3: seq=19 ttl=64 time=1.400 ms
64 bytes from 192.168.25.3: seq=20 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=21 ttl=64 time=1.450 ms
64 bytes from 192.168.25.3: seq=22 ttl=64 time=1.358 ms
64 bytes from 192.168.25.3: seq=23 ttl=64 time=1.374 ms
64 bytes from 192.168.25.3: seq=24 ttl=64 time=1.373 ms
64 bytes from 192.168.25.3: seq=25 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=26 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=27 ttl=64 time=15.474 ms
64 bytes from 192.168.25.3: seq=28 ttl=64 time=1.370 ms
64 bytes from 192.168.25.3: seq=29 ttl=64 time=1.379 ms
64 bytes from 192.168.25.3: seq=30 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=31 ttl=64 time=1.364 ms
64 bytes from 192.168.25.3: seq=32 ttl=64 time=1.374 ms
64 bytes from 192.168.25.3: seq=33 ttl=64 time=1.374 ms
64 bytes from 192.168.25.3: seq=34 ttl=64 time=1.388 ms
64 bytes from 192.168.25.3: seq=35 ttl=64 time=1.379 ms
64 bytes from 192.168.25.3: seq=36 ttl=64 time=1.344 ms
64 bytes from 192.168.25.3: seq=37 ttl=64 time=1.842 ms
64 bytes from 192.168.25.3: seq=38 ttl=64 time=1.507 ms
64 bytes from 192.168.25.3: seq=39 ttl=64 time=2.994 ms
64 bytes from 192.168.25.3: seq=40 ttl=64 time=1.398 ms
64 bytes from 192.168.25.3: seq=41 ttl=64 time=1.391 ms
64 bytes from 192.168.25.3: seq=42 ttl=64 time=1.710 ms
64 bytes from 192.168.25.3: seq=43 ttl=64 time=1.400 ms
64 bytes from 192.168.25.3: seq=44 ttl=64 time=1.391 ms
64 bytes from 192.168.25.3: seq=45 ttl=64 time=1.391 ms
64 bytes from 192.168.25.3: seq=46 ttl=64 time=116.681 ms
64 bytes from 192.168.25.3: seq=47 ttl=64 time=26.692 ms
64 bytes from 192.168.25.3: seq=48 ttl=64 time=57.789 ms
64 bytes from 192.168.25.3: seq=49 ttl=64 time=53.890 ms
64 bytes from 192.168.25.3: seq=50 ttl=64 time=224.817 ms
64 bytes from 192.168.25.3: seq=51 ttl=64 time=191.852 ms
64 bytes from 192.168.25.3: seq=52 ttl=64 time=31.281 ms
64 bytes from 192.168.25.3: seq=53 ttl=64 time=44.700 ms
64 bytes from 192.168.25.3: seq=54 ttl=64 time=157.240 ms
64 bytes from 192.168.25.3: seq=55 ttl=64 time=91.909 ms
64 bytes from 192.168.25.3: seq=56 ttl=64 time=9.933 ms
64 bytes from 192.168.25.3: seq=57 ttl=64 time=22.053 ms
64 bytes from 192.168.25.3: seq=58 ttl=64 time=43.157 ms
64 bytes from 192.168.25.3: seq=59 ttl=64 time=56.001 ms
64 bytes from 192.168.25.3: seq=60 ttl=64 time=154.643 ms
64 bytes from 192.168.25.3: seq=61 ttl=64 time=44.681 ms
64 bytes from 192.168.25.3: seq=62 ttl=64 time=62.938 ms
64 bytes from 192.168.25.3: seq=63 ttl=64 time=117.767 ms
64 bytes from 192.168.25.3: seq=64 ttl=64 time=68.347 ms
64 bytes from 192.168.25.3: seq=65 ttl=64 time=53.134 ms
64 bytes from 192.168.25.3: seq=66 ttl=64 time=55.873 ms
64 bytes from 192.168.25.3: seq=67 ttl=64 time=67.072 ms
64 bytes from 192.168.25.3: seq=68 ttl=64 time=136.168 ms
64 bytes from 192.168.25.3: seq=69 ttl=64 time=62.937 ms
64 bytes from 192.168.25.3: seq=70 ttl=64 time=103.860 ms
64 bytes from 192.168.25.3: seq=71 ttl=64 time=39.309 ms
64 bytes from 192.168.25.3: seq=72 ttl=64 time=49.363 ms
64 bytes from 192.168.25.3: seq=73 ttl=64 time=69.935 ms
64 bytes from 192.168.25.3: seq=74 ttl=64 time=126.700 ms
64 bytes from 192.168.25.3: seq=75 ttl=64 time=144.226 ms
64 bytes from 192.168.25.3: seq=76 ttl=64 time=81.853 ms
64 bytes from 192.168.25.3: seq=77 ttl=64 time=83.896 ms
64 bytes from 192.168.25.3: seq=78 ttl=64 time=2.479 ms
64 bytes from 192.168.25.3: seq=79 ttl=64 time=70.421 ms
64 bytes from 192.168.25.3: seq=80 ttl=64 time=187.253 ms
64 bytes from 192.168.25.3: seq=81 ttl=64 time=266.937 ms
64 bytes from 192.168.25.3: seq=82 ttl=64 time=194.640 ms
64 bytes from 192.168.25.3: seq=83 ttl=64 time=87.860 ms
64 bytes from 192.168.25.3: seq=84 ttl=64 time=81.853 ms
64 bytes from 192.168.25.3: seq=85 ttl=64 time=152.013 ms
64 bytes from 192.168.25.3: seq=86 ttl=64 time=118.396 ms
64 bytes from 192.168.25.3: seq=87 ttl=64 time=51.432 ms
64 bytes from 192.168.25.3: seq=88 ttl=64 time=94.313 ms
64 bytes from 192.168.25.3: seq=89 ttl=64 time=58.187 ms
64 bytes from 192.168.25.3: seq=90 ttl=64 time=220.814 ms
64 bytes from 192.168.25.3: seq=91 ttl=64 time=114.451 ms
64 bytes from 192.168.25.3: seq=92 ttl=64 time=31.864 ms
64 bytes from 192.168.25.3: seq=93 ttl=64 time=129.706 ms
64 bytes from 192.168.25.3: seq=94 ttl=64 time=69.393 ms
64 bytes from 192.168.25.3: seq=95 ttl=64 time=277.520 ms
64 bytes from 192.168.25.3: seq=96 ttl=64 time=236.258 ms
64 bytes from 192.168.25.3: seq=97 ttl=64 time=43.555 ms
64 bytes from 192.168.25.3: seq=98 ttl=64 time=90.342 ms
64 bytes from 192.168.25.3: seq=99 ttl=64 time=35.094 ms
64 bytes from 192.168.25.3: seq=100 ttl=64 time=310.098 ms
64 bytes from 192.168.25.3: seq=101 ttl=64 time=157.963 ms
64 bytes from 192.168.25.3: seq=102 ttl=64 time=79.196 ms
64 bytes from 192.168.25.3: seq=103 ttl=64 time=92.743 ms
64 bytes from 192.168.25.3: seq=104 ttl=64 time=157.861 ms
64 bytes from 192.168.25.3: seq=105 ttl=64 time=227.749 ms
64 bytes from 192.168.25.3: seq=106 ttl=64 time=1.480 ms
64 bytes from 192.168.25.3: seq=107 ttl=64 time=1.378 ms
64 bytes from 192.168.25.3: seq=108 ttl=64 time=1.354 ms
64 bytes from 192.168.25.3: seq=109 ttl=64 time=1.544 ms
64 bytes from 192.168.25.3: seq=110 ttl=64 time=1.360 ms
64 bytes from 192.168.25.3: seq=111 ttl=64 time=1.460 ms
64 bytes from 192.168.25.3: seq=112 ttl=64 time=1.412 ms
64 bytes from 192.168.25.3: seq=113 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=114 ttl=64 time=1.385 ms
64 bytes from 192.168.25.3: seq=115 ttl=64 time=1.375 ms
^C
--- 192.168.25.3 ping statistics ---
116 packets transmitted, 116 packets received, 0% packet loss
round-trip min/avg/max = 1.344/55.071/310.098 ms
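Since the summary line only shows min/avg/max, a small script can help locate when the latency rises and whether any sequence numbers are missing. This is a sketch that assumes the busybox ping reply format shown above; the 20 ms threshold and the function names are arbitrary choices for illustration.

```python
import re

# Matches the "seq=N ttl=T time=X ms" part of each reply line.
REPLY = re.compile(r"seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def parse_ping(text):
    """Return a list of (sequence number, round-trip time in ms)."""
    return [(int(seq), float(rtt)) for seq, rtt in REPLY.findall(text)]


def missing_sequences(replies):
    """Sequence numbers between the first and last reply that never arrived."""
    seen = {seq for seq, _ in replies}
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def slow_windows(replies, threshold_ms=20.0):
    """Return (first_seq, last_seq) runs of consecutive replies above threshold."""
    windows = []
    start = prev = None
    for seq, rtt in replies:
        if rtt >= threshold_ms:
            if start is None:
                start = seq
            prev = seq
        elif start is not None:
            windows.append((start, prev))
            start = None
    if start is not None:
        windows.append((start, prev))
    return windows
```

For the run above, this kind of check would show no missing sequence numbers and a series of slow windows between roughly seq=46 and seq=105, which could then be lined up with the moment the node was disconnected.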
it's seems not better then the tests that we did before.
Updated by david lichterov about 7 years ago
it's seems not better then the tests that we did before.
i meant to say that it seems better .
Updated by david lichterov about 7 years ago
One more thing. i did the monitor only on mesh0. those are the interfaces that we have :
root@OpenWrt:/tmp# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-lan state UP qlen 1000
link/ether 04:18:d6:cd:93:9b brd ff:ff:ff:ff:ff:ff
4: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether 04:18:d6:cd:93:9b brd ff:ff:ff:ff:ff:ff
5: mesh0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1532 qdisc mq master bat0 state UP qlen 1000
link/ether 04:18:d6:cc:93:9b brd ff:ff:ff:ff:ff:ff
6: bat0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UNKNOWN qlen 1000
link/ether 7e:1f:b4:f4:a5:cc brd ff:ff:ff:ff:ff:ff
7: fish0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/[803] 04:18:d6:cc:93:9b brd ff:ff:ff:ff:ff:ff
should i monitor any thing else ?
Updated by david lichterov about 7 years ago
Sven Eckelmann wrote:
Did you try the test from #341#note-9? You can configure an ip manually on mesh0 using using
[...]
Hey,
do the test from #341#note-9 and the results that we got suggest that the issue is not with batman?
Updated by Sven Eckelmann about 7 years ago
No, right now it still looks more like the queuing (fq codel + fair airtime?) and rate settings of your wifi stack/driver are to blame here.
And I told you about the monitor interface and dumps on it in #341#note-13 and #341#note-8. See #341#note-8 regarding your iw dev mesh0 station dump
output. There are also other ideas in the ticket about things which can be tested to see whether batman-adv is to blame ("route changes", ...)
Updated by Marek Lindner about 7 years ago
Sven Eckelmann wrote:
No, right now it still looks more like the queuing (fq codel + fair airtime?) and rate settings of your wifi stack/driver is to blame here.
I agree with Sven. The latency values in that test run might not be as high as during previous runs but generally, deactivating an unrelated WiFi neighbor should not increase latency anywhere. If anything, it should reduce latency.
Assuming the latency is created by the WiFi layer (Wifi driver, analog noise, queues, etc) you will always see that latency in batman-adv too. You could also start poking in the WiFi layer. For instance, check the driver you're using. Is it an old version with bugs ? Is it bleeding edge ? Are all test devices using Atheros AR934X ?
Updated by Sven Eckelmann about 7 years ago
We heard about problems with the fair airtime implementation and appearing/disappearing clients (with and without batman-adv). So you could try to revert the changes from
- https://git.lede-project.org/?p=source.git;a=blob;f=package/kernel/mac80211/patches/320-ath9k-clean-up-and-fix-ath_tx_count_airtime.patch;h=a6a3bfca6d0be5c1541738ad3ef5e2e779781e39;hb=764cd09dd845864b0d45c6b1f914b81612a5dd28
- https://git.lede-project.org/?p=source.git;a=blob;f=package/kernel/mac80211/patches/344-ath9k-Introduce-airtime-fairness-scheduling-between-.patch;h=10c6573b8c1ff83196c19bdee4b2eafd1ff06e8e;hb=91fce81df6e99cec0876b9d4866bd86e7c49820f (or the upstream version https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63fefa050477b0974ab34f650e21a7cfc3b02d96 )
You should get in contact with Toke Høiland-Jørgensen <toke@toke.dk> and linux-wireless@vger.kernel.org if this turns out to be the reason for your problems.
Updated by david lichterov about 7 years ago
Hey Sven,
Thank you for the pointers and help.
Before we reached out to Toke and linux-wireless@vger.kernel.org we wanted to redo the tests and make sure that we get the same results.
1. What we did last time was to ping through mesh0 interface as was suggested above. The test resulted in poor ping quality when one of the nodes was disconnected.
2. In the current test we configured an ad-hoc network on 3 nodes, with the configuration below:
root@OpenWrt:/# cat /etc/config/network
config interface 'loopback'
option ifname 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals 'globals'
option ula_prefix 'fdf8:4b2c:88b1::/48'
config interface 'lan'
option type 'bridge'
option ifname 'eth0 wlan0'
option proto 'static'
option ipaddr '192.168.1.12'
option netmask '255.255.255.0'
option ip6assign '60'
root@OpenWrt:/# cat /etc/config/wireless
config wifi-device radio0
option type mac80211
option channel 36
option hwmode 11a
option path 'platform/ar934x_wmac'
option htmode HT20
config wifi-iface
option device radio0
option network lan
option mode adhoc
option ssid OpenWrt
option encryption none
root@OpenWrt:/# ifconfig
br-lan    Link encap:Ethernet  HWaddr 04:18:D6:F7:49:E3
          inet addr:192.168.1.12  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fdf8:4b2c:88b1::1/60 Scope:Global
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 04:18:D6:F7:49:E3
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:124 errors:0 dropped:0 overruns:0 frame:0
          TX packets:124 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:8544 (8.3 KiB)  TX bytes:8544 (8.3 KiB)

wlan0     Link encap:Ethernet  HWaddr 04:18:D6:F6:49:E3
          inet addr:172.16.0.201  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::618:d6ff:fef6:49e3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:864 (864.0 B)
The ping test was run from node A to nodes B and C, and node C was disconnected. The result of that ping test:
root@OpenWrt:/# ping 172.16.0.202
PING 172.16.0.202 (172.16.0.202): 56 data bytes
64 bytes from 172.16.0.202: seq=0 ttl=64 time=1.502 ms
64 bytes from 172.16.0.202: seq=1 ttl=64 time=1.399 ms
64 bytes from 172.16.0.202: seq=2 ttl=64 time=1.398 ms
64 bytes from 172.16.0.202: seq=3 ttl=64 time=1.442 ms
64 bytes from 172.16.0.202: seq=4 ttl=64 time=1.402 ms
64 bytes from 172.16.0.202: seq=5 ttl=64 time=1.143 ms
64 bytes from 172.16.0.202: seq=6 ttl=64 time=1.777 ms
64 bytes from 172.16.0.202: seq=7 ttl=64 time=1.459 ms
64 bytes from 172.16.0.202: seq=8 ttl=64 time=1.411 ms
64 bytes from 172.16.0.202: seq=9 ttl=64 time=1.421 ms
64 bytes from 172.16.0.202: seq=10 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=11 ttl=64 time=1.399 ms
64 bytes from 172.16.0.202: seq=12 ttl=64 time=1.458 ms
64 bytes from 172.16.0.202: seq=13 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=14 ttl=64 time=1.397 ms
64 bytes from 172.16.0.202: seq=15 ttl=64 time=1.392 ms
64 bytes from 172.16.0.202: seq=16 ttl=64 time=1.636 ms
64 bytes from 172.16.0.202: seq=17 ttl=64 time=1.428 ms
64 bytes from 172.16.0.202: seq=18 ttl=64 time=1.381 ms
64 bytes from 172.16.0.202: seq=19 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=20 ttl=64 time=1.400 ms
64 bytes from 172.16.0.202: seq=21 ttl=64 time=1.403 ms
64 bytes from 172.16.0.202: seq=22 ttl=64 time=7.701 ms
64 bytes from 172.16.0.202: seq=23 ttl=64 time=1.417 ms
64 bytes from 172.16.0.202: seq=24 ttl=64 time=1.413 ms
64 bytes from 172.16.0.202: seq=25 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=26 ttl=64 time=1.396 ms
64 bytes from 172.16.0.202: seq=27 ttl=64 time=5.830 ms
64 bytes from 172.16.0.202: seq=28 ttl=64 time=1.395 ms
64 bytes from 172.16.0.202: seq=29 ttl=64 time=1.397 ms
64 bytes from 172.16.0.202: seq=30 ttl=64 time=1.401 ms
64 bytes from 172.16.0.202: seq=31 ttl=64 time=6.423 ms
64 bytes from 172.16.0.202: seq=32 ttl=64 time=1.434 ms
64 bytes from 172.16.0.202: seq=33 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=34 ttl=64 time=1.396 ms
64 bytes from 172.16.0.202: seq=35 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=36 ttl=64 time=1.424 ms
64 bytes from 172.16.0.202: seq=37 ttl=64 time=1.456 ms
64 bytes from 172.16.0.202: seq=38 ttl=64 time=8.808 ms
64 bytes from 172.16.0.202: seq=39 ttl=64 time=1.409 ms
64 bytes from 172.16.0.202: seq=40 ttl=64 time=1.520 ms
64 bytes from 172.16.0.202: seq=41 ttl=64 time=1.458 ms
64 bytes from 172.16.0.202: seq=42 ttl=64 time=1.450 ms
64 bytes from 172.16.0.202: seq=43 ttl=64 time=1.393 ms
64 bytes from 172.16.0.202: seq=44 ttl=64 time=1.765 ms
64 bytes from 172.16.0.202: seq=45 ttl=64 time=1.436 ms
64 bytes from 172.16.0.202: seq=46 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=47 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=48 ttl=64 time=5.378 ms
64 bytes from 172.16.0.202: seq=49 ttl=64 time=1.416 ms
64 bytes from 172.16.0.202: seq=50 ttl=64 time=1.476 ms
64 bytes from 172.16.0.202: seq=51 ttl=64 time=1.403 ms
64 bytes from 172.16.0.202: seq=52 ttl=64 time=1.403 ms
64 bytes from 172.16.0.202: seq=53 ttl=64 time=1.464 ms
64 bytes from 172.16.0.202: seq=54 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=55 ttl=64 time=1.412 ms
64 bytes from 172.16.0.202: seq=56 ttl=64 time=1.416 ms
64 bytes from 172.16.0.202: seq=57 ttl=64 time=2.249 ms
64 bytes from 172.16.0.202: seq=58 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=59 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=60 ttl=64 time=1.405 ms
64 bytes from 172.16.0.202: seq=61 ttl=64 time=1.422 ms
64 bytes from 172.16.0.202: seq=62 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=63 ttl=64 time=1.433 ms
64 bytes from 172.16.0.202: seq=64 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=65 ttl=64 time=1.400 ms
64 bytes from 172.16.0.202: seq=66 ttl=64 time=1.404 ms
64 bytes from 172.16.0.202: seq=67 ttl=64 time=1.430 ms
64 bytes from 172.16.0.202: seq=68 ttl=64 time=1.426 ms
64 bytes from 172.16.0.202: seq=69 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=70 ttl=64 time=3.037 ms
64 bytes from 172.16.0.202: seq=71 ttl=64 time=1.448 ms
64 bytes from 172.16.0.202: seq=72 ttl=64 time=1.419 ms
64 bytes from 172.16.0.202: seq=73 ttl=64 time=1.414 ms
64 bytes from 172.16.0.202: seq=74 ttl=64 time=1.452 ms
64 bytes from 172.16.0.202: seq=75 ttl=64 time=1.422 ms
64 bytes from 172.16.0.202: seq=76 ttl=64 time=1.725 ms
64 bytes from 172.16.0.202: seq=77 ttl=64 time=1.416 ms
64 bytes from 172.16.0.202: seq=78 ttl=64 time=1.412 ms
64 bytes from 172.16.0.202: seq=79 ttl=64 time=1.453 ms
64 bytes from 172.16.0.202: seq=80 ttl=64 time=1.439 ms
64 bytes from 172.16.0.202: seq=81 ttl=64 time=1.413 ms
64 bytes from 172.16.0.202: seq=82 ttl=64 time=1.413 ms
64 bytes from 172.16.0.202: seq=83 ttl=64 time=1.411 ms
64 bytes from 172.16.0.202: seq=84 ttl=64 time=1.561 ms
64 bytes from 172.16.0.202: seq=85 ttl=64 time=1.411 ms
64 bytes from 172.16.0.202: seq=86 ttl=64 time=1.428 ms
64 bytes from 172.16.0.202: seq=87 ttl=64 time=2.627 ms
64 bytes from 172.16.0.202: seq=88 ttl=64 time=1.405 ms
64 bytes from 172.16.0.202: seq=89 ttl=64 time=1.404 ms
^C
--- 172.16.0.202 ping statistics ---
90 packets transmitted, 90 packets received, 0% packet loss
round-trip min/avg/max = 1.143/1.772/8.808 ms
As you can see the issue did not reoccur in this setup.
3. With the batman setup we also ran an arping test; the results don't show any delay while disconnecting one of the nodes:
root@OpenWrt:/# arping -I br-lan 192.168.1.12
ARPING 192.168.1.12 from 192.168.1.13 br-lan
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.737ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.679ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.683ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.682ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.679ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.689ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.688ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.681ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.686ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.689ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.691ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.689ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.691ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.690ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.680ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14] 0.686ms
Unicast reply fr^CSent 26 probes (1 broadcast(s))
Received 26 response(s)
4. We tested the same batman setup on rpi with Debian installed and didn't see any of the issues mentioned above.
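To compare runs like the ones above without reading every line, the raw ping output could be summarized by a small script. The following is only a sketch: the function name `summarize_ping`, the `spike_factor` threshold and the returned field names are assumptions made for illustration, and the regular expression assumes the BusyBox-style reply format shown in this ticket.

```python
import re

# Matches BusyBox-style reply lines such as
# "64 bytes from 172.16.0.202: seq=22 ttl=64 time=7.701 ms"
REPLY_RE = re.compile(r"seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize_ping(output, expected_count, spike_factor=3.0):
    """Summarize the replies found in raw ping output.

    expected_count: number of probes sent (the "packets transmitted" value).
    spike_factor: hypothetical threshold; a reply counts as a spike when its
    round-trip time exceeds spike_factor times the median round-trip time.
    """
    replies = [(int(seq), float(ms)) for seq, ms in REPLY_RE.findall(output)]
    rtts = sorted(ms for _, ms in replies)
    # Upper median; good enough for a rough baseline.
    median = rtts[len(rtts) // 2] if rtts else 0.0
    seen = {seq for seq, _ in replies}
    # Sequence numbers start at 0 in the output shown in this ticket.
    missing = [s for s in range(expected_count) if s not in seen]
    spikes = [(seq, ms) for seq, ms in replies
              if median and ms > spike_factor * median]
    loss = 100.0 * len(missing) / expected_count if expected_count else 0.0
    return {
        "received": len(replies),
        "loss_percent": loss,
        "missing_seq": missing,
        "median_ms": median,
        "spikes": spikes,
    }
```

Calling `summarize_ping(text, 90)` on the ping output of test 2 would list the replies that stand out from the median, which makes it easier to line up spikes with the moment a node was disconnected.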
How can we further investigate the issue ?
Thanks !
Updated by Sven Eckelmann about 7 years ago
Sorry, but I will not provide any help here anymore because you do everything - but not what I've asked for.
Updated by Moshe Hoori about 7 years ago
Hi Sven,
Sorry, we forgot to mention that as we use OpenWrt, we don't have the fair airtime implementation in our code at all, so there is nothing to revert.
Updated by Marek Lindner about 7 years ago
david lichterov wrote:
How can we further investigate the issue ?
You're basically telling us everything works as it should and ask us why it works ? I hope you understand the irony ? :-)
From an engineering perspective I'd say: If the problem disappeared something changed in your setup. What that something is we obviously don't know. We only have the information you provided. Could be anything from the Openwrt toolchain, driver versions, etc up to the WiFi environment.
Assuming the issue is gone, why don't you go ahead with whatever you set out to do before you opened this ticket ? In case the problem shows up again, you can always re-open the ticket or create a new one or visit our IRC channel. It is really hard to make suggestions remotely what to do to trigger some problem only you have seen.
Updated by Moshe Hoori about 7 years ago
Hi Marek,
Sadly, the issue is not gone. Let me sum up what we know so far:
- Using BATMAN on RocketM5 with OpenWrt:
- PING quality is bad after node disconnection (as mentioned in the initial post).
- ARPING quality is good after node disconnection.
- PING quality via mesh0 (as in post #12) is good after node disconnection.
- Using ADHOC connections on RocketM5 with OpenWrt:
- PING+ARPING quality is good after node disconnection.
- Using BATMAN on RPI3 with Debian:
- PING+ARPING quality is good after node disconnection.
We still see the issue on BATMAN + OpenWrt.
Updated by Marek Lindner about 7 years ago
Thanks for the summary.
Let me sum up what we know so far:
- Using BATMAN on RocketM5 with OpenWrt:
- PING quality is bad after node disconnection (as mentioned in the initial post).
- ARPING quality is good after node disconnection.
- PING quality via mesh0 (as in post #12) is good after node disconnection.
Can you also help me understand how you come to the 'quality is good' conclusion for the pure adhoc mode test ? Doesn't ticket update 13 show the latency go up on disconnect ?
- Using ADHOC connections on RocketM5 with OpenWrt:
- PING+ARPING quality is good after node disconnection
Is this test case different from the previous 'good' or the same ? Mesh0 simply is pure adhoc mode, right ?
- Using BATMAN on RPI3 with Debian:
- PING+ARPING quality is good after node disconnection .
Is the batman-adv version in all test scenarios the same ?
Updated by Moshe Hoori about 7 years ago
Marek Lindner wrote:
Thanks for the summary.
Let me sum up what we know so far:
- Using BATMAN on RocketM5 with OpenWrt:
- PING quality is bad after node disconnection (as mentioned in the initial post).
- ARPING quality is good after node disconnection.
- PING quality via mesh0 (as in post #12) is good after node disconnection.
Can you also help me understand how you come to the 'quality is good' conclusion for the pure adhoc mode test ? Doesn't ticket update 13 show the latency go up on disconnect ?
You are right, I got confused. The ping test results in that case were not good.
- Using ADHOC connections on RocketM5 with OpenWrt:
- PING+ARPING quality is good after node disconnection
Is this test case different from the previous 'good' or the same ? Mesh0 simply is pure adhoc mode, right ?
The results were good, please see post 20. I don't know if Mesh0 is simply adhoc mode. I just followed Sven's instructions in post 12.
- Using BATMAN on RPI3 with Debian:
- PING+ARPING quality is good after node disconnection .
Is the batman-adv version in all test scenarios the same ?
yes.
Updated by Marek Lindner about 7 years ago
Moshe Hoori wrote:
Is this test case different from the previous 'good' or the same ? Mesh0 simply is pure adhoc mode, right ?
The results were good please see post 20.
To be honest, ticket update 20 confuses me because I don't understand which output is linked to what test and what is different from the expected. At first read, I thought all quoted text (config & tests) were coming from Openwrt and show no issue. In the end, the update states also no issue with Debian. Hence my conclusion in update 23: Everything is good ? In the following update you state that Openwrt still does not work ..
I don't know if Mesh0 is simply adhoc mode.
The second quote part of update 20 shows it: wifi-iface => radio0, mode => adhoc. The interface name can be chosen, it is the mode that matters.
Let me try to summarize what I understand so far while (partially) ignoring update 20 for now due to confusion on my end:
- Debian on RPI with or without batman-adv does not exhibit any spikes in latency when a nearby WiFi node is turned off.
- Openwrt with or without batman-adv exhibits spikes in latency when a nearby WiFi node is turned off.
The same batman-adv versions were deployed in all tests.
Please correct me if I got it wrong.
Based on the above, wouldn't it be safe to assume the problem lies somewhere in Openwrt (Wifi or Kernel or ...) ? There still might be a problem with batman-adv but until the underlying latency spikes have been resolved, it makes little sense to poke in batman-adv. Because batman-adv relies on the WiFi layer for the actual packet transmission we cannot tackle a problem in batman-adv while the WiFi layer is misbehaving.
Updated by Moshe Hoori about 7 years ago
Marek Lindner wrote:
Moshe Hoori wrote:
Is this test case different from the previous 'good' or the same ? Mesh0 simply is pure adhoc mode, right ?
The results were good please see post 20.
To be honest, ticket update 20 confuses me because I don't understand which output is linked to what test and what is different from the expected. At first read, I thought all quoted text (config & tests) were coming from Openwrt and show no issue. In the end, the update states also no issue with Debian. Hence my conclusion in update 23: Everything is good ? In the following update you state that Openwrt still does not work ..
I don't know if Mesh0 is simply adhoc mode.
The second quote part of update 20 shows it: wifi-iface => radio0, mode => adhoc. The interface name can be chosen, it is the mode that matters.
Let me try to summarize what I understand so far while (partially) ignoring update 20 for now due to confusion on my end:
- Debian on RPI with or without batman-adv does not exhibit any spikes in latency when a nearby WiFi node is turned off.
that's right
- Openwrt with or without batman-adv exhibits spikes in latency when a nearby WiFi node is turned off.
Using batman-adv, pinging as in ticket 12 results in bad ping quality (this is what I referred to as pinging mesh0).
Without batman-adv, using an adhoc network configuration, the ping quality is good.
The same batman-adv versions were deployed in all tests.
Please correct me if I got it wrong.
Based on the above, wouldn't it be safe to assume the problem lies somewhere in Openwrt (Wifi or Kernel or ...) ? There still might be a problem with batman-adv but until the underlying latency spikes have been resolved, it makes little sense to poke in batman-adv. Because batman-adv relies on the WiFi layer for the actual packet transmission we cannot tackle a problem in batman-adv while the WiFi layer is misbehaving.
Updated by Marek Lindner about 7 years ago
Moshe Hoori wrote:
Using batman-adv, pinging as in ticket 12 results in bad ping quality (this is what I referred to as pinging mesh0).
Without batman-adv, using an adhoc network configuration, the ping quality is good.
Sorry, there seems to be some misconception here. Pinging as described in ticket update 12 is without batman-adv. Ticket update 12 even references my suggestion from ticket update 9 to test the pure WiFi layer without batman-adv. Ticket update 13 then shows the test results of pure WiFi without batman-adv and depicts a spike in latency. No ?
Updated by Moshe Hoori about 7 years ago
Hi Marek,
Sorry for being unclear about that.
1. The test from post 12 was run while BATMAN-ADV is installed and bat0 is up.
2. The test from post 20 was run while BATMAN-ADV is not installed at all.
Today we tested (1) again, and it seems that the issue reoccurs only if the bat0 interface is up.
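Since the result seems to depend on whether bat0 is up, it may help to record the interface state next to each test run. The following is a minimal sketch, assuming a Linux system that exposes `/sys/class/net/<iface>/operstate`; the helper names `iface_state` and `describe_run` and the label format are made up for illustration. Note that the kernel may report values other than "up" and "down" (for example "unknown"), so the raw string is kept as-is.

```python
from pathlib import Path


def iface_state(name, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate string for an interface.

    Returns "absent" when the interface does not exist, e.g. when
    batman-adv is not installed and bat0 was never created.
    """
    state_file = Path(sysfs_root) / name / "operstate"
    try:
        return state_file.read_text().strip()
    except FileNotFoundError:
        return "absent"


def describe_run(label, ifaces=("bat0", "mesh0"), sysfs_root="/sys/class/net"):
    """Build a one-line record of which interfaces existed during a test run."""
    states = ", ".join(f"{n}={iface_state(n, sysfs_root)}" for n in ifaces)
    return f"{label}: {states}"
```

Printing `describe_run("ping mesh0, node C off")` before each ping or arping test would make it unambiguous later whether a given result was taken with or without bat0.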
Updated by Marek Lindner about 7 years ago
Moshe Hoori wrote:
Hi Marek,
Sorry for being unclear about that.
1. The test from post 12 was run while BATMAN-ADV is installed and bat0 is up.
2. The test from post 20 was run while BATMAN-ADV is not installed at all.
Today we tested (1) again, and it seems that the issue reoccurs only if the bat0 interface is up.
Sorry, I don't want to continue this discussion here. The information you provide appears contradictory and unclear, which makes it hard to help you. Just now, you throw in another aspect that wasn't discussed before: What is installed and what is not.
I recommend you either join our IRC channel for a more real-time discussion with questions & answers or you provide a comprehensive overview about what is working and what is not. Right now I can't tell.