Bug #341

open

65% packet loss after node disconnection

Added by Moshe Hoori almost 7 years ago. Updated over 6 years ago.

Status:
Feedback
Priority:
Normal
Assignee:
Target version:
-
Start date:
07/18/2017
Due date:
% Done:

0%

Estimated time:

Description

Hi,

My configuration is the following:

+-------+      +---------------+
|laptop |<---->|batman GateWay |<----> batman nodes(A,B,C)
+-------+      +---------------+
  • The laptop is not part of the batman network; it is connected to the GW via Ethernet.
  • All the batman nodes are RocketM5 running batman-adv 2017.1 with BATMAN_V.

Scenario:

  1. All nodes are connected to the batman network.
  2. Node A is shut down.

The issue:

Pinging nodes B and C from the laptop shows about 65% packet loss.

Thanks a lot!


Files

5_ot.txt (825 Bytes), Moshe Hoori, 07/19/2017 01:03 PM
4_ot.txt (2.1 KB), Moshe Hoori, 07/19/2017 01:03 PM
1_ot.txt (2.1 KB), Moshe Hoori, 07/19/2017 01:03 PM
gw_ot.txt (2.1 KB), Moshe Hoori, 07/19/2017 01:03 PM
ox1_to_cop2.ping (4.19 KB), david lichterov, 07/20/2017 11:14 AM
cop2_to_ox1.ping (13.1 KB), david lichterov, 07/20/2017 11:14 AM
br-lan.monitor (745 KB), david lichterov, 07/25/2017 11:10 AM
open-mesh.monitor.tar.bz2 (217 KB), david lichterov, 07/25/2017 11:17 AM
tests.tar.bz2 (310 KB), david lichterov, 07/25/2017 02:38 PM

Actions #1

Updated by Sven Eckelmann almost 7 years ago

  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee changed from batman-adv developers to Moshe Hoori

Same complaints as I had in #340#note-1.

The bug description is also quite odd. Why is it expected to have lower than 65% packet loss when you remove the nearest [1] node which had a good connection [2] to the batman-adv gateway? A bad connection [3] will result in packet loss - so nothing unexpected here.

The bug also doesn't describe whether this is a temporary problem (which could be expected until a node times out in the originator table) or a persistent problem over multiple hours. The latter also requires a test which shuts everything down and then starts only B+C (and never A).

[1] at least I would assume that A is the nearest. Bug description is missing any information about that
[2] at least I would guess that it had a good connection. Bug description is missing any information about that
[3] at least I would guess that the connection to B and C from the gateway is bad. Bug description is missing any information about that

Actions #3

Updated by Moshe Hoori almost 7 years ago

1. A isn't the nearest. All the nodes are in close proximity to one another.
2. The problem is temporary; the ping improves about 2 minutes after the issue occurs.
3. Pinging from the laptop gives the same results as pinging from the gateway.

Actions #4

Updated by Sven Eckelmann almost 7 years ago

What about the patches? Can the BATMAN_V developers please get the originator + neighbor table output from each device (beside the laptop) for

  1. node A is on and ping is fine
  2. node A is turned off and ping is bad
  3. node A is turned off and ping is good again

The output of

iw dev XXXXX station dump
would also be nice


It would also be good to know whether you see this problem with BATMAN_IV.
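Sven's three stages are much easier to compare when every snapshot carries its stage label and a timestamp. A minimal sketch in Python, assuming it runs somewhere that can execute `batctl` and `iw` (stock OpenWrt images often lack Python, in which case the same commands can simply be typed in order). The `snapshot` helper and the stage names are hypothetical; `mesh0` is the interface name used later in this ticket.

```python
import subprocess
import time
from typing import Callable, List

# Hypothetical command list: "batctl o" / "batctl n" print the originator and
# neighbor tables; "mesh0" is the mesh interface name used in this ticket.
COMMANDS: List[List[str]] = [
    ["batctl", "o"],
    ["batctl", "n"],
    ["iw", "dev", "mesh0", "station", "dump"],
]


def run_command(cmd: List[str]) -> str:
    """Run one command and return its combined stdout/stderr as text."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.stdout


def snapshot(stage: str,
             runner: Callable[[List[str]], str] = run_command) -> str:
    """Collect all tables for one test stage, labelled with stage and time."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    parts = [f"=== stage: {stage} @ {stamp} ==="]
    for cmd in COMMANDS:
        parts.append("$ " + " ".join(cmd))
        parts.append(runner(cmd).rstrip())
    return "\n".join(parts) + "\n"
```

On each node (except the laptop), call for example `snapshot("2: nodeA off, ping bad")` at each stage and append the result to a per-node file, so every table can later be matched to its stage and time.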

Updated by Moshe Hoori almost 7 years ago

Hi,
Attached are the originator tables you requested.

This also happens with BATMAN_IV.

Thanks!

Actions #6

Updated by Sven Eckelmann almost 7 years ago

nodeA (which goes down) doesn't seem to be the best next hop for anything but itself, and there is only a single interface involved. Just to be sure, what is your build and runtime configuration for batman-adv? Do you see the packet loss with IPv4 ping and batctl ping? Or what kind of traffic are you using to detect the packet loss? Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control? Did you do a capture on a wifi monitor interface and on the mesh0 interface to detect where the traffic is routed and where it is dropped?

Did you check

iw dev mesh0 station dump
?

Does anybody else have a good idea what to look for? Here are the logs, with MAC addresses replaced by human-readable names.

start - all nodes on, ping fine

nodeGW

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.120s (       24.0) nodeC  [     mesh0]: nodeA  (        9.4) nodeB  (        6.3) nodeC  (       24.0)
nodeB     0.480s (       24.0) nodeB  [     mesh0]: nodeA  (        8.8) nodeC  (        9.0) nodeB  (       24.0)
nodeA     0.360s (       20.4) nodeA  [     mesh0]: nodeB  (        8.7) nodeC  (        9.6) nodeA  (       20.4)

nodeA

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.470s (       23.9) nodeC  [     mesh0]: nodeGW (       15.0) nodeB  (        7.5) nodeC  (       23.9)
nodeB     0.910s (       17.4) nodeB  [     mesh0]: nodeC  (        9.4) nodeGW (       16.9) nodeB  (       17.4)
nodeGW    0.210s (       17.4) nodeGW [     mesh0]: nodeB  (       12.0) nodeC  (       12.0) nodeGW (       17.4)

nodeB

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.170s (       17.9) nodeGW [     mesh0]: nodeA  (        9.5) nodeGW (       17.9) nodeC  (       12.7)
nodeGW    0.840s (       24.0) nodeGW [     mesh0]: nodeA  (        9.2) nodeC  (       12.0) nodeGW (       24.0)
nodeA     0.320s (       17.3) nodeA  [     mesh0]: nodeC  (        9.7) nodeGW (       11.7) nodeA  (       17.3)

nodeC

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeB     0.360s (       18.1) nodeB  [     mesh0]: nodeA  (        8.7) nodeGW (       13.4) nodeB  (       18.1)
nodeGW    0.530s (       23.9) nodeGW [     mesh0]: nodeA  (        9.2) nodeB  (       12.0) nodeGW (       23.9)
nodeA     0.060s (       22.6) nodeA  [     mesh0]: nodeB  (        8.6) nodeGW (        9.5) nodeA  (       22.6)

node A turned off - high packet loss

nodeGW

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.350s (       27.3) nodeC  [     mesh0]: nodeA  (       11.8) nodeB  (        6.8) nodeC  (       27.3)
nodeB     0.900s (       24.0) nodeB  [     mesh0]: nodeA  (        8.7) nodeC  (       11.3) nodeB  (       24.0)
nodeA    11.490s (       18.6) nodeA  [     mesh0]: nodeB  (        8.6) nodeC  (       11.5) nodeA  (       18.6)

nodeA

This was the disconnected node.

nodeB

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.270s (       13.3) nodeGW [     mesh0]: nodeA  (       11.8) nodeGW (       13.3) nodeC  (       12.7)
nodeGW    0.010s (       25.5) nodeGW [     mesh0]: nodeA  (        9.5) nodeC  (       12.7) nodeGW (       25.5)
nodeA    12.370s (       17.3) nodeA  [     mesh0]: nodeC  (       11.5) nodeGW (        9.3) nodeA  (       17.3)

nodeC

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeB     0.670s (       20.7) nodeB  [     mesh0]: nodeA  (        8.7) nodeGW (       12.0) nodeB  (       20.7)
nodeGW    0.130s (       25.7) nodeGW [     mesh0]: nodeA  (        9.5) nodeB  (       12.5) nodeGW (       25.7)
nodeA    14.410s (       23.1) nodeA  [     mesh0]: nodeB  (        8.6) nodeGW (        9.3) nodeA  (       23.1)

node A turned off - ping good again

nodeGW

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.110s (       24.0) nodeC  [     mesh0]: nodeA  (       11.8) nodeB  (        6.3) nodeC  (       24.0)
nodeB     0.630s (       24.0) nodeB  [     mesh0]: nodeA  (        8.7) nodeC  (        9.5) nodeB  (       24.0)
nodeA   128.700s (       18.6) nodeA  [     mesh0]: nodeB  (        8.6) nodeC  (       11.5) nodeA  (       18.6)

nodeA

Not connected.

nodeB

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeC     0.930s (       12.7) nodeC  [     mesh0]: nodeA  (       11.8) nodeGW (       12.0) nodeC  (       12.7)
nodeGW    0.850s (       24.0) nodeGW [     mesh0]: nodeA  (        9.5) nodeC  (       12.0) nodeGW (       24.0)
nodeA   131.530s (       17.3) nodeA  [     mesh0]: nodeC  (       11.5) nodeGW (        9.3) nodeA  (       17.3)

nodeC

  Originator      last-seen ( throughput)           Nexthop [outgoingIF]:   Potential nexthops ...
nodeB     0.640s (       19.1) nodeB  [     mesh0]: nodeA  (        8.7) nodeGW (       12.1) nodeB  (       19.1)
nodeGW    0.080s (       24.0) nodeGW [     mesh0]: nodeA  (        9.5) nodeB  (       12.0) nodeGW (       24.0)
nodeA   177.920s (       23.1) nodeA  [     mesh0]: nodeB  (        8.6) nodeGW (        9.3) nodeA  (       23.1)
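Reading these tables side by side is easier with a small parser. A minimal Python sketch, assuming the column layout shown above (originator, last-seen, throughput, next hop, outgoing interface, then potential next hops); the 5 s staleness threshold is an arbitrary choice for illustration. On the tables above it would flag nodeA in both "turned off" stages (last-seen between 11.49 s and 177.92 s), while nodeA still appears as a potential next hop with its old throughput value.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# One table row: originator, last-seen, (throughput), next hop [iface]: candidates
ROW = re.compile(
    r"^(\S+)\s+([\d.]+)s\s+\(\s*([\d.]+)\)\s+(\S+)\s+\[\s*(\S+)\]:(.*)$")
# One "name (throughput)" pair from the potential-nexthops column.
CANDIDATE = re.compile(r"(\S+)\s+\(\s*([\d.]+)\)")


class Originator(NamedTuple):
    last_seen: float          # seconds
    throughput: float         # as printed in the table
    nexthop: str
    iface: str
    candidates: List[Tuple[str, float]]


def parse_originators(text: str) -> Dict[str, Originator]:
    """Parse an originator table in the layout shown in this ticket."""
    table: Dict[str, Originator] = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, blank line or node label
        name, seen, tput, nexthop, iface, rest = m.groups()
        cands = [(n, float(t)) for n, t in CANDIDATE.findall(rest)]
        table[name] = Originator(float(seen), float(tput), nexthop, iface, cands)
    return table


def stale_originators(table: Dict[str, Originator],
                      max_age_s: float = 5.0) -> List[str]:
    """Originators not seen for more than max_age_s (arbitrary threshold)."""
    return sorted(n for n, o in table.items() if o.last_seen > max_age_s)
```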

Updated by david lichterov over 6 years ago

Sven Eckelmann wrote:

nodeA (which goes down) doesn't seem to be the best next hop for anything but itself, and there is only a single interface involved. Just to be sure, what is your build and runtime configuration for batman-adv? Do you see the packet loss with IPv4 ping and batctl ping? Or what kind of traffic are you using to detect the packet loss? Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control? Did you do a capture on a wifi monitor interface and on the mesh0 interface to detect where the traffic is routed and where it is dropped?

Did you check [...]?

Does anybody else have a good idea what to look for? Here are the logs, with MAC addresses replaced by human-readable names.

Hey, I work with Moshe on this problem and I have some of the details that you requested.
Can you explain what you mean by "build and runtime configuration for batman-adv"?
Did you mean this:

root@OpenWrt:~# cat /etc/config/network 

config interface 'loopback'
    option ifname 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config globals 'globals'
    option ula_prefix 'fd35:991f:6257::/48'

config interface 'lan'
    option force_link '1'
    option type 'bridge'
    option proto 'static'
    option netmask '255.255.255.0'
    option ip6assign '60'
    option _orig_bridge 'true'
    option ifname 'bat0 eth0'
    option ipaddr '192.168.1.17'

config interface 'mesh'
    option mtu '1532'
    option proto 'batadv'
    option mesh 'bat0'
    option routing_algo 'BATMAN_V'

config interface 'bat'
    option ifname 'bat0'
    option proto 'static'
    option mtu '1500'
    option ipaddr '10.0.0.20'
    option netmask '255.255.255.0'

root@OpenWrt:~# cat /etc/config/wireless
config wifi-device 'radio0'
    option type 'mac80211'
    option path 'platform/ar934x_wmac'
    option htmode 'HT20'
    option hwmode '11a'
    option txpower '22'
    option country 'IL'
    option channel '36'

config wifi-iface
    option device 'radio0'
    option ssid 'OpenWrt'
    option ifname 'mesh0'
    option network 'mesh'
    option mode 'adhoc'
    option bssid '02:CA:FE:CA:CA:40'
    option mcast_rate '18000'
    option encryption 'none'

We tried pinging with batctl and we see the same thing as above (I am attaching files with the ping results).

And below is the output of

iw dev mesh0 station dump

root@OpenWrt:~#  iw dev mesh0 station dump
Station 44:d9:e7:5c:e1:f7 (on mesh0)
    inactive time:    0 ms
    rx bytes:    198449132
    rx packets:    447840
    tx bytes:    16424007
    tx packets:    98882
    tx retries:    6193
    tx failed:    1208
    signal:      -35 [-36, -41] dBm
    signal avg:    -36 [-37, -43] dBm
    tx bitrate:    144.4 MBit/s MCS 15 short GI
    rx bitrate:    144.4 MBit/s MCS 15 short GI
    expected throughput:    87.798Mbps
    authorized:    yes
    authenticated:    yes
    preamble:    long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:    no
    connected time:    7893 seconds
Station 04:18:d6:f6:49:f4 (on mesh0)
    inactive time:    10 ms
    rx bytes:    88169529
    rx packets:    153829
    tx bytes:    1878035
    tx packets:    14428
    tx retries:    2281
    tx failed:    255
    signal:      -39 [-40, -46] dBm
    signal avg:    -40 [-41, -46] dBm
    tx bitrate:    144.4 MBit/s MCS 15 short GI
    rx bitrate:    130.0 MBit/s MCS 14 short GI
    expected throughput:    89.355Mbps
    authorized:    yes
    authenticated:    yes
    preamble:    long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:    no
    connected time:    372 seconds
root@OpenWrt:~#  iw dev mesh0 station dump
Station 04:18:d6:cc:93:9b (on mesh0)
    inactive time:    0 ms
    rx bytes:    56219995
    rx packets:    178963
    tx bytes:    58290787
    tx packets:    44390
    tx retries:    4080
    tx failed:    0
    signal:      -40 [-48, -41] dBm
    signal avg:    -41 [-49, -42] dBm
    tx bitrate:    144.4 MBit/s MCS 15 short GI
    rx bitrate:    144.4 MBit/s MCS 15 short GI
    expected throughput:    47.57Mbps
    authorized:    yes
    authenticated:    yes
    preamble:    long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:    no
    connected time:    399 seconds
Station 44:d9:e7:5c:e1:f7 (on mesh0)
    inactive time:    10 ms
    rx bytes:    52802022
    rx packets:    147025
    tx bytes:    416432
    tx packets:    1467
    tx retries:    67
    tx failed:    0
    signal:      -27 [-29, -31] dBm
    signal avg:    -27 [-30, -31] dBm
    tx bitrate:    144.4 MBit/s MCS 15 short GI
    rx bitrate:    144.4 MBit/s MCS 15 short GI
    expected throughput:    47.57Mbps
    authorized:    yes
    authenticated:    yes
    preamble:    long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:    no
    connected time:    339 seconds

root@OpenWrt:~# iw dev mesh0 station dump
Station 04:18:d6:cc:93:9b (on mesh0)
    inactive time:    0 ms
    rx bytes:    117922138
    rx packets:    365641
    tx bytes:    461079664
    tx packets:    322163
    tx retries:    19107
    tx failed:    0
    signal:      -43 [-45, -47] dBm
    signal avg:    -39 [-41, -43] dBm
    tx bitrate:    144.4 MBit/s MCS 15 short GI
    rx bitrate:    144.4 MBit/s MCS 15 short GI
    expected throughput:    80.108Mbps
    authorized:    yes
    authenticated:    yes
    preamble:    long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:    no
    connected time:    779 seconds
Station 04:18:d6:f6:49:f4 (on mesh0)
    inactive time:    10 ms
    rx bytes:    114449063
    rx packets:    315425
    tx bytes:    964221
    tx packets:    3338
    tx retries:    79
    tx failed:    0
    signal:      -26 [-36, -27] dBm
    signal avg:    -26 [-35, -26] dBm
    tx bitrate:    130.0 MBit/s MCS 14 short GI
    rx bitrate:    144.4 MBit/s MCS 15 short GI
    expected throughput:    45.43Mbps
    authorized:    yes
    authenticated:    yes
    preamble:    long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:    no
    connected time:    779 seconds
Actions #8

Updated by Sven Eckelmann over 6 years ago

Hey, I work with Moshe on this problem and I have some of the details that you requested.
Can you explain what you mean by "build and runtime configuration for batman-adv"?

By "build configuration" I meant the options which you've enabled at build time, like:

  • CONFIG_BATMAN_ADV_DEBUG
  • CONFIG_BATMAN_ADV_DEBUGFS
  • CONFIG_BATMAN_ADV_BLA
  • CONFIG_BATMAN_ADV_DAT
  • CONFIG_BATMAN_ADV_NC
  • CONFIG_BATMAN_ADV_MCAST
  • CONFIG_BATMAN_ADV_BATMAN_V

The network and wireless options are interesting - but you've missed /etc/config/batman-adv and anything which you change manually during runtime.

The output of iw dev mesh0 station dump which you gave us is unfortunately meaningless at the moment. It is not known when you took it, and it doesn't look like you took it during each of the previously suggested stages (all connected; nodeA turned off with packet loss; nodeA turned off with ping good again).

What about the other questions:

  • Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control?
  • Did you do a capture on a wifi monitor interface and the mesh0 interface to detect where the traffic is routed and where it is dropped?

Right now it just looks like the latency increases a lot when nodeA gets turned off. This could be (but doesn't have to be) related to packets which get retransmitted quite often by the wifi driver/hw when nodeA disappears (and therefore can no longer ACK packets). It would therefore be interesting to know whether this problem disappears when the wifi driver drops this station from its neighbor list. And it would of course be interesting to see what is actually transmitted by the wifi device (hence the wifi monitor dumps).

Actions #9

Updated by Marek Lindner over 6 years ago

My 2 cents:

It might be interesting to configure the adhoc interface with IP addresses and repeat the same test on that interface. Since this will bypass batman-adv (which is not needed in this simple scenario) it would tell us whether this is a problem created by batman-adv.

Assuming the adhoc-ping-test does not show the same timeout behavior, you could also play with batctl ping / batctl traceroute. The layer2 ping / traceroute might tell us if this is a layer 2 or layer 3 issue and can also show route changes (if any).

Actions #10

Updated by david lichterov over 6 years ago

Sven Eckelmann wrote:

Hey, I work with Moshe on this problem and I have some of the details that you requested.
Can you explain what you mean by "build and runtime configuration for batman-adv"?

By "build configuration" I meant the options which you've enabled at build time, like:

  • CONFIG_BATMAN_ADV_DEBUG
  • CONFIG_BATMAN_ADV_DEBUGFS
  • CONFIG_BATMAN_ADV_BLA
  • CONFIG_BATMAN_ADV_DAT
  • CONFIG_BATMAN_ADV_NC
  • CONFIG_BATMAN_ADV_MCAST
  • CONFIG_BATMAN_ADV_BATMAN_V

Our config is:
CONFIG_PACKAGE_kmod-batman-adv=y
CONFIG_KMOD_BATMAN_ADV_DEBUG_LOG=y
CONFIG_KMOD_BATMAN_ADV_BLA=y
CONFIG_KMOD_BATMAN_ADV_DAT=y
CONFIG_KMOD_BATMAN_ADV_DEBUGFS=y
CONFIG_KMOD_BATMAN_ADV_MCAST=y
CONFIG_KMOD_BATMAN_ADV_NC=y
CONFIG_KMOD_BATMAN_ADV_BATMAN_V=y
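Sven's list uses kernel option names (`CONFIG_BATMAN_ADV_*`), while the OpenWrt package options posted here are named `CONFIG_KMOD_BATMAN_ADV_*`. A minimal Python sketch that parses such a `.config` fragment and looks up each requested option; the name mapping (prefix swap, plus `DEBUG` appearing as `DEBUG_LOG`) is an assumption inferred from the list posted in this ticket, not taken from the package sources.

```python
from typing import Dict

# Kernel options Sven asked about, without the CONFIG_BATMAN_ADV_ prefix.
REQUESTED = ["DEBUG", "DEBUGFS", "BLA", "DAT", "NC", "MCAST", "BATMAN_V"]
# Assumption: the OpenWrt package names DEBUG as DEBUG_LOG (see list above).
ALIASES = {"DEBUG": "DEBUG_LOG"}


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse 'CONFIG_X=val' and '# CONFIG_X is not set' lines."""
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2:-len(" is not set")]] = "n"
    return opts


def batman_build_options(opts: Dict[str, str]) -> Dict[str, str]:
    """Report each requested option as found in an OpenWrt .config."""
    report = {}
    for name in REQUESTED:
        key = "CONFIG_KMOD_BATMAN_ADV_" + ALIASES.get(name, name)
        report[name] = opts.get(key, "missing")
    return report
```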

The network and wireless options are interesting - but you've missed /etc/config/batman-adv and anything which you change manually during runtime.

root@OpenWrt:~# cat /etc/config/batman-adv

config mesh 'bat0'
    option gw_mode 'server'

The output of iw dev mesh0 station dump which you gave us is unfortunately meaningless at the moment. It is not known when you took it, and it doesn't look like you took it during each of the previously suggested stages (all connected; nodeA turned off with packet loss; nodeA turned off with ping good again).

What about the other questions:

  • Are you sure that you don't have additional traffic towards the removed node which causes airtime saturation due to the retries by the wifi hw/rate control?

The dumps below should be free of any additional traffic towards the removed node.

  • Did you do a capture on a wifi monitor interface and the mesh0 interface to detect where the traffic is routed and where it is dropped?

I am adding the dumps of a capture on the wifi interface and the mesh0 interface. We started the monitor while 3 nodes were connected, with a laptop (not on the mesh) connected by Ethernet cable to the node that we monitor. After approximately 60 seconds we disconnected a node (MAC 04:18:D6:F6:49:F4, IP 192.168.1.15). The capture was done on the node with interfaces mesh=04:18:D6:CC:93:9B, br-lan=04:18:D6:CD:93:9B, ip=192.168.1.42.

Actions #11

Updated by david lichterov over 6 years ago

Attaching the monitor files again.

Actions #12

Updated by Sven Eckelmann over 6 years ago

There is no monitor capture in the bz2. Please refer to https://wireless.wiki.kernel.org/en/users/documentation/iw#modifying_monitor_interface_flags to see how to create a monitor interface. And please also create pcaps with "tcpdump -w /tmp/blabla.pcap ...."

The only thing which I saw in your captures is that there is traffic towards 04:18:d6:f6:49:f4 (which is the one which is offline). Most of it is ELP messages. But you told us that it also happens with BATMAN_IV, and ELP doesn't exist in BATMAN_IV. So these should not be the culprit.

Did you try the test from #341#note-9? You can manually configure an IP on mesh0 using:

node1 $ ifconfig mesh0 192.168.25.1

node2 $ ifconfig mesh0 192.168.25.2
node2 $ ping -c 20 192.168.25.1
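One quick way to tell whether a capture really came from a monitor interface is the link-layer type in the pcap global header: a mac80211 monitor interface normally yields radiotap-encapsulated 802.11 frames (LINKTYPE 127), while mesh0 or br-lan yield Ethernet (LINKTYPE 1). A minimal Python sketch, assuming classic pcap files as written by `tcpdump -w` (pcapng is deliberately not handled); `describe_capture` is a hypothetical helper name.

```python
import struct

# A few LINKTYPE_* values from the tcpdump.org link-layer header types list.
LINKTYPE_NAMES = {
    1: "Ethernet",
    105: "IEEE 802.11 (no radiotap)",
    127: "IEEE 802.11 + radiotap (monitor)",
}
# Microsecond and nanosecond classic pcap magic numbers.
PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)


def pcap_linktype(header: bytes) -> int:
    """Return the link-layer type from a 24-byte pcap global header."""
    if len(header) < 24:
        raise ValueError("too short for a pcap global header")
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", header[:4])
        if magic in PCAP_MAGICS:
            # magic, version major/minor, thiszone, sigfigs, snaplen, network
            fields = struct.unpack(endian + "IHHiIII", header[:24])
            return fields[6] & 0xFFFF  # low 16 bits hold the link type
    raise ValueError("not a classic pcap file (pcapng?)")


def describe_capture(path: str) -> str:
    """Human-readable link type of a pcap file on disk."""
    with open(path, "rb") as f:
        linktype = pcap_linktype(f.read(24))
    return LINKTYPE_NAMES.get(linktype, f"linktype {linktype}")
```

A capture that reports "Ethernet" was taken on mesh0, bat0 or br-lan, not on a monitor interface.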
Actions #13

Updated by david lichterov over 6 years ago

Sven Eckelmann wrote:

There is no monitor capture in the bz2. Please refer to https://wireless.wiki.kernel.org/en/users/documentation/iw#modifying_monitor_interface_flags to see how to create a monitor interface. And please also create pcaps with "tcpdump -w /tmp/blabla.pcap ...."

Trying again... attaching the output files.

The only thing which I saw in your captures is that there is traffic towards 04:18:d6:f6:49:f4 (which is the one which is offline). Most of it is ELP messages. But you told us that it also happens with BATMAN_IV, and ELP doesn't exist in BATMAN_IV. So these should not be the culprit.

Did you try the test from #341#note-9? You can manually configure an IP on mesh0 using:

[...]

I tried it; here are the results of the ping:

root@OpenWrt:~# ping 192.168.25.3
PING 192.168.25.3 (192.168.25.3): 56 data bytes
64 bytes from 192.168.25.3: seq=0 ttl=64 time=1.581 ms
64 bytes from 192.168.25.3: seq=1 ttl=64 time=1.416 ms
64 bytes from 192.168.25.3: seq=2 ttl=64 time=1.407 ms
64 bytes from 192.168.25.3: seq=3 ttl=64 time=1.397 ms
64 bytes from 192.168.25.3: seq=4 ttl=64 time=1.413 ms
64 bytes from 192.168.25.3: seq=5 ttl=64 time=1.507 ms
64 bytes from 192.168.25.3: seq=6 ttl=64 time=1.440 ms
64 bytes from 192.168.25.3: seq=7 ttl=64 time=1.676 ms
64 bytes from 192.168.25.3: seq=8 ttl=64 time=2.269 ms
64 bytes from 192.168.25.3: seq=9 ttl=64 time=1.407 ms
64 bytes from 192.168.25.3: seq=10 ttl=64 time=1.823 ms
64 bytes from 192.168.25.3: seq=11 ttl=64 time=1.401 ms
64 bytes from 192.168.25.3: seq=12 ttl=64 time=1.399 ms
64 bytes from 192.168.25.3: seq=13 ttl=64 time=1.389 ms
64 bytes from 192.168.25.3: seq=14 ttl=64 time=1.426 ms
64 bytes from 192.168.25.3: seq=15 ttl=64 time=1.384 ms
64 bytes from 192.168.25.3: seq=16 ttl=64 time=1.385 ms
64 bytes from 192.168.25.3: seq=17 ttl=64 time=2.075 ms
64 bytes from 192.168.25.3: seq=18 ttl=64 time=1.394 ms
64 bytes from 192.168.25.3: seq=19 ttl=64 time=1.400 ms
64 bytes from 192.168.25.3: seq=20 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=21 ttl=64 time=1.450 ms
64 bytes from 192.168.25.3: seq=22 ttl=64 time=1.358 ms
64 bytes from 192.168.25.3: seq=23 ttl=64 time=1.374 ms
64 bytes from 192.168.25.3: seq=24 ttl=64 time=1.373 ms
64 bytes from 192.168.25.3: seq=25 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=26 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=27 ttl=64 time=15.474 ms
64 bytes from 192.168.25.3: seq=28 ttl=64 time=1.370 ms
64 bytes from 192.168.25.3: seq=29 ttl=64 time=1.379 ms
64 bytes from 192.168.25.3: seq=30 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=31 ttl=64 time=1.364 ms
64 bytes from 192.168.25.3: seq=32 ttl=64 time=1.374 ms
64 bytes from 192.168.25.3: seq=33 ttl=64 time=1.374 ms
64 bytes from 192.168.25.3: seq=34 ttl=64 time=1.388 ms
64 bytes from 192.168.25.3: seq=35 ttl=64 time=1.379 ms
64 bytes from 192.168.25.3: seq=36 ttl=64 time=1.344 ms
64 bytes from 192.168.25.3: seq=37 ttl=64 time=1.842 ms
64 bytes from 192.168.25.3: seq=38 ttl=64 time=1.507 ms
64 bytes from 192.168.25.3: seq=39 ttl=64 time=2.994 ms
64 bytes from 192.168.25.3: seq=40 ttl=64 time=1.398 ms
64 bytes from 192.168.25.3: seq=41 ttl=64 time=1.391 ms
64 bytes from 192.168.25.3: seq=42 ttl=64 time=1.710 ms
64 bytes from 192.168.25.3: seq=43 ttl=64 time=1.400 ms
64 bytes from 192.168.25.3: seq=44 ttl=64 time=1.391 ms
64 bytes from 192.168.25.3: seq=45 ttl=64 time=1.391 ms
64 bytes from 192.168.25.3: seq=46 ttl=64 time=116.681 ms
64 bytes from 192.168.25.3: seq=47 ttl=64 time=26.692 ms
64 bytes from 192.168.25.3: seq=48 ttl=64 time=57.789 ms
64 bytes from 192.168.25.3: seq=49 ttl=64 time=53.890 ms
64 bytes from 192.168.25.3: seq=50 ttl=64 time=224.817 ms
64 bytes from 192.168.25.3: seq=51 ttl=64 time=191.852 ms
64 bytes from 192.168.25.3: seq=52 ttl=64 time=31.281 ms
64 bytes from 192.168.25.3: seq=53 ttl=64 time=44.700 ms
64 bytes from 192.168.25.3: seq=54 ttl=64 time=157.240 ms
64 bytes from 192.168.25.3: seq=55 ttl=64 time=91.909 ms
64 bytes from 192.168.25.3: seq=56 ttl=64 time=9.933 ms
64 bytes from 192.168.25.3: seq=57 ttl=64 time=22.053 ms
64 bytes from 192.168.25.3: seq=58 ttl=64 time=43.157 ms
64 bytes from 192.168.25.3: seq=59 ttl=64 time=56.001 ms
64 bytes from 192.168.25.3: seq=60 ttl=64 time=154.643 ms
64 bytes from 192.168.25.3: seq=61 ttl=64 time=44.681 ms
64 bytes from 192.168.25.3: seq=62 ttl=64 time=62.938 ms
64 bytes from 192.168.25.3: seq=63 ttl=64 time=117.767 ms
64 bytes from 192.168.25.3: seq=64 ttl=64 time=68.347 ms
64 bytes from 192.168.25.3: seq=65 ttl=64 time=53.134 ms
64 bytes from 192.168.25.3: seq=66 ttl=64 time=55.873 ms
64 bytes from 192.168.25.3: seq=67 ttl=64 time=67.072 ms
64 bytes from 192.168.25.3: seq=68 ttl=64 time=136.168 ms
64 bytes from 192.168.25.3: seq=69 ttl=64 time=62.937 ms
64 bytes from 192.168.25.3: seq=70 ttl=64 time=103.860 ms
64 bytes from 192.168.25.3: seq=71 ttl=64 time=39.309 ms
64 bytes from 192.168.25.3: seq=72 ttl=64 time=49.363 ms
64 bytes from 192.168.25.3: seq=73 ttl=64 time=69.935 ms
64 bytes from 192.168.25.3: seq=74 ttl=64 time=126.700 ms
64 bytes from 192.168.25.3: seq=75 ttl=64 time=144.226 ms
64 bytes from 192.168.25.3: seq=76 ttl=64 time=81.853 ms
64 bytes from 192.168.25.3: seq=77 ttl=64 time=83.896 ms
64 bytes from 192.168.25.3: seq=78 ttl=64 time=2.479 ms
64 bytes from 192.168.25.3: seq=79 ttl=64 time=70.421 ms
64 bytes from 192.168.25.3: seq=80 ttl=64 time=187.253 ms
64 bytes from 192.168.25.3: seq=81 ttl=64 time=266.937 ms
64 bytes from 192.168.25.3: seq=82 ttl=64 time=194.640 ms
64 bytes from 192.168.25.3: seq=83 ttl=64 time=87.860 ms
64 bytes from 192.168.25.3: seq=84 ttl=64 time=81.853 ms
64 bytes from 192.168.25.3: seq=85 ttl=64 time=152.013 ms
64 bytes from 192.168.25.3: seq=86 ttl=64 time=118.396 ms
64 bytes from 192.168.25.3: seq=87 ttl=64 time=51.432 ms
64 bytes from 192.168.25.3: seq=88 ttl=64 time=94.313 ms
64 bytes from 192.168.25.3: seq=89 ttl=64 time=58.187 ms
64 bytes from 192.168.25.3: seq=90 ttl=64 time=220.814 ms
64 bytes from 192.168.25.3: seq=91 ttl=64 time=114.451 ms
64 bytes from 192.168.25.3: seq=92 ttl=64 time=31.864 ms
64 bytes from 192.168.25.3: seq=93 ttl=64 time=129.706 ms
64 bytes from 192.168.25.3: seq=94 ttl=64 time=69.393 ms
64 bytes from 192.168.25.3: seq=95 ttl=64 time=277.520 ms
64 bytes from 192.168.25.3: seq=96 ttl=64 time=236.258 ms
64 bytes from 192.168.25.3: seq=97 ttl=64 time=43.555 ms
64 bytes from 192.168.25.3: seq=98 ttl=64 time=90.342 ms
64 bytes from 192.168.25.3: seq=99 ttl=64 time=35.094 ms
64 bytes from 192.168.25.3: seq=100 ttl=64 time=310.098 ms
64 bytes from 192.168.25.3: seq=101 ttl=64 time=157.963 ms
64 bytes from 192.168.25.3: seq=102 ttl=64 time=79.196 ms
64 bytes from 192.168.25.3: seq=103 ttl=64 time=92.743 ms
64 bytes from 192.168.25.3: seq=104 ttl=64 time=157.861 ms
64 bytes from 192.168.25.3: seq=105 ttl=64 time=227.749 ms
64 bytes from 192.168.25.3: seq=106 ttl=64 time=1.480 ms
64 bytes from 192.168.25.3: seq=107 ttl=64 time=1.378 ms
64 bytes from 192.168.25.3: seq=108 ttl=64 time=1.354 ms
64 bytes from 192.168.25.3: seq=109 ttl=64 time=1.544 ms
64 bytes from 192.168.25.3: seq=110 ttl=64 time=1.360 ms
64 bytes from 192.168.25.3: seq=111 ttl=64 time=1.460 ms
64 bytes from 192.168.25.3: seq=112 ttl=64 time=1.412 ms
64 bytes from 192.168.25.3: seq=113 ttl=64 time=1.375 ms
64 bytes from 192.168.25.3: seq=114 ttl=64 time=1.385 ms
64 bytes from 192.168.25.3: seq=115 ttl=64 time=1.375 ms
^C
--- 192.168.25.3 ping statistics ---
116 packets transmitted, 116 packets received, 0% packet loss
round-trip min/avg/max = 1.344/55.071/310.098 ms

It does not seem better than the tests that we did before.
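The summary line alone (0% loss, avg 55.071 ms) hides when the latency went up. A minimal Python sketch that parses `ping` reply lines in the format shown above and groups slow replies into spans; the 10 ms threshold and the tolerance of up to 2 fast replies inside a span are arbitrary choices for illustration. Applied to the output above, it reports an isolated spike at seq=27 and one span from seq=46 to seq=105, roughly 60 s assuming the default 1 s ping interval.

```python
import re
from typing import List, Tuple

REPLY = re.compile(r"seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_ping(text: str) -> List[Tuple[int, float]]:
    """Extract (seq, rtt_ms) pairs from ping reply lines."""
    return [(int(seq), float(rtt)) for seq, rtt in REPLY.findall(text)]


def slow_spans(replies: List[Tuple[int, float]],
               threshold_ms: float = 10.0,
               max_gap: int = 2) -> List[Tuple[int, int]]:
    """Group replies slower than threshold_ms into (first_seq, last_seq) spans.

    Up to max_gap consecutive non-slow seq numbers (fast or lost replies)
    inside a slow period do not split it into two spans.
    """
    spans: List[List[int]] = []
    for seq, rtt in replies:
        if rtt <= threshold_ms:
            continue
        if spans and seq - spans[-1][1] <= max_gap + 1:
            spans[-1][1] = seq  # extend the current span
        else:
            spans.append([seq, seq])  # start a new span
    return [(first, last) for first, last in spans]
```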

Actions #14

Updated by david lichterov over 6 years ago

It does not seem better than the tests that we did before.

I meant to say that it seems better.

Actions #15

Updated by david lichterov over 6 years ago

One more thing: I ran the monitor only on mesh0. These are the interfaces that we have:

root@OpenWrt:/tmp# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-lan state UP qlen 1000
    link/ether 04:18:d6:cd:93:9b brd ff:ff:ff:ff:ff:ff
4: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 04:18:d6:cd:93:9b brd ff:ff:ff:ff:ff:ff
5: mesh0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1532 qdisc mq master bat0 state UP qlen 1000
    link/ether 04:18:d6:cc:93:9b brd ff:ff:ff:ff:ff:ff
6: bat0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UNKNOWN qlen 1000
    link/ether 7e:1f:b4:f4:a5:cc brd ff:ff:ff:ff:ff:ff
7: fish0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
    link/[803] 04:18:d6:cc:93:9b brd ff:ff:ff:ff:ff:ff

Should I monitor anything else?

Actions #16

Updated by david lichterov over 6 years ago

Sven Eckelmann wrote:

Did you try the test from #341#note-9? You can manually configure an IP on mesh0 using:

[...]

Hey,
Do the test from #341#note-9 and the results that we got suggest that the issue is not with batman?

Actions #17

Updated by Sven Eckelmann over 6 years ago

No, right now it still looks more like the queuing (fq_codel + airtime fairness?) and rate settings of your wifi stack/driver are to blame here.

And I told you about the monitor interface and the dumps on it in #341#note-13 and #341#note-8. See #341#note-8 regarding your iw dev mesh0 station dump output. There are also other ideas in the ticket about things which can be tested to see whether batman-adv is to blame ("route changes", ...).

Actions #18

Updated by Marek Lindner over 6 years ago

Sven Eckelmann wrote:

No, right now it still looks more like the queuing (fq_codel + airtime fairness?) and rate settings of your wifi stack/driver are to blame here.

I agree with Sven. The latency values in that test run might not be as high as during previous runs but generally, deactivating an unrelated WiFi neighbor should not increase latency anywhere. If anything, it should reduce latency.

Assuming the latency is created by the WiFi layer (WiFi driver, analog noise, queues, etc.), you will always see that latency in batman-adv too. You could also start poking at the WiFi layer. For instance, check the driver you're using. Is it an old version with bugs? Is it bleeding edge? Are all test devices using Atheros AR934X?

Actions #20

Updated by david lichterov over 6 years ago

Hey Sven,
Thank you for the pointers and help.
Before reaching out to Toke, we wanted to redo the tests and make sure that we get the same results.
1. What we did last time was to ping through the mesh0 interface, as suggested above. The test resulted in poor ping quality when one of the nodes was disconnected.
2. In the current test we configured an ad-hoc network on 3 nodes, with the configuration below:

root@OpenWrt:/# cat /etc/config/network

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdf8:4b2c:88b1::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth0 wlan0'
        option proto 'static'
        option ipaddr '192.168.1.12'
        option netmask '255.255.255.0'
        option ip6assign '60'

root@OpenWrt:/# cat /etc/config/wireless
config wifi-device  radio0
        option type     mac80211
        option channel  36
        option hwmode   11a
        option path     'platform/ar934x_wmac'
        option htmode   HT20

config wifi-iface
        option device   radio0
        option network  lan
        option mode     adhoc
        option ssid     OpenWrt
        option encryption none

root@OpenWrt:/# ifconfig
br-lan    Link encap:Ethernet  HWaddr 04:18:D6:F7:49:E3
          inet addr:192.168.1.12  Bcast:192.168.1.255  Mask:255.255.255.0       
          inet6 addr: fdf8:4b2c:88b1::1/60 Scope:Global                         
          UP BROADCAST MULTICAST  MTU:1500  Metric:1                            
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                    
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                  
          collisions:0 txqueuelen:1000                                          
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                

eth0      Link encap:Ethernet  HWaddr 04:18:D6:F7:49:E3                         
          UP BROADCAST MULTICAST  MTU:1500  Metric:1                            
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                    
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                  
          collisions:0 txqueuelen:1000                                          
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                
          Interrupt:4                                                           

lo        Link encap:Local Loopback                                             
          inet addr:127.0.0.1  Mask:255.0.0.0                                   
          inet6 addr: ::1/128 Scope:Host                                        
          UP LOOPBACK RUNNING  MTU:65536  Metric:1                              
          RX packets:124 errors:0 dropped:0 overruns:0 frame:0                  
          TX packets:124 errors:0 dropped:0 overruns:0 carrier:0                
          collisions:0 txqueuelen:1                                             
          RX bytes:8544 (8.3 KiB)  TX bytes:8544 (8.3 KiB)                      

wlan0     Link encap:Ethernet  HWaddr 04:18:D6:F6:49:E3                         
          inet addr:172.16.0.201  Bcast:172.16.255.255  Mask:255.255.0.0        
          inet6 addr: fe80::618:d6ff:fef6:49e3/64 Scope:Link                    
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                    
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                    
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0                  
          collisions:0 txqueuelen:1000                                          
          RX bytes:0 (0.0 B)  TX bytes:864 (864.0 B)    

The ping test was run from node A to nodes B and C, with node C disconnected. The result of that ping test:

root@OpenWrt:/# ping 172.16.0.202
PING 172.16.0.202 (172.16.0.202): 56 data bytes
64 bytes from 172.16.0.202: seq=0 ttl=64 time=1.502 ms
64 bytes from 172.16.0.202: seq=1 ttl=64 time=1.399 ms
64 bytes from 172.16.0.202: seq=2 ttl=64 time=1.398 ms
64 bytes from 172.16.0.202: seq=3 ttl=64 time=1.442 ms
64 bytes from 172.16.0.202: seq=4 ttl=64 time=1.402 ms
64 bytes from 172.16.0.202: seq=5 ttl=64 time=1.143 ms
64 bytes from 172.16.0.202: seq=6 ttl=64 time=1.777 ms
64 bytes from 172.16.0.202: seq=7 ttl=64 time=1.459 ms
64 bytes from 172.16.0.202: seq=8 ttl=64 time=1.411 ms
64 bytes from 172.16.0.202: seq=9 ttl=64 time=1.421 ms
64 bytes from 172.16.0.202: seq=10 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=11 ttl=64 time=1.399 ms
64 bytes from 172.16.0.202: seq=12 ttl=64 time=1.458 ms
64 bytes from 172.16.0.202: seq=13 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=14 ttl=64 time=1.397 ms
64 bytes from 172.16.0.202: seq=15 ttl=64 time=1.392 ms
64 bytes from 172.16.0.202: seq=16 ttl=64 time=1.636 ms
64 bytes from 172.16.0.202: seq=17 ttl=64 time=1.428 ms
64 bytes from 172.16.0.202: seq=18 ttl=64 time=1.381 ms
64 bytes from 172.16.0.202: seq=19 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=20 ttl=64 time=1.400 ms
64 bytes from 172.16.0.202: seq=21 ttl=64 time=1.403 ms
64 bytes from 172.16.0.202: seq=22 ttl=64 time=7.701 ms
64 bytes from 172.16.0.202: seq=23 ttl=64 time=1.417 ms
64 bytes from 172.16.0.202: seq=24 ttl=64 time=1.413 ms
64 bytes from 172.16.0.202: seq=25 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=26 ttl=64 time=1.396 ms
64 bytes from 172.16.0.202: seq=27 ttl=64 time=5.830 ms
64 bytes from 172.16.0.202: seq=28 ttl=64 time=1.395 ms
64 bytes from 172.16.0.202: seq=29 ttl=64 time=1.397 ms
64 bytes from 172.16.0.202: seq=30 ttl=64 time=1.401 ms
64 bytes from 172.16.0.202: seq=31 ttl=64 time=6.423 ms
64 bytes from 172.16.0.202: seq=32 ttl=64 time=1.434 ms
64 bytes from 172.16.0.202: seq=33 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=34 ttl=64 time=1.396 ms
64 bytes from 172.16.0.202: seq=35 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=36 ttl=64 time=1.424 ms
64 bytes from 172.16.0.202: seq=37 ttl=64 time=1.456 ms
64 bytes from 172.16.0.202: seq=38 ttl=64 time=8.808 ms
64 bytes from 172.16.0.202: seq=39 ttl=64 time=1.409 ms
64 bytes from 172.16.0.202: seq=40 ttl=64 time=1.520 ms
64 bytes from 172.16.0.202: seq=41 ttl=64 time=1.458 ms
64 bytes from 172.16.0.202: seq=42 ttl=64 time=1.450 ms
64 bytes from 172.16.0.202: seq=43 ttl=64 time=1.393 ms
64 bytes from 172.16.0.202: seq=44 ttl=64 time=1.765 ms
64 bytes from 172.16.0.202: seq=45 ttl=64 time=1.436 ms
64 bytes from 172.16.0.202: seq=46 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=47 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=48 ttl=64 time=5.378 ms
64 bytes from 172.16.0.202: seq=49 ttl=64 time=1.416 ms
64 bytes from 172.16.0.202: seq=50 ttl=64 time=1.476 ms
64 bytes from 172.16.0.202: seq=51 ttl=64 time=1.403 ms
64 bytes from 172.16.0.202: seq=52 ttl=64 time=1.403 ms
64 bytes from 172.16.0.202: seq=53 ttl=64 time=1.464 ms
64 bytes from 172.16.0.202: seq=54 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=55 ttl=64 time=1.412 ms
64 bytes from 172.16.0.202: seq=56 ttl=64 time=1.416 ms
64 bytes from 172.16.0.202: seq=57 ttl=64 time=2.249 ms
64 bytes from 172.16.0.202: seq=58 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=59 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=60 ttl=64 time=1.405 ms
64 bytes from 172.16.0.202: seq=61 ttl=64 time=1.422 ms
64 bytes from 172.16.0.202: seq=62 ttl=64 time=1.406 ms
64 bytes from 172.16.0.202: seq=63 ttl=64 time=1.433 ms
64 bytes from 172.16.0.202: seq=64 ttl=64 time=1.408 ms
64 bytes from 172.16.0.202: seq=65 ttl=64 time=1.400 ms
64 bytes from 172.16.0.202: seq=66 ttl=64 time=1.404 ms
64 bytes from 172.16.0.202: seq=67 ttl=64 time=1.430 ms
64 bytes from 172.16.0.202: seq=68 ttl=64 time=1.426 ms
64 bytes from 172.16.0.202: seq=69 ttl=64 time=1.407 ms
64 bytes from 172.16.0.202: seq=70 ttl=64 time=3.037 ms
64 bytes from 172.16.0.202: seq=71 ttl=64 time=1.448 ms
64 bytes from 172.16.0.202: seq=72 ttl=64 time=1.419 ms
64 bytes from 172.16.0.202: seq=73 ttl=64 time=1.414 ms
64 bytes from 172.16.0.202: seq=74 ttl=64 time=1.452 ms
64 bytes from 172.16.0.202: seq=75 ttl=64 time=1.422 ms
64 bytes from 172.16.0.202: seq=76 ttl=64 time=1.725 ms
64 bytes from 172.16.0.202: seq=77 ttl=64 time=1.416 ms
64 bytes from 172.16.0.202: seq=78 ttl=64 time=1.412 ms
64 bytes from 172.16.0.202: seq=79 ttl=64 time=1.453 ms
64 bytes from 172.16.0.202: seq=80 ttl=64 time=1.439 ms
64 bytes from 172.16.0.202: seq=81 ttl=64 time=1.413 ms
64 bytes from 172.16.0.202: seq=82 ttl=64 time=1.413 ms
64 bytes from 172.16.0.202: seq=83 ttl=64 time=1.411 ms
64 bytes from 172.16.0.202: seq=84 ttl=64 time=1.561 ms
64 bytes from 172.16.0.202: seq=85 ttl=64 time=1.411 ms
64 bytes from 172.16.0.202: seq=86 ttl=64 time=1.428 ms
64 bytes from 172.16.0.202: seq=87 ttl=64 time=2.627 ms
64 bytes from 172.16.0.202: seq=88 ttl=64 time=1.405 ms
64 bytes from 172.16.0.202: seq=89 ttl=64 time=1.404 ms
^C
--- 172.16.0.202 ping statistics ---
90 packets transmitted, 90 packets received, 0% packet loss
round-trip min/avg/max = 1.143/1.772/8.808 ms

As you can see, the issue did not recur in this setup.
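To compare a run like the one above against the lossy batman-adv runs, it can help to summarize the raw ping output (gaps in the sequence numbers, median RTT, outliers) rather than eyeballing it. The following is a minimal Python sketch, assuming BusyBox-style ping output in the format shown above. The function names and the "3x the median" spike threshold are hypothetical choices for illustration, not part of any existing tool.

```python
import re
import statistics

# Matches BusyBox-style ping reply lines such as:
# "64 bytes from 172.16.0.202: seq=22 ttl=64 time=7.701 ms"
REPLY_RE = re.compile(r"seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def parse_rtts(lines):
    """Return a dict mapping sequence number -> RTT in milliseconds."""
    rtts = {}
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            rtts[int(m.group(1))] = float(m.group(2))
    return rtts


def summarize(lines, spike_factor=3.0):
    """Summarize loss and latency spikes from captured ping output.

    Loss is counted only between the first and last reply seen, so probes
    still unanswered when ping was interrupted are not included.
    spike_factor is a hypothetical threshold: an RTT counts as a spike
    when it exceeds spike_factor times the median RTT.
    """
    rtts = parse_rtts(lines)
    if not rtts:
        raise ValueError("no ping replies found")
    first, last = min(rtts), max(rtts)
    sent = last - first + 1
    lost = [seq for seq in range(first, last + 1) if seq not in rtts]
    median = statistics.median(rtts.values())
    spikes = {seq: t for seq, t in sorted(rtts.items())
              if t > spike_factor * median}
    return {
        "sent": sent,
        "received": len(rtts),
        "lost_seqs": lost,
        "loss_pct": 100.0 * len(lost) / sent,
        "median_ms": median,
        "max_ms": max(rtts.values()),
        "spikes": spikes,
    }


if __name__ == "__main__":
    # Four reply lines copied from the ad-hoc test output above.
    sample = [
        "64 bytes from 172.16.0.202: seq=20 ttl=64 time=1.400 ms",
        "64 bytes from 172.16.0.202: seq=21 ttl=64 time=1.403 ms",
        "64 bytes from 172.16.0.202: seq=22 ttl=64 time=7.701 ms",
        "64 bytes from 172.16.0.202: seq=23 ttl=64 time=1.417 ms",
    ]
    result = summarize(sample)
    print(result["lost_seqs"], result["spikes"])
```

Feeding it the saved output of each run (batman-adv vs. plain ad-hoc, before and after the node is switched off) gives directly comparable loss and spike counts. Using the median rather than the mean keeps a handful of large outliers from hiding themselves by inflating the baseline.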

3. With the batman-adv setup we also ran an arping test. The results don't show any delay when one of the nodes is disconnected:

root@OpenWrt:/# arping -I br-lan 192.168.1.12
ARPING 192.168.1.12 from 192.168.1.13 br-lan
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.737ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.679ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.683ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.682ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.679ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.689ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.687ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.688ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.681ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.686ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.689ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.691ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.684ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.689ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.691ms
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.687ms                    
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.690ms                    
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.680ms                    
Unicast reply from 192.168.1.12 [80:2A:A8:B9:3E:14]  0.686ms                    
Unicast reply fr^C
Sent 26 probes (1 broadcast(s))
Received 26 response(s)

4. We tested the same batman-adv setup on a Raspberry Pi running Debian and didn't see any of the issues mentioned above.

How can we investigate the issue further?

Thanks !

Actions #21

Updated by Sven Eckelmann over 6 years ago

Sorry, but I will not provide any more help here, because you are doing everything except what I asked for.

Actions #22

Updated by Moshe Hoori over 6 years ago

Hi Sven,

Sorry, we forgot to mention that, as we use OpenWrt, we don't have the fair airtime implementation in our code at all, so there is nothing to revert.

Actions #23

Updated by Marek Lindner over 6 years ago

david lichterov wrote:

How can we investigate the issue further?

You're basically telling us everything works as it should and asking us why it works? I hope you see the irony. :-)

From an engineering perspective I'd say: if the problem disappeared, something changed in your setup. What that something is, we obviously don't know. We only have the information you provided. It could be anything from the OpenWrt toolchain and driver versions up to the WiFi environment.

Assuming the issue is gone, why don't you go ahead with whatever you set out to do before you opened this ticket? If the problem shows up again, you can always re-open the ticket, create a new one, or visit our IRC channel. It is really hard to suggest remotely how to trigger a problem only you have seen.

Actions #24

Updated by Moshe Hoori over 6 years ago

Hi Marek,

Sadly, the issue is not gone.
Let me sum up what we know so far:
  1. Using BATMAN on RocketM5 with OpenWrt:
    • PING quality is bad after node disconnection (as mentioned in the initial post).
    • ARPING quality is good after node disconnection.
    • PING via mesh0 as in post #12: quality is good after node disconnection.
  2. Using ADHOC connections on RocketM5 with OpenWrt:
    • PING+ARPING quality is good after node disconnection.
  3. Using BATMAN on RPI3 with Debian:
    • PING+ARPING quality is good after node disconnection.

We still see the issue on BATMAN + OpenWrt.

Actions #25

Updated by Marek Lindner over 6 years ago

Thanks for the summary.

Let me sum up what we know so far:
  1. Using BATMAN on RocketM5 with OpenWrt:
    • PING quality is bad after node disconnection (as mentioned in the initial post).
    • ARPING quality is good after node disconnection.
    • PING via mesh0 as in post #12: quality is good after node disconnection.

Can you also help me understand how you came to the 'quality is good' conclusion for the pure adhoc mode test? Doesn't ticket update 13 show the latency going up on disconnect?

  2. Using ADHOC connections on RocketM5 with OpenWrt:
    • PING+ARPING quality is good after node disconnection.

Is this test case different from the previous 'good' one or the same? mesh0 is simply pure adhoc mode, right?

  3. Using BATMAN on RPI3 with Debian:
    • PING+ARPING quality is good after node disconnection.

Is the batman-adv version the same in all test scenarios?

Actions #26

Updated by Moshe Hoori over 6 years ago

Marek Lindner wrote:

Thanks for the summary.

Let me sum up what we know so far:
  1. Using BATMAN on RocketM5 with OpenWrt:
    • PING quality is bad after node disconnection (as mentioned in the initial post).
    • ARPING quality is good after node disconnection.
    • PING via mesh0 as in post #12: quality is good after node disconnection.

Can you also help me understand how you came to the 'quality is good' conclusion for the pure adhoc mode test? Doesn't ticket update 13 show the latency going up on disconnect?

You are right, I got confused. The ping test results in that case were not good.

  2. Using ADHOC connections on RocketM5 with OpenWrt:
    • PING+ARPING quality is good after node disconnection.

Is this test case different from the previous 'good' one or the same? mesh0 is simply pure adhoc mode, right?

The results were good; please see post 20. I don't know if mesh0 is simply adhoc mode. I just followed Sven's instructions in ticket update 12.

  3. Using BATMAN on RPI3 with Debian:
    • PING+ARPING quality is good after node disconnection.

Is the batman-adv version the same in all test scenarios?

Yes.

Actions #27

Updated by Marek Lindner over 6 years ago

Moshe Hoori wrote:

Is this test case different from the previous 'good' one or the same? mesh0 is simply pure adhoc mode, right?

The results were good; please see post 20.

To be honest, ticket update 20 confuses me because I don't understand which output belongs to which test and what differs from the expected result. At first read, I thought all quoted text (config & tests) came from OpenWrt and showed no issue. At the end, the update also states there is no issue with Debian. Hence my conclusion in update 23: everything is good? In the following update you state that OpenWrt still does not work.

I don't know if mesh0 is simply adhoc mode.

The second quoted part of update 20 shows it: wifi-iface => radio0, mode => adhoc. The interface name can be chosen freely; it is the mode that matters.
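Since the mode and not the name is what matters, the mode of a running interface can also be checked on the node itself: iw dev <ifname> info prints a "type" line, and ad-hoc interfaces report "type IBSS" whatever they are called. A minimal Python sketch, assuming the iw utility is installed and Python 3.7+ is available where it runs; the helper names interface_type and query_interface_type are hypothetical.

```python
import subprocess


def interface_type(info_text):
    """Return the value of the 'type' line from `iw dev <ifname> info` output.

    Ad-hoc interfaces report "type IBSS", regardless of whether they are
    named wlan0, mesh0 or anything else. Returns None if no type line exists.
    """
    for line in info_text.splitlines():
        parts = line.split(None, 1)  # e.g. "\ttype IBSS" -> ["type", "IBSS"]
        if len(parts) == 2 and parts[0] == "type":
            return parts[1].strip()
    return None


def query_interface_type(ifname):
    """Run `iw dev <ifname> info` and return the reported interface type."""
    out = subprocess.run(
        ["iw", "dev", ifname, "info"],
        capture_output=True, text=True, check=True,
    ).stdout
    return interface_type(out)


if __name__ == "__main__":
    # Abbreviated, illustrative output; real output contains more lines.
    sample = "Interface mesh0\n\ttype IBSS\n"
    print(interface_type(sample))
```

On a device without Python, the same check is simply reading the type line of the iw output by eye.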

Let me try to summarize what I understand so far while (partially) ignoring update 20 for now due to confusion on my end:

  • Debian on RPI with or without batman-adv does not exhibit any spikes in latency when a nearby WiFi node is turned off.
  • OpenWrt with or without batman-adv exhibits spikes in latency when a nearby WiFi node is turned off.

The same batman-adv versions were deployed in all tests.

Please correct me if I got it wrong.

Based on the above, wouldn't it be safe to assume the problem lies somewhere in OpenWrt (WiFi, kernel, or ...)? There might still be a problem with batman-adv, but until the underlying latency spikes have been resolved, it makes little sense to poke at batman-adv. Because batman-adv relies on the WiFi layer for the actual packet transmission, we cannot tackle a problem in batman-adv while the WiFi layer is misbehaving.

Actions #28

Updated by Moshe Hoori over 6 years ago

Marek Lindner wrote:

Moshe Hoori wrote:

Is this test case different from the previous 'good' one or the same? mesh0 is simply pure adhoc mode, right?

The results were good; please see post 20.

To be honest, ticket update 20 confuses me because I don't understand which output belongs to which test and what differs from the expected result. At first read, I thought all quoted text (config & tests) came from OpenWrt and showed no issue. At the end, the update also states there is no issue with Debian. Hence my conclusion in update 23: everything is good? In the following update you state that OpenWrt still does not work.

I don't know if mesh0 is simply adhoc mode.

The second quoted part of update 20 shows it: wifi-iface => radio0, mode => adhoc. The interface name can be chosen freely; it is the mode that matters.

Let me try to summarize what I understand so far while (partially) ignoring update 20 for now due to confusion on my end:

  • Debian on RPI with or without batman-adv does not exhibit any spikes in latency when a nearby WiFi node is turned off.

That's right.

  • OpenWrt with or without batman-adv exhibits spikes in latency when a nearby WiFi node is turned off.

Using batman-adv, pinging as in ticket update 12 results in bad ping quality (this is what I referred to as pinging mesh0).
Without batman-adv, using the adhoc network configuration results in good ping quality.

The same batman-adv versions were deployed in all tests.

Please correct me if I got it wrong.

Based on the above, wouldn't it be safe to assume the problem lies somewhere in OpenWrt (WiFi, kernel, or ...)? There might still be a problem with batman-adv, but until the underlying latency spikes have been resolved, it makes little sense to poke at batman-adv. Because batman-adv relies on the WiFi layer for the actual packet transmission, we cannot tackle a problem in batman-adv while the WiFi layer is misbehaving.

Actions #29

Updated by Marek Lindner over 6 years ago

Moshe Hoori wrote:

Using batman-adv, pinging as in ticket update 12 results in bad ping quality (this is what I referred to as pinging mesh0).
Without batman-adv, using the adhoc network configuration results in good ping quality.

Sorry, there seems to be some misconception here. Pinging as described in ticket update 12 is done without batman-adv. Ticket update 12 even references my suggestion from ticket update 9 to test the pure WiFi layer without batman-adv. Ticket update 13 then shows the test results of pure WiFi without batman-adv and depicts a spike in latency. No?

Actions #30

Updated by Moshe Hoori over 6 years ago

Hi Marek,
sorry for being unclear about that.

1. The test from post 12 was done while batman-adv was installed and bat0 was up.
2. The test from post 20 was done while batman-adv was not installed at all.

Today we tested (1) again, and it seems the issue recurs only if the bat0 interface is up.

Actions #31

Updated by Marek Lindner over 6 years ago

Moshe Hoori wrote:

Hi Marek,
sorry for being unclear about that.

1. The test from post 12 was done while batman-adv was installed and bat0 was up.
2. The test from post 20 was done while batman-adv was not installed at all.

Today we tested (1) again, and it seems the issue recurs only if the bat0 interface is up.

Sorry, I don't want to continue this discussion here. The information you provide appears contradictory and unclear, which makes it hard to help you. Just now, you threw in another aspect that wasn't discussed before: what is installed and what is not.

I recommend you either join our IRC channel for a more real-time discussion with questions & answers or provide a comprehensive overview of what is working and what is not. Right now I can't tell.
