Starving routes since "batman-adv: avoid temporary routing loops by being strict on forwarded OGMs"
In a four-node setup (see attached topology.png/dia) I'm experiencing starving routes. Node A frequently loses track of node D, and less often of node C. This seems to happen as soon as node B switches its route towards D from one interface to the other.
Checking with 'batctl td', it looks like before commit:f76d019 any OGM leading to a "Changing route towards..." event got forwarded. With commit:f76d019 such OGMs no longer get forwarded.
Additionally, looking at the provided logs, it seems odd that the OGM of node D, received at node B a second time via C's other interface, results in such a "Changing route towards..." event at all, even though the new OGM has the same TQ (225) as the old one. This then leads to route flapping with every new pair of OGMs of originator D.
Together, these two things very often (namely when no packet loss is present, which is usually the case in this setup) lead to node A not getting any OGM at all from originator D, for instance. The route from node A to D starves.
The attached number-of-route-changes-B-to-D.svg visualizes when a lot of route change events happened for the provided logs. So for instance at about 1500s (or uptime 113499280 or seqno 2259557124) in nodeB.log is a time when the route towards D starves for node A.
I tried reproducing the same setup with virtual machines and wirefilter and a '-d 50' parameter to simulate the second, slower wifi interface from the real setup, but mostly unsuccessfully so far. There are some OGMs not getting through on the route between A and D but only for a few seconds. Instead of a second route switch with the OGM from the alternate interface, like in the physical setup, I'm getting a "Drop packet: packet within seqno protection time (sender: fe:fe:00:00:03:04)" on node B (which is the intended behaviour?).
Updated by Linus Lüssing over 8 years ago
- commit:716c8c9a8bb7ac1e30e959e50ed74caa7dabe60a: Fixed the observed "Drop packet: packet within seqno protection" issue, making things reproducible within kvm.
- [PATCH] batman-adv: Fix symmetry check...: Fixes the observed route flapping issues.
With these changes, the cause of the starving route issue seems to become clearer:
This issue occurs every time node B switches to the slower (i.e. higher latency) link towards C (i.e. the -d50 wirefilter link in kvm). I guess this switch happens when a single OGM occasionally gets lost on the faster link, even in a kvm/wirefilter setup with no packet loss configured.
This then results in:
- OGM via fast link gets accepted and the seqno is updated, but there is no route switch and it is not rebroadcast [because of (!is_from_best_next_hop && !is_single_hop_neigh) in batadv_iv_ogm_forward -> 2nd return statement]
- OGM via slow link gets dropped as a duplicate and is not rebroadcast either
Which means no OGM ever gets forwarded to A until a packet loss on the slow link occurs.
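To make the two steps above easier to follow, here is a minimal toy model in Python. It is not the batman-adv code: the class NodeB, the function forwarded_seqnos and the link names "fast" and "slow" are made up for illustration. It assumes that the fast copy of each OGM from D always arrives at B before the slow copy, that B's best next hop stays fixed, and that D is never a single hop neighbour of B. TQ calculation and route selection are not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class NodeB:
    """Toy stand-in for node B; all names here are hypothetical."""
    best_next_hop: str   # link B currently routes through towards D
    last_seqno: int = -1  # newest seqno of originator D seen so far

    def receive(self, seqno: int, via: str) -> bool:
        """Return True if this OGM copy would be rebroadcast towards A."""
        if seqno <= self.last_seqno:
            # Second copy of an already seen seqno: dropped as a duplicate.
            return False
        # First copy of a new seqno: accepted, seqno updated.
        self.last_seqno = seqno
        # Strict forwarding: only a copy that came in via the best next hop
        # is rebroadcast (D is assumed not to be a single hop neighbour).
        return via == self.best_next_hop


def forwarded_seqnos(best_next_hop, seqnos):
    """List the seqnos of D that B would forward to A in this toy model."""
    node = NodeB(best_next_hop)
    forwarded = []
    for seqno in seqnos:
        # Assumption: the fast copy always arrives before the slow copy.
        for via in ("fast", "slow"):
            if node.receive(seqno, via):
                forwarded.append(seqno)
    return forwarded


if __name__ == "__main__":
    print("route via fast link:", forwarded_seqnos("fast", range(5)))
    print("route via slow link:", forwarded_seqnos("slow", range(5)))
```

In this model, as long as B routes via the fast link every seqno is forwarded. Once B routes via the slow link, the fast copy consumes the seqno without being forwarded and the slow copy is rejected as a duplicate, so nothing reaches A.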