Asymmetric VLAN generates loop
With an asymmetric VLAN configuration, as shown in the picture, it is possible to generate a loop even if BLA is enabled.
The loop is triggered by the fact that the two potential backbone nodes won't recognize each other. This can be understood by looking at the steps taken by a BLA ANNOUNCE message:
- is generated at node B for VLAN y
- is broadcast over eth0.x
- reaches eth0 on node A and does not get decapsulated
- enters bat0.y and gets encapsulated once again
- batman-adv does not find the BLA message because the "encapsulated_proto" field is not ARP
- the message gets forwarded in the mesh because the node thinks it is alone on the LAN
- the message reaches node B and the loop starts
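The detection failure in the steps above can be sketched with a toy model. This is not the batman-adv code: the function names and the list-of-tags frame representation are hypothetical, and the only assumption taken from the description is that the check peels at most one VLAN header before comparing the encapsulated protocol against ARP.

```python
ETH_P_ARP = 0x0806    # ethertype of ARP (BLA frames are expected to look like ARP)
ETH_P_8021Q = 0x8100  # ethertype announcing an 802.1Q VLAN tag


def proto_after_outer_tag(vlan_tags, inner_proto):
    """Return the ethertype seen after peeling at most one VLAN header.

    vlan_tags is a list of VLAN ids, outermost first (hypothetical model).
    """
    if len(vlan_tags) <= 1:
        return inner_proto
    # A second tag follows, so the encapsulated protocol is 802.1Q again.
    return ETH_P_8021Q


def detected_as_bla(vlan_tags, inner_proto):
    """True if a single-layer check would recognize the frame as BLA."""
    return proto_after_outer_tag(vlan_tags, inner_proto) == ETH_P_ARP


# Frame as it leaves node B on the wire: tagged with VLAN x.
on_wire = ["x"]
# Node A: eth0 does not strip x, and bat0.y adds y on top.
in_mesh = ["y"] + on_wire

print(detected_as_bla(on_wire, ETH_P_ARP))  # True
print(detected_as_bla(in_mesh, ETH_P_ARP))  # False
```

With two stacked tags the check sees 0x8100 where it expects ARP, so the frame is treated as ordinary broadcast traffic and forwarded.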
I don't see a clear solution to this problem: even if we make batman-adv decapsulate every packet until it reaches the innermost layer, node A still won't be recognized by node B (node A will always send with no VLAN header).
Updated by Simon Wunderlich over 7 years ago
That's an interesting corner case, and I think you are right about your conclusion. This would loop, although not endlessly, because a new VLAN header is added in each round until the MTU is reached, but still ...
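As a rough back-of-the-envelope sketch of that bound: an 802.1Q tag adds 4 bytes, so the number of rounds is limited by how many tags fit before the frame exceeds the MTU. The frame size and MTU below are made-up example values, both treated as plain byte counts, and the helper name is hypothetical.

```python
VLAN_HLEN = 4  # bytes added by one 802.1Q tag


def rounds_until_mtu(frame_len, mtu):
    """Number of additional VLAN tags that fit before frame_len exceeds mtu."""
    if frame_len > mtu:
        return 0
    return (mtu - frame_len) // VLAN_HLEN


# Example values only: a 64-byte frame and a 1500-byte limit.
print(rounds_until_mtu(64, 1500))  # 359
```

So the loop is finite, but a small broadcast frame could still go around a few hundred times before it is dropped.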
We wouldn't have this problem if BLA were able to work on VLANs in VLANs (QinQ), but it works on only one layer of VLANs. It would also work if there wasn't a VLAN on top of batman (since BLA would then kick in for that specific VLAN x), but there is most probably a reason for that too.
- forbid QinQ or stacked VLANs on batman, or make that a selectable option to allow it. Some switches do that too, AFAIR
- add QinQ for BLA, allowing it to operate on multiple stacked VLANs
- accept it and fix the config on the APs - why are batman-VLANs connected to different ethernet VLANs in the first place anyway? That seems broken by design ...