| # LwIP changes for Matter |
| |
| LwIP is one of the network stacks used by Matter platforms. Although its IPv6 |
| support is reasonably good, it has gaps that we should close for Matter. The |
| recommendations here are listed roughly from most to least important. |
| |
| ## Route Information Options (RIO) |
| |
| The specification requires devices to store the routes carried in Route |
| Information Options (RIO) sent in router advertisements. Upstream LwIP does not |
| currently provide this functionality. The patch to add it is relatively small, |
| but we may need to upstream it before Matter can require its use. Platforms |
| would need to incorporate the patch into their own middleware. |
| |
| ### Recommendation: |
| |
| - Write a RIO patch and upstream it to LwIP. |
| - Ensure the patch is RFC compliant (especially regarding route expiry). |
| - UPDATE: A patch is available at https://savannah.nongnu.org/patch/?10114 |
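| |
| As an illustration of what the RIO handling involves, the following is a |
| minimal sketch of parsing one option using the layout in RFC 4191, section |
| 2.3. It is not the Savannah patch and uses no LwIP APIs; `struct rio_route` |
| and `rio_parse` are hypothetical names introduced here. |

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ND6_OPT_ROUTE_INFO 24 /* option type from RFC 4191 */

/* Hypothetical result type for this sketch; not an LwIP structure. */
struct rio_route {
    uint8_t  prefix[16];
    uint8_t  prefix_len;  /* in bits, 0..128 */
    int8_t   preference;  /* +1 high, 0 medium, -1 low */
    uint32_t lifetime_s;  /* 0 = withdraw the route, 0xffffffff = infinite */
};

/* Returns 0 on success, -1 if the option is malformed or must be ignored. */
int rio_parse(const uint8_t *opt, size_t opt_len, struct rio_route *out)
{
    if (opt_len < 8 || opt[0] != ND6_OPT_ROUTE_INFO) {
        return -1;
    }
    uint8_t units = opt[1];               /* length in 8-octet units */
    uint8_t prefix_len = opt[2];
    size_t len_bytes = (size_t)units * 8;
    if (units < 1 || units > 3 || len_bytes > opt_len || prefix_len > 128) {
        return -1;
    }
    /* The option must be long enough to carry the advertised prefix. */
    if ((prefix_len > 64 && units < 3) || (prefix_len > 0 && units < 2)) {
        return -1;
    }
    uint8_t prf = (opt[3] >> 3) & 0x03;   /* 2-bit preference field */
    if (prf == 0x02) {
        return -1;                        /* reserved value: ignore the option */
    }
    out->preference = (prf == 0x01) ? 1 : (prf == 0x03) ? -1 : 0;
    out->prefix_len = prefix_len;
    out->lifetime_s = ((uint32_t)opt[4] << 24) | ((uint32_t)opt[5] << 16) |
                      ((uint32_t)opt[6] << 8) | (uint32_t)opt[7];

    memset(out->prefix, 0, sizeof out->prefix);
    memcpy(out->prefix, opt + 8, len_bytes - 8);

    /* Bits beyond prefix_len are reserved, so clear them. */
    size_t full = prefix_len / 8;
    unsigned rem = prefix_len % 8;
    if (full < 16) {
        if (rem != 0) {
            out->prefix[full] &= (uint8_t)(0xff << (8 - rem));
            full++;
        }
        memset(out->prefix + full, 0, 16 - full);
    }
    return 0;
}
```

| A real implementation would also need a route table with per-entry expiry |
| driven by `lifetime_s`, which is where most of the RFC compliance work sits. |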
| |
| ## Address Scopes |
| |
| Link-local addresses are less common on IPv4, which normally relies on NAT at |
| the router to do address translation. Matter mandates the use of IPv6 |
| link-local addresses for communication with nodes on the same network (Wi-Fi or |
| Thread). When there is more than one netif in the system (e.g. loopback, |
| softAP, STA), a link-local address needs more information to determine which |
| link it is local to. This is normally added as the link-local scope and appears |
| as a suffix on the address, e.g. `FE80::xxxx:xxxx:xxxx:xxxx%<scope>`, where |
| `<scope>` identifies the netif (something like `%wlan0` or `%eno1`). |
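| |
| To make the `%<scope>` suffix concrete, here is a small sketch that splits a |
| textual address into its binary form and its scope name, and flags a |
| link-local address that arrives without a scope. It assumes a POSIX-style |
| `inet_pton`; `parse_scoped_addr` is a hypothetical helper, not an LwIP or |
| Matter API. |

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/*
 * Splits "addr%scope" into a binary address and a scope (netif) name.
 * zone_size must be at least 1.
 * Returns  0 if the address is usable as-is,
 *          1 if it is link-local but carries no scope (ambiguous when
 *            more than one netif exists),
 *         -1 if the text is malformed.
 */
int parse_scoped_addr(const char *text, struct in6_addr *addr,
                      char *zone, size_t zone_size)
{
    char buf[INET6_ADDRSTRLEN];
    const char *pct = strchr(text, '%');
    size_t addr_len = pct ? (size_t)(pct - text) : strlen(text);

    if (addr_len >= sizeof buf) {
        return -1;
    }
    memcpy(buf, text, addr_len);
    buf[addr_len] = '\0';
    if (inet_pton(AF_INET6, buf, addr) != 1) {
        return -1;
    }

    zone[0] = '\0';
    if (pct != NULL) {
        size_t zone_len = strlen(pct + 1);
        if (zone_len == 0 || zone_len >= zone_size) {
            return -1;
        }
        memcpy(zone, pct + 1, zone_len + 1); /* includes the terminator */
    }

    if (IN6_IS_ADDR_LINKLOCAL(addr) && zone[0] == '\0') {
        return 1;
    }
    return 0;
}
```

| With BSD-style sockets, the scope name would typically be converted with |
| `if_nametoindex` and stored in `sin6_scope_id` before calling `sendto`. |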
| |
| Without this indicator, a link-local address can only be resolved if there is |
| a single netif. LwIP will also accept a direct match against a netif's source |
| address, but this does not scale well and is very racy. LwIP also supports |
| output to a specific netif, but this is not exposed at the socket layer. |
| |
| Upstream LwIP supports IPv6 address scopes, but only as an option, and the |
| code to support this is not present in the CHIP LwIP codebase. Other platform |
| versions assume the option is disabled (e.g. M5 has an assertion on IP address |
| sizes that disallows the use of a scope tag). |
| |
| ### Recommendation: |
| |
| - Ensure Matter SDK code works with scopes on our various platforms, OR, as an |
| alternative, bring netif sendto up through the API / sockets layers. |
| - Audit Matter code to ensure link-local addresses are properly scoped to their |
| netif in all areas (especially addresses returned by DNS). |
| |
| ## Duplicate address detection |
| |
| DAD in LwIP is currently implemented correctly, but some routers implement |
| IPv6 multicast incorrectly and send packets back to the sender. This triggers |
| LwIP's DAD because it doesn’t check the source of the packet. The problem can |
| be fixed with a filter in the Wi-Fi layer, but it’s easy enough to add the fix |
| to LwIP itself. Doing so would save implementers from each having to debug the |
| same issue. |
| |
| ### Recommendation: |
| |
| - Create an LwIP patch to check NS/NA packets for source and discard if they |
| originate from the same device. Upstream and offer patch to vendors. |
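| |
| A rough sketch of such a check follows. Because a DAD Neighbor Solicitation |
| is sent from the unspecified IPv6 address, this sketch assumes that "source" |
| means the link-layer source address of the received frame. `struct |
| rx_frame_info` and `nd6_is_own_echo` are hypothetical names, not LwIP types |
| or functions. |

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN  6
#define ICMP6_TYPE_NS 135 /* Neighbor Solicitation */
#define ICMP6_TYPE_NA 136 /* Neighbor Advertisement */

/* Hypothetical summary of a received frame; not an LwIP type. */
struct rx_frame_info {
    uint8_t src_mac[ETH_ADDR_LEN]; /* link-layer source of the frame */
    uint8_t icmp6_type;            /* ICMPv6 message type */
};

/*
 * Returns true when an NS/NA frame carries our own MAC address as its
 * link-layer source, meaning the network reflected our own packet and
 * it should be dropped before it reaches DAD processing.
 */
bool nd6_is_own_echo(const struct rx_frame_info *rx,
                     const uint8_t own_mac[ETH_ADDR_LEN])
{
    if (rx->icmp6_type != ICMP6_TYPE_NS && rx->icmp6_type != ICMP6_TYPE_NA) {
        return false;
    }
    return memcmp(rx->src_mac, own_mac, ETH_ADDR_LEN) == 0;
}
```

| A filter like this cannot detect a genuine duplicate on a device that has |
| cloned the same MAC address, so it trades that corner case for robustness |
| against reflected multicast. |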
| |
| ## Timers, including TCP |
| |
| Espressif's lwIP fork uses on-demand timers for IGMP and MLD (see |
| https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/lwip.html#esp-lwip-custom-modifications |
| for the changes Espressif made to lwIP to reduce power usage on ESP32 and to |
| better support IPv6). lwIP also has several uncorrelated always-on timers for |
| TCP. These timers have caused power issues on some products. |
| |
| ### Recommendation: |
| |
| - Make sure to take in Espressif's improvements to timers (it is not clear |
| whether they have been upstreamed). |
| - Look into supporting aligned TCP timers so that multiple timers can be |
| serviced within a single wake. |
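| |
| One way to picture timer alignment is to round every deadline up to a shared |
| grid so that nearby timers fire in the same wake. The sketch below is an |
| assumption about how that could look, not existing lwIP code; the function |
| names are hypothetical and it ignores tick-counter wraparound. |

```c
#include <stddef.h>
#include <stdint.h>

/* Rounds a deadline up to the next multiple of grid_ms (0 disables alignment). */
uint32_t align_deadline_ms(uint32_t deadline_ms, uint32_t grid_ms)
{
    if (grid_ms == 0) {
        return deadline_ms;
    }
    uint32_t rem = deadline_ms % grid_ms;
    return (rem == 0) ? deadline_ms : deadline_ms + (grid_ms - rem);
}

/*
 * Counts how many distinct wakeups are needed to service the given
 * deadlines, which must be sorted in ascending order.
 */
size_t count_wakes(const uint32_t *deadlines_ms, size_t n, uint32_t grid_ms)
{
    size_t wakes = 0;
    uint32_t last = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t t = align_deadline_ms(deadlines_ms[i], grid_ms);
        if (i == 0 || t != last) {
            wakes++;
            last = t;
        }
    }
    return wakes;
}
```

| The cost is that a timer may fire up to one grid interval late, so the grid |
| has to be chosen against the tolerance of the protocol timers involved. |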
| |
| ## pbuf management |
| |
| Pool-based management has been a source of problems on several products, but |
| does have advantages over purely “heap” based allocation of pbufs as done in |
| ESP32 and many common lwIP stacks. |
| |
| Overall, the ability to instrument all pbuf allocations by usage (e.g. driver |
| TX, driver RX, manual PacketBuffer allocation, internal TCP stack pbufs) would |
| let us move towards a pool approach by allowing us to: |
| |
| - Understand the overall memory usage of lwIP packet buffers over time, which |
| helps debug issues related to running out of pbufs or overly long queuing. |
| - Track how long incoming and outgoing packets dwell, so that we can start |
| dropping at ingress when running out of memory. |
| - Size the heap and pools based on usage patterns. |
| |
| ### Recommendation: |
| |
| - Upstream a portable version of pbuf alloc/free accounting, allowing |
| registration of instrumentation handlers. |
| - Add support to account for high watermark of pbuf memory used and concurrent |
| pbuf allocations. |
| - Add more pbuf allocation types to allow finer-grained recording of “reason” |
| for a pbuf alloc |
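| |
| The accounting described above could be sketched roughly as follows. This is |
| an illustration under assumptions, not an existing lwIP interface: every name |
| here (`acct_reason`, `acct_on_alloc`, and so on) is hypothetical, and the |
| counters are not protected against concurrent access. |

```c
#include <stddef.h>

/* Hypothetical "reason" tags for an allocation. */
enum acct_reason {
    ACCT_DRIVER_TX,
    ACCT_DRIVER_RX,
    ACCT_MANUAL,
    ACCT_TCP_INTERNAL,
    ACCT_REASON_COUNT
};

struct acct_stats {
    size_t live_count[ACCT_REASON_COUNT]; /* live buffers per reason */
    size_t live_total;                    /* live buffers overall */
    size_t peak_total;                    /* high watermark of live buffers */
    size_t live_bytes;                    /* bytes currently allocated */
    size_t peak_bytes;                    /* high watermark of bytes */
};

/* Instrumentation callback: is_alloc is 1 for alloc, 0 for free. */
typedef void (*acct_handler)(enum acct_reason reason, size_t bytes, int is_alloc);

static struct acct_stats g_stats;
static acct_handler g_handler;

void acct_register_handler(acct_handler handler)
{
    g_handler = handler;
}

void acct_on_alloc(enum acct_reason reason, size_t bytes)
{
    g_stats.live_count[reason]++;
    g_stats.live_total++;
    g_stats.live_bytes += bytes;
    if (g_stats.live_total > g_stats.peak_total) {
        g_stats.peak_total = g_stats.live_total;
    }
    if (g_stats.live_bytes > g_stats.peak_bytes) {
        g_stats.peak_bytes = g_stats.live_bytes;
    }
    if (g_handler != NULL) {
        g_handler(reason, bytes, 1);
    }
}

void acct_on_free(enum acct_reason reason, size_t bytes)
{
    g_stats.live_count[reason]--;
    g_stats.live_total--;
    g_stats.live_bytes -= bytes;
    if (g_handler != NULL) {
        g_handler(reason, bytes, 0);
    }
}

const struct acct_stats *acct_get_stats(void)
{
    return &g_stats;
}
```

| The two `acct_on_*` calls would sit next to the pbuf alloc and free paths, and |
| the peak values give a starting point for sizing pools. |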
| |
| ## IPv6 Ping |
| |
| Although ping is not required for Matter, it is very helpful for debugging |
| networking issues. Having a reliable ping would be beneficial for a lot of |
| developers. |
| |
| LwIP will automatically respond to pings, but has no built-in way to send them. |
| The current ping implementation is a contrib app that only works for IPv4. |
| Extending the app is challenging for two reasons: 1) IPv6 checksum needs access |
| to the pbuf for calculation, which the app doesn’t have and 2) IPv6 has a lot |
| more ICMP traffic for SLAAC that the app would have to be updated to disregard. |
| Instead, it might be better to build this into the ICMP layer itself. |
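| |
| The checksum difficulty comes from the fact that the ICMPv6 checksum covers |
| an IPv6 pseudo-header (source address, destination address, length, next |
| header) as well as the message itself. The sketch below builds an Echo |
| Request with no payload and computes that checksum in plain C. It does not |
| use LwIP's pbuf or checksum helpers, and both function names are |
| hypothetical. |

```c
#include <stddef.h>
#include <stdint.h>

#define IP6_NEXTH_ICMP6      58
#define ICMP6_TYPE_ECHO_REQ  128

/* Adds big-endian 16-bit words of data to acc; pads an odd tail with zero. */
static uint32_t sum16(const uint8_t *data, size_t len, uint32_t acc)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        acc += ((uint32_t)data[i] << 8) | (uint32_t)data[i + 1];
    }
    if (len & 1) {
        acc += (uint32_t)data[len - 1] << 8;
    }
    return acc;
}

/* Checksum over the pseudo-header plus the ICMPv6 message (small messages only). */
uint16_t icmp6_checksum(const uint8_t src[16], const uint8_t dst[16],
                        const uint8_t *msg, size_t msg_len)
{
    /* Pseudo-header tail: 32-bit length, three zero bytes, next header. */
    uint8_t tail[8] = {
        (uint8_t)(msg_len >> 24), (uint8_t)(msg_len >> 16),
        (uint8_t)(msg_len >> 8),  (uint8_t)msg_len,
        0, 0, 0, IP6_NEXTH_ICMP6
    };
    uint32_t acc = sum16(src, 16, 0);
    acc = sum16(dst, 16, acc);
    acc = sum16(tail, sizeof tail, acc);
    acc = sum16(msg, msg_len, acc);
    while (acc >> 16) {
        acc = (acc & 0xffff) + (acc >> 16); /* fold carries back in */
    }
    return (uint16_t)~acc;
}

/* Writes an 8-byte Echo Request into out; returns its length or 0 on error. */
size_t icmp6_build_echo_request(const uint8_t src[16], const uint8_t dst[16],
                                uint16_t id, uint16_t seq,
                                uint8_t *out, size_t out_size)
{
    if (out_size < 8) {
        return 0;
    }
    out[0] = ICMP6_TYPE_ECHO_REQ;
    out[1] = 0;                 /* code */
    out[2] = 0;                 /* checksum is zero while it is computed */
    out[3] = 0;
    out[4] = (uint8_t)(id >> 8);
    out[5] = (uint8_t)id;
    out[6] = (uint8_t)(seq >> 8);
    out[7] = (uint8_t)seq;

    uint16_t csum = icmp6_checksum(src, dst, out, 8);
    out[2] = (uint8_t)(csum >> 8);
    out[3] = (uint8_t)csum;
    return 8;
}
```

| Because the source address is part of the checksum, the sender has to know |
| which netif address will be used before the packet can be finalized, which |
| is one argument for doing this inside the ICMP layer. |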
| |
| ### Recommendation: |
| |
| - Add an asynchronous `send_icmp6_ping` function and a hook to check ping |
| responses, and upstream the patch if possible, OR write an external ICMPv6 |
| ping utility. |
| |
| ## DNS |
| |
| LwIP's DNS handling isn’t great and breaks down when the router supports both |
| IPv4 and IPv6. There is a single list of DNS servers, and DHCP, SLAAC and |
| DHCPv6 all update it without locks, so whichever wrote last determines the |
| list. Although there is handling for IP type (requesting `A` or `AAAA` |
| records), there is no way to select an IPv6 or IPv4 server specifically, which |
| can be challenging since not all servers serve all record types. |
| |
| The design of the Weave connectivity manager moves DNS selection to the upper |
| layers by stopping LwIP from changing the DNS list directly and hooking into |
| the DNS selection. This means the DNS selection policy isn’t hard-coded into |
| the LwIP layer. This seems like a good model for CHIP going forward. |
| |
| Additionally, we should ensure that CHIP uses non-blocking DNS APIs. |
| |
| ### Recommendation: |
| |
| - Bug fix for DHCPv6 to avoid it setting bad addresses. |
| - Note: fixed in |
| https://git.savannah.nongnu.org/cgit/lwip.git/commit/?id=941300c21c45a4dbf1c074b29a9ca3c88c9f6553, |
| but not yet part of an official release. |
| - Create a patch to add hooks to the SetDns and GetDns functions so that the |
| logic for selecting the DNS server can be moved into the manager layer. |
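| |
| As a rough picture of the hook idea, the sketch below lets an upper layer |
| accept or veto each DNS server that a lower layer offers. It is an assumption |
| about the shape such a patch might take, not LwIP's DNS API; all names |
| (`dns_offer_server`, `dns_set_hook`, `policy_reject_dhcpv6`) are hypothetical, |
| and servers are kept as strings only to keep the example short. |

```c
#include <stddef.h>
#include <stdio.h>

#define DNS_SLOTS 2

/* Who is offering the server address. */
enum dns_source { DNS_SRC_DHCPV4, DNS_SRC_SLAAC, DNS_SRC_DHCPV6 };

/* Policy hook owned by the manager layer: return nonzero to accept. */
typedef int (*dns_set_hook)(enum dns_source src, unsigned index,
                            const char *server);

static dns_set_hook g_set_hook;
static char g_servers[DNS_SLOTS][46]; /* room for a textual IPv6 address */

void dns_register_set_hook(dns_set_hook hook)
{
    g_set_hook = hook;
}

/*
 * Called by DHCP, SLAAC or DHCPv6 instead of writing the list directly.
 * Returns 1 if stored, 0 if the policy vetoed it, -1 on a bad index.
 */
int dns_offer_server(enum dns_source src, unsigned index, const char *server)
{
    if (index >= DNS_SLOTS) {
        return -1;
    }
    if (g_set_hook != NULL && !g_set_hook(src, index, server)) {
        return 0;
    }
    snprintf(g_servers[index], sizeof g_servers[index], "%s", server);
    return 1;
}

const char *dns_get_server(unsigned index)
{
    return (index < DNS_SLOTS) ? g_servers[index] : NULL;
}

/* Example policy, for illustration only: ignore servers from DHCPv6. */
int policy_reject_dhcpv6(enum dns_source src, unsigned index,
                         const char *server)
{
    (void)index;
    (void)server;
    return src != DNS_SRC_DHCPV6;
}
```

| Without a registered hook the behavior stays "last writer wins", so the |
| policy remains entirely in the layer that registers the hook. |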