The majority of SD-WAN implementations use Forward Error Correction (FEC) to help voice and other critical traffic, such as credit card authorizations, work well across networks that do not guarantee end-to-end QoS. While FEC sounds good in theory, real-world implementations are showing mixed results.
FEC has been around since the 1950s, and layer 4 protocols such as TCP already provide retransmission services, so what is new? Providing FEC at layer 3, the network level, can in theory reduce the time lost waiting for TCP to resend a dropped packet, and it provides a function that UDP does not offer. The advantage of layer 3 FEC is that it is aware of other network traffic on the same link and can, in theory, intelligently manage all flows, while TCP is only aware of a single flow.
There are multiple ways of doing layer 3 FEC, each with pros and cons. Early best practices are starting to emerge, but there is no “one size fits all” solution.
Static vs. Dynamic – Static FEC duplicates all packets for critical traffic and ideally sends the duplicates over a second network. This method works well when the amount of critical traffic being duplicated is far less than the capacity of the networks. When bandwidth is at a premium, dynamic FEC is more efficient, and it works well when the number of packets being dropped is minimal. Static FEC is simpler to implement and support, so it is preferred where capacity allows.
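As a rough illustration of static FEC, the sketch below duplicates every critical packet onto two paths and has the receiver deliver the first copy of each sequence number. It is a toy model, not any vendor's implementation: the names `send_static_fec` and `deduplicate`, and the idea of describing loss as a set of dropped sequence numbers per path, are assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload)

def send_static_fec(packets: Iterable[Packet],
                    lost_on_a: Set[int],
                    lost_on_b: Set[int]) -> List[Packet]:
    """Duplicate every packet onto two paths and return what arrives.

    lost_on_a / lost_on_b are the sequence numbers each path drops,
    a stand-in for real loss on two independent networks.
    """
    arrived: List[Packet] = []
    for seq, payload in packets:
        if seq not in lost_on_a:
            arrived.append((seq, payload))  # copy sent over path A
        if seq not in lost_on_b:
            arrived.append((seq, payload))  # duplicate sent over path B
    return arrived

def deduplicate(arrived: Iterable[Packet]) -> List[Packet]:
    """Receiver side: deliver only the first copy of each sequence number."""
    seen: Set[int] = set()
    delivered: List[Packet] = []
    for seq, payload in arrived:
        if seq not in seen:
            seen.add(seq)
            delivered.append((seq, payload))
    return delivered

packets = [(i, f"voice-{i}".encode()) for i in range(5)]
arrived = send_static_fec(packets, lost_on_a={1, 3}, lost_on_b={3})
delivered_seqs = [seq for seq, _ in deduplicate(arrived)]
# Packet 1 survives via path B; packet 3 is lost on both paths.
```

The sketch also shows the cost: with no loss, the two paths together carry twice the critical traffic, which is why this approach fits only when that traffic is small relative to capacity.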
Small vs. Large Packets – Layer 3 FEC works well on flows that use small packets, under 300 bytes, such as voice or a point-of-sale transaction. Small-packet flows add less overhead and rarely require resequencing of packets at the far end. Large-packet applications such as video and file transfers are usually not business-critical, and doing forward error correction on fragmented packets is difficult. The fragmentation comes from adding an IPsec header to a packet that is already at the maximum size. Applying layer 3 FEC only to small-packet flows is therefore an emerging best practice.
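The fragmentation point can be checked with simple arithmetic. The sketch below assumes a 1500-byte MTU and a 60-byte tunnel overhead (outer IP header plus IPsec fields); both numbers are illustrative assumptions, since real overhead depends on the cipher and mode in use. The helper name `fragments_needed` is hypothetical.

```python
import math

MTU = 1500            # assumed link MTU in bytes
IP_HEADER = 20        # IPv4 header without options
TUNNEL_OVERHEAD = 60  # assumed IPsec tunnel overhead, including the outer IP header

def fragments_needed(inner_packet_bytes: int) -> int:
    """Number of IPv4 fragments once the tunnel overhead has been added."""
    outer_size = inner_packet_bytes + TUNNEL_OVERHEAD
    if outer_size <= MTU:
        return 1
    # Every fragment carries its own IP header, and fragment payloads
    # (except the last) must be a multiple of 8 bytes.
    payload = outer_size - IP_HEADER
    max_fragment_payload = (MTU - IP_HEADER) // 8 * 8
    return math.ceil(payload / max_fragment_payload)

small_flow = fragments_needed(200)    # voice-sized packet stays whole
large_flow = fragments_needed(1500)   # full-size packet is split
```

Under these assumptions a 200-byte packet is sent whole while a 1500-byte packet becomes two fragments, and losing either fragment loses the original packet.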
Adaptive Codecs vs. FEC – Many of the latest voice and video codecs support FEC within the codec. Opus, for example, can be configured to put redundant data into the audio payload, reducing the IPsec/IP/UDP overhead of layer 3 static FEC. Adaptive codecs can also dynamically increase the size of the jitter buffer and reduce the amount of data sent when variable latency and dropped packets occur on the network. The IETF has a standard for RTP redundancy (a form of FEC) that permits up to 9 layers of redundancy (RFC 2198). For voice and video traffic, adaptive codecs are more efficient than layer 3 FEC.
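The idea behind in-payload redundancy can be sketched with one layer of redundancy: each packet carries the current audio frame plus a copy of the previous one, so a single lost packet can be rebuilt from the packet that follows it. This is a simplified model and not the RFC 2198 wire format or the Opus encoder; `packetize` and `reconstruct` are hypothetical names, and real codecs may carry the redundant copy at a lower bitrate.

```python
from typing import Dict, List, Optional, Tuple

# (sequence number, primary frame, redundant copy of the previous frame)
RedundantPacket = Tuple[int, bytes, Optional[bytes]]

def packetize(frames: List[bytes]) -> List[RedundantPacket]:
    """Attach the previous frame to each packet as redundant data."""
    packets: List[RedundantPacket] = []
    previous: Optional[bytes] = None
    for seq, frame in enumerate(frames):
        packets.append((seq, frame, previous))
        previous = frame
    return packets

def reconstruct(received: List[RedundantPacket], total: int) -> List[Optional[bytes]]:
    """Rebuild the frame sequence; None marks a frame that could not be recovered."""
    frames: Dict[int, bytes] = {}
    for seq, primary, redundant in received:
        frames[seq] = primary
        if redundant is not None and seq - 1 not in frames:
            frames[seq - 1] = redundant  # fill the gap left by a lost packet
    return [frames.get(i) for i in range(total)]

frames = [b"f0", b"f1", b"f2", b"f3", b"f4"]
packets = packetize(frames)

single_loss = reconstruct([p for p in packets if p[0] != 2], len(frames))
burst_loss = reconstruct([p for p in packets if p[0] not in {2, 3}], len(frames))
```

With one layer, a single lost packet is fully recovered, while two consecutive losses still leave a gap; adding more layers of redundancy trades payload size for tolerance of longer bursts.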
LTE Backup and FEC – For an SD-WAN that has a primary path with LTE for backup, should FEC be used? In most cases, the answer is no. FEC adds traffic to a network that is already limited in capacity, which makes the situation worse. When packets are dropped in a wireless network, it is usually many packets in a row, regardless of priority, and FEC does not help with that kind of loss. Dynamic session networking is the answer: it can prioritize some sessions above others and use RTCP and TCP retransmissions to dynamically and intelligently manage sessions.
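Why bursts defeat FEC can be shown with a small example. The sketch below uses one XOR parity packet per group of equal-length packets, a common textbook FEC scheme chosen here for illustration and not necessarily what any SD-WAN product implements. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(group: List[bytes]) -> bytes:
    """Build one XOR parity packet over a group of equal-length packets."""
    return reduce(xor_bytes, group)

def recover(group: List[Optional[bytes]], parity: bytes) -> Optional[List[bytes]]:
    """Return the full group if at most one packet is missing, else None."""
    missing = [i for i, p in enumerate(group) if p is None]
    if not missing:
        return list(group)
    if len(missing) > 1:
        return None  # a single parity packet cannot repair a burst
    present = [p for p in group if p is not None]
    rebuilt = reduce(xor_bytes, present, parity)
    result = list(group)
    result[missing[0]] = rebuilt
    return result

group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(group)

random_loss = recover([b"pkt0", None, b"pkt2", b"pkt3"], parity)  # one drop
burst_loss = recover([b"pkt0", None, None, b"pkt3"], parity)      # two in a row
```

The isolated drop is repaired, but the two-packet burst is not, even though the parity packet consumed extra capacity in both cases.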
Today’s SD-WAN solutions lack session awareness: they cannot distinguish how critical a session is, and they ignore the application flow control mechanisms that are already built in. For instance, flow-based SD-WANs treat all voice traffic the same, instead of having different FEC policies for different types of voice codecs. An example is an enterprise that uses Cisco with G.729 for its office phones, Avaya with G.711 for contact center phones, Microsoft with SILK for internal collaboration, and Slack with WebRTC for external collaboration. The FEC rules for each application with each codec type should vary to maintain toll quality for customer-facing calls while optimizing network performance.
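A session-aware policy could be as simple as a lookup keyed by session type and codec. The sketch below is purely hypothetical: the session labels, the `FecMode` values, and especially the mode assigned to each codec are placeholder choices made for illustration, not recommendations from any vendor or standard.

```python
from enum import Enum
from typing import Dict, Tuple

class FecMode(Enum):
    NONE = "none"        # rely on the codec or transport instead
    DYNAMIC = "dynamic"
    STATIC = "static"    # duplicate every packet

# Placeholder policy values, for illustration only.
FEC_POLICY: Dict[Tuple[str, str], FecMode] = {
    ("contact-center", "G.711"): FecMode.STATIC,   # customer-facing calls
    ("office-phone", "G.729"): FecMode.DYNAMIC,
    ("internal-collab", "SILK"): FecMode.NONE,
    ("external-collab", "WebRTC"): FecMode.NONE,
}

def fec_mode_for(session_type: str, codec: str) -> FecMode:
    """Look up the FEC mode for a session; unknown sessions get no layer 3 FEC."""
    return FEC_POLICY.get((session_type, codec), FecMode.NONE)

contact_center_mode = fec_mode_for("contact-center", "G.711")
unknown_mode = fec_mode_for("guest-wifi", "G.722")
```

The point of the sketch is the shape of the decision, per session and per codec, in place of a single rule applied to all voice traffic.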
On a public Internet connection where packets are being randomly discarded, FEC is appealing. But layer 3 FEC must be supplemented with dynamic session networking to intelligently manage network performance.