3 Most Common SD-WAN Issues

Alyssa Lamberti Last updated on Nov 28, 2022

In Summary

Many businesses rely on SD-WAN services to deliver optimal Internet, cloud, and UC performance. But like any network, SD-WAN can experience network issues that affect user experience. So you need to be ready. Keep reading to learn about the most common SD-WAN issues.

This article is part of a series about monitoring and troubleshooting SD-WAN networks before, during, and after migrations.

Introduction

Essentially, the most common SD-WAN issues are caused by network bandwidth congestion (a bottleneck) or high resource usage on network devices (high CPU). This usually occurs on the Local Loop or the customer Edge Router, which are both prone to network congestion.

In addition, most of the problems in an ISP's backbone that can cause SD-WAN issues are related to congestion on its peering and transit paths with other networks or service providers.

Although ISP backbones are more reliable and robust than other network infrastructures, performance issues can still happen.

The same goes for SD-WAN networks in general. SD-WAN vendors promise their solution is magic, but the user experience is not always magical.

Keep reading for more details about where SD-WAN issues occur, and concrete examples of the 3 most common issues and how to identify them.

Where SD-WAN Issues Occur

The weakest link in a network has always been the last mile. The last mile is the last segment of the network, which generally has the lowest speeds, the least route diversity, and the most single points of failure.

SD-WAN networks are no exception to this rule.

So it wasn't a surprise for our team of network pros to discover that 75% of our customers experience SD-WAN issues on the last mile of their network.

This is why most SD-WAN networks rely on more than one link.

The assumption is that, if a problem occurs, it should not affect all the links at the same time and the SD-WAN Edge Router should be able to load-balance the network sessions on the best link available. But link diversity on its own is not enough to avoid all issues that can happen in SD-WAN Networks.
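
To make the idea of "the best link available" concrete, here is a minimal sketch of how an edge device could rank links from their measured loss, latency, and jitter. The `LinkStats` fields and the scoring weights are illustrative assumptions; real SD-WAN products use their own, often proprietary, policies.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    name: str
    loss_pct: float    # packet loss over the last interval, in percent
    latency_ms: float  # average round-trip latency
    jitter_ms: float   # average jitter


def link_score(link: LinkStats) -> float:
    """Lower is better. The weights are arbitrary, for illustration only."""
    return link.loss_pct * 100.0 + link.latency_ms + link.jitter_ms * 2.0


def pick_best_link(links: list[LinkStats]) -> LinkStats:
    """Return the link with the lowest score."""
    if not links:
        raise ValueError("no links to choose from")
    return min(links, key=link_score)


links = [
    LinkStats("ISP 1", loss_pct=0.0, latency_ms=22.0, jitter_ms=1.5),
    LinkStats("ISP 2", loss_pct=4.0, latency_ms=18.0, jitter_ms=6.0),
]
print(pick_best_link(links).name)
```

Note that the function always returns a link, even when every link is degraded. If the problem sits on a segment shared by all links, such as the edge device itself, picking the "best" link does not help.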

SD-WAN Network Design

To truly understand what SD-WAN issues can happen, you need to first understand what SD-WAN networks look like and where the SD-WAN issues can happen in the network.

The image below is a diagram of an SD-WAN network site communicating with a Data Center, Head Office or IaaS.

SD-WAN Issues Design

A. The Underlay

  • The Internet
  • Internet Local Loop
  • Internet Provider’s Edge Router
  • ISP Backbone
  • ISP Peering Point

B. The Overlay

  • IPsec Tunnel from one site to another

C. The LAN

  • SD-WAN Edge
  • Core & Distribution Switches
  • Access Switches

Identifying SD-WAN Issues

Before we dive deeper into the 3 most common SD-WAN issues, it’s important to understand that there are a variety of network problems that can affect your SD-WAN network.

Some common SD-WAN issues include:

  • Defective cables or connectors
  • Bandwidth congestion (bottleneck)
  • Device misconfigurations
  • Device software issues
  • High CPU usage
  • Physical/hardware issues
  • Human errors
  • DNS issues

To be able to identify and troubleshoot SD-WAN issues that severely impact the user experience, you need end-to-end visibility of your SD-WAN network.

To do so we recommend a modern decentralized Network Monitoring tool like Obkio Network Performance Monitoring software which continuously monitors end-to-end network performance with synthetic traffic using Network Monitoring Agents.

Get started with Obkio’s free trial! Or check out our blog post on How to Monitor SD-WAN Networks.

Get Started with Obkio

Start monitoring and troubleshooting your SD-WAN network in just 15 minutes with Obkio's free trial!

Troubleshooting SD-WAN Issues

Once you’ve identified any SD-WAN issues in your network, Obkio’s network monitoring solution will also allow you to collect the data you need to troubleshoot these network problems.

We talk about Obkio’s SD-WAN troubleshooting steps in our article on SD-WAN Troubleshooting, but here is a summary of the steps.

1. Pinpoint SD-WAN Issues

  • Analyze live data or alerts received from your monitoring solution to look at what network locations are currently experiencing poor performance
  • Isolate the issue and focus on the location with the worst performance

2. What the Problem Is & When It Happened

  • Look at past historical data
  • Isolate when the issue first happened and its pattern

3. Compare the Information & Find A Root Cause

  • Once you know what happened, look at your historical traceroutes to pinpoint where the issue happened
  • Identify if the issue is internal or external (in the ISP network)

4. Implement A Solution

  • If the issue is on your ISP’s side, open a support ticket with information from Obkio
  • If the problem is internal, resolve internally
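
As a rough illustration of steps 1 and 2, the sketch below picks the location with the worst packet loss and then finds when loss first crossed a threshold in that location's history. The site names, sample values, and the 1% threshold are hypothetical; this is not Obkio's API, only a way to show the reasoning.

```python
from typing import Optional


def worst_location(loss_by_site: dict[str, float]) -> str:
    """Step 1: return the site with the highest current packet loss."""
    return max(loss_by_site, key=lambda site: loss_by_site[site])


def first_occurrence(
    samples: list[tuple[str, float]], threshold_pct: float = 1.0
) -> Optional[str]:
    """Step 2: return the timestamp of the first sample above the threshold."""
    for timestamp, loss_pct in samples:
        if loss_pct > threshold_pct:
            return timestamp
    return None


# Hypothetical current loss per site, in percent.
current_loss = {"Branch 1": 0.2, "Branch 2": 7.5, "Branch 3": 1.1}
site = worst_location(current_loss)

# Hypothetical history for that site: (timestamp, loss in percent).
history = [("09:00", 0.0), ("09:05", 0.3), ("09:10", 6.8), ("09:15", 7.5)]
print(site, first_occurrence(history))
```
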

3 Most Common SD-WAN Issues

As we said above, there are a variety of SD-WAN issues that can occur, but some happen more often than others.

The majority of them happen in the last mile, generally in the Local Loop or the customer Edge Router.

So we’re going to show you the 3 most common SD-WAN issues using concrete examples, and show you what they look like using Obkio’s Network Monitoring Software. We’re going to focus on SD-WAN issues happening in Branches #1, #2, and #3, which you can see in the Chord Diagram below.

SD-WAN Issues Diagram

With a tool like Obkio, you can identify and visualize SD-WAN issues, and be alerted as soon as they happen.

1. High CPU Usage

The first SD-WAN issue is high CPU usage on SD-WAN Devices affecting all sessions. This generally occurs when a network device does not have enough available resources to manage the throughput.

SD-WAN Issues High CPU usage

A. What Do We See?

In the screenshot above, we can see an Obkio Dashboard for Branch #3 with various performance graphs. The selected view shows performance over the last 8 hours.

Column 1 shows the UDP monitoring session performance from the Branch #3 Monitoring Agent towards the SD-WAN user experience Monitoring Agents.

  • The first graph shows the Internet SD-WAN user experience
  • The 2 bottom graphs show the experience of the Internet connections (ISP 1 & ISP 2)

After reviewing the information from the dashboard, we can see that:

  • There is poor performance caused by high packet loss sequences, affecting all the traffic going through the SD-WAN network
  • Both ISP #1 and ISP #2 are being affected

B. When Did the Problem Happen?

When analyzing the historical data on the dashboard to find a trigger or a pattern, we see that this is an intermittent problem (happens on and off) and doesn’t follow a specific pattern.

C. What is the SD-WAN issue?

Because both ISP #1 and ISP #2 are affected, the network problem must be happening on a network segment that is common to both ISPs.

Column 2 shows SNMP Polling (Device Monitoring) on the SD-WAN Edge Equipment and metrics for CPU Usage and Bandwidth Usage.

  • Let’s focus on the CPU usage on the Firewall
  • At the same time that ISP #1 and ISP #2 are experiencing performance issues, we can see that the CPU usage is at 100%

This is not a local loop issue and you don’t need to call your ISP. This is a local problem, which could be in the LAN or directly on the SD-WAN Edge Router.

Problems on Edge Routers are very common, because they are usually security devices with lots of features and software. The software and features are very resource intensive and can affect your CPU usage.
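
The reasoning above (loss on both ISPs at the same time, coinciding with 100% CPU) can be sketched as a simple correlation check over time-aligned samples. The sample values and both thresholds are assumptions chosen for illustration.

```python
def shared_loss_with_high_cpu(
    isp1_loss: list[float],
    isp2_loss: list[float],
    cpu_pct: list[float],
    loss_threshold: float = 1.0,
    cpu_threshold: float = 90.0,
) -> float:
    """Return the fraction of intervals with loss on BOTH links
    during which the device CPU was also above the CPU threshold."""
    both_lossy = [
        cpu
        for loss1, loss2, cpu in zip(isp1_loss, isp2_loss, cpu_pct)
        if loss1 > loss_threshold and loss2 > loss_threshold
    ]
    if not both_lossy:
        return 0.0
    hot = sum(1 for cpu in both_lossy if cpu >= cpu_threshold)
    return hot / len(both_lossy)


# Hypothetical samples, one per polling interval.
isp1 = [0.0, 0.0, 12.0, 15.0, 0.0, 9.0]
isp2 = [0.0, 0.0, 10.0, 14.0, 0.0, 11.0]
cpu = [35.0, 40.0, 100.0, 100.0, 38.0, 100.0]
print(shared_loss_with_high_cpu(isp1, isp2, cpu))
```

A value close to 1.0 suggests the loss lines up with CPU saturation on the shared device. It does not prove causation, so the device logs are still worth checking.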

In Column 3, we can see performance for Zoom and Microsoft Teams call quality.

  • When ISP #1 and ISP #2 are experiencing performance issues, it also affects Zoom and Teams call quality.

That’s because, if the CPU of the network device doesn’t have the power to process the packets in real time, you’ll experience high packet loss.

Packet loss can then affect the performance of network devices, as well as UC applications like Zoom and Microsoft Teams.

D. What are Possible Solutions?

In this situation, the SD-WAN problem is happening on a local network device, and not in your ISP’s network. So it’s up to you to troubleshoot.

When High CPU usage starts:

  • Look at the device logs to understand what process started at this time.
  • Identify software bugs in your device.
  • Check whether a software update was recently applied and, if so, consider rolling back to an older software version.
  • Update your device’s firmware
  • Look at Network Device Monitoring to understand if high CPU usage is happening simultaneously with high bandwidth usage (not the case in this example).
  • If high bandwidth usage is the cause, look at the firewall logs to understand if your traffic is legitimate or not.
  • Manage priorities in your Firewall to prioritize certain traffic.
  • Upgrade to a bigger device.

After deciding on a resolution, look into the real-time data from Obkio's monitoring tool to see if your chosen course of action solved the issue.

2. High Bandwidth Usage

The second SD-WAN issue is on the underlay of ISP #2 and is caused by high bandwidth usage.

SD-WAN Issues High Bandwidth usage

A. What Do We See?

In the screenshot above, we can see an Obkio Dashboard for Branch #1 with various performance graphs. The selected view shows performance over the last 8 hours.

Column 1 shows the UDP monitoring session performance from the Branch #1 Monitoring Agent towards the SD-WAN user experience Monitoring Agents.

  • The first graph shows the Internet SD-WAN user experience
  • The 2 bottom graphs show the experience of the Internet connections (ISP 1 & ISP 2)

ISP #1 doesn’t show any performance issues:

  • The solid blue line tells us that the latency is stable
  • The different shades of blue suggest there is low jitter
  • We don't see any yellow or red bars, which means that no packet loss is detected.

ISP #2 shows a clear performance issue caused by high packet loss measurements.
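
For readers who want to see how these three metrics relate to raw probe results, here is a minimal sketch that derives packet loss, average latency, and jitter from a list of round-trip times, where `None` marks a lost probe. Jitter is simplified here to the mean absolute difference between consecutive received probes; this is an assumption for illustration and not necessarily how Obkio computes it.

```python
from statistics import mean
from typing import Optional


def summarize_probes(rtts_ms: list[Optional[float]]) -> dict[str, float]:
    """Summarize probe results; None means the probe was lost."""
    if not rtts_ms:
        raise ValueError("no probe results")
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency_ms = mean(received) if received else float("nan")
    # Simplified jitter: mean absolute change between consecutive replies.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


# Hypothetical probe results for each link.
isp1 = summarize_probes([20.0, 21.0, 20.0, 22.0, 21.0])
isp2 = summarize_probes([20.0, None, 45.0, None, None, 30.0, 80.0, None])
print(isp1)
print(isp2)
```

With these made-up samples, ISP 1 shows no loss and low jitter, while ISP 2 loses half of its probes.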

B. When Did the Problem Happen?

We need to focus on the top graph in the first column, which is the user experience.

When ISP #2 started experiencing issues, the users were using that link and also experiencing the issue. At some point, the SD-WAN service switched from ISP #2 to ISP #1.

The issue stopped from a user standpoint because it switched to ISP #1, but ISP #2 is still experiencing issues although it isn’t being used.

At some point, the issue seems to stop, the SD-WAN service switches back to ISP #2. Then the issue comes back again on ISP #2 and the users start experiencing the issue again.
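
The back-and-forth described above can be spotted in data by listing every change of the active link over time. The sketch below assumes you have a timeline of which link carried user traffic in each interval; the timeline values are hypothetical.

```python
def find_failovers(active_link: list[str]) -> list[tuple[int, str, str]]:
    """Return (interval_index, previous_link, new_link) for each link change."""
    return [
        (index, previous, current)
        for index, (previous, current) in enumerate(
            zip(active_link, active_link[1:]), start=1
        )
        if previous != current
    ]


# Hypothetical active link per monitoring interval.
timeline = ["ISP 2", "ISP 2", "ISP 1", "ISP 1", "ISP 1", "ISP 2", "ISP 2"]
for index, previous, current in find_failovers(timeline):
    print(f"interval {index}: {previous} -> {current}")
```

Several switches within a short window suggest the underlying link problem is intermittent rather than resolved.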

C. What is the SD-WAN issue?

Column 2 shows SNMP Polling (Device Monitoring) on the SD-WAN Edge Equipment and metrics for CPU Usage and Bandwidth Usage.

  • Let’s focus on the Bandwidth usage on WAN Port #2
  • At the same time that ISP #2 is experiencing high packet loss, we can see that the bandwidth usage is over the 500 Mbps available on the bandwidth service.

From here we can see that the bandwidth usage is over the limit and determine that the high bandwidth usage is causing the packet loss. Obkio’s tool would have alerted you about the high packet loss with a Smart Notification.
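
As background, bandwidth usage from SNMP polling is typically derived from interface octet counters (for example `ifHCInOctets` in the standard IF-MIB) sampled at two points in time. The sketch below shows that arithmetic under stated assumptions: a 64-bit counter, a 300-second polling interval, made-up counter values, and a 500 Mbps service.

```python
COUNTER64_MODULUS = 2**64


def rate_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average bit rate between two polls of a 64-bit octet counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # The modulo handles a counter that wrapped around between polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER64_MODULUS
    return delta_octets * 8 / interval_s


def utilization_pct(bps: float, capacity_bps: float) -> float:
    """Bit rate as a percentage of the subscribed capacity."""
    return 100.0 * bps / capacity_bps


# Hypothetical polls taken 300 seconds apart on WAN Port #2.
previous_poll = 1_000_000_000
current_poll = previous_poll + 19_500_000_000
rate = rate_bps(previous_poll, current_poll, interval_s=300)
print(rate / 1_000_000, "Mbps")
print(utilization_pct(rate, capacity_bps=500_000_000), "% of the service")
```

Because this is an average over the polling interval, short bursts can exceed the service rate even when the averaged figure looks acceptable.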

In Column 3, we can see performance for Zoom and Microsoft Teams call quality.

  • When ISP #2 is being used and experiences high packet loss, it also affects Zoom and Teams call quality.

This is not a local loop issue and you don’t need to call your ISP. Like with the first CPU usage issue, this is a local problem. Significant traffic is being sent to that port, perhaps from a different application.

D. What are Possible Solutions?

Since the SD-WAN problem is happening on a local network device, your ISP can’t help you here.

  • Look at the firewall logs to understand if your traffic is legitimate or not.
  • Manage priorities in your Firewall to prioritize certain traffic.
  • Change the backup schedule
  • Rate limit the flow of traffic
  • Upgrade your Internet connection bandwidth with your ISP if you’re out of bandwidth.

You can then use Obkio’s Live View to see the effect of the changes you made on ISP #2 in real-time.

3. Local Loop Issue

The 3rd most common SD-WAN issue is an ISP Local Loop issue on the underlay.

SD-WAN Issues Local Loop Issue

A. What Do We See?

In the screenshot above, we can see an Obkio Dashboard for Branch #2 with various performance graphs. The selected view shows performance over the last 8 hours.

Column 1 shows the UDP monitoring session performance from the Branch #2 Monitoring Agent towards the SD-WAN user experience Monitoring Agents.

  • The first graph shows the Internet SD-WAN user experience
  • The 2 bottom graphs show the experience of the Internet connections (ISP 1 & ISP 2)

ISP #1 doesn’t show any performance issues. The solid blue line tells us that the latency is stable, and the different shades of blue suggest there is low jitter. We don't see any yellow or red bars, which means that no packet loss is detected.

ISP #2 shows a clear performance issue.

Column 2 shows SNMP Polling (Device Monitoring) on the SD-WAN Edge Equipment and metrics for CPU Usage and Bandwidth Usage.

  • Unlike in the previous example, there is no high bandwidth usage being shown.
  • This is not a bandwidth issue related to a lack of resources from the SD-WAN Edge router.

In Column 3, we can see HTTP performance for Zoom and Microsoft Teams, which is the same as the Network Response Time of the load-balanced session (top left corner).

  • When ISP #2 experiences performance issues, it also affects Zoom and Teams call quality.
  • The issues on Zoom and Teams happen around the same time as they occur on ISP #2.

B. When Did the Problem Happen?

For more information, we’ll be using Obkio Vision, Obkio’s free Visual Traceroute tool that runs continuously to interpret Traceroute results to identify network problems in your WAN and over the Internet.

By looking at the traceroute below, the issue seems to be introduced right from the 1st hop, and we can see that only ISP #2 is affected.

SD-WAN Issues Visual Traceroute
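
A common way to read traceroute results is to look for the first hop where loss appears and then persists all the way to the destination, since loss at a single intermediate hop often just reflects a router deprioritizing its replies. The sketch below applies that rule to hypothetical per-hop loss values; it is not how Obkio Vision is implemented.

```python
from typing import Optional


def first_lossy_hop(
    hops: list[tuple[int, float]], threshold_pct: float = 1.0
) -> Optional[int]:
    """hops is a list of (hop_number, loss_pct) ordered from source to destination.
    Return the hop where loss starts and continues to the last hop, else None."""
    start = None
    for hop_number, loss_pct in hops:
        if loss_pct > threshold_pct:
            if start is None:
                start = hop_number
        else:
            # Loss did not carry forward, so earlier hops are not the cause.
            start = None
    return start


# Hypothetical per-hop loss for each link.
isp2_hops = [(1, 12.0), (2, 11.5), (3, 13.0), (4, 12.2)]
isp1_hops = [(1, 0.0), (2, 20.0), (3, 0.0), (4, 0.0)]
print(first_lossy_hop(isp2_hops))
print(first_lossy_hop(isp1_hops))
```

In this made-up data, loss on ISP 2 starts at hop 1 and continues to the destination, which matches a local loop problem. The isolated loss at hop 2 on ISP 1 does not carry forward, so it is ignored.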

C. What is the SD-WAN issue?

The SD-WAN problem is happening on the Local Loop, between the ISP Edge and SD-WAN Edge Equipment.

In this case, the problem is related to your ISP, so they are responsible for solving the problem.

Obkio’s Visual Traceroutes are able to identify problems anywhere in your network (ISP and AWS, ISP and Peering etc.), detect that they are a performance issue, and validate that the issue is not on your end.

How to Troubleshoot Networks with Obkio Vision Visual Traceroute

Learn how to use Obkio Vision’s Visual Traceroute tool to troubleshoot network problems with traceroutes both inside & outside your local network.

D. What are Possible Solutions?

Firstly, you want to make sure that you’re not using ISP #2 while you’re waiting for the issue to be resolved.

Secondly, you need to contact your ISP using the information you’ve acquired from Obkio’s app.

  • Open a support ticket with your ISP using the screenshots of Monitoring Sessions, Dashboards or Traceroutes in Vision.
  • Use Live Monitoring mode for real-time updates and share results of Live Traceroutes with your ISP using a public link.
  • If your ISP wants to analyze your data further, you can create a temporary Read-Only User in your Obkio account for them.

Detect SD-WAN Issues

You’ve now seen some of the most common SD-WAN issues that your network can experience, so you're ready to fight them off!

Remember that SD-WAN issues are inevitable. It’s not about if they happen, it’s about when, how, and where they happen.

To be able to identify and troubleshoot any SD-WAN issues, whether they happen in your network or your ISP’s network, continuously monitor your SD-WAN network using Obkio’s SD-WAN Monitoring tool.

  • Monitor your SD-WAN migration
  • Continuously monitor SD-WAN performance
  • Proactively identify SD-WAN issues anywhere in your network
  • Collect the information you need to troubleshoot internally or externally

Get started with Obkio's Free Trial!
