Is the Internet Going to Break?

by Dave Michels

We are placing unprecedented demand on our network and computing services. The coronavirus is spreading and with it comes illness, deaths, and fear. The experts advise us to stay home with clean hands, and for most of us that means increased online activity. Many are being forced to work from home for the first time. All of this is putting local, regional, national, and global stress on our online services.

Thirty years ago our connectivity was more diverse. We connected the phone to the phone company; our TVs had an antenna, satellite, or cable connection; and our computers used a modem, DSL, cable, or ISDN connection. We also connected to a variety of independent services. Today, all of that is usually delivered over the Internet, typically a wired broadband connection or 4G, and a big chunk of the Internet is hosted on three providers: AWS, Google Cloud, and Microsoft Azure. The Internet is all that we need now, and it works pretty well under normal conditions. Coronavirus ended normal.

Sites That Should Be Fine

Office Productivity: Cloud-delivered services for office productivity, specifically Office 365 and GSuite, should do well. As far as these services are concerned, all users are remote. There’s little difference if you access them from the office or home. I suppose there will be increased usage (more communications) and new customers (more users), but not by a significant percentage.

Travel Sites: Travel is declining, so there’s a bit of a spike in cancellations. New bookings are down. There’s probably more of an impact on customer service telephone lines than on online activities because no one understands airline cancellation policies. Cancelling a hotel reservation is relatively simple, and who cares about car rental reservations? Uber and Lyft should be fine. In fact, there will be shorter waiting periods. Many people expected ride-hailing stocks to go up as crowded public transportation becomes less desirable. However, they are seeing declines because there’s a greater drop in overall traffic due to reduced airport pickups, everything closing, and people staying home.

News Sites: News sites, especially newspapers, are doing well. I assume they were designed for major spikes in news, and presenting text is not as complex as running interactive applications. Many sites also present video, but usually don’t host it. Even if video were causing problems, they could quickly remove it as necessary.

Potential Problems

IaaS Providers: The big three, AWS, Google Cloud, and Microsoft Azure, are being tested like never before. There is a perception that these infrastructure providers have infinite capacity, but that’s an illusion. There are two factors: the design/architecture of the app/service in question, and actual capacity and resources. We are going to hit the limits of both. I don’t expect there will be big outages (AWS down globally), but I do expect a few scattered straw-that-breaks-the-camel’s-back outages. I’m more confident in Amazon and Google as their consumer-facing services are so reliable. O365 is mostly reliable (Teams was down just recently because of an expired security certificate). All said and done, I expect IaaS to shine through this pandemic and come out stronger, both in actuality and in perception.

Conferencing Services: A similar story with meeting/conferencing providers. The major video and audio-conferencing providers are experiencing unprecedented demand and things are breaking. Zoom, Webex, and Teams have had scattered issues over the past week. These services are likely playing whack-a-mole with capacity bottlenecks. Overall, the services are doing quite well. Of course there are many other options too, including most UCaaS providers, Highfive, Bluejeans Networks, LifeSize, and Poly. It’s also a great time to stop using telephone dial-in with conferencing. IP voice solutions are more reliable and sound better.

Financial Trading Sites: We are seeing big spikes at these financial sites as customers try to time their buying and selling (mostly selling). Though orders can be placed at any time, trading only occurs during a small set of hours. Trading systems are highly secure, which complicates scalability. Expect limited or delayed services on high-volume trading days. Trading sites should be more responsive at night.

VPNs: This was an unexpected bottleneck. Not all “cloud” services come from IaaS providers. Best practices for access to corporate-hosted data (onsite or in private data centers) involve VPNs. That creates a data center capacity challenge when you suddenly send all the employees home. This is a complex one, and most employees don’t understand it. For example, there’s no reason to access O365 or GSuite through a VPN (other than convenience). Doing so unnecessarily consumes VPN capacity. Some organizations limit VPN services to corporate-owned laptops, causing a sudden surge in demand for more (corporate-owned) laptops (and for more VPN capacity). VPN design and capacity could be the big bottleneck to enabling WFH.

Entertainment Sites: Facebook and Twitter should be ok, but expect issues at video streaming sites (Netflix, YouTube, Disney+, etc.) and major game playing sites. Schools are closed, so entertainment sites are going to be much busier than usual. Yes, there’s less school in summer, but there are also more distractions. People are being asked to stay home this time. Even winter destinations such as ski resorts are shutting down. Major movies are being delayed (and theaters are closed anyway). Disney sees an opportunity and has decided to release Frozen 2 three months ahead of schedule on Disney+. I will be very surprised if they can handle the load.

ISP Capacity: Same story – capacity is designed for expected workloads and this is unexpected. I don’t expect a lot of capacity issues with services such as residential broadband cable. The bigger challenges will be around wireless access and international services. For example, a conference that includes participants from around the world requires a reliable conference provider AND reliable connectivity to each location. The math works against us. Consider that two required components, each with 90% reliability, combine into a service with 81% reliability (0.9 × 0.9 = 0.81). These numbers are deliberately low to exaggerate how chaining multiple components results in decreased reliability. It’s the proverbial weakest link thing, as ISPs around the world are dealing with increased traffic.
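For anyone who wants to play with that math, here is a minimal sketch in Python. It assumes every component is required and that failures are independent, which real networks only approximate. The function name `series_reliability` and the 99% figures in the second case are made up for illustration; only the 90% example comes from the paragraph above.

```python
from math import prod


def series_reliability(reliabilities):
    """Chance that every required component works at once.

    Assumes the components fail independently of each other,
    so the individual reliabilities simply multiply.
    """
    return prod(reliabilities)


# The example from the text: two required components at 90% each.
two_parts = series_reliability([0.9, 0.9])
print(f"{two_parts:.0%}")  # 81%

# Hypothetical: one conference provider plus five participants' ISPs,
# each assumed to be 99% reliable.
six_parts = series_reliability([0.99] * 6)
print(f"{six_parts:.1%}")  # 94.1%
```

Even with much healthier per-component numbers, every extra location on the call chips away at the odds that everything works at the same time.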