What are self-operated Meta-CDNs, and should ISPs be concerned?

The information in this post comes from the paper “Dissecting Apple’s Meta-CDN during an iOS update.”

Before 2014, Apple, one of the largest content generators on the web today, relied on external content delivery networks (CDNs), such as Akamai and Level 3, to deliver everything from music and video streaming to iOS updates. In 2014, Apple launched its own CDN to take control of the quality of its content delivery and to add the final puzzle piece that gives it control over the entire customer experience (hardware, online platforms, etc.). Interestingly, as Dan Rayburn predicted, Apple was in no hurry to move all of its traffic to its own CDN and would still need some time before it completely stopped offloading traffic onto third-party CDNs.

A 2017 study supported by Benocs shows that, three years later, Apple still relies on third-party CDNs, such as Akamai and Limelight, to deliver its iOS updates. Why? Because a company as large as Apple, which needs to deliver an operating system update multiple times per year to more than 1 billion devices, must find a way to handle the load that exceeds the capacity of its own infrastructure. It therefore relies on a self-operated Meta-CDN to carry its traffic. No big deal, right? Wrong. As this study shows, traffic is not running as smoothly as originally thought.

Before we talk about the main issue, let us first look at the evolution of CDNs. As the internet continues to expand with more content and users, CDNs are increasingly challenged to provide the fastest delivery speeds possible while, at the same time, building as little infrastructure as possible. To solve this, content is moving closer to users through Meta-CDNs: content is multihomed across multiple CDNs, which gives access to servers holding the content closer to the users. Because the content is published on multiple CDNs, an additional request-mapping step is required, that is, a way to direct requests to the additional servers holding the content. Therefore, it is not just a single CDN moving the traffic for Apple, but rather a Meta-CDN. This arrangement, in which multiple CDNs carry the traffic of a single content provider that also runs its own CDN, is called a self-operated Meta-CDN, because it can direct traffic either to its own infrastructure or to a third-party CDN.

If self-operated Meta-CDNs exist to help companies such as Apple provide their customers with the smoothest iOS update possible, then why is this a problem? According to this study, which looks at the behavior of a self-operated Meta-CDN through the eyes of an internet service provider (ISP), it causes more chaos in the network than one would expect. And because this type of CDN is rather new, not much is known about it to begin with.

By observing the iOS update in September 2017 through the eyes of a major European ISP, the researchers found the following behavior:

  • If the DNS resolver (aka the traffic director) selects the Apple CDN, the traffic moves through Apple’s infrastructure.
  • If the DNS resolver selects a third-party CDN, Apple offloads the traffic onto that CDN.
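
The two cases above can be sketched as a small request-mapping function. This is only an illustration of the idea, not Apple's actual logic: the region names, the weights, and the helper names select_cdn and serving_path are all assumptions made up for the example, and weighted random selection is just one plausible way a traffic director could split requests.

```python
import random

# Hypothetical mapping table: region -> list of (cdn_name, weight).
# Names and weights are illustrative assumptions, not measured values.
MAPPING = {
    "north_america": [("apple", 0.8), ("akamai", 0.1), ("limelight", 0.1)],
    "europe": [("apple", 0.3), ("akamai", 0.5), ("limelight", 0.2)],
}


def select_cdn(region, rng):
    """Pick the CDN that answers a request from the given region.

    Stands in for the DNS-based "traffic director"; a weighted random
    choice is an assumption used purely for illustration.
    """
    candidates = MAPPING[region]
    names = [name for name, _ in candidates]
    weights = [weight for _, weight in candidates]
    return rng.choices(names, weights=weights, k=1)[0]


def serving_path(cdn):
    """Return where the traffic flows once a CDN has been selected."""
    if cdn == "apple":
        return "apple-infrastructure"
    # Any other selection means the traffic is offloaded.
    return "third-party:" + cdn


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so runs are repeatable
    for _ in range(5):
        cdn = select_cdn("europe", rng)
        print(cdn, "->", serving_path(cdn))
```

The point of the sketch is that the decision is made entirely on the Meta-CDN side: the ISP only sees where the resulting traffic enters its network.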

Apple’s infrastructure is not as developed in Europe as it is in North America. To reach devices on a global scale, using CDNs such as Akamai, which has a global infrastructure, lets Apple reach its customers with ease.

The consequence for ISPs is the strain this puts on the network. When traffic is offloaded, the ISP cannot predict how much traffic to expect on its links, because it cannot see which CDN the overarching Meta-CDN selects or how much traffic each CDN is carrying. On top of that, the individual CDNs carry traffic further than necessary, which creates overflow, where traffic is forced to take longer paths.
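
To see why the split matters, here is a minimal sketch of the arithmetic from the ISP's side. Everything in it is hypothetical: the link names, capacities, traffic volume, and CDN shares are invented for illustration and are not figures from the study.

```python
def link_loads(total_gbps, shares, ingress_link):
    """Compute the load per ISP link for a given split of traffic across CDNs.

    shares:       cdn name -> fraction of the update traffic it carries
    ingress_link: cdn name -> ISP link on which that CDN's traffic arrives
    """
    loads = {}
    for cdn, share in shares.items():
        link = ingress_link[cdn]
        loads[link] = loads.get(link, 0.0) + total_gbps * share
    return loads


def overloaded(loads, capacity_gbps):
    """Return the sorted names of links whose load exceeds their capacity."""
    return sorted(
        link for link, load in loads.items() if load > capacity_gbps[link]
    )


# Hypothetical ISP view: which link each CDN enters on, and link capacities.
INGRESS = {"apple": "peering-1", "akamai": "peering-2", "limelight": "transit-1"}
CAPACITY = {"peering-1": 70.0, "peering-2": 40.0, "transit-1": 20.0}

# The same 100 Gbps of update traffic under two possible Meta-CDN splits.
expected_split = {"apple": 0.6, "akamai": 0.3, "limelight": 0.1}
actual_split = {"apple": 0.2, "akamai": 0.5, "limelight": 0.3}

if __name__ == "__main__":
    for label, split in (("expected", expected_split), ("actual", actual_split)):
        loads = link_loads(100.0, split, INGRESS)
        print(label, loads, "overloaded:", overloaded(loads, CAPACITY))
```

With these made-up numbers, the total volume is identical in both cases, yet the second split pushes two links past capacity. The ISP cannot tell in advance which of the two it will get.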

Why is this kind of unpredictability risky? Because links that were thought to be unaffected can actually end up over capacity, which puts the network at risk. This makes it necessary for ISPs to re-examine their assumptions about how the network actually behaves during high-stress events such as operating system updates.