Towards application identification with a novel DNS-based approach

Application-oriented view of traffic sources in the form of a sankey diagram

Today’s internet revolves more around applications and less around networks. An interesting example of this current application-oriented approach is a global outage this year[1]. Nobody remembers that AS13414 reported a down, however, many people remember that X (formerly Twitter) had slowdowns and outages affecting many international users.

In this context, network players (e.g., ISPs) have been trying for decades to understand how application traffic is delivered to end-users. Existing tools are limited and only DPI (Deep Packet Inspection) has been the dominant technology to provide such insight; however, this faces increasing challenges with encryption and scaling.

In this post, we present a BENOCS implementation of a DNS-based correlation framework, called DNS Flow Analyzer (DFA), to annotate and classify the traffic flows with information about applications (e.g., TikTok, Disney+, AmazonPrime, DAZN) and CDN domains (e.g., fastly.net, akamai.net, cloudfront.net). This novel solution allows network providers to expand their traditional network-oriented view with an application-oriented view.

A network-oriented view is not enough

A few decades ago, content providers were building big data centers to serve different Internet-based applications to end-users. In recent years, however, Content Delivery Networks (CDNs) are being used to convey the increasing demands for online applications (including video, gaming, and social networks). These media contents, riding on the top of the network, are known as Over-The-Top applications (OTT-Applications) and they use globally distributed CDNs for sending their content. Currently, large content providers leverage more than one CDN and CDNs also convey traffic of multiple OTT-Applications.

In order to work efficiently, network operators need better knowledge on how traffic from the CDNs and OTT-Applications is delivered to their end-users. However, they have historically focused on obtaining information only about Autonomous Systems (ASes), transit providers, and peers. This network-oriented approach is not enough to answer one key question: how do OTT-Applications use the different CDN domains to distribute their traffic?

An application-oriented approach with DFA

Answering the above question has been a daunting task for network actors. Existing network-focused solutions such as legacy flow tools or DPI are limited in tying traffic information to individual applications. The latter also becomes increasingly inefficient due to encryption and requires a ridiculous amount of hardware, especially working on a large scale.

At BENOCS, we have developed a methodology that includes the analysis, design, and implementation of an application identification system called DNS Flow Analyzer (DFA). DFA annotates and extends the traffic flows with domain name information, so that two new layers are effectively obtained: (i) OTT-Application domain and (ii) CDN domain.

Specifically, we propose a large-scale real-time network data correlation system that uses a set of different data sources (e.g. Netflow, BGP) but mainly it feeds on DNS streams to obtain multi-dimensional traffic information. As a result, we obtain an application-oriented view to identify how a source OTT-Application (e.g. Disney+) is delivering traffic to a network using different CDN domains (e.g., akamai.net, cloudfront.net).

DFA architecture and workflow

The high-level DFA architecture and entire workflow rely on two developed components:

  1. DNS-Netflow Correlation. The output of this component includes extended and correlated data: Netflow and a list of URLs representing a DNS domain name resolution. The sequence of events are:

1.1) Live DNS records are classified in two lists (i) DNS A/4A to map an IP address to a domain name, and (ii) DNS CNAME to map a domain name to another domain name.

1.2) In parallel, live Netflow records are captured at the network ingress interfaces. Each Netflow record contains, among others, timestamp, srcIP, dstIP, bytes, etc.

1.3) DFA looks for the srcIP of a Netflow record in the DNS A/4A list to find the domain name it corresponds to (using getName(IP)). Then, looking at the DNS CNAME list, DFA searches for the previous domain name to find the CNAME it corresponds to (using getName(Name)). The search in the CNAME list continues until no further domain names are found (or a pre-defined loop limit is reached).

Diagram of DFA architecture
  1. CDN-APP Classification. This final output extends the traffic flows with CDN domain and OTT-Application information (including BGP). See the sequence of events below:

2.1) DNS-Netflow data is correlated with BGP to gain more knowledge about the traffic paths (source AS, handover AS, nexthop AS, and destination AS).

2.2) Regarding the CDN domain, getCDN() function uses the first URL in the list of domain names and selects the second-level domain (2LD) and top-level domain (TLD). In case of the latter, this component makes use of the Public Suffix List (PSL) database[2] published by Mozilla.

2.3) This second lookup goes through the list of domain names to obtain an OTT-Application. The getAPP() function uses a URL-APP database to associate a specific domain name or URL to the OTT-Application it belongs to (e.g., dssott.com is for Disney+, pv-cdn.net is for AmazonPrime, etc.). This URL-APP is a customized/curated list that continually evolves as new sources are discovered.

DFA architecture to front end (diagram)

DFA correlates flow and DNS data to see where the network traffic originates. It identifies CDN domains and OTT-Applications within the source AS based on DNS A/CNAME records pairing. This novel and future-proof way to identify applications can be typically used by:

  • Firstline maintenance (NOC) to respond to customer complaints, which are generally about applications, not IP-addresses or ASes.
  • DFA also includes an easy-to-understand multi-dimensional dashboard with a network-oriented view (by default), having the option to unlock two new dimensions to allow the visualization of the traffic flows in an application-oriented view with various OTT-Applications and CDN domains.
Screenshot BENOCS DNS Flow Analyzer

Get in touch with us if you’d like to learn more about DNS Flow Analyzer and see it in action!

[1] https://twitter.com/TwitterSupport/status/1632792942262747136

[2] https://publicsuffix.org/

Back to media