Video Streaming: Feature Extraction from Video Streams

In this assignment, you will explore a packet capture of a Netflix video stream. The capture also contains traffic unrelated to Netflix, so part of the exercise is filtering it down to the Netflix traffic alone.

Learning Objectives

In this hands-on activity, you will learn how to:

  • Identify service types using DNS
  • Calculate network counters
  • Infer video segment downloads

Step 1: Identifying Netflix Traffic from DNS

One of the challenges with packet captures is that they often contain a mix of traffic from many devices, destinations, and applications. When diagnosing performance problems with a particular service, the first challenge is often identifying and extracting the subset of traffic that belongs to that service.

In this exercise, we will use Domain Name System (DNS) lookups for a set of domains known to be associated with Netflix to identify the IP addresses (and thus the traffic flows) that belong to Netflix.

In [1]:
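import dnslib

# Substrings of domain names associated with Netflix; a domain name
# containing any of them (e.g. "nflxvideo") is treated as Netflix.
NF_DOMAINS = ["nflxvideo", "netflix", "nflxso", "nflxext"]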

Load the Packet Capture and Identify Netflix Traffic

First, load the traffic capture and inspect it.

In [2]:
import pandas as pd

ndf = pd.read_csv("data/netflix.csv.gz")
ndf.head(20)
Out[2]:
    No.  Time                        Source                     Destination                Protocol  Length  Info
0   1    2018-02-11 08:10:00.534682  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       77      Standard query 0xed0c A fonts.gstatic.com
1   2    2018-02-11 08:10:00.534832  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       77      Standard query 0x301a AAAA fonts.gstatic.com
2   3    2018-02-11 08:10:00.539408  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       87      Standard query 0x11d3 A googleads.g.doubleclic...
3   4    2018-02-11 08:10:00.541204  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       87      Standard query 0x1284 AAAA googleads.g.doublec...
4   5    2018-02-11 08:10:00.545785  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       78      Standard query 0x3432 AAAA ytimg.l.google.com
5   6    2018-02-11 08:10:00.547036  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       96      Standard query 0xb756 A r4---sn-gxo5uxg-jqbe.g...
6   7    2018-02-11 08:10:00.547156  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       75      Standard query 0x62ab A ssl.gstatic.com
7   8    2018-02-11 08:10:00.547249  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       74      Standard query 0x42fb A www.google.com
8   9    2018-02-11 08:10:00.853950  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       386     Standard query response 0x11d3 A 216.58.213.162
9   10   2018-02-11 08:10:00.853970  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       75      Standard query 0x8756 A www.gstatic.com
10  11   2018-02-11 08:10:00.854433  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       389     Standard query response 0x301a AAAA 2a00:1450:...
11  12   2018-02-11 08:10:00.854620  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       346     Standard query response 0x62ab A 172.217.18.195
12  13   2018-02-11 08:10:00.854641  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       400     Standard query response 0xb756 A 193.51.224.143
13  14   2018-02-11 08:10:00.854676  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       398     Standard query response 0x1284 AAAA 2a00:1450:...
14  15   2018-02-11 08:10:00.859945  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       84      Standard query 0xdd32 A www.googleadservices.com
15  16   2018-02-11 08:10:00.861944  192.168.43.72              par10s38-in-f3.1e100.net   TCP       78      58443 > 443 [SYN] Seq=0 Win=65535 Len=0 MSS=14...
16  17   2018-02-11 08:10:00.876320  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       377     Standard query response 0xed0c A 216.58.209.227
17  18   2018-02-11 08:10:00.883409  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       354     Standard query response 0x3432 AAAA 2a00:1450:...
18  19   2018-02-11 08:10:00.890223  ns-vip-pro.paris.inria.fr  192.168.43.72              DNS       338     Standard query response 0x42fb A 216.58.209.228
19  20   2018-02-11 08:10:00.890679  192.168.43.72              ns-vip-pro.paris.inria.fr  DNS       75      Standard query 0xf38d A www.youtube.com

Filter for DNS Traffic

Next, write an expression that filters the dataframe to include only DNS traffic.
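A minimal sketch of such a filter, assuming the Protocol column holds Wireshark's protocol label exactly as in the output above (so DNS rows read "DNS"). The two sample rows are copied from that output; on the full capture you would pass `ndf` instead. The helper name `dns_only` is hypothetical.

```python
import pandas as pd

def dns_only(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose Protocol column is exactly "DNS"."""
    return df[df["Protocol"] == "DNS"]

# Two rows taken from the ndf.head(20) output (index 0 and 15).
sample = pd.DataFrame({
    "Protocol": ["DNS", "TCP"],
    "Info": [
        "Standard query 0xed0c A fonts.gstatic.com",
        "58443 > 443 [SYN] Seq=0 Win=65535 Len=0 MSS=14...",
    ],
})

dns_sample = dns_only(sample)
print(dns_sample["Info"].tolist())
# On the real capture (hypothetical variable name): dns = dns_only(ndf)
```

An exact comparison is used rather than a substring test such as `str.contains("DNS")`, which would also match labels like "MDNS".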

Filter for Netflix DNS Query IDs

Because you are looking for the IP addresses associated with Netflix, you need to match each DNS response to the query that asked for a Netflix domain. A query and its response share the same DNS transaction ID; in the output above, for example, the query with ID 0x11d3 at index 2 is answered by the response with ID 0x11d3 at index 8.
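One way to collect the relevant transaction IDs is to parse the Info column with a regular expression. This sketch assumes Info strings follow the pattern shown above (`Standard query <id> <type> <name>`) and treats a query as Netflix-related if its name contains any of the `NF_DOMAINS` substrings. Only the fonts.gstatic.com row comes from the capture; the 0x1a2b query and response are invented for illustration, and the function name is hypothetical.

```python
import re
import pandas as pd

NF_DOMAINS = ["nflxvideo", "netflix", "nflxso", "nflxext"]

# Matches queries only ("Standard query 0x11d3 A googleads..."), not responses:
# in a response, the word "response" sits where this pattern expects the ID.
QUERY_RE = re.compile(r"^Standard query (0x[0-9a-fA-F]+) (\S+) (\S+)")

def netflix_query_ids(dns: pd.DataFrame) -> set:
    """Transaction IDs of DNS queries whose name contains a Netflix substring."""
    ids = set()
    for info in dns["Info"]:
        m = QUERY_RE.match(info)
        if m is None:
            continue  # a response, or an Info format this sketch does not handle
        txid, _qtype, name = m.groups()
        if any(dom in name for dom in NF_DOMAINS):
            ids.add(txid)
    return ids

# Toy rows: the first is from the capture, the other two are invented.
toy = pd.DataFrame({"Info": [
    "Standard query 0xed0c A fonts.gstatic.com",
    "Standard query 0x1a2b A www.netflix.com",
    "Standard query response 0x1a2b A 198.51.100.7",
]})
print(netflix_query_ids(toy))
```

DNS transaction IDs are only 16 bits, so in a long capture the same ID can be reused for an unrelated lookup; also requiring the response's addresses to mirror the query's would make the pairing more robust.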

Find the Associated Netflix IP Addresses

Get the IP addresses associated with all Netflix traffic in the trace.
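A sketch of this step, assuming the answers appear as dotted IPv4 addresses in the Info column of each response, as in `Standard query response 0x11d3 A 216.58.213.162`; IPv6 (AAAA) answers are ignored for brevity. The 0x1a2b response and the address 198.51.100.7 (a documentation range) are invented, and `netflix_ips` is a hypothetical name. Note that the export above shows some hosts by name rather than address (for example ns-vip-pro.paris.inria.fr), so matching these IPs against the Source and Destination columns may require a CSV exported without name resolution.

```python
import re
import pandas as pd

RESP_RE = re.compile(r"^Standard query response (0x[0-9a-fA-F]+) ")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def netflix_ips(dns: pd.DataFrame, txids: set) -> set:
    """IPv4 addresses found in responses whose transaction ID is in txids."""
    ips = set()
    for info in dns["Info"]:
        m = RESP_RE.match(info)
        if m is not None and m.group(1) in txids:
            ips.update(IPV4_RE.findall(info))  # a response may carry several A records
    return ips

toy = pd.DataFrame({"Info": [
    "Standard query response 0x11d3 A 216.58.213.162",  # from the capture
    "Standard query response 0x1a2b A 198.51.100.7",    # invented
]})
print(netflix_ips(toy, {"0x1a2b"}))
```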

Step 2: Counting Traffic to Each Netflix Destination

An important feature for inferring video quality of experience (QoE) is the throughput of each flow in the video stream. Throughput is the number of bytes transferred divided by the time over which they were transferred.
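As a worked example of that definition, the sketch below computes the average throughput of a set of packets as total bytes * 8 / elapsed seconds, using the Time and Length columns. It assumes Length is the frame length in bytes, which includes protocol headers, so the result somewhat overstates application-level throughput. The two packets are invented and `throughput_bps` is a hypothetical name.

```python
import pandas as pd

def throughput_bps(pkts: pd.DataFrame) -> float:
    """Average throughput in bits per second over the packets' time span."""
    t = pd.to_datetime(pkts["Time"])
    elapsed = (t.max() - t.min()).total_seconds()
    if elapsed == 0:
        return float("nan")  # a single packet (or identical timestamps) has no duration
    return pkts["Length"].sum() * 8 / elapsed

toy = pd.DataFrame({
    "Time": ["2018-02-11 08:10:00.000000", "2018-02-11 08:10:02.000000"],
    "Length": [1000, 1500],
})
print(throughput_bps(toy))  # 2500 bytes * 8 / 2 s = 10000.0
```

In practice you would apply this per Netflix IP and per direction, or over fixed time windows to see how throughput changes during playback.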

As a first step towards computing that feature, count the number of packets and bytes, in each direction, to each Netflix IP address in the trace.
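A minimal sketch of both counts, assuming the client is 192.168.43.72 (the local address issuing the DNS queries in the output above) and that `nf_ips` is the set of Netflix addresses found in Step 1. Downstream means Netflix to client; upstream means client to Netflix. The helper `direction_counts` and the toy rows are hypothetical.

```python
import pandas as pd

CLIENT = "192.168.43.72"  # assumed client address, taken from the capture output

def direction_counts(df: pd.DataFrame, nf_ips: set, downstream: bool) -> pd.DataFrame:
    """Packets and bytes per Netflix IP in one direction."""
    if downstream:
        mask = df["Source"].isin(nf_ips) & (df["Destination"] == CLIENT)
        key = "Source"       # group by the Netflix server that sent the packet
    else:
        mask = (df["Source"] == CLIENT) & df["Destination"].isin(nf_ips)
        key = "Destination"  # group by the Netflix server that received it
    return df[mask].groupby(key)["Length"].agg(packets="count", bytes="sum")

# Invented rows: two Netflix servers plus one unrelated DNS packet.
toy = pd.DataFrame({
    "Source":      ["198.51.100.7", "198.51.100.7", CLIENT, "198.51.100.8", CLIENT],
    "Destination": [CLIENT, CLIENT, "198.51.100.7", CLIENT, "ns-vip-pro.paris.inria.fr"],
    "Length":      [1500, 1500, 66, 800, 77],
})
nf_ips = {"198.51.100.7", "198.51.100.8"}
print(direction_counts(toy, nf_ips, downstream=True))
print(direction_counts(toy, nf_ips, downstream=False))
```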

Count the Number of Downstream Bytes and Packets

Count the Number of Upstream Bytes and Packets

Step 3: Inferring Segment Downloads

Another important feature that can be used in inferring video quality of experience is the number of segments per unit time. In this step we will infer the number of segments downloaded per unit time for each IP address.

The number of segments can be estimated by counting runs of continuous downstream transfer, where consecutive runs are separated by a packet with a zero-byte payload. As the last step, compute the number of segment downloads from each Netflix IP address.
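A sketch of this heuristic under several assumptions. The CSV has no payload-length column, so a zero-payload packet is recognized from the `Len=0` field Wireshark prints in the Info column of TCP packets; rows without a `Len=` field (such as TLS "Application Data") are treated as carrying data; and rows are in capture order. Only downstream packets are scanned, because the client's own zero-length ACKs are interleaved with every download and would split it into many tiny runs. The helper names and toy rows are invented.

```python
import re
import pandas as pd

CLIENT = "192.168.43.72"  # assumed client address, from the capture output
LEN_RE = re.compile(r"\bLen=(\d+)")

def zero_payload(info: str) -> bool:
    """True if the Info column reports a TCP payload length of zero ("Len=0")."""
    m = LEN_RE.search(info)
    # Assumption: rows without a Len= field carry payload.
    return m is not None and int(m.group(1)) == 0

def count_segments(df: pd.DataFrame, nf_ips: set) -> dict:
    """Count runs of downstream data packets per Netflix IP, split by zero-payload packets."""
    down = df[df["Source"].isin(nf_ips) & (df["Destination"] == CLIENT)]
    counts = {}
    # groupby keeps the original row order within each group.
    for ip, pkts in down.groupby("Source"):
        segments, in_run = 0, False
        for info in pkts["Info"]:
            if zero_payload(info):
                in_run = False   # a zero-payload packet ends the current transfer
            elif not in_run:
                segments += 1    # first data packet after a gap starts a new segment
                in_run = True
        counts[ip] = segments
    return counts

toy = pd.DataFrame({
    "Source": ["198.51.100.7"] * 5 + [CLIENT],
    "Destination": [CLIENT] * 5 + ["198.51.100.7"],
    "Info": [
        "443 > 58443 [ACK] Seq=1 Ack=518 Win=30080 Len=0",
        "Application Data",
        "Application Data",
        "443 > 58443 [ACK] Seq=2897 Ack=1036 Win=31104 Len=0",
        "Application Data",
        "58443 > 443 [ACK] Seq=1036 Ack=4345 Win=65535 Len=0",  # upstream, ignored
    ],
})
print(count_segments(toy, {"198.51.100.7"}))
```

Dividing each count by the duration of that IP's downstream traffic gives the segments-per-unit-time feature.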

Thought Question

In this exercise, we used the domain name system (DNS) lookup traffic to identify Netflix traffic. This approach can work in practice but is far from perfect, for a number of reasons:

  • Domain names can change over time.
  • DNS traffic is becoming increasingly encrypted, making it difficult to see domain name lookups and responses.

This turns service identification (i.e., recognizing Netflix traffic itself) into an inference, or machine learning, problem. What features could you use to build a model that performs this kind of inference?