Hands-On 1: Python and Packet Capture

Learning Objective

In this short hands-on activity, we will make sure everyone is comfortable with Jupyter notebooks and experiment with loading a network packet capture into a pandas data frame.

You will achieve the following:

  • Set up a Jupyter notebook
  • Load packet captures into a Pandas data frame
  • Manipulate data to extract basic statistics from a packet trace

This notebook assumes that you have basic experience with Python and Jupyter notebooks. If you need more background on either, please check the additional resources provided.

Step 1: Packet Capture

One of the basic sources of data when measuring networked systems is a packet capture (sometimes also called a "packet trace"). The first step of this hands-on is to collect a small packet capture and then run some basic analysis on that trace.

  1. Install Wireshark, a tool that performs packet capture.

Wireshark is a powerful tool that captures network traffic according to filters you specify. Other assignments will explore the use of different filters.

For now, we will perform a capture with a simple filter.

  2. The Computer Science Department web server, www.cs.uchicago.edu, is at 128.135.24.72. Set up a capture filter for this address by typing host 128.135.24.72 into the capture filter field.

  3. Start your capture by selecting the network interface that is sending and receiving your network traffic.

If you are on WiFi, this interface may be called something like en0. On the Wireshark home screen, the active interface is often easy to spot because it shows traffic activity.

  4. Open a web browser and load www.cs.uchicago.edu.

  5. Stop the capture by clicking the stop icon, then save the trace both as a CSV and as a pcap (not pcapng). To save the CSV, choose File > Export Packet Dissections > As CSV.

Step 2: Basic Analysis of the Trace

Before pulling this data into Pandas, let's have a quick look at what is in the trace.

  1. How many packets did you capture?
  2. Approximately how long did the capture take?
  3. What is the approximate round trip latency (in milliseconds) between you and the web server?
  4. (Challenging) Were there any lost packets?
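For question 3, one rough way to estimate the round trip latency is to take the time between your SYN and the server's SYN, ACK in the TCP handshake. The sketch below does that arithmetic with the two timestamps from the sample output shown later in this notebook; your own trace will have different values, and a single handshake is only an approximation of the round trip time.

```python
from datetime import datetime

# Timestamp format used in the sample output of this notebook.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Packet 1: the client sends SYN. Packet 2: the server's SYN, ACK arrives.
syn_sent = datetime.strptime("2022-09-28 13:07:49.340190", FMT)
syn_ack_received = datetime.strptime("2022-09-28 13:07:49.344371", FMT)

# The difference is roughly one round trip, converted here to milliseconds.
rtt_ms = (syn_ack_received - syn_sent).total_seconds() * 1000
print(f"Approximate RTT: {rtt_ms:.3f} ms")
```

In the sample trace this comes to about 4.2 ms.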

Step 3: Loading and Inspecting the Data

We have provided a library that uses Python's scapy library to load a pcap directly into pandas. For convenience in this hands-on, however, we will work only with the CSV.

First, let's load the pandas library, as well as the dataset.

In [1]:
import pandas as pd

In the cell below, enter code that loads your packet capture into a pandas data frame called pcap.
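One possible way to do this is with pd.read_csv. The sketch below is self-contained, so it reads a few rows copied from the sample output in this notebook instead of a file; the helper name load_capture and the file name capture.csv are hypothetical. It assumes the Time column was exported as an absolute date and time, as in the sample output. If your export uses seconds since the start of the capture, drop the parse_dates argument.

```python
import io

import pandas as pd

# A few rows from the sample output in this notebook, quoted the way a
# Wireshark CSV export quotes fields. Info values ending in "..." are
# truncated in the notebook display and are kept truncated here.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","2022-09-28 13:07:49.340190","10.152.5.30","csweb1.cs.uchicago.edu","TCP","78","56064 > 443 [SYN] Seq=0 Win=65535 Len=0 MSS=14..."
"2","2022-09-28 13:07:49.344371","csweb1.cs.uchicago.edu","10.152.5.30","TCP","66","443 > 56064 [SYN, ACK] Seq=0 Ack=1 Win=64240 L..."
"3","2022-09-28 13:07:49.344467","10.152.5.30","csweb1.cs.uchicago.edu","TCP","54","56064 > 443 [ACK] Seq=1 Ack=1 Win=262144 Len=0"
"4","2022-09-28 13:07:49.344609","10.152.5.30","csweb1.cs.uchicago.edu","TLSv1.2","571","Client Hello"
'''


def load_capture(source):
    """Read a Wireshark CSV export and parse the Time column as datetimes."""
    return pd.read_csv(source, parse_dates=["Time"])


# With your own capture, pass the file path instead of the in-memory sample,
# e.g. load_capture("capture.csv") ("capture.csv" is a hypothetical name).
pcap = load_capture(io.StringIO(SAMPLE_CSV))
print(pcap.dtypes)
```

Parsing Time up front makes the later duration and interarrival computations simple subtractions.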

Let's have a quick look at the data you collected.

In [3]:
pcap.head(10)
Out[3]:
    No.  Time                        Source                  Destination             Protocol  Length  Info
0   1    2022-09-28 13:07:49.340190  10.152.5.30             csweb1.cs.uchicago.edu  TCP       78      56064 > 443 [SYN] Seq=0 Win=65535 Len=0 MSS=14...
1   2    2022-09-28 13:07:49.344371  csweb1.cs.uchicago.edu  10.152.5.30             TCP       66      443 > 56064 [SYN, ACK] Seq=0 Ack=1 Win=64240 L...
2   3    2022-09-28 13:07:49.344467  10.152.5.30             csweb1.cs.uchicago.edu  TCP       54      56064 > 443 [ACK] Seq=1 Ack=1 Win=262144 Len=0
3   4    2022-09-28 13:07:49.344609  10.152.5.30             csweb1.cs.uchicago.edu  TLSv1.2   571     Client Hello
4   5    2022-09-28 13:07:49.348257  csweb1.cs.uchicago.edu  10.152.5.30             TCP       56      443 > 56064 [ACK] Seq=1 Ack=518 Win=64128 Len=0
5   6    2022-09-28 13:07:49.348261  csweb1.cs.uchicago.edu  10.152.5.30             TLSv1.2   1440    Server Hello
6   7    2022-09-28 13:07:49.348355  10.152.5.30             csweb1.cs.uchicago.edu  TCP       54      56064 > 443 [ACK] Seq=518 Ack=1387 Win=260736 ...
7   8    2022-09-28 13:07:49.348398  csweb1.cs.uchicago.edu  10.152.5.30             TCP       1440    [TCP segment of a reassembled PDU]
8   9    2022-09-28 13:07:49.348433  10.152.5.30             csweb1.cs.uchicago.edu  TCP       54      56064 > 443 [ACK] Seq=518 Ack=2773 Win=260736 ...
9   10   2022-09-28 13:07:49.349580  csweb1.cs.uchicago.edu  10.152.5.30             TLSv1.2   1372    Certificate, Server Key Exchange, Server Hello Done

Step 4: Basic Analysis

  1. Compute the total time in your trace.

Optional: You may have let the capture run longer than the time taken to load the web page. Take a closer look at the packet capture or the pandas data frame to see what, if anything, appears at the end of the trace. Decide whether those packets should count toward the total time, and adjust your computation if needed.
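As a minimal sketch of the total time computation, assuming Time holds parsed datetimes, you can subtract the earliest timestamp from the latest. The timestamps below are the ten from the sample output in this notebook. The helper trim_after_idle_gap is a hypothetical heuristic for the optional part, not a prescribed method: it keeps only the packets before the first idle gap longer than a threshold you choose.

```python
import pandas as pd

# Timestamps of the ten packets shown in the sample output.
pcap = pd.DataFrame({"Time": pd.to_datetime([
    "2022-09-28 13:07:49.340190", "2022-09-28 13:07:49.344371",
    "2022-09-28 13:07:49.344467", "2022-09-28 13:07:49.344609",
    "2022-09-28 13:07:49.348257", "2022-09-28 13:07:49.348261",
    "2022-09-28 13:07:49.348355", "2022-09-28 13:07:49.348398",
    "2022-09-28 13:07:49.348433", "2022-09-28 13:07:49.349580",
])})

# Total time: last timestamp minus first timestamp.
duration_s = (pcap["Time"].max() - pcap["Time"].min()).total_seconds()
print(f"Total time: {duration_s:.6f} s")


def trim_after_idle_gap(df, max_gap_s=1.0):
    """Keep the packets before the first gap longer than max_gap_s seconds.

    Hypothetical heuristic; assumes df is sorted by Time.
    """
    gaps = df["Time"].diff().dt.total_seconds()  # first gap is NaN
    idle = (gaps > max_gap_s).to_numpy()
    if not idle.any():
        return df
    return df.iloc[: idle.argmax()]  # argmax gives the first True position


trimmed = trim_after_idle_gap(pcap, max_gap_s=1.0)
```

In the sample there is no gap longer than one second, so trimmed is the same as pcap. A suitable threshold depends on your own trace.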

  2. Compute basic statistics of packet sizes:
  • Mean
  • Median
  • 90th percentile
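These three statistics map directly onto pandas Series methods. The sketch below uses the ten Length values from the sample output in this notebook; with your own data you would apply the same calls to pcap["Length"].

```python
import pandas as pd

# Packet lengths (bytes) of the ten packets in the sample output.
lengths = pd.Series([78, 66, 54, 571, 56, 1440, 54, 1440, 54, 1372], name="Length")

mean_len = lengths.mean()
median_len = lengths.median()
p90_len = lengths.quantile(0.9)  # pandas interpolates linearly by default

print(f"mean={mean_len}, median={median_len}, 90th percentile={p90_len}")
```

Note how far the mean (518.5) sits from the median (72.0) in this small sample: a few full-size data packets pull the mean up while most packets are small acknowledgments.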

(In the assignment, we will explore some more sophisticated analysis.)

Step 5: Visualization

We can also visualize the data. Plot the following distributions as cumulative distribution functions:

  • Packet size
  • Packet interarrival time (if you have time)
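One way to build both plots is to compute an empirical CDF by sorting the values and pairing each with the fraction of samples at or below it. The sketch below uses the Time and Length values from the sample output in this notebook. The helpers empirical_cdf and plot_cdf are hypothetical names, and plot_cdf assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd

# Time and Length columns of the ten packets in the sample output.
pcap = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-09-28 13:07:49.340190", "2022-09-28 13:07:49.344371",
        "2022-09-28 13:07:49.344467", "2022-09-28 13:07:49.344609",
        "2022-09-28 13:07:49.348257", "2022-09-28 13:07:49.348261",
        "2022-09-28 13:07:49.348355", "2022-09-28 13:07:49.348398",
        "2022-09-28 13:07:49.348433", "2022-09-28 13:07:49.349580",
    ]),
    "Length": [78, 66, 54, 571, 56, 1440, 54, 1440, 54, 1372],
})


def empirical_cdf(values):
    """Return sorted values paired with the fraction of samples <= each one."""
    x = np.sort(values.dropna().to_numpy())
    y = np.arange(1, len(x) + 1) / len(x)
    return pd.DataFrame({"value": x, "cdf": y})


def plot_cdf(values, xlabel):
    """Draw the empirical CDF as a step plot (matplotlib imported on use)."""
    import matplotlib.pyplot as plt

    cdf = empirical_cdf(values)
    fig, ax = plt.subplots()
    ax.step(cdf["value"], cdf["cdf"], where="post")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Fraction of packets")
    ax.set_ylim(0, 1)
    return ax


# Interarrival time: difference between consecutive timestamps, in seconds.
# The first packet has no predecessor, so its gap is NaN and is dropped.
gaps = pcap["Time"].diff().dt.total_seconds()

size_cdf = empirical_cdf(pcap["Length"])
gap_cdf = empirical_cdf(gaps)
```

In a notebook cell, plot_cdf(pcap["Length"], "Packet size (bytes)") and plot_cdf(gaps, "Interarrival time (s)") would draw the two figures. For interarrival times a logarithmic x-axis (ax.set_xscale("log")) is often easier to read, since the gaps span several orders of magnitude.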