In this short hands-on activity, we will make sure everyone is up to speed with Jupyter notebooks, and experiment with loading a network packet capture into a Pandas data frame.
You will achieve the following:
This notebook assumes that you have basic experience with Python and Jupyter notebooks. If you need more background on either, please check the background resources provided.
One of the basic sources of data when measuring networked systems is a packet capture (sometimes also called a "packet trace"). The first step of the hands-on is to take a small packet capture and run some basic analysis on that trace.
Wireshark is a powerful tool to capture network traffic according to specific filters. Other assignments will explore the use of different filters.
For now, we will perform a capture with a simple filter.
The Computer Science Department web server, www.cs.uchicago.edu, is at 128.135.24.72. Set up a capture filter for this address by typing host 128.135.24.72 into the capture filter.
Start your capture by selecting the network interface that is sending and receiving your network traffic.
If you are on WiFi, this interface may be called something like en0. From the home Wireshark page, it is often clear which interface this is because it shows some traffic activity.
Open a web browser and load www.cs.uchicago.edu.
Stop and save your trace by clicking the stop icon and saving the file. Save the file both as a csv and as a pcap (not pcapng). To save as a CSV, select "Export Packet Dissections" and then "As CSV" under the File menu.
Before pulling this data into Pandas, let's have a quick look at what is in the trace.
We have provided a library that uses Python's scapy library to load a pcap directly into Pandas. For convenience in this hands-on, however, we will work strictly with the CSV.
First, let's load the pandas library, as well as the dataset.
import pandas as pd
Enter some code below to load your packet capture into a pandas data frame called pcap.
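One possible way to do this is with pandas' read_csv, sketched below. The helper name load_capture and the file name capture.csv are assumptions for illustration; substitute the path of your own CSV export. So that the cell runs without a capture file, the sketch loads three rows copied from the sample output shown in this notebook, and it assumes the Time column holds absolute timestamps as in that output.

```python
import io
import pandas as pd

# Three rows copied from the sample output in this notebook, laid out
# with the same column names that appear in that output.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"3","2022-09-28 13:07:49.344467","10.152.5.30","csweb1.cs.uchicago.edu","TCP","54","56064 > 443 [ACK] Seq=1 Ack=1 Win=262144 Len=0"
"4","2022-09-28 13:07:49.344609","10.152.5.30","csweb1.cs.uchicago.edu","TLSv1.2","571","Client Hello"
"5","2022-09-28 13:07:49.348257","csweb1.cs.uchicago.edu","10.152.5.30","TCP","56","443 > 56064 [ACK] Seq=1 Ack=518 Win=64128 Len=0"
'''

def load_capture(source):
    """Hypothetical helper: read a CSV packet export into a data frame.

    `source` may be a file path or a file-like object. Assumes the Time
    column contains absolute timestamps, which are parsed as datetimes.
    """
    return pd.read_csv(source, parse_dates=["Time"])

# With your own export, pass its path instead, e.g. load_capture("capture.csv")
# (the file name here is an assumption).
pcap = load_capture(io.StringIO(SAMPLE_CSV))
```

Parsing Time as a datetime at load time makes later steps, such as computing the gaps between packets, simpler than working with strings.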
Let's have a quick look at the data you collected.
pcap.head(10)
| | No. | Time | Source | Destination | Protocol | Length | Info |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 2022-09-28 13:07:49.340190 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 78 | 56064 > 443 [SYN] Seq=0 Win=65535 Len=0 MSS=14... |
| 1 | 2 | 2022-09-28 13:07:49.344371 | csweb1.cs.uchicago.edu | 10.152.5.30 | TCP | 66 | 443 > 56064 [SYN, ACK] Seq=0 Ack=1 Win=64240 L... |
| 2 | 3 | 2022-09-28 13:07:49.344467 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 54 | 56064 > 443 [ACK] Seq=1 Ack=1 Win=262144 Len=0 |
| 3 | 4 | 2022-09-28 13:07:49.344609 | 10.152.5.30 | csweb1.cs.uchicago.edu | TLSv1.2 | 571 | Client Hello |
| 4 | 5 | 2022-09-28 13:07:49.348257 | csweb1.cs.uchicago.edu | 10.152.5.30 | TCP | 56 | 443 > 56064 [ACK] Seq=1 Ack=518 Win=64128 Len=0 |
| 5 | 6 | 2022-09-28 13:07:49.348261 | csweb1.cs.uchicago.edu | 10.152.5.30 | TLSv1.2 | 1440 | Server Hello |
| 6 | 7 | 2022-09-28 13:07:49.348355 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 54 | 56064 > 443 [ACK] Seq=518 Ack=1387 Win=260736 ... |
| 7 | 8 | 2022-09-28 13:07:49.348398 | csweb1.cs.uchicago.edu | 10.152.5.30 | TCP | 1440 | [TCP segment of a reassembled PDU] |
| 8 | 9 | 2022-09-28 13:07:49.348433 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 54 | 56064 > 443 [ACK] Seq=518 Ack=2773 Win=260736 ... |
| 9 | 10 | 2022-09-28 13:07:49.349580 | csweb1.cs.uchicago.edu | 10.152.5.30 | TLSv1.2 | 1372 | Certificate, Server Key Exchange, Server Hello Done |
Optional: You may have let the capture run longer than the time taken to load the web page. Take a closer look at the packet capture or the pandas dataframe to see what, if anything, is at the end of the trace. Check whether this affects your computation, and make any necessary fixes.
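One possible way to spot stray packets at the end of a trace is to look at the gaps between consecutive packets: a long idle period followed by a few more packets suggests traffic unrelated to the page load. The sketch below is only an illustration under stated assumptions: the Time column is already parsed as datetimes, the one-second idle threshold is arbitrary, and trim_after_idle is a hypothetical helper name, not part of pandas or the provided library. The first three demo rows reuse timestamps and lengths from the sample output above; the last row is made up to stand in for a late packet.

```python
import pandas as pd

def trim_after_idle(df, max_gap="1s"):
    """Keep only the packets that precede the first idle gap longer than max_gap.

    Hypothetical helper; assumes df has a datetime-typed "Time" column.
    """
    df = df.sort_values("Time").reset_index(drop=True)
    gaps = df["Time"].diff()                  # time since the previous packet
    long_gap = gaps > pd.Timedelta(max_gap)   # first row is NaT, so False
    if not long_gap.any():
        return df                             # no long idle period found
    first_late = long_gap.idxmax()            # position of first packet after a long gap
    return df.iloc[:first_late]

demo = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-09-28 13:07:49.340190",
        "2022-09-28 13:07:49.344371",
        "2022-09-28 13:07:49.344467",
        "2022-09-28 13:07:55.000000",  # made-up late packet, for illustration only
    ]),
    "Length": [78, 66, 54, 54],
})

trimmed = trim_after_idle(demo)
```

A fixed threshold is a crude heuristic; inspecting the Info column of the trailing packets in Wireshark is a good way to check whether they really are unrelated to the page load.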
(In the assignment, we will explore some more sophisticated analysis.)
We can also visualize the data. Plot the following distributions as cumulative distribution functions: