Packet Capture and Analysis
Learning Objective
In this short hands-on activity, we will make sure everyone is up to speed with Jupyter notebooks, and experiment with loading a network packet capture into a Pandas data frame.
You will achieve the following:
- Set up a Jupyter notebook
- Load packet captures into a Pandas data frame
- Manipulate data to extract basic statistics from a packet trace
This notebook assumes that you have basic experience with Python and Jupyter notebooks. If you need additional background for that, please check the additional background resources provided.
Step 1: Packet Capture
One of the basic sources of data when measuring networked systems is a packet capture (sometimes also called a "packet trace"). The first step of this hands-on activity is to perform a small packet capture and run some basic analysis on that trace.
- Install Wireshark, a tool that performs packet capture.
Wireshark is a powerful tool to capture network traffic according to specific filters. Other assignments will explore the use of different filters.
For now, we will perform a capture with a simple filter.
- The Computer Science Department web server, `www.cs.uchicago.edu`, is at `128.135.24.72` (example).
Note: This IP address may change from year to year. To find the current IP address, you can use the following command in your terminal:
nslookup www.cs.uchicago.edu
or
dig www.cs.uchicago.edu +short
Set up a capture filter for this address by typing `host <IP_ADDRESS>` into the capture filter (e.g., `host 128.135.24.72`).
- Start your capture by selecting the network interface that is sending and receiving your network traffic. If you are on WiFi, this interface may be called something like `en0`. On the Wireshark start page, the right interface is often obvious because it shows some traffic activity.
- Open a web browser and load `www.cs.uchicago.edu`.
- Stop and save your trace by clicking the stop icon and saving the file. Save the file both as a `csv` and as a `pcap` (not `pcapng`). To save as a CSV, select "Export Packet Dissections" and then "As CSV" under the File menu.
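The address lookup and capture filter from the steps above can also be sketched in Python. This is a minimal illustration, not part of the original activity: the helper name `capture_filter_for` is hypothetical, and it relies only on the standard library's `socket.gethostbyname`, which returns a single IPv4 address.

```python
import socket


def capture_filter_for(hostname: str) -> str:
    """Return a Wireshark capture filter matching traffic to or from hostname."""
    # gethostbyname returns one IPv4 address as a dotted-quad string; an
    # address that is already numeric is returned unchanged.
    ip_address = socket.gethostbyname(hostname)
    return f"host {ip_address}"


# The example address from the text. Pass "www.cs.uchicago.edu" instead to
# look up the current address (this needs network access).
print(capture_filter_for("128.135.24.72"))  # host 128.135.24.72
```

The returned string can be pasted directly into the capture filter box in Wireshark.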
Step 2: Basic Analysis of the Trace
Before pulling this data into Pandas, let's have a quick look at what is in the trace.
- How many packets did you capture?
- Approximately how long did the capture take?
- What is the approximate round-trip latency (in milliseconds) between you and the web server?
- (Challenging) Were there any lost packets?
Step 3: Loading and Inspecting the Data
We have provided a library that uses Python's scapy library to load a pcap directly into Pandas. For convenience in this hands-on, however, we will work strictly with the CSV.
First, let's load the pandas library, as well as the dataset.
import pandas as pd
Enter some code below to load your packet capture into a pandas data frame called `pcap`.
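One possible way to do the load is sketched here. It assumes the CSV has Wireshark's default export columns and that the `Time` column was exported as an absolute date and time, as in the sample output in this section. The helper `load_capture`, the file name `capture.csv`, and the abbreviated `Info` strings are illustrative assumptions.

```python
import io

import pandas as pd

# Three rows shaped like a Wireshark CSV export. The Info text is shortened
# for illustration and is not a verbatim copy of a real export.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","2022-09-28 13:07:49.340190","10.152.5.30","csweb1.cs.uchicago.edu","TCP","78","56064 > 443 [SYN]"
"2","2022-09-28 13:07:49.344371","csweb1.cs.uchicago.edu","10.152.5.30","TCP","66","443 > 56064 [SYN, ACK]"
"3","2022-09-28 13:07:49.344467","10.152.5.30","csweb1.cs.uchicago.edu","TCP","54","56064 > 443 [ACK]"
'''


def load_capture(csv_source) -> pd.DataFrame:
    """Load a Wireshark CSV export, parsing the Time column as timestamps."""
    return pd.read_csv(csv_source, parse_dates=["Time"])


# Demonstrate on the in-memory sample so the cell runs without a file.
sample = load_capture(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)

# With a real export (hypothetical file name):
# pcap = load_capture("capture.csv")
```

If your export uses relative timestamps (seconds since the start of the capture), drop `parse_dates` and treat `Time` as a plain number instead.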
Let's have a quick look at the data you collected.
pcap.head(10)
| | No. | Time | Source | Destination | Protocol | Length | Info |
---|---|---|---|---|---|---|---|
0 | 1 | 2022-09-28 13:07:49.340190 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 78 | 56064 > 443 [SYN] Seq=0 Win=65535 Len=0 MSS=14... |
1 | 2 | 2022-09-28 13:07:49.344371 | csweb1.cs.uchicago.edu | 10.152.5.30 | TCP | 66 | 443 > 56064 [SYN, ACK] Seq=0 Ack=1 Win=64240 L... |
2 | 3 | 2022-09-28 13:07:49.344467 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 54 | 56064 > 443 [ACK] Seq=1 Ack=1 Win=262144 Len=0 |
3 | 4 | 2022-09-28 13:07:49.344609 | 10.152.5.30 | csweb1.cs.uchicago.edu | TLSv1.2 | 571 | Client Hello |
4 | 5 | 2022-09-28 13:07:49.348257 | csweb1.cs.uchicago.edu | 10.152.5.30 | TCP | 56 | 443 > 56064 [ACK] Seq=1 Ack=518 Win=64128 Len=0 |
5 | 6 | 2022-09-28 13:07:49.348261 | csweb1.cs.uchicago.edu | 10.152.5.30 | TLSv1.2 | 1440 | Server Hello |
6 | 7 | 2022-09-28 13:07:49.348355 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 54 | 56064 > 443 [ACK] Seq=518 Ack=1387 Win=260736 ... |
7 | 8 | 2022-09-28 13:07:49.348398 | csweb1.cs.uchicago.edu | 10.152.5.30 | TCP | 1440 | [TCP segment of a reassembled PDU] |
8 | 9 | 2022-09-28 13:07:49.348433 | 10.152.5.30 | csweb1.cs.uchicago.edu | TCP | 54 | 56064 > 443 [ACK] Seq=518 Ack=2773 Win=260736 ... |
9 | 10 | 2022-09-28 13:07:49.349580 | csweb1.cs.uchicago.edu | 10.152.5.30 | TLSv1.2 | 1372 | CertificateServer Key Exchange, Server Hello Done |
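The round-trip latency question from Step 2 can be cross-checked against this output. One common approximation, used here as an assumption rather than the activity's prescribed method, is the gap between the client's first SYN and the server's SYN-ACK. The sketch below uses the timestamps of the first three rows above, with shortened `Info` strings, and a hypothetical helper named `handshake_rtt_ms`.

```python
import io

import pandas as pd

# Timestamps copied from the first three rows shown above; Info is shortened.
HANDSHAKE_CSV = '''No.,Time,Info
1,2022-09-28 13:07:49.340190,56064 > 443 [SYN]
2,2022-09-28 13:07:49.344371,"443 > 56064 [SYN, ACK]"
3,2022-09-28 13:07:49.344467,56064 > 443 [ACK]
'''


def handshake_rtt_ms(trace: pd.DataFrame) -> float:
    """Estimate RTT as the time from the first SYN to the first SYN-ACK."""
    info = trace["Info"]
    # regex=False makes the brackets literal, so "[SYN]" does not match
    # "[SYN, ACK]". iloc[0] raises IndexError if no such packet exists.
    syn_time = trace.loc[info.str.contains("[SYN]", regex=False), "Time"].iloc[0]
    syn_ack_time = trace.loc[info.str.contains("[SYN, ACK]", regex=False), "Time"].iloc[0]
    return (syn_ack_time - syn_time).total_seconds() * 1000.0


handshake = pd.read_csv(io.StringIO(HANDSHAKE_CSV), parse_dates=["Time"])
print(f"{handshake_rtt_ms(handshake):.3f} ms")  # 4.181 ms
```

This measures a single handshake, so it is only a rough estimate; a capture with several connections would give several samples to compare.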
Step 3 (continued): Basic Analysis
- Compute the total time in your trace.
Optional: You may have let the capture run longer than the time taken to load the web page. Take a closer look at the packet capture or the pandas data frame to see what, if anything, is at the end of the trace, and adjust your computation if needed.
- Compute basic statistics of packet sizes:
- Mean
- Median
- 90th percentile
(In the assignment, we will explore some more sophisticated analysis.)
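The computations listed above could be sketched as follows, using only the `Time` and `Length` values of the ten sample rows shown in Step 3. The variable names are illustrative; with your own capture you would apply the same calls to the `pcap` data frame.

```python
import io

import pandas as pd

# Time and Length values copied from the ten sample rows shown in Step 3.
SAMPLE_CSV = '''Time,Length
2022-09-28 13:07:49.340190,78
2022-09-28 13:07:49.344371,66
2022-09-28 13:07:49.344467,54
2022-09-28 13:07:49.344609,571
2022-09-28 13:07:49.348257,56
2022-09-28 13:07:49.348261,1440
2022-09-28 13:07:49.348355,54
2022-09-28 13:07:49.348398,1440
2022-09-28 13:07:49.348433,54
2022-09-28 13:07:49.349580,1372
'''

sample = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Time"])

# Total time: last timestamp minus first timestamp.
duration = sample["Time"].max() - sample["Time"].min()
print(f"Total time: {duration.total_seconds() * 1000:.3f} ms")  # 9.390 ms

# Packet size statistics, in bytes.
sizes = sample["Length"]
print(f"Mean: {sizes.mean():.1f}")                    # 518.5
print(f"Median: {sizes.median():.1f}")                # 72.0
print(f"90th percentile: {sizes.quantile(0.9):.1f}")  # 1440.0
```

Note that the maximum-minus-minimum duration spans the whole capture, including any trailing packets sent after the page finished loading; filtering those rows out first would shorten it.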
Step 4: Visualization
We can also visualize the data. Plot the following distributions as cumulative distribution functions:
- Packet size
- Packet interarrival time (if you have time)
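A minimal sketch of both plots follows, assuming matplotlib and NumPy are available. It builds an empirical CDF by sorting the values and assigning each one the fraction of samples at or below it. The helper `empirical_cdf` is a hypothetical name, and the data are the ten sample rows from Step 3 rather than a full capture.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Length and Time values copied from the ten sample rows shown in Step 3.
lengths = pd.Series([78, 66, 54, 571, 56, 1440, 54, 1440, 54, 1372])
times = pd.Series(pd.to_datetime([
    "2022-09-28 13:07:49.340190", "2022-09-28 13:07:49.344371",
    "2022-09-28 13:07:49.344467", "2022-09-28 13:07:49.344609",
    "2022-09-28 13:07:49.348257", "2022-09-28 13:07:49.348261",
    "2022-09-28 13:07:49.348355", "2022-09-28 13:07:49.348398",
    "2022-09-28 13:07:49.348433", "2022-09-28 13:07:49.349580",
]))

# Interarrival time: gap between consecutive packets, in milliseconds.
# The first packet has no predecessor, so its NaN gap is dropped.
gaps_ms = (times.diff().dt.total_seconds() * 1000.0).dropna()


def empirical_cdf(values):
    """Return sorted values and the fraction of samples <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


fig, (ax_size, ax_gap) = plt.subplots(1, 2, figsize=(10, 4))
for ax, data, label in [
    (ax_size, lengths, "Packet size (bytes)"),
    (ax_gap, gaps_ms, "Interarrival time (ms)"),
]:
    x, y = empirical_cdf(data)
    ax.step(x, y, where="post")  # step plot: the CDF jumps at each sample
    ax.set_xlabel(label)
    ax.set_ylabel("CDF")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell. With a real trace, interarrival times often span several orders of magnitude, so a logarithmic x-axis may be easier to read.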