Machine Learning for Networking
Version 0.1
Table of Contents
Chapter 1: Introduction
Chapter 2: Motivating Problems
Chapter 3: Network Data
Chapter 4: Machine Learning Pipeline
Chapter 5: Supervised Learning
Chapter 6: Unsupervised Learning
Chapter 7: Large Language Models
Chapter 8: Reinforcement Learning
Chapter 9: Deployment Considerations
Chapter 10: Looking Ahead
Appendix: Activities
About The Book
About The Authors
Appendix: Activities
Packet Capture
Packet Capture and Analysis
Learning Objective
Step 1: Packet Capture
Step 2: Basic Analysis of the Trace
Step 3: Loading and Inspecting the Data
Step 4: Basic Analysis
Step 5: Visualization
Security
Motivation: Network Security
Learning Objectives
Background: Log4j Vulnerability
Part 1: Load and Explore the Traffic Traces
Sanity Checks
Basic Exploration
Feature Exploration
Thought Question
Network Performance
Video Streaming: Feature Extraction from Video Streams
Learning Objectives
Step 1: Identifying Netflix Traffic from DNS
Load the Packet Capture and Identify Netflix Traffic
Filter for DNS Traffic
Filter for Netflix DNS Query IDs
Find the Associated Netflix IP Addresses
Step 2: Counting Traffic to Each Netflix Destination
Count the Number of Downstream Bytes and Packets
Count the Number of Upstream Bytes and Packets
Step 3: Inferring Segment Downloads
Thought Question
Data Acquisition
Data Acquisition
Background
Step 1: Load a Packet Trace
Step 2: Generate Statistics for Each Flow
Total Number of Flows
Number of Bytes
Number of Packets
Duration
Bytes and Packets Per Second
Note
Thought Questions
Feature Extraction
Training a Model
Data Preparation
Load the Packet Capture Files
Convert the Packet Capture Into Flows
Explore the Flows
Extract Features for Each Dataset
Normalize the Shapes of Each Feature Set
Try Your Data on a Model
Test Your Trained Model
Bonus
Looking Ahead
Machine Learning Pipeline
Machine Learning Pipeline
Convert the Packet Capture Into Flows
Evaluating a Machine Learning Model
Split into Training and Test Sets
Training Your Model
Test Your Trained Model
Confusion Matrix
Receiver Operating Characteristic
Area Under the Curve (AUC)
Thought Question
Naïve Bayes
Naïve Bayes Spam Classifier
Part 1: Data Preparation
Basic Data Statistics
Data Cleaning
Splitting Training and Testing Data
Sanity Check the Split
Build the Vocabulary
Word Counts for Each Message
Part 2: Naïve Bayes Classifier with sklearn
Part 3: Naïve Bayes Classifier from Scratch
Count the Occurrences of Each Word
Training the Model
Classifier
Testing the Model
Example Messages
Accuracy on Test Set
Linear Regression
Logistic Regression
Logistic Regression
Model Selection/Cross Validation
Hyper-Parameter Tuning: Grid Search
Trees and Ensembles
Trees and Ensembles
Application to IoT Privacy
1. Import data and convert to points
2. Data cleaning
3. Convert to datetime format (optional)
4. Explore data representations
5. Represent rates as n-dimensional points
7. Associate k-dimensional points with activity labels.
Random Forest
Discussion Questions
1. Why is this attack a privacy risk?
2. How could we (IoT device programmers, network operators, etc.) protect people from this attack?
Additional Exercises
1. Adjust parameters to improve accuracy.
Deep Learning
Deep Learning
Step 1: Data Preparation
Normal Traffic
Attack Traffic
Device 1 (WeMo Smart Switch) Attack Profile
Device 2 (YI Home Camera) Attack Profile
Device 3 (Android Phone) Attack Profile
Step 2: Generate Features and Labels
Step 3: Normalize Data
Step 4: Train and Test Neural Net
Step 5: Compare Against Other Machine Learning Models
Thought Questions
Dimensionality Reduction
Dimensionality Reduction
Load the Packet Capture Files
Convert the Packet Capture Into Flows
Explore the Flows
Extract Features for Each Dataset
Try Your Data on a Tree-Based Model
Decision Tree
Visualize the Resulting Model
Evaluate the Model
Random Forest Classifiers
Train a Random Forest Classifier
Evaluate Your Classifier
Feature Importance
Dimensionality Reduction
Understanding Explained Variance
Understanding the Top Principal Components
Evaluate the Lower Dimension Model
Clustering
Clustering
K-Means Clustering
K-Means Clustering: Example
Tuning K in K-Means Clustering
Load Datasets
Dimensionality Reduction for Visualization
Clustering
K-Means Clustering
K-Means Failure Scenarios
Evaluation
Density-Based Clustering
DBSCAN
Distribution-Based Clustering
Gaussian Mixture Models
Automation
Automating Machine Learning for Networking
Learning Objectives
Tasks
Requirements
Part 1: Generating nPrints from Traffic
Part 2: nPrint to Machine Learning Samples
Part 3: Training a Classifier
Understanding the Model
Part 4: Exploring Different Representations
Summary
Part 5: pcapML