A significant problem that arises in computer networking is the detection of time-series anomalies: data is captured over time, and network operators may wish to detect when metrics derived from that data deviate from "normal" behavior.
This activity involves performing time-series anomaly detection on network performance data captured by a project at the University of Chicago that continually measures network performance metrics, including throughput, latency, and DNS lookup times.
In this hands-on activity, you will combine various anomaly detection approaches with time-series models, and you may also try additional methods of your own choosing, applying them to detect anomalous performance in network measurements of latency and bandwidth from 85 Chicago households. The data for this hands-on activity is available here.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
This pickle file has measurement data of various types, from different IDs, to different destinations. The data is also aggregated in various ways, as indicated in the method column.
df = pd.read_pickle('data/5m_data.pkl')
df.sort_values(by='Time_clean', inplace=True)
df.head(5)
|         | Time | RTT | Measurement | ID | Destination | Method | Time_clean |
|---|---|---|---|---|---|---|---|
| 2532502 | 2021-09-30 19:00 | 8.775 | ping_latency | nm-mngd-20210623-1b49a7fa | | Avg | 2021-09-30 19:00:00 |
| 2432051 | 2021-09-30 19:00 | 10.977 | ping_latency | nm-mngd-20210511-9d9824d4 | | Avg | 2021-09-30 19:00:00 |
| 2504516 | 2021-09-30 19:00 | 9.244 | ping_latency | nm-mngd-20210519-84594993 | | Avg | 2021-09-30 19:00:00 |
| 2451551 | 2021-09-30 19:00 | 9.510 | ping_latency | nm-mngd-20210518-075ab2f0 | | Avg | 2021-09-30 19:00:00 |
| 2536760 | 2021-09-30 19:00 | 9.634 | ping_latency | nm-mngd-20210623-d9aab537 | | Avg | 2021-09-30 19:00:00 |
Compute an hour-by-hour average RTT for each destination.
def split(df):
    """Split the DataFrame into one group per destination."""
    gb = df.groupby(['Destination'])
    return [gb.get_group(x) for x in gb.groups]
df_split = split(df)
df_split_clean = []
for frame in df_split:
    # Hourly means, then fill gaps: linear interpolation for interior
    # NaNs, forward/backward fill for the edges. (Note the keyword is
    # limit_direction, not direction.)
    cleaned = (
        frame.set_index('Time_clean')
             .resample('H')
             .mean(numeric_only=True)
             .interpolate(method='linear', limit=300, limit_direction='forward')
             .ffill()
             .bfill()
    )
    df_split_clean.append(cleaned)
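The resample-and-interpolate step can be checked on a small synthetic series before running it on the real data. The toy values and gap below are invented purely for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute RTT measurements spanning three hours, with a
# simulated gap of missing measurements in the middle.
idx = pd.date_range('2021-09-30 19:00', periods=36, freq='5min')
rtt = pd.Series(10.0 + np.arange(36) * 0.1, index=idx)
rtt.iloc[12:18] = np.nan
toy = pd.DataFrame({'RTT': rtt})

# Same pipeline as above: hourly mean, linear interpolation, edge fills.
hourly = (
    toy.resample('H')
       .mean(numeric_only=True)
       .interpolate(method='linear', limit=300, limit_direction='forward')
       .ffill()
       .bfill()
)
print(hourly)
```

The result is one row per hour with no remaining NaNs, which is the property the per-destination loop above relies on.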
df_split_clean
[ RTT
Time_clean
2021-09-30 19:00:00 34.275405
2021-09-30 20:00:00 34.631601
2021-09-30 21:00:00 34.432748
2021-09-30 22:00:00 35.768035
2021-09-30 23:00:00 36.075542
... ...
2022-02-28 13:00:00 38.242081
2022-02-28 14:00:00 39.605813
2022-02-28 15:00:00 39.191060
2022-02-28 16:00:00 43.936399
2022-02-28 17:00:00 41.158659
[3623 rows x 1 columns],
...]
(output truncated: the full list contains 11 such hourly RTT DataFrames, one per destination, each with 3623 rows)
The function below performs seasonal decomposition using moving averages; it also calculates the moving average along with one- and two-standard-deviation bands.
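One way to sketch this step is shown below. This is a minimal illustration, not the notebook's exact function: it assumes an additive model, an hourly series with a 24-hour seasonal period, and helper names (`decompose`, `rolling_bands`) that are invented here:

```python
import numpy as np
import pandas as pd

def decompose(series: pd.Series, period: int = 24):
    """Additive seasonal decomposition via a centered moving average."""
    # Trend: centered rolling mean over one full period.
    trend = series.rolling(window=period, center=True, min_periods=1).mean()
    detrended = series - trend
    # Seasonal: average of each position within the cycle.
    seasonal = detrended.groupby(np.arange(len(series)) % period).transform('mean')
    resid = series - trend - seasonal
    return trend, seasonal, resid

def rolling_bands(series: pd.Series, window: int = 24):
    """Moving average with one- and two-standard-deviation bands."""
    ma = series.rolling(window, min_periods=1).mean()
    sd = series.rolling(window, min_periods=1).std()
    return pd.DataFrame({
        'ma': ma,
        'upper_1sd': ma + sd, 'lower_1sd': ma - sd,
        'upper_2sd': ma + 2 * sd, 'lower_2sd': ma - 2 * sd,
    })

# Usage on a synthetic hourly series with a daily cycle.
s = pd.Series(np.sin(2 * np.pi * np.arange(96) / 24) + 30,
              index=pd.date_range('2021-09-30 19:00', periods=96, freq='H'))
trend, seasonal, resid = decompose(s)
bands = rolling_bands(s)
```

By construction the three components sum back to the original series; the bands are what the anomaly rules in the next step compare observations against.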
Count and plot the number of anomalies to date.
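A minimal sketch of the counting step, flagging points that fall outside a rolling two-standard-deviation band and plotting the running count. The function name, toy data, and injected spike are all illustrative, not the notebook's actual code:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the example runs headless
import matplotlib.pyplot as plt

def count_anomalies(series: pd.Series, window: int = 24):
    """Flag points outside the rolling two-standard-deviation band."""
    ma = series.rolling(window, min_periods=1).mean()
    sd = series.rolling(window, min_periods=1).std()
    return (series > ma + 2 * sd) | (series < ma - 2 * sd)

# Toy hourly RTT series: steady values plus one obvious injected spike.
rng = np.random.default_rng(0)
s = pd.Series(10 + rng.normal(0, 0.1, 200),
              index=pd.date_range('2021-09-30 19:00', periods=200, freq='H'))
s.iloc[150] = 50  # injected anomaly

flags = count_anomalies(s)
print(flags.sum())  # total anomalies flagged

# Running count of anomalies to date.
flags.cumsum().plot(title='Anomalies to date')
plt.savefig('anomaly_count.png')
```

In practice this would be run per destination over `df_split_clean`, with the band taken from the decomposition step above rather than a plain rolling window.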