ml-systems

Machine Learning for Computer Systems

Class Project

Project. The course will have a small project. You will have the following options for completing the project:

  1. ML/Net Leaderboard (nPrint/pcapML). Your goal will be to reproduce or extend some of the best-known results for various applications of machine learning in computer networking. The nPrint project has a leaderboard with links to datasets and the best-known results for each of these problems. You are encouraged to follow the directions on that page.

  2. Extensions to Existing Libraries. We have learned about various libraries in class, in particular the NetML and nPrint/pcapML libraries. These libraries, however, are works in progress, particularly in their feature representations (how features are padded and interpolated, summary statistics, etc.). You could add different feature representations to these datasets and libraries and experiment with whether your representations result in greater model accuracy.

  3. Open Problem/Research. You are welcome to work on an independent research project that involves machine learning and computer systems. This option is probably better suited for graduate students in computer science who are comfortable working on open-ended problems. Your project must be approved by the instructor, based on a concrete research proposal.

    Some ideas of open-ended problem areas:

    • the systems costs of different features
    • how well a model trained on one dataset transfers to another domain (e.g., campus to ISP, or IoT to research network)
    • how a model “drifts” over time (using packet captures taken at different times)
    • time-series anomaly detection methods for unlabeled data
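For the library-extension option above, the difference between padding and interpolating a variable-length feature can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `pad_to_length` and `interpolate_to_length` are hypothetical and are not part of the NetML or nPrint/pcapML APIs, and the input is assumed to be a plain list of numbers such as per-flow packet sizes.

```python
def pad_to_length(values, length, fill=0.0):
    """Truncate or right-pad a sequence with `fill` to exactly `length` items.

    Hypothetical helper: preserves the original values but adds artificial
    trailing entries for short flows.
    """
    clipped = list(values[:length])
    return clipped + [fill] * (length - len(clipped))


def interpolate_to_length(values, length):
    """Resample a sequence to `length` points by linear interpolation.

    Hypothetical helper: keeps the overall shape of the series but no longer
    preserves the individual original values.
    """
    values = list(values)
    if not values:
        raise ValueError("cannot interpolate an empty sequence")
    if len(values) == 1 or length == 1:
        return [float(values[0])] * length
    # Spacing between output samples, measured in input index units.
    step = (len(values) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = min(int(pos), len(values) - 2)  # left neighbor index
        frac = pos - lo                      # weight of the right neighbor
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


if __name__ == "__main__":
    sizes = [60, 1500, 52]  # made-up packet sizes for one short flow
    print(pad_to_length(sizes, 5))
    print(interpolate_to_length(sizes, 5))
```

Both functions produce a fixed-length vector, which is what most models expect. Which one gives better accuracy on a given dataset is exactly the kind of question a project under this option could test.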

Parameters

Deliverables

I expect each project to be turned in as follows:

  1. A clean Jupyter notebook that I can run end-to-end, with your analysis and code inline.
  2. A project report, formatted in Sphinx, that I can read end-to-end, with the relevant parts of your code inline. (How To)

The notebook must run, and it needs to be clean. Presentation of your results is as important as achieving them. To test that your notebook runs, make sure you try “restart kernel and run all” before turning it in. Also make sure you have included (or pointed to!) all data files needed to run your notebook.
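As a quick sanity check before submitting, you can inspect the saved notebook file itself. The sketch below is a heuristic, not a substitute for actually restarting the kernel and running everything: it assumes that after a fresh “restart kernel and run all”, the non-empty code cells in the `.ipynb` JSON carry execution counts 1, 2, 3, ... in document order. The function names are illustrative.

```python
import json


def execution_counts(notebook):
    """Return execution counts of non-empty code cells, in document order."""
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip():
            counts.append(cell.get("execution_count"))
    return counts


def looks_like_clean_run(notebook):
    """True if the counts are exactly 1..n, as after a fresh run-all."""
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


def check_file(path):
    """Load an .ipynb file from disk and apply the heuristic."""
    with open(path, encoding="utf-8") as f:
        return looks_like_clean_run(json.load(f))
```

Calling `check_file("project.ipynb")` (the filename is a placeholder) returns `False` if cells were run out of order, re-run, or never run. Running `jupyter nbconvert --to notebook --execute` on the notebook is a stronger test, since it executes every cell in a fresh kernel.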

The writeup has no “expected length”, but it needs to be clear. Imagine that you want to use this as a portfolio for a job interview (something you did as part of a class), or something I could show to future classes as a cool project you did (or, that I could use for a future demonstration in the class).

Grading Rubric