Machine Learning

This repository is a collection of various machine learning projects.

Malware Classification

This project provides a highly scalable way of implementing and testing malware classification using Support Vector Machines trained with Stochastic Gradient Descent. It also implements One-vs-One and One-vs-All multiclass classifiers. The report shows the full process of creating them, as well as the experiments run with different classifier parameters.
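
As a rough illustration of the approach described above, the sketch below trains a linear SVM with stochastic gradient descent on the hinge loss and wraps it in a One-vs-All scheme. It is not the project's code: the function names, the hyperparameter defaults, and the toy data are all assumptions made for the example.

```python
import random


def train_binary_svm(X, y, epochs=50, lr=0.01, reg=0.01, seed=0):
    """Train a linear SVM with SGD on the hinge loss; y must hold +1 / -1 labels."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Sample violates the margin: follow the hinge-loss subgradient.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def decision(model, x):
    """Signed score of x under a (weights, bias) model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_all(X, labels, **kwargs):
    """Train one binary SVM per class (that class against every other class)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if label == cls else -1 for label in labels]
        models[cls] = train_binary_svm(X, y, **kwargs)
    return models


def predict_one_vs_all(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    return max(models, key=lambda cls: decision(models[cls], x))


# Toy, made-up data: three small 2-D clusters standing in for malware families.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8], [0.0, 5.0], [0.2, 5.1]]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_all(X, labels, epochs=200, lr=0.01, reg=0.001)
predictions = [predict_one_vs_all(models, x) for x in X]
print(predictions)
```

A One-vs-One variant would instead train one binary SVM per pair of classes and pick the class that wins the most pairwise votes.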

Setup

Install all the libraries specified in requirements.txt.
If needed, add the feature_vectors.zip and sha256_family.csv data files to 'malware_classification/corpus/'.
The script will test a generic SVM and show its accuracy.
To see One-vs-One and One-vs-All model performance and confusion matrices, uncomment the corresponding lines. To find the matrices, navigate to 'malware_classification/graphs'.
To run the project, ensure you are in the root directory (the directory containing the malware_classification folder) and run the following in the terminal:

pip install -r malware_classification/docs/requirements.txt
python3 malware_classification/classifier/main.py

Hyperparameter Search

main.py contains the grid search that can be used to find optimal hyperparameters.
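
For readers unfamiliar with grid search, here is a minimal sketch of the idea: every combination of candidate values is evaluated and the best-scoring one is kept. The function names, the parameter names, and the scoring function are hypothetical stand-ins and are not taken from main.py.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return the best one.

    param_grid maps a parameter name to the list of values to try;
    evaluate(params) must return a score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train the classifier and return validation accuracy".
def fake_accuracy(params):
    return 1.0 - abs(params["learning_rate"] - 0.01) - abs(params["epochs"] - 50) / 1000


grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [10, 50, 100]}
best_params, best_score = grid_search(grid, fake_accuracy)
print(best_params, best_score)
```

The number of evaluations grows with the product of the list lengths, so it usually pays to keep each list short.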

Anomaly Detection using Clustering

This project provides an easy way to implement and test anomaly detection using clustering algorithms. The current iteration implements the DBSCAN and K-Means algorithms. The report shows the full process of creating them, as well as the experiments run with different hyperparameters. Final results are shown using somewhat neat graphs.

Setup

Install all the libraries specified in requirements.txt.
If needed, adjust the data list in main() in main.py. This is not needed for the assignment - everything is already set up by default.
The script will show accuracy, true positive rate, false positive rate and F1 scores for the given clusterer. It also draws the graphs in the 'graphs' folder.

pip install -r anomaly_detection/docs/requirements.txt
python3 anomaly_detection/clustering/main.py
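
The metrics named in the setup notes above can all be derived from the four confusion counts. The sketch below shows one way to compute them, assuming binary labels where 1 marks an anomaly and 0 marks normal data; the function name and the sample labels are made up for illustration and do not come from the project.

```python
def detection_metrics(y_true, y_pred):
    """Return accuracy, TPR, FPR and F1 for binary labels (1 = anomaly, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "tpr": ratio(tp, tp + fn),          # true positive rate (recall)
        "fpr": ratio(fp, fp + tn),          # false positive rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up labels: 3 anomalies and 5 normal points.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```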

Value Search and Graphs

If you need to find the parameter value that works best for a clustering algorithm, use the other methods in main.py.

Spam Detector Classification

This project provides an easy way to implement and test spam detection using classifiers. The current iteration implements Naive Bayes and K-Nearest Neighbors classifiers. The report shows the full process of creating them, as well as the experiments run with different classifier parameters. Final results are shown using neat graphs.

Setup

Install all the libraries specified in requirements.txt.
If needed, adjust the classifiers dictionary or the data path in main() in main.py. This is not needed for the assignment - everything is already set up by default.
The script will show accuracy, precision, recall and F1 scores for the given classifiers. It also writes the results to results.txt.
Note that you will need additional libraries not included in requirements.txt if you plan to run value search or plotting methods. These are included in optional_requirements.txt.

pip install -r spam_classification/docs/requirements.txt
python3 spam_classification/classifier/main.py

Value Search and Graphs

If you need to find a parameter value that works best for a classifier, you are out of luck - the code was refactored and value tester no longer works. make_graph() still works for old testing data.
Once testing is done, make_graph.py provides a way to plot line graphs based on test results. Disclaimer: the method currently only looks at the first value that it finds to be changing in a subset of the test results. It first looks at Stop Word ‰ Removed. If it changes, it will not look at other parameters! If not, it goes one by one through parameters until it finds the first one that changes.
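
The selection rule described in the disclaimer can be sketched as follows. This is only an illustration of the "first parameter that changes" behaviour, not the code in make_graph.py; the function name and the result keys are hypothetical, with the stop-word parameter placed first in the order to mirror the priority described above.

```python
def first_changing_parameter(results, parameter_order):
    """Return the first parameter in parameter_order whose value differs
    between at least two result rows, or None if nothing changes."""
    for name in parameter_order:
        values = {row[name] for row in results}
        if len(values) > 1:
            return name
    return None


# Hypothetical test results; the keys are illustrative, not the project's real column names.
results = [
    {"stop_words_removed": 0, "k": 3, "accuracy": 0.90},
    {"stop_words_removed": 0, "k": 5, "accuracy": 0.92},
    {"stop_words_removed": 0, "k": 7, "accuracy": 0.91},
]
order = ["stop_words_removed", "k"]

# The stop-word value is constant here, so the next parameter in the order is used.
x_axis = first_changing_parameter(results, order)
points = [(row[x_axis], row["accuracy"]) for row in results]
print(x_axis, points)
```

Because the search stops at the first match, a subset of results in which two parameters vary at once is plotted against only the earlier one in the order.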
