Hi-MLIC

Hierarchical Multilayer Lightweight Intrusion Classification
for Various Intrusion Scenarios

Abstract

The need for effective systems that detect and classify intrusions in large-scale network traffic is increasing. We propose Hi-MLIC, a hierarchical multilayer lightweight intrusion classification model that addresses various intrusion types. To cover more kinds of intrusion scenarios and to evaluate which data formats are efficient for intrusion detection, we consolidated packet capture data from two popular benchmark datasets into a new dataset provided in the two original dataset formats. We introduce a hierarchical multilayer approach to reduce the misclassification of intrusion types caused by the imbalance between benign and malicious data. Layer-1 separates network traffic into malicious and benign. Layer-2 classifies malicious traffic into four groups, and Layer-3 identifies 23 specific intrusion types. By performing misclassification analysis and eliminating unnecessary features, we not only improved performance relative to non-hierarchical classification but also reduced model complexity. Our model achieved a recall of up to 98.8%.

Pipeline

Layer-1 functions as a malicious-traffic detector: it decides whether traffic is malicious or benign, and at this layer the model predominantly learns to differentiate between the two. Layer-2 operates as a NIST standard classifier, categorizing malicious traffic into four categories: Access, DoS, Malware, and Reconnaissance. Layer-3 further subdivides each of these four categories into specific intrusion types, 23 in total.
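The routing between the three layers can be sketched as follows. This is only an illustration of the control flow, not the code in HiMLIC.py: the function name `classify_flow`, the toy rules, and the Layer-3 labels are all hypothetical stand-ins for the trained models.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Predictor = Callable[[Features], str]


def classify_flow(
    x: Features,
    l1: Predictor,
    l2: Predictor,
    l3: Dict[str, Predictor],
) -> Dict[str, str]:
    """Route one flow through the three layers, stopping early on benign traffic."""
    if l1(x) == "benign":
        return {"layer1": "benign"}
    category = l2(x)             # Access, DoS, Malware, or Reconnaissance
    intrusion = l3[category](x)  # the Layer-3 model dedicated to that category
    return {"layer1": "malicious", "layer2": category, "layer3": intrusion}


# Toy stand-ins for the trained models (hypothetical rules and labels).
l1 = lambda x: "malicious" if x[0] > 0.5 else "benign"
l2 = lambda x: "DoS" if x[1] > 0.5 else "Reconnaissance"
l3 = {
    "DoS": lambda x: "dos_type_a",
    "Reconnaissance": lambda x: "recon_type_a",
}

print(classify_flow([0.9, 0.8], l1, l2, l3))
print(classify_flow([0.1, 0.8], l1, l2, l3))
```

Only flows that Layer-1 flags as malicious reach the later layers, so the Layer-2 and Layer-3 models never have to compete with the much larger benign class.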

Getting Started

Installation

git clone https://github.com/CSID-DGU/Hi-MLIC.git
cd Hi-MLIC
pip install -e .

Download Datasets

See dataset.md.

Download Models

See model.md.

Execution

Create directories named "data" and "model" under the Pipeline directory.
Under the data directory, store the X data and the y data for each layer, following the tree structure below.
Under the model directory, store the six models (one for Layer-1, one for Layer-2, and four for Layer-3), following the tree structure below.
Execute the HiMLIC.py file.

Directory Tree

Pipeline/
│
├── data/
│   ├── X_data.csv
│   ├── L1_y_data.csv
│   ├── L2_y_data.csv
│   └── L3_y_data.csv
│
├── model/
│   ├── L1_model.pkl
│   ├── L2_model.pkl
│   ├── L3_1_model.pkl
│   ├── L3_2_model.pkl
│   ├── L3_3_model.pkl
│   └── L3_4_model.pkl
│
├── HiMLIC.py
└── utils.py
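Before running HiMLIC.py, it may help to confirm that the layout matches the tree above. The following is a small convenience sketch, not part of the repository; the helper name `missing_files` is hypothetical, and the file list is copied from the tree.

```python
from pathlib import Path
from typing import List

# File names copied from the directory tree above.
EXPECTED = [
    "data/X_data.csv",
    "data/L1_y_data.csv",
    "data/L2_y_data.csv",
    "data/L3_y_data.csv",
    "model/L1_model.pkl",
    "model/L2_model.pkl",
    "model/L3_1_model.pkl",
    "model/L3_2_model.pkl",
    "model/L3_3_model.pkl",
    "model/L3_4_model.pkl",
]


def missing_files(root: str = "Pipeline") -> List[str]:
    """Return the expected files that are not present under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files():
        print(f"missing: {rel}")
```

An empty result means all four data files and all six model files are in place.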

Data Preprocessing

Format Consolidation

Download PCAP Datasets

CICFlowMeter

Download CICFlowMeter from the following link.
- CICFlowMeter
Run CICFlowMeter on the PCAP files to generate the dataset.

Argus and Bro

Download Argus and Bro from the following links.
- Argus
- Bro (now named Zeek)
Run Argus and Bro on the PCAP files to generate the datasets.
Convert the Argus and Bro outputs to CSV format.
Consolidate the Argus and Bro datasets into a single dataset.
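The README does not describe how the two CSV outputs are joined. One plausible approach, shown here purely as an assumption, is to match records on the flow 5-tuple. The column names (`srcip`, `sport`, `dstip`, `dsport`, `proto`) and the helper names are hypothetical; adjust them to the headers your Argus and Bro exports actually contain.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical key columns; real exports may name these differently.
KEY = ("srcip", "sport", "dstip", "dsport", "proto")


def flow_key(row: Dict[str, str]) -> Tuple[str, ...]:
    """Build the join key for one record."""
    return tuple(row[k] for k in KEY)


def consolidate(
    argus_rows: Iterable[Dict[str, str]],
    bro_rows: Iterable[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Inner-join the two record sets on the flow key."""
    bro_by_key = {flow_key(r): r for r in bro_rows}
    merged = []
    for a in argus_rows:
        b = bro_by_key.get(flow_key(a))
        if b is not None:
            merged.append({**a, **b})  # Bro columns are appended to the Argus ones
    return merged


# Tiny in-memory example standing in for the two CSV files.
argus_csv = (
    "srcip,sport,dstip,dsport,proto,dur\n"
    "10.0.0.1,1234,10.0.0.2,80,tcp,0.5\n"
    "10.0.0.3,5555,10.0.0.2,53,udp,0.1\n"
)
bro_csv = (
    "srcip,sport,dstip,dsport,proto,service\n"
    "10.0.0.1,1234,10.0.0.2,80,tcp,http\n"
)

rows = consolidate(
    csv.DictReader(io.StringIO(argus_csv)),
    csv.DictReader(io.StringIO(bro_csv)),
)
print(rows)
```

In real captures the same 5-tuple can occur more than once, so a start timestamp would likely need to be part of the key as well.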

Encoding

Create a directory named "data" under the Encoding directory.
Under the data directory, store the consolidated data. If the dataset was generated with CICFlowMeter, name the file "CICI.csv". If it was generated with Argus and Bro, name the file "UNSW.csv".
Execute the Encoding.py file. (You can specify the input and output file names as arguments.)

python Encoding.py consolidated_data.csv encoded_data.csv

The data will be encoded and stored in the same directory as consolidated_data.csv.

Directory Tree

Encoding/
│
├── data/
│   ├── consolidated_data.csv (CICI.csv or UNSW.csv)
│   └── encoded_data.csv
│
├── Encoding.py
└── utils.py
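The README does not say which encoding scheme Encoding.py applies. As an illustration only, a common choice for categorical flow fields such as the protocol is label encoding, which maps each distinct string to an integer. The helper name `label_encode` and the example columns are hypothetical.

```python
from typing import Dict, List


def label_encode(
    rows: List[Dict[str, object]], columns: List[str]
) -> Dict[str, Dict[str, int]]:
    """Replace the string values in `columns` with integer codes, in place.

    Returns the value-to-code mapping of each column so that the same
    encoding can be reapplied to new data.
    """
    mappings: Dict[str, Dict[str, int]] = {}
    for col in columns:
        # Sort the distinct values so the codes are reproducible across runs.
        codes = {v: i for i, v in enumerate(sorted({str(r[col]) for r in rows}))}
        for r in rows:
            r[col] = codes[str(r[col])]
        mappings[col] = codes
    return mappings


rows = [
    {"proto": "tcp", "dur": "0.5"},
    {"proto": "udp", "dur": "0.1"},
    {"proto": "tcp", "dur": "0.2"},
]
mappings = label_encode(rows, ["proto"])
print(mappings)
print(rows)
```

Keeping the returned mappings matters when training and test data are encoded separately, since both must use the same codes.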

Feature Selection

Execution

Create directories named "data" and "model" under the FeatureSelection directory.
Under the data directory, store the encoded data.
Execute the FeatureSelection.py file.

python FeatureSelection.py encoded_data.csv

The feature importance will be calculated and printed.
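The README does not state which importance measure FeatureSelection.py computes. As a stand-in, the sketch below shows permutation importance: the drop in accuracy when one feature column is shuffled. The function names, the toy data, and the threshold model are all hypothetical.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled copy.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / repeats)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is irrelevant.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [int(row[0] > 0.5) for row in X]
predict = lambda row: int(row[0] > 0.5)

importances = permutation_importance(predict, X, y)
print(importances)
```

A feature whose importance stays at zero does not affect the predictions and is a candidate for removal, which is the kind of pruning the abstract refers to as eliminating unnecessary features.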

Best Model Selection

Execution

Create a directory named "data" under the BestModels directory.
Under the data directory, store the train and test splits of the encoded data, following the tree structure below.

Directory Tree

BestModels/
│
├── data/
│   ├── X_train.csv
│   ├── X_test.csv
│   ├── y_train.csv
│   └── y_test.csv
│
├── GridSearch.py
└── utils.py

Execute the GridSearch.py file.

python GridSearch.py filename
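The idea behind a grid search can be sketched in a few lines: evaluate every combination of candidate hyperparameters and keep the best-scoring one. This is not the code in GridSearch.py. The function names, the single `threshold` parameter, and the toy data are hypothetical, and recall is used as the score only because it is the metric reported in the abstract.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def recall(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """True positives divided by all actual positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def grid_search(
    param_grid: Dict[str, Sequence],
    evaluate: Callable[..., float],
) -> Tuple[Optional[dict], float]:
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy example: the "model" is a single threshold on one feature.
X_test = [0.1, 0.4, 0.6, 0.9]
y_test = [0, 1, 1, 1]


def evaluate(threshold: float) -> float:
    y_pred = [int(x > threshold) for x in X_test]
    return recall(y_test, y_pred)


best = grid_search({"threshold": [0.3, 0.5, 0.7]}, evaluate)
print(best)
```

In practice, `evaluate` would fit a model on X_train.csv and y_train.csv with the given parameters and score it on held-out data.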
