Encroachment Detection System based on anomalies in Network — Using Deep Learning

Photo by Evgenia Eiter on Dribbble

A network encroachment detection system (NEDS) is installed at a predetermined point in the network to analyse traffic from all connected devices.
It monitors all subnet traffic and compares it to a database of known attacks.
It raises an alarm to the administrator whenever an attack is detected or anomalous behaviour is observed.

Data-set

https://www.kaggle.com/sampadab17/network-intrusion-detection

The data set at the URL above contains a wide range of intrusions simulated in a military network environment. The environment mimicked a typical US Air Force LAN so that raw TCP/IP dump data could be captured. The LAN was operated as if it were a real environment and was subjected to multiple attacks. A connection is a sequence of TCP packets that starts and ends at well-defined times, during which data flows from a source IP address to a target IP address under a well-defined protocol. Each connection is labelled either as normal or as an attack, with exactly one specific attack type. Each connection record is around 100 bytes long.

For each TCP/IP connection, 41 features (3 qualitative and 38 quantitative) are extracted from the normal and attack data. The class variable has two categories:
• Normal
• Anomalous

Data Visualization:

count per class

From the graph above, we can infer that the data is almost balanced.
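The balance visible in the graph can also be checked numerically. The following is a minimal sketch, not the author's code: it assumes the label column is named `class` and uses a tiny hand-made frame in place of the real training file.

```python
import pandas as pd

# Hypothetical stand-in for the training data; the column name "class"
# and the label spellings are assumptions made for this illustration.
df = pd.DataFrame(
    {"class": ["normal", "anomaly", "normal", "normal", "anomaly", "normal"]}
)

# Absolute number of rows per class.
counts = df["class"].value_counts()

# Share of each class; values near 0.5 indicate a balanced binary data set.
ratios = df["class"].value_counts(normalize=True)

print(counts)
print(ratios.round(3))
```

On the real data, ratios close to 0.5 for both classes would confirm the "almost balanced" reading of the plot.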

Histograms of values of each column are displayed below:

histograms for each column

Data Pre-Processing:

We need to standardize the input data set because the ranges of the features differ widely.

https://gist.github.com/dheerajskylark/e2657bf63766284945c98bec31a3d4b5
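As an illustration of the standardization step, here is a minimal sketch using scikit-learn's `StandardScaler`. Whether the linked gist uses this exact class is an assumption, and the sample values below are made up; only the column names come from the data set.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration; only the column names come from the data set.
train = pd.DataFrame(
    {
        "src_bytes": [0, 491, 146, 232],
        "count": [2, 13, 123, 5],
        "protocol_type": ["tcp", "tcp", "udp", "icmp"],
    }
)

# Scale only the numeric columns; categorical ones are encoded separately.
numeric_cols = train.select_dtypes(include="number").columns

scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(train[numeric_cols]),
    columns=numeric_cols,
    index=train.index,
)

# Each scaled column now has zero mean and unit (population) variance.
print(scaled.round(3))
```

The scaler should be fitted on the training data only and then reused via `scaler.transform(...)` on validation and test data, so that all splits share the same scale.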

Next, we encode the categorical attributes:

https://gist.github.com/dheerajskylark/63c38ddfae29e8c3664fcf96515cd30b
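One common way to encode the three qualitative features (`protocol_type`, `service`, `flag`) is integer label encoding. The sketch below assumes scikit-learn's `LabelEncoder` and uses made-up rows; the linked gist may use a different encoder.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration; the column names match the data set.
cat = pd.DataFrame(
    {
        "protocol_type": ["tcp", "udp", "tcp", "icmp"],
        "service": ["http", "domain_u", "ftp_data", "eco_i"],
        "flag": ["SF", "SF", "S0", "SF"],
    }
)

encoders = {}          # keep one fitted encoder per column for later reuse
encoded = cat.copy()
for col in cat.columns:
    enc = LabelEncoder()
    # Classes are sorted alphabetically and mapped to 0..n-1.
    encoded[col] = enc.fit_transform(cat[col])
    encoders[col] = enc

print(encoded)
```

Keeping the fitted encoders allows the same mapping to be applied to the test data with `encoders[col].transform(...)`. Note that integer codes impose an arbitrary order on the categories; one-hot encoding is an alternative when that matters.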

Feature Selection:

Feature selection has a significant influence on a model's performance: the attributes used to train a machine learning model largely determine the results it can achieve.

https://gist.github.com/dheerajskylark/3cf03a6c9eb852132dd5707f0d9e84e7

feature importance

https://gist.github.com/dheerajskylark/9d244a0dcac403b848168899a80609e1

['src_bytes',
'dst_bytes',
'logged_in',
'count',
'srv_count',
'same_srv_rate',
'diff_srv_rate',
'dst_host_srv_count',
'dst_host_same_srv_rate',
'dst_host_diff_srv_rate',
'dst_host_same_src_port_rate',
'dst_host_srv_diff_host_rate',
'protocol_type',
'service',
'flag']
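A list like the one above can be obtained by ranking features by importance and keeping the top few. The sketch below assumes a random-forest importance ranking on synthetic data; the selection method in the linked gists may differ, and the feature names `f0`..`f5` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: six random features, with the label driven only by "f2".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"f{i}" for i in range(6)])
y = (X["f2"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

k = 3  # number of features to keep (15 would match the list above)
selected = importances.head(k).index.tolist()

print(importances.round(3))
print(selected)
```

Because the label here depends only on `f2`, that feature ranks first; on the real data the ranking reflects how much each attribute helps separate normal from anomalous connections.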

Model Building:

A neural network is built as follows.

https://gist.github.com/dheerajskylark/61e5737a32c837fe890539e1761e97c8

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_18 (Dense)             (None, 8)                 128
_________________________________________________________________
dense_19 (Dense)             (None, 8)                 72
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 9
=================================================================
Total params: 209
Trainable params: 209
Non-trainable params: 0
_________________________________________________________________
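The parameter counts in the summary can be verified by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below assumes an input width of 15, which is inferred from the 15 selected features and is consistent with the 128 parameters of the first layer.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width 15 is an inference from the selected feature list;
# the layer widths 8, 8, 1 come from the model summary.
layer_sizes = [15, 8, 8, 1]

params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(params)

print(params)  # [128, 72, 9]
print(total)   # 209
```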

Model Training

The model is fitted on the pre-processed data for 20 epochs.

https://gist.github.com/dheerajskylark/50c2c725f794efa3b8eb0a168a5c5d3e

Epoch 1/20
395/395 [==============================] - 1s 2ms/step - loss: 0.3710 - accuracy: 0.8672 - val_loss: 0.1879 - val_accuracy: 0.9286
Epoch 2/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1695 - accuracy: 0.9361 - val_loss: 0.1595 - val_accuracy: 0.9378
Epoch 3/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1579 - accuracy: 0.9429 - val_loss: 0.1499 - val_accuracy: 0.9398
Epoch 4/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1513 - accuracy: 0.9451 - val_loss: 0.1443 - val_accuracy: 0.9422
Epoch 5/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1466 - accuracy: 0.9463 - val_loss: 0.1410 - val_accuracy: 0.9466
Epoch 6/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1435 - accuracy: 0.9475 - val_loss: 0.1380 - val_accuracy: 0.9548
Epoch 7/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1411 - accuracy: 0.9500 - val_loss: 0.1359 - val_accuracy: 0.9554
Epoch 8/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1392 - accuracy: 0.9501 - val_loss: 0.1383 - val_accuracy: 0.9466
Epoch 9/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1380 - accuracy: 0.9525 - val_loss: 0.1324 - val_accuracy: 0.9594
Epoch 10/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1350 - accuracy: 0.9530 - val_loss: 0.1322 - val_accuracy: 0.9554
Epoch 11/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1336 - accuracy: 0.9542 - val_loss: 0.1301 - val_accuracy: 0.9596
Epoch 12/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1319 - accuracy: 0.9558 - val_loss: 0.1286 - val_accuracy: 0.9604
Epoch 13/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1308 - accuracy: 0.9555 - val_loss: 0.1285 - val_accuracy: 0.9588
Epoch 14/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1306 - accuracy: 0.9559 - val_loss: 0.1268 - val_accuracy: 0.9602
Epoch 15/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1282 - accuracy: 0.9574 - val_loss: 0.1266 - val_accuracy: 0.9622
Epoch 16/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1269 - accuracy: 0.9569 - val_loss: 0.1271 - val_accuracy: 0.9576
Epoch 17/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1269 - accuracy: 0.9564 - val_loss: 0.1310 - val_accuracy: 0.9530
Epoch 18/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1266 - accuracy: 0.9573 - val_loss: 0.1229 - val_accuracy: 0.9640
Epoch 19/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1241 - accuracy: 0.9591 - val_loss: 0.1251 - val_accuracy: 0.9602
Epoch 20/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1231 - accuracy: 0.9577 - val_loss: 0.1213 - val_accuracy: 0.9620

After 20 epochs, the network reaches a training accuracy of ~96% and a validation accuracy of ~96%.

The accuracy per epoch for this model is plotted below:

No. of epochs vs accuracy for both training and validation

The loss per epoch is plotted below:

No. of epochs vs loss for both training and validation

Evaluation:

https://gist.github.com/dheerajskylark/148e7dbf8329749dc4d24bb3d5390fad

Confusion Matrix (0-Anomaly, 1-Normal)

Scores:

https://gist.github.com/dheerajskylark/fcfe12f9c892bdf08af7ee94ecd25479

============================== ANN Model Test Results ==============================
Model Accuracy:
0.9657316750463085

Classification report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96      3498
           1       0.95      0.98      0.97      4060

    accuracy                           0.97      7558
   macro avg       0.97      0.96      0.97      7558
weighted avg       0.97      0.97      0.97      7558

Accuracy on the test data is ~96.6%.
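The confusion matrix, accuracy and classification report can all be produced with scikit-learn's metrics functions. The sketch below uses a small made-up set of labels rather than the model's real predictions, with the same encoding as above (0 = Anomaly, 1 = Normal).

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for illustration only (0 = Anomaly, 1 = Normal).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(acc)
print(classification_report(y_true, y_pred, target_names=["Anomaly", "Normal"]))
```

In this toy case, three of the four samples in each class are predicted correctly, so the accuracy is 6/8 = 0.75.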

Predictions for Test data:

https://gist.github.com/dheerajskylark/f47e0214ae06381e495c2f687790c1a0

Anomaly
Anomaly
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
Normal
Normal
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
.
.
.
Normal
Normal
Normal
Normal
Normal
Anomaly
Anomaly
Anomaly
Normal
Normal
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
Anomaly
Anomaly
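Labels like those above come from thresholding the single sigmoid output of the network. A minimal sketch, assuming a cut-off of 0.5 and the encoding 0 = Anomaly, 1 = Normal; the probabilities below are made up and stand in for the model's output on the test data.

```python
import numpy as np

# Hypothetical sigmoid outputs standing in for the model's test predictions.
probs = np.array([0.03, 0.12, 0.97, 0.45, 0.51])

# Outputs above the 0.5 threshold (an assumption) map to class 1 = Normal,
# everything else to class 0 = Anomaly.
labels = np.where(probs > 0.5, "Normal", "Anomaly")

for label in labels:
    print(label)
```

Lowering the threshold would label more connections as Normal and raising it would label more as Anomaly, which trades false alarms against missed attacks.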

Conclusion:

A deep learning model that detects encroachments in networks is built, with a test accuracy of ~96.6%.

Platform : cAInvas

Code: Here

Written By: Dheeraj Perumandla