Encroachment Detection System Based on Network Anomalies

A network encroachment detection system (NEDS) is installed at a predetermined point in the network to analyze the traffic from all connected devices.

It monitors all traffic on the subnet and compares it against a database of known threats. Whenever an attack or unusual behaviour is detected, an alarm can be raised to the administrator.

Dataset:

https://www.kaggle.com/sampadab17/network-intrusion-detection

The dataset at the URL above contains a wide range of intrusions simulated in a military network environment and was provided for auditing. It was created by emulating a typical US Air Force LAN, an environment from which raw TCP/IP dump data could be collected.

The LAN was operated as if it were a real environment while being subjected to various attacks. A connection is a sequence of TCP packets that starts and ends at well-defined times, during which data flows from a source IP address to a target IP address under a well-defined protocol.

Each connection is labeled either as normal or as an attack of exactly one specific type. Each connection record is about 100 bytes long.

For each TCP/IP connection, 41 features are extracted from the normal and attack data: 3 qualitative and 38 quantitative.

The class variable has two classes:
• Normal
• Anomalous
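A minimal pandas sketch of loading the data and separating the qualitative from the quantitative columns. The file name `Train_data.csv` and the label column name `class` are assumptions about the Kaggle download, not details stated in the article; adjust them to match the actual files. Column types are judged by dtype, so text columns count as qualitative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

LABEL_COLUMN = "class"  # assumed name of the label column in the Kaggle CSV


def load_connections(path: str = "Train_data.csv") -> pd.DataFrame:
    """Read one connection record per row (the file name is an assumption)."""
    return pd.read_csv(path)


def split_feature_types(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (qualitative, quantitative) feature names, judged by dtype."""
    features = df.drop(columns=[LABEL_COLUMN])
    quantitative = [c for c in features.columns if is_numeric_dtype(features[c])]
    qualitative = [c for c in features.columns if not is_numeric_dtype(features[c])]
    return qualitative, quantitative


# Tiny hypothetical stand-in frame; on the real data use load_connections().
demo = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp"],
    "service": ["http", "domain_u", "ftp"],
    "flag": ["SF", "SF", "REJ"],
    "src_bytes": [181, 105, 0],
    "class": ["normal", "normal", "anomaly"],
})
qualitative, quantitative = split_feature_types(demo)
print(qualitative)   # ['protocol_type', 'service', 'flag']
print(quantitative)  # ['src_bytes']
```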

Data Visualization:

From the graph above, we can infer that the data is almost balanced.
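The balance claim can also be checked numerically. This sketch uses pandas `value_counts(normalize=True)` on a hypothetical label series; on the real data, pass the label column instead. A minority-to-majority ratio close to 1 indicates balanced classes.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.Series:
    """Fraction of records in each class, largest class first."""
    return labels.value_counts(normalize=True)


# Hypothetical labels standing in for the real class column.
labels = pd.Series(["normal", "anomaly", "normal", "anomaly", "normal"])
shares = class_balance(labels)
ratio = shares.min() / shares.max()  # 1.0 would be perfectly balanced

print(shares.round(2).to_dict())  # {'normal': 0.6, 'anomaly': 0.4}
print(round(ratio, 3))            # 0.667
```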

Histograms of the values in each column are displayed below:

Data Pre-Processing:

The input data must be standardized, because the ranges of the features differ widely.
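Standardization rescales each column to z = (x − mean) / std. The sketch below uses plain pandas and mirrors the behaviour of scikit-learn's `StandardScaler` (population standard deviation). The article does not say which scaler was used, so this is an assumption. The statistics are learned from the training split only and then reused on the test split, so no test information leaks into training.

```python
import pandas as pd


def fit_standardizer(train: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Learn per-column mean and population std from the training data only."""
    mean = train.mean()
    std = train.std(ddof=0)
    std = std.where(std > 0, 1.0)  # constant columns: avoid division by zero
    return mean, std


def standardize(df: pd.DataFrame, mean: pd.Series, std: pd.Series) -> pd.DataFrame:
    """Apply z = (x - mean) / std using the training statistics."""
    return (df - mean) / std


# Hypothetical values; 'count' is constant here to show the zero-std guard.
train = pd.DataFrame({"src_bytes": [0.0, 100.0, 200.0], "count": [1.0, 1.0, 1.0]})
test = pd.DataFrame({"src_bytes": [300.0], "count": [1.0]})

mean, std = fit_standardizer(train)
z_train = standardize(train, mean, std)
z_test = standardize(test, mean, std)
print(z_train["src_bytes"].round(3).tolist())  # [-1.225, 0.0, 1.225]
print(z_test["src_bytes"].round(3).tolist())   # [2.449]
```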

Next, the categorical attributes (protocol_type, service and flag) must be encoded.
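The first Dense layer in the model summary has 128 = (15 + 1) × 8 parameters, so the network appears to take 15 inputs, one per selected feature. That suggests each categorical attribute was integer-encoded into a single column rather than one-hot encoded. A minimal sketch of such integer (label) encoding, assuming categories are numbered in sorted order as scikit-learn's `LabelEncoder` does; mapping unseen test categories to -1 is an added assumption.

```python
import pandas as pd

CATEGORICAL = ["protocol_type", "service", "flag"]


def fit_category_codes(train: pd.DataFrame) -> dict[str, dict[str, int]]:
    """Map every category seen in training to an integer, in sorted order."""
    return {col: {cat: i for i, cat in enumerate(sorted(train[col].unique()))}
            for col in CATEGORICAL}


def encode(df: pd.DataFrame, codes: dict[str, dict[str, int]]) -> pd.DataFrame:
    """Replace each categorical column by its integer code; unseen values become -1."""
    out = df.copy()
    for col, mapping in codes.items():
        out[col] = df[col].map(mapping).fillna(-1).astype(int)
    return out


# Hypothetical rows; 'telnet' never appears in training.
train = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp"],
                      "service": ["http", "domain_u", "ecr_i"],
                      "flag": ["SF", "SF", "REJ"]})
test = pd.DataFrame({"protocol_type": ["udp"], "service": ["telnet"], "flag": ["SF"]})

codes = fit_category_codes(train)
print(encode(train, codes)["protocol_type"].tolist())  # [1, 2, 0]
print(encode(test, codes).iloc[0].tolist())             # [2, -1, 1]
```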

Feature Selection:

Feature selection has a significant influence on a model's performance: the attributes used to train a machine learning model largely determine the results it can achieve. The following 15 features were selected:

['src_bytes',
 'dst_bytes',
 'logged_in',
 'count',
 'srv_count',
 'same_srv_rate',
 'diff_srv_rate',
 'dst_host_srv_count',
 'dst_host_same_srv_rate',
 'dst_host_diff_srv_rate',
 'dst_host_same_src_port_rate',
 'dst_host_srv_diff_host_rate',
 'protocol_type',
 'service',
 'flag']
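The article does not say how these 15 features were chosen. As one simple illustration, a swapped-in filter method that is not necessarily the author's procedure, numeric features can be ranked by the absolute Pearson correlation of each column with the 0/1 label, keeping the top k. The toy values and the 1 = normal coding are hypothetical. Once a list is fixed, the model input is just `df[selected]`.

```python
import pandas as pd


def top_k_by_correlation(X: pd.DataFrame, y: pd.Series, k: int) -> list[str]:
    """Rank numeric features by |Pearson correlation| with the 0/1 label; keep the top k."""
    scores = X.corrwith(y).abs().fillna(0.0)  # constant columns give NaN; score them 0
    return scores.sort_values(ascending=False).head(k).index.tolist()


# Hypothetical toy data.
X = pd.DataFrame({
    "logged_in": [1, 1, 0, 0, 1, 0],
    "same_srv_rate": [1.0, 0.9, 0.1, 0.2, 0.8, 0.3],
    "noise": [5, 3, 4, 5, 3, 4],
})
y = pd.Series([1, 1, 0, 0, 1, 0])  # hypothetical coding: 1 = normal, 0 = anomaly

print(top_k_by_correlation(X, y, k=2))  # ['logged_in', 'same_srv_rate']
```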

Model Building:

A neural network of three Dense layers is built, as summarized below.

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_18 (Dense)             (None, 8)                 128       
_________________________________________________________________
dense_19 (Dense)             (None, 8)                 72        
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 9         
=================================================================
Total params: 209
Trainable params: 209
Non-trainable params: 0
_________________________________________________________________
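The parameter counts in the summary follow directly from the layer sizes: a Dense layer with n inputs and u units holds n·u weights plus u biases. This short check reproduces the summary's numbers and confirms that the network takes the 15 selected features as input. Activation functions are not visible in the summary and are not assumed here.

```python
def dense_params(n_inputs: int, units: int) -> int:
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return n_inputs * units + units


n_features = 15          # length of the selected-feature list above
layer_sizes = [8, 8, 1]  # units in dense_18, dense_19, dense_20

counts = []
fan_in = n_features
for units in layer_sizes:
    counts.append(dense_params(fan_in, units))
    fan_in = units  # each layer's output feeds the next layer

print(counts, sum(counts))  # [128, 72, 9] 209
```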

Model Training:

The model is fitted on the pre-processed data for 20 epochs.

Epoch 1/20
395/395 [==============================] - 1s 2ms/step - loss: 0.3710 - accuracy: 0.8672 - val_loss: 0.1879 - val_accuracy: 0.9286
Epoch 2/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1695 - accuracy: 0.9361 - val_loss: 0.1595 - val_accuracy: 0.9378
Epoch 3/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1579 - accuracy: 0.9429 - val_loss: 0.1499 - val_accuracy: 0.9398
Epoch 4/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1513 - accuracy: 0.9451 - val_loss: 0.1443 - val_accuracy: 0.9422
Epoch 5/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1466 - accuracy: 0.9463 - val_loss: 0.1410 - val_accuracy: 0.9466
Epoch 6/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1435 - accuracy: 0.9475 - val_loss: 0.1380 - val_accuracy: 0.9548
Epoch 7/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1411 - accuracy: 0.9500 - val_loss: 0.1359 - val_accuracy: 0.9554
Epoch 8/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1392 - accuracy: 0.9501 - val_loss: 0.1383 - val_accuracy: 0.9466
Epoch 9/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1380 - accuracy: 0.9525 - val_loss: 0.1324 - val_accuracy: 0.9594
Epoch 10/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1350 - accuracy: 0.9530 - val_loss: 0.1322 - val_accuracy: 0.9554
Epoch 11/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1336 - accuracy: 0.9542 - val_loss: 0.1301 - val_accuracy: 0.9596
Epoch 12/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1319 - accuracy: 0.9558 - val_loss: 0.1286 - val_accuracy: 0.9604
Epoch 13/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1308 - accuracy: 0.9555 - val_loss: 0.1285 - val_accuracy: 0.9588
Epoch 14/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1306 - accuracy: 0.9559 - val_loss: 0.1268 - val_accuracy: 0.9602
Epoch 15/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1282 - accuracy: 0.9574 - val_loss: 0.1266 - val_accuracy: 0.9622
Epoch 16/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1269 - accuracy: 0.9569 - val_loss: 0.1271 - val_accuracy: 0.9576
Epoch 17/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1269 - accuracy: 0.9564 - val_loss: 0.1310 - val_accuracy: 0.9530
Epoch 18/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1266 - accuracy: 0.9573 - val_loss: 0.1229 - val_accuracy: 0.9640
Epoch 19/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1241 - accuracy: 0.9591 - val_loss: 0.1251 - val_accuracy: 0.9602
Epoch 20/20
395/395 [==============================] - 1s 2ms/step - loss: 0.1231 - accuracy: 0.9577 - val_loss: 0.1213 - val_accuracy: 0.9620

After 20 epochs, the network reaches a training accuracy of ~96% and a validation accuracy of ~96%.

The epochs vs. accuracy graph for the model is plotted below:

No. of Epochs vs Accuracy for Training and Validation

The epochs vs. loss graph is plotted below:

No. of Epochs vs Loss for Training and Validation
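The curves can also be summarized numerically. The sketch copies the validation values from the training log above; with Keras they would normally be read from the `History` object returned by `model.fit` (for example `history.history["val_accuracy"]`). It shows that validation accuracy peaks at epoch 18, while validation loss is lowest at epoch 20, so the epoch one would keep depends on which metric is monitored.

```python
# Validation metrics copied from the 20-epoch training log above.
val_accuracy = [0.9286, 0.9378, 0.9398, 0.9422, 0.9466, 0.9548, 0.9554, 0.9466,
                0.9594, 0.9554, 0.9596, 0.9604, 0.9588, 0.9602, 0.9622, 0.9576,
                0.9530, 0.9640, 0.9602, 0.9620]
val_loss = [0.1879, 0.1595, 0.1499, 0.1443, 0.1410, 0.1380, 0.1359, 0.1383,
            0.1324, 0.1322, 0.1301, 0.1286, 0.1285, 0.1268, 0.1266, 0.1271,
            0.1310, 0.1229, 0.1251, 0.1213]


def best_epoch(values: list[float], mode: str) -> int:
    """1-based epoch with the highest ('max') or lowest ('min') value."""
    pick = max if mode == "max" else min
    return 1 + values.index(pick(values))


print(best_epoch(val_accuracy, "max"))  # 18
print(best_epoch(val_loss, "min"))      # 20
```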

Evaluation:

Scores:

============================== ANN Model Test Results ==============================

Model Accuracy:
 0.9657316750463085

Classification report:
               precision    recall  f1-score   support

           0       0.98      0.95      0.96      3498
           1       0.95      0.98      0.97      4060

    accuracy                           0.97      7558
   macro avg       0.97      0.96      0.97      7558
weighted avg       0.97      0.97      0.97      7558

The accuracy on the test data is ~96.6%.
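For reference, the reported accuracy 0.9657316750463085 equals 7299/7558, meaning 259 of the 7,558 test connections were misclassified. The per-class figures in the classification report follow the standard definitions: precision divides true positives by everything predicted as that class, and recall divides them by everything truly in that class. The NumPy sketch below computes these figures from a 2×2 confusion matrix. The article's own confusion matrix is not given, so the matrix used here is a hypothetical toy example.

```python
import numpy as np


def report_from_confusion(cm: np.ndarray) -> dict:
    """Per-class precision/recall/F1 plus accuracy and macro/weighted F1.

    cm[i, j] counts records whose true class is i and predicted class is j.
    """
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)          # row sums: true records per class
    precision = tp / cm.sum(axis=0)   # column sums: records predicted as each class
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": support,
        "accuracy": tp.sum() / cm.sum(),
        "macro_f1": f1.mean(),
        "weighted_f1": float(np.average(f1, weights=support)),
    }


toy = np.array([[90, 10],   # hypothetical: true class 0
                [5, 95]])   # hypothetical: true class 1
r = report_from_confusion(toy)
print(r["precision"].round(3), r["recall"].round(3), round(r["accuracy"], 3))

print(7299 / 7558)  # the fraction behind the reported model accuracy
```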

Predictions for the test data:

Anomaly
Anomaly
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
Normal
Normal
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
.
.
.
Normal
Normal
Normal
Normal
Normal
Anomaly
Anomaly
Anomaly
Normal
Normal
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
Anomaly
Anomaly
Normal
Normal
Normal
Normal
Anomaly
Anomaly
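The single output unit of the final Dense layer has to be thresholded to produce labels like those above. This is a sketch under two assumptions that the article does not state: the output is a sigmoid probability thresholded at 0.5, and class 1 means Normal. Alphabetical label encoding of "anomaly"/"normal" would produce that second mapping.

```python
import numpy as np

# Assumed label mapping: alphabetical encoding gives anomaly -> 0, normal -> 1.
CLASS_NAMES = np.array(["Anomaly", "Normal"])


def to_labels(probabilities: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn sigmoid outputs of shape (n, 1) or (n,) into class names."""
    predicted = (np.ravel(probabilities) > threshold).astype(int)
    return CLASS_NAMES[predicted]


probs = np.array([[0.03], [0.91], [0.50], [0.77]])  # hypothetical model.predict output
print(to_labels(probs).tolist())  # ['Anomaly', 'Normal', 'Anomaly', 'Normal']
```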

Conclusion:

A deep learning model for detecting encroachments in networks was built, reaching an accuracy of ~96.6% on the test data.

Platform: cAInvas

Code: Here

Written By: Dheeraj Perumandla

