Driver Drowsiness Detection using CNN

Drowsy driving is a deadly combination of driving and sleepiness. The number of road accidents caused by drowsy driving is increasing at an alarming rate worldwide. Lack of proper sleep is the main cause of drowsiness while driving.

However, other reasons like sleep disorders, medication, alcohol consumption, or driving during night shifts can also cause drowsiness while driving.

According to a report of AIIMS Neurology India —

Sleep disorders are the reason behind more than 20% of all road accidents, and around 23% of truck drivers suffer from sleep deprivation.

A separate report by the National Highway Traffic Safety Administration, USA, states that:

Drowsy Driving was responsible for around 72,000 crashes, 44,000 injuries and 800 deaths in 2013.

Whatever the reason for drowsiness, fatalities due to drowsy driving are increasing every year.

This article presents a solution for driver drowsiness detection using a Convolutional Neural Network. The implementation uses a custom CNN architecture with fewer than 250K trainable parameters, so that it can be deployed easily on edge devices or other devices with limited compute.

As a result, if the system detects that the driver has fallen asleep, it can alert the driver before anything dangerous happens.

The implementation uses the Cainvas Platform, which provides seamless execution of Python notebooks for building AI systems that can eventually be deployed on the edge (i.e., on embedded systems such as compact MCUs).
The notebook can be found here.

The flow of the article is as follows:

Description of the Problem Statement
Building the CNN Model
Drowsiness Detection Dataset
Training the Model
Performance of the Model
Testing the Model
Building a Pipeline for Predictions on Full Face Images
Conclusion

Description of the Problem Statement

The project aims to detect drowsiness while driving and alert the driver in time to prevent a mishap. It uses a CNN model to predict whether a person is drowsy based on whether the eyes are closed or open.

The project’s main objective was to limit the number of trainable parameters of the CNN model to under 250K so that the system can be deployed on edge devices or other devices with limited compute. The project has a direct application in the automobile industry: it makes driving safer and can reduce the death toll caused by drowsy driving.

Building the CNN Model

The implementation uses a custom-designed Convolutional Neural Network that has the following characteristics —

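Three convolution blocks with 2, 2, and 3 convolutional layers, respectively (seven in total, conv1 to conv7 in the summary below).
A BatchNormalization layer follows each convolutional layer.
Each convolution block ends with a Dropout layer, to reduce overfitting, followed by a MaxPool layer. The second block has one additional MaxPool layer after its first convolution.
Three fully connected layers follow the convolutional layers for classification.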

Defining the model using Keras gives the following summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1 (Conv2D)               (None, 32, 32, 32)        896       
_________________________________________________________________
batch_normalization (BatchNo (None, 32, 32, 32)        128       
_________________________________________________________________
conv2 (Conv2D)               (None, 32, 32, 32)        9248      
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
dropout (Dropout)            (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
_________________________________________________________________
conv3 (Conv2D)               (None, 16, 16, 64)        18496     
_________________________________________________________________
batch_normalization_2 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
conv4 (Conv2D)               (None, 8, 8, 64)          36928     
_________________________________________________________________
batch_normalization_3 (Batch (None, 8, 8, 64)          256       
_________________________________________________________________
dropout_1 (Dropout)          (None, 8, 8, 64)          0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
conv5 (Conv2D)               (None, 4, 4, 64)          36928     
_________________________________________________________________
batch_normalization_4 (Batch (None, 4, 4, 64)          256       
_________________________________________________________________
conv6 (Conv2D)               (None, 4, 4, 64)          36928     
_________________________________________________________________
batch_normalization_5 (Batch (None, 4, 4, 64)          256       
_________________________________________________________________
conv7 (Conv2D)               (None, 4, 4, 64)          36928     
_________________________________________________________________
batch_normalization_6 (Batch (None, 4, 4, 64)          256       
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 64)          0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 2, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 256)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 128)               32896     
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
fc2 (Dense)                  (None, 128)               16512     
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
fc3 (Dense)                  (None, 2)                 258       
=================================================================
Total params: 227,554
Trainable params: 226,786
Non-trainable params: 768
_________________________________________________________________
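The totals in the summary can be checked by hand. A convolution with a k×k kernel, C_in input channels and C_out filters has (k·k·C_in + 1)·C_out parameters, a dense layer has (n_in + 1)·n_out, and each BatchNormalization layer stores 4 values per channel, of which the moving mean and variance are non-trainable. The 3×3 kernel size is not stated in the article; it is inferred here from the reported counts (for conv1, (3·3·3 + 1)·32 = 896). A short sketch of the arithmetic:

```python
# Re-derive the parameter counts shown in the model summary.
# The 3x3 kernel size is inferred from the reported counts, not stated in the article.

def conv_params(c_in, c_out, k=3):
    """k*k*c_in weights per filter, plus one bias per filter."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """n_in weights per output unit, plus one bias per output unit."""
    return (n_in + 1) * n_out

def bn_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels

# (input channels, output channels) for conv1..conv7, read off the summary.
conv_shapes = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 64), (64, 64), (64, 64)]
# (inputs, outputs) for fc1..fc3; the flattened feature map is 2*2*64 = 256.
dense_shapes = [(256, 128), (128, 128), (128, 2)]

conv_total = sum(conv_params(i, o) for i, o in conv_shapes)
bn_total = sum(bn_params(o) for _, o in conv_shapes)  # one BN layer per conv layer
dense_total = sum(dense_params(i, o) for i, o in dense_shapes)

total = conv_total + bn_total + dense_total
non_trainable = bn_total // 2  # moving mean and variance
trainable = total - non_trainable

print(total, trainable, non_trainable)  # 227554 226786 768
```

The trainable count of 226,786 sits comfortably under the 250K budget, and most of it comes from the four 64-to-64 convolutions (36,928 parameters each).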

The model was compiled with the Adam optimizer and a learning rate of 0.0001.
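The architecture and compile step described above can be sketched in Keras roughly as follows. This is a reconstruction from the summary, not the original notebook code: the 3×3 kernels and 'same' padding are inferred from the parameter counts and output shapes, while the ReLU activations, dropout rates, softmax output and loss function are assumptions.

```python
# A sketch of a Keras model that reproduces the layer shapes in the summary.
# Assumed (not given in the article): ReLU activations, dropout rates,
# softmax output and the sparse categorical cross-entropy loss.

def build_model(input_shape=(32, 32, 3), conv_dropout=0.25, fc_dropout=0.5):
    # Imported inside the function so the sketch can be read and imported
    # on a machine without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    def conv_bn(filters, name):
        # 3x3 convolution with 'same' padding, followed by batch normalization.
        return [
            layers.Conv2D(filters, 3, padding="same", activation="relu", name=name),
            layers.BatchNormalization(),
        ]

    stack = [keras.Input(shape=input_shape)]
    # Block 1: conv1, conv2
    stack += conv_bn(32, "conv1") + conv_bn(32, "conv2")
    stack += [layers.Dropout(conv_dropout), layers.MaxPooling2D(2)]
    # Block 2: conv3, conv4 (the summary shows an extra pooling step after conv3)
    stack += conv_bn(64, "conv3") + [layers.MaxPooling2D(2)]
    stack += conv_bn(64, "conv4")
    stack += [layers.Dropout(conv_dropout), layers.MaxPooling2D(2)]
    # Block 3: conv5, conv6, conv7
    stack += conv_bn(64, "conv5") + conv_bn(64, "conv6") + conv_bn(64, "conv7")
    stack += [layers.Dropout(conv_dropout), layers.MaxPooling2D(2)]
    # Classifier head: three fully connected layers
    stack += [
        layers.Flatten(),
        layers.Dense(128, activation="relu", name="fc1"),
        layers.Dropout(fc_dropout),
        layers.Dense(128, activation="relu", name="fc2"),
        layers.Dropout(fc_dropout),
        layers.Dense(2, activation="softmax", name="fc3"),
    ]

    model = keras.Sequential(stack)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # from the article
        loss="sparse_categorical_crossentropy",               # assumption
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model().summary()` should list the same layer shapes and a total of 227,554 parameters, since the counts depend only on the layer sizes and not on the assumed activations or dropout rates.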

Drowsiness Detection Dataset

The project uses the Drowsiness_dataset, which is available on the Kaggle platform. The original dataset contains four classes for classifying images: Open Eyes, Closed Eyes, Yawning, and No-Yawning.

However, this project’s scope is to classify drowsiness based on whether the eyes are closed or open. So, I will be using only two classes of the dataset. Characteristics of the dataset are as follows —

The dataset contains a total of 1452 images in two categories.
Each category has 726 images.
The two classes are already balanced, so no rebalancing is needed.
Class labels: ‘Open Eye’ and ‘Closed Eye’.
Class labels were encoded such that 0 represents Open Eye and 1 represents Closed Eye.

Loading the dataset involves resizing every image to a shape of (32, 32, 3). The dataset is then split into train and test sets in an 80%-20% proportion.
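The loading and splitting steps could look roughly like the sketch below. The folder names, the use of OpenCV for reading and resizing, the pixel scaling and the random seed are all assumptions made for illustration; only the 32×32×3 target shape, the label encoding and the 80%-20% proportion come from the article.

```python
import math
import os
import random

IMG_SIZE = 32
# Hypothetical folder names; adjust to match how the Kaggle archive is unpacked.
# Label encoding follows the article: 0 = Open Eye, 1 = Closed Eye.
CLASS_DIRS = {"Open": 0, "Closed": 1}


def load_images(root):
    """Read each class folder under `root` and resize every image to 32x32x3."""
    import cv2          # imported lazily: only needed when reading image files
    import numpy as np

    images, labels = [], []
    for class_dir, label in CLASS_DIRS.items():
        folder = os.path.join(root, class_dir)
        for name in sorted(os.listdir(folder)):
            img = cv2.imread(os.path.join(folder, name))  # 3-channel BGR, or None
            if img is None:                               # skip unreadable files
                continue
            images.append(cv2.resize(img, (IMG_SIZE, IMG_SIZE)))
            labels.append(label)
    # Scaling pixels to [0, 1] is an assumption; the article does not state it.
    return np.asarray(images, dtype="float32") / 255.0, np.asarray(labels)


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle the sample indices and hold out ceil(test_fraction * n) for testing."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_test = math.ceil(test_fraction * n_samples)
    return order[n_test:], order[:n_test]  # (train indices, test indices)


train_idx, test_idx = split_indices(1452)
print(len(train_idx), len(test_idx))  # 1161 291
```

With arrays `X, y = load_images(root)`, the two sets are then `X[train_idx], y[train_idx]` and `X[test_idx], y[test_idx]`. For 1452 images, a 20% hold-out rounded up gives 291 test samples, which agrees with the total support in the classification report.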

Training the Model

The model is trained for 200 epochs with a batch size of 128. Keras’s ImageDataGenerator applies random transformations to the training images, which helps the model generalize better.
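A training loop along these lines might look like the following sketch. The epoch count and batch size are the ones given above; the specific augmentation settings are illustrative assumptions, as is the use of the test split for validation. It also assumes a TensorFlow 2.x release that still ships `ImageDataGenerator`, which newer Keras versions mark as deprecated.

```python
import math

EPOCHS = 200      # from the article
BATCH_SIZE = 128  # from the article


def steps_per_epoch(n_train, batch_size=BATCH_SIZE):
    """Number of batches needed to see every training image once."""
    return math.ceil(n_train / batch_size)


def train(model, X_train, y_train, X_test, y_test):
    """Fit `model` on randomly augmented batches of the training images."""
    # Lazy import so the helper above works without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # The augmentation settings below are assumptions, not the article's values.
    datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True,
    )
    return model.fit(
        datagen.flow(X_train, y_train, batch_size=BATCH_SIZE, shuffle=True),
        steps_per_epoch=steps_per_epoch(len(X_train)),
        epochs=EPOCHS,
        validation_data=(X_test, y_test),
    )


print(steps_per_epoch(1161))  # 10
```

With 1161 training images (1452 minus the 291 held out for testing), each epoch consists of 10 batches, the last of which is smaller than 128.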

Performance of the Model

The following Performance Metrics are used —

Loss vs Number of Epochs Plot
Accuracy vs Number of Epochs Plot
Classification Report
Confusion Matrix

The results of the metrics mentioned above are as follows —

Classification Report —

              precision    recall  f1-score   support

           0       0.99      0.98      0.99       169
           1       0.98      0.99      0.98       122

    accuracy                           0.99       291
   macro avg       0.98      0.99      0.99       291
weighted avg       0.99      0.99      0.99       291

Confusion Matrix —
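A confusion matrix, and the per-class precision and recall that appear in the classification report, can be derived directly from the true and predicted labels. The sketch below does this in plain Python on a handful of made-up labels; the numbers are for illustration only and are not the model's results. In practice, `classification_report` and `confusion_matrix` from `sklearn.metrics` compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count predictions; rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def precision_recall(matrix, cls):
    """Precision and recall of class `cls` from a confusion matrix."""
    tp = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum: predicted as cls
    actual = sum(matrix[cls])                    # row sum: truly cls
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy labels for illustration only (0 = Open Eye, 1 = Closed Eye).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)                       # [[3, 1], [1, 3]]
print(precision_recall(cm, 1))  # (0.75, 0.75)
```

For this application the recall of class 1 (Closed Eye) matters most, because a missed closed eye means a drowsy driver goes undetected.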

Testing the Model

The predictions of the model on images of eyes can be seen in the following pictures —

Building a Pipeline for Predictions on Full Face Images

The final step of this project is to build a pipeline for making predictions on full-face images. The pipeline includes face detection, face alignment, eye detection, preprocessing the region of interest (ROI) of the image, passing it to the model for prediction, and displaying the results on the image.

The implementation is as follows —

The implementation uses the ‘dlib’ library for face detection in the image. Face alignment is done using the FaceAligner class from the imutils.face_utils module for better eye detection. Eye detection is performed using Haar Cascade Classifiers.
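The stages named above could be wired together roughly as in the sketch below. It is an illustration under stated assumptions rather than the original notebook code: `landmarks_path` is a hypothetical path to a dlib facial landmark model file (which `FaceAligner` needs), the `haarcascade_eye.xml` cascade bundled with OpenCV is assumed for eye detection, and the ROI is assumed to be scaled to [0, 1] in the same way as the training images.

```python
EYE_SIZE = 32
LABELS = {0: "Open", 1: "Closed"}  # label encoding from the dataset section


def label_from_probs(probs):
    """Map the model's two-class output to a label and its probability."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[idx], float(probs[idx])


def predict_eyes(image_bgr, model, landmarks_path):
    """Detect faces, align them, find eyes and classify each eye crop."""
    # Lazy imports so label_from_probs works without these libraries installed.
    import cv2
    import dlib
    import numpy as np
    from imutils.face_utils import FaceAligner

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(landmarks_path)  # hypothetical model path
    aligner = FaceAligner(predictor, desiredFaceWidth=256)
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml"
    )

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    results = []
    for rect in detector(gray, 1):                     # 1 = upsample the image once
        face = aligner.align(image_bgr, gray, rect)    # rotated, cropped face
        face_gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
        eyes = eye_cascade.detectMultiScale(face_gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in eyes:
            roi = cv2.resize(face[y:y + h, x:x + w], (EYE_SIZE, EYE_SIZE))
            batch = np.expand_dims(roi.astype("float32") / 255.0, axis=0)
            probs = model.predict(batch, verbose=0)[0]
            label, prob = label_from_probs(probs)
            # Box coordinates are relative to the aligned face, not the input image.
            results.append(((x, y, w, h), label, prob))
    return results


print(label_from_probs([0.1, 0.9]))  # ('Closed', 0.9)
```

Each returned tuple holds an eye bounding box, the predicted label and its probability, which can then be drawn on the aligned face with `cv2.rectangle` and `cv2.putText`.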

Finally, the results can be viewed as follows —

Conclusion

Driver Drowsiness is a significant reason for thousands of road accidents all over the world. Driver drowsiness detection is a car safety technology that helps prevent accidents caused by the driver getting drowsy.

The project provides a solution for driver drowsiness detection using a CNN and image processing. It also aimed to keep the model under 250K parameters for easy deployment on edge devices.

This deployment is possible through the Cainvas Platform by making use of its compiler, deepC, which effectively brings AI out to the edge in actual, physical real-world use cases.

Notebook link is here.

Credit: YUVNISH MALHOTRA