Credit Card Fraud Detection Using Deep Learning

Photo by XPLAI on Dribbble

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

Credit card fraud usually happens when someone obtains your credit or debit card number, through an unsecured website or an identity theft scheme, and uses it to obtain money or property. Because such fraud is frequent and can harm both individuals and financial institutions, it is critical both to take preventive steps and to recognize fraudulent transactions when they occur.

Dataset

https://www.kaggle.com/mlg-ulb/creditcardfraud

The dataset can be downloaded from the link above.

  • 492 frauds out of 284,807 transactions
  • Features V1 to V28 are the result of a PCA transformation; they are purely numerical, and the original features behind them are not disclosed
  • “Amount” is the transaction amount
  • “Time” is the number of seconds elapsed between each transaction and the first transaction in the dataset
  • “Class” is the label: Fraud = 1, Not Fraud = 0
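The imbalance is worth quantifying before any modeling. The short calculation below uses only the counts listed above: frauds make up about 0.17% of the transactions, so a trivial model that labels every transaction as Not Fraud would already score about 99.83% accuracy. This is why raw accuracy on the full dataset says little on its own.

```python
# Class counts taken from the dataset description above
n_total = 284_807
n_fraud = 492
n_not_fraud = n_total - n_fraud

fraud_rate = n_fraud / n_total
# Accuracy of a baseline that predicts "Not Fraud" (0) for every transaction
baseline_accuracy = n_not_fraud / n_total

print(f"fraud rate: {fraud_rate:.4%}")                         # fraud rate: 0.1727%
print(f"all-zero baseline accuracy: {baseline_accuracy:.4%}")  # all-zero baseline accuracy: 99.8273%
```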

The dataset can be imported with the following lines of code:

https://gist.github.com/dheerajskylark/5e856c99c251ca0d17e7db7c37228e6c
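The gist is not reproduced here. As a minimal sketch, assuming pandas and the `creditcard.csv` file from the Kaggle download (columns Time, V1 to V28, Amount, and Class), features and labels can be separated like this. The helper name `load_creditcard` is hypothetical, not taken from the original code.

```python
import pandas as pd


def load_creditcard(path):
    """Hypothetical helper: load the Kaggle CSV and split features from labels."""
    df = pd.read_csv(path)
    # "Class" holds the label: 1 = Fraud, 0 = Not Fraud
    y = df["Class"].to_numpy()
    # The remaining 30 columns (Time, V1..V28, Amount) are the model inputs
    X = df.drop(columns=["Class"]).to_numpy(dtype="float32")
    return X, y


# Usage (the path is an assumption; point it at your download):
# X, y = load_creditcard("creditcard.csv")
# print(X.shape, y.sum())  # (284807, 30) and 492 for the full file
```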

Data Visualization:

The count per class label is visualized below:

count per class

Histograms of each column's values are shown below:

histograms for each column

Balancing the Dataset

Because the dataset is highly imbalanced (frauds are only about 0.17% of all transactions), it needs to be rebalanced so that the two classes are closer in size.

https://gist.github.com/dheerajskylark/0267b3ca22fb793d03626f38d7b9d2c2

Visualization of the data after balancing
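The balancing gist is not shown here, and the exact target ratio is not stated. The evaluation support reported later (794 Not Fraud vs. 105 Fraud) suggests the classes were brought closer together rather than made exactly equal. Below is a minimal sketch of one common approach, random undersampling of the majority class with NumPy. The function name `undersample` and its `n_majority` parameter are hypothetical.

```python
import numpy as np


def undersample(X, y, n_majority, seed=0):
    """Keep every fraud row and a random subset of n_majority non-fraud rows."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    normal_idx = np.flatnonzero(y == 0)
    # Sample without replacement so no non-fraud row is duplicated
    keep_normal = rng.choice(normal_idx, size=n_majority, replace=False)
    idx = np.concatenate([fraud_idx, keep_normal])
    rng.shuffle(idx)  # mix the classes so rows are not ordered by label
    return X[idx], y[idx]
```

Passing `n_majority` equal to the number of frauds gives an exact 1:1 dataset. A larger value keeps more normal transactions in exchange for a milder imbalance, and either way undersampling discards most of the Not Fraud rows.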

Standardization

The input data needs to be standardized because the features span very different ranges: Time and Amount, for example, take values far larger than those of the PCA components V1 to V28.

https://gist.github.com/dheerajskylark/5d860d6765d08dbbb9ebb3d768d8b2a1
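Standardization rescales each feature to zero mean and unit variance, z = (x - mean) / std. Below is a minimal NumPy sketch. It assumes a train/test split already exists and computes the statistics on the training split only, then reuses them for the test split so no test information leaks into training. scikit-learn's `StandardScaler` (fit on train, transform both) does the same job; whether the gist uses it is not stated here.

```python
import numpy as np


def standardize(X_train, X_test, eps=1e-8):
    """Z-score both splits using mean/std computed on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # eps guards against division by zero for a constant column
    X_train_scaled = (X_train - mean) / (std + eps)
    X_test_scaled = (X_test - mean) / (std + eps)
    return X_train_scaled, X_test_scaled
```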

Reshaping data

The data is reshaped into three dimensions, (samples, steps, channels), because that is the input shape a Conv1D layer expects.

https://gist.github.com/dheerajskylark/7fb7210c4ae17e0f9125f5c852d9275a
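In this layout each of the 30 input columns is treated as one step with a single channel. That is consistent with the first layer's output shape (None, 29, 32) and its 96 parameters in the model summary. A minimal NumPy sketch, with the helper name `to_conv1d_input` being hypothetical:

```python
import numpy as np


def to_conv1d_input(X):
    """Reshape (samples, features) into (samples, features, 1) for Conv1D."""
    return X.reshape(X.shape[0], X.shape[1], 1)


X_scaled = np.zeros((4, 30), dtype="float32")  # placeholder for standardized features
print(to_conv1d_input(X_scaled).shape)  # (4, 30, 1)
```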

Model Building:

A 1-D convolutional neural network (CNN) is built as follows.

https://gist.github.com/dheerajskylark/d7ebf4728f1227a126f3037e19b21bcb

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d (Conv1D)              (None, 29, 32)            96
_________________________________________________________________
dropout (Dropout)            (None, 29, 32)            0
_________________________________________________________________
batch_normalization (BatchNo (None, 29, 32)            128
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 28, 64)            4160
_________________________________________________________________
dropout_1 (Dropout)          (None, 28, 64)            0
_________________________________________________________________
flatten (Flatten)            (None, 1792)              0
_________________________________________________________________
dropout_2 (Dropout)          (None, 1792)              0
_________________________________________________________________
dense (Dense)                (None, 64)                114752
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 119,201
Trainable params: 119,137
Non-trainable params: 64
_________________________________________________________________
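The parameter counts in the summary can be checked by hand, which also exposes settings the summary does not print. A Conv1D layer has (kernel_size × input_channels + 1) × filters parameters, so 96 parameters with 32 filters and 1 input channel imply kernel_size = 2. With no padding, that also explains how the 30 input steps shrink to 29 and then 28. BatchNormalization has 4 parameters per channel, and 2 of them (moving mean and variance) are non-trainable. The sketch below reproduces the totals; the kernel size is inferred from the counts, not read from the gist.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # one weight per (kernel position, input channel, filter) plus one bias per filter
    return (kernel_size * in_channels + 1) * filters


def dense_params(inputs, units):
    # weight matrix plus one bias per unit
    return (inputs + 1) * units


def batchnorm_params(channels):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * channels, 2 * channels


KERNEL = 2  # inferred from the 96-parameter first layer, not stated in the article
STEPS = 30  # Time, V1..V28, Amount

conv1 = conv1d_params(KERNEL, 1, 32)               # 96
bn_total, bn_frozen = batchnorm_params(32)         # 128, 64
conv2 = conv1d_params(KERNEL, 32, 64)              # 4160
steps_after_conv2 = STEPS - 2 * (KERNEL - 1)       # no padding: 30 -> 29 -> 28
dense1 = dense_params(steps_after_conv2 * 64, 64)  # 1792 inputs -> 114752
dense2 = dense_params(64, 1)                       # 65

total = conv1 + bn_total + conv2 + dense1 + dense2
print(total, total - bn_frozen, bn_frozen)         # 119201 119137 64
```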

Model Training

The model is fitted on the scaled and reshaped data for 20 epochs.

https://gist.github.com/dheerajskylark/22276a10c0c904248e7edd85978e7eaa

Epoch 1/20
113/113 [==============================] - 0s 4ms/step - loss: 0.3254 - accuracy: 0.8984 - val_loss: 0.3610 - val_accuracy: 0.9800
Epoch 2/20
113/113 [==============================] - 0s 3ms/step - loss: 0.1309 - accuracy: 0.9683 - val_loss: 0.1611 - val_accuracy: 0.9844
Epoch 3/20
113/113 [==============================] - 0s 3ms/step - loss: 0.1125 - accuracy: 0.9722 - val_loss: 0.0813 - val_accuracy: 0.9844
Epoch 4/20
113/113 [==============================] - 0s 3ms/step - loss: 0.1001 - accuracy: 0.9763 - val_loss: 0.0610 - val_accuracy: 0.9855
Epoch 5/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0950 - accuracy: 0.9758 - val_loss: 0.0533 - val_accuracy: 0.9855
Epoch 6/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0891 - accuracy: 0.9777 - val_loss: 0.0506 - val_accuracy: 0.9889
Epoch 7/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0846 - accuracy: 0.9780 - val_loss: 0.0494 - val_accuracy: 0.9889
Epoch 8/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0935 - accuracy: 0.9772 - val_loss: 0.0489 - val_accuracy: 0.9889
Epoch 9/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0859 - accuracy: 0.9783 - val_loss: 0.0481 - val_accuracy: 0.9889
Epoch 10/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0786 - accuracy: 0.9786 - val_loss: 0.0487 - val_accuracy: 0.9900
Epoch 11/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0773 - accuracy: 0.9791 - val_loss: 0.0485 - val_accuracy: 0.9889
Epoch 12/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0748 - accuracy: 0.9811 - val_loss: 0.0473 - val_accuracy: 0.9889
Epoch 13/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0704 - accuracy: 0.9816 - val_loss: 0.0474 - val_accuracy: 0.9900
Epoch 14/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0798 - accuracy: 0.9763 - val_loss: 0.0466 - val_accuracy: 0.9889
Epoch 15/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0720 - accuracy: 0.9811 - val_loss: 0.0461 - val_accuracy: 0.9900
Epoch 16/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0745 - accuracy: 0.9808 - val_loss: 0.0469 - val_accuracy: 0.9900
Epoch 17/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0710 - accuracy: 0.9800 - val_loss: 0.0461 - val_accuracy: 0.9900
Epoch 18/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0722 - accuracy: 0.9816 - val_loss: 0.0467 - val_accuracy: 0.9900
Epoch 19/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0734 - accuracy: 0.9816 - val_loss: 0.0458 - val_accuracy: 0.9889
Epoch 20/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0675 - accuracy: 0.9819 - val_loss: 0.0458 - val_accuracy: 0.9900

The network reaches a training accuracy of ~98% (0.9819 after the final epoch) and a validation accuracy of ~99% (0.9900).
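The "113/113" on each log line is the number of batches per epoch, ceil(N / batch_size). Assuming the Keras default batch_size of 32 was left unchanged (the gist is not reproduced here), 113 batches means the training split holds between 3,585 and 3,616 samples. A quick check:

```python
import math

BATCH_SIZE = 32  # assumption: the Keras default for model.fit


def steps_per_epoch(n_samples, batch_size=BATCH_SIZE):
    # the last batch may be partial, hence the ceiling
    return math.ceil(n_samples / batch_size)


# Range of training-set sizes that yield the 113 steps seen in the log
sizes = [n for n in range(1, 10_000) if steps_per_epoch(n) == 113]
print(min(sizes), max(sizes))  # 3585 3616
```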

The accuracy per epoch is plotted below:

Number of epochs vs. accuracy for training and validation

The loss per epoch is plotted below:

Number of epochs vs. loss for training and validation

Evaluation:

https://gist.github.com/dheerajskylark/c5839d618015c3ef74968638931f4702

Confusion Matrix (0-Not Fraud, 1-Fraud)

https://gist.github.com/dheerajskylark/323d6d76aff3c4b47b7d14746d71521f

0.9899888765294772

This is a test accuracy of ~99%: 890 of the 899 test transactions are classified correctly.
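The final Dense layer has a single output. Assuming it uses a sigmoid activation (typical for binary classification, but not shown in the summary), each prediction is a fraud probability. The minimal NumPy sketch below turns those probabilities into labels with a 0.5 threshold and builds the 2×2 confusion matrix, with rows as the true class and columns as the predicted class (0 = Not Fraud, 1 = Fraud). That is the same layout scikit-learn's `confusion_matrix` produces. The threshold and the helper name `confusion_from_probs` are assumptions.

```python
import numpy as np


def confusion_from_probs(y_true, probs, threshold=0.5):
    """Return a 2x2 matrix [[TN, FP], [FN, TP]] and the accuracy."""
    # ravel() handles the (n, 1) shape that model.predict returns
    y_pred = (np.asarray(probs).ravel() >= threshold).astype(int)
    y_true = np.asarray(y_true).ravel().astype(int)
    cm = np.zeros((2, 2), dtype=int)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat
    np.add.at(cm, (y_true, y_pred), 1)
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy
```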

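Other metrics (precision, recall, F-score, and support for each class, 0 = Not Fraud and 1 = Fraud) are calculated as follows: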

https://gist.github.com/dheerajskylark/e0a509a047eefddddc6ff0dae3749a4e

precision: [0.99001248 0.98979592]
recall: [0.99874055 0.92380952]
fscore: [0.99435737 0.95566502]
support: [794 105]
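All of these numbers trace back to a single confusion matrix. With 794 Not Fraud and 105 Fraud test samples, the recalls 0.99874055 and 0.92380952 correspond to 793 and 97 correct predictions, so TN = 793, FP = 1, FN = 8, TP = 97. In other words, 8 of the 105 frauds in the test split were missed. These counts are derived from the metrics above, not copied from the article's confusion-matrix figure. The sketch below recomputes every printed value, plus the test accuracy of 890/899.

```python
# Counts derived from the reported support and recall (0 = Not Fraud, 1 = Fraud)
TN, FP, FN, TP = 793, 1, 8, 97


def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


# Class 1 (Fraud): fraud is the positive class
p1, r1, f1 = prf(TP, FP, FN)
# Class 0 (Not Fraud): roles swap, so TN plays the part of true positives
p0, r0, f0 = prf(TN, FN, FP)

accuracy = (TN + TP) / (TN + FP + FN + TP)
print(f"precision: {p0:.8f} {p1:.8f}")  # precision: 0.99001248 0.98979592
print(f"recall:    {r0:.8f} {r1:.8f}")  # recall:    0.99874055 0.92380952
print(f"fscore:    {f0:.8f} {f1:.8f}")  # fscore:    0.99435737 0.95566502
print(f"accuracy:  {accuracy:.10f}")    # accuracy:  0.9899888765
```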

Conclusion:

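A deep learning model for detecting credit card fraud is built. It reaches ~98% training accuracy and ~99% accuracy on the test split (899 transactions drawn from the rebalanced data), with a fraud recall of about 92%.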

Platform: cAInvas

Code: Here

Credit: Dheeraj Perumandla