Heart Disease Prediction using Neural Networks

Photo by Diana Pasternak on Dribbble

Heart disease refers to any condition affecting the heart. There are many types, some of which are preventable. Unlike cardiovascular disease, which includes problems with the entire circulatory system, heart disease affects only the heart.

This project focuses on predicting heart disease using neural networks. Based on attributes such as blood pressure, cholesterol levels, heart rate, and other clinical characteristics, patients will be classified according to varying degrees of coronary artery disease. The project uses a dataset of 303 patients distributed by the UCI Machine Learning Repository.

We will be using some common Python libraries, such as pandas, NumPy, and matplotlib. For the modeling side of this project, we will use sklearn (data splitting and metrics) and Keras (the neural networks).

We will work through the following steps one by one:

Importing necessary libraries

https://gist.github.com/staricon12/d9060566e2573273c5fd13c84f1bd80a

https://gist.github.com/staricon12/f183affd1299edcb59bf1394f263cc74

Importing the Dataset

First, we read the dataset into a pandas DataFrame.

This dataset contains patient data concerning heart disease diagnosis that was collected at several locations around the world. There are 76 attributes, including age, sex, resting blood pressure, cholesterol levels, echocardiogram data, exercise habits, and many others. To date, all published studies using this data focus on a subset of 14 attributes, so we will do the same. More specifically, we will use the data collected at the Cleveland Clinic Foundation.

https://gist.github.com/staricon12/f68ca994bfc99fae12f792ba7ae65c5d
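As a rough illustration of the loading step, the sketch below reads a headerless, comma-separated file into a DataFrame and names the 14 columns. The column names follow the usual abbreviations in the UCI documentation for this dataset, and the three rows are made up so the example runs offline; neither is taken from the author's gist.

```python
import io
import pandas as pd

# Column names for the 14-attribute subset (assumed from the UCI
# documentation); "class" is the diagnosis: 0 = no disease, 1-4 = disease.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "class"]

# Made-up rows in the same comma-separated layout, so the example runs
# without a download. In practice, pass the path or URL of the Cleveland
# data file to read_csv instead of this buffer.
SAMPLE = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "58.0,0.0,4.0,130.0,250.0,0.0,0.0,140.0,1.0,1.2,2.0,?,3.0,2\n"
    "45.0,1.0,3.0,120.0,210.0,0.0,0.0,170.0,0.0,0.0,1.0,0.0,?,0\n"
)

# The file has no header row, so the names are supplied explicitly.
df = pd.read_csv(SAMPLE, names=COLUMNS)
print(df.shape)
```

Note that the "?" placeholders survive the read as text, which is why the affected columns need cleaning before any numeric analysis.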

Next, we print the DataFrame to see how many examples we have.

https://gist.github.com/staricon12/8a4b7cc8997f8b9573867291d0624eb7

Now, for preprocessing the data, we remove missing data (indicated with a “?”).

https://gist.github.com/staricon12/2001d9ddcb66fc8c854d67e37a6b6f87

Now, we drop the rows that contain NaN values from the DataFrame.

https://gist.github.com/staricon12/09a6de41d10447b23931b2662c402b6a

Now, we convert the data to numeric types to enable further analysis.

https://gist.github.com/staricon12/27ac05505aff0bb1d6093fbabd173fef
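The three preprocessing steps above (marking "?" as missing, dropping incomplete rows, converting to numeric) can be sketched on a tiny hypothetical frame. The values below are invented for illustration, and this is one plausible way to do it in pandas rather than a copy of the linked gists.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: "?" marks a missing value, so the affected
# columns hold text rather than numbers.
raw = pd.DataFrame({
    "age": [63.0, 58.0, 45.0, 51.0],
    "ca": ["0.0", "?", "2.0", "1.0"],
    "thal": ["6.0", "3.0", "?", "7.0"],
    "class": [0, 2, 0, 1],
})

# 1. Turn the "?" placeholder into a real missing value.
marked = raw.replace("?", np.nan)

# 2. Drop every row that contains at least one missing value.
complete = marked.dropna(axis=0)

# 3. Convert all columns to numeric dtypes.
clean = complete.apply(pd.to_numeric)

print(clean.shape)
print(clean.dtypes)
```

Dropping rows is the simplest option and costs little when only a handful of rows are incomplete; imputing the missing values would be an alternative if more data were affected.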

Now, we print data characteristics, using pandas built-in describe() function.

https://gist.github.com/staricon12/23de38430c81f3d60921b6d67f7be841

Now, we are plotting the histograms for each variable.

https://gist.github.com/staricon12/572d680e8c1dcba6253e3aafed494425

https://gist.github.com/staricon12/d0331a02c86d1a4b218eeab7c36b11de

https://gist.github.com/staricon12/45895dce1623cb32e09b55a84a9000b3

https://gist.github.com/staricon12/6b3dca266b7b74d587efaf92cacf2948

Create Training and Testing Datasets

Now that we have preprocessed the data appropriately, we can split it into training and testing datasets. We will use Sklearn’s train_test_split() function to generate a training dataset (80 percent of the total data) and a testing dataset (20 percent of the total data).

https://gist.github.com/staricon12/01a615b81f3585df61cad2b356efecfb

https://gist.github.com/staricon12/a941037de1578e300c07901a1bb9697d
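For reference, an 80/20 split with train_test_split() might look like the sketch below. The random arrays are placeholders for the cleaned features and labels, and the random_state value is an arbitrary choice made here only for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the cleaned table: 13 feature columns per
# patient and one integer class label (0-4) per patient.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 13))
labels = rng.integers(0, 5, size=100)

# test_size=0.2 holds out 20 percent of the rows; random_state makes the
# shuffle reproducible (the seed value itself is arbitrary).
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```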

Now, we create the X (features) and Y (labels) datasets for training.

https://gist.github.com/staricon12/cce93354486477b8cd186affd1a4b43d

Then, we convert the class labels to categorical (one-hot) form.

https://gist.github.com/staricon12/4d8afe7ee0ae6980225438a862650545
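To make the categorical conversion concrete: each integer class label becomes a row with a single 1 in the position of its class. Keras provides a to_categorical utility for this; the sketch below reproduces the idea in plain NumPy, with a hypothetical helper name and made-up labels, so it runs without Keras installed.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

# Made-up diagnosis labels (0 = no disease, 1-4 = severity levels).
y = np.array([0, 2, 1, 0, 4])
Y = one_hot(y, num_classes=5)
print(Y)
```

This layout is what a softmax output layer with five units expects as its training target.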

Building and Training the Neural Network

Now that we have our data fully processed and split into training and testing datasets, we can begin building a neural network to solve this classification problem. Using Keras, we will define a simple neural network with one hidden layer. Since this is a categorical classification problem, we will use a softmax activation function in the final layer of our network and a categorical_crossentropy loss during our training phase.

https://gist.github.com/staricon12/86aa6bca04768a06987b22026078cb9c
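To show what the softmax activation and the categorical_crossentropy loss actually compute, here is a small NumPy sketch. It is not the Keras model itself: the function names and the example numbers are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-(y_true * np.log(y_prob)).sum(axis=1).mean())

# Hypothetical raw outputs of the final layer for two patients, five classes.
logits = np.array([[2.0, 0.5, 0.1, 0.1, 0.1],
                   [0.2, 0.2, 3.0, 0.2, 0.2]])
# One-hot targets: the first patient is class 0, the second is class 2.
y_true = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0]], dtype=float)

probs = softmax(logits)
print(probs.sum(axis=1))                        # each row sums to 1
print(categorical_crossentropy(y_true, probs))  # lower is better
```

Softmax turns the five raw outputs into probabilities that sum to 1, and the loss penalizes the model according to how little probability it assigned to the true class.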

Now, we fit the model to the training data.

https://gist.github.com/staricon12/8b7912cf5a4da3cfae939ef93cf7a939

Now, we plot the model accuracy over the training epochs.

https://gist.github.com/staricon12/80239dcf18a9f65ebcd86f94be2e0515

Now, we plot the model loss over the training epochs.

https://gist.github.com/staricon12/4057f2f7ca7646f68fee0a6040eab1ae
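A plotting helper for the accuracy and loss curves could look roughly like this. It assumes the per-epoch values come from the history attribute of the object returned by Keras's fit(), which is a dictionary keyed by metric name; the numbers below are placeholders, not results from this project. Depending on the Keras version, the accuracy key may be "acc" rather than "accuracy".

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt

def plot_metric(history, metric, title):
    """Draw the training curve (and validation curve, if present) for one metric."""
    fig, ax = plt.subplots()
    ax.plot(history[metric], label="train")
    # Validation values exist only if validation data was passed to fit().
    if "val_" + metric in history:
        ax.plot(history["val_" + metric], label="test")
    ax.set_title(title)
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    return fig

# Placeholder numbers standing in for a real training history.
history = {
    "accuracy": [0.50, 0.58, 0.63],
    "val_accuracy": [0.48, 0.55, 0.60],
    "loss": [1.40, 1.10, 0.95],
    "val_loss": [1.45, 1.20, 1.05],
}

acc_fig = plot_metric(history, "accuracy", "Model accuracy")
loss_fig = plot_metric(history, "loss", "Model loss")
```

In a notebook, calling plt.show() after each call displays the figure. A widening gap between the train and test curves is a common sign of overfitting.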

Improving Results — A Binary Classification Problem

Although we achieved promising results, we still have a fairly large error. This could be because it is very difficult to distinguish between the different severity levels of heart disease (classes 1–4). Let’s simplify the problem by converting the data to a binary classification problem — heart disease or no heart disease.

https://gist.github.com/staricon12/f7c607c9a36445ac20e3be3ad089f13b
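The conversion to a binary problem amounts to collapsing classes 1 through 4 into a single "disease" class. A minimal sketch with made-up labels:

```python
import numpy as np

# Hypothetical diagnosis labels: 0 = no disease, 1-4 = increasing severity.
y_multi = np.array([0, 2, 1, 0, 4, 3, 0])

# Any positive label becomes 1 (disease present); 0 stays 0.
y_binary = (y_multi > 0).astype(int)
print(y_binary)  # [0 1 1 0 1 1 0]
```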

Now, we define a new Keras model for binary classification and fit it to the training data. Afterwards, we check its accuracy and loss by plotting the corresponding graphs.

https://gist.github.com/staricon12/9a74e01e4d4ee385f764c89b0e3983a9

https://gist.github.com/staricon12/3d426e55f630514e10c785c5aead1f07

Now, we plot the model accuracy again, this time for the binary classification model.

https://gist.github.com/staricon12/cb1cfbe346afcb2a743f1fdb6622b794

Now, we plot the model loss for the binary classification model.

https://gist.github.com/staricon12/1de0b0b3bce56b1b27c745567b32d3b6

Results and Metrics

The accuracy results we have been seeing are for the training data, but what about the testing dataset? If our models cannot generalize to data that wasn’t used to train them, they won’t provide any utility.

Let’s test the performance of both our categorical model and binary model. To do this, we will make predictions on the testing dataset and calculate performance metrics using Sklearn.

https://gist.github.com/staricon12/87382ac3b47ff924e6886bf4db001c47

https://gist.github.com/staricon12/d3faee8376f99487cc64ff51751c3d84
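As an illustration of the evaluation step, the sketch below turns predicted class probabilities into class labels and scores them with sklearn. The labels and probabilities are invented stand-ins for a model's output on the testing dataset, not results from this project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical test-set labels and predicted class probabilities
# (one row per patient, one column per class).
y_test = np.array([0, 0, 1, 2, 0, 1])
pred_probs = np.array([
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.50, 0.20, 0.30],   # misclassified: the true class is 2
    [0.90, 0.05, 0.05],
    [0.30, 0.60, 0.10],
])

# The predicted class is the column with the highest probability.
y_pred = np.argmax(pred_probs, axis=1)

print(accuracy_score(y_test, y_pred))
# zero_division=0 avoids a warning for classes that are never predicted.
print(classification_report(y_test, y_pred, zero_division=0))
```

For a binary model with a single output unit, the predictions would typically be thresholded (for example at 0.5) instead of passed through argmax. The per-class precision and recall in the report matter here because the severity classes are unevenly represented.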

Finally, we save our model.

https://gist.github.com/staricon12/7923f474aa47ed0faadaa920236cb80e

That concludes the heart disease prediction project.

You can download or go through the notebook from the link given here.

Credit: Bhupendra Singh Rathore