Diabetes Prediction

Diabetes is a serious problem which many people face nowadays and which can lead to other serious health diseases.During the period of Covid-19 we also came to know that the conditions of a diabetic patient is much more critical than a non-diabetic patient.So if we can take help of deep learning to predict the risk of getting diabetic and early prediction of diabetic will help people take care of their health and prevent themselves from getting diabetic.

Photo by EPM Graduation Chamber, Paulista School of Medicine

Table of Content

  • Introduction to cAInvas
  • Importing the Dataset
  • Data Analysis and Data Cleaning
  • Trainset-TestSet Creation
  • Model Architecture and Model Training
  • Introduction to DeepC
  • Compilation with DeepC

Introduction to cAInvas

cAInvas is an integrated development platform to create intelligent edge devices.Not only we can train our deep learning model using Tensorflow,Keras or Pytorch, we can also compile our model with its edge compiler called DeepC to deploy our working model on edge devices for production.The Diabetes Prediction model which we are going to talk about, is also developed on cAInvas. All the dependencies which you will be needing for this project are also pre-installed.

cAInvas also offers various other deep learning notebooks in its gallery which one can use for reference or to gain insight about deep learning.It also has GPU support and which makes it the best in its kind.

Importing the dataset

While working on cAInvas one of its key features is UseCases Gallary.When working on any of its UseCases you don’t have to look for data manually.The data is in the of table and is present as csv format.We will load the dataset through pandas as a dataframe in our workspace.

https://gist.github.com/Gunjan933/e5f577b74012a844456f07a5100800d3

Data Analysis and Data Cleaning

To gain information about the data that we are dealing with, we will use the command df.info() and we get the following information.

RangeIndex: 2000 entries, 0 to 1999
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 2000 non-null int64
1 Glucose 2000 non-null int64
2 BloodPressure 2000 non-null int64
3 SkinThickness 2000 non-null int64
4 Insulin 2000 non-null int64
5 BMI 2000 non-null float64
6 DiabetesPedigreeFunction 2000 non-null float64
7 Age 2000 non-null int64
8 Outcome 2000 non-null int64
dtypes: float64(2), int64(7)

Next we will remove the duplicate data, replace zero with NaN, then drop all NaN values from dataframe and again obtain the information.For this we can run the following commands;

https://gist.github.com/Gunjan933/9e5784a2b0107bec48e17d6e6c15fde2

TrainSet-TestSet Creation

Next step is to create the train data and test data.For this we will drop the ‘outcome’ column from the dataframe and store it in a variable.For the labels we will use the ‘outcome’ column and store it in another variable.Then we will use the scikit learn’s train-test split module to create the train and test dataset.

https://gist.github.com/Gunjan933/b8f6d6f0bd1c06d7e7797a22ba825813

Model Architecture and Model Training

After creating the dataset next step is to pass our training data for our Deep Learning model to learn to classify Diabetic and Non-Diabetic Pateients.The model architecture used was:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 8) 72
_________________________________________________________________
dense_1 (Dense) (None, 8) 72
_________________________________________________________________
dense_2 (Dense) (None, 4) 36
_________________________________________________________________
dense_3 (Dense) (None, 1) 5
=================================================================
Total params: 185
Trainable params: 185
Non-trainable params: 0
_________________________________________________________________

The loss function used was “binary_crossentropy” and optimizer used was “Adam”.For training the model we used Keras API with tensorflow at backend.The model showed good performance achieving a decent accuracy.Here are the training plots for the model:

Introduction to DeepC

DeepC Compiler and inference framework is designed to enable and perform deep learning neural networks by focussing on features of small form-factor devices like micro-controllers, eFPGAs, cpus and other embedded devices like raspberry-pi, odroid, arduino, SparkFun Edge, risc-V, mobile phones, x86 and arm laptops among others.

DeepC also offers ahead of time compiler producing optimized executable based on LLVM compiler tool chain specialized for deep neural networks with ONNX as front end.

Compilation with DeepC

While training the model, we saved the best model using the Keras’ Model Checkpoint and saved it in H5 format using Keras as it easily stores the weights and model configuration in a single file.

After saving the file in H5 format we can easily compile our model using DeepC compiler which comes as a part of cAInvas platform so that it converts our saved model to a format which can be easily deployed to edge devices.And all this can be done very easily using a simple command.

https://gist.github.com/Gunjan933/f9299a839a42654a512ee6724e59cf4a

And that’s it, it is that easy to create a Diabetes Prediction model which is also ready for deployment on edge devices.

Link for the cAInvas Notebook : https://cainvas.ai-tech.systems/use-cases/diabetes-prediction-app/

Credit : Ashish Arya