Resume Screening using Deep Learning on Cainvas

Photo by Joe Le Huquet on Dribbble

Resume screening is necessary when companies receive thousands of applications for different roles and need to find suitable matches.

For this project, the dataset consists of 2 columns: Category and Resume, where Category denotes the field (e.g. Data Science, HR, Testing). Using the resume text as input, we need to classify it into one of the categories.

Content –

  • Analysing the Dataset
  • Pre-processing
  • Tokenize features and label
  • Training model
  • Evaluation

Analysis of the Dataset –

By using value_counts on the Category column, we can find the frequency of each category present in our dataset.

resume['Category'].value_counts()

We can visualise the most common words across all the resumes in our dataset by using nltk and wordcloud.
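A minimal sketch of this step, assuming the resumes live in a DataFrame called resume with a Resume column (the variable names here are illustrative):

import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud
import matplotlib.pyplot as plt

nltk.download('stopwords')

# Join all resumes into one string and build the cloud,
# excluding common English stopwords
text = ' '.join(resume['Resume'])
cloud = WordCloud(stopwords=set(stopwords.words('english')),
                  width=800, height=400).generate(text)

plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()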

The following wordcloud is obtained –

Pre-processing the Resumes –

During pre-processing, we need to remove links, hashtags, mentions and special characters, as these are irrelevant to classification. Further, using nltk, we also remove stopwords (e.g. words like ‘are’, ‘the’, ‘or’) that add no significance to the content.

https://gist.github.com/AmrutaKoshe/b18ff1926250ebc7e78428abf1cc9bc2
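For reference, a minimal sketch of such a cleaning function; the regular expressions and names below are illustrative, not the exact code from the gist:

import re
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

def clean_resume(text):
    text = re.sub(r'http\S+', ' ', text)     # remove URLs/links
    text = re.sub(r'[@#]\S+', ' ', text)     # remove mentions and hashtags
    text = re.sub(r'[^a-zA-Z ]', ' ', text)  # keep letters only
    # drop stopwords such as 'are', 'the', 'or'
    words = [w for w in text.lower().split() if w not in stop_words]
    return ' '.join(words)

resume['cleaned'] = resume['Resume'].apply(clean_resume)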

Tokenizing Features and Labels –

After cleaning, pre-processing and splitting the data into train and test sets, we need to tokenize the features and labels, weighting them so that the most frequent words are given less weight and the less frequent words more significance.

This makes redundant words less important and the distinctive words more useful to the classifier.

  • Tokenizing features –

https://gist.github.com/AmrutaKoshe/3799552aea94a6c494339842648ccc1e
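One way to achieve this weighting is the Keras Tokenizer in TF-IDF mode. The sketch below assumes a vocabulary cap of 5000 words and splits named x_train and x_test; both are assumptions, and the gist may differ:

from tensorflow.keras.preprocessing.text import Tokenizer

# Fit the vocabulary on the training resumes only, then turn each
# resume into a fixed-length TF-IDF weighted vector: frequent words
# get lower weights, rare words get higher ones
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(x_train)
x_train_vec = tokenizer.texts_to_matrix(x_train, mode='tfidf')
x_test_vec = tokenizer.texts_to_matrix(x_test, mode='tfidf')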

  • Tokenizing labels –

https://gist.github.com/AmrutaKoshe/5b18bae063fded882beb9167011f9fc0
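The labels can be tokenized the same way. This sketch assumes each category name maps to a single integer token, as the lowercase outputs like ‘hadoop’ later in the article suggest:

from tensorflow.keras.preprocessing.text import Tokenizer
import numpy as np

# Assign each category an integer token; index_word lets us map
# tokens back to category names later
label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(y_train)
y_train_tok = np.array(label_tokenizer.texts_to_sequences(y_train))
y_test_tok = np.array(label_tokenizer.texts_to_sequences(y_test))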

Training a Sequential Model –

https://gist.github.com/AmrutaKoshe/cd701d6c322c44a27b20f56a862a1dfb
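A minimal sketch of a Sequential classifier over these TF-IDF vectors; the layer sizes, dropout rate and epoch count here are illustrative choices, not necessarily those from the gist:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Tokens start at 1, so the output layer needs max_token + 1 units
num_classes = int(y_train_tok.max()) + 1

model = Sequential([
    Dense(512, activation='relu', input_shape=(x_train_vec.shape[1],)),
    Dropout(0.5),
    Dense(num_classes, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train_vec, y_train_tok,
                    epochs=10, validation_split=0.1)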

Evaluating our model –

By using evaluate, we obtain a test loss of around 0.14 and an accuracy of around 90%.
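In code, this is a single call (variable names follow the sketches above):

# evaluate returns [loss, accuracy] for the compiled metrics
loss, accuracy = model.evaluate(x_test_vec, y_test_tok, verbose=0)
print('Test loss:', loss)          # around 0.14 here
print('Test accuracy:', accuracy)  # around 0.90 here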

The accuracy and loss curves over epochs are as follows —


Let’s make some predictions using our model!

Here, I have chosen 3 random resumes from our test set as input.

https://gist.github.com/AmrutaKoshe/c1c129660b809d0553df609f8d40c7cd
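A sketch of this step (the three test indices are illustrative):

import numpy as np

# Take the argmax of the softmax output to recover the predicted
# category token for each of the three sampled resumes
sample = x_test_vec[[0, 1, 2]]
predictions = np.argmax(model.predict(sample), axis=1)
print(predictions)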

We get the output as:

array([ 7, 17, 14])

The output is the tokenized form of our categories. Mapping the tokens back to the corresponding category names, our output reads as: 7 = hadoop, 17 = pmo and 14 = arts.

To verify the predictions, we can print the corresponding test labels —

https://gist.github.com/AmrutaKoshe/f7a2fa852c1a81f9641df379a9952c7d
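A sketch of this lookup, assuming the labels were tokenized with the Keras Tokenizer as above:

# Look up the true labels for the same three test resumes and map
# their tokens back to category names via index_word
for token in y_test_tok[[0, 1, 2]].flatten():
    print(label_tokenizer.index_word[token])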

We get the following output –

hadoop
pmo
arts

Hence, our model’s predictions were correct.

Link to notebook — https://cainvas.ai-tech.systems/use-cases/resume-screening-app/

Credit: Amruta Koshe