Disaster Tweet Prediction on Cainvas

Artificial Intelligence has taken the world by storm. It has passed major milestones: mastering the classical games of Go and chess, beating top poker players, assisting scientists at the Large Hadron Collider, and much more. Of course, each of these tasks belongs to its own field.

NLP (Natural Language Processing) is the field of AI that deals with text and human language. The ability of a computer to make sense of words is truly astounding and fascinating. You are already surrounded by applications of NLP that you may not even be aware of, such as autocorrection in emails, autocompletion of text messages, and recommendations based on your previous searches.

Photo by Piotr Wojtczak on Dribbble

CONTENT

  • Problem Statement
  • Library and Concepts
  • Methodology
  • Modelling
  • Kernel Link

PROBLEM STATEMENT

Twitter has emerged as a platform that allows people to connect and share information with a huge audience. This gives users the power to spread word of an emergency they are observing in real time. Training a machine to decide accurately whether a tweet is about a real disaster is the approach we will tackle in this story.

Library and Concepts

  • NLTK (text processing)
  • Keras
  • LSTM
  • Sklearn

Methodology

We will convert our text messages into vectors, since machines understand numbers. First, we clean the data by removing common words such as articles and prepositions, which appear in nearly every sentence. We also remove hyperlinks and emojis. Next come stemming and lemmatization, which reduce the different forms of a word to a single base form. For example, run, running, runs, and ran are all treated as run, and wait, waiting, and waited are all treated as wait. We map them to the same word because, although the connection looks obvious to us, it is not trivial for a machine.
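The cleaning steps described here can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the `clean_tweet` function, the tiny `STOP_WORDS` set, and the sample tweet are all assumptions made for the example. The `stem` argument is a placeholder hook where a real stemmer or lemmatizer would go.

```python
import re

# Tiny illustrative stop-word set (an assumption for this sketch);
# NLTK's stop-word corpus is far more complete.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "is", "are", "and"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# After lowercasing, anything that is not a-z or whitespace is dropped:
# digits, punctuation, and emojis.
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text, stem=lambda word: word):
    """Lowercase, strip links/emojis/punctuation, drop stop words, then stem."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [stem(word) for word in text.split() if word not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("Forest fire near the lake 🔥 https://t.co/abc123"))
# forest fire near lake
```

To add stemming, a callable such as NLTK's `PorterStemmer().stem` can be passed as `stem`. Note that a stemmer will not map ran to run; that case needs a lemmatizer.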

An LSTM is used to capture the meaning of a sentence, which depends on word order. ‘That was a really great movie’ and ‘Really, that was a great movie?’ contain the same words, but the order makes one a positive remark and the other a negative one. The LSTM helps us capture this. LSTMs are a special kind of RNN (Recurrent Neural Network). RNNs are designed to take a sequence of inputs; they have a memory (the hidden state) that stores results from earlier inputs in the sequence, allowing them to find patterns in sequential data. We use an embedding layer to turn our sparse vectors into dense ones, which saves memory and captures important features at the same time.
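To make the word-order point concrete, here is a small sketch in plain Python. It first shows that the two example sentences are identical as a bag of words, then runs a toy recurrence with a single hidden state over each one. The `toy_rnn` function and its weights are assumptions chosen for illustration; it is a bare recurrent update, not an LSTM (it has no gates).

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep alphabetic words only
    return re.findall(r"[a-z]+", sentence.lower())


a = tokenize("That was a really great movie")
b = tokenize("Really that was a great movie ?")

same_bag = Counter(a) == Counter(b)   # True: same words, same counts
same_sequence = a == b                # False: the order differs

# Map each word to an integer id (1-based), as a tokenizer would
vocab = {word: i + 1 for i, word in enumerate(sorted(set(a)))}


def toy_rnn(tokens, w_in=0.5, w_rec=0.8):
    """Carry a single hidden state h across the sequence."""
    h = 0.0
    for tok in tokens:
        x = vocab[tok] / len(vocab)           # scale the id into (0, 1]
        h = math.tanh(w_in * x + w_rec * h)   # new state depends on the old one
    return h


print(same_bag, same_sequence)
print(toy_rnn(a), toy_rnn(b))  # different final states for different orders
```

Because each step feeds the previous hidden state back in, the final state depends on the order of the words, which a bag-of-words count cannot capture.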

Modelling

The model is a simple LSTM layer with 3 cells, followed by a dense layer that predicts whether a tweet is about a disaster or not. The model was quick to train: 5 epochs at about 55 seconds per epoch, for a total of roughly 5 minutes.
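An architecture of this shape might be sketched in Keras roughly as follows. This is only a sketch under stated assumptions, not the notebook's model: the vocabulary size, embedding dimension, sequence length, optimizer, and loss are not given in the text and are picked here for illustration. Only the 3-unit LSTM and the dense output layer come from the description above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 10000  # assumed vocabulary size
EMBED_DIM = 32      # assumed embedding dimension
MAX_LEN = 30        # assumed padded tweet length, in tokens

model = keras.Sequential([
    # Turns integer word ids into dense vectors
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    # The 3-unit LSTM reads the sequence and returns its final state
    layers.LSTM(3),
    # A single sigmoid unit: probability that the tweet is about a disaster
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random integer sequences stand in for tokenized, padded tweets
dummy_batch = np.random.randint(1, VOCAB_SIZE, size=(4, MAX_LEN))
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)  # (4, 1)
```

Training would then be a call to `model.fit` on the padded sequences and their 0/1 labels.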

https://gist.github.com/Devanshchowdhury2212/35ccaadb9311087c110d5f59f3b36789

Evaluation

We achieve 88% accuracy on our test data.

https://gist.github.com/Devanshchowdhury2212/08f6bb908d5d9f45af9349594ba55876

https://gist.github.com/Devanshchowdhury2212/62db5375d97758d25aa5cc7637326d20

https://gist.github.com/Devanshchowdhury2212/101a02735232048825721df89c99dd2e

Link to Notebook on Cainvas — https://cainvas.ai-tech.systems/use-cases/disaster-tweet-prediction-app/

Credit: Devansh Chowdhury