NLP project involving several stages of data cleaning, organization, exploration, sentiment analysis, topic modeling, and text generation.

aarushijain-24/Natural-Language-Processing-with-Disaster-Tweets

Natural Language Processing with Disaster Tweets

About Dataset

Each sample in the dataset has the following information:

id - a unique identifier for each tweet

text - the text of the tweet

location - the location the tweet was sent from (may be blank)

keyword - a particular keyword from the tweet (may be blank)

target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
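The fields above can be read with Python's standard `csv` module. The sketch below is illustrative only: the `load_tweets` helper and the sample rows are hypothetical, and it assumes the CSV header uses the field names listed above, with blank `keyword` and `location` values mapped to `None`.

```python
import csv
import io


def load_tweets(handle):
    """Read tweet rows from an open CSV file, mapping blank fields to None."""
    rows = []
    for row in csv.DictReader(handle):
        target = row.get("target")
        rows.append({
            "id": int(row["id"]),
            "keyword": row["keyword"] or None,    # may be blank
            "location": row["location"] or None,  # may be blank
            "text": row["text"],
            # target is present in train.csv only
            "target": int(target) if target not in (None, "") else None,
        })
    return rows


# Made-up sample rows standing in for train.csv
sample = io.StringIO(
    "id,keyword,location,text,target\n"
    "1,,,Forest fire near the town,1\n"
    "2,ablaze,London,I love this song,0\n"
)
tweets = load_tweets(sample)
```

For the real data, the same helper could be called on a file opened with `open("train.csv", newline="", encoding="utf-8")`.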

About Notebook

  1. Cleaning the data
    i) Making text all lower case.
    ii) Removing punctuation.
    iii) Removing numerical values.
    iv) Removing common nonsensical text.
    v) Tokenizing text.
    vi) Removing stop words.
  2. Organizing the data
    i) Generating corpus.
    ii) Generating Document-Term Matrix (dtm).
  3. Exploring the data
    i) Creating word cloud for most common words.
    ii) Finding the count of top 30 words associated with top 10 keywords.
    iii) Adding most common words to stopword list.
    iv) Updating the document-term matrix with the new stop words.
    v) Creating word cloud for top 5 keywords.
    vi) Finding the number of unique words associated with each unique keyword.
    vii) Checking for profanity by analyzing common bad words.
  4. Sentiment Analysis
    i) Finding the polarity and subjectivity of each tweet.
    ii) Visualizing the results through scatter plot.
    iii) Splitting each tweet into 10 parts and finding the polarity of each part.
    iv) Visualizing the results through subplots.
  5. Topic Modeling
    i) Converting the dtm into gensim format.
    ii) Generating a dictionary of all terms and their respective locations in the dtm.
    iii) Applying Latent Dirichlet Allocation (LDA) for all text.
    iv) Applying Latent Dirichlet Allocation (LDA) for nouns only.
    v) Applying Latent Dirichlet Allocation (LDA) for nouns and adjectives.
  6. Text Generation
    i) Building a Markov Chain Function.
    ii) Creating the dictionary of text data.
    iii) Creating a Text Generator Function.
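
The cleaning and organizing stages (steps 1 and 2) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the stop-word list is a tiny placeholder, treating URLs as "nonsensical text" is an assumption, and `clean_text` and `document_term_matrix` are hypothetical names.

```python
import re
import string
from collections import Counter

# Placeholder stop-word list (assumption); the notebook's actual list is not shown.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "at"}


def clean_text(text):
    """Lowercase, strip URLs, punctuation and numbers, tokenize, drop stop words."""
    text = text.lower()
    # Assumption: URLs are the kind of nonsensical text worth removing.
    text = re.sub(r"https?://\S+", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Drop any token that contains a digit.
    text = re.sub(r"\w*\d\w*", " ", text)
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    vocab = sorted({token for doc in docs for token in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


docs = [
    clean_text("Fire at 3pm in the HILLS!!! http://t.co/abc123"),
    clean_text("Flood in the city"),
]
vocab, dtm = document_term_matrix(docs)
```

A vectorizer from a library such as scikit-learn would typically replace the hand-rolled matrix on the full dataset; the version above only shows the shape of the result.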

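The text generation stage (step 6) can be illustrated with a first-order, word-level Markov chain. This is a sketch under assumptions: the notebook's own functions are not shown here, so `markov_chain`, `generate_text`, and the `seed` parameter are hypothetical, and restarting from a random word at a dead end is one possible design choice among several.

```python
import random
from collections import defaultdict


def markov_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate_text(chain, length=10, seed=None):
    """Walk the chain for `length` words, starting from a random key."""
    if not chain or length <= 0:
        return ""
    rng = random.Random(seed)
    keys = list(chain)
    word = rng.choice(keys)
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (word never followed by anything): restart from a random key.
        word = rng.choice(followers) if followers else rng.choice(keys)
        output.append(word)
    return " ".join(output)


chain = markov_chain("the fire spread and the flood rose")
sentence = generate_text(chain, length=8, seed=0)
```

Repeated followers are kept in the lists on purpose, so that more frequent word pairs are sampled more often.
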