Text pre-processing, topic modelling and text classification on reviews from the Yelp dataset.

juliabuixuan/Text-Mining

Text-Mining

This project uses a dataset of 6,990,280 reviews of 150,346 businesses obtained from Yelp, an online platform where people share experiences and opinions about local businesses. It applies topic modelling to extract the main topics discussed in the reviews, and text classification to assign each review one or more categories. The text was preprocessed with normalization, stopword removal, tokenization and lemmatization. Latent Dirichlet Allocation (LDA) was used for topic modelling, and two text representations, TF-IDF and Doc2Vec, were compared for the classification task. Topic modelling identified 8 food-related topics, and the classification model achieved a precision of 93% and a recall of 74% on a multi-label problem with 44 classes.
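The preprocessing steps and the TF-IDF representation mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the tiny `STOPWORDS` set and the lookup-table `LEMMAS` lemmatizer are hypothetical stand-ins for full resources such as NLTK or spaCy, and the example reviews are invented. The steps run in the order they have to be applied (tokens must exist before stopwords can be removed), and the IDF term is the plain `log(n / df)` form.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a full list.
STOPWORDS = {"the", "a", "an", "and", "but", "was", "is", "were", "of", "to", "it", "this", "for"}

# Hypothetical lookup-table lemmatizer mapping inflected forms to their lemma.
# Real lemmatizers use a full dictionary and usually part-of-speech tags.
LEMMAS = {"tacos": "taco", "burgers": "burger", "fries": "fry", "loved": "love"}


def normalize(text: str) -> str:
    # Lowercase, then replace anything that is not an ASCII lowercase letter or whitespace.
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text: str) -> list[str]:
    # Whitespace tokenization is enough once punctuation has been stripped.
    return text.split()


def preprocess(text: str) -> list[str]:
    tokens = tokenize(normalize(text))
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [LEMMAS.get(t, t) for t in tokens]           # lemmatization


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency (count / document length) weighted by log(n / df).
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


reviews = [
    "The tacos were amazing and the salsa was fresh!",
    "Loved the burgers, but the fries were cold.",
    "Amazing service; the burgers came quickly.",
]
docs = [preprocess(r) for r in reviews]
vectors = tf_idf(docs)
print(docs[0])  # ['taco', 'amazing', 'salsa', 'fresh']
```

Because "taco" appears in one review and "amazing" in two, "taco" receives the higher weight in the first review's vector. Libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF by default, so their exact weights differ from this sketch.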

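In a multi-label problem each review can carry several of the 44 classes, so precision and recall are computed over individual label assignments rather than over whole reviews. The README does not say which averaging was used; the sketch below assumes micro-averaging, and the category names are hypothetical examples, not the project's actual classes.

```python
def micro_precision_recall(
    y_true: list[set[str]], y_pred: list[set[str]]
) -> tuple[float, float]:
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)  # labels predicted and correct
        fp += len(pred_labels - true_labels)  # labels predicted but wrong
        fn += len(true_labels - pred_labels)  # correct labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical label sets for three reviews.
y_true = [{"Mexican", "Bars"}, {"Burgers"}, {"Pizza", "Italian"}]
y_pred = [{"Mexican"}, {"Burgers"}, {"Pizza"}]
precision, recall = micro_precision_recall(y_true, y_pred)
print(precision, recall)  # 1.0 0.6
```

Under micro-averaging, precision above recall means the model produced fewer false-positive labels than missed ones: here every predicted label is correct, but two of the five true labels are not predicted.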