For this project, a dataset of 6,990,280 reviews and 150,346 businesses was used. It was obtained from Yelp, an online platform where users share experiences and opinions about local businesses. The project applies topic modeling and classification techniques to extract the main topics and content of the reviews and to classify them into categories. The text was preprocessed with normalization, stopword removal, tokenization and lemmatization. Latent Dirichlet Allocation (LDA) was used for topic modeling, and two text representations (TF-IDF and Doc2Vec) were compared for the classification task. Topic modeling identified 8 food-related topics, and the classification model achieved a precision of 93% and a recall of 74% on a multilabel, multiclass problem with 44 classes.
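The sketch below roughly illustrates the preprocessing, TF-IDF and evaluation steps described above, using only the Python standard library. Several pieces are assumptions for illustration and are not taken from the project: the stopword list, the `LEMMAS` lookup table (a hypothetical stand-in for a real lemmatizer such as WordNet), the example category names, and the use of micro-averaged precision and recall. The averaging scheme behind the reported 93% / 74% is not stated here. LDA and Doc2Vec are not sketched because they rely on library implementations.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (assumption); real pipelines use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "or", "was", "were", "is", "of", "to", "it", "very"}

# Hypothetical lemma table standing in for a real lemmatizer (e.g. WordNet).
LEMMAS = {"tacos": "taco", "burgers": "burger", "fries": "fry", "loved": "love"}


def preprocess(text):
    """Normalize, tokenize, remove stopwords and lemmatize one review."""
    text = text.lower()                    # normalization: lowercase
    text = re.sub(r"[^a-z\s]", " ", text)  # normalization: drop digits and punctuation
    tokens = text.split()                  # tokenization on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]


def tf_idf(docs):
    """Unsmoothed TF-IDF: one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency (relative count) times log inverse document frequency.
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def micro_precision_recall(y_true, y_pred):
    """Micro-averaged precision/recall for multilabel predictions (sets of labels)."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))  # labels predicted correctly
    n_pred = sum(len(p) for p in y_pred)                   # all predicted labels
    n_true = sum(len(t) for t in y_true)                   # all gold labels
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    return precision, recall


if __name__ == "__main__":
    reviews = ["The tacos were AMAZING!!", "Loved the burgers and fries."]
    tokens = [preprocess(r) for r in reviews]
    print(tokens)  # [['taco', 'amazing'], ['love', 'burger', 'fry']]
    print(tf_idf(tokens)[0])
    # Hypothetical gold and predicted category sets for two reviews.
    y_true = [{"Mexican"}, {"Burgers", "Bars"}]
    y_pred = [{"Mexican"}, {"Burgers"}]
    print(micro_precision_recall(y_true, y_pred))  # precision 1.0, recall 2/3
```

Micro-averaging pools true positives across all labels, so frequent categories dominate the score. A macro average would weight each of the 44 classes equally and can give noticeably different numbers.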
juliabuixuan/Text-Mining
Text pre-processing, topic modelling and text classification on reviews from the Yelp dataset.