Text-Mining

This project uses a dataset of 6,990,280 reviews and 150,346 businesses obtained from Yelp, an online platform for sharing experiences and opinions about local businesses. It applies topic modeling to extract the main topics of the reviews and classification to assign the reviews to categories. The text was preprocessed with normalization, stop-word removal, tokenization, and lemmatization. Latent Dirichlet Allocation (LDA) was used for topic modeling, and two text representations (TF-IDF and Doc2Vec) were tested for the classification task. Topic modeling identified 8 food-related topics, and the classification model achieved a precision of 93% and a recall of 74% on a multi-label, multi-class problem with 44 classes.
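
In a multi-label setting, precision and recall are computed over sets of predicted labels, so the two can diverge considerably. The sketch below shows one common way to compute them, micro-averaging, which pools the counts over all reviews and labels. Whether the reported 93% and 74% were micro-averaged is an assumption, since the text does not state the averaging scheme. The function name `micro_precision_recall` and the toy category labels are hypothetical and are not taken from the project notebooks.

```python
from typing import Iterable, Set, Tuple


def micro_precision_recall(
    true_labels: Iterable[Set[str]],
    predicted_labels: Iterable[Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall for multi-label predictions.

    Assumption: counts are pooled over all documents and labels; the
    project may have used a different averaging scheme.
    """
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        tp += len(truth & predicted)  # labels predicted and correct
        fp += len(predicted - truth)  # labels predicted but not true
        fn += len(truth - predicted)  # true labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: each review carries a set of category labels.
y_true = [
    {"Restaurants", "Pizza"},
    {"Bars", "Nightlife"},
    {"Coffee & Tea", "Bakeries"},
]
y_pred = [
    {"Restaurants"},
    {"Bars", "Nightlife", "Pubs"},
    {"Coffee & Tea"},
]

precision, recall = micro_precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.80 recall=0.67"
```

A classifier that assigns only the labels it is confident about, as in this toy data, tends to score higher on precision than on recall, which is the same pattern as the figures reported above.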

Files in the repository (8 commits):
Classification.ipynb
Presentazione-2.pdf
README.md
Report.pdf
Text_Preprocessing.ipynb
Topic-Modeling.ipynb

juliabuixuan/Text-Mining
