nhungL/Amazon-Reviews-Analysis

Amazon-Reviews-Analysis

Class project

Dataset: amazon_us_reviews (https://huggingface.co/datasets/amazon_us_reviews)

Tasks:

  • Explore the dataset with AWK and SED: compute the correlation between helpfulness scores and review ratings, calculate their means, and sort their values
  • Convert review ratings to binary values split at the median, recompute the correlation with datamash, and plot the correlations with Gnuplot
  • Clean the text of review bodies with AWK and SED (remove stop words, lemmatize words, and so on)
  • Compare cleaned reviews with cleaned tweets in a shell script to find the most common shared words in helpful and unhelpful reviews
  • Write a shell script that trains a predictive model with Weka on files containing both Amazon reviews and Twitter tweets; this model performs better than one trained on Amazon reviews alone
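
As a rough illustration of the first two tasks, the sketch below computes a Pearson correlation in awk and then recomputes it after binarizing ratings at the median. It is not the project's actual script: the two-column sample (star rating, helpful votes), the function names `pearson` and `binarize`, and the rule "above the median becomes 1, otherwise 0" are all assumptions made for the example. The project itself used datamash for the recomputed correlation; awk is used here only to keep the sketch self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical sample: tab-separated star_rating and helpful_votes.
sample=$(printf '5\t10\n4\t8\n3\t6\n2\t4\n1\t2\n')

# Pearson correlation of columns 1 and 2, computed in one pass.
pearson() {
  awk -F'\t' '
    { n++; sx += $1; sy += $2; sxx += $1*$1; syy += $2*$2; sxy += $1*$2 }
    END {
      num = n*sxy - sx*sy
      den = sqrt((n*sxx - sx*sx) * (n*syy - sy*sy))
      if (den == 0) { print "NaN"; exit }
      printf "%.4f\n", num/den
    }'
}

# Replace column 1 with 1 if it is above the column median, else 0.
# (The tie-breaking rule is an assumption, not taken from the project.)
binarize() {
  bin_data=$(cat)
  bin_median=$(printf '%s\n' "$bin_data" | cut -f1 | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }')
  printf '%s\n' "$bin_data" |
    awk -F'\t' -v OFS='\t' -v m="$bin_median" '{ $1 = ($1 > m) ? 1 : 0; print }'
}

r_raw=$(printf '%s\n' "$sample" | pearson)
r_bin=$(printf '%s\n' "$sample" | binarize | pearson)
echo "raw ratings:    $r_raw"
echo "binary ratings: $r_bin"
```

On the real dataset, the relevant columns would first have to be extracted (for example with `cut -f`) before being piped into these functions; which column numbers hold the rating and helpfulness fields should be checked against the file header.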

Tools:

  • Bash, awk, sed, datamash, Gnuplot, Weka
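
The text-cleaning and common-word tasks listed under Tasks could be sketched with these tools roughly as follows. This is a minimal example under stated assumptions, not the project's code: the sample sentences, the five-word stop list, and the function names `clean` and `top_words` are hypothetical, and lemmatization is left out.

```shell
#!/usr/bin/env bash
# Hypothetical review text and a tiny illustrative stop-word list.
reviews=$(printf '%s\n' \
  'The battery is great, and the screen is great!' \
  'The charger broke.')
stopwords='the is and a of'

# Lowercase, turn every non-letter into a space, then drop stop words.
clean() {
  tr '[:upper:]' '[:lower:]' |
    sed 's/[^a-z]/ /g' |
    awk -v stop="$stopwords" '
      BEGIN { n = split(stop, s, " "); for (i = 1; i <= n; i++) skip[s[i]] = 1 }
      {
        out = ""
        for (i = 1; i <= NF; i++)
          if (!($i in skip)) out = out (out == "" ? "" : " ") $i
        print out
      }'
}

# Print "word count" pairs, most frequent first, ties in alphabetical order.
top_words() {
  tr -s ' ' '\n' | sed '/^$/d' | sort | uniq -c |
    sort -k1,1nr -k2,2 | awk '{ print $2, $1 }'
}

cleaned=$(printf '%s\n' "$reviews" | clean)
printf '%s\n' "$cleaned"
printf '%s\n' "$cleaned" | top_words | head -n 3
```

Running `top_words` separately on cleaned reviews and cleaned tweets, then intersecting the two word lists (for example with `comm` on sorted output), is one way the comparison step might be approached.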
