Amazon-Reviews-Analysis

Class project

Tasks:

Explore the dataset with AWK and sed: compute the correlation between helpfulness scores and review ratings, calculate their means, and sort the values.
Convert review ratings to binary values using the median as the threshold, recompute the correlation with datamash, and plot the correlations with Gnuplot.
Clean the text of the review bodies with AWK and sed, for example by removing stop words and lemmatizing words.
Compare the cleaned reviews with cleaned tweets using a shell script to find the most common words in helpful and unhelpful reviews.
Write a shell script that trains a prediction model with Weka on files containing both Amazon reviews and Twitter tweets; this model performs better than one trained on Amazon reviews alone.
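
The first two tasks can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: the sample data, the two-column layout (rating, then helpful votes), and all function names are hypothetical, and treating "above the median" as 1 is an assumption, since the threshold direction is not specified here. It uses plain AWK for the correlation so it runs without datamash.

```shell
#!/bin/sh
# Hypothetical sample: column 1 = star rating, column 2 = helpful votes,
# tab-separated. The real dataset's column layout is an assumption here.
sample_data() {
  printf '5\t10\n4\t8\n3\t6\n2\t4\n1\t2\n'
}

# Mean of the column whose index is given as the first argument.
col_mean() {
  awk -F '\t' -v c="$1" '{ s += $c; n++ } END { if (n) printf "%.2f\n", s / n }'
}

# Median of column 1 (sort numerically, then pick the middle value).
median_col1() {
  cut -f1 | sort -n | awk '
    { a[NR] = $1 }
    END {
      if (NR % 2) print a[(NR + 1) / 2]
      else print (a[NR / 2] + a[NR / 2 + 1]) / 2
    }'
}

# Pearson correlation between columns 1 and 2.
pearson() {
  awk -F '\t' '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END {
      num = n * sxy - sx * sy
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (den > 0) printf "%.2f\n", num / den
    }'
}

# Replace the rating with 1 if it is above the given median, else 0
# (assumed convention; ">=" would be an equally plausible choice).
binarize_rating() {
  awk -F '\t' -v OFS='\t' -v m="$1" '{ $1 = ($1 > m) ? 1 : 0; print }'
}

sample_data | col_mean 1                               # mean rating
sample_data | pearson                                  # correlation on raw ratings
m=$(sample_data | median_col1)
sample_data | binarize_rating "$m" | pearson           # correlation after binarizing
```

On a real file, `sample_data` would be replaced by something like `cut -f` over the rating and helpfulness columns of the dataset.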

Files:

Correlation_and_Top10_Review_AMZ2
README.md
clean_data_and_common_words_AMZ3
explore_data_AMZ1
text_binary_classify_AMZ4.sh
topcust_AMZ2.png
topprod_AMZ2.png
training_data_Weka_AMZ4