A k-nearest neighbors (KNN) model that classifies text reviews as positive (1) or negative (-1). The pipeline covers NLP text preprocessing, feature extraction, dimensionality reduction, and k-fold cross-validation.
- Preprocessing:
  - Removing
    - HTML tags
    - URLs
    - Email-ids
    - Numbers
    - Punctuation
    - Accented text
  - Tokenizing
  - Removing stop words
  - Lemmatizing
- Feature extraction - `CountVectorizer`
- Similarity/distance metric used for KNN - Cosine similarity
- Cross-validation - k-fold
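
The steps listed above can be sketched end to end with only the Python standard library. This is an illustrative sketch, not the project's actual code: a `Counter` of token counts stands in for `CountVectorizer`, the stop-word list is a tiny made-up sample, "accented text" is interpreted as folding accented characters to ASCII, and lemmatization and dimensionality reduction are left out because they need external libraries. All function names (`preprocess`, `cosine_similarity`, `knn_predict`, `k_fold_accuracy`) are hypothetical.

```python
import re
import unicodedata
from collections import Counter
from math import sqrt

# Tiny illustrative stop-word list (assumption; a real run would use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to", "i", "or"}


def preprocess(text):
    """Clean a raw review and return a list of lowercase tokens."""
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)                # email IDs
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"\d+", " ", text)                    # numbers
    text = re.sub(r"[^\w\s]|_", " ", text)              # punctuation
    tokens = text.lower().split()                       # whitespace tokenizer
    # Lemmatization is omitted here: it needs an external lemmatizer.
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query, k=3):
    """Vote among the k training reviews most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine_similarity(pair[0], query),
        reverse=True,
    )
    vote = sum(label for _, label in ranked[:k])
    return 1 if vote >= 0 else -1  # ties go to +1 (arbitrary choice)


def k_fold_accuracy(texts, labels, n_folds=3, k=3):
    """Mean accuracy over n_folds interleaved folds (needs len(texts) >= n_folds)."""
    vecs = [Counter(preprocess(t)) for t in texts]
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(vecs), n_folds))
        train_idx = [i for i in range(len(vecs)) if i not in test_idx]
        train_vecs = [vecs[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_vecs, train_labels, vecs[i], k) == labels[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds
```

With labels of 1 and -1, an odd `k` avoids tied votes. Because every similarity is computed against all training reviews, this brute-force approach is only practical for small datasets.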