KNN-Text-review-classification

A k-nearest neighbors (KNN) model that classifies text reviews as positive (1) or negative (-1). The pipeline combines NLP text preprocessing, feature extraction, dimensionality reduction, and k-fold cross-validation.

  1. Preprocessing:
  • Removing
    • HTML tags
    • URLs
    • Email addresses
    • Numbers
    • Punctuation
    • Accented text
  • Tokenizing
  • Removing stop words
  • Lemmatizing
  2. Feature extraction - CountVectorizer
  3. Similarity/distance metric used for KNN - Cosine similarity
  4. Cross-validation - k-fold
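
The steps above can be sketched end to end with only the Python standard library. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names (`preprocess`, `cosine_similarity`, `knn_predict`, `k_fold_accuracy`), the tiny `STOP_WORDS` set, and the regular expressions are all hypothetical. Raw term counts in a `Counter` stand in for CountVectorizer, and lemmatization and dimensionality reduction are left out because they would normally rely on an external library.

```python
import html
import math
import re
import unicodedata
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the project's list is not shown).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "was", "and", "i", "to", "of"}


def preprocess(text):
    """Clean a raw review and return its tokens (lemmatization omitted)."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)               # HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)               # email addresses
    # Strip accents: decompose characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z\s]", " ", text.lower())      # numbers and punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote over the k most similar training reviews (labels are 1 / -1)."""
    ranked = sorted(
        ((cosine_similarity(query_vec, vec), label)
         for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    vote = sum(label for _, label in ranked[:k])
    return 1 if vote >= 0 else -1  # a tied vote falls back to positive


def k_fold_accuracy(texts, labels, n_folds=3, k=3):
    """Mean accuracy over interleaved folds; assumes n_folds <= len(texts)."""
    vecs = [Counter(preprocess(t)) for t in texts]
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(vecs), n_folds))
        train_idx = [i for i in range(len(vecs)) if i not in test_idx]
        train_vecs = [vecs[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        hits = sum(
            knn_predict(train_vecs, train_labels, vecs[j], k) == labels[j]
            for j in test_idx
        )
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds
```

Calling `k_fold_accuracy(texts, labels, n_folds=5, k=7)` on a list of review strings and their 1 / -1 labels returns the mean held-out accuracy, which is one way to compare candidate values of `k`. Cosine similarity is a reasonable fit for count vectors because it ignores review length and depends only on the relative mix of terms.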