Text similarity checker using SBERT and NLTK

This tool uses SBERT sentence embeddings (via the SentenceTransformer class) and NLTK's Punkt sentence tokenizer to compare two texts for similarity.
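
As a rough illustration of the comparison step, the sketch below scores every sentence of one text against every sentence of the other with cosine similarity. It assumes this is how the script pairs sentences, which may not match its actual implementation. To stay self-contained it uses tiny hand-made vectors. In the real tool the sentences would come from NLTK's sent_tokenize and the vectors from SentenceTransformer.encode. The names cosine and pairwise_scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_scores(
    emb_a: List[Vector], emb_b: List[Vector]
) -> List[Tuple[float, int, int]]:
    """Score sentence i of text A against sentence j of text B for every (i, j).

    Returns (score, i, j) triples. Comparing every sentence with every other
    sentence is an assumption about how the tool pairs the two texts.
    """
    return [
        (cosine(va, vb), i, j)
        for i, va in enumerate(emb_a)
        for j, vb in enumerate(emb_b)
    ]


# Toy "embeddings", two sentences per text; real SBERT vectors typically
# have hundreds of dimensions.
text_a_vecs = [[1.0, 0.0], [0.0, 1.0]]
text_b_vecs = [[1.0, 0.0], [1.0, 1.0]]
scores = pairwise_scores(text_a_vecs, text_b_vecs)
```

The sentence-transformers library also provides util.cos_sim, which computes the whole score matrix for two batches of embeddings at once; the pure-Python loop above only spells out what that matrix contains.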

The tool outputs a list of high-scoring sentence pairs sorted by score, the single most similar pair, and a similarity index (the average similarity).
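
Given a score for every sentence pair, the three outputs could be derived roughly as follows. The README does not say how the script defines a "high-scoring" pair or what the average is taken over, so the 0.5 cut-off and the mean over all pairs are assumptions, as is the function name summarize. The scores in the example are made up.

```python
from typing import List, Tuple

# (similarity score, sentence from text A, sentence from text B)
Pair = Tuple[float, str, str]


def summarize(pairs: List[Pair], threshold: float = 0.5):
    """Return (high-scoring pairs sorted best-first, most similar pair, similarity index).

    Assumptions: "high-scoring" means score >= threshold, and the similarity
    index is the mean score over all pairs.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    high = [p for p in ranked if p[0] >= threshold]
    best = ranked[0]
    index = sum(p[0] for p in pairs) / len(pairs)
    return high, best, index


# Made-up scores for illustration only.
pairs = [
    (0.92, "The sea was calm.", "The ocean was still."),
    (0.15, "The sea was calm.", "He bought bread."),
    (0.61, "The captain gave an order.", "The officer issued a command."),
    (0.08, "The captain gave an order.", "He bought bread."),
]
high, best, index = summarize(pairs)
```

Averaging over every pair tends to pull the index down for long texts, since most sentence pairs are unrelated; averaging each sentence's best match is one common alternative.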

To use

Install the required dependencies:

pip install -r requirements.txt

Run the text_similarity_checker script from the command line:

python3 text_similarity_checker.py

Note

By default, the script compares Bob Dylan's rather infamous Nobel lecture with its alleged source. To compare your own texts, add them as Python strings in a file in the project's root directory, then modify text_similarity_checker.py slightly to use them. See the script for details.
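
For example, a hypothetical my_texts.py in the project's root directory might hold the two texts as module-level strings. The file and variable names here are illustrative only; check the script for the names it actually expects and how it loads its default texts.

```python
# my_texts.py: hypothetical module in the project's root directory.
# text_similarity_checker.py would then need a matching import, e.g.
#     from my_texts import TEXT_A, TEXT_B
# in place of the texts it loads by default.

TEXT_A = """Paste the first text here.
It can span several lines and contain many sentences."""

TEXT_B = """Paste the second text here.
Triple-quoted strings keep the original line breaks."""
```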

Status

In development.