popolopo21/wyrm

Where You Read Me?

This project is in its early stages; for now it is more of an idea than a finished tool.

Overview

"Where You Read Me?" traces the path of a news story across different websites using embedding vectors. Each article is converted to a vector, and the vectors are compared to find similar articles and to track how a story evolves and is presented on different news platforms. Users can set a similarity threshold to find articles across sites that resemble a chosen piece. The project also includes a dashboard for comparing news items by their embeddings and by features that are discovered later.
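
The threshold-based lookup described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the README does not say which similarity measure is used, so cosine similarity is an assumption here, and the names `cosine_similarity`, `find_similar`, and the default threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query, candidates, threshold=0.8):
    """Return (article_id, score) pairs whose score is at least `threshold`.

    `candidates` maps a hypothetical article id to its embedding vector.
    Results are sorted from most to least similar.
    """
    scored = [(key, cosine_similarity(query, vec)) for key, vec in candidates.items()]
    hits = [(key, score) for key, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

In practice the comparison would likely run inside the database or over a vector index rather than in a Python loop; the loop only shows the idea of filtering by a user-chosen threshold.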

Project Structure

Database

  • EdgeDB: Used for storing and managing the data efficiently.

Scraper

  • Located in the scraper folder.
  • A Scrapy scraper that currently extracts news from three Hungarian websites: index.hu, telex.hu, mandiner.hu.
  • The scraping component is complete.

Postprocessor

  • Found in the postprocessor folder.
  • Processes the scraped news, extracts relevant information, and converts the title, description, and content text into vectors.
  • Uses bert-multilingual for vectorization because it produces good embeddings and it's free. Longer texts are split into chunks and stored that way, because the model has a small context window.
  • The postprocessor is complete.
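
The chunking step mentioned above might look something like the sketch below. It is an assumption-laden illustration rather than the postprocessor's real code: it counts whitespace-separated words as a stand-in for the model's own tokens, and the function name `chunk_text`, the `max_tokens` default of 512, and the optional `overlap` parameter are all hypothetical.

```python
def chunk_text(text, max_tokens=512, overlap=0):
    """Split `text` into chunks of at most `max_tokens` words.

    Words are a rough proxy for model tokens (assumption); a real pipeline
    would count tokens with the model's tokenizer. `overlap` repeats that
    many words at the start of the next chunk to preserve some context.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in the range [0, max_tokens)")

    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk has reached the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks
```

Each chunk would then be embedded separately and stored alongside the article it came from, so that a long article is represented by several vectors instead of one truncated vector.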

Frontend

  • Will be developed using SvelteKit.
  • Currently, only the basic skeleton of the frontend is completed.

Backend

  • Implemented using FastAPI.
  • The basic template of the backend is in place.

Installation

Not yet.

Usage

Not yet.

Future Plans

  • Enhance the vectorizing process by integrating OpenAI embeddings.
  • Complete the development of the SvelteKit frontend.
  • Further backend development to support advanced features.
