Job-Search-Tool

Organizer for job searching across multiple sites. Fetch offers, measure recruitment progress, collect info about potential employer

Demonstration:

Job-Search-Tool-Demo.mp4

THIS BRANCH:

TODO:

Data processing

Location fetching adjustments

If site puts selected location on first place - use only the first location
Else - fetch html with location block hovered to show extract list of all locations

Get proper search links

Raw data extraction improvements:

Location extraction improvements - making sure that either a list or the proper location is extracted

Synchronization ETL module:

Use tag and location dictionaries to unify variable elements

Records visualization:

Prepare record template - fetch one record from CSV, fill specific fields
Initially scrolled up, showing minimal info. Click, to show full record details

Cloud related issues

Session and data access:

Introduce session for admin user
Columns not for public info available only for admin
Saving data/files available only for admin

Move to docker container and host it remotely

Run updater on a scheduler

Ideas for the future:

Ideas for the future

Scrape each interesting offer (3+ stars)
Fetch and unify requirements, additional info etc
Build RAG using CV to analyze each offer in relation to skills
Use RAG with scraped offers to generate unified offer template

Changelog:

Click to see the details

01.12.2024

Fixed synchronization module misdetecting changed records

29.11.2024

Fully migrated to SQL database
Dropped using CSV files
Introduced settings file

20.11.2024

Popup and terminal report if update is needed
Prevent crashes if update file is missing while search link is active

18.11.2024

Report points of failure while scraping
Prevent app crashes caused by missing data

08.10.2024

Moved Selenium Chromedriver to Docker container
Properly extracting link to multi-location offers from Pracuj.pl (remote offers only)
Created framework for additional actions upon scraping website

04.10.2024

Updated download links
Minor performance and data processing tweaks

25.09.2024

Refactoring
Minor tweaks and bugfixes
Synchronization tab shows only changed records

20.09.2024

Synchronization module improvements
Forcing file structure for synchronization

19.09.2024

Working sync module with archive

16.09.2024

Improvement in extracting job location. Added separate field for remote job status
Properly extracting salary details (currency etc)
Fixed logo extraction from Nofluffjobs
Storing job tags as a string

14.09.2024

Introduced Streamlit

11.09.2024

Integrated JustJoinIT.pl site
Integrated Solid.jobs site
Integrated it.pracuj.pl site

10.09.2024

Integrated Rocketjobs.pl site
Integrated Bulldogjob.pl site
Minor improvements to handling data extraction

09.09.2024

Massively reduced update time complexity by reusing one webdriver

06.09.2024

Moved data extraction to containers: Instead of only pointing containers, functions now handle data extraction. This greatly improves scaleability for the project
Big improvements to code clarity
Solved theprotocol fetching inconsistencies by setting fixed chromedriver window size (not displayed anyway) The point of failure was rendering site in mobile version by default

05.09.2024

Now salary extraction properly handles various notations

04.09.2024

Moved to Selenium scraping. This provides better results than requests.
Introduced file handling. Now data is extracted from saved files, resulting in improved performance. Update function scrapes search links to their respective file.
Search links are now stored in a dictionary with this structure: {website_tag1-tag2-tag3 : link} This enables using multiple links from same website.

03.09.2024

Temporarily dropped Streamlit and Selenium to work on basics.

27.08.2024

Moved to Streamlit
Added function to turn records into dataframe

26.08.2024

Introduced JobRecord class to handle HTML records