
Organizer for job searching across multiple sites. Fetch offers, measure recruitment progress, collect info about potential employers


Ne0bliviscaris/Job-Search-Tool


Job-Search-Tool


Demonstration:

Job-Search-Tool-Demo.mp4



THIS BRANCH:

TODO:

Data processing

Location fetching adjustments

  • If the site puts the selected location first, use only the first location
  • Otherwise, fetch the HTML with the location block hovered so that the full list of locations is shown, then extract that list
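The decision rule above can be sketched as a small helper, assuming the scraper already returns an offer's locations as a list of strings in page order. `choose_locations` is a hypothetical name, and the hover step that reveals the full list (for instance Selenium's `ActionChains(driver).move_to_element(element).perform()`) is not shown.

```python
from __future__ import annotations


def choose_locations(locations: list[str], selected: str | None) -> list[str]:
    """Decide which scraped locations to keep (hypothetical helper).

    If the site already puts the user's selected location first, keep only
    that entry; otherwise return the full list revealed by the hover.
    """
    if not locations:
        return []
    if selected and locations[0].strip().casefold() == selected.strip().casefold():
        return [locations[0].strip()]
    return [loc.strip() for loc in locations]
```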

Get proper search links

Raw data extraction improvements:

  • Location extraction improvements: make sure that either the full list of locations or the single correct location is extracted

Synchronization ETL module:

  • Use tag and location dictionaries to unify values that are spelled differently across sites
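A minimal sketch of dictionary-based unification, assuming records are plain dicts with `tags` and `locations` lists. The alias tables and function names are hypothetical; the project's real dictionaries are not shown in this README.

```python
from __future__ import annotations

# Hypothetical alias tables: raw spellings seen on different job sites,
# keyed in lowercase, mapped to one canonical value.
TAG_ALIASES = {
    "js": "JavaScript",
    "javascript": "JavaScript",
    "py": "Python",
    "python3": "Python",
}
LOCATION_ALIASES = {
    "warsaw": "Warszawa",
    "warszawa": "Warszawa",
    "cracow": "Kraków",
    "krakow": "Kraków",
    "kraków": "Kraków",
}


def normalize(value: str, aliases: dict[str, str]) -> str:
    """Return the canonical form of value, or the trimmed value if unknown."""
    cleaned = value.strip()
    return aliases.get(cleaned.casefold(), cleaned)


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with unified, de-duplicated tags and locations."""
    out = dict(record)
    out["tags"] = sorted({normalize(t, TAG_ALIASES) for t in record.get("tags", [])})
    # dict.fromkeys removes duplicates while keeping the original order.
    out["locations"] = list(dict.fromkeys(
        normalize(loc, LOCATION_ALIASES) for loc in record.get("locations", [])
    ))
    return out
```

Keys are matched case-insensitively after trimming whitespace; unknown values pass through unchanged, so new spellings show up in the data instead of being silently dropped.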

Records visualization:

  • Prepare a record template: fetch one record from the CSV and fill in the specific fields
  • Initially collapsed, showing minimal info; click to show the full record details

Cloud-related issues

Session and data access:

  • Introduce a session for the admin user
  • Make columns that are not meant for public view available only to the admin
  • Make saving data/files available only to the admin
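One way to enforce these admin-only rules, sketched under the assumption that the session exposes a boolean `is_admin` flag (in a Streamlit app this could plausibly be kept in `st.session_state`). The column names in `ADMIN_ONLY_COLUMNS` and all function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical private columns, e.g. personal notes about a recruitment process.
ADMIN_ONLY_COLUMNS = frozenset({"notes", "contact_person", "expected_salary"})


def visible_columns(columns: list[str], is_admin: bool) -> list[str]:
    """Return the columns this session may see, in their original order."""
    if is_admin:
        return list(columns)
    return [c for c in columns if c not in ADMIN_ONLY_COLUMNS]


def public_view(rows: list[dict], is_admin: bool) -> list[dict]:
    """Copy the rows, dropping admin-only fields for non-admin sessions."""
    if is_admin:
        return [dict(r) for r in rows]
    return [{k: v for k, v in r.items() if k not in ADMIN_ONLY_COLUMNS} for r in rows]


def require_admin(is_admin: bool, action: str) -> None:
    """Refuse write actions such as saving data or files for non-admin sessions."""
    if not is_admin:
        raise PermissionError(f"{action} is available only to the admin")
```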

Move to a Docker container and host it remotely

  • Run the updater on a schedule



Ideas for the future:

  • Scrape each interesting offer (3+ stars)
  • Fetch and unify requirements, additional info, etc.
  • Build a RAG pipeline over the CV to analyze each offer against the skills it lists
  • Use RAG with the scraped offers to generate a unified offer template



Changelog:


01.12.2024

  • Fixed synchronization module misdetecting changed records

29.11.2024

  • Fully migrated to SQL database
  • Dropped CSV files
  • Introduced settings file
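The README does not say which SQL database or schema is used. The sketch below uses SQLite from the Python standard library purely to stay self-contained, and the `offers` table and its columns are hypothetical. It shows an upsert keyed on the offer link, which refreshes scraped fields without overwriting a user-managed `status` column.

```python
import sqlite3

# Hypothetical schema; the project's real table and column names are not documented.
SCHEMA = """
CREATE TABLE IF NOT EXISTS offers (
    link     TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    company  TEXT,
    tags     TEXT,
    status   TEXT DEFAULT 'new'
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_offer(conn: sqlite3.Connection, offer: dict) -> None:
    """Insert an offer, or refresh its scraped fields if the link already exists.

    The user-managed `status` column is left untouched on update.
    """
    conn.execute(
        """
        INSERT INTO offers (link, title, company, tags)
        VALUES (:link, :title, :company, :tags)
        ON CONFLICT(link) DO UPDATE SET
            title = excluded.title,
            company = excluded.company,
            tags = excluded.tags
        """,
        offer,
    )
    conn.commit()
```

`ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.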

20.11.2024

  • Popup and terminal report when an update is needed
  • Prevent crashes when an update file is missing while its search link is active
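A sketch of an update check that reports instead of crashing, assuming one saved file per search-link key named `<key>.html` and a 24-hour freshness threshold; both are assumptions, as is the function name. The returned keys could then drive both the terminal report and the in-app popup.

```python
from __future__ import annotations

import time
from pathlib import Path

MAX_AGE_HOURS = 24  # hypothetical freshness threshold


def links_needing_update(search_links: dict[str, str], data_dir: Path,
                         max_age_hours: float = MAX_AGE_HOURS,
                         now: float | None = None) -> list[str]:
    """Return keys of active search links whose saved file is missing or stale.

    A missing file is reported rather than raised, so the app keeps running.
    """
    now = time.time() if now is None else now
    stale = []
    for key in search_links:
        path = data_dir / f"{key}.html"  # hypothetical file naming
        if not path.exists():
            stale.append(key)
            continue
        age_hours = (now - path.stat().st_mtime) / 3600
        if age_hours > max_age_hours:
            stale.append(key)
    return stale
```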

18.11.2024

  • Report points of failure while scraping
  • Prevent app crashes caused by missing data
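A sketch of collecting extraction failures instead of letting them crash the app, assuming each field has its own extractor callable. `extract_fields` and the report format are hypothetical; catching the broad `Exception` is deliberate here, so that one malformed offer cannot stop a whole update.

```python
from __future__ import annotations

from typing import Any, Callable


def extract_fields(record: Any, extractors: dict[str, Callable[[Any], Any]],
                   source: str) -> tuple[dict[str, Any], list[str]]:
    """Run each field extractor, recording failures instead of crashing.

    Returns the extracted values (None where extraction failed) and a list
    of human-readable reports naming the site and the failing field.
    """
    values: dict[str, Any] = {}
    failures: list[str] = []
    for field, extractor in extractors.items():
        try:
            values[field] = extractor(record)
        except Exception as exc:  # any parsing error marks a point of failure
            values[field] = None
            failures.append(
                f"{source}: failed to extract '{field}' ({type(exc).__name__}: {exc})"
            )
    return values, failures
```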

08.10.2024

  • Moved Selenium Chromedriver to Docker container
  • Properly extracting link to multi-location offers from Pracuj.pl (remote offers only)
  • Created framework for additional actions upon scraping website

04.10.2024

  • Updated download links
  • Minor performance and data processing tweaks

25.09.2024

  • Refactoring
  • Minor tweaks and bugfixes
  • Synchronization tab shows only changed records
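A minimal change-detection sketch, assuming each snapshot is a dict of records keyed by a stable id such as the offer link. Fields listed in `ignore` (a hypothetical `scraped_at` timestamp by default) are left out of the comparison, since volatile fields are a common reason for records being flagged as changed when nothing meaningful differs; this README does not say what caused the misdetection fixed on 01.12.2024.

```python
from __future__ import annotations


def diff_records(old: dict[str, dict], new: dict[str, dict],
                 ignore: frozenset[str] = frozenset({"scraped_at"})) -> dict[str, list[str]]:
    """Compare two snapshots and list added, removed and changed record ids."""

    def comparable(rec: dict) -> dict:
        # Drop volatile fields so they cannot trigger a false "changed".
        return {k: v for k, v in rec.items() if k not in ignore}

    added = [k for k in new if k not in old]
    removed = [k for k in old if k not in new]
    changed = [k for k in new if k in old and comparable(new[k]) != comparable(old[k])]
    return {"added": added, "removed": removed, "changed": changed}
```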

20.09.2024

  • Synchronization module improvements
  • Enforcing a file structure for synchronization

19.09.2024

  • Working sync module with archive

16.09.2024

  • Improved job location extraction. Added a separate field for remote job status
  • Properly extracting salary details (currency etc)
  • Fixed logo extraction from Nofluffjobs
  • Storing job tags as a string
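The remote-status field and the tag string can be sketched as follows. The marker words in `REMOTE_MARKERS` and the `", "` separator are assumptions, not taken from the project.

```python
from __future__ import annotations

# Hypothetical markers sites might use for remote work (English and Polish).
REMOTE_MARKERS = frozenset({"remote", "fully remote", "zdalnie", "praca zdalna"})


def split_remote(locations: list[str]) -> tuple[list[str], bool]:
    """Separate remote-work markers from physical locations."""
    physical: list[str] = []
    remote = False
    for loc in locations:
        cleaned = loc.strip()
        if cleaned.casefold() in REMOTE_MARKERS:
            remote = True
        else:
            physical.append(cleaned)
    return physical, remote


def tags_to_string(tags: list[str], sep: str = ", ") -> str:
    """Store a tag list as one string (e.g. for a single table cell)."""
    return sep.join(tags)


def string_to_tags(text: str, sep: str = ",") -> list[str]:
    """Inverse of tags_to_string for tags that never contain the separator."""
    return [t.strip() for t in text.split(sep) if t.strip()]
```

A tag that itself contains a comma would not survive the round trip, so a separator that never appears inside tags is the safer choice.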

14.09.2024

  • Introduced Streamlit

11.09.2024

  • Integrated JustJoinIT.pl site
  • Integrated Solid.jobs site
  • Integrated it.pracuj.pl site

10.09.2024

  • Integrated Rocketjobs.pl site
  • Integrated Bulldogjob.pl site
  • Minor improvements to handling data extraction

09.09.2024

  • Greatly reduced update time by reusing a single webdriver
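Starting a browser is typically far slower than loading one more page, which is why reusing a single driver helps. The sketch below depends only on the `get`, `page_source` and `quit` members of Selenium's WebDriver; `update_all`, `make_driver` and `save` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Driver(Protocol):
    """The subset of Selenium's WebDriver interface this sketch relies on."""

    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


def update_all(search_links: dict[str, str],
               make_driver: Callable[[], Driver],
               save: Callable[[str, str], None]) -> None:
    """Scrape every search link with one browser session.

    The driver is created once, reused for all links, and closed even if
    a page fails.
    """
    driver = make_driver()
    try:
        for key, url in search_links.items():
            driver.get(url)
            save(key, driver.page_source)
    finally:
        driver.quit()
```

With Selenium installed, `make_driver` could be `lambda: webdriver.Chrome(options=options)`.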

06.09.2024

  • Moved data extraction into the containers: instead of only locating containers, functions now also handle data extraction. This greatly improves the project's scalability
  • Big improvements to code clarity
  • Solved theprotocol fetching inconsistencies by setting a fixed ChromeDriver window size (the window is not displayed anyway). The point of failure was the site rendering its mobile version by default
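A sketch of fixing the window size through Chrome command-line flags. `--window-size` and `--headless=new` are real Chrome switches; the 1920x1080 default and the `chrome_arguments` helper are assumptions.

```python
from __future__ import annotations


def chrome_arguments(width: int = 1920, height: int = 1080,
                     headless: bool = True) -> list[str]:
    """Build Chrome command-line flags with a fixed desktop-sized window.

    Without an explicit size, a headless window can be small enough for
    responsive sites to serve their mobile layout, which changes the HTML
    the scraper sees.
    """
    args = [f"--window-size={width},{height}"]
    if headless:
        args.append("--headless=new")
    return args
```

The flags would be passed with `ChromeOptions.add_argument`, one call per flag; calling `driver.set_window_size(1920, 1080)` after start-up is an alternative.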

05.09.2024

  • Salary extraction now properly handles various notations
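A regex-based sketch of normalizing a few common salary notations: space or comma thousands separators, `k` suffixes, ranges, and `PLN`/`zł`/`EUR`/`USD`/`€`/`$`. The patterns and the returned dict shape are assumptions; real listings contain more variants (hourly rates, gross vs. net) than this handles.

```python
from __future__ import annotations

import re

# Join digit groups split by (non-breaking) spaces: "10 000" -> "10000".
SPACE_THOUSANDS = re.compile(r"(?<=\d)[ \u00a0\u202f](?=\d{3}(?!\d))")
# Treat a comma followed by exactly three digits as a thousands separator.
COMMA_THOUSANDS = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")
# A number not glued to a preceding letter (skips the "2" in "B2B"),
# optionally followed by a standalone "k" meaning thousands.
AMOUNT = re.compile(r"(?<![a-z])(\d+(?:[.,]\d+)?)\s*(k(?![a-z]))?", re.IGNORECASE)
CURRENCY = re.compile(r"\b(pln|zł|eur|usd)\b|([€$])", re.IGNORECASE)
CURRENCY_CODES = {"pln": "PLN", "zł": "PLN", "eur": "EUR", "usd": "USD", "€": "EUR", "$": "USD"}


def parse_salary(text: str) -> dict | None:
    """Parse a salary string into min, max and currency, or None if no amount."""
    cleaned = COMMA_THOUSANDS.sub("", SPACE_THOUSANDS.sub("", text))
    amounts = []
    for number, k in AMOUNT.findall(cleaned):
        value = float(number.replace(",", "."))  # remaining comma is a decimal mark
        amounts.append(value * 1000 if k else value)
    if not amounts:
        return None
    match = CURRENCY.search(cleaned)
    currency = CURRENCY_CODES[(match.group(1) or match.group(2)).casefold()] if match else None
    return {"min": min(amounts), "max": max(amounts), "currency": currency}
```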

04.09.2024

  • Moved to Selenium scraping. This provides better results than requests-based scraping.
  • Introduced file handling. Data is now extracted from saved files, which improves performance. The update function scrapes each search link into its respective file.
  • Search links are now stored in a dictionary with this structure: {website_tag1-tag2-tag3 : link}. This enables using multiple links from the same website.
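A sketch of reading that key format back, assuming website names contain no `_` and tags contain no `-`. The function names and the example keys and URLs are hypothetical.

```python
from __future__ import annotations


def parse_link_key(key: str) -> tuple[str, list[str]]:
    """Split a search-link key of the form 'website_tag1-tag2-tag3'."""
    website, _, tag_part = key.partition("_")
    tags = [t for t in tag_part.split("-") if t]
    return website, tags


def links_for_site(search_links: dict[str, str], website: str) -> dict[str, str]:
    """Return every search link registered for one website."""
    return {k: v for k, v in search_links.items() if parse_link_key(k)[0] == website}
```

Because the tags live in the key, several links for the same website can coexist in one dictionary, e.g. `pracuj_python-remote` and `pracuj_sql`.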

03.09.2024

  • Temporarily dropped Streamlit and Selenium to work on the basics.

27.08.2024

  • Moved to Streamlit
  • Added function to turn records into dataframe

26.08.2024

  • Introduced JobRecord class to handle HTML records
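An illustrative `JobRecord`, written as a dataclass. The field names are assumptions, and the original class is described as handling HTML records, so it presumably also wraps or parses the source HTML, which is omitted here. `records_to_rows` shows the flattening step that precedes building a DataFrame.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class JobRecord:
    """Illustrative stand-in for the project's JobRecord; field names are assumed."""

    title: str
    company: str
    link: str
    locations: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    salary_min: float | None = None
    salary_max: float | None = None
    currency: str | None = None


def records_to_rows(records: list[JobRecord]) -> list[dict]:
    """Flatten records into plain dicts, one per offer."""
    return [asdict(r) for r in records]
```

With pandas installed, `pd.DataFrame(records_to_rows(records))` gives one row per offer.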
