Organizer for job searching across multiple sites. Fetch offers, measure recruitment progress, and collect info about potential employers.
Job-Search-Tool-Demo.mp4
Data processing
- If the site puts the selected location first, use only the first location
- Otherwise, fetch the HTML with the location block hovered, so the full list of locations is shown and can be extracted
- Location extraction improvements: make sure that either a list or the proper location is extracted
- Use tag and location dictionaries to unify variable elements
- Prepare record template - fetch one record from CSV, fill specific fields
- Records are initially collapsed, showing minimal info. Click to show full record details
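The location and dictionary items above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the alias dictionaries, `unify`, and `pick_locations` are hypothetical names, and the real mappings are not shown in this README.

```python
# Hypothetical alias dictionaries (assumption): the project's real mappings
# are not listed in this README.
LOCATION_ALIASES = {
    "warsaw": "Warszawa",
    "warszawa": "Warszawa",
    "cracow": "Kraków",
    "krakow": "Kraków",
    "kraków": "Kraków",
}
TAG_ALIASES = {
    "js": "JavaScript",
    "javascript": "JavaScript",
    "py": "Python",
    "python": "Python",
}


def unify(values, aliases):
    """Map each raw value to its canonical form; unknown values are kept as-is."""
    unified = []
    for raw in values:
        cleaned = raw.strip()
        canonical = aliases.get(cleaned.lower(), cleaned)
        # Skip empty strings and duplicates that collapse to the same name.
        if canonical and canonical not in unified:
            unified.append(canonical)
    return unified


def pick_locations(locations, selected_first):
    """Keep only the first location when the site lists the searched one first."""
    unified = unify(locations, LOCATION_ALIASES)
    return unified[:1] if selected_first else unified
```

For example, `pick_locations(["Cracow", "krakow", "Gdańsk"], selected_first=False)` collapses both spellings into one entry and keeps the unknown city unchanged.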
Cloud related issues
- Introduce session for admin user
- Columns with non-public info are available only to the admin
- Saving data/files is available only to the admin
- Run updater on a scheduler
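One possible shape for the admin-only rules above, as a sketch under assumptions: the column names and helper functions are hypothetical, and the `is_admin` flag is assumed to come from whatever session mechanism the app ends up using (in Streamlit it could, for instance, be kept in `st.session_state`).

```python
# Hypothetical column names (assumption): the README does not say which
# fields are private.
ADMIN_ONLY_COLUMNS = {"notes", "recruitment_stage", "personal_rating"}


def visible_columns(all_columns, is_admin):
    """Return the columns a viewer may see, preserving the original order."""
    if is_admin:
        return list(all_columns)
    return [column for column in all_columns if column not in ADMIN_ONLY_COLUMNS]


def filter_record(record, is_admin):
    """Drop admin-only fields from a single record (a plain dict here)."""
    allowed = visible_columns(record.keys(), is_admin)
    return {key: record[key] for key in allowed}


def can_save(is_admin):
    """Saving data or files is allowed only for the admin."""
    return bool(is_admin)
```

Keeping the rule in one set means a new private column only has to be registered in one place, and both the table view and the save action can consult the same flag.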
Ideas for the future
- Scrape each interesting offer (3+ stars)
- Fetch and unify requirements, additional info etc
- Build RAG using CV to analyze each offer in relation to skills
- Use RAG with scraped offers to generate unified offer template
Click to see the details
- Fixed synchronization module misdetecting changed records
- Fully migrated to SQL database
- Dropped using CSV files
- Introduced settings file
- Popup and terminal report if update is needed
- Prevent crashes if update file is missing while search link is active
- Report points of failure while scraping
- Prevent app crashes caused by missing data
- Moved Selenium Chromedriver to Docker container
- Properly extracting link to multi-location offers from Pracuj.pl (remote offers only)
- Created framework for additional actions upon scraping website
- Updated download links
- Minor performance and data processing tweaks
- Refactoring
- Minor tweaks and bugfixes
- Synchronization tab shows only changed records
- Synchronization module improvements
- Forcing file structure for synchronization
- Working sync module with archive
- Improvement in extracting job location. Added separate field for remote job status
- Properly extracting salary details (currency etc)
- Fixed logo extraction from Nofluffjobs
- Storing job tags as a string
- Introduced Streamlit
- Integrated JustJoinIT.pl site
- Integrated Solid.jobs site
- Integrated it.pracuj.pl site
- Integrated Rocketjobs.pl site
- Integrated Bulldogjob.pl site
- Minor improvements to handling data extraction
- Massively reduced update time by reusing a single webdriver
- Moved data extraction to containers: instead of only pointing to containers, functions now handle data extraction. This greatly improves scalability for the project
- Big improvements to code clarity
- Solved theprotocol fetching inconsistencies by setting a fixed chromedriver window size (the window is not displayed anyway). The point of failure was the site rendering in its mobile version by default
- Now salary extraction properly handles various notations
- Moved to Selenium scraping. This provides better results than requests.
- Introduced file handling. Now data is extracted from saved files, resulting in improved performance. Update function scrapes search links to their respective file.
- Search links are now stored in a dictionary with this structure: {website_tag1-tag2-tag3 : link}. This enables using multiple links from the same website.
- Temporarily dropped Streamlit and Selenium to work on basics.
- Moved to Streamlit
- Added function to turn records into dataframe
- Introduced JobRecord class to handle HTML records
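The `{website_tag1-tag2-tag3 : link}` structure mentioned in the changelog could be read back as sketched below. The entries and URLs are placeholders, `parse_key` and `group_by_website` are hypothetical helpers, and the sketch assumes that website names contain no underscore.

```python
from collections import defaultdict

# Placeholder entries (assumption): these are not the project's real search links.
SEARCH_LINKS = {
    "nofluffjobs_python-remote": "https://example.com/search?q=python&remote=1",
    "nofluffjobs_data-warsaw": "https://example.com/search?q=data&city=warsaw",
    "justjoinit_python": "https://example.com/jobs/python",
}


def parse_key(key):
    """Split 'website_tag1-tag2' into ('website', ['tag1', 'tag2'])."""
    # Assumes the website name itself has no underscore.
    website, _, tag_part = key.partition("_")
    tags = [tag for tag in tag_part.split("-") if tag]
    return website, tags


def group_by_website(search_links):
    """Collect every (tags, link) pair under its website."""
    grouped = defaultdict(list)
    for key, link in search_links.items():
        website, tags = parse_key(key)
        grouped[website].append((tags, link))
    return dict(grouped)
```

Because the tags are part of the key, two searches on the same website stay distinct entries, which is what allows multiple links per site.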