Tools to gather pricing data from the Internet
- Currently contains scrapers for Amazon and Costco, plus a recorder that pulls pricing information from the Walmart API
- The project includes a food_list.txt that searches for 50 different kinds of food by default; it can search for anything you put into the file, one search term per line
- The scrapers and the API recorder save their results to a JSON file
- If the data looks good, you can populate your MongoDB collection with the gathered information
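As a rough illustration of the one-term-per-line format described above, a loader for food_list.txt might look like the sketch below. The function name `load_search_terms` is hypothetical and is not taken from the project; the sketch also assumes blank lines and surrounding whitespace should be ignored.

```python
from pathlib import Path


def load_search_terms(path):
    """Return the non-empty lines of a text file as a list of search terms.

    Hypothetical helper: the project's own loader may behave differently.
    """
    text = Path(path).read_text(encoding="utf-8")
    # One term per line; strip surrounding whitespace and skip blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `load_search_terms("food_list.txt")` from the project root would then give the list of terms to search for.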
Python 3.6.5
Walmart API key
mLab MongoDB database (for production only)
install Python 3 and pip
get into the project:
git clone
cd benchscraper
set up
python -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt
to run the API recorder and both scrapers, and save to the local database
-> to save to the production database, set PROD=<your production MongoDB URI>
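One way the PROD switch described above could be read in Python is sketched below. This is an assumption about the mechanism, not the project's actual code: the name `resolve_mongo_uri` and the local default URI are both hypothetical.

```python
import os

# Assumed default for local development; the project's real default may differ.
LOCAL_URI = "mongodb://localhost:27017"


def resolve_mongo_uri(environ=None):
    """Return the URI in PROD when it is set and non-empty, else the local default."""
    env = os.environ if environ is None else environ
    return env.get("PROD") or LOCAL_URI
```

Passing a plain dict as `environ` makes the choice easy to check without touching the real environment.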
to run scrapers individually:
from within the services directory:
scrapy crawl costco_spider
scrapy crawl amazon_spider
to populate the database from the JSON records:
from the root directory:
python db/
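The populate step can be pictured roughly as follows. This is a minimal sketch, assuming the JSON file holds an array of item records; `load_records` and `populate` are hypothetical names and are not the project's own script. `collection` stands for any object with an `insert_many` method, such as a pymongo `Collection`.

```python
import json


def load_records(json_path):
    """Read a JSON file that is assumed to hold a list of item records."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def populate(collection, records):
    """Insert records into a collection and return how many were sent.

    `collection` is any object exposing insert_many(list), for example a
    pymongo Collection.
    """
    if not records:
        # pymongo's insert_many raises on an empty list, so skip the call.
        return 0
    collection.insert_many(records)
    return len(records)
```

Keeping the collection as a parameter means the loading logic can be exercised with a stand-in object before pointing it at a real database.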
to test that the scrapers generate URLs and parse responses as expected:
from within the test directory:
python -m unittest
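For a sense of what a URL-generation test can look like under `python -m unittest`, here is a small self-contained sketch. The helper `build_search_url`, the `q` query parameter, and the example.com address are all hypothetical; the project's spiders build their own store-specific URLs.

```python
import unittest
from urllib.parse import parse_qs, quote_plus, urlsplit


def build_search_url(base_url, term):
    """Hypothetical helper: append a URL-encoded search term as ?q=..."""
    return f"{base_url}?q={quote_plus(term)}"


class BuildSearchUrlTest(unittest.TestCase):
    def test_spaces_are_encoded(self):
        url = build_search_url("https://example.com/search", "brown rice")
        self.assertEqual(url, "https://example.com/search?q=brown+rice")

    def test_term_round_trips(self):
        # Special characters should survive encoding and decoding unchanged.
        url = build_search_url("https://example.com/search", "mac & cheese")
        query = parse_qs(urlsplit(url).query)
        self.assertEqual(query["q"], ["mac & cheese"])
```

Saved as a file whose name starts with `test`, a module like this is picked up by unittest's default discovery.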