JSON data API: https://aviationstack.com/documentation
- Create an account and get your API key
- Get daily data
- Cleaning data (handle nulls, bad schema, naming, etc.)
- Save it to Parquet for further analysis
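
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the endpoint URL, the `access_key` and `limit` query parameters, the top-level `data` key, and every function name here (`fetch_flights`, `to_snake_case`, `flatten`, `clean`, `save_parquet`) are assumptions to check against the aviationstack documentation and the actual project code.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it against the aviationstack documentation.
API_URL = "https://api.aviationstack.com/v1/flights"


def fetch_flights(api_key, limit=100):
    """Download one page of flight records (performs a network call)."""
    query = urllib.parse.urlencode({"access_key": api_key, "limit": limit})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        payload = json.load(resp)
    # Assumption: the records sit under a top-level "data" key.
    return payload.get("data", [])


def to_snake_case(name):
    """Normalise a column name, e.g. 'flightDate' or 'Flight Date' -> 'flight_date'."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def flatten(record, prefix=""):
    """Flatten nested dicts: {'departure': {'iata': 'CDG'}} -> {'departure_iata': 'CDG'}."""
    flat = {}
    for key, value in record.items():
        column = to_snake_case(f"{prefix}{key}")
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}_"))
        else:
            flat[column] = value
    return flat


def clean(records, required=("flight_date",)):
    """Flatten records, map empty strings to None, drop rows missing a required field."""
    rows = []
    for record in records:
        if not isinstance(record, dict):
            continue  # bad schema: skip anything that is not a JSON object
        row = {k: (None if v == "" else v) for k, v in flatten(record).items()}
        if all(row.get(col) is not None for col in required):
            rows.append(row)
    return rows


def save_parquet(rows, path):
    """Write cleaned rows to Parquet (needs pandas plus pyarrow or fastparquet)."""
    import pandas as pd  # imported lazily so the cleaning code has no dependencies

    pd.DataFrame(rows).to_parquet(path, index=False)
```

A daily run would then chain the three calls, for example `save_parquet(clean(fetch_flights(api_key)), "flights.parquet")`, where the output file name is only a placeholder.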
Try it:
- Create your venv:
python3 -m venv venv
- Activate your venv:
source venv/bin/activate
- Install requirements:
pip install -r ./utils/requirements.txt
- Add your API key to your .env
- Finally, run main.py
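
For illustration, here is one way a script could pick up the key from the .env file using only the standard library. The variable name `AVIATIONSTACK_API_KEY` and the helpers `load_env` and `get_api_key` are hypothetical; main.py may read the key differently (for example through a dotenv package), so treat this as a sketch.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ (existing values win)."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_api_key(name="AVIATIONSTACK_API_KEY"):
    """Return the API key, failing early with a clear message when it is missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

With a .env file containing a line such as `AVIATIONSTACK_API_KEY=your_key_here`, calling `load_env()` followed by `get_api_key()` returns the key, and a missing key raises an error before any request is sent.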
- Airflow scheduler: schedule the daily ETL
- Database: use bronze, silver and gold databases
- Processing: aggregate data to create useful datamarts in the gold table
- Enhance logging and code modularity