Get daily aviation data, clean it, and make it available for analysis.

Ouuumar/pyspark-aviation-etl

PySpark Data Pipeline - Aviation data

JSON data API: https://aviationstack.com/documentation

  • Create an account and get your API key

What it currently does:

  • Fetches daily data from the API

  • Cleans the data (handles nulls, fixes bad schemas, normalizes names, etc.)

  • Saves the result as Parquet for further analysis
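
As an illustration of the cleaning and saving steps, here is a minimal sketch. It is not the repository's actual code: the helper names `snake_case` and `clean_and_save` are hypothetical, and it assumes the API response has already been loaded into a flat PySpark DataFrame.

```python
import re


def snake_case(name: str) -> str:
    """Normalize a column name to lowercase snake_case."""
    # Insert "_" at camelCase boundaries, e.g. "flightDate" -> "flight_Date".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace runs of dots, spaces, dashes, etc. with a single "_".
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


def clean_and_save(df, out_path: str) -> None:
    """Hypothetical cleaning step; `df` is assumed to be a flat PySpark DataFrame."""
    cleaned = (
        df.toDF(*[snake_case(c) for c in df.columns])  # consistent column names
        .dropna(how="all")                             # drop rows where every value is null
        .dropDuplicates()                              # drop exact duplicate rows
    )
    # Overwrite any previous output at this path.
    cleaned.write.mode("overwrite").parquet(out_path)
```

Only `snake_case` is pure Python; `clean_and_save` relies on the standard DataFrame methods `toDF`, `dropna`, `dropDuplicates` and `write.parquet`. The real pipeline may handle nulls and schema problems differently.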

Try it:

  • Create your venv:
   python3 -m venv venv
  • Activate your venv:
   source venv/bin/activate
  • Install the requirements:
   pip install -r ./utils/requirements.txt
  • Add your API key to your .env file
  • Finally, run main.py
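
For orientation, the sketch below shows one way the API key could be read from the environment and turned into a request URL. Everything specific here is an assumption rather than something taken from this repository: the variable name `AVIATIONSTACK_API_KEY`, the helper names, and the endpoint and query parameters, which should be checked against the aviationstack documentation.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint; confirm it in the aviationstack documentation.
BASE_URL = "http://api.aviationstack.com/v1/flights"


def get_api_key(var_name: str = "AVIATIONSTACK_API_KEY") -> str:
    """Read the key from the environment (variable name is hypothetical).

    A .env loader such as python-dotenv would have to populate
    os.environ before this is called.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key


def build_flights_url(api_key: str, flight_date: str, limit: int = 100) -> str:
    """Build the request URL for one day of flights (assumed parameters)."""
    params = {"access_key": api_key, "flight_date": flight_date, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"
```

Keeping the key in the environment rather than in the source means it never ends up in version control.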

TODO:

  • Airflow scheduler: schedule the daily ETL
  • Database: use bronze, silver and gold databases
  • Processing: aggregate data to create useful datamarts in the gold table
  • Enhance logging and code modularity
