This repository contains a custom ELT project built with Docker and PostgreSQL, alongside dbt.
Extract - Data is extracted from the source database and converted into a SQL dump file.
Load - The dump file is then loaded into a destination database using a Python script (both steps are sketched below).
Transform - The data in the destination database can then be transformed to fit our use case, for example with dbt.
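As a rough illustration of the extract and load steps, the sketch below shells out to the standard PostgreSQL tools `pg_dump` and `psql` from Python. This is not the repository's exact script: the host names, database names, credentials, and dump path are all assumptions for illustration.

```python
import os
import subprocess

# Assumed connection details -- adjust to match your docker-compose services.
SOURCE = {"host": "source_postgres", "db": "source_db", "user": "postgres", "password": "secret"}
DEST = {"host": "destination_postgres", "db": "destination_db", "user": "postgres", "password": "secret"}
DUMP_FILE = "data_dump.sql"

# Extract: pg_dump writes the source database out as a plain SQL file.
subprocess.run(
    ["pg_dump", "-h", SOURCE["host"], "-U", SOURCE["user"], "-d", SOURCE["db"], "-f", DUMP_FILE],
    env={**os.environ, "PGPASSWORD": SOURCE["password"]},
    check=True,
)

# Load: psql replays the dump file against the destination database.
subprocess.run(
    ["psql", "-h", DEST["host"], "-U", DEST["user"], "-d", DEST["db"], "-f", DUMP_FILE],
    env={**os.environ, "PGPASSWORD": DEST["password"]},
    check=True,
)
```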
- `source_db_init/init.sql`: This SQL script initializes the source database. In this case, it creates the tables `users`, `films`, `film_category`, `actors`, and `film_actors`.
- `elt_script/elt_script.py`: This Python script performs the ELT process. It waits for the source database to become available, dumps its data into a SQL file, and loads that data into the destination database (a sketch of the wait-and-retry idea appears after this list).
- `elt_script/Dockerfile`: This file installs the PostgreSQL client, sets up the Python environment, copies the ELT script, and sets it as the container's default command.
- `docker-compose.yaml`: This file contains the configuration to create and manage multiple containers, with the following services:
  - `source_postgres`: the source database.
  - `destination_postgres`: the destination database.
  - `elt_script`: the service that runs the Python script.
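The "waits for the source database" behaviour mentioned above is typically a small retry loop. Here is a hedged sketch of that idea using PostgreSQL's `pg_isready` utility; the actual mechanism in `elt_script.py` may differ, and the host name is an assumption.

```python
import subprocess
import time

def wait_for_postgres(host: str, retries: int = 30, delay: float = 2.0) -> None:
    """Poll pg_isready until the server accepts connections or we give up."""
    for attempt in range(1, retries + 1):
        # pg_isready exits with status 0 once the server accepts connections.
        result = subprocess.run(["pg_isready", "-h", host], capture_output=True)
        if result.returncode == 0:
            print(f"{host} is ready (attempt {attempt}).")
            return
        time.sleep(delay)
    raise RuntimeError(f"{host} did not become ready after {retries} attempts.")

wait_for_postgres("source_postgres")  # assumed docker-compose service name
```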
Make sure that you have Docker Desktop or Docker Compose installed.
- Clone the repository:

  ```bash
  git clone https://github.com/Ashrithiiitdm/Data-pipeline.git
  cd Data-pipeline/elt
  ```
- Run the containers:

  ```bash
  docker compose up
  ```
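  If you prefer to keep your terminal free, `docker compose up -d` starts the stack in the background, and `docker compose logs -f elt_script` follows the ELT script's output while it runs.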
- After the pipeline finishes, you can inspect the destination database:

  ```bash
  docker exec -it <name_of_destinationdb_container> psql -U postgres
  ```
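  Inside the `psql` prompt, `\l` lists the available databases and `\dt` lists the tables in the current one; if the data was loaded into a database other than the default, switch to it first with `\c <database_name>`. A quick `SELECT COUNT(*) FROM users;` (using the `users` table created by `init.sql`) is an easy way to confirm the load succeeded.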