stream approach for the solution.
- Clone the project
- Install Docker
- Install Python
- Install pip
- Install the packages from the requirements.txt file in the producer directory
- Open a terminal in the cloned folder and run the following command
docker-compose up
- cd into the producer directory and execute the producer
python3 producer.py
- cd into the consumer directory and execute the consumer
python3 consumer.py
The docker-compose command brings up the whole infrastructure required to run the program. The following components are required for this approach:
- Kafka
- Zookeeper
- Postgres DB
- Producer (Python program)
- Consumer (Python program)
- First, Docker spawns the containers for Zookeeper, Kafka and the Postgres DB (also creating the data tables required for the ETL).
- When the producer script is executed, it invokes the Twitter API and pushes the data to the Kafka topic 'tweets_topic' for the consumer to consume (a minimal producer sketch follows this list).
- The consumer then extracts the event data from the Kafka topic, transforms it, and ingests the relevant details into the Postgres DB's tables (a minimal consumer sketch follows this list).
- This completes the ETL.
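
For reference, here is a minimal sketch of the producer side, assuming the kafka-python package is installed; the fetch_tweets() helper is a hypothetical stand-in for the actual Twitter API call made in producer.py.

```python
# Minimal producer sketch, assuming kafka-python; fetch_tweets() is a
# hypothetical placeholder for the real Twitter API call in producer.py.
import json

from kafka import KafkaProducer


def fetch_tweets():
    # Illustrative static records standing in for Twitter API responses.
    yield {"id": 1, "text": "hello kafka", "user": "alice"}
    yield {"id": 2, "text": "streaming etl", "user": "bob"}


producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for tweet in fetch_tweets():
    # Push each tweet event onto the topic the consumer reads from.
    producer.send("tweets_topic", tweet)

producer.flush()
```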
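
And a matching minimal sketch of the consumer side, assuming kafka-python and psycopg2; the tweets table name and its columns are illustrative assumptions, since the real tables are created during docker-compose startup.

```python
# Minimal consumer sketch, assuming kafka-python and psycopg2; the "tweets"
# table and its columns are assumptions for illustration only.
import json

import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "tweets_topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

# Connection string reuses the placeholder credentials from the psql step below.
conn = psycopg2.connect("postgres://username:secret@localhost:5432/database")
cur = conn.cursor()

for message in consumer:
    tweet = message.value
    # Transform step: keep only the fields relevant to the target table.
    cur.execute(
        "INSERT INTO tweets (tweet_id, tweet_text, tweet_user) VALUES (%s, %s, %s)",
        (tweet.get("id"), tweet.get("text"), tweet.get("user")),
    )
    conn.commit()
```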
Execute the following commands in the terminal to inspect the ingested data:
docker container ps
note the Postgres DB container's CONTAINER ID and copy it
docker exec -it <copied CONTAINER ID> bash
this opens a shell inside the Postgres DB container
psql postgres://username:secret@localhost:5432/database
this command connects to the database
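
Alternatively, a quick check can be run from Python with psycopg2, using the same placeholder connection string as the psql command above; the tweets table name is again an assumption.

```python
# Quick row-count check, assuming the placeholder credentials above and an
# assumed "tweets" table created by the docker-compose setup.
import psycopg2

conn = psycopg2.connect("postgres://username:secret@localhost:5432/database")
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM tweets")
print("rows ingested:", cur.fetchone()[0])
cur.close()
conn.close()
```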
Please find attached the design diagram for the solution.