Skip to content

amjadAwad95/Twitter-Stream-Processing-Pipeline

Repository files navigation

Twitter-Stream-Processing-Pipeline

A scalable and real-time data pipeline for processing, analyzing, and visualizing Twitter data.

Table of Contents

Overview

The Tweet Analysis Pipeline is designed to process live Twitter streams for sentiment analysis, trending hashtag identification, and data visualization.

  • Problem: Extract meaningful insights from the vast amount of Twitter data in real time.
  • Target Audience: Data scientists, social media analysts, and researchers.
  • How It Works: Tweets are ingested using Kafka, analyzed with Scala ML models, and stored in Elasticsearch for visualization in Kibana.

Features

  • Simulation of real-time tweet ingestion using Kafka.
  • Sentiment analysis with pre-trained ML models.
  • Interactive dashboards in Kibana for data visualization.

Installation

Prerequisites

  • Java 1.8.0_202
  • Kafka 2-12-3.8.1
  • Scala 2.12.17
  • Spark 3.5.3
  • Python 3.11.12
  • Elasticsearch 8.16.1
  • Kibana 8.16.1

Steps

  1. Clone the repository:
git clone https://github.com/amjadAwad95/Twitter-Stream-Processing-Pipeline.git
cd Twitter-Stream-Processing-Pipeline
  1. Set up the Python environment:

In Elastic-Search-Index/src create file .env

elastic_name=YOUR_ELASTIC_NAME
password=YOUR_ELASTIC_PASSWORD
ca_certs=YOUR_ELASTIC_http_ca.crt
host=YOUR_ELASTIC_HOST_IP
pip install elasticsearch dotenv
  1. Start Kafka

Open the root directory where you installed kafka, then run the following commands:

  • In linux and mac
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
  • In windows
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
bin\windows\kafka-server-start.bat config\server.properties
  1. Start elastic search and kibana

Open the root directory where you installed elasticsearch, then run the following command:

  • In linux and mac
bin/elasticsearch
  • In windows
bin\elasticsearch.bat

Then open the root directory where you installed kibana, then run the following command:

  • In linux and mac
bin/kibana
  • In windows
bin\kibana.bat
  1. Run the pipeline:

To create an elastic search schema

cd Elastic-Search-Index\src
python main.py
  • To Start the Producer go to the scala folder named Kafka Producer and run the Main.scala
  • To Start the consumer go to the scala folder Kafka Consumer and run the Main.scala

To visualize data go to http://localhost:5601 then follow these steps on your local Kibana instance:

  1. Go to the Management tab.
  2. Navigate to Saved Objects.
  3. Click the Import button.
  4. Upload the .ndjson file, you can find it in KibanaDashboard directory.

Technologies Used

  • Stream Processing: Kafka
  • Data Storage: Elasticsearch
  • Visualization: Kibana
  • Machine Learning: Stanford corenlp

Pipeline

Pipeline Overview

graph LR
A[Data Ingestion] -->|Tweets| B[Kafka Topic]
B --> C[Data Processing]
C --> D[Sentiment Analysis]
D --> F[Elasticsearch Storage]
F --> G[Visualization in Kibana]
Loading

Pipeline Stages

  1. Data Ingestion

    • Tweets are fetched in real-time using Kafka.
    • Data is pushed to a Kafka topic named tweets-stream.
  2. Data Processing

    • Extract the important features from the data.
    • Scala scripts perform sentiment analysis.
    • Machine learning models classify tweet text into sentiment categories.
  3. Data Storage

    • Processed data is stored in Elasticsearch for querying.
  4. Data Visualization

    • Kibana dashboards visualize sentiment trends, top hashtags, and tweet statistics.

Contributing

We welcome contributions! Follow these steps:

  1. Fork this repository.
  2. Create a new branch:
git checkout -b feature-name
  1. Make your changes and commit them:
git commit -m "Your commit"
  1. Push to the branch:
git push origin feature-name
  1. Open a pull request.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •