A scalable and real-time data pipeline for processing, analyzing, and visualizing Twitter data.
The Tweet Analysis Pipeline is designed to process live Twitter streams for sentiment analysis, trending hashtag identification, and data visualization.
- Problem: Extract meaningful insights from the vast amount of Twitter data in real time.
- Target Audience: Data scientists, social media analysts, and researchers.
- How It Works: Tweets are ingested using Kafka, analyzed with Scala ML models, and stored in Elasticsearch for visualization in Kibana.
- Simulation of real-time tweet ingestion using Kafka.
- Sentiment analysis with pre-trained ML models.
- Interactive dashboards in Kibana for data visualization.
- Java 1.8.0_202
- Kafka 2.12-3.8.1
- Scala 2.12.17
- Spark 3.5.3
- Python 3.11.12
- Elasticsearch 8.16.1
- Kibana 8.16.1
- Clone the repository:
git clone https://github.com/amjadAwad95/Twitter-Stream-Processing-Pipeline.git
cd Twitter-Stream-Processing-Pipeline
- Set up the Python environment:
In `Elastic-Search-Index/src`, create a `.env` file with the following contents:
elastic_name=YOUR_ELASTIC_NAME
password=YOUR_ELASTIC_PASSWORD
ca_certs=YOUR_ELASTIC_http_ca.crt
host=YOUR_ELASTIC_HOST_IP
Then install the required packages (note that the dotenv loader is published on PyPI as python-dotenv):
pip install elasticsearch python-dotenv
- Start Kafka
From the root directory of your Kafka installation, run the following commands:
- On Linux and macOS
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
- On Windows
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
bin\windows\kafka-server-start.bat config\server.properties
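The pipeline reads from and writes to a Kafka topic named tweets-stream (see Architecture below). Kafka may create it automatically on first use; if not, you can create it from the same directory, assuming the default broker on localhost:9092:
- On Linux and macOS
bin/kafka-topics.sh --create --topic tweets-stream --bootstrap-server localhost:9092
- On Windows
bin\windows\kafka-topics.bat --create --topic tweets-stream --bootstrap-server localhost:9092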
- Start Elasticsearch and Kibana
From the root directory of your Elasticsearch installation, run the following command:
- On Linux and macOS
bin/elasticsearch
- On Windows
bin\elasticsearch.bat
Then, from the root directory of your Kibana installation, run the following command:
- On Linux and macOS
bin/kibana
- On Windows
bin\kibana.bat
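Elasticsearch 8.x starts with security enabled by default. To confirm it is running, you can query it with the CA certificate and the elastic user's password generated on first startup (the certificate path below is the default and may differ in your installation):
curl --cacert config/certs/http_ca.crt -u elastic https://localhost:9200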
- Run the pipeline:
To create the Elasticsearch index schema, run:
cd Elastic-Search-Index/src
python main.py
- To start the producer, open the Scala project in the Kafka Producer folder and run Main.scala (a minimal sketch follows this list).
- To start the consumer, open the Scala project in the Kafka Consumer folder and run Main.scala.
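For orientation, the following sketch shows the core of a minimal producer. The object name and message format are illustrative assumptions; only the tweets-stream topic and the default broker address come from this README, and Main.scala in Kafka Producer remains the authoritative implementation.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TweetProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Assumes the broker started earlier on the default port.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // In the real producer, each record is a tweet read from the source data.
    val sampleTweet = """{"text": "sample tweet", "hashtags": ["example"]}"""
    producer.send(new ProducerRecord[String, String]("tweets-stream", sampleTweet))
    producer.close()
  }
}
```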
To visualize the data, open http://localhost:5601 and follow these steps in your local Kibana instance:
- Go to the Management tab.
- Navigate to Saved Objects.
- Click the Import button.
- Upload the .ndjson file, which you can find in the `KibanaDashboard` directory.
- Stream Processing: Kafka
- Data Storage: Elasticsearch
- Visualization: Kibana
- Machine Learning: Stanford CoreNLP
graph LR
A[Data Ingestion] -->|Tweets| B[Kafka Topic]
B --> C[Data Processing]
C --> D[Sentiment Analysis]
D --> F[Elasticsearch Storage]
F --> G[Visualization in Kibana]
- Data Ingestion
  - Tweets are fetched in real time using Kafka.
  - Data is pushed to a Kafka topic named `tweets-stream`.
- Data Processing
  - The important features are extracted from the data.
  - Scala scripts perform sentiment analysis (sketched below).
  - Machine learning models classify tweet text into sentiment categories.
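A minimal sketch of this step is shown below, assuming Stanford CoreNLP's sentiment annotator (listed under Tech Stack) and the tweets-stream topic. The object name, consumer group, and console output are illustrative, not the repository's actual Main.scala.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.pipeline.StanfordCoreNLP
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.JavaConverters._

object TweetConsumerSketch {
  def main(args: Array[String]): Unit = {
    // Sentiment pipeline: tokenize and parse each tweet, then classify it.
    val nlpProps = new Properties()
    nlpProps.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
    val pipeline = new StanfordCoreNLP(nlpProps)

    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", "localhost:9092")
    consumerProps.put("group.id", "tweet-analyzers")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(Collections.singletonList("tweets-stream"))

    while (true) {
      for (record <- consumer.poll(Duration.ofMillis(500)).asScala) {
        val sentences = pipeline.process(record.value())
          .get(classOf[CoreAnnotations.SentencesAnnotation]).asScala
        // CoreNLP labels each sentence, e.g. "Negative", "Neutral", "Positive".
        val sentiment = sentences.headOption
          .map(_.get(classOf[SentimentCoreAnnotations.SentimentClass]))
          .getOrElse("Neutral")
        println(s"$sentiment: ${record.value()}")
      }
    }
  }
}
```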
- Data Storage
  - Processed data is stored in Elasticsearch for querying (a verification query is shown below).
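To verify that documents are arriving, you can query Elasticsearch directly. The index name tweets below is a placeholder; substitute the index created by main.py and the credentials from your .env file:
curl --cacert YOUR_ELASTIC_http_ca.crt -u elastic:YOUR_ELASTIC_PASSWORD "https://localhost:9200/tweets/_search?pretty"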
- Data Visualization
  - Kibana dashboards visualize sentiment trends, top hashtags, and tweet statistics.
We welcome contributions! Follow these steps:
- Fork this repository.
- Create a new branch:
git checkout -b feature-name
- Make your changes and commit them:
git commit -m "Describe your changes"
- Push to the branch:
git push origin feature-name
- Open a pull request.