This project is the first attempt at political bias classification of German news.
Check out our paper: Fine-grained Classification of Political Bias in German News: A Data Set and Initial Experiments
We crawled our data from various German news sites using the news-please library. After that, we manually cleaned the data and labeled it using Medienkompass. The dataset was then preprocessed using the HuggingFace NLP library.
Due to copyright issues we cannot publish the data, but we provide the list of URLs you can use to rebuild the dataset on your own. To download all the data, run:
from newsplease import NewsPlease
NewsPlease.from_file('urls/urls.txt')
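If you want to persist the crawled articles for the preprocessing step, here is a minimal sketch. It assumes, as in the news-please documentation, that from_file returns a dict mapping each URL to an article object with .title and .maintext attributes; the output path is only an example:

import json
from newsplease import NewsPlease

# Assumption: from_file returns {url: NewsArticle} with .title / .maintext attributes.
articles = NewsPlease.from_file('urls/urls.txt')
with open('data/articles.json', 'w', encoding='utf-8') as f:
    json.dump({url: {'title': a.title, 'maintext': a.maintext}
               for url, a in articles.items()},
              f, ensure_ascii=False, indent=2)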
Then run the preprocessing script:
python preprocess.py -data_folder='path/to/your/downloaded/data'
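preprocess.py handles this step for you; purely for illustration, loading the cleaned articles into a HuggingFace dataset could look roughly like the sketch below (the HuggingFace NLP library is now published as datasets; the file name and column names are assumptions):

import json
from datasets import Dataset

with open('data/articles.json', encoding='utf-8') as f:
    raw = json.load(f)

dataset = Dataset.from_dict({
    'title': [a['title'] for a in raw.values()],
    'text': [a['maintext'] for a in raw.values()],
})
dataset = dataset.filter(lambda ex: bool(ex['text']))  # drop articles without extracted text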
We evaluated several classification models on the dataset, using Bag-of-Words, TF-IDF, and BERT features. To reproduce the former two, run the BOW_baseline.ipynb and TFIDF_baseline.ipynb notebooks. To train the BERT-based models, fine-tune the HuggingFace implementation of German BERT:
python train.py -data_folder="data" -model_folder="models/BERT" -batch_size=8 -num_epochs=2
After that, run the BERT_baseline.ipynb notebook.
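For orientation, here is a rough sketch of the kind of fine-tuning step train.py performs with the transformers library; the checkpoint name, label count, and placeholder data are assumptions for illustration, not the repository's exact setup:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = 'bert-base-german-cased'  # German BERT checkpoint on the HuggingFace hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=5)  # label count is a placeholder

# Placeholder data; in practice this comes from the preprocessed dataset.
train_data = Dataset.from_dict({
    'text': ['Ein politischer Text', 'Noch ein politischer Text'],
    'label': [0, 1],
})

def tokenize(batch):
    return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=512)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(output_dir='models/BERT',
                         per_device_train_batch_size=8,
                         num_train_epochs=2)
Trainer(model=model, args=args, train_dataset=train_data).train()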
Using our two best models for TF-IDF and BERT features, we implemented a demo system that predicts the political bias of a single arbitrary text and generates a list of the words that push the system toward its decision. The models can be downloaded from here. To use the system, run:
python predict.py -file_path="text_sample.txt" -method="tfidf" -explain=False
or call it from Python:
from BiasPredictor import biasPredictor
predictor = biasPredictor("bert")
prediction = predictor.predict(text="Ein politischer Text", explain=True)
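As a rough, self-contained illustration of how such an explanation can work (this is not the BiasPredictor implementation), a TF-IDF model with a linear classifier can surface the decisive words by ranking each word's contribution, i.e. its TF-IDF value times the weight of the predicted class:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder training data; the real models are trained on the news dataset.
texts = ['Text der ersten Klasse', 'Text der zweiten Klasse', 'Text der dritten Klasse']
labels = [0, 1, 2]

vectorizer = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(texts), labels)

def explain(text, top_k=10):
    x = vectorizer.transform([text])
    pred = clf.predict(x)[0]
    # Per-word contribution to the predicted class: TF-IDF value * class weight.
    contrib = x.toarray()[0] * clf.coef_[list(clf.classes_).index(pred)]
    words = vectorizer.get_feature_names_out()
    top = np.argsort(contrib)[::-1][:top_k]
    return pred, [(words[i], contrib[i]) for i in top if contrib[i] > 0]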