Skip to content

Latest commit

 

History

History
36 lines (21 loc) · 2.23 KB

README.md

File metadata and controls

36 lines (21 loc) · 2.23 KB

The Bad Science Corpus

Author: Eric Nichols [email protected], Tohoku University, Japan

Project Goal

This is a pre-release of the Bad Science Corpus, an automatically-constructed collection of posts on scientifically-contraversial topics gathered from blogs by scientists and skeptics. The goal of this project is to produce a self-expanding source of controversial claims and their refutations by knowledgable parties to support research on analysis rhetorical and logical argumentation structures. See reference [1] for more details.

Resources

  • RSS feed: a curated collection of RSS feeds for several hundred science/skeptic blogs aggregated in a Google Reader account. Stored in data/xml.
  • Blog posts: Web archives of all blog posts from July 2009 to November 2012. Stored in data/warc.
  • Discussions: Blog posts that refer to the same external link are grouped into discussions representing a single topic to aid extraction of controversial opinions and rebuttals. Stored in data/disc

Utilities

There are Python scripts for accessing the Bad Science RSS feeds and updating the corpus as new posts are added.

  • bin/gr_rss.py: downloads RSS feeds as XML files
  • bin/gr_links.py: extracts all links from RSS feed XML
  • bin/gr_discussions.py: groups all links from feeds into discussions that have common external link

Dependencies

See dependencies.md and python_deps.txt for dependencies.

License

Please cite reference [1] if you use the Bad Science Corpus. Otherwise, it may be used without restriction. All blog posts and other downloaded webpages are copyright their original authors and are being shared under the fair use. All code is copyright Eric Nichols and may be used, modified, and redistributed under the terms of the GNU General Public License, version 2.

References

[1] Constructing a Scientific Blog Corpus for Information Credibility Analysis. Eric Nichols, Koji Murakami, Kentaro Inui, and Yuji Matsumoto. Proceedings of Pacling 2009. Hokkaido, Japan. 2009.