Command Line Tools

This list contains network and data processing tools with a command-line interface, written in any programming language.

Contents

Network

Web Scraping

  • pipet - A Swiss-army tool for scraping and extracting data using selectors, JavaScript, and Unix pipes
  • trafilatura - Gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

URLs

  • courlan - Clean, filter and sample URLs to optimize data collection: Deduplication, spam, content and language filters