This repository was archived by the owner on Oct 28, 2020. It is now read-only.

jfilter/scrape-gutenberg-de

Scrape Gutenberg DE

Scrape all books from Projekt Gutenberg-DE. Useful, e.g., if you need a large corpus of German text to do some serious language modeling.

Usage

git clone https://github.com/jfilter/scrape-gutenberg-de --depth 1
cd scrape-gutenberg-de
pipenv install
pipenv run scrapy runspider scrape.py -o data.json
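
Once the spider has finished, data.json should contain a JSON array of scraped items, since that is what Scrapy's JSON feed export writes. The field names depend on scrape.py and are not documented here, so the following is only a minimal sketch for inspecting the output without assuming any particular field. The helper names load_items and field_counts are hypothetical and not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Load the JSON array written by `scrapy runspider ... -o data.json`."""
    with Path(path).open(encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of scraped items")
    return items


def field_counts(items):
    """Count how often each field name occurs across the scraped items."""
    counts = Counter()
    for item in items:
        # Field names are not assumed; only dict items are inspected.
        if isinstance(item, dict):
            counts.update(item.keys())
    return counts


if __name__ == "__main__":
    path = Path("data.json")  # output file name taken from the usage command
    if path.exists():
        items = load_items(path)
        print(f"{len(items)} items")
        for name, n in field_counts(items).most_common():
            print(f"{name}: {n}")
```

Running it next to data.json prints the number of items and how often each field occurs, which is a quick way to find out which field holds the book text before building a corpus from it.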

License

MIT.