Surprise backup of knowhow.vdv.de PDFs

Interesting bits

Base URL for documents: https://knowhow.vdv.de/documents/${INT}/

Example cURL command to fetch a PDF (copied from Firefox dev tools):

curl 'https://knowhow.vdv.de/wp-admin/admin-post.php?action=serve_pdf&token=62637013c4106' \
  -H 'User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0' \
  -H 'Accept: */*' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Referer: https://knowhow.vdv.de/wp-content/themes/kh/pdf.js/web/viewer.html?file=https%3A%2F%2Fknowhow.vdv.de%2Fwp-admin%2Fadmin-post.php%3Faction%3Dserve_pdf%26token%3D62637013c4106' \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: PHPSESSID=cttcgu6nb06ou7f5mb00oqs1d9' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin'

In practice this can be cut down to:

curl 'https://knowhow.vdv.de/wp-admin/admin-post.php?action=serve_pdf&token=62637013c4106'

This shortened request still works on a different machine and from a different IP address.
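
The shortened request can also be issued from Python. The following is a minimal sketch using only the standard library; the function names `pdf_url` and `download_pdf` are illustrative and not taken from knowhow-scraper.py, and it has not been verified that the server accepts Python's default User-Agent.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the cURL command above.
SERVE_PDF_ENDPOINT = "https://knowhow.vdv.de/wp-admin/admin-post.php"


def pdf_url(token: str) -> str:
    """Build the serve_pdf URL for a given token."""
    query = urlencode({"action": "serve_pdf", "token": token})
    return f"{SERVE_PDF_ENDPOINT}?{query}"


def download_pdf(token: str, target_path: str) -> None:
    """Fetch the PDF for `token` and write it to `target_path`.

    Performs a network request; no cookies or extra headers are sent,
    mirroring the shortened cURL command.
    """
    with urlopen(pdf_url(token), timeout=30) as response:
        data = response.read()
    with open(target_path, "wb") as handle:
        handle.write(data)
```

For example, `download_pdf("62637013c4106", "out.pdf")` would request the same URL as the shortened cURL command.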

Crawling

curl 'https://knowhow.vdv.de/wp-json/wp/v2/documents?reduced=true&frontend&per_page=1000&page=1&filter%5Bnopaging%5D=true'

This returns the metadata for all documents. The result is stored in documents.json.
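
To work with the saved metadata, something like the sketch below could be used. It assumes that documents.json holds a JSON array of objects and that each object carries an `id` field, as is usual for WordPress REST API responses; the actual field names in this reduced response have not been checked, and the helper names are illustrative.

```python
import json
from typing import Any, Dict, List


def load_documents(path: str = "documents.json") -> List[Dict[str, Any]]:
    """Load the document metadata saved from the crawl request."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def document_ids(documents: List[Dict[str, Any]]) -> List[Any]:
    """Collect the document IDs.

    ASSUMPTION: every entry has an "id" key (WordPress REST convention).
    Entries without one are skipped rather than raising an error.
    """
    return [doc["id"] for doc in documents if "id" in doc]
```

The resulting IDs would then serve as `${DOC_ID}` in the token request.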

Token generation

curl "https://knowhow.vdv.de/wp-json/kh-tm/v1/document/${DOC_ID}?variant=lang-${LANG}_fulltext"

This returns JSON containing a status code and, if the status is 200, a token.
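
A rough sketch of building the token URL and reading the token out of the response follows. The JSON key names `status` and `token` are assumptions, since the exact response shape is not documented here, and the language code `de` in the usage note is only an example value.

```python
from typing import Any, Mapping, Optional
from urllib.parse import quote

# Endpoint taken from the cURL command above.
TOKEN_ENDPOINT = "https://knowhow.vdv.de/wp-json/kh-tm/v1/document"


def token_url(doc_id: int, lang: str) -> str:
    """Build the token request URL for a document ID and language code."""
    variant = quote(f"lang-{lang}_fulltext", safe="")
    return f"{TOKEN_ENDPOINT}/{doc_id}?variant={variant}"


def extract_token(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the token if the payload reports status 200, else None.

    ASSUMPTION: the keys are named "status" and "token" and the status
    is a JSON number; adjust both if the real response differs.
    """
    if payload.get("status") == 200:
        return payload.get("token")
    return None
```

For instance, `token_url(123, "de")` yields a URL ending in `/document/123?variant=lang-de_fulltext`, and the token from its response can be passed to the serve_pdf request.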

Import into Zotero

This relies heavily on the code from this GitHub gist. It uses the debug-bridge plugin to import all files, together with their metadata, into Zotero. Run ./zotero-import.py after adjusting the password variable.
