Capture screenshots of websites as a (host it yourself) API. This project is a wrapper around this library: https://github.com/sindresorhus/capture-website
Run the prebuilt image from Docker Hub:
- Pull the image:
docker pull robvanderleek/capture-website-api
- Start the container:
docker run -it -p 8080:8080 robvanderleek/capture-website-api
- Make a test screenshot request:
curl 'localhost:8080/capture?url=https://news.ycombinator.com/' -o screenshot.png
Build and run the image locally:
- Clone the repo:
git clone git@github.com:robvanderleek/capture-website-api.git
- Go to the standalone directory:
cd capture-website-api/standalone
- Build the image:
docker build -t cwa .
- Start the container:
docker run -it -p 8080:8080 cwa
- Make a test screenshot request:
curl 'localhost:8080/capture?url=https://www.youtube.com' -o screenshot.png
Run in a terminal:
- Clone the repo:
git clone git@github.com:robvanderleek/capture-website-api.git
- Go to the standalone directory:
cd capture-website-api/standalone
- Install dependencies:
yarn
- Start the server:
yarn start
- Make a test screenshot request:
curl 'localhost:8080/capture?url=https://www.reddit.com' -o screenshot.png
Deploy and run on Vercel:
- Clone the repo:
git clone git@github.com:robvanderleek/capture-website-api.git && cd capture-website-api/serverless
- Deploy to Vercel:
vercel deploy
- Get site URL:
vercel ls
- Make a test screenshot request:
curl "${SITE_URL}/api/capture?url=https://www.linkedin.com" -o screenshot.png
Call the /capture endpoint and pass the site URL using the query parameter url:
curl 'https://capture-website-api.vercel.app/api/capture?url=http://gmail.com' -o screenshot.png
Simple as that.
Application configuration options can be set as environment variables or in a .env file in the root folder. There's an example .env file in the codebase: .env.example
Supported options are:
Name | Description | Default |
---|---|---|
TIMEOUT | Timeout in seconds for loading a web page | 20 |
CONCURRENCY | Number of captures that run in parallel; more memory allows more parallel captures | 2 |
MAX_QUEUE_LENGTH | Maximum queue length; requests that can't be handled immediately are queued until the queue is full | 6 |
SHOW_RESULTS | Enable web endpoint to show latest capture | false |
SECRET | Secret string to prevent undesired usage on public endpoints | "" |
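As an illustration, a .env file that overrides a few of these defaults might look like the sketch below. The variable names come from the table above; the values are arbitrary examples rather than recommendations, and the actual contents of .env.example may differ.

```shell
# Hypothetical .env file; all values are illustrative only
TIMEOUT=30            # seconds to wait for a page to load
CONCURRENCY=4         # parallel captures (needs more memory)
MAX_QUEUE_LENGTH=10   # queued requests allowed before the queue is full
SHOW_RESULTS=true     # expose the latest capture in the browser
SECRET=change-me      # placeholder, replace with your own string
```

Since the options can also be set as environment variables, the same names could presumably be passed to a container with Docker's -e flag, for example -e TIMEOUT=30.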
Most of the configuration options from the wrapped capture-website library are supported using query parameters.
For example, to capture a site with a 650x350 viewport, no default background and animations disabled use:
curl 'https://capture-website-api.vercel.app/api/capture?url=http://amazon.com&width=650&height=350&scaleFactor=1&defaultBackground=false&disableAnimations=true&wait_before_screenshot_ms=300' -o screenshot.png
See https://github.com/sindresorhus/capture-website for a full list of options.
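Long query strings are easier to read when assembled from separate key=value pairs. The sketch below shows one way to do that in a shell script. The capture_url helper is hypothetical and not part of this project, and the base URL assumes a standalone server listening on port 8080.

```shell
# capture_url is a hypothetical helper, not part of the project.
# It joins a base endpoint and any number of key=value pairs into one URL.
capture_url() {
  base="$1"
  shift
  query=""
  for pair in "$@"; do
    if [ -z "$query" ]; then
      query="$pair"
    else
      query="$query&$pair"
    fi
  done
  printf '%s?%s\n' "$base" "$query"
}

# Assumes a standalone server is listening on localhost:8080.
URL=$(capture_url 'http://localhost:8080/capture' \
  'url=https://example.com' 'width=650' 'height=350' 'disableAnimations=true')
echo "$URL"

# Fetch the screenshot once the server is running:
# curl "$URL" -o screenshot.png
```

The helper does no percent-encoding. If the target URL itself contains characters such as & or ?, it would need to be encoded first; curl's -G and --data-urlencode options are one way to do that.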
You may need to wait for async requests or animations to finish before capturing the screenshot. There are two ways of doing this, both specified in the query parameters:
- wait_before_screenshot_ms (in ms, defaults to 300) will wait before capturing a screenshot.
- For standalone: the capture-website library's delay option (in seconds)
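The two parameters use different units, which is easy to get wrong. As a rough sketch, the following builds two request URLs that each ask for a wait of about two seconds. The base URL and target site are placeholders, and which parameter takes effect depends on the deployment, as described above.

```shell
BASE='http://localhost:8080/capture'   # assumed standalone endpoint
TARGET='https://example.com'           # placeholder site

# wait_before_screenshot_ms takes milliseconds
WAIT_MS_URL="${BASE}?url=${TARGET}&wait_before_screenshot_ms=2000"

# delay (from the capture-website library) takes seconds
DELAY_URL="${BASE}?url=${TARGET}&delay=2"

echo "$WAIT_MS_URL"
echo "$DELAY_URL"

# Fetch a screenshot once the server is running:
# curl "$DELAY_URL" -o screenshot.png
```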
Sometimes the capture-website library has problems capturing sites. You can try to capture these sites with plain Puppeteer by supplying the query parameter plainPuppeteer=true.
This app looks at two environment variables:
- SHOW_RESULTS: if true, the latest capture result can be viewed in the browser by browsing to the base URL
- SECRET: when set, all capture requests need to contain a query parameter secret whose value matches the value of this environment variable
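To make the SECRET check concrete, here is a minimal sketch of a request that carries the secret query parameter. The secret value, base URL and target site are placeholders, and the docker command in the comment assumes the container picks up SECRET from its environment.

```shell
# Assumption: the server was started with the same SECRET value, e.g.
#   docker run -it -p 8080:8080 -e SECRET=change-me robvanderleek/capture-website-api
SECRET='change-me'            # placeholder, pick your own value
BASE='http://localhost:8080'  # assumed standalone server address

# The secret travels as an ordinary query parameter next to url
REQUEST_URL="${BASE}/capture?url=https://example.com&secret=${SECRET}"
echo "$REQUEST_URL"

# Fetch the screenshot once the server is running:
# curl "$REQUEST_URL" -o screenshot.png
```

Because the secret is part of the URL, it may end up in shell history and server logs, so a dedicated value that is not reused elsewhere is the safer choice.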
If you have suggestions for improvements, or want to report a bug, open an issue!
ISC © 2019 Rob van der Leek (https://twitter.com/robvanderleek)