Replies: 2 comments 2 replies
-
For now I have a silly workaround, but it still visits a website even if it ignores the content.

```js
class BodyScraperPlugin {
  constructor(body) {
    this.body = body;
  }

  apply(registerAction) {
    registerAction('afterResponse', async ({ response }) => {
      // Replace the fetched HTML with the body we already have.
      // Optional chaining guards against a missing content-type header.
      if (response.headers['content-type']?.includes('text/html')) {
        console.log('html AFTER RESPONSE ' + response.url);
        return this.body;
      }
      return response.body;
    });
  }
}

const options = {
  urls: ['https://nodejs.org'], // dummy since it does not use this content
  directory: saveDir,
  plugins: [new BodyScraperPlugin(article.content)],
};

const result = await scrape(options);
```
-
Hello @jonocodes
I would recommend running an HTTP server in the directory with the needed files.
-
It would be nice if this could be used 'offline', so I could scrape something like 'file://home/me/site/mypage.html'.
Or perhaps there could be a way to feed the scraper raw HTML instead of a URL.