Replies: 2 comments 2 replies
-
For now I have a silly workaround, but it still visits a website even if it ignores the content.

```js
class BodyScraperPlugin {
  constructor(body) {
    this.body = body;
  }

  apply(registerAction) {
    registerAction('afterResponse', async ({ response }) => {
      // Replace the fetched HTML with the body we already have.
      // Optional chaining guards against a missing content-type header.
      if (response.headers['content-type']?.includes('text/html')) {
        console.log('html AFTER RESPONSE ' + response.url);
        return this.body;
      }
      return response.body;
    });
  }
}

const options = {
  urls: ['https://nodejs.org'], // dummy since it does not use this content
  directory: saveDir,
  plugins: [new BodyScraperPlugin(article.content)],
};

const result = await scrape(options);
```
-
Hello @jonocodes
I would recommend running an HTTP server in the directory with the needed files.
-
It would be nice if this could be used 'offline', so I could scrape something like 'file://home/me/site/mypage.html'.
Or perhaps there could be a way to feed the scraper raw HTML instead of a URL.