fix: check .isFinished() before RequestList reads (#2695)
Adds a simple `.isFinished()` check before the
`RequestList.fetchNextRequest()` call. This eliminates long-blocking
calls with non-trivial `RequestList` implementations (e.g.
`SitemapRequestList` in some edge cases).
barjin authored Oct 4, 2024
1 parent d38645c commit 6fa170f
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion packages/basic-crawler/src/internals/basic-crawler.ts
@@ -1138,7 +1138,10 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext
      * and RequestQueue is present then enqueues it to the queue first.
      */
     protected async _fetchNextRequest() {
-        if (!this.requestList) return this.requestQueue!.fetchNextRequest();
+        if (!this.requestList || (await this.requestList.isFinished())) {
+            return this.requestQueue?.fetchNextRequest();
+        }
+
         const request = await this.requestList.fetchNextRequest();
         if (!this.requestQueue) return request;
         if (!request) return this.requestQueue.fetchNextRequest();
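
The control flow of this change can be tried in isolation with the minimal sketch below. It is not the crawler's actual code: `Req`, `ListLike`, `QueueLike`, `CountingList`, and `fetchNext` are hypothetical stand-ins, and the enqueue step mentioned in the method's doc comment is left out because the diff does not show it. The sketch only illustrates that an exhausted list is never asked for a request, and that a missing queue yields `undefined` instead of throwing.

```typescript
// Hypothetical stand-ins for the crawler's request sources (not Crawlee's real types).
interface Req { url: string }

interface ListLike {
    isFinished(): Promise<boolean>;
    fetchNextRequest(): Promise<Req | null>;
}

interface QueueLike {
    fetchNextRequest(): Promise<Req | null>;
}

// A list that counts fetchNextRequest() calls, so we can see whether
// the potentially slow read was reached at all.
class CountingList implements ListLike {
    fetchCalls = 0;
    private items: Req[];

    constructor(items: Req[]) {
        this.items = [...items];
    }

    async isFinished(): Promise<boolean> {
        return this.items.length === 0;
    }

    async fetchNextRequest(): Promise<Req | null> {
        this.fetchCalls++;
        return this.items.shift() ?? null;
    }
}

// Mirrors the branches visible in the patched _fetchNextRequest().
async function fetchNext(list?: ListLike, queue?: QueueLike): Promise<Req | null | undefined> {
    // New check: skip the list entirely once it reports being finished.
    if (!list || (await list.isFinished())) {
        // Optional chaining: resolves to undefined when no queue is configured.
        return queue?.fetchNextRequest();
    }

    const request = await list.fetchNextRequest();
    if (!queue) return request;
    if (!request) return queue.fetchNextRequest();
    // Assumption: the real method continues here (enqueueing to the queue); omitted.
    return request;
}
```

Calling `fetchNext(new CountingList([]), queue)` goes straight to the queue and leaves `fetchCalls` at 0, which is the behavior the commit message describes for lists whose `fetchNextRequest()` may block for a long time.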