docs: update docusaurus to v3.6 #2737

Merged 1 commit on Nov 5, 2024.
4 changes: 2 additions & 2 deletions package.json
@@ -114,9 +114,9 @@
"vite-tsconfig-paths": "^4.3.2",
"vitest": "^2.0.0"
},
"packageManager": "[email protected].0",
"packageManager": "[email protected].1",
"volta": {
"node": "22.11.0",
"yarn": "4.5.0"
"yarn": "4.5.1"
}
}
6 changes: 1 addition & 5 deletions website/blog/2024/02-22-launching-crawlee-blog/index.md
@@ -3,11 +3,7 @@ slug: crawlee-blog-launch
title: 'Launching Crawlee Blog'
description: 'Your Node.js resource hub for web scraping and automation.'
image: https://raw.githubusercontent.com/souravjain540/crawlee-first-blog/main/og-image.webp
author: Saurav Jain
authorTitle: Developer Community Manager
authorURL: https://github.com/souravjain540
authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4&s=48
authorTwitter: sauain
authors: [SauravJ]
---

Hey, crawling masters!
@@ -3,10 +3,7 @@ slug: how-to-scrape-amazon
title: 'How to scrape Amazon products'
description: 'A detailed step-by-step guide to scraping products on Amazon using TypeScript, Cheerio, and Crawlee.'
image: ./img/how-to-scrape-amazon.webp
author: Lukáš Průša
authorTitle: Junior Web Automation Engineer
authorURL: https://github.com/Patai5
authorImageURL: ./img/lukasp.webp
authors: [LukasP]
---

## Introduction
56 changes: 26 additions & 30 deletions website/blog/2024/04-23-scrapy-vs-crawlee/index.md
@@ -3,23 +3,19 @@ slug: scrapy-vs-crawlee
title: 'Scrapy vs. Crawlee'
description: 'Which web scraping library should you use in 2024? Learn how each handles headless mode, autoscaling, proxy rotation, errors, and anti-scraping techniques.'
image: ./img/scrapy-vs-crawlee.webp
author: Saurav Jain
authorTitle: Developer Community Manager
authorURL: https://github.com/souravjain540
authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4&s=48
authorTwitter: sauain
authors: [SauravJ]
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Hey, crawling masters!

Welcome to another post on the Crawlee blog; this time, we are going to compare Scrapy, one of the oldest and most popular web scraping libraries in the world, with Crawlee, a relative newcomer. This article will answer your questions about when to use Scrapy and help you decide when it would be better to use Crawlee instead. It is the first in a series comparing various technical aspects of Crawlee with Scrapy.

## Introduction:

[Scrapy](https://scrapy.org/) is an open-source Python-based web scraping framework that extracts data from websites. With Scrapy, you create spiders, which are autonomous scripts to download and process web content. The limitation of Scrapy is that it does not work very well with JavaScript-rendered websites, as it was designed for static HTML pages. We will compare the two on this point later in the article.

Crawlee is also an open-source library that originated as [Apify SDK](https://docs.apify.com/sdk/js/). Crawlee has the advantage of being a much newer library, so it already has many features that Scrapy lacks, like autoscaling, headless browsing, working with JavaScript-rendered websites without any plugins, and many more, which we are going to explain later on.

@@ -28,7 +24,7 @@
## Feature comparison


We'll start comparing Scrapy and Crawlee by looking at language and development environments, and then features to make the scraping process easier for developers, like autoscaling, headless browsing, queue management, and more.


### Language and development environments
@@ -39,9 +35,9 @@ Crawlee is one of the few web scraping and automation libraries that supports JavaScript…

### Headless browsing and JS rendering

Scrapy does not support headless browsers natively, but it does support them through its plugin system; similarly, it does not scrape JavaScript-rendered websites out of the box, but plugins make this possible too. One of the best examples is its [Playwright plugin](https://github.com/scrapy-plugins/scrapy-playwright/tree/main).

Apify Store is a JavaScript-rendered website, so we will scrape it in this example using the `scrapy-playwright` integration.

For installation and the required changes to `settings.py`, please follow the instructions in the `scrapy-playwright` [repository on GitHub](https://github.com/scrapy-plugins/scrapy-playwright/tree/main?tab=readme-ov-file#installation).

@@ -67,7 +63,7 @@ class ActorSpider(scrapy.Spider):
page = response.meta['playwright_page']
await page.wait_for_selector('.ActorStoreItem-title-wrapper')
actor_card = await page.query_selector('.ActorStoreItem-title-wrapper')

if actor_card:
actor_text = await actor_card.text_content()
yield {
@@ -91,8 +87,8 @@ const crawler = new PlaywrightCrawler({
async requestHandler({ page }) {
const actorCard = page.locator('.ActorStoreItem-title-wrapper').first();
const actorText = await actorCard.textContent();
        await crawler.pushData({
            'actor': actorText
});
},
});
@@ -112,8 +108,8 @@ const crawler = new PuppeteerCrawler({
const actorText = await page.$eval('.ActorStoreItem-title-wrapper', (el) => {
return el.textContent;
});
        await crawler.pushData({
            'actor': actorText
});
},
});
@@ -137,8 +133,8 @@ Crawlee has [built-in autoscaling](https://crawlee.dev/api/core/class/Autoscaled
Scrapy supports both breadth-first and depth-first crawling strategies using a disk-based queuing system. By default, it uses a LIFO queue for pending requests, which means it crawls in depth-first order; if you want breadth-first order instead, you can change these settings:

```py title="settings.py"
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```
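
For comparison, Crawlee keeps pending requests in a persistent request queue that it manages automatically. A minimal sketch of opening and seeding the default queue explicitly (assuming the default storage configuration):

```js
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Crawlee opens a disk-backed default queue on its own, but it can
// also be opened and seeded explicitly.
const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://crawlee.dev' });

const crawler = new CheerioCrawler({
    requestQueue,
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        // Newly discovered links land in the same queue.
        await enqueueLinks();
    },
});

await crawler.run();
```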

@@ -154,15 +150,15 @@ Scrapy CLI comes with Scrapy. Just run this command, and you are good to go:
pip install scrapy
```

Crawlee also [includes a CLI tool](https://crawlee.dev/docs/quick-start#installation-with-crawlee-cli) (`crawlee-cli`) that facilitates project setup, crawler creation, and execution, streamlining the development process for users familiar with Node.js environments. To scaffold a new project, run:

```bash
npx crawlee create my-crawler
```

### Proxy rotation and storage management

Scrapy handles proxy rotation via custom middleware. You have to install the [`scrapy-rotating-proxies`](https://pypi.org/project/scrapy-rotating-proxies/) package using pip:

```bash
pip install scrapy-rotating-proxies
```
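
Crawlee, by contrast, ships proxy rotation in its core `ProxyConfiguration` class, so no extra package is needed. A minimal sketch with placeholder proxy URLs (the URLs are illustrative assumptions):

```js
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxy URLs; substitute your own.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ request, proxyInfo, log }) {
        // Crawlee rotates through the list and reports the proxy used.
        log.info(`Fetched ${request.url} via ${proxyInfo?.url}`);
    },
});
```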
@@ -215,15 +211,15 @@ Scrapy provides this functionality out of the box with the [`Feed Exports`](http
To do this, you need to modify your `settings.py` file and enter:

```py title="settings.py"
# To store in CSV format
FEEDS = {
    'data/crawl_data.csv': {'format': 'csv', 'overwrite': True}
}

# OR to store in JSON format
FEEDS = {
    'data/crawl_data.json': {'format': 'json', 'overwrite': True}
}
```

@@ -243,7 +239,7 @@ Let's see how Crawlee stores the result:

const title = await page.title();
const price = await page.textContent('.price');

await crawler.pushData({
url: request.url,
title,
@@ -267,7 +263,7 @@ Let's see how Crawlee stores the result:

In Scrapy, anti-blocking strategies like [IP rotation](https://pypi.org/project/scrapy-rotated-proxy/) and [user-agent rotation](https://python.plainenglish.io/rotating-user-agent-with-scrapy-78ca141969fe) require custom solutions via middleware and plugins.

Crawlee provides HTTP crawling and [browser fingerprints](https://crawlee.dev/docs/guides/avoid-blocking) with zero configuration necessary; fingerprints are enabled by default and available in `PlaywrightCrawler` and `PuppeteerCrawler`, but they also work with `CheerioCrawler` and the other HTTP crawlers.
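
To make the default explicit: in the browser crawlers, fingerprinting is controlled through `browserPoolOptions`. A minimal sketch showing where the switch lives (the handler body is illustrative):

```js
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        useFingerprints: true, // already the default; set to false to disable
    },
    async requestHandler({ page, log }) {
        log.info(`Title: ${await page.title()}`);
    },
});
```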

### Error handling
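
In Crawlee, retries are configured directly on the crawler: a failed request is retried up to `maxRequestRetries` times, and once retries are exhausted, `failedRequestHandler` runs. A minimal sketch (the thrown error is only for illustration):

```js
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestRetries: 3,
    async requestHandler({ request }) {
        // Throwing marks the request as failed and schedules a retry.
        throw new Error(`Could not process ${request.url}`);
    },
    // Runs once all retries for a request are exhausted.
    failedRequestHandler({ request, log }) {
        log.error(`Request ${request.url} failed too many times.`);
    },
});
```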

@@ -300,20 +296,20 @@ Crawlee also provides a built-in [logging mechanism](https://crawlee.dev/api/cor

Scrapy can be containerized using Docker, though it typically requires manual setup to create Dockerfiles and configure environments. Crawlee, in contrast, includes [ready-to-use Docker configurations](https://crawlee.dev/docs/guides/docker-images), making deployment straightforward across various environments without additional configuration.

## Community

Both projects are open source. Scrapy benefits from a large and well-established community. It has been around since 2008 and has attracted a lot of attention among developers, particularly those in the Python ecosystem.

Crawlee started its journey as Apify SDK in 2018. It now has more than [12K stars on GitHub](https://github.com/apify/crawlee), a community of more than 7,000 developers in its [Discord Community](https://apify.com/discord), and is used by the TypeScript and JavaScript community.

## So which is better - Scrapy or Crawlee?

Both frameworks can handle a wide range of scraping tasks, and the best choice will depend on specific technical needs like language preference, project requirements, ease of use, etc.

If you are comfortable with Python and want to work only with it, go with Scrapy. It has very detailed documentation, and it is one of the oldest and most stable libraries in the space.

But if you want to explore or are comfortable working with TypeScript or JavaScript, our recommendation is Crawlee. With valuable features like a single interface for HTTP requests and headless browsing, autoscaling, and fingerprint support, it works well with JavaScript-rendered websites and is the best choice for scraping sites that are complex, resource-intensive, or protected by blocking methods.

As promised, this is just the first of many articles comparing Scrapy and Crawlee. In the upcoming articles, you will learn more about every technical detail.

Meanwhile, if you want to learn more about Crawlee, read our [introduction to Crawlee](https://crawlee.dev/docs/introduction) or Apify's [Crawlee web scraping tutorial](https://blog.apify.com/crawlee-web-scraping-tutorial/).
@@ -4,11 +4,7 @@ title: 'Building a Netflix show recommender using Crawlee and React'
tags: [community]
description: 'Create a Netflix show recommendation system using Crawlee to scrape the data, JavaScript to code, and React to build the front end.'
image: ./img/create-netflix-show-recommender.webp
author: Ayush Thakur
authorTitle: Community Member of Crawlee
authorURL: https://github.com/ayush2390
authorImageURL: https://avatars.githubusercontent.com/u/43995654?v=4&s=48
authorTwitter: JSAyushThakur
authors: [AyushT]
---

# Building a Netflix web show recommender with Crawlee and React
@@ -33,7 +29,7 @@ To use Crawlee, you need to have Node.js 16 or newer.
If you like the posts on the Crawlee blog so far, please consider [giving Crawlee a star on GitHub](https://github.com/apify/crawlee), it helps us to reach and help more developers.
:::

You can install the latest version of Node.js from the [official website](https://nodejs.org/en/). This great [Node.js installation guide](https://blog.apify.com/how-to-install-nodejs/) gives you tips to avoid issues later on.

## Creating a React app

@@ -130,7 +126,7 @@ await pushData({
genres: genres,
shows: shows,
});
```

This will save the `genres` and `shows` arrays as values in the `genres` and `shows` keys.
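
The items pushed this way land in Crawlee's default dataset, from which they can be read back later, for example to feed the React front end. A minimal sketch (assuming the default storage location):

```js
import { Dataset } from 'crawlee';

// Open the default dataset and read the stored items back.
const dataset = await Dataset.open();
const { items } = await dataset.getData();
console.log(items[0]?.genres, items[0]?.shows);
```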

Expand Down Expand Up @@ -294,4 +290,4 @@ Project link - [https://github.com/ayush2390/web-show-recommender](https://githu

In this project, we used Crawlee to scrape Netflix; similarly, Crawlee can be used to scrape single-page applications (SPAs) and JavaScript-rendered websites. The best part is that all of this can be done while coding in JavaScript/TypeScript and using a single library.

If you want to learn more about Crawlee, go through the [documentation](https://crawlee.dev/docs/quick-start) and this step-by-step [Crawlee web scraping tutorial](https://blog.apify.com/crawlee-web-scraping-tutorial/) from Apify.
8 changes: 3 additions & 5 deletions website/blog/2024/06-24-proxy-management-in-crawlee/index.md
@@ -4,11 +4,7 @@ title: 'How Crawlee uses tiered proxies to avoid getting blocked'
tags: [proxy]
description: 'Find out how Crawlee’s tiered proxy system rotates between different types of proxies to control web scraping costs and avoid getting blocked.'
image: ./img/tiered-proxies.webp
author: Saurav Jain
authorTitle: Developer Community Manager @ Crawlee
authorURL: https://github.com/souravjain540
authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4
authorTwitter: sauain
authors: [SauravJ]
---

Hello Crawlee community,
@@ -19,6 +15,8 @@ Proxies vary in quality, speed, reliability, and cost. There are a [few types of

It is hard for developers to decide which proxy to use while scraping data. We might get blocked if we use [datacenter proxies](https://blog.apify.com/datacenter-proxies-when-to-use-them-and-how-to-make-the-most-of-them/) for low-cost scraping, but residential proxies are sometimes too expensive for bigger projects. Developers need a system that can manage both costs and avoid getting blocked. To manage this, we recently introduced tiered proxies in Crawlee. Let’s take a look at it.

<!--truncate-->
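
In code, the tiers are passed to `ProxyConfiguration` ordered from cheapest to most expensive, and Crawlee escalates to a higher tier only when it detects blocking. A minimal sketch (the proxy URLs and the no-proxy first tier are illustrative assumptions, not taken from this post):

```js
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Tiers run from cheapest to most reliable; `null` means no proxy.
const proxyConfiguration = new ProxyConfiguration({
    tieredProxyUrls: [
        [null],
        ['http://datacenter-proxy.example.com:8000'],
        ['http://residential-proxy.example.com:8000'],
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ request, proxyInfo, log }) {
        log.info(`${request.url} via ${proxyInfo?.url ?? 'no proxy'}`);
    },
});
```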

:::note

If you like reading this blog, we would be really happy if you gave [Crawlee a star on GitHub!](https://github.com/apify/crawlee/)
6 changes: 1 addition & 5 deletions website/blog/2024/07-05-launching-crawlee-python/index.md
@@ -3,11 +3,7 @@ slug: launching-crawlee-python
title: 'Announcing Crawlee for Python: Now you can use Python to build reliable web crawlers'
description: 'Launching Crawlee for Python, a web scraping and automation library to build reliable scrapers in Python fast.'
image: ./img/crawlee-python.webp
author: Saurav Jain
authorTitle: Developer Community Manager
authorURL: https://github.com/souravjain540
authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4&s=48
authorTwitter: sauain
authors: [SauravJ]
---

> Testimonial from early adopters
5 changes: 1 addition & 4 deletions website/blog/2024/08-20-problems-in-web-scraping/index.md
@@ -4,10 +4,7 @@ title: 'Current problems and mistakes of web scraping in Python and tricks to so
tags: [community]
description: 'Current problems and mistakes that developers encounter while scraping and crawling the internet, with advice and solutions from a web scraping expert.'
image: ./img/problems-in-scraping.webp
author: Max
authorTitle: Community Member of Crawlee and web scraping expert
authorURL: https://github.com/Mantisus
authorImageURL: https://avatars.githubusercontent.com/u/34358312?v=4
authors: [MaxB]
---

## Introduction
@@ -3,11 +3,7 @@ slug: infinite-scroll-using-python
title: 'How to scrape infinite scrolling webpages with Python'
description: 'Learn how to scrape infinite scrolling pages with Python and scrape Nike shoes using Crawlee for Python.'
image: ./img/infinite-scroll.webp
author: Saurav Jain
authorTitle: Developer Community Manager
authorURL: https://github.com/souravjain540
authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4
authorTwitter: Sauain
authors: [SauravJ]
---

# How to scrape infinite scrolling webpages with Python
@@ -4,10 +4,7 @@ title: 'Web scraping of a dynamic website using Python with HTTP Client'
tags: [community]
description: 'Learn how to scrape dynamic websites using Crawlee for Python with HTTP client.'
image: ./img/dynamic-websites.webp
author: Max
authorTitle: Community Member of Crawlee and web scraping expert
authorURL: https://github.com/Mantisus
authorImageURL: https://avatars.githubusercontent.com/u/34358312?v=4
authors: [MaxB]
---

# Web scraping of a dynamic website using Crawlee for Python with HTTP client
Expand Down Expand Up @@ -77,7 +74,7 @@ Great, let's also look at the parameters used in the search API request and make
- `sortBy: price` - the field by which sorting is performed
- `order: asc` - type of sorting

But there's another important point to pay attention to. Let's look at our link in the browser bar, which looks like this:

```plaintext
https://www.accommodationforstudents.com/search-results?location=London&beds=0&occupancy=min&minPrice=0&maxPrice=500&latitude=51.509865&longitude=-0.118092&geo=false&page=1
```