Web Scraping With SeleniumBase

This guide shows how to simplify web scraping with SeleniumBase, walking through its advanced features step by step. If you are interested in scraping with plain Selenium instead, see our dedicated Selenium web scraping guide.

What Is SeleniumBase?

SeleniumBase is a Python framework for browser automation built on top of the Selenium/WebDriver APIs. It supports tasks ranging from end-to-end testing to web scraping and includes features like CAPTCHA bypassing and bot-detection avoidance.
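
As a quick taste, the minimal sketch below opens a page and prints its title (the target site is the one used later in this guide). SB() downloads a matching browser driver automatically if one is missing:

from seleniumbase import SB

# Minimal sketch: SB() launches the browser and tears it down on exit
with SB(headless=True) as sb:
    sb.open("https://quotes.toscrape.com/")
    print(sb.get_title())  # should print "Quotes to Scrape"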

SeleniumBase vs Selenium: Feature and API Comparison

| Feature | SeleniumBase | Selenium |
| --- | --- | --- |
| Built-in test runners | Integrates with pytest, pynose, and behave | Requires manual setup for test integration |
| Driver management | Auto-downloads the matching browser driver | Manual download and configuration |
| Web automation logic | Combines steps into a single method call | Requires multiple lines of code |
| Selector handling | Auto-detects CSS or XPath selectors | Requires explicit selector types |
| Timeout handling | Applies default timeouts to prevent failures | Fails immediately without explicit timeouts |
| Error outputs | Clean, readable error messages | Verbose, less interpretable error logs |
| Dashboards and reports | Built-in dashboards, reports, and screenshots | No built-in dashboards or reporting |
| Desktop GUI applications | Visual tools for test running | Lacks desktop GUI tools |
| Test recorder | Built-in test recorder | Requires manual script writing |
| Test case management | Provides CasePlans | No built-in test case management |
| Data app support | Includes ChartMaker for data apps | No additional tools for data apps |
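
To make the "single method call" row concrete, here is a sketch of the same form-fill in both APIs (the URL and selector come from the login example later in this guide). SeleniumBase's type() waits for the element, clears it, and fills it in one call, while plain Selenium spells out each step:

# SeleniumBase: one call waits for, clears, and fills the field
from seleniumbase import SB

with SB() as sb:
    sb.open("https://quotes.toscrape.com/login")
    sb.type("#username", "test")

# Plain Selenium: the same action takes several explicit steps
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://quotes.toscrape.com/login")
field = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, "#username"))
)
field.clear()
field.send_keys("test")
driver.quit()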

Using SeleniumBase for Web Scraping: Step-By-Step Guide

Step #1: Project Initialization

Create a project folder and set up a Python virtual environment inside it:

mkdir seleniumbase-scraper
cd seleniumbase-scraper
python -m venv env

Activate the virtual environment:

  • On Linux/macOS: source env/bin/activate
  • On Windows: env\Scripts\activate

Install SeleniumBase:

pip install seleniumbase

Step #2: SeleniumBase Test Setup

Create a scraper.py file and initialize the SB context manager, which starts the browser and shuts it down automatically:

from seleniumbase import SB

with SB() as sb:
    pass

Run the script (the SB() manager reads command-line options such as --headless, even when launched with plain python):

python3 scraper.py --headless

Step #3: Connect to the Target Page

Use open() to navigate the browser to the target site:

sb.open("https://quotes.toscrape.com/")

Step #4: Select the Quote Elements

Each quote card on the page is a .quote node, and find_elements() returns every match:

quote_elements = sb.find_elements(".quote")

Step #5: Scrape Quote Data

Iterate over the quote elements and extract the text, author, and tags from each one. The nested lookups use Selenium's By API, so import it:

from selenium.webdriver.common.by import By

for quote_element in quote_elements:
    # Quote text, with the curly quotation marks stripped
    text_element = quote_element.find_element(By.CSS_SELECTOR, ".text")
    text = text_element.text.replace("“", "").replace("”", "")
    # Author name
    author_element = quote_element.find_element(By.CSS_SELECTOR, ".author")
    author = author_element.text
    # All tags attached to the quote
    tags = [tag.text for tag in quote_element.find_elements(By.CSS_SELECTOR, ".tag")]

Step #6: Populate the Quotes Array

Initialize an empty list (quotes = []) before the scraping loop, then append a dictionary for each scraped quote:

quotes.append({"text": text, "author": author, "tags": tags})

Step #7: Implement Crawling Logic

The site spreads quotes across several pages, and the "Next →" button is only present while more pages remain. Scrape the current page first, then advance; checking for the button before scraping would skip the last page:

while True:
    # ... scrape the quotes on the current page (Steps #4-#6) ...
    if not sb.is_element_present(".next"):
        break
    sb.click(".next a")

Step #8: Export the Scraped Data

Write the collected quotes to a CSV file, joining each quote's tags with a semicolon so they fit in a single column:

import csv

with open("quotes.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    for quote in quotes:
        writer.writerow({"text": quote["text"], "author": quote["author"], "tags": ";".join(quote["tags"])})
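
If JSON fits your pipeline better, the standard library covers that too. A sketch writing the same quotes list to a hypothetical quotes.json:

import json

with open("quotes.json", mode="w", encoding="utf-8") as file:
    # Tags can stay as real lists here, so no joining is needed
    json.dump(quotes, file, indent=2, ensure_ascii=False)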

Step #9: Put It All Together

Here is the complete scraper:

from seleniumbase import SB
from selenium.webdriver.common.by import By
import csv

with SB() as sb:
    sb.open("https://quotes.toscrape.com/")
    quotes = []
    while True:
        quote_elements = sb.find_elements(".quote")
        for quote_element in quote_elements:
            text_element = quote_element.find_element(By.CSS_SELECTOR, ".text")
            text = text_element.text.replace("“", "").replace("”", "")
            author_element = quote_element.find_element(By.CSS_SELECTOR, ".author")
            author = author_element.text
            tags = [tag.text for tag in quote_element.find_elements(By.CSS_SELECTOR, ".tag")]
            quotes.append({"text": text, "author": author, "tags": tags})
        # Stop after scraping the last page, which has no "Next" link
        if not sb.is_element_present(".next"):
            break
        sb.click(".next a")
    with open("quotes.csv", mode="w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=["text", "author", "tags"])
        writer.writeheader()
        for quote in quotes:
            writer.writerow({"text": quote["text"], "author": quote["author"], "tags": ";".join(quote["tags"])})

Run the scraper:

python3 scraper.py --headless
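
If all goes well, quotes.csv should start with rows similar to the following (taken from the site's first page; the exact content depends on the live site):

text,author,tags
The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.,Albert Einstein,change;deep-thoughts;thinking;world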

Advanced SeleniumBase Scraping Use Cases

Automate Form Filling and Submission

SeleniumBase doubles as a test framework. The test below logs in to the demo page at quotes.toscrape.com/login, which accepts any non-empty credentials, and then verifies the post-login page content:

from seleniumbase import BaseCase
BaseCase.main(__name__, __file__)

class LoginTest(BaseCase):
    def test_submit_login_form(self):
        self.open("https://quotes.toscrape.com/login")
        self.type("#username", "test")
        self.type("#password", "test")
        self.click("input[type=\"submit\"]")
        self.assert_text("Top Ten tags")

Run the test:

pytest login.py
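
If you prefer plain pytest functions over unittest-style classes, SeleniumBase also provides an sb pytest fixture. A sketch of the same test in that style, saved as a hypothetical login_fixture.py:

# Same flow using SeleniumBase's pytest "sb" fixture
def test_submit_login_form(sb):
    sb.open("https://quotes.toscrape.com/login")
    sb.type("#username", "test")
    sb.type("#password", "test")
    sb.click('input[type="submit"]')
    sb.assert_text("Top Ten tags")

Run it the same way: pytest login_fixture.py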

Bypass Simple Anti-Bot Technologies

UC Mode (uc=True) runs the browser in an undetected configuration. uc_open_with_reconnect() opens the page and temporarily disconnects the driver so anti-bot scripts cannot detect it, while uc_gui_click_captcha() clicks the CAPTCHA checkbox through the GUI (it uses PyAutoGUI, so the browser must run headed):

from seleniumbase import SB

with SB(uc=True) as sb:
    url = "https://www.scrapingcourse.com/antibot-challenge"
    sb.uc_open_with_reconnect(url, reconnect_time=4)
    sb.uc_gui_click_captcha()
    sb.save_screenshot("screenshot.png")

Bypass Complex Anti-Bot Technologies

For tougher protections, CDP Mode controls the browser through the Chrome DevTools Protocol while the WebDriver stays disconnected, defeating checks that look for an attached automation driver:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.activate_cdp_mode(url)
    sb.uc_gui_click_captcha()
    sb.sleep(2)
    sb.save_screenshot("screenshot.png")

Conclusion

SeleniumBase offers advanced features for web scraping, including UC Mode and CDP Mode for bypassing anti-bot measures. For more robust solutions, consider using cloud-based browsers like Scraping Browser from Bright Data.
