Simplify web scraping with SeleniumBase by following this step-by-step guide to its advanced features.
SeleniumBase is a Python framework for browser automation, built on top of Selenium/WebDriver APIs. It supports tasks from testing to scraping and includes features like CAPTCHA bypassing and bot-detection avoidance.
Feature | SeleniumBase | Selenium |
---|---|---|
Built-in test runners | Integrates with pytest, pynose, and behave | Requires manual setup for test integration |
Driver management | Auto-downloads matching browser driver | Manual download and configuration |
Web automation logic | Combines steps into single method call | Requires multiple lines of code |
Selector handling | Auto-detects CSS or XPath selectors | Requires explicit selector types |
Timeout handling | Default timeouts to prevent failures | Immediate failures without explicit timeouts |
Error outputs | Clean, readable error messages | Verbose, less interpretable error logs |
Dashboards and reports | Built-in dashboards, reports, and screenshots | No built-in dashboards or reporting |
Desktop GUI applications | Visual tools for test running | Lacks desktop GUI tools |
Test recorder | Built-in test recorder | Requires manual script writing |
Test case management | Provides CasePlans | No built-in test case management |
Data app support | Includes ChartMaker for data apps | No additional tools for data apps |
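The automatic selector detection mentioned in the table can be illustrated with a simplified, hypothetical heuristic. This is only a sketch of the idea; SeleniumBase's actual detection logic is more thorough:

```python
def looks_like_xpath(selector: str) -> bool:
    """Simplified sketch: treat selectors starting with XPath-style
    prefixes as XPath and everything else as CSS. (Illustrative only;
    not SeleniumBase's real implementation.)"""
    return selector.startswith(("/", "./", "("))

print(looks_like_xpath("//div[@class='quote']"))  # True  -> XPath
print(looks_like_xpath(".quote"))                 # False -> CSS
```

Because of this detection, methods like `sb.click()` accept either selector style without an explicit `By` argument.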
Create a project directory and a virtual environment:

```bash
mkdir seleniumbase-scraper
cd seleniumbase-scraper
python -m venv env
```
Activate the virtual environment:

- On Linux/macOS:

```bash
source env/bin/activate
```

- On Windows:

```bash
env\Scripts\activate
```
Install SeleniumBase:

```bash
pip install seleniumbase
```
Create a `scraper.py` file and initialize SeleniumBase:

```python
from seleniumbase import SB

with SB() as sb:
    pass  # scraping logic goes here
```
Run the script:

```bash
python3 scraper.py --headless
```
Inside the `with` block, open the target page and select all quote elements:

```python
sb.open("https://quotes.toscrape.com/")
quote_elements = sb.find_elements(".quote")
```
Then, import Selenium's `By` class and loop over the quote elements to extract the text, author, and tags of each quote:

```python
from selenium.webdriver.common.by import By

quotes = []
for quote_element in quote_elements:
    text_element = quote_element.find_element(By.CSS_SELECTOR, ".text")
    text = text_element.text.replace("“", "").replace("”", "")
    author_element = quote_element.find_element(By.CSS_SELECTOR, ".author")
    author = author_element.text
    tags = [tag.text for tag in quote_element.find_elements(By.CSS_SELECTOR, ".tag")]
    quotes.append({"text": text, "author": author, "tags": tags})
```
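Since the curly quotes only ever wrap the quote text, the `replace()`-based cleanup above can also be written with `str.strip()`, which removes those characters from the ends of the string only:

```python
# Strip the wrapping curly quotes from a scraped quote string
raw = "“The world as we have created it is a process of our thinking.”"
text = raw.strip("“”")
print(text)  # The world as we have created it is a process of our thinking.
```

Either approach works for this site; `strip()` just makes the intent (unwrapping) slightly clearer.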
To handle pagination, click the "Next" link while it is present on the page:

```python
while sb.is_element_present(".next"):
    sb.click(".next a")
```
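The pagination flow can be sketched without a browser using hypothetical mock pages. Note that the site's last page has no "Next" link, so checking for it *after* scraping ensures the final page is still processed:

```python
# Hypothetical mock data: each page is (quotes, has_next)
pages = [(["q1", "q2"], True), (["q3"], True), (["q4"], False)]

collected = []
i = 0
while True:
    page_quotes, has_next = pages[i]
    collected.extend(page_quotes)  # scrape the current page first
    if not has_next:               # the last page has no "Next" link
        break
    i += 1                         # stands in for sb.click(".next a")

print(collected)  # ['q1', 'q2', 'q3', 'q4']
```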
Finally, export the scraped quotes to a CSV file:

```python
import csv

with open("quotes.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    for quote in quotes:
        writer.writerow({"text": quote["text"], "author": quote["author"], "tags": ";".join(quote["tags"])})
```
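As a quick sanity check (not part of the scraper itself), the export can be read back with `csv.DictReader`, splitting the semicolon-joined tags. The sample row below is hypothetical, simulated in memory so the example runs without a `quotes.csv` on disk:

```python
import csv
import io

# Simulated file contents standing in for quotes.csv
csv_data = io.StringIO(
    "text,author,tags\r\n"
    "A witty quote.,Jane Doe,life;humor\r\n"
)

rows = []
for row in csv.DictReader(csv_data):
    row["tags"] = row["tags"].split(";")  # undo the ";".join() from the export
    rows.append(row)

print(rows[0]["author"])  # Jane Doe
print(rows[0]["tags"])    # ['life', 'humor']
```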
Here is the complete script:

```python
from seleniumbase import SB
from selenium.webdriver.common.by import By
import csv

with SB() as sb:
    sb.open("https://quotes.toscrape.com/")
    quotes = []
    while True:
        quote_elements = sb.find_elements(".quote")
        for quote_element in quote_elements:
            text_element = quote_element.find_element(By.CSS_SELECTOR, ".text")
            text = text_element.text.replace("“", "").replace("”", "")
            author_element = quote_element.find_element(By.CSS_SELECTOR, ".author")
            author = author_element.text
            tags = [tag.text for tag in quote_element.find_elements(By.CSS_SELECTOR, ".tag")]
            quotes.append({"text": text, "author": author, "tags": tags})
        # The last page has no ".next" link, so check after scraping
        # to make sure the final page is included
        if not sb.is_element_present(".next"):
            break
        sb.click(".next a")

with open("quotes.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    for quote in quotes:
        writer.writerow({"text": quote["text"], "author": quote["author"], "tags": ";".join(quote["tags"])})
```
Run the scraper:

```bash
python3 scraper.py --headless
```
SeleniumBase also works as a testing framework. Create a `login.py` file with a test that submits the login form:

```python
from seleniumbase import BaseCase
BaseCase.main(__name__, __file__)

class LoginTest(BaseCase):
    def test_submit_login_form(self):
        self.open("https://quotes.toscrape.com/login")
        self.type("#username", "test")
        self.type("#password", "test")
        self.click('input[type="submit"]')
        self.assert_text("Top Ten tags")
```
Run the test:

```bash
pytest login.py
```
UC Mode (based on undetected-chromedriver) helps bypass anti-bot checks and CAPTCHAs:

```python
from seleniumbase import SB

with SB(uc=True) as sb:
    url = "https://www.scrapingcourse.com/antibot-challenge"
    sb.uc_open_with_reconnect(url, reconnect_time=4)
    sb.uc_gui_click_captcha()
    sb.save_screenshot("screenshot.png")
```
CDP Mode drives the browser through the Chrome DevTools Protocol for improved stealth:

```python
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.activate_cdp_mode(url)
    sb.uc_gui_click_captcha()
    sb.sleep(2)
    sb.save_screenshot("screenshot.png")
```
SeleniumBase offers advanced features for web scraping, including UC Mode and CDP Mode for bypassing anti-bot measures. For more robust solutions, consider using cloud-based browsers like Scraping Browser from Bright Data.