Using a Proxy in Python with Selenium

Have you been flagged as a bot while web scraping with Selenium?
That's understandable. Although Selenium is a great tool for scraping dynamic webpages, it can't defeat sophisticated anti-bot defenses on its own. By adding a proxy to your Selenium scraper, you can manage rate limits, avoid geographic restrictions, and prevent IP bans.
Selenium Proxy: What Is It?
A proxy acts as an intermediary between a client and a server. By routing requests through it, the client can bypass geographic restrictions and reach other servers anonymously and securely.
Headless browsers can use proxy servers just like HTTP clients do. When accessing websites, a Selenium proxy helps safeguard your IP address and avoid bans.
Proxy support in Selenium is especially useful for browser automation tasks such as web scraping and testing. Read on to learn how to set up a proxy in Selenium for web scraping!
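To see the proxy mechanism in isolation before bringing Selenium in, here is a minimal sketch of the same pattern with a plain HTTP client (the requests library). The proxy address is a placeholder for illustration, not a live server:

```python
# pip install requests
import requests

# placeholder proxy address for illustration
proxy = "http://20.235.159.154:80"

# route both HTTP and HTTPS traffic through the proxy,
# just as a headless browser would
response = requests.get(
    "https://httpbin.io/ip",
    proxies={"http": proxy, "https": proxy},
)
# the reported origin should be the proxy's IP, not yours
print(response.text)
```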
How to Configure a Selenium Proxy
The snippet below launches a headless Chrome driver and navigates to httpbin, a service that returns the client's IP address. At the end, the script prints the HTML response.
```python
# pip install selenium webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the HTML of the target webpage
print(driver.page_source)

# release the resources and close the browser
driver.quit()
```
The code prints HTML like the following (your IP will differ):
```html
<html><head><meta name="color-scheme" content="light dark"><meta charset="utf-8"></head><body><pre>{
  "origin": "50.217.226.40:80"
}
</pre><div class="json-formatter-container"></div></body></html>
```
To set a proxy in Selenium, you must:
- Obtain a reliable proxy server
- Pass it to Chrome through the --proxy-server option
- Navigate to the page you want to visit
First, visit the Free Proxy List website to obtain a free proxy address. Then configure Selenium's Options so that Chrome launches with the proxy, and print the body text of the target webpage:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# define the proxy address and port
proxy = "20.235.159.154:80"

# set Chrome options to run in headless mode using a proxy
options = Options()
options.add_argument("--headless=new")
options.add_argument(f"--proxy-server={proxy}")

# initialize the Chrome driver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()
```
Now every request made by the controlled Chrome instance will be routed through the designated proxy.
The IP in the response should match the proxy server's IP, which confirms that Selenium is visiting websites through the proxy.
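If you'd rather verify this programmatically than by eye, you can parse the JSON body that httpbin returns and compare it to the configured proxy. A minimal sketch, reusing the driver and proxy variables from the snippet above:

```python
import json

# httpbin returns a JSON document such as {"origin": "20.235.159.154:80"}
body = driver.find_element(By.TAG_NAME, "body").text
origin = json.loads(body)["origin"]

# compare the reported origin's IP with the configured proxy's IP
assert origin.split(":")[0] == proxy.split(":")[0], "traffic is not going through the proxy"
```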
Proxy Authentication in Selenium
Some proxy servers require authentication so that only users with valid credentials can connect. That is typically the case with premium proxies or commercial solutions.
The syntax for embedding a username and password in an authenticated proxy URL is:
```
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
```
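For instance, with hypothetical credentials and the free proxy address used earlier, the URL would look like this:

```python
# hypothetical credentials, for illustration only
proxy_url = "http://admin:p4ssw0rd@20.235.159.154:80"
```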
However, passing such a URL to --proxy-server won't work, because the Chrome driver ignores the username and password by default. This is where a third-party library like Selenium Wire can help.
Selenium Wire gives you access to the requests the browser makes and lets you modify them as you see fit. To install it, use the command below:
```bash
pip install blinker==1.7.0 selenium-wire
```
To handle proxy authentication, use Selenium Wire, as demonstrated below:
```python
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# configure the proxy
proxy_username = "<YOUR_USERNAME>"
proxy_password = "<YOUR_PASSWORD>"
proxy_address = "20.235.159.154"
proxy_port = "80"

# formulate the proxy URL with authentication
proxy_url = f"http://{proxy_username}:{proxy_password}@{proxy_address}:{proxy_port}"

# set selenium-wire options to use the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url
    },
}

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver with service, selenium-wire options, and Chrome options
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    seleniumwire_options=seleniumwire_options,
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()
```
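Selenium Wire also records every request the browser issues, which is handy for checking whether traffic actually went through the proxy. A short sketch, assuming the driver from the snippet above (call it before driver.quit()):

```python
# iterate over the requests captured by Selenium Wire
for request in driver.requests:
    if request.response:
        # print each URL with its HTTP status code
        print(request.url, request.response.status_code)
```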
The Best Selenium Proxy Protocols
The most popular protocol choices for a Selenium proxy are HTTP, HTTPS, and SOCKS5.
Unlike HTTP proxies, HTTPS proxies encrypt the data they transfer over the internet, adding an extra degree of security. That makes HTTPS the safer and generally preferred option.
SOCKS5, often referred to simply as SOCKS, is another helpful protocol for Selenium proxies. It is more flexible because it can handle a wider variety of traffic, including non-HTTP protocols such as email and file transfers.
All things considered, HTTP and HTTPS proxies are a great fit for web scraping and crawling, while SOCKS suits non-HTTP traffic, as the sketch below shows.
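Chrome accepts a protocol scheme in the --proxy-server flag, so switching protocols is just a matter of changing the URL prefix. A minimal sketch with a placeholder SOCKS5 proxy address:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# placeholder SOCKS5 proxy address for illustration
proxy = "socks5://20.235.159.154:1080"

options = Options()
options.add_argument("--headless=new")
# the scheme prefix tells Chrome which proxy protocol to use
options.add_argument(f"--proxy-server={proxy}")

# recent Selenium versions resolve the driver binary automatically
driver = webdriver.Chrome(options=options)
driver.get("https://httpbin.io/ip")
print(driver.page_source)
driver.quit()
```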
Use a Rotating Proxy in Selenium with Python
If your script sends many requests in a short period of time, the server can flag it as suspicious and ban your IP. And since websites can identify and block requests coming from known addresses, sticking to a fixed set of IPs makes scraping less successful over time.
A rotating proxy strategy solves this problem. By switching proxies after a certain amount of time or number of requests, your exit IP keeps changing, so you appear to the server as a different user each time and avoid bans.
Let's look at how to build a proxy rotator in Selenium using selenium-wire.
First, you need a proxy pool. In this example, we'll use several free proxies.
Store them in a list as follows:
```python
PROXIES = [
    "http://19.151.94.248:88",
    "http://149.169.197.151:80",
    # ...
    "http://212.76.118.242:97"
]
```
Next, use random.choice() to pick a random proxy from the pool and use it to start a new driver instance. This is how your finished code should look:
```python
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import random

# the list of proxies to rotate over
PROXIES = [
    "http://20.235.159.154:80",
    "http://149.169.197.151:80",
    # ...
    "http://212.76.118.242:97"
]

# randomly select a proxy
proxy = random.choice(PROXIES)

# set selenium-wire options to use the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    },
}

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver with service, selenium-wire options, and Chrome options
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    seleniumwire_options=seleniumwire_options,
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()
```
In practice, free proxies will usually get you blocked. We used them here to illustrate the fundamentals, but you should never rely on them for a real-world project.
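If you do experiment with free proxies, it helps to filter out dead ones before handing them to Selenium. A minimal sketch using the requests library, reusing the PROXIES list from above (the timeout value is an arbitrary assumption):

```python
# pip install requests
import requests

def is_alive(proxy: str, timeout: float = 5.0) -> bool:
    """Return True if the proxy can reach httpbin within the timeout."""
    try:
        response = requests.get(
            "https://httpbin.io/ip",
            proxies={"http": proxy, "https": proxy},
            timeout=timeout,
        )
        return response.ok
    except requests.RequestException:
        return False

# keep only the proxies that respond
working_proxies = [p for p in PROXIES if is_alive(p)]
print(f"{len(working_proxies)} of {len(PROXIES)} proxies are alive")
```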
Selenium Grid's Error 403: Forbidden for Proxy
Selenium Grid enables remote browser control and parallel execution of cross-platform scripts. However, when using it for web scraping you may run into one of its most frequent issues: Error 403: Forbidden for Proxy. There are two common reasons for it:
- Port 4444 is already in use by another process.
- Your RemoteWebDriver requests aren't reaching the right URL.
First, free port 4444 by stopping whatever process occupies it. If that doesn't resolve the problem, make sure you're connecting the remote driver to the correct hub URL, as shown below:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# point the remote driver at the Grid hub URL, including the /wd/hub path
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=Options()
)
```
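As for the first cause, you can test whether something is already listening on port 4444. A quick sketch using only the Python standard library:

```python
import socket

# try to connect to the Grid's default port; success means it's occupied
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(1)
    in_use = sock.connect_ex(("localhost", 4444)) == 0

print("port 4444 is in use" if in_use else "port 4444 is free")
```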
Conclusion
Proxy servers can help circumvent anti-bot detection systems, but they aren't always reliable and require a lot of manual upkeep. To reliably get around anti-bot measures and save yourself the trouble of finding and configuring proxies, use a web scraping API like Scrapeless. Get a free trial of Scrapeless!
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.