How to Use Pyppeteer with a Proxy in 2024

Routing HTTP requests through many different IP addresses is crucial to avoid getting banned while web scraping. That's why, in this tutorial, we'll learn how to set up a Pyppeteer proxy!
Prerequisites
Make sure your local system is running Python 3.6 or above.
Next, use pip to install Pyppeteer from PyPI:

pip install pyppeteer
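To confirm the installation worked, you can print the installed version (the __version__ attribute is exposed by Pyppeteer releases). Keep in mind that Pyppeteer downloads a compatible Chromium build the first time it launches a browser; recent releases also ship a pyppeteer-install command to fetch it up front instead of waiting for the first run:

python -c "import pyppeteer; print(pyppeteer.__version__)"
pyppeteer-install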
How to Use a Proxy with Pyppeteer
To get started, write a script called scraper.py that requests your current IP address from ident.me.
import asyncio
from pyppeteer import launch

async def main():
    # Create a new headless browser instance
    browser = await launch()
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Run the script to print the body content of the target page:

python scraper.py
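A side note: the script drives the event loop with asyncio.get_event_loop().run_until_complete(), which matches older Pyppeteer examples. On Python 3.7+ you can usually replace that last line with the more modern one-liner:

# Equivalent on Python 3.7+: creates and closes the event loop for you
asyncio.run(main())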
It's time to update your script to use a Pyppeteer proxy. To do that, grab a free proxy from FreeProxyList (the one we used may no longer work by the time you read this).
The scraper.py script uses the launch() function, which starts a new browser instance and accepts optional parameters. One of them is args, a list of extra arguments to pass to the browser process. Add the --proxy-server argument there to tell the browser to route Pyppeteer's requests through a proxy.
language
# ...
async def main():
# Create a new headless browser instance
browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
# Create a new page
page = await browser.newPage()
# ...
Here's the whole code:
import asyncio
from pyppeteer import launch

async def main():
    # Create a new headless browser instance
    browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
This time, when you run the script again with python scraper.py, the IP address of the proxy should appear on the screen:

20.219.108.109
Pyppeteer Proxy Authentication
If you use a premium proxy, you'll need a username and password for authentication. One way to supply them is the --proxy-auth switch, though be aware that many Chromium builds ignore it; if it doesn't work for you, use the page.authenticate() approach shown next.
# ...
# Create a new headless browser instance
browser = await launch(args=[
    '--proxy-server=http://20.219.108.109:8080',
    '--proxy-auth=<YOUR_USERNAME>:<YOUR_PASSWORD>',
])
# ...
Alternatively, you can authenticate through the page API. Call page.authenticate() before page.goto() so the credentials are in place when the proxy challenges the request:
# ...
# Create a new page
page = await browser.newPage()
await page.authenticate({ 'username': '<YOUR_USERNAME>', 'password': '<YOUR_PASSWORD>' })
# ...
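For reference, here is a minimal end-to-end sketch of the authenticated flow; the proxy address and credentials are placeholders you'd replace with your provider's details:

import asyncio
from pyppeteer import launch

async def main():
    # Launch a browser that routes traffic through the (placeholder) proxy
    browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
    page = await browser.newPage()
    # Supply credentials before navigating so the proxy challenge is answered
    await page.authenticate({'username': '<YOUR_USERNAME>', 'password': '<YOUR_PASSWORD>'})
    await page.goto('https://ident.me')
    body = await page.querySelector('body')
    content = await page.evaluate('(element) => element.textContent', body)
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())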
Configure a Dynamic Proxy with Pyppeteer
The static proxy you've used so far is fine for testing, but to avoid being blacklisted during real scraping you should rotate through multiple proxies. With Pyppeteer, you can do that by creating multiple browser instances, each with its own proxy configuration.
To begin, grab a few more free proxies and compile them into a list:

# ...
import random

proxies = [
    'http://20.219.108.109:8080',
    'http://210.22.77.94:9002',
    'http://103.150.18.218:80',
]
# ...
Next, write an asynchronous function that accepts a proxy as an argument and uses it to make a Pyppeteer request to ident.me:
# ...
async def init_pyppeteer_proxy_request(proxy):
    # Create a new headless browser instance that uses the given proxy
    browser = await launch(args=[
        f'--proxy-server={proxy}',
    ])
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()
# ...
Now, change the main() function so that it calls the new function with a randomly chosen proxy:
# ...
async def main():
    for _ in range(3):
        await init_pyppeteer_proxy_request(random.choice(proxies))
# ...
Your code should now look like this:
import asyncio
import random
from pyppeteer import launch

proxies = [
    'http://20.219.108.109:8080',
    'http://210.22.77.94:9002',
    'http://103.150.18.218:80',
]

async def init_pyppeteer_proxy_request(proxy):
    # Create a new headless browser instance that uses the given proxy
    browser = await launch(args=[
        f'--proxy-server={proxy}',
    ])
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()

async def main():
    for _ in range(3):
        await init_pyppeteer_proxy_request(random.choice(proxies))

asyncio.get_event_loop().run_until_complete(main())
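Free proxies tend to die quickly, so in practice some of these requests will fail. As a hardening sketch (building only on the code above), you can give page.goto() a timeout and catch errors per proxy, so one dead entry doesn't crash the whole run:

async def init_pyppeteer_proxy_request(proxy):
    browser = await launch(args=[f'--proxy-server={proxy}'])
    try:
        page = await browser.newPage()
        # Fail fast instead of hanging on a dead proxy (timeout in milliseconds)
        await page.goto('https://ident.me', timeout=10000)
        body = await page.querySelector('body')
        content = await page.evaluate('(element) => element.textContent', body)
        print(content)
    except Exception as exc:
        # Dead or slow proxies are common on free lists; log and move on
        print(f'Proxy {proxy} failed: {exc}')
    finally:
        await browser.close()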
Conclusion
Using a proxy with Pyppeteer can greatly increase your web scraping success rate, and you now know how to send requests through both static and rotating proxies.
That said, managing proxy infrastructure yourself only goes so far. If you need to scrape at scale without worrying about infrastructure, a managed web scraping tool like Scrapeless can handle proxies for you and give you stronger assurance of acquiring the data you need.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.