How to Use Puppeteer to Bypass CAPTCHA

Ava Wilson

Expert in Web Scraping Technologies

26-Sep-2024

Automated web scraping and crawling are essential for large-scale data collection and analysis from websites. However, automated online access has become increasingly difficult due to anti-bot tools like CAPTCHA.

As a security precaution, many websites serve CAPTCHAs or block screens to visitors they suspect are bots. If your automated scraper can convincingly look like a human visitor, the target website is less likely to show it a block screen or CAPTCHA. As a result, your scraper can complete its tasks without running into CAPTCHA and reCAPTCHA challenges.

But how can you make a scraper look human to websites? Let's investigate.

Tutorial: Using Puppeteer to Get Around CAPTCHA

To access content on these blocked websites, you need a way to stop the CAPTCHA from loading in the first place. Puppeteer can help with this. It's a Node.js library that provides an easy-to-use API for controlling Chrome and Chromium over the DevTools Protocol. Puppeteer runs headless by default, but you can also configure it to run a full (headful) Chrome/Chromium browser.
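Here is a minimal sketch of switching between the two modes. `headless` is a real Puppeteer launch option, and `headless: false` opens a visible browser window. The helper names `buildLaunchOptions` and `launchBrowser` are hypothetical, chosen for this example; Puppeteer is required inside `launchBrowser` so the file can be loaded even where Puppeteer isn't installed.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// headless: false runs a full, visible Chrome/Chromium window.
function buildLaunchOptions({ headful = false } = {}) {
  return { headless: !headful };
}

// Hypothetical helper: launches a browser in headless or headful mode.
async function launchBrowser({ headful = false } = {}) {
  // Required here so the helpers above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  return puppeteer.launch(buildLaunchOptions({ headful }));
}
```

Inside an async function, `const browserObj = await launchBrowser({ headful: true });` opens a visible browser window, while `launchBrowser()` keeps Puppeteer's default headless mode.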

Why isn't Puppeteer enough on its own?

What happens if you use Puppeteer by itself to automatically access a website protected by a CAPTCHA? The target website detects the automated access and displays a block screen or a CAPTCHA test.

Let's confirm this with the following steps:

Node.js has to be installed on your computer. In a newly created Node.js project, install Puppeteer with the following npm command:

```bash
npm i puppeteer
```

Import the Puppeteer library into the Node.js file you created:

```javascript
const puppeteer = require('puppeteer');
```

Use the following code to launch a headless browser instance and open a new page:

```javascript
(async () => {
  // Create a browser instance (Puppeteer runs headless by default)
  const browserObj = await puppeteer.launch();

  // Create a new page
  const newpage = await browserObj.newPage();

  // ...the following steps continue inside this async function
```

Because the screenshot should show the desktop version of the page, use the following code to set a desktop-sized viewport:

```javascript
  // Set the width and height of the viewport
  await newpage.setViewport({ width: 1920, height: 1080 });
```

The setViewport() method sets the size of the browser viewport. You can adjust the width and height to match the device you want to emulate.

Next, navigate to the URL of a website you believe is CAPTCHA-protected and take a screenshot of it.
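The steps so far can be combined into one sketch of this check. The function name `screenshotPage`, the `waitUntil: 'networkidle2'` setting, and the output file name in the usage line are assumptions for this example, and `https://example.com` is only a placeholder for the site you want to test.

```javascript
// Hypothetical helper: opens a URL in headless Chrome and saves a screenshot.
async function screenshotPage(url, outPath) {
  // Required here so this file can be loaded without Puppeteer installed.
  const puppeteer = require('puppeteer');

  // Create a headless browser instance
  const browserObj = await puppeteer.launch();
  try {
    // Create a new page with a desktop-sized viewport
    const newpage = await browserObj.newPage();
    await newpage.setViewport({ width: 1920, height: 1080 });

    // Navigate and wait until network activity has mostly settled
    await newpage.goto(url, { waitUntil: 'networkidle2' });

    // Save what the site actually served: real content, or a block screen/CAPTCHA
    await newpage.screenshot({ path: outPath });
  } finally {
    // Close the browser even if navigation or the screenshot fails
    await browserObj.close();
  }
}
```

Call it with `screenshotPage('https://example.com', 'screenshot.png').catch(console.error);`. If the site detects the automated access, the saved image will typically show a block screen or CAPTCHA instead of the page content.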

Using Puppeteer Stealth to Get Around CAPTCHA

You can extend Puppeteer's capabilities by installing the Stealth plugin (puppeteer-extra-plugin-stealth), which is loaded through the puppeteer-extra wrapper. Its collection of evasion techniques counters many of the methods protected websites use to detect automated access.
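A minimal sketch of enabling the plugin, following the usage pattern documented for the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` npm packages (install them with `npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth`). The helper names `getStealthPuppeteer` and `screenshotWithStealth` are hypothetical. Since `puppeteer-extra` exposes the same `launch()` API as Puppeteer, the page logic is unchanged from the earlier steps.

```javascript
// Cached puppeteer-extra instance with the Stealth plugin registered.
let stealthPuppeteer = null;

// Hypothetical helper: loads puppeteer-extra on first use and registers Stealth once.
function getStealthPuppeteer() {
  if (!stealthPuppeteer) {
    // puppeteer-extra wraps puppeteer and adds plugin support.
    stealthPuppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    // The Stealth plugin bundles evasions that hide common automation signals.
    stealthPuppeteer.use(StealthPlugin());
  }
  return stealthPuppeteer;
}

// Hypothetical helper: same screenshot check as before, but with Stealth enabled.
async function screenshotWithStealth(url, outPath) {
  const puppeteer = getStealthPuppeteer();

  const browserObj = await puppeteer.launch();
  try {
    const newpage = await browserObj.newPage();
    await newpage.setViewport({ width: 1920, height: 1080 });
    await newpage.goto(url, { waitUntil: 'networkidle2' });
    await newpage.screenshot({ path: outPath });
  } finally {
    await browserObj.close();
  }
}
```

Call it with a placeholder such as `screenshotWithStealth('https://example.com', 'screenshot-stealth.png')` and compare the result with a screenshot taken without Stealth. Registering the plugin inside `getStealthPuppeteer()` means repeated calls don't add the same plugin twice.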

With Stealth, your automated headless Puppeteer sessions can look so "human" that many websites can't tell the difference. On some websites, this means the CAPTCHA never loads, so your Puppeteer script can run unattended and access the data behind it.
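To see the kind of difference Stealth makes, you can read a couple of well-known automation signals from inside the page. This is a rough sketch under stated assumptions: `navigator.webdriver` is a standard browser property that is typically `true` in automated sessions, and headless Chrome has typically included `HeadlessChrome` in its user agent. The helper names and the rule of treating either signal as "automated" are hypothetical, and real detection scripts check many more signals than these two.

```javascript
// Hypothetical rule: flag a session as automated if either signal is present.
function looksAutomated({ webdriver, userAgent }) {
  return webdriver === true || /HeadlessChrome/.test(userAgent);
}

// Reads both signals from inside an already-open Puppeteer page.
async function readAutomationSignals(page) {
  // The callback runs in the browser, where navigator is defined.
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
  }));
}
```

For example, after `await newpage.goto(url)`, evaluate `looksAutomated(await readAutomationSignals(newpage))` once with plain Puppeteer and once with the Stealth plugin enabled. The exact values depend on your Chrome and plugin versions.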

Note: All bypassing techniques demonstrated in this tutorial are for educational purposes only.

Are you tired of CAPTCHAs and constant web scraping blocks?

Scrapeless: the best all-in-one web scraping solution available!

Use our powerful toolkit to unlock the full potential of your data extraction:

Best CAPTCHA Solver

Automated resolution of complex CAPTCHAs to ensure ongoing and smooth scraping.


In summary

CAPTCHAs can hamper web automation projects. However, by using Puppeteer Stealth and Scrapeless' CAPTCHA solver, you can get around many CAPTCHAs and streamline your automation workflow. If you're interested in other web scraping libraries, you should also read the blog article on how to use Playwright to get around CAPTCHAs. Always stay within the law and get legal advice before beginning any kind of scraping activity.

To get the most out of Scrapeless' CAPTCHA solver, we recommend signing up for a free trial and working through our detailed instructions.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
