
How to Set a Crawlee Proxy in 2026

Isabella Garcia

Web Data Collection Specialist

17-Dec-2025
Take a Quick Look

Configure premium residential proxies in Crawlee to avoid IP blocks and scale your web scraping operations reliably across any target website.

Key Takeaways

  • Crawlee is a modern web scraping framework built on Puppeteer/Playwright for JavaScript-heavy sites
  • Free proxies are unreliable and actively blocked by anti-scraping systems
  • Premium residential proxies provide legitimate ISP-assigned IPs that defeat most blocking mechanisms
  • Proxy authentication requires username and password credentials embedded in connection strings
  • Proper proxy configuration enables large-scale scraping without IP bans or request timeouts

Understanding Crawlee

Crawlee is a web scraping framework that simplifies crawling and scraping workflows. Built on headless browser libraries such as Puppeteer and Playwright (Crawlee ships in both Node.js and Python flavors), it handles browser automation, session management, and result storage. Unlike simpler HTTP libraries, Crawlee executes JavaScript, manages cookies, and interacts with dynamic content, which is crucial for modern websites that render content client-side.
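To make this concrete, here is a minimal sketch of a Crawlee crawler with no proxy at all, assuming Crawlee v3's CheerioCrawler and Dataset APIs:

javascript
import { CheerioCrawler, Dataset } from 'crawlee';

// Fetch each page, parse it with Cheerio, and store the result
// in Crawlee's default on-disk dataset.
const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        await Dataset.pushData({
            url: request.url,
            title: $('title').text()
        });
    }
});

await crawler.run(['https://example.com']);

Request queuing, retries, and storage are handled by the framework; proxies slot in through the same options object, as the following sections show.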

However, websites detect and block Crawlee's default behavior through various mechanisms. The standard user agent identifies Crawlee scripts to anti-bot systems. Requests from datacenter IP addresses raise suspicion. Rate-limiting triggers when Crawlee makes rapid successive requests. Proxies solve these problems by distributing requests across legitimate residential IPs and hiding the true request origin.

Limitations of Free Proxies

Free proxies listed in public databases seem attractive for cost-conscious developers. However, they introduce significant disadvantages:

Unreliable availability: Free proxies frequently disappear or become inaccessible, breaking scrapers mid-operation
Slow performance: Free proxies route traffic through multiple intermediate servers, introducing latency that slows data collection
High block rates: Websites maintain blocklists of known free proxy IPs, making them ineffective for serious scraping
Security concerns: Free proxy operators cannot guarantee legitimate operations—some intercept traffic or inject malware
No support: Free proxy services provide zero customer support when problems arise

Budget-friendly premium proxies such as Scrapeless Residential Proxies, starting at $0.40/GB, dramatically outperform free alternatives despite the minimal cost difference.

Premium Proxy Benefits

Premium residential proxies provide legitimate advantages for Crawlee operations:

Real residential IPs: Proxies use IP addresses assigned by ISPs to actual home internet users, making them indistinguishable from genuine traffic
IP rotation: Smart allocation algorithms automatically cycle through diverse addresses, preventing per-IP accumulation of suspicious patterns
Geographic targeting: Select proxy locations matching your target website's geographic expectations
High uptime: Professional providers guarantee 99.9%+ availability with SLA protections
Smart routing: Automatic detection and avoidance of slow or blocked connections

These capabilities transform Crawlee from a tool requiring extensive manual management into a production-grade scraping platform.

Basic Crawlee Proxy Configuration

Crawlee routes requests through proxies via a ProxyConfiguration object passed to the crawler instance. At minimum, it needs a proxy URL with embedded authentication:

javascript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Proxies are supplied through a ProxyConfiguration instance,
// not as a raw option on the crawler itself.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://username:password@proxy.example.com:8080']
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ request, $ }) {
        console.log(`${request.url}: ${$('title').text()}`);
    }
});

await crawler.addRequests([
    { url: 'https://example.com/page1' },
    { url: 'https://example.com/page2' }
]);

await crawler.run();

The proxy URL format follows the standard pattern: protocol://[username:password@]host[:port]
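In practice, avoid hardcoding credentials. Below is a small sketch that assembles the URL from environment variables; the variable names are placeholders rather than a Crawlee convention, and encodeURIComponent is used because special characters in passwords would otherwise break URL parsing:

javascript
// PROXY_USER, PROXY_PASS, PROXY_HOST, and PROXY_PORT are hypothetical
// environment variable names; substitute your provider's actual values.
const { PROXY_USER, PROXY_PASS, PROXY_HOST, PROXY_PORT } = process.env;

const proxyUrl =
    `http://${encodeURIComponent(PROXY_USER)}:${encodeURIComponent(PROXY_PASS)}` +
    `@${PROXY_HOST}:${PROXY_PORT}`;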

Setting Up Scrapeless Residential Proxies

Scrapeless Residential Proxies integrate seamlessly with Crawlee through straightforward configuration. Access your account dashboard to obtain auto-generated proxy credentials:

Step 1: Access Proxy Generator

Log into your Scrapeless account and navigate to the Proxy Generator dashboard. Your auto-generated residential proxy credentials appear at the top of the page.

Step 2: Configure Credentials

Set your username and password through the credentials management interface. Scrapeless supports multiple credential sets for different applications.

Step 3: Format Proxy URL

Combine your credentials and proxy endpoint into a valid proxy URL:

http://username:password@superproxy.scrapeless.com:1337

Scrapeless provides separate endpoints for HTTP (port 1337) and HTTPS (port 1338) traffic.
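If you need both endpoints in one script, a small sketch keeps them side by side; the ports are taken from the paragraph above, and you should confirm the scheme and ports in your dashboard:

javascript
// Placeholder credentials; use the values from Step 2.
const user = 'username';
const pass = 'password';
const host = 'superproxy.scrapeless.com';

const httpProxyUrl = `http://${user}:${pass}@${host}:1337`;   // HTTP traffic
const httpsProxyUrl = `http://${user}:${pass}@${host}:1338`;  // HTTPS traffic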

Step 4: Integrate with Crawlee

Apply the proxy URL to your Crawlee configuration:

javascript
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyUrl = 'http://username:password@superproxy.scrapeless.com:1337';

const crawler = new PuppeteerCrawler({
    proxyConfiguration: new ProxyConfiguration({ proxyUrls: [proxyUrl] }),
    useSessionPool: true,
    async requestHandler({ request, page }) {
        console.log(`${request.url}: ${await page.title()}`);
    }
});

await crawler.addRequests([
    { url: 'https://target-website.com' }
]);

await crawler.run();

Advanced Proxy Configuration

Multiple Proxy URLs: Crawlee's ProxyConfiguration accepts an array of proxy URLs and rotates requests across them automatically:

javascript
const crawler = new PuppeteerCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: [
            'http://user1:pass1@proxy1.scrapeless.com:1337',
            'http://user2:pass2@proxy2.scrapeless.com:1337',
            'http://user3:pass3@proxy3.scrapeless.com:1337'
        ]
    })
});

Dynamic Proxy Selection: For complex scraping operations, Scrapeless provides intelligent proxy selection that optimizes IP allocation based on target website characteristics.
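On the Crawlee side, custom selection logic can be plugged in through ProxyConfiguration's newUrlFunction option, which the framework calls whenever a request needs a proxy. A minimal round-robin sketch:

javascript
import { ProxyConfiguration } from 'crawlee';

const pool = [
    'http://user1:pass1@proxy1.scrapeless.com:1337',
    'http://user2:pass2@proxy2.scrapeless.com:1337'
];

let next = 0;
const proxyConfiguration = new ProxyConfiguration({
    // Called by Crawlee each time it needs a proxy URL for a request.
    newUrlFunction: () => pool[next++ % pool.length]
});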

Geographic Targeting: Specify proxy geolocation through URL parameters:

javascript
const proxyUrl = 'http://username:password@superproxy.scrapeless.com:1337?country=US&state=NY';

These parameters force all requests through proxies located in New York, ensuring locale-appropriate responses.

Handling Authentication and Sessions

Some websites require login credentials. Crawlee handles authentication through session management. When combined with proxy rotation, sessions maintain login state across requests from different IPs:

javascript
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://user:pass@superproxy.scrapeless.com:1337']
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    useSessionPool: true,
    sessionPoolOptions: {
        maxPoolSize: 50
    },
    async requestHandler({ page, session }) {
        // Each session keeps its own cookies and authentication state
        if (session.isUsable()) {
            // Process the authenticated page here
        }
    }
});

Crawlee's session pool isolates cookies and state per session, ensuring that rotating IPs doesn't disrupt authentication.
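When a response suggests the current IP has been flagged, retiring the session makes Crawlee replace both the session and its proxy assignment. A sketch reusing the proxyConfiguration defined above, with a simple status-code check (the 403/429 heuristic is illustrative, not universal):

javascript
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    useSessionPool: true,
    async requestHandler({ response, session }) {
        // Treat 403/429 as a sign that this session/IP pair is burned.
        if (response && [403, 429].includes(response.status())) {
            session.retire(); // the pool creates a fresh replacement
            throw new Error(`Blocked response from ${response.url()}`);
        }
        // ...normal page processing...
    }
});

Throwing from the handler makes Crawlee retry the request, which then goes out under a fresh session.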

Avoiding Common Proxy Problems

Proxy Timeouts: If requests frequently time out, increase the timeout values:

javascript
const crawler = new PuppeteerCrawler({
    navigationTimeoutSecs: 30,
    proxyConfiguration: new ProxyConfiguration({ proxyUrls: [proxyUrl] })
});

Connection Refused: Verify credentials match your proxy provider's requirements. Typos or format errors cause immediate connection failures.
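To rule Crawlee out entirely, test the credentials with a bare Node request first. A sketch using undici's ProxyAgent (npm install undici):

javascript
import { fetch, ProxyAgent } from 'undici';

const agent = new ProxyAgent({
    uri: 'http://superproxy.scrapeless.com:1337',
    // Proxy-Authorization header value built from your credentials
    token: `Basic ${Buffer.from('username:password').toString('base64')}`
});

// ECONNREFUSED or a 407 here points at the endpoint or credentials,
// independent of any Crawlee configuration.
const res = await fetch('https://httpbin.io/ip', { dispatcher: agent });
console.log(await res.json());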

Rate Limiting Despite Proxies: Even with proxy rotation, excessive request rates trigger blocking. Implement request delays:

javascript
const crawler = new PuppeteerCrawler({
    proxyConfiguration: new ProxyConfiguration({ proxyUrls: [proxyUrl] }),
    requestHandlerTimeoutSecs: 60,
    preNavigationHooks: [
        async () => {
            // Random delay of up to 3 seconds before each navigation
            await new Promise((resolve) => setTimeout(resolve, Math.random() * 3000));
        }
    ]
});

Blocked Proxies: If individual Scrapeless proxies get blocked, the service automatically rotates to different addresses. Contact support if blocks persist.
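For requests that still fail after Crawlee's built-in retries, a failedRequestHandler lets you log or store them for later inspection (proxyConfiguration as defined earlier):

javascript
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    maxRequestRetries: 3,
    // Runs once a request has exhausted all of its retries.
    failedRequestHandler({ request }, error) {
        console.warn(`Gave up on ${request.url}: ${error.message}`);
    }
});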

Comprehensive Solution: Scrapeless Browser

For maximum reliability, Scrapeless Browser provides a drop-in replacement for Puppeteer with built-in proxy rotation, JavaScript rendering, and anti-bot bypass.

The browser handles proxy configuration automatically, eliminating manual setup while delivering superior success rates against protected websites.

Testing Your Configuration

Verify proxy setup by checking returned IP addresses:

javascript
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const crawler = new PuppeteerCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: ['http://user:pass@superproxy.scrapeless.com:1337']
    }),
    async requestHandler({ page }) {
        // httpbin.io echoes back the IP address the request arrived from
        const ipInfo = await page.evaluate(() =>
            fetch('https://httpbin.io/ip').then((r) => r.json())
        );
        console.log('Request IP:', ipInfo.origin);
    }
});

await crawler.run(['https://httpbin.io/ip']);

If the returned IP differs from your computer's IP, the proxy works correctly. If it matches, requests bypass the proxy—check credentials and connection details.

Performance Optimization

Properly configured proxies enable high-performance scraping:

  • Concurrency: Run 50+ parallel requests when using proxy rotation (see the configuration sketch after this list)
  • Speed: Requests average 1-2 seconds with premium proxies versus 5-10 seconds with free proxies
  • Reliability: 99%+ success rates versus 50-70% for free or manual proxy management

These improvements translate directly to faster data collection and lower operational costs despite proxy expenses.
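The relevant knobs are standard Crawlee options; the values in this sketch are starting points to tune against your target, not recommendations (proxyConfiguration as defined earlier):

javascript
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    minConcurrency: 10,        // the autoscaler starts here and ramps up
    maxConcurrency: 50,        // ceiling on parallel browser pages
    maxRequestsPerMinute: 120  // overall rate cap across all sessions
});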


FAQ

Q: Do I need different proxy credentials for each Crawlee instance?

A: No. Single proxy credentials work across unlimited Crawlee instances. However, running multiple large-scale scrapers simultaneously may benefit from separate credentials enabling independent rate-limit management.

Q: Can I mix Scrapeless proxies with other proxy providers?

A: Yes. Crawlee accepts arrays of diverse proxy URLs, automatically distributing requests. However, managing multiple providers increases complexity. Single-provider solutions usually prove more reliable.

Q: What should I do if a proxy gets permanently blocked?

A: Premium providers like Scrapeless automatically rotate away from blocked IPs. If issues persist, contact support—they often whitelist specific domains or adjust routing to resolve blocks.

Q: How many concurrent requests can Scrapeless proxies handle?

A: Scrapeless infrastructure supports thousands of concurrent requests. Limit concurrency based on your target website's tolerance rather than proxy capacity. Test gradually from 10 concurrent up to 100+.

Q: Is proxy rotation in Crawlee automatic or manual?

A: Crawlee handles rotation automatically when provided with multiple proxy URLs. The framework distributes requests across proxies without developer intervention, simplifying large-scale operations.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
