How to Integrate Scrapeless MCP Server into ZeroClaw: A Step-by-Step Guide

Sophia Martinez

Specialist in Anti-Bot Strategies

18-May-2026

Key Takeaways:

  • One TOML block wires the cloud browser into a local Rust agent. ZeroClaw is a single-binary AI agent runtime that talks to LLM providers, listens on 30+ channels, and acts through tools. Adding the Scrapeless MCP Server takes one short [mcp] block in ~/.zeroclaw/config.toml: no SDK install, no daemon to manage, no agent-side code change.
  • Twenty MCP tools, two surfaces. The Scrapeless MCP Server exposes google_search, google_trends, the full browser_* cloud-browser primitive set, and scrape_html / scrape_markdown / scrape_screenshot. Stdio transport runs locally via npx -y scrapeless-mcp-server; streamable HTTP points at https://api.scrapeless.com/mcp.
  • MCP and Agent Skills are complementary, not alternatives. The MCP server gives ZeroClaw the tools; the Scrapeless OpenClaw skills (webunlocker-skill and llm-chat-scraper-skill) give it the how-to for the underlying Scrapeless APIs. ZeroClaw migrated from OpenClaw and reads the same SKILL.md format, so the skills drop into ~/.zeroclaw/workspace/skills/ and become callable through zeroclaw skills list.
  • Anti-detection cloud browser, residential proxies in 195+ countries. Scrapeless handles JavaScript rendering, residential-proxy egress, fingerprint randomization (UA, timezone, WebGL, canvas), and session persistence at the platform level, so the ZeroClaw agent focuses on the task instead of the evasion plumbing.
  • Discover → extract works across any site. Use google_search to locate the page, scrape_markdown to pull clean text from a JS-rendered SPA, the browser_* tools for paginated or interactive flows, and google_trends for time-series context. The agent composes them; nothing in the protocol is target-specific.
  • Free to start. New Scrapeless accounts include free MCP runtime; sign up at app.scrapeless.com.

Introduction: from a local Rust agent to live web access

ZeroClaw is a Rust agent runtime that runs entirely on the operator's machine. One binary, one TOML config, the operator's keys, the operator's workspace. It speaks to ~20 LLM providers, reaches the world through Discord, Telegram, Matrix, email, voice, webhooks, and a CLI, and acts through shell, browser, HTTP, hardware, and MCP-server tools. The 31k-star repository ships a security model built around supervised autonomy, OS-level sandboxes (Landlock, Bubblewrap, Seatbelt, Docker), and cryptographic tool receipts on every action.

The fundamental limit of any local agent runtime is the same one every LLM hits: the model's knowledge is frozen at training cutoff. For research, monitoring, lead generation, competitive intelligence, and RAG against live publisher data, that limit shows up the moment the agent has to read a page that did not exist when the model was trained. ZeroClaw's built-in browser and HTTP tools cover benign pages and documentation lookups; commercial pages behind Cloudflare, Akamai, reCAPTCHA, or IP-reputation filtering are a different surface that those tools were not engineered for.

This post walks through wiring Scrapeless into ZeroClaw through both integration surfaces the runtime supports: the Scrapeless MCP Server (the canonical way to expose new tools to the agent) and the Scrapeless OpenClaw skills (canonical knowledge files the agent loads to drive those tools effectively). The two complement each other: the MCP server is what the agent calls; the skills are what tell it when and how to call the underlying Scrapeless APIs. For the same Scrapeless primitive surfaced through other clients, the MCP server tutorial walks through Claude Desktop / Cursor / Codex CLI, and the Hermes integration post covers the direct-CDP path for agents that already speak Chrome DevTools Protocol.


What Is ZeroClaw?

ZeroClaw is a single Rust binary that boots an agent runtime on the operator's own machine. The maintainers describe it as "you own the agent, you own the data, you own the machine it runs on." The runtime is structured around four moving pieces:

  • Channels (30+ adapters). Inbound messages from Discord, Telegram, Matrix, email, voice, webhooks, the CLI, and the ACP IDE bridge are all routed to the same agent loop.
  • Providers (~20 LLM backends). Anthropic, OpenAI, Ollama, any OpenAI-compatible endpoint. Fallback chains and routing keep the agent running when a provider flakes.
  • Tools (shell, browser, HTTP, hardware, MCP). The action surface. MCP servers register as first-class tools alongside the built-ins.
  • Security policy and SOP engine. Default autonomy is supervised: medium-risk operations require approval, high-risk are blocked. Standard Operating Procedures fire on MQTT, webhook, cron, or peripheral events with approval gates and resumable runs.

Configuration lives in one place: ~/.zeroclaw/config.toml. The workspace (skills, memory, logs, MCP state) lives under ~/.zeroclaw/workspace/. Operators migrating from OpenClaw can import the workspace directly; the skill format is the same.


Why Add Web Access to Your ZeroClaw Agent

LLMs powering ZeroClaw share the same constraint: training cutoff. In a fast-moving environment that produces three observable failure modes: outdated answers, hallucinated facts, and tool calls against URLs that have since rotated or 404'd.

ZeroClaw ships built-in http and browser tools, and they cover a broad surface. They are not optimized for the commercial web: JS-rendered SPAs, anti-bot interstitials, CAPTCHA challenges, and geo-restricted content sit between the agent and the data the operator actually wants. Wiring Scrapeless in turns those failure modes into normal tool calls:

  • Real-time research through google_search (Google, with localized gl + hl parameters) and google_trends (time-series interest data).
  • Cross-source validation by scrape_markdown against multiple result URLs in a single agent turn.
  • Live data collection from JS-heavy sites (pricing pages, marketplace listings, review pages, public directories) through the browser_* cloud-browser primitives.
  • Geo-bound queries by allocating sessions in a specific country, so the agent sees what a local user would see.

How to Extend ZeroClaw With Scrapeless: Two Surfaces

Scrapeless supports ZeroClaw through two surfaces, used together:

  • Scrapeless MCP Server: the official server exposing 20 cloud-browser, SERP, and scraping tools over the Model Context Protocol.
  • Scrapeless OpenClaw skills: SKILL.md-formatted knowledge files that teach the agent how to drive the Scrapeless Universal Scraping API and the LLM Chat Scraper effectively. ZeroClaw imports OpenClaw skills directly.

The MCP server is what the agent invokes. The skills are what the agent reads to decide when and how to invoke. They are not alternatives: installed together, the agent has both the tools and the playbook.

Scrapeless MCP Server

The MCP server ships 20 tools out of the box. The core set:

| Tool | What it does |
| --- | --- |
| google_search | SERP retrieval with gl / hl localization parameters. |
| google_trends | Trending search and time-series interest data. |
| scrape_markdown | Render a URL through the cloud browser, return Markdown. |
| scrape_html | Same, returning full rendered HTML. |
| scrape_screenshot | Capture a high-quality screenshot of any page. |
| browser_create | Allocate (or reuse) a cloud browser session. |
| browser_goto | Navigate the session to a URL. |
| browser_click / browser_type / browser_press_key | Drive interactive page elements. |
| browser_scroll / browser_scroll_to | Trigger lazy-loaded content. |
| browser_get_html / browser_get_text | Extract from the current cloud-browser page. |
| browser_screenshot / browser_snapshot | Capture state for review or downstream processing. |
| browser_wait_for / browser_wait | Wait for selectors or fixed durations. |
| browser_close | Release the session. |

Two transports are supported. Stdio (npx -y scrapeless-mcp-server) is the right default for a workstation running ZeroClaw locally; streamable HTTP (https://api.scrapeless.com/mcp) is the right default when the agent runs on a remote host and the operator wants the MCP server hosted by Scrapeless rather than spawned per-invocation.
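Whichever transport is used, each tool invocation travels as an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below only builds such messages so the shape is visible; it does not contact Scrapeless. The `q`, `gl`, and `hl` argument names follow the calls shown later in this post, while the `url` argument name and the example URL are assumptions for illustration, so check the server's published tool schema before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids must be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# q / gl / hl follow the article; `url` is an assumed argument name.
search = tool_call("google_search", {"q": "AI agent frameworks", "gl": "us", "hl": "en"})
scrape = tool_call("scrape_markdown", {"url": "https://example.com/article"})

print(json.dumps(search, indent=2))
```

ZeroClaw's MCP client builds these messages on the agent's behalf; the operator never writes them by hand. Seeing the envelope is mostly useful when reading logs or debugging a tool call that fails.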

Scrapeless OpenClaw Skills

The skills are SKILL.md files with a small Python runtime that wraps a specific Scrapeless API. Both ship on the official Scrapeless GitHub org:

| Skill | What it teaches the agent |
| --- | --- |
| webunlocker-skill | Drive the Scrapeless Universal Scraping API: fetch HTML / Plaintext / Markdown / screenshots / structured content with automatic CAPTCHA solving (reCAPTCHA, Cloudflare Turnstile, Cloudflare Challenge), JS rendering, residential-proxy egress with --country, retry, and POST + custom-header support. |
| llm-chat-scraper-skill | Collect structured chat responses from ChatGPT, Gemini, Perplexity, and Grok; useful for AI-search monitoring and GEO measurement workflows. |

ZeroClaw inherits the OpenClaw skill format. Skills get cloned into ~/.zeroclaw/workspace/skills/, are listed by zeroclaw skills list, and become available to the agent on the next zeroclaw agent session.


What You Can Do With It

  • Daily monitoring agent. Schedule a ZeroClaw SOP that runs each morning: google_search for tracked keywords, scrape_markdown the top three results, summarize, deliver via the Discord channel adapter.
  • AI-search visibility tracking. With the LLM Chat Scraper skill, pull the responses ChatGPT, Gemini, Perplexity, and Grok produce for brand-relevant prompts on a cadence; track presence and sentiment over time.
  • Lead generation from public directories. Drive the cloud browser through a paginated public directory, dedupe by domain, hand the records to the agent's memory store.
  • Authenticated form-fill with human in the loop. Drive a vendor onboarding or job-application form to the final review screen, take a full-page screenshot, stop before submit so a human can approve.
  • Geo-bound competitor pricing. Allocate the session in a specific country, render the localized pricing page, diff against the previous snapshot, ping a channel when a threshold trips.
  • RAG against live publisher data. Render publisher pages to clean text through scrape_markdown, embed into ZeroClaw's SQLite + embeddings memory, retrieve for future turns.
  • Bypass Cloudflare for benign research targets. The Web Unlocker skill handles Turnstile and Challenge pages automatically; the agent only sees a clean Markdown payload.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this post is for demonstration purposes only.


Why Scrapeless

Scrapeless is an anti-detection cloud browser plus a Universal Scraping API plus a SERP API plus an LLM Chat Scraper, all behind one API key. For ZeroClaw specifically, it brings:

  • A native MCP server: no SDK install, no adapter code. The MCP block in ~/.zeroclaw/config.toml is the entire integration.
  • Cloud-side JavaScript rendering so SPAs, infinite-scroll feeds, and lazy-loaded panels are first-class targets for the browser_* tools and scrape_markdown.
  • Residential proxies in 195+ countries so geo-bound queries return the listings a local user would see.
  • Anti-detection fingerprinting on every session: UA, timezone, language, screen resolution, WebGL, and canvas are randomized per session.
  • Automatic CAPTCHA solving for reCAPTCHA, Cloudflare Turnstile, and Cloudflare Challenge through the Web Unlocker surface.
  • A single management surface: one API key, one dashboard, free runtime credits on the new-account plan.

Get the API key on the free plan at app.scrapeless.com. The full MCP tool surface is documented at github.com/scrapeless-ai/scrapeless-mcp-server; the API surface at docs.scrapeless.com.


Prerequisites

  • A UNIX-like host. Linux, macOS, or WSL2 on Windows. ZeroClaw publishes Windows builds, but the install script and skill scripts assume a POSIX shell, so the smoothest path is Linux / macOS / WSL2.
  • Node.js 18 or newer for the MCP stdio transport (npx -y scrapeless-mcp-server).
  • Python 3.10 or newer for the OpenClaw skills (they ship as Python scripts in scripts/).
  • Rust toolchain if installing from source; the prebuilt binary path needs nothing extra.
  • A Scrapeless account and API key: sign up at app.scrapeless.com and copy the key from Settings → API Key Management.
  • An LLM provider key: Anthropic, OpenAI, Ollama, or any OpenAI-compatible endpoint. ZeroClaw's onboarding wizard wires it in.
  • git for cloning the skills repos.
  • jq is optional: handy when piping CLI output, not required for the MCP path.

Install ZeroClaw

The full setup is two sub-steps.

1. Run the installer

```bash
curl -fsSL https://raw.githubusercontent.com/zeroclaw-labs/zeroclaw/master/install.sh | bash
```

The installer asks whether to fetch a prebuilt binary (takes seconds) or build from source (slower, customizable). Both end the same way: zeroclaw onboard kicks off automatically. To skip the wizard at the end, pass --skip-onboard and run zeroclaw onboard later.

Verify the binary is on the path:

```bash
zeroclaw --version
```

The output should look like zeroclaw 0.7.5 or newer.

2. Complete the onboarding wizard

```bash
zeroclaw onboard
```

The wizard walks through provider selection, channel wiring, autonomy mode, and personalization. For this integration, two settings matter:

  • Provider: pick whichever LLM provider is already configured (OpenAI, Anthropic, Ollama, an OpenAI-compatible gateway). Paste the API key when prompted.
  • Autonomy: supervised is the safe default; the agent will prompt before invoking medium-risk tools. The MCP tools count as medium-risk by default. For a development box where prompting is friction, the wizard also exposes yolo mode, which the operator should turn on only on a trusted machine.

Confirm the runtime is up by starting a chat:

```bash
zeroclaw agent
```

A "Hey!" should return a normal completion. If it does, the runtime is healthy and the next step is wiring in the MCP server.


Connect ZeroClaw to the Scrapeless MCP Server

1. Smoke-test the MCP server outside ZeroClaw

Before adding the MCP block to config.toml, confirm the server starts standalone. ZeroClaw lazy-loads MCP servers on agent start, so a broken config surfaces only the first time the agent runs. It is better to catch it now:

```bash
SCRAPELESS_KEY="<YOUR_SCRAPELESS_KEY>" npx -y scrapeless-mcp-server
```

On the first run, npx downloads scrapeless-mcp-server from the registry and the server starts over stdio. The process stays attached; press Ctrl-C to release it. If it printed a startup banner and is waiting for MCP requests, the credentials and the package both work.
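To go one step further than watching the process idle, a short script can speak the MCP handshake over stdio and ask the server for its tool list. The sketch below assumes the server follows the standard MCP stdio transport (one JSON-RPC message per line) and accepts protocol version 2024-11-05; the client name `smoke-test` is made up, and `smoke_test` is defined here but not called, so nothing is spawned until the operator invokes it with a real key.

```python
import json
import os
import subprocess


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message for MCP stdio: a single line of JSON."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # assumed; the server may negotiate another
        "capabilities": {},
        "clientInfo": {"name": "smoke-test", "version": "0.0.1"},  # hypothetical
    },
}
INITIALIZED = {"jsonrpc": "2.0", "method": "notifications/initialized"}
LIST_TOOLS = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}


def read_response(stdout, request_id: int) -> dict:
    """Read lines until the response with `request_id` arrives; skip other output."""
    for raw in stdout:
        try:
            message = json.loads(raw)
        except ValueError:
            continue  # not JSON (for example a stray log line)
        if isinstance(message, dict) and message.get("id") == request_id:
            return message
    raise RuntimeError("server closed stdout before answering")


def smoke_test(api_key: str) -> list[str]:
    """Spawn the server, run the handshake, and return the advertised tool names."""
    env = {**os.environ, "SCRAPELESS_KEY": api_key}
    proc = subprocess.Popen(
        ["npx", "-y", "scrapeless-mcp-server"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, env=env,
    )
    try:
        proc.stdin.write(frame(INITIALIZE))
        proc.stdin.flush()
        read_response(proc.stdout, 1)
        proc.stdin.write(frame(INITIALIZED) + frame(LIST_TOOLS))
        proc.stdin.flush()
        result = read_response(proc.stdout, 2).get("result", {})
        return [tool["name"] for tool in result.get("tools", [])]
    finally:
        proc.terminate()
```

Calling `smoke_test(os.environ["SCRAPELESS_KEY"])` should return names such as google_search and scrape_markdown if the package and key are both working. An empty list or an exception points at the server rather than at ZeroClaw.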

Get your API key on the free plan: app.scrapeless.com

2. Add the MCP block to ~/.zeroclaw/config.toml

ZeroClaw reads MCP server configuration from a [mcp] block in the global config. Add (or merge) the following:

```toml
# ~/.zeroclaw/config.toml

[mcp]
enabled = true
deferred_loading = true
servers = [
  { name = "scrapeless", command = "npx", transport = "stdio", args = ["-y", "scrapeless-mcp-server"], env = { SCRAPELESS_KEY = "<YOUR_SCRAPELESS_KEY>" }, headers = {} }
]
```

Notes:

  • enabled = true activates the MCP subsystem. Recent ZeroClaw builds default it off.

  • deferred_loading = true keeps the daemon startup fast; ZeroClaw spawns npx only when the agent actually starts a session.

  • env.SCRAPELESS_KEY is the auth surface: the same key the smoke test in step 1 used.

  • For the hosted streamable-HTTP transport instead of stdio, swap the entry for:

    ```toml
    { name = "scrapeless", transport = "http", url = "https://api.scrapeless.com/mcp", headers = { "x-api-token" = "<YOUR_SCRAPELESS_KEY>" } }
    ```

    ZeroClaw's MCP client stack supports three transport values (stdio, http, and sse), with validation enforcing command / args for stdio and url / headers for remote transports (per ZeroClaw issue #1380). The HTTP transport is the right default when ZeroClaw runs on a remote host (a VPS or a container) and the operator does not want npx running there.

3. Verify the connection from inside ZeroClaw

Restart the agent session so it picks up the new config and lazy-loads the MCP server:

```bash
zeroclaw agent
```

In a fresh chat, ask:

```
Which Scrapeless MCP tools do you have access to?
```

The agent should enumerate the 20 tools, including those listed earlier: google_search, google_trends, the browser_* set, scrape_html, scrape_markdown, scrape_screenshot. If the answer says zero tools, the most common cause is enabled = false in [mcp]; the second most common is a typo in SCRAPELESS_KEY.


Install the Scrapeless OpenClaw Skills

The MCP server is the tools. The skills are the playbook. Both Scrapeless skills work with ZeroClaw because the runtime supports the OpenClaw skill format directly.

1. Allow skill scripts in ~/.zeroclaw/config.toml

Both Scrapeless skills ship scripts/ directories that the agent executes. Set allow_scripts = true in the [skills] section:

```toml
# ~/.zeroclaw/config.toml

[skills]
allow_scripts = true
```

allow_scripts is off by default for safety. Turning it on grants ZeroClaw permission to run skill-bundled scripts under the autonomy policy already in force; medium-risk script invocations still prompt for approval under supervised mode.

2. Clone the skill repositories

```bash
mkdir -p ~/.zeroclaw/workspace/skills
git clone https://github.com/scrapeless-ai/webunlocker-skill ~/.zeroclaw/workspace/skills/webunlocker-skill
git clone https://github.com/scrapeless-ai/llm-chat-scraper-skill ~/.zeroclaw/workspace/skills/llm-chat-scraper-skill
```

3. Install the Python dependencies and the API token

The Web Unlocker skill ships a requirements.txt:

```bash
cd ~/.zeroclaw/workspace/skills/webunlocker-skill
pip install -r requirements.txt
cp .env.example .env
# Then edit .env and set X_API_TOKEN=<YOUR_SCRAPELESS_KEY>
```

Repeat for the LLM Chat Scraper skill if it is in scope for the agent.
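Editing each .env by hand is fine for one skill; with two skills, or when rotating the key, a small helper avoids typos. The sketch below rewrites (or appends) a single NAME=value line in .env-style text. `set_env_var` is a hypothetical helper, and the assumption that both skills read X_API_TOKEN from a .env file in their own directory comes from the steps above, not from inspecting the repositories.

```python
from pathlib import Path


def set_env_var(env_text: str, name: str, value: str) -> str:
    """Return env_text with NAME=value set, replacing an existing line or appending one."""
    lines = env_text.splitlines()
    for index, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == name:
            lines[index] = f"{name}={value}"
            break
    else:
        # No existing assignment (commented-out lines do not count): append one.
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


SKILLS_DIR = Path.home() / ".zeroclaw" / "workspace" / "skills"


def write_token(skill_name: str, token: str) -> None:
    """Update X_API_TOKEN in the .env file of one skill directory."""
    env_file = SKILLS_DIR / skill_name / ".env"
    current = env_file.read_text() if env_file.exists() else ""
    env_file.write_text(set_env_var(current, "X_API_TOKEN", token))
```

For example, `write_token("webunlocker-skill", token)` followed by `write_token("llm-chat-scraper-skill", token)` keeps both skills on the same key. The MCP entry in config.toml holds its own copy and still has to be updated separately.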

4. Verify the skills are visible to ZeroClaw

```bash
zeroclaw skills list
```

The output should include webunlocker-skill and llm-chat-scraper-skill. If they are missing, the most common cause is that the clone landed under ~/.zeroclaw/skills/ instead of ~/.zeroclaw/workspace/skills/; the latter is the path the runtime watches.
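The misplaced-clone case is easy to detect from outside ZeroClaw. The sketch below lists skill folders in the watched directory and in the commonly mistaken one. It assumes each skill keeps its SKILL.md at the top level of its folder, which matches the OpenClaw layout described here but is not verified against the runtime; `find_skills` is a hypothetical helper.

```python
from pathlib import Path


def find_skills(zeroclaw_home: Path) -> dict[str, list[str]]:
    """Report skill folders in the watched path and in the commonly mistaken path."""

    def skills_in(root: Path) -> list[str]:
        if not root.is_dir():
            return []
        # A folder counts as a skill if it has SKILL.md at its top level (assumption).
        return sorted(p.name for p in root.iterdir() if (p / "SKILL.md").is_file())

    return {
        "visible": skills_in(zeroclaw_home / "workspace" / "skills"),
        "misplaced": skills_in(zeroclaw_home / "skills"),
    }


report = find_skills(Path.home() / ".zeroclaw")
print("visible to the runtime:", report["visible"])
print("cloned into the wrong place:", report["misplaced"])
```

Anything listed as misplaced can be moved into ~/.zeroclaw/workspace/skills/ and should then appear in zeroclaw skills list on the next run.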


ZeroClaw + Scrapeless in Action

A realistic worked example: a daily competitive-intelligence brief on a topic the operator tracks. The agent locates fresh sources, extracts the content, and produces a structured summary, delivered to whichever channel the agent is bound to.

In zeroclaw agent, paste:

```
Build me a competitive-intelligence brief on "AI agent frameworks" for the last 7 days.

1. Use the Scrapeless MCP `google_search` tool to find the 5 most relevant news / blog
   posts published this week. Use gl=us, hl=en.
2. For each result URL, use `scrape_markdown` to pull the article body. Discard
   navigation chrome and ads.
3. Use `google_trends` to fetch the 7-day interest curve for the query
   "AI agent frameworks" so I have the demand signal alongside the supply signal.
4. Produce a structured Markdown report with:
   - Top 3 themes across the 5 articles, each with a one-sentence summary and the
     source URL.
   - The 7-day trend direction (up / flat / down) and the peak day.
   - A "what changed this week" callout: anything new vs. last week's brief.

If a target page blocks the cloud browser, fall back to `browser_create` +
`browser_goto` + `browser_get_text` for that URL only. Don't substitute synthetic
content; if a source can't be retrieved, list it under "unretrieved sources".
```

The agent's plan, in plain English:

  1. Call google_search(q="AI agent frameworks", gl="us", hl="en") and pick the five freshest results that look like primary sources (skip aggregator pages).
  2. Iterate the URLs through scrape_markdown and keep the cleaned body text in working memory.
  3. Call google_trends(q="AI agent frameworks", date="now 7-d") for the interest curve.
  4. Summarize into a Markdown brief.
  5. For any URL that returns an anti-bot interstitial through scrape_markdown, retry through the browser_create → browser_goto → browser_get_text chain, which warms a cloud browser session and waits for hydration before extracting.

Before each tool call, ZeroClaw's supervised autonomy mode prompts for approval: Y for one-shot approval, A to remember the permission for future tool calls in the same session.

To send the prompt without entering the interactive chat:

```bash
zeroclaw agent --message "Build me a competitive-intelligence brief on AI agent frameworks for the last 7 days..."
```

To turn this into a scheduled run instead of an ad-hoc prompt, register an SOP on a cron schedule and bind it to whichever channel adapter the agent should deliver the brief through (Discord, Telegram, email). The MCP tools and the skill stay the same; only the trigger changes.


What You Get Back

The brief comes back as a Markdown payload along the lines of the following, captured from an actual run of the prompt above against five live SERP results for "AI agent frameworks 2026":

```markdown
# AI Agent Frameworks: Weekly Brief (week of 12-May-2026)

## Themes (last 7 days)
1. **LangGraph is the consensus production standard.** All three deep
   comparisons published this week (Towards AI, GuruSup, Alice Labs) rank
   LangGraph #1 for production workloads. The cited reasons converge:
   deterministic graph execution, native human-in-the-loop checkpoints,
   and first-class observability through LangSmith.
   Source: https://pub.towardsai.net/top-ai-agent-frameworks-in-2026-a-production-ready-comparison-7ba5e39ad56d
2. **MCP is emerging as the cross-framework tool-integration standard.**
   Anthropic's Model Context Protocol (now governed by the Linux Foundation
   with OpenAI, Google, Microsoft, AWS, and Salesforce on the supporter list)
   is referenced as the agent-to-tool standard in two of the three comparisons.
   Source: https://gurusup.com/blog/best-multi-agent-frameworks-2026
3. **The AutoGen / AG2 split is the major 2025–2026 development.** Microsoft
   rewrote AutoGen as v0.4+ with a new API; the community continued the v0.2
   lineage as AG2 (ag2.ai). Both Alice Labs and GuruSup flag this as a "pick
   deliberately" moment for teams evaluating multi-agent debate frameworks.
   Source: https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026

## Demand signal
- 7-day trend: unavailable (google_trends returned a transient upstream error
  on this run; retry on next schedule)

## What changed this week
- Alice Labs added Claude Agent SDK as a new entrant at #2, displacing CrewAI
  to #3: first ranking we've seen elevate Anthropic's official SDK above
  the multi-agent generalists.
- AutoGen / AG2 fork status referenced in 2 of 3 articles, up from 0 last week.

## Unretrieved sources
- (none: alicelabs.ai SPA required the browser_* fallback path; recovered)
```

The structure follows the prompt; the values are what the verified tool chain actually returned on the day the brief ran. A few honest observations grounded in the live run:

  • scrape_markdown cleans most publisher pages well. Towards AI and GuruSup returned clean Markdown bodies on the first attempt. Heavily JS-rendered SPAs (alicelabs.ai is a Webflow / Vite SPA in this run) returned the rendered HTML shell instead; the agent recovered through the browser_create → browser_goto → browser_get_text chain, which returned a fully structured page snapshot including the ranked list, key takeaways, FAQ, and the May-2026 update timestamp.
  • google_trends is interest, not volume, and it sometimes fails transiently. On the verification run the upstream Trends call returned a load failed error; the prompt handles this by reporting the gap rather than substituting synthetic data. The right retry posture is the next scheduled run, not a hot retry inside the same agent turn.
  • Per-source freshness varies. Some publishers backfill timestamps when they update articles; if "freshness" matters absolutely, cross-check the published date in the article body, not the SERP snippet. (The Alice Labs page in this run shows both an April-2026 publish date and a May-2026 update date in the body.)
  • Anti-bot interstitials and SPA shells are normal, not exceptions. Budget for the browser_* fallback in any prompt that touches commercial sites at scale; the verification run hit one in three URLs and the recovery was uneventful.
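When google_trends does return a curve, the "trend direction and peak day" line in the brief is simple arithmetic. The sketch below compares the average of the early days with the average of the late days and reports "unavailable" when there is no data, mirroring how the brief handled the failed call. The 10% tolerance, the half-versus-half comparison, and the sample values are all assumptions chosen for illustration; they are not output from the run described above.

```python
from datetime import date, timedelta


def summarize_trend(points: list[tuple[date, float]], tolerance: float = 0.10) -> dict:
    """Classify a short interest series as up / flat / down and find its peak day."""
    if len(points) < 2:
        return {"direction": "unavailable", "peak_day": None}
    values = [value for _, value in points]
    half = len(values) // 2
    early = sum(values[:half]) / half   # mean of the first half
    late = sum(values[-half:]) / half   # mean of the last half (middle day skipped if odd)
    if early == 0:
        direction = "up" if late > 0 else "flat"
    else:
        change = (late - early) / early
        direction = "up" if change > tolerance else "down" if change < -tolerance else "flat"
    peak_day = max(points, key=lambda point: point[1])[0]
    return {"direction": direction, "peak_day": peak_day}


# Made-up sample values, not real Google Trends data.
start = date(2026, 5, 6)
sample = [(start + timedelta(days=i), v) for i, v in enumerate([40, 42, 41, 50, 60, 66, 63])]

print(summarize_trend(sample))
print(summarize_trend([]))  # no data: report the gap instead of guessing
```

Google Trends values are relative interest on a 0 to 100 scale, so the comparison says how the curve moved within the window and nothing about absolute search volume.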

Conclusion: an agent that reads the live web

The ZeroClaw + Scrapeless integration reduces to four moves the operator runs once: install ZeroClaw, register the Scrapeless MCP server in ~/.zeroclaw/config.toml, drop the OpenClaw skills into ~/.zeroclaw/workspace/skills/, and verify with zeroclaw skills list and a tool-listing prompt in zeroclaw agent. After that, every agent turn that touches the web (research, monitoring, lead generation, RAG ingestion, AI-search visibility tracking) goes through the cloud browser, the residential proxies, and the SERP API behind one API key.

For the same Scrapeless primitive in other clients, the MCP server tutorial covers Claude Desktop / Cursor / Codex CLI, the Hermes integration post covers direct-CDP, and the LangChain integration post covers Python agents. The pattern across all of them is the same: pin a residential region, keep the session warm across multi-step flows, treat anti-bot interstitials as a retry case rather than an exception, and let the agent compose google_search → scrape_markdown → browser_* into whatever the prompt actually asks for.


Ready to Build Your AI-Powered Data Pipeline?

Join our community to claim a free plan and connect with developers building local-agent pipelines on top of Scrapeless: Discord · Telegram.

Sign up at app.scrapeless.com for free MCP runtime and adapt the patterns above to whichever workflows the ZeroClaw agent already runs.


FAQ

Q1. Does the Scrapeless MCP server work on Windows, or only Linux / macOS?
The MCP server is a Node.js package; it runs anywhere Node 18+ runs, including Windows. ZeroClaw's installer assumes a POSIX shell, so the smoothest path on Windows is WSL2. The HTTP-transport variant (pointing ZeroClaw at https://api.scrapeless.com/mcp) removes the local npx dependency entirely and is the easiest fit for hosted ZeroClaw deployments.

Q2. Stdio or streamable HTTP: which transport is the right default?
For a workstation running ZeroClaw locally, stdio. The lifecycle is simple: ZeroClaw spawns npx -y scrapeless-mcp-server on agent start, kills it on agent stop. For ZeroClaw on a VPS or in a container, HTTP. The Scrapeless-hosted endpoint removes the need to package npx and Node into the runtime image.

Q3. Is scraping public web data legal?
Generally yes, when the data is publicly visible and the workflow respects each site's terms of service and applicable jurisdictions. The legal posture varies by country, by site, and by use case (research, commercial resale, training data). Review the target site's ToS before scaling a workflow against it, and consult counsel for high-volume or regulated use cases.

Q4. Do the MCP server and the OpenClaw skills overlap?
They are complementary. The MCP server gives the agent tools: concrete, callable surfaces (google_search, scrape_markdown, browser_*). The skills give the agent knowledge: how the Scrapeless Universal Scraping API behaves, when to fall back to JS rendering, which response type to request, how to chain CAPTCHA solving with country selection. Installed together, the agent has both.

Q5. What happens when a target page returns an anti-bot interstitial?
For scrape_markdown against most pages, the cloud browser solves the challenge transparently. For pages that still return an interstitial, the standard fallback is browser_create → browser_goto → browser_wait_for (a known post-challenge selector) → browser_get_text. Budget for this fallback in any prompt that touches commercial sites; the prompt example above shows the shape.

Q6. How does ZeroClaw's autonomy mode interact with MCP tool calls?
Under supervised (the default), the agent prompts before invoking each MCP tool the first time. The operator can grant one-shot approval (Y) or remember-this-tool approval (A). Under yolo, the agent invokes tools without prompting; that mode is appropriate only on a trusted dev box.

Q7. Can the agent compose Scrapeless calls into multi-step flows in a single turn?
Yes, that is the design point. A single agent turn typically chains google_search (locate), scrape_markdown (extract from the canonical URL), and browser_* (fall back for interactive or anti-bot-protected pages). ZeroClaw streams the intermediate tool calls into the same conversation context.

Q8. Where does the Scrapeless API key live?
For the MCP path, in env.SCRAPELESS_KEY inside ~/.zeroclaw/config.toml (or in the streamable-HTTP x-api-token header). For the skill path, in the .env file inside each skill directory as X_API_TOKEN. The two paths are independent; rotating the key means updating both locations.

Q9. Can a ZeroClaw SOP fire the same prompt on a schedule?
Yes. Register an SOP with a cron trigger that runs the same prompt the operator would paste into zeroclaw agent --message "...". Bind the SOP to a channel adapter (Discord, Telegram, email) and the brief is delivered automatically. SOPs in supervised mode still gate medium-risk tool calls behind approval; for unattended scheduled runs, the SOP needs to be configured under a more permissive autonomy mode or with pre-granted tool permissions.

Q10. What about Scrapeless's other products (Scraping Browser, Universal Scraping API, SERP API)?
The MCP server bundles the most common cloud-browser, SERP, and scrape primitives into one MCP surface. For workflows that need the full Scraping Browser primitive set directly (CDP, custom fingerprints, session persistence at session_ttl granularity), wire the Scraping Browser CDP endpoint into ZeroClaw's built-in browser tool instead. The two approaches compose; they do not conflict.


At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
