How to Connect Pi Agent to the Web: Scrapeless MCP Server Integration Guide

Alex Johnson

Senior Web Scraping Engineer

18-May-2026

Key Takeaways:

  • Pi Agent is a deliberately minimal terminal-based coding agent. It ships with four tools (read, write, edit, bash) and adds everything else through opt-in extensions, which keeps the context window small but means web access is not built in.
  • pi-mcp-adapter bridges Pi to any MCP server through a single ~200-token proxy tool. Servers are lazy by default, starting only when the agent calls one of their tools, and their tool definitions stay out of the prompt, so connecting several MCP servers does not blow the context budget.
  • The Scrapeless MCP Server exposes 21 tools over stdio and over a streamable HTTP endpoint at https://api.scrapeless.com/mcp. Coverage spans google_search, google_trends, the full 16-tool browser_* surface (a cloud-hosted anti-detection browser with residential proxies in 195+ countries), and three stateless scraping tools (scrape_html, scrape_markdown, scrape_screenshot).
  • One .mcp.json file wires Pi to Scrapeless. The adapter reads the standard MCP config format that Claude Desktop, Cursor, and other MCP clients already use, so the same JSON snippet drops into any of them.
  • The end-to-end pattern: prompt → Pi → pi-mcp-adapter → scrapeless-mcp-server → cloud browser → ranked results or extracted Markdown → code generated against live data. Pi stops generating from training-only knowledge and starts grounding output in the page it just scraped.
  • Free to start. New Scrapeless accounts include free Scraping Browser runtime; sign up at app.scrapeless.com.

Introduction: a minimal coding agent with live web tools

Most terminal coding agents ship with dozens of features that you may never use. Pi Agent takes the opposite stance: four tools, full transparency, everything else added on demand. That keeps the agent fast and the context window cheap, but it also means Pi cannot fetch the latest documentation for a library, read a release page that was published last week, or pull live data from a public web page on its own.

The fix is to connect Pi to a Model Context Protocol (MCP) server that exposes web tools. The scrapeless-mcp-server package does exactly that, backed by the Scrapeless Scraping Browser, a cloud-hosted anti-detection browser that egresses through residential proxies in 195+ countries.

This post walks through wiring the two together with pi-mcp-adapter (the community MCP extension for Pi) and a single .mcp.json file. The endpoint Pi connects to is the same one Claude Desktop, Cursor, and other MCP clients use; the same JSON snippet works across all of them.


What You Can Do With It

  • Ground code generation in live documentation. Have Pi fetch a library's current README before generating an example, so it stops reproducing outdated APIs from pre-cutoff training data.
  • Search and scrape in one turn. Pi calls google_search to rank candidate pages, then scrape_markdown to pull the most relevant one as clean Markdown.
  • Drive an anti-detection cloud browser from a terminal prompt. Tools like browser_goto, browser_click, browser_type, and browser_get_html give Pi full control over a real cloud Chromium with session persistence.
  • Pull region-specific data through residential proxies. Scope a search to gl=us or gl=de directly from the agent's tool call.
  • Stay inside one terminal session. No browser tab switching, no copy-paste of curl output, no separate scraping CLI to babysit.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this post is for demonstration purposes only.


What is Pi Agent?

Pi Agent is a terminal-based coding agent built by Mario Zechner (the creator of the libGDX game framework). Out of the box it gives the model four tools:

  • read: read files and directories
  • write: create and edit files
  • edit: make targeted edits to existing files
  • bash: execute shell commands

Everything else is opt-in. You extend Pi through TypeScript extensions, skills, and prompt templates bundled as packages and installable via npm or git. The agent runs in four modes: interactive (conversational), print/JSON (scripted), RPC (stdin/stdout integration), and SDK (embedded). Provider support is broad: Anthropic, OpenAI, Google, Mistral, Groq, and more.

Install it globally with npm:

npm install -g @mariozechner/pi-coding-agent

The trade-off is explicit: a small starting surface that you grow to fit your workflow, instead of a one-size-fits-all toolbox.


Why Pi Needs Web Access

Pi's reasoning is bound by what the underlying model was trained on. That means it cannot:

  • Look up a library's latest API changes after the training cutoff
  • Fetch current documentation pages, changelogs, or release notes
  • Read a public page to pull a configuration table, a price, or a schema
  • Verify that an example in its training data still compiles against today's package version

For fast-moving ecosystems (anything web, frontend, AI tooling, infrastructure) this matters. Connect Pi to a web-capable MCP server and you get a coding agent that searches for current information and scrapes the exact pages it needs before generating code. The output stops being a best guess from stale memory and starts being grounded in the page that just rendered.


What is the Scrapeless MCP Server?

The Scrapeless MCP Server is a Model Context Protocol server that exposes the Scrapeless cloud browser, search, and scrape APIs as MCP tools. Any MCP-compatible client (Pi, Claude Desktop, Cursor, Codex CLI, Gemini CLI, Windsurf, VS Code Copilot Chat) can call them directly from a conversation.

At publication, the server exposes 21 tools across three categories:

  • Search and trends: google_search, google_trends
  • Browser automation (16 tools): browser_create, browser_close, browser_goto, browser_go_back, browser_go_forward, browser_click, browser_type, browser_press_key, browser_wait, browser_wait_for, browser_screenshot, browser_snapshot, browser_get_html, browser_get_text, browser_scroll, browser_scroll_to
  • Stateless scraping: scrape_html, scrape_markdown, scrape_screenshot
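The three categories above can be checked mechanically. The sketch below copies the tool names from the list and buckets them by name prefix; the prefix rule is an assumption of this example, not something the server itself documents.

```javascript
// Tool names copied from the list above.
const TOOL_NAMES = [
  "google_search", "google_trends",
  "browser_create", "browser_close", "browser_goto", "browser_go_back",
  "browser_go_forward", "browser_click", "browser_type", "browser_press_key",
  "browser_wait", "browser_wait_for", "browser_screenshot", "browser_snapshot",
  "browser_get_html", "browser_get_text", "browser_scroll", "browser_scroll_to",
  "scrape_html", "scrape_markdown", "scrape_screenshot",
];

// Assumption: the category can be inferred from the name prefix.
function categoryOf(name) {
  if (name.startsWith("browser_")) return "browser";
  if (name.startsWith("scrape_")) return "scrape";
  return "search";
}

// Group tool names into { search, browser, scrape } buckets.
function groupTools(names) {
  const groups = { search: [], browser: [], scrape: [] };
  for (const name of names) groups[categoryOf(name)].push(name);
  return groups;
}

const groups = groupTools(TOOL_NAMES);
const counts = Object.fromEntries(
  Object.entries(groups).map(([category, tools]) => [category, tools.length])
);
console.log(counts); // { search: 2, browser: 16, scrape: 3 }
```

A check like this is handy after a server upgrade: if the total drifts from 21, the tool catalogue has changed and any directTools list may need a second look.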

Two transport modes are supported:

  • Stdio: npx -y scrapeless-mcp-server runs the server as a child process of the MCP client. This is the right default for desktop and terminal agents like Pi.
  • Streamable HTTP: point the client at https://api.scrapeless.com/mcp. This is the right default for cloud-hosted agents that cannot shell out to npx.

Both modes are backed by the same Scrapeless API key. The server source lives at github.com/scrapeless-ai/scrapeless-mcp-server; full tool reference at docs.scrapeless.com. Get your API key on the free plan at app.scrapeless.com.


The MCP Bridge: pi-mcp-adapter

Pi does not ship with MCP support out of the box. That is a deliberate choice: Mario argues that MCP tool definitions are too token-heavy for a minimal agent. Popular MCP servers like Playwright MCP expose 21 tools and consume around 13.7k tokens; Chrome DevTools MCP exposes 26 tools and consumes around 18k tokens. Connecting a few servers can burn through a significant portion of a context window before the conversation starts.

The community response is pi-mcp-adapter: a Pi extension that exposes a single proxy tool (~200 tokens) instead of loading every MCP tool definition upfront. The agent searches and calls individual tools on demand:

mcp({ search: "screenshot" })
mcp({ tool: "scrapeless_scrape_markdown", args: '{"url": "https://example.com"}' })
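To put the token figures in this section side by side, here is a small arithmetic sketch. The per-server numbers and the ~200-token proxy size come from the text; the 200,000-token context window is an assumption, so substitute your model's actual limit.

```javascript
// Upfront tool-definition cost quoted in this section.
const UPFRONT_TOKENS = { "Playwright MCP": 13700, "Chrome DevTools MCP": 18000 };
// Approximate size of the single pi-mcp-adapter proxy tool, per this guide.
const PROXY_TOOL_TOKENS = 200;
// Assumption: a 200k-token context window. Adjust for your model.
const CONTEXT_WINDOW = 200000;

// Percentage of the context window consumed by a given token count.
function shareOfContext(tokens, windowSize) {
  return (100 * tokens) / windowSize;
}

const direct = Object.values(UPFRONT_TOKENS).reduce((sum, t) => sum + t, 0);

console.log(`direct:  ${direct} tokens, ${shareOfContext(direct, CONTEXT_WINDOW).toFixed(2)}% of the window`);
console.log(`proxied: ${PROXY_TOOL_TOKENS} tokens, ${shareOfContext(PROXY_TOOL_TOKENS, CONTEXT_WINDOW).toFixed(2)}% of the window`);
```

Under that assumption, loading both servers directly costs 31,700 tokens before the first prompt, while the proxy tool stays at roughly 200 no matter how many servers are configured.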

Servers are lazy by default: they start only when the agent first calls one of their tools, and they disconnect after 10 idle minutes (configurable). Tool metadata is cached to disk so search and describe work without live connections.
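The lazy lifecycle is easier to reason about as a tiny state machine. The class below is an illustrative model only, not pi-mcp-adapter's implementation; the class name, its fields, and the injectable clock are all hypothetical.

```javascript
// 10 idle minutes, matching the default described above.
const IDLE_TIMEOUT_MS = 10 * 60 * 1000;

// Hypothetical model of a lazily started MCP server.
class LazyServer {
  constructor(name, now = () => Date.now()) {
    this.name = name;
    this.now = now;          // injectable clock so the behaviour is testable
    this.connected = false;  // nothing is spawned at construction time
    this.lastUsed = null;
    this.starts = 0;         // how many times the server was started
  }

  // Disconnect if the server has been idle for the timeout or longer.
  reapIfIdle() {
    if (this.connected && this.now() - this.lastUsed >= IDLE_TIMEOUT_MS) {
      this.connected = false;
    }
  }

  // Connect on demand, then record when the server was last used.
  call(tool) {
    this.reapIfIdle();
    if (!this.connected) {
      this.connected = true;
      this.starts += 1;
    }
    this.lastUsed = this.now();
    return `${this.name}:${tool}`;
  }
}

let clock = 0;
const server = new LazyServer("scrapeless", () => clock);
console.log(server.connected);                 // false: configured but never called
server.call("scrape_markdown");                // first call triggers the start
clock += 11 * 60 * 1000;                       // 11 idle minutes pass
server.reapIfIdle();
console.log(server.connected, server.starts);  // false 1
server.call("google_search");                  // next call starts it again
console.log(server.starts);                    // 2
```

The point of the model is that a configured server costs nothing until its first call, and an idle one releases its process without any action from the user.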

The adapter reads the standard MCP config files in this precedence order:

  1. ~/.config/mcp/mcp.json (user-global shared)
  2. <Pi agent dir>/mcp.json (Pi global override, typically ~/.pi/agent/mcp.json)
  3. .mcp.json (project-local shared)
  4. .pi/mcp.json (Pi project override)
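One plausible reading of that order is that files later in the list win when the same server name appears twice. The sketch below models that reading with plain objects. Both the override direction and the whole-entry replacement (rather than a deep merge) are assumptions here, so check the adapter's README before relying on them.

```javascript
// Assumption: later layers override earlier ones, one server entry at a time.
// Missing files are represented as null or undefined and skipped.
function mergeMcpConfigs(layers) {
  const merged = { mcpServers: {} };
  for (const layer of layers) {
    Object.assign(merged.mcpServers, layer?.mcpServers ?? {});
  }
  return merged;
}

// Hypothetical contents of ~/.config/mcp/mcp.json
const userGlobal = {
  mcpServers: {
    scrapeless: { command: "npx", args: ["-y", "scrapeless-mcp-server"] },
  },
};

// Hypothetical contents of a project-local .mcp.json
const projectLocal = {
  mcpServers: {
    scrapeless: { url: "https://api.scrapeless.com/mcp" },
  },
};

// Order mirrors the list above; the two Pi-specific files are absent here.
const effective = mergeMcpConfigs([userGlobal, null, projectLocal, undefined]);
console.log(effective.mcpServers.scrapeless); // { url: 'https://api.scrapeless.com/mcp' }
```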

Install it with one command (run inside Pi or via the Pi CLI):

pi install npm:pi-mcp-adapter

The adapter version at publication is 2.6.1.


How to Connect Pi Agent to the Scrapeless MCP Server

Prerequisites

  • Node.js 18 or newer. Pi Agent and pi-mcp-adapter both require it; Pi's Gemini CLI support needs Node.js 20 or newer.
  • A Scrapeless account and API key. Sign up at app.scrapeless.com. New accounts include free Scraping Browser runtime.
  • An API key from a model provider Pi supports β€” Anthropic, OpenAI, Google Gemini, Mistral, DeepSeek, Groq, or any of the others Pi lists at /login.

Step 1: Install Pi Agent

Open your terminal and run:

npm install -g @mariozechner/pi-coding-agent

Verify the install:

pi --version

The Pi binary at publication is @mariozechner/pi-coding-agent version 0.73.1.

Step 2: Install pi-mcp-adapter

With Pi installed, add the MCP adapter extension:

pi install npm:pi-mcp-adapter

Restart Pi after installation. The adapter pulls in a single mcp proxy tool that costs around 200 tokens, plus the /mcp slash command for interactive server management.

Step 3: Get your Scrapeless API key

Log in to app.scrapeless.com, open Settings → API Keys, and copy your key. Keep it on the clipboard for the next step.

Step 4: Configure .mcp.json

In your project folder, create a file named .mcp.json. This is the standard MCP config file format that pi-mcp-adapter reads at startup (no Pi-specific syntax required):

{
  "mcpServers": {
    "scrapeless": {
      "command": "npx",
      "args": ["-y", "scrapeless-mcp-server"],
      "env": {
        "SCRAPELESS_KEY": "YOUR_SCRAPELESS_KEY"
      }
    }
  }
}

Replace YOUR_SCRAPELESS_KEY with the key from Step 3. The MCP server reads the API key from the SCRAPELESS_KEY environment variable. That name is the source of truth; do not change it to SCRAPELESS_API_KEY.

On first run, npx -y scrapeless-mcp-server downloads the package and starts the server over stdio. No separate install command is needed.
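Because the env var name is easy to get wrong, a quick sanity check on the parsed config can save a confusing first run. The helper below is a hypothetical sketch for the stdio form shown above; the function name and messages are made up for illustration.

```javascript
// Returns a list of problems found in the stdio "scrapeless" entry.
// An empty list means the entry looks usable.
function checkScrapelessEnv(config) {
  const entry = config?.mcpServers?.scrapeless;
  if (!entry) return ["no 'scrapeless' entry under mcpServers"];

  const env = entry.env ?? {};
  const problems = [];
  if ("SCRAPELESS_API_KEY" in env) {
    problems.push("rename SCRAPELESS_API_KEY to SCRAPELESS_KEY");
  }
  if (!env.SCRAPELESS_KEY) {
    problems.push("SCRAPELESS_KEY is missing or empty");
  } else if (env.SCRAPELESS_KEY === "YOUR_SCRAPELESS_KEY") {
    problems.push("placeholder key was not replaced");
  }
  return problems;
}

// Example: a config that used the wrong variable name.
const wrongName = {
  mcpServers: {
    scrapeless: {
      command: "npx",
      args: ["-y", "scrapeless-mcp-server"],
      env: { SCRAPELESS_API_KEY: "sk-example" },
    },
  },
};
console.log(checkScrapelessEnv(wrongName));
```

In a real project you would feed it `JSON.parse(fs.readFileSync(".mcp.json", "utf8"))` instead of an inline object.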

If you would rather skip stdio and use the streamable HTTP transport, swap the entry to:

{
  "mcpServers": {
    "scrapeless": {
      "url": "https://api.scrapeless.com/mcp",
      "headers": {
        "x-api-token": "YOUR_SCRAPELESS_KEY"
      }
    }
  }
}

Both forms use the same Scrapeless API key and surface the same 21 tools. Stdio is the right default for a workstation; HTTP is the right default if Pi is running on a cloud host that cannot shell out to npx.
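If you generate .mcp.json from a script (for example in a project bootstrap step), the two forms reduce to one small function. This is a sketch; the function name and the transport labels "stdio" and "http" are hypothetical, while the field names and values mirror the two snippets above.

```javascript
// Build the "scrapeless" entry for either transport shown above.
function scrapelessEntry(transport, apiKey) {
  if (!apiKey) throw new Error("apiKey is required");
  switch (transport) {
    case "stdio":
      return {
        command: "npx",
        args: ["-y", "scrapeless-mcp-server"],
        env: { SCRAPELESS_KEY: apiKey },
      };
    case "http":
      return {
        url: "https://api.scrapeless.com/mcp",
        headers: { "x-api-token": apiKey },
      };
    default:
      throw new Error(`unknown transport: ${transport}`);
  }
}

// Fall back to the placeholder when the env var is unset or empty.
const apiKey = process.env.SCRAPELESS_KEY || "YOUR_SCRAPELESS_KEY";
const config = { mcpServers: { scrapeless: scrapelessEntry("stdio", apiKey) } };

// Print the JSON that would be written to .mcp.json.
console.log(JSON.stringify(config, null, 2));
```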

Step 5: Connect to a model provider

Start Pi:

pi

You should see pi-mcp-adapter listed under Extensions. Type /login and choose your authentication method (subscription or API key). Pick the provider you want to use, paste the API key, and Pi will save the credential for future sessions. Type /model to open the model selection panel and pick a model.


Step 6: Verify the connection

Type /mcp to open the MCP panel. The scrapeless server is listed but lazy: at first it may show 0/21 because the connection has not been opened yet. Highlight the row with the arrow keys and press Ctrl+R to reconnect (or call any Scrapeless tool, which triggers a lazy connect).

Once connected, the bottom of the terminal shows MCP: 1/1 servers. The 21 tools are now discoverable. To confirm by listing them:

mcp({ search: "scrapeless" })

You should see the google_search, google_trends, browser_*, and scrape_* tools in the result. Press Esc to close the panel.

Step 7: Run a real task

Give Pi a prompt that needs live web data. For example:

Search the web for the official axios npm documentation, scrape the most relevant page,
and generate a working JavaScript example that makes a GET request with proper error
handling. Save it as axios-example.js.

Pi calls scrapeless_google_search first, returning a ranked list of results with titles, URLs, and snippets from the official axios docs. It then picks the most relevant URL and calls scrapeless_scrape_markdown to pull the page as clean Markdown. The cloud browser handles JavaScript rendering and any anti-detection challenges on the way in, and Pi receives extracted content rather than raw HTML.
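The search-then-scrape sequence can be sketched as two proxy calls with a URL choice in between. The tool names and the JSON-string form of args follow the adapter examples in this guide. The result shape ({ title, link }), the sample results, and the pickUrl helper are hypothetical, since the real search payload is not reproduced here.

```javascript
// Build the argument object for the adapter's `mcp` proxy tool.
// The adapter examples in this guide pass `args` as a JSON string.
function proxyCall(tool, args) {
  return { tool, args: JSON.stringify(args) };
}

// Hypothetical result shape: [{ title, link }] in rank order.
// Prefer a result hosted on the given domain, else fall back to the top hit.
function pickUrl(results, preferredHost) {
  const match = results.find((r) => new URL(r.link).hostname.endsWith(preferredHost));
  return (match ?? results[0])?.link ?? null;
}

// Step 1: search, scoped to a region and language.
const search = proxyCall("scrapeless_google_search", {
  q: "axios documentation",
  gl: "us",
  hl: "en",
});

// Illustrative sample data standing in for the search response.
const sampleResults = [
  { title: "axios - npm", link: "https://www.npmjs.com/package/axios" },
  { title: "Getting Started | Axios Docs", link: "https://axios-http.com/docs/intro" },
];

// Step 2: scrape the chosen page as Markdown.
const scrape = proxyCall("scrapeless_scrape_markdown", {
  url: pickUrl(sampleResults, "axios-http.com"),
});

console.log(search);
console.log(scrape);
```

In practice the model makes the URL choice itself; a helper like pickUrl only matters when you script the same flow without an agent in the loop.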

With the documentation in context, Pi generates axios-example.js against the version of the API it just read. If a transient os error 10054 or HTTP 503 surfaces, retry the call; the cloud browser fleet recycles sessions and a re-issue typically succeeds.
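When the same tools are called from a script rather than from Pi, the retry advice can be encoded as a small wrapper. This is a generic retry-with-backoff sketch, not part of Pi or the Scrapeless server; the error-message patterns are assumptions about how these failures are worded, and ERR_TUNNEL_CONNECTION_FAILED is taken from the FAQ at the end of this post.

```javascript
// Assumption: transient failures can be recognised from the error message.
const TRANSIENT = [/os error 10054/i, /\b503\b/, /ERR_TUNNEL_CONNECTION_FAILED/];

function isTransient(error) {
  const message = String(error?.message ?? error);
  return TRANSIENT.some((pattern) => pattern.test(message));
}

// Retry `fn` on transient errors with exponential backoff.
// Non-transient errors and the final failed attempt are rethrown.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= attempts || !isTransient(error)) throw error;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage with a fake tool call that fails once, then succeeds.
let calls = 0;
const flakyScrape = async () => {
  calls += 1;
  if (calls === 1) throw new Error("HTTP 503 Service Unavailable");
  return "# Page as Markdown";
};

withRetry(flakyScrape, { baseDelayMs: 10 }).then((text) => {
  console.log(`succeeded after ${calls} calls:`, text);
});
```

Authentication errors and bad arguments are deliberately not retried, since repeating them only burns quota.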

Step 8: Explore the output

Pi writes axios-example.js to your project folder. The file contains an example that mirrors the patterns it just scraped (async/await, status-code branching, and error inspection consistent with what the current axios docs recommend):

const axios = require('axios');

// Fetch a sample post and report success or the kind of failure.
async function fetchPost() {
  try {
    const response = await axios.get('https://jsonplaceholder.typicode.com/posts/1');
    console.log('Status:', response.status);
    console.log('Title:', response.data.title);
    console.log('Body:', response.data.body);
  } catch (error) {
    if (error.response) {
      // The server responded with a non-2xx status code.
      console.error('Status:', error.response.status);
      console.error('Data:', error.response.data);
    } else if (error.request) {
      // The request was sent but no response arrived.
      console.error('No response received from server');
    } else {
      // Something failed while setting up the request.
      console.error('Request setup error:', error.message);
    }
  }
}

fetchPost();

Run it:

npm install axios
node axios-example.js

What You Get Back

A representative tools/list response from the Scrapeless MCP server has this shape (schema reflects the live server at publication; field values are illustrative samples):

{
  "tools": [
    {
      "name": "google_search",
      "description": "Universal information search engine",
      "inputSchema": {
        "type": "object",
        "properties": {
          "q":  { "type": "string", "default": "Top news headlines" },
          "gl": { "type": "string", "default": "us" },
          "hl": { "type": "string", "default": "en" }
        }
      }
    },
    {
      "name": "scrape_markdown",
      "description": "Scrape a URL and return its content as Markdown",
      "inputSchema": {
        "type": "object",
        "properties": { "url": { "type": "string", "format": "uri" } },
        "required": ["url"]
      }
    },
    { "name": "browser_create",   "description": "Create a new cloud browser session" },
    { "name": "browser_goto",     "description": "Navigate to a URL in an existing session" },
    { "name": "browser_get_html", "description": "Return the rendered HTML of the active page" }
    // ...16 more
  ]
}
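A tools/list response of that shape is enough to validate arguments before sending a call. The sketch below uses a trimmed copy of the sample above; the missingArgs helper is hypothetical and only checks the schema's required list.

```javascript
// Trimmed copy of the illustrative response shape shown above.
const response = {
  tools: [
    {
      name: "google_search",
      inputSchema: {
        type: "object",
        properties: { q: { type: "string" }, gl: { type: "string", default: "us" } },
      },
    },
    {
      name: "scrape_markdown",
      inputSchema: {
        type: "object",
        properties: { url: { type: "string", format: "uri" } },
        required: ["url"],
      },
    },
    { name: "browser_create" },
  ],
};

// Return the names of required arguments that `args` does not provide.
function missingArgs(tools, toolName, args) {
  const tool = tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`unknown tool: ${toolName}`);
  const required = tool.inputSchema?.required ?? [];
  return required.filter((key) => !(key in args));
}

console.log(missingArgs(response.tools, "scrape_markdown", {}));  // [ 'url' ]
console.log(missingArgs(response.tools, "google_search", {}));    // []
```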

Honest observations after wiring this together:

  • Token cost is dominated by the proxy tool, not by the 21 underlying tools. The adapter holds metadata in a disk cache; nothing is loaded into the system prompt until you enable directTools on a server.
  • Lazy startup wins on cold sessions. The Scrapeless MCP server only spawns the first time Pi calls one of its tools, so opening a Pi session that has the server configured costs nothing extra.
  • google_search + scrape_markdown is the common pair. Search to find the page, scrape to read it. The browser_* tools are reserved for flows that need login, click-through, or pagination.
  • Transient os error 10054 and HTTP 503 responses happen. They are documented in the Scrapeless guides and surface on cloud-browser session churn. Retry the tool call from Pi rather than restarting the session.
  • SCRAPELESS_KEY is the canonical env var. Other Scrapeless surfaces (the standalone CLI, the agent skill) use SCRAPELESS_API_KEY. The MCP server is the exception.

Conclusion: a minimal coding agent that reads the live web

Pi Agent stays minimal by default; the Scrapeless MCP Server adds 21 web tools without changing that. The connective tissue is pi-mcp-adapter (one proxy tool, ~200 tokens, lazy server startup) and one .mcp.json file that the wider MCP ecosystem already understands. The same JSON snippet drops into Claude Desktop, Cursor, Codex CLI, Gemini CLI, Windsurf, or VS Code Copilot Chat unchanged.

The qualitative difference shows up the first time you ask Pi to generate code against a library you have not pinned: instead of guessing from training-data memory, Pi searches, scrapes the canonical doc, and writes its example against the version that is actually live. Pair this guide with the Scrapeless MCP Server overview for the full tool catalogue, or with the AWS Strands + Scrapeless MCP guide if the same MCP server is being wired into a framework-based agent rather than a terminal one. Pin the proxy-only mode for normal use, promote individual tools to directTools when you want the model to see them in its system prompt, and keep lifecycle: lazy so cold sessions stay cheap.


Ready to Build Your AI-Powered Data Pipeline?

Join our community to claim a free plan and connect with developers building MCP-driven agent pipelines: Discord · Telegram.

Sign up at app.scrapeless.com for free Scraping Browser runtime, then drop the .mcp.json snippet above into Pi (or any other MCP client) and start grounding code generation in live web data. Pricing details at scrapeless.com/en/pricing.


FAQ

1. Is scraping with the Scrapeless MCP server through Pi legal?
The MCP server only accesses publicly available content, the same content a logged-out user would see in a browser. Legality depends on jurisdiction and on the target site's Terms of Service. Review the ToS of any site Pi is asked to scrape and consult counsel for high-stakes use cases.

2. Do I need a proxy on top of the Scrapeless MCP server?
No. The cloud browser already egresses through residential proxies in 195+ countries. Use the gl parameter on google_search (or a --proxy-country style hint in the agent prompt) to pin a region.

3. What does Pi see if a tool call hits ERR_TUNNEL_CONNECTION_FAILED, os error 10054, or HTTP 503?
These are transient errors on the cloud browser fleet. Pi surfaces the error to the model, which typically retries automatically; if not, re-issue the prompt. They are not a sign that the MCP wiring is broken.

4. Pi shows the Scrapeless server but 0/21 tools. What is wrong?
Nothing. Servers are lazy by default in pi-mcp-adapter. The count flips to 21/21 the first time Pi calls a Scrapeless tool. To force-connect, highlight the server in /mcp and press Ctrl+R, or run /mcp reconnect scrapeless from the prompt line.

5. The proxy tool is great, but can I expose Scrapeless tools directly to Pi's model?
Yes. Add "directTools": true (or "directTools": ["google_search", "scrape_markdown"] to promote a subset) to the scrapeless entry in .mcp.json. Direct tools cost ~150 to 300 tokens each in the system prompt; pick the ones the agent uses most.
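The trade-off is easy to estimate before committing to it. The sketch below adds directTools to an entry and multiplies out the per-tool range quoted in this answer; the helper names are hypothetical, and the figures are only as precise as that ~150 to 300 token estimate.

```javascript
// Per-tool system-prompt cost range quoted in the answer above.
const PER_TOOL_TOKENS = { min: 150, max: 300 };
const TOTAL_SCRAPELESS_TOOLS = 21;

// Return a copy of a server entry with directTools set.
function withDirectTools(entry, directTools) {
  return { ...entry, directTools };
}

// Estimate the system-prompt cost of a directTools setting.
// `true` promotes every tool; an array promotes only the listed ones.
function directToolCost(directTools, totalTools = TOTAL_SCRAPELESS_TOOLS) {
  const count = directTools === true
    ? totalTools
    : Array.isArray(directTools) ? directTools.length : 0;
  return {
    count,
    min: count * PER_TOOL_TOKENS.min,
    max: count * PER_TOOL_TOKENS.max,
  };
}

const entry = { command: "npx", args: ["-y", "scrapeless-mcp-server"] };
const subset = withDirectTools(entry, ["google_search", "scrape_markdown"]);

console.log(subset.directTools);
console.log(directToolCost(subset.directTools)); // { count: 2, min: 300, max: 600 }
console.log(directToolCost(true));               // { count: 21, min: 3150, max: 6300 }
```

Promoting the two most-used tools costs a few hundred tokens; promoting all 21 costs several thousand, which is the overhead the proxy tool exists to avoid.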

6. Can the same .mcp.json snippet be used outside Pi?
Yes. The mcpServers object is the standard MCP config envelope. Claude Desktop, Cursor, Codex CLI, Gemini CLI, Windsurf, and VS Code Copilot Chat all read it (some with minor path or filename differences). The Scrapeless block above works in all of them.

7. How many MCP servers can Pi connect to at once?
There is no hard limit. pi-mcp-adapter keeps each server lazy and disconnects it after 10 idle minutes, so token and process cost stay flat regardless of how many servers are listed. The relevant ceiling is the model provider's context window and Pi's own per-tool budget.

8. Does this work without an AI agent? Can I call the Scrapeless MCP server from a script?
Yes. The streamable HTTP endpoint at https://api.scrapeless.com/mcp is callable from any HTTP client with the initialize → tools/list → tools/call JSON-RPC sequence; a curl smoke test against it returns serverInfo.name: "scrapeless-mcp-server" and an mcp-session-id header for follow-up calls. Pi is the convenience layer, not a hard dependency.
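For reference, the message sequence can be assembled without any MCP library. The sketch below only builds and prints the JSON-RPC payloads and headers; it sends nothing. The protocolVersion value, the clientInfo fields, and the Accept header follow the public MCP specification rather than anything Scrapeless-specific, and the notifications/initialized message is the spec's standard step between initialize and the first request. Treat all of these as assumptions to verify against the live endpoint.

```javascript
const ENDPOINT = "https://api.scrapeless.com/mcp";

// Headers for each POST. The session id comes back on the initialize
// response (mcp-session-id) and is echoed on follow-up calls.
function buildHeaders(apiKey, sessionId) {
  const headers = {
    "content-type": "application/json",
    accept: "application/json, text/event-stream",
    "x-api-token": apiKey,
  };
  if (sessionId) headers["mcp-session-id"] = sessionId;
  return headers;
}

// Minimal JSON-RPC 2.0 request builder with incrementing ids.
let nextId = 1;
function rpc(method, params) {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

const messages = [
  rpc("initialize", {
    protocolVersion: "2025-03-26", // assumption: a spec revision with streamable HTTP
    capabilities: {},
    clientInfo: { name: "smoke-test", version: "0.0.1" }, // hypothetical client name
  }),
  // Notifications carry no id and expect no response.
  { jsonrpc: "2.0", method: "notifications/initialized" },
  rpc("tools/list", {}),
  rpc("tools/call", {
    name: "scrape_markdown",
    arguments: { url: "https://example.com" },
  }),
];

console.log("POST", ENDPOINT);
console.log(buildHeaders("YOUR_SCRAPELESS_KEY", "SESSION_ID_FROM_INITIALIZE"));
for (const message of messages) console.log(JSON.stringify(message));
```

Each payload would be sent as the body of a POST to the endpoint, in order, using any HTTP client.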


At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
