Give LangChain Agents 21 Live Web Tools with Scrapeless MCP

Ava Wilson

Expert in Web Scraping Technologies

10-Jun-2026

Key Takeaways:

  • LangChain agents get 21 live web tools from one client config. The langchain-mcp-adapters package connects a LangChain app to the Scrapeless MCP server and returns the whole tool surface — browser control, page scraping, Google Search and Trends — as ready-to-bind StructuredTool objects.
  • No Node required on the hosted path. Point the client at https://api.scrapeless.com/mcp over streamable HTTP with your x-api-token; the stdio path (npx -y scrapeless-mcp-server) is the same surface for local setups.
  • Tools work before any model is involved. get_tools() lists them and ainvoke() executes them directly — scrape_markdown on a live URL returns the page as markdown — so the wiring is testable without an LLM key.
  • The names differ by transport. The hosted endpoint serves bare names (browser_goto, scrape_markdown, google_search); the stdio server namespaces them scrapeless_*. Same 21 tools either way.
  • From tools to agent is one constructor. Bind the returned tools to any LangChain chat model and the agent can search, browse, and scrape the live web inside its reasoning loop.
  • Free to start. New Scrapeless accounts include free trial credits — sign up at app.scrapeless.com.

What You Can Do With It

  • Research agents that read the live web. google_search for discovery, scrape_markdown for clean page text — the retrieval half of an agent loop, without building a scraper.
  • Browser-driving agents. Sixteen browser_* tools (create, goto, click, type, scroll, screenshot, snapshot, wait) give an agent a real anti-detection cloud browser session to operate.
  • Market and trend monitoring. google_trends plus scheduled scraping turns a LangChain pipeline into a monitoring service.
  • Tool-grounded RAG. Fetch pages as markdown on demand instead of pre-indexing everything, and let the agent decide what to read.
  • One auth for everything. The same Scrapeless API key that drives the Scraper API actors drives the MCP tool surface.

Why the Scrapeless MCP Server

MCP (Model Context Protocol) is the standard interface for handing tools to agents, and LangChain speaks it through the official adapter package. On the other side of that protocol, the Scrapeless MCP server exposes the scraping infrastructure as 21 typed tools: cloud-browser sessions on the Scraping Browser, single-shot page scraping to HTML, markdown, or screenshot, and Google Search and Trends. The agent gets capabilities; rendering, anti-detection, and proxy routing stay server-side.

The combination matters because LangChain agents are only as useful as their tools. A model that can plan but cannot fetch a live page answers from training data; the same model with this tool surface reads the web it is reasoning about.


Prerequisites

  • Python 3.10+ and a virtual environment.
  • A Scrapeless account and API key — sign up at app.scrapeless.com.
  • For the optional stdio transport: Node.js 18+ (the hosted HTTP path needs no Node).
bash
export SCRAPELESS_API_KEY=your_api_token_here

Connect

1. Install the adapter

bash
pip install langchain-mcp-adapters langchain-core

2. Configure the client and verify the tool count

The hosted endpoint is the fastest path — pure HTTPS, authenticated by the x-api-token header:

python
# handshake.py — connect LangChain to the Scrapeless MCP server, list the tools
import asyncio
import os

from langchain_mcp_adapters.client import MultiServerMCPClient


async def main():
    client = MultiServerMCPClient({
        "scrapeless": {
            "transport": "streamable_http",
            "url": "https://api.scrapeless.com/mcp",
            "headers": {"x-api-token": os.environ["SCRAPELESS_API_KEY"]},
        }
    })
    tools = await client.get_tools()
    names = sorted(t.name for t in tools)
    print(f"tool count: {len(names)}")
    print("names:", ", ".join(names))

asyncio.run(main())

A correct handshake prints 21 tools:

browser_click, browser_close, browser_create, browser_get_html, browser_get_text, browser_go_back, browser_go_forward, browser_goto, browser_press_key, browser_screenshot, browser_scroll, browser_scroll_to, browser_snapshot, browser_type, browser_wait, browser_wait_for, google_search, google_trends, scrape_html, scrape_markdown, scrape_screenshot
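
To make that check scriptable, the returned names can be compared against the list above. This is a minimal sketch that assumes the hosted endpoint's bare names; `EXPECTED_TOOLS` and `diff_tools` are illustrative names, not part of the adapter.

```python
# Expected bare tool names, copied from the handshake output above.
EXPECTED_TOOLS = frozenset({
    "browser_click", "browser_close", "browser_create", "browser_get_html",
    "browser_get_text", "browser_go_back", "browser_go_forward", "browser_goto",
    "browser_press_key", "browser_screenshot", "browser_scroll",
    "browser_scroll_to", "browser_snapshot", "browser_type", "browser_wait",
    "browser_wait_for", "google_search", "google_trends", "scrape_html",
    "scrape_markdown", "scrape_screenshot",
})


def diff_tools(names):
    """Return (missing, unexpected) tool names relative to EXPECTED_TOOLS."""
    seen = set(names)
    missing = sorted(EXPECTED_TOOLS - seen)
    unexpected = sorted(seen - EXPECTED_TOOLS)
    return missing, unexpected
```

Calling `diff_tools(t.name for t in tools)` after `get_tools()` should give two empty lists on a correct handshake; anything else names exactly what is missing or extra.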

3. Or run the server locally over stdio

The same surface ships as an npm package for local setups — the standard MCP config shape, with the key passed as an environment variable:

json
{
  "scrapeless": {
    "command": "npx",
    "args": ["-y", "scrapeless-mcp-server"],
    "transport": "stdio",
    "env": { "SCRAPELESS_KEY": "your_api_token_here" }
  }
}
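
The two transports differ only in the connection entry handed to the client. The sketch below builds either shape from the values shown in this guide. `scrapeless_connection` is a hypothetical helper name, and the dict keys simply mirror the hosted and stdio configs above.

```python
import os


def scrapeless_connection(transport, api_key=None):
    """Build the "scrapeless" connection entry for the chosen transport."""
    # Fall back to the environment variable exported in the prerequisites.
    key = api_key or os.environ["SCRAPELESS_API_KEY"]
    if transport == "streamable_http":
        # Hosted endpoint: plain HTTPS, key sent as the x-api-token header.
        return {
            "transport": "streamable_http",
            "url": "https://api.scrapeless.com/mcp",
            "headers": {"x-api-token": key},
        }
    if transport == "stdio":
        # Local npm package: key passed through the SCRAPELESS_KEY env var.
        return {
            "transport": "stdio",
            "command": "npx",
            "args": ["-y", "scrapeless-mcp-server"],
            "env": {"SCRAPELESS_KEY": key},
        }
    raise ValueError(f"unsupported transport: {transport!r}")
```

The result would be passed to the client as `MultiServerMCPClient({"scrapeless": scrapeless_connection("stdio")})`, which keeps the rest of the script identical across both paths.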

One transport-level difference to expect: the stdio server namespaces its tool names scrapeless_*, while the hosted endpoint serves them bare. Code that looks tools up by name should match on the suffix.
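
Suffix matching can be as small as stripping the namespace before building a lookup table. This sketch assumes the stdio namespace is exactly the `scrapeless_` prefix described above; `bare_name` and `index_tools` are illustrative helpers.

```python
PREFIX = "scrapeless_"  # namespace the stdio server adds, per the note above


def bare_name(name):
    """Strip the stdio namespace so names match the hosted endpoint."""
    return name.removeprefix(PREFIX)


def index_tools(tools):
    """Map bare tool names to tool objects, whichever transport served them."""
    return {bare_name(t.name): t for t in tools}
```

With this in place, `index_tools(await client.get_tools())["scrape_markdown"]` should resolve on either transport.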

Get your API key on the free plan: app.scrapeless.com


How you actually use this: call a tool, then hand the tools to an agent

The returned objects are normal LangChain StructuredTools, which means they run directly — no model required. The shortest possible proof that the wiring works end to end:

python
# invoke_tool.py — execute one MCP tool directly through the adapter
import asyncio
import os

from langchain_mcp_adapters.client import MultiServerMCPClient


async def main():
    client = MultiServerMCPClient({
        "scrapeless": {
            "transport": "streamable_http",
            "url": "https://api.scrapeless.com/mcp",
            "headers": {"x-api-token": os.environ["SCRAPELESS_API_KEY"]},
        }
    })
    tools = {t.name: t for t in await client.get_tools()}
    result = await tools["scrape_markdown"].ainvoke(
        {"url": "https://www.scrapeless.com/en/blog/best-llm-scrapers-2026"}
    )
    text = result if isinstance(result, str) else str(result)
    print(f"scrape_markdown returned {len(text):,} chars of markdown")

asyncio.run(main())

On a live run this returns the full article as markdown — tens of thousands of characters of clean page text from one tool call.

Binding the tools to an agent takes a single constructor call, as usual in LangChain. Bring whichever chat model your stack uses (a model API key is the one prerequisite this guide doesn't cover):

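python
# agent.py: attach the MCP tools to a LangChain agent (requires a model API key)
# create_agent lives in the `langchain` package: pip install langchain
from langchain.agents import create_agent


async def run_agent(model, tools):
    # `tools` from client.get_tools(), `model` = your chat model
    agent = create_agent(model, tools)
    # The adapter's MCP tools are async, so call ainvoke rather than invoke.
    return await agent.ainvoke({
        "messages": [{"role": "user", "content": "Search for Scrapeless and summarize the top result."}]
    })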

From the agent's perspective the tools are just functions it may call: it plans, picks google_search, reads, picks scrape_markdown, reads again, and answers from live content.
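
One way to see that loop is to read the tool calls back out of the returned messages. This is a small sketch, assuming the agent returns a dict with a `messages` list whose AI messages expose `tool_calls` as a list of dicts with a `name` key. `tool_call_trace` is an illustrative helper, not a LangChain function.

```python
def tool_call_trace(messages):
    """List the tool names the agent called, in order."""
    names = []
    for message in messages:
        # Only AI messages carry tool calls; other message types are skipped.
        for call in getattr(message, "tool_calls", None) or []:
            names.append(call["name"])
    return names
```

Run on the `messages` list of the dict the agent returns, this might give something like `["google_search", "scrape_markdown"]`, depending on what the model decides to do.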


The Scrapeless MCP tool surface

Group | Tools | What they do
Browser session | browser_create, browser_goto, browser_click, browser_type, browser_press_key, browser_scroll, browser_scroll_to, browser_go_back, browser_go_forward, browser_wait, browser_wait_for, browser_snapshot, browser_get_html, browser_get_text, browser_screenshot, browser_close | Drive a cloud anti-detection browser step by step; sessions persist across calls
Page scraping | scrape_html, scrape_markdown, scrape_screenshot | One-shot fetch of any URL as raw HTML, clean markdown, or an image
Google data | google_search, google_trends | Structured search results and trends data

What You Get Back

Tool results arrive as MCP content parts that the adapter exposes to LangChain — for the scraping tools, the payload is the page itself. The scrape_markdown call above returns the rendered article as markdown text ready to feed a splitter, a summarizer, or the agent's own context window. Browser tools return their observations (snapshots, extracted text, screenshots) the same way, which is what makes multi-step browsing inside an agent loop practical.
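
Depending on the adapter version, a direct `ainvoke` may return a plain string or a list of content parts. The sketch below flattens either into text before it goes to a splitter or summarizer. It assumes text parts look like `{"type": "text", "text": ...}`; treat that shape as an assumption to check against your installed version. `result_to_text` is an illustrative helper.

```python
def result_to_text(result):
    """Flatten a tool result into one string of text."""
    if isinstance(result, str):
        return result
    if isinstance(result, (list, tuple)):
        parts = []
        for part in result:
            if isinstance(part, str):
                parts.append(part)
            elif isinstance(part, dict) and part.get("type") == "text":
                parts.append(part.get("text", ""))
            # Non-text parts (for example images) are skipped here.
        return "\n".join(parts)
    # Unknown shapes fall back to their string representation.
    return str(result)
```

Screenshots and other non-text parts are dropped on purpose here; a pipeline that needs them would handle those parts separately.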


Conclusion: one config block, a web-capable agent

The integration is genuinely small: install the adapter, give MultiServerMCPClient the hosted URL and your token, and get_tools() hands LangChain 21 live web capabilities. Verify with the tool count, prove it with one direct ainvoke, then bind the same list to your agent. The Mastra integration guide shows the same server wired into a TypeScript agent framework — same surface, different host.

Ready to Give Your Agent the Live Web?

Join our community to claim a free plan and connect with developers building agent pipelines: Discord · Telegram.

Sign up at app.scrapeless.com for free trial credits (the pricing page covers the current tiers), and point your LangChain agents at the pages they should be reading.


FAQ

Q: Do I need Node.js?

Only for the stdio transport, which spawns the npm package locally. The hosted https://api.scrapeless.com/mcp endpoint is plain HTTPS — Python-only stacks use it with no Node anywhere.

Q: How do I authenticate?

The hosted endpoint takes x-api-token: <your key> as a request header; the stdio server reads SCRAPELESS_KEY from its environment. Same key, both transports — created on the free plan at app.scrapeless.com.

Q: How do I verify the integration is actually wired up?

Two checks, both model-free: get_tools() returns 21 tools, and a direct ainvoke of scrape_markdown on a real URL returns the page as markdown. If both pass, agent binding is the only step left.

Q: Why do tool names differ between my local server and the hosted endpoint?

The stdio package namespaces names as scrapeless_*; the hosted endpoint serves them bare. Match on the suffix if your code needs to work across both.

Q: Can I use the tools without an agent?

Yes — they are StructuredTool objects and run standalone via ainvoke, which also makes them usable in plain LangChain chains and LangGraph nodes, not just agents.

Q: Is web access through the tools legal?

The tools fetch publicly accessible pages through Scrapeless infrastructure. Rules vary by jurisdiction and site terms — review the ToS of the sites your agent reads and consult counsel for your use case. Never collect personal data protected under GDPR or CCPA.

Q: What does it cost to run?

Tool calls draw on the same usage-based Scrapeless account as the rest of the platform, and new accounts start with free trial credits.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
