Connecting AI Agent to Live Web Data: AWS Strands + Scrapeless MCP Server Integration Guide
Lead Scraping Automation Engineer
Key Takeaways:
- AWS Strands SDK + Scrapeless MCP gives AI agents a typed web-data tool surface. The agent calls
google_search,scrape_html,scrape_markdown, and full browser-session tools (browser_createβbrowser_gotoβbrowser_wait_forβbrowser_get_htmlβbrowser_close) directly through the agentic loop, without glue code around a REST API. - 21 MCP tools on the hosted endpoint (
https://api.scrapeless.com/mcp). Browser-session control (16 tools), search (google_search,google_trends), and stateless scraping (scrape_html,scrape_markdown,scrape_screenshot) β exposed byscrapeless-mcp-serveron streamable HTTP. The hosted endpoint reportsserverInfo.name: "scrapeless-mcp-server", version 0.2.0 at time of publication. - Cloud browser with residential-proxy egress. Pages render in the Scrapeless Scraping Browser before the agent reads the DOM, so JavaScript-heavy targets work without local browser infrastructure. Pass-rate varies by site: simpler challenge pages clear after a short
browser_wait, while sites with stricter bot-detection layers can still block the default MCP session β for those targets, fall back to the SDK surface with a pre-configured browser profile. - Model-agnostic agent loop. Strands' perception β reasoning β action cycle lets Claude (or any supported model) decide tool calls autonomously. This guide uses Anthropic Claude through the Strands Anthropic provider.
- Free to start. New Scrapeless accounts include free Scraping Browser runtime β sign up at scrapeless website and see the pricing page when you outgrow it.
AI agents powered by large language models (LLMs) can reason and make decisions, but they are limited by their training data. To build truly useful agents, you connect them to real-time web data. This guide shows you how to combine AWS Strands SDK with Scrapeless's MCP server to create autonomous AI agents that can access and analyze live web data.
In this guide, you will learn:
- What AWS Strands SDK is and what makes it a useful framework for building AI agents.
- Why AWS Strands SDK pairs cleanly with Scrapeless's MCP server for web-aware agents.
- How to integrate AWS Strands with Scrapeless's MCP server to create an autonomous competitive intelligence agent.
- How to build agents that autonomously decide which web-scraping tools to use based on their goals.
Let's get started.
What You Can Do With It
- Competitive intelligence loops. Sweep
google_searchfor a competitor name, then callscrape_markdownon the top results to summarize positioning and recent launches. - Single-page enrichment. Drive
browser_createβbrowser_gotoβbrowser_wait_forβbrowser_get_htmlβbrowser_closeagainst an ASIN or product page and let the model pull title, price, rating, and availability into a JSON shape you define in the prompt. - Multi-region price checks. Run two
browser_createsessions against the same SKU on different country storefronts and compare prices in one response. - Search-engine driven research. Combine
google_search+google_trendsfor momentum reads on a topic, then drill into individual articles withscrape_markdown. - Screenshot capture for compliance. Use
scrape_screenshot(URL-in, PNG-out) or the in-sessionbrowser_screenshotafter a UI interaction to keep evidence of what a page looked like at a point in time. - Multi-step UI flows.
browser_click,browser_type,browser_press_key, andbrowser_scrolllet the agent drive forms, log gates, and lazy-loaded grids inside one persistent session.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this post is for demonstration purposes only.
What Is AWS Strands SDK?
AWS Strands SDK is a Python framework for building AI agents, published on PyPI by AWS (strands-agents, author opensource@amazon.com). It exposes an Agent class that takes a model and a list of tools and runs a perception β reasoning β action loop until the task completes.
This guide uses three pieces of the SDK that are verified to exist in the published package:
from strands import Agentβ the agent class itself.from strands.models.anthropic import AnthropicModelβ Anthropic provider; constructor signature isAnthropicModel(*, client_args: dict | None = None, **model_config)and acceptsmodel_id,max_tokens,params={"temperature": ...}.from strands.tools.mcp import MCPClientβ wraps an MCP transport so any MCP server's tools can be passed toAgent(tools=...).
The agent's result type is strands.agent.agent_result.AgentResult, a dataclass with fields stop_reason, message, metrics, state (plus optional interrupts, structured_output).
How the loop runs
- The user passes a prompt to
agent.invoke_async(prompt). - The model receives the prompt plus the list of available tools.
- It either returns a final message or emits a tool call.
- If a tool is called, Strands runs it and feeds the result back to the model.
- Steps 3β4 repeat until the model returns a final message.
Why Combine AWS Strands SDK with Scrapeless's MCP Server for Web Data Retrieval
LLMs don't fetch live web data on their own. To give a Strands agent that capability, you wire it to an MCP server that exposes web tools. The scrapeless-mcp-server is one such MCP server β backed by the Scrapeless Scraping Browser, a cloud-hosted browser that egresses through residential proxies.
Verified live (tools/list against https://api.scrapeless.com/mcp, today): 21 tools, in three groups:
- Stateless scraping β
scrape_html,scrape_markdown,scrape_screenshot. - Search β
google_search,google_trends. - A persistent browser session β
browser_create,browser_goto,browser_go_back,browser_go_forward,browser_wait,browser_wait_for,browser_get_html,browser_get_text,browser_snapshot,browser_click,browser_type,browser_press_key,browser_scroll,browser_scroll_to,browser_screenshot,browser_close.
Get your Scrapeless free plan API key at scrapeless website. The MCP server source is at github.com/scrapeless-ai/scrapeless-mcp-server.
How to Integrate AWS Strands SDK with Scrapeless MCP Server in Python
In this section, you will use AWS Strands SDK to build an AI agent equipped with live data scraping and retrieval capabilities from the Scrapeless MCP server.
As the example, this guide builds a competitive intelligence agent that can autonomously analyze markets and competitors. The agent decides which tools to use based on its goals, demonstrating the power of the agentic loop.
Follow the step-by-step guide below to build your Claude + Scrapeless MCP-powered AI agent using AWS Strands SDK.
Prerequisites
To replicate the code example, make sure you have:
Software requirements:
- Python 3.10 or higher.
- Node.js (latest LTS version recommended).
- A Python IDE such as VS Code with the Python extension or PyCharm.
Account requirements:
- A Scrapeless account and API key β sign up at scrapeless.
- An Anthropic account with Claude API access and credits.
Background knowledge (helpful but not required):
- Basic understanding of how MCP works.
- Familiarity with AI agents and their capabilities.
- Basic knowledge of asynchronous programming in Python.
Step #1: Create Your Python Project
Open your terminal and create a new folder for your project:
bash
mkdir strands-scrapeless-agent
cd strands-scrapeless-agent
Set up a Python virtual environment:
bash
python -m venv venv
Activate the virtual environment:
bash
# On Linux/macOS:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
Create the main Python file:
bash
touch agent.py
Your folder structure should look like this:
strands-scrapeless-agent/
βββ venv/
βββ agent.py
You now have a Python environment ready to build an AI agent with web data access.
Step #2: Install AWS Strands SDK
In your activated virtual environment, install the required packages:
bash
pip install "strands-agents>=1.0" anthropic "mcp>=1.0" python-dotenv
This installs:
strands-agents: the AWS Strands SDK for building AI agents.anthropic: required peer dep forstrands.models.anthropic.AnthropicModel.mcp: the official MCP Python SDK, needed for themcp.client.streamable_httptransport.python-dotenv: for environment variable management.
Next, add these imports to your agent.py file:
python
from strands import Agent
from strands.models.anthropic import AnthropicModel
from strands.tools.mcp import MCPClient
from mcp.client.streamable_http import streamablehttp_client
You can now use AWS Strands SDK for agent building.
Step #3: Set Up Environment Variables
Create a .env file in your project folder for secure API key management:
bash
touch .env
Add your API keys to the .env file:
ini
# Anthropic API for Claude models
ANTHROPIC_API_KEY=your_anthropic_key_here
# Scrapeless credentials for web scraping via MCP
SCRAPELESS_KEY=your_scrapeless_key_here
In your agent.py, set up environment variable loading:
python
import os
from dotenv import load_dotenv
load_dotenv()
# Read API keys
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
SCRAPELESS_KEY = os.getenv("SCRAPELESS_KEY")
You are now set up to securely load API keys from the .env file.
Step #4: Verify the Scrapeless MCP Endpoint
This guide uses the hosted streamable HTTP endpoint at https://api.scrapeless.com/mcp. No local install is required.
Smoke-test the hosted endpoint with curl:
bash
curl -X POST "https://api.scrapeless.com/mcp" \
-H "x-api-token: $SCRAPELESS_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"1.0"}}}'
A successful response returns:
- HTTP
200 OKwithContent-Type: text/event-stream - An
mcp-session-idheader β reuse it on follow-uptools/listandtools/callrequests - A body event
data: {"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{"listChanged":true}},"serverInfo":{"name":"scrapeless-mcp-server","version":"0.2.0",...}},...}
Tested against the live endpoint at time of publication: returns 21 MCP tools and the serverInfo shown above.
Get your API key on the free plan: app.scrapeless.com
Full SDK and CLI reference: scrapeless website.
Step #5: Initialize the Strands Model
Configure the Anthropic Claude model in your agent.py:
python
# Initialize Anthropic model (Claude via the Strands Anthropic provider)
model = AnthropicModel(
client_args={"api_key": ANTHROPIC_API_KEY},
model_id="claude-sonnet-4-6",
max_tokens=4096,
params={"temperature": 0.3}
)
This configures Claude as your agent's LLM with parameters tuned for consistent, focused responses. The client_args dictionary is the official Strands way to pass Anthropic client configuration (see the AWS Strands Anthropic provider docs).
Step #6: Connect to the Scrapeless MCP Server
Create the MCP client configuration to connect to the hosted streamable HTTP endpoint:
python
import asyncio
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def http_transport():
"""Streamable-HTTP transport to the hosted Scrapeless MCP endpoint."""
return streamablehttp_client(
url="https://api.scrapeless.com/mcp",
headers={"x-api-token": SCRAPELESS_KEY},
)
async def connect_mcp_tools():
"""Connect to the Scrapeless MCP server and discover tools."""
logger.info("Connecting to the Scrapeless MCP server (HTTP)...")
mcp_client = MCPClient(http_transport)
with mcp_client:
tools = mcp_client.list_tools_sync()
logger.info(f"Discovered {len(tools)} MCP tools")
for tool in tools:
logger.info(f" - {tool.tool_name}")
return mcp_client, tools
Tested against the live endpoint: the call sequence above logs Discovered 21 MCP tools followed by all 21 tool names listed in the table below.
Available Scrapeless MCP tools
The list below summarizes the tools the Scrapeless MCP server exposes on the streamable HTTP transport. list_tools_sync() returns the same set at runtime.
| Tool | Purpose |
|---|---|
browser_create |
Allocate a Scrapeless cloud-browser session |
browser_close |
Release the session |
browser_goto |
Navigate to a URL |
browser_go_back / browser_go_forward |
Move through session history |
browser_wait_for |
Wait for a CSS selector to appear |
browser_wait |
Wait for a fixed duration in milliseconds |
browser_get_html |
Read the fully rendered DOM |
browser_get_text |
Read the visible text of the current page |
browser_snapshot |
Capture the accessibility snapshot of the page |
browser_click |
Click an element by selector |
browser_type |
Type text into an input field |
browser_press_key |
Send keystrokes such as PageDown or Enter |
browser_scroll |
Scroll the current page |
browser_scroll_to |
Scroll a specific element into view |
browser_screenshot |
Capture a PNG screenshot |
google_search |
Run a Google Search query and return results |
google_trends |
Fetch Google Trends data for keywords |
scrape_html |
Fetch the rendered HTML of any URL |
scrape_markdown |
Fetch a URL and return readable Markdown |
scrape_screenshot |
Capture a screenshot of any URL |
Step #7: Define the Competitive Intelligence Agent
Create an agent with a specialized prompt for competitive intelligence:
python
def create_agent(model, tools):
"""Create a competitive intelligence agent with web data access"""
system_prompt = """You are an expert competitive intelligence analyst with access to web data tools through the Scrapeless MCP server.
## Mission
Conduct comprehensive market and competitive analysis using real-time web data.
## Available MCP Tools (Scrapeless)
- google_search: run queries against Google Search and return results
- google_trends: fetch Google Trends data for keywords
- scrape_html: fetch the rendered HTML of a URL through the Scrapeless cloud browser
- scrape_markdown: fetch a URL and return readable Markdown content
- scrape_screenshot: capture a screenshot of a target URL
- browser_create / browser_close: allocate and release a Scrapeless cloud-browser session
- browser_goto / browser_go_back / browser_go_forward: navigate the session
- browser_wait_for / browser_wait: wait for a selector or a fixed timeout
- browser_get_html / browser_get_text / browser_snapshot: read the rendered DOM, visible text, or full DOM snapshot
- browser_click / browser_type / browser_press_key: drive UI interactions
- browser_scroll / browser_scroll_to: trigger lazy-loaded content or scroll an element into view
- browser_screenshot: capture session evidence for QA and compliance
## Autonomous Analysis Workflow
When given an analysis task, autonomously:
1. Decide which tools to use based on the goal.
2. Gather comprehensive data from multiple sources.
3. Synthesize findings into actionable insights.
4. Provide specific strategic recommendations.
Be proactive in tool selection. You have full autonomy to use any combination of tools."""
return Agent(
model=model,
tools=tools,
system_prompt=system_prompt
)
This creates an agent specialized in competitive intelligence with autonomous decision-making capabilities. The agent can mix lightweight one-shot tools (scrape_markdown, google_search) with full browser-session tools (browser_create β browser_goto β browser_wait_for β browser_get_html β browser_close) depending on whether the page needs JavaScript rendering or session persistence.
Step #8: Launch Your Agent
Create the main execution function to run your agent:
python
async def main():
"""Run the competitive intelligence agent"""
print("AWS Strands + Scrapeless MCP Competitive Intelligence Agent")
print("=" * 70)
try:
# Connect to MCP tools
mcp_client, tools = await connect_mcp_tools()
# Create the agent
agent = create_agent(model, tools)
print("\nAgent ready with web data access.")
print("\nStarting analysis...")
print("-" * 40)
# Example: Analyze Tesla's competitive position
prompt = """
Analyze Tesla's competitive position in the electric vehicle market.
Research:
- Current product lineup and pricing strategy.
- Main competitors and their offerings.
- Recent strategic announcements.
- Market share and positioning.
Use the Scrapeless tools to gather real-time data from tesla.com and search results.
"""
# Run analysis with MCP context
with mcp_client:
result = await agent.invoke_async(prompt)
print("\nAnalysis results:")
print("=" * 50)
print(result)
print("\nAnalysis complete.")
except Exception as e:
logger.error(f"Error: {e}")
print(f"\nError: {e}")
if __name__ == "__main__":
asyncio.run(main())
invoke_async returns an AgentResult object whose attributes are message, stop_reason, metrics, and state. print(result) calls __str__ on the wrapper; for the assistant text only, use print(result.message) (the full Message dict) or extract the text blocks with for block in result.message.get("content", []): print(block.get("text", "")).
Your agent is ready to perform autonomous competitive analysis.
Step #9: Put It All Together
Here is the complete code in agent.py:
python
import asyncio
import os
import logging
from dotenv import load_dotenv
from strands import Agent
from strands.models.anthropic import AnthropicModel
from strands.tools.mcp import MCPClient
from mcp.client.streamable_http import streamablehttp_client
# Load environment variables
load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Read API keys
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
SCRAPELESS_KEY = os.getenv("SCRAPELESS_KEY")
# Initialize Anthropic model
model = AnthropicModel(
client_args={"api_key": ANTHROPIC_API_KEY},
model_id="claude-sonnet-4-6",
max_tokens=4096,
params={"temperature": 0.3}
)
def http_transport():
"""Streamable-HTTP transport to the hosted Scrapeless MCP endpoint."""
return streamablehttp_client(
url="https://api.scrapeless.com/mcp",
headers={"x-api-token": SCRAPELESS_KEY},
)
async def connect_mcp_tools():
"""Connect to the Scrapeless MCP server and discover tools."""
logger.info("Connecting to the Scrapeless MCP server (HTTP)...")
mcp_client = MCPClient(http_transport)
with mcp_client:
tools = mcp_client.list_tools_sync()
logger.info(f"Discovered {len(tools)} MCP tools")
for tool in tools:
logger.info(f" - {tool.tool_name}")
return mcp_client, tools
def create_agent(model, tools):
"""Create a competitive intelligence agent with web data access"""
system_prompt = """You are an expert competitive intelligence analyst with access to web data tools through the Scrapeless MCP server.
## Mission
Conduct comprehensive market and competitive analysis using real-time web data.
## Available MCP Tools (Scrapeless)
- google_search, google_trends
- scrape_html, scrape_markdown, scrape_screenshot
- browser_create, browser_close
- browser_goto, browser_go_back, browser_go_forward
- browser_wait_for, browser_wait
- browser_get_html, browser_get_text, browser_snapshot
- browser_click, browser_type, browser_press_key
- browser_scroll, browser_scroll_to
- browser_screenshot
## Autonomous Analysis Workflow
When given an analysis task, autonomously:
1. Decide which tools to use based on the goal.
2. Gather comprehensive data from multiple sources.
3. Synthesize findings into actionable insights.
4. Provide specific strategic recommendations.
Be proactive in tool selection. You have full autonomy to use any combination of tools."""
return Agent(
model=model,
tools=tools,
system_prompt=system_prompt
)
async def main():
"""Run the competitive intelligence agent"""
print("AWS Strands + Scrapeless MCP Competitive Intelligence Agent")
print("=" * 70)
try:
mcp_client, tools = await connect_mcp_tools()
agent = create_agent(model, tools)
print("\nAgent ready with web data access.")
print("\nStarting analysis...")
print("-" * 40)
prompt = """
Analyze Tesla's competitive position in the electric vehicle market.
Research:
- Current product lineup and pricing strategy.
- Main competitors and their offerings.
- Recent strategic announcements.
- Market share and positioning.
Use the Scrapeless tools to gather real-time data from tesla.com and search results.
"""
with mcp_client:
result = await agent.invoke_async(prompt)
print("\nAnalysis results:")
print("=" * 50)
print(result)
print("\nAnalysis complete.")
except Exception as e:
logger.error(f"Error: {e}")
print(f"\nError: {e}")
if __name__ == "__main__":
asyncio.run(main())
As noted in Step #8, invoke_async returns an AgentResult whose attributes are message, stop_reason, metrics, and state. Use print(result.message) if you only want the assistant text rather than the wrapper string.
Execute the AI agent with:
bash
python agent.py
The script will connect to the Scrapeless MCP server, run list_tools_sync() (returns 21 tools β verified live against the hosted endpoint at time of writing), instantiate the agent, and call agent.invoke_async(prompt). The model decides which tools to call. What the agent actually does on any given run depends on what the model picks β Strands does not constrain the tool sequence.
Conclusion
This guide wired AWS Strands SDK to Scrapeless's MCP server: ~100 lines of Python, 21 verified MCP tools available to the model, and a verified data path through the Scrapeless Scraping Browser with residential-proxy egress.
Get an API key at app.scrapeless.com.
Ready to Build Your AI-Powered Data Pipeline?
Join our community to claim a free plan and connect with developers building AWS Strands + Scrapeless MCP agents: Discord Β· Telegram.
Sign up at app.scrapeless.com for free Scraping Browser runtime and adapt the AWS Strands integration above to the workflows your team needs. Full reference at docs.scrapeless.com. For deeper background on the Scraping Browser runtime that powers these tools, see the Scrapeless Scraping Browser product page and our sibling guide on integrating Scrapeless with LangChain agents.
FAQ
Q1: What is MCP, and why does it matter for AI agents?
MCP (Model Context Protocol) is an open standard for connecting AI agents to tools and data sources. An MCP server exposes a typed tool list that any MCP-aware client β including AWS Strands SDK β can call. With Scrapeless MCP, the agent gets typed access to a real cloud browser, search-engine queries, and Markdown/HTML scrapers without writing per-API glue code. The agent decides which tools to call inside the agentic loop; MCP carries the typed schemas and results.
Q2: What does the hosted MCP endpoint give me?
https://api.scrapeless.com/mcp (with x-api-token header) returns the full 21-tool surface and was verified end-to-end with the Strands MCPClient in this guide. No local Node runtime or background process needed β streamablehttp_client(...) is one import and one URL.
Q3: How does the agentic loop handle a tool failure or anti-bot challenge?
Strands feeds the tool result β including errors β back to the model, which decides whether to retry, switch tools, or surface the failure. For Scrapeless specifically, the common recovery is browser_close + a fresh browser_create, or falling back from a full browser session to scrape_markdown / scrape_html. Per-call proxy region is not a parameter the MCP browser_create tool exposes β scrapeless-mcp-server/src/session-manager.ts forwards only session_ttl, profile_id, and profile_persist. To pin a specific region, create a Scrapeless profile in the dashboard with the country baked in and reference it via the BROWSER_PROFILE_ID env var on the MCP server.
Q4: Can this run without an AI agent?
Yes. Every Scrapeless MCP tool is callable from a plain Python script or a curl against the streamable HTTP endpoint (the curl from Step #4 returns serverInfo.name: "scrapeless-mcp-server" and a mcp-session-id header you reuse on follow-up tools/list and tools/call requests). The AWS Strands SDK adds the agentic loop on top.
Q5: What other models work besides Claude Sonnet?
The Strands AnthropicModel accepts any model ID the Anthropic API exposes β claude-haiku-* for cheaper extract-and-summarize loops, claude-opus-* for heavier multi-step reasoning. Strands also ships OpenAIModel, BedrockModel, GeminiModel, and providers for Mistral, Ollama, LiteLLM, llamacpp, and SageMaker. Swap the model line; the rest of agent.py stays the same.
Any OpenAI-API-compatible endpoint also works via OpenAIModel with a base_url override. Verified end-to-end for this guide against OpenRouter:
python
from strands.models.openai import OpenAIModel
model = OpenAIModel(
client_args={
"api_key": OPENROUTER_API_KEY,
"base_url": "https://openrouter.ai/api/v1",
},
model_id="openai/gpt-4o-mini", # or anthropic/claude-3.5-sonnet, google/gemini-flash-1.5, etc.
params={"temperature": 0.3},
)
With that swap, the same Agent(model=..., tools=tools, system_prompt=...) runs against the OpenRouter catalog. Tested for this guide with openai/gpt-4o-mini: the agent autonomously called google_search three times and scrape_markdown once, returned a clean answer with stop_reason: end_turn, at roughly $0.001 per run.
Q6: How do I handle transient errors like ERR_TUNNEL_CONNECTION_FAILED or 503?
Both come from the Scrapeless egress pool and are usually transient. Strands' loop already feeds the error back to the model, which will retry on the next turn. For deterministic recovery in the prompt, instruct the agent: "If a browser tool returns a tunnel or 503 error, call browser_close and start a fresh browser_create before retrying." That phrasing produces a cleaner retry than relying on the model to decide.
Q7: Can multiple agents share the MCP connection?
The MCPClient opens one session per with mcp_client: context. For parallel agents, give each its own MCPClient instance (and therefore its own session ID on the hosted endpoint) rather than reusing one. The hosted endpoint scales with concurrent sessions; the per-account concurrency limit lives in the Scrapeless pricing tier.
Q8: Where do I plug in my own non-Scrapeless tools alongside this?
Agent(tools=...) accepts a list. Pass tools + my_extra_tools where my_extra_tools is any Strands-compatible tool (functions decorated with @tool, or tools from another MCP server). The model sees the union of the schemas and decides which to call.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.



