How to Add Web Search to Qwen Code: Scrapeless MCP Integration for Terminal Agents
Expert in Web Scraping Technologies
Key Takeaways:
- Qwen Code has no built-in web search; MCP is how it sees the web. The built-in web_search tool was removed in an early release, and the official docs are explicit that web search is provided by connecting an MCP server. Point that connection at Scrapeless and the terminal agent gains live Google search, page rendering, and a full cloud browser in one move.
- One block in ~/.qwen/settings.json wires it all in. Add a single scrapeless entry to the mcpServers object and the agent gets a Google SERP scraper, a Trends scraper, HTML/Markdown/Screenshot helpers, and 16 browser-automation tools, with no SDK code and no service to host.
- The agent searches, renders, and drives a browser from plain prompts. Ask in natural language to search Google, read a JavaScript-heavy page as clean markdown, or click through a multi-step flow, and Qwen Code composes the right tool calls turn by turn instead of being capped at training-cutoff knowledge and local files.
- Residential proxies and anti-detection are handled cloud-side. Every request routes through the Scrapeless anti-detection cloud browser with residential proxies in 195+ countries, so the agent gets a rendered, usable response on commercial sites without any proxy or fingerprint setup on your machine.
- 21 tools across SERP, stateless scraping, and browser automation. The Scrapeless MCP server exposes google_search, google_trends, scrape_html / scrape_markdown / scrape_screenshot, plus 16 browser_* tools: one namespace the agent's planner draws from each turn.
- stdio or HTTP-streamable transport. Spawn the server locally with npx, or point the same config at the streamable HTTP endpoint for remote dev containers and CI runners.
- Free to start. New Scrapeless accounts include free Scraping Browser runtime; sign up at Scrapeless.
Introduction: a terminal coding agent that can finally read the live web
Qwen Code is an open-source AI agent that lives in your terminal, optimized for the Qwen series of models. It reads large codebases, edits files, runs commands, and automates the tedious parts of a project — all without leaving the shell. What it cannot do on its own is see the live web. Its knowledge stops at the model's training cutoff and the files on disk.
That gap is unusually explicit in Qwen Code. The built-in web_search tool was removed in an early version, and the official documentation states plainly that "web search is provided by connecting to external MCP servers" rather than a built-in tool. In other words, real-time web access in Qwen Code is not an afterthought you bolt on — it is the intended extension point. Until you connect one, the agent cannot pull a current SERP, read a competitor's pricing page, check the latest changelog, or render a JavaScript-only app.
This post closes that gap by wiring the Scrapeless MCP server into Qwen Code. One block in ~/.qwen/settings.json gives the agent Google search, JavaScript rendering, and a full anti-detection cloud browser, all reachable through the same natural-language prompts it already takes for code. For the same Scrapeless surface through other MCP clients, see the Google Antigravity walkthrough and the Pi Agent integration.
What You Can Do With It
- Live SERP research in the terminal. Ask the agent to run google_search for a query and hand back the top results as JSON, so research happens in the shell instead of a separate browser tab.
- Competitor and pricing snapshots. Drop a URL into the prompt and have the agent render the page and extract plan names, prices, and features into a structured record you can drop next to your code.
- Doc and changelog lookups that feed code. Have the agent fetch a library's current docs or release notes as clean markdown and write against the rendered text rather than a stale memory of the API.
- Market and trend checks. Use google_trends to pull interest signals for a topic in a target region, then seed feature copy, content plans, or experiment ideas with current evidence.
- JavaScript-page extraction into a typed record. Point the agent at a single-page app; the cloud browser hydrates it and the agent parses the result into a typed object for the script you are building.
- Multi-step browser flows. Chain browser_goto, browser_click, browser_type, and browser_scroll so the agent walks pagination, expands panels, or steps through a wizard before extracting.
- Screenshot capture for review. Use scrape_screenshot or browser_screenshot to grab a rendered page as an image the agent can save into the workspace.
- Search-then-read pipelines. Combine google_search with scrape_markdown so the agent finds the top results, reads each one, and summarizes them in a single terminal turn.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this post is for demonstration purposes only.
Why the Scrapeless MCP Server
The Scrapeless MCP server is a customizable, anti-detection bridge between an AI agent and the live web. For Qwen Code specifically, it brings:
- An anti-detection cloud browser with JavaScript rendering. Pages are hydrated in a full Scrapeless Scraping Browser before extraction, so SPAs, infinite-scroll feeds, and lazy-loaded panels become first-class targets for browser_goto + browser_get_html.
- Residential proxies in 195+ countries. Geo-bound queries return the listings a local user would see, with proxy egress handled entirely on the Scrapeless side.
- One stdio command via npx, no SDK code. The server launches as a child process from npx -y scrapeless-mcp-server; there is nothing to build, host, or import into your project.
- 21 tools spanning SERP, stateless scraping, and full browser automation. google_search and google_trends cover SERP data, scrape_html / scrape_markdown / scrape_screenshot cover one-shot page fetches, and 16 browser_* tools cover stateful navigation, clicking, typing, scrolling, and screenshots.
- A complement to Qwen Code's own web_fetch. Qwen Code ships a simple web_fetch for plain page retrieval, but it does not render JavaScript or carry anti-detection. The Scrapeless tools fill exactly that gap: search the agent never had, plus rendered, proxied page access.
The free plan is enough to wire this up and run real prompts; compare quotas on the pricing page when you outgrow it. Get your API key on the free plan at Scrapeless.
Prerequisites
- Node.js 22 or newer on the workstation. Qwen Code requires Node 22+, and the stdio MCP server is spawned with npx.
- Qwen Code installed and a model provider configured. Qwen Code authenticates against an LLM backend; the agent loop needs a working model before any tool call runs.
- A Scrapeless account and API key — sign up on the free plan at app.scrapeless.com and copy the key from Settings → API Key Management.
- Basic terminal familiarity — the whole setup is a handful of commands plus one small JSON file.
Install
The setup is five sub-steps; each is independently verifiable.
1. Install Qwen Code
Install the CLI globally from npm, then check the version:
```bash
npm install -g @qwen-code/qwen-code
qwen --version
```
You can also run it without a global install via npx -y @qwen-code/qwen-code@latest.
2. Connect a model provider
Qwen Code talks to an LLM backend. It supports an OpenAI-compatible mode, so any OpenAI-compatible endpoint works — set the auth type, the API key, the base URL, and the model:
```bash
export OPENAI_API_KEY="your_provider_key_here"
export OPENAI_BASE_URL="https://your-openai-compatible-endpoint/v1"
qwen --auth-type openai -m "your-model-id"
```
The same values can be passed as --openai-api-key and --openai-base-url flags. Pick a model that handles tool calls well — Qwen Code is built around agentic tool use, so a current Qwen-series coder model is a natural fit.
3. Add the Scrapeless MCP server (stdio)
Qwen Code reads MCP servers from ~/.qwen/settings.json (user scope) or .qwen/settings.json in a project root. Add a scrapeless block to the mcpServers object:
```json
{
  "mcpServers": {
    "scrapeless": {
      "command": "npx",
      "args": ["-y", "scrapeless-mcp-server"],
      "env": { "SCRAPELESS_KEY": "$SCRAPELESS_KEY" },
      "timeout": 60000,
      "trust": true
    }
  }
}
```
Two details matter here. First, the Scrapeless MCP server reads its key from SCRAPELESS_KEY, not SCRAPELESS_API_KEY — the Scrapeless CLI and SDK use SCRAPELESS_API_KEY, but the MCP server is the documented exception. Second, Qwen Code expands $VAR and ${VAR} inside the env object, so you can keep the key in your environment (export SCRAPELESS_KEY=...) and reference it as $SCRAPELESS_KEY instead of pasting the literal value into the file. The server source lives at github.com/scrapeless-ai/scrapeless-mcp-server.
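For scripted setups, the same entry can be merged into an existing settings dictionary without touching other servers. The following is a minimal Python sketch, not part of Qwen Code or Scrapeless: the helper names add_scrapeless_server and unset_env_refs are hypothetical, and the $VAR / ${VAR} matching is a simplified approximation of the expansion described above rather than Qwen Code's actual parser.

```python
import json
import re

# Same values as the JSON block in this step.
SCRAPELESS_ENTRY = {
    "command": "npx",
    "args": ["-y", "scrapeless-mcp-server"],
    "env": {"SCRAPELESS_KEY": "$SCRAPELESS_KEY"},
    "timeout": 60000,
    "trust": True,
}

# Simplified pattern for $VAR and ${VAR}; an approximation for illustration only.
ENV_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def add_scrapeless_server(settings: dict) -> dict:
    """Return a copy of settings with the scrapeless entry added; other servers are kept."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["scrapeless"] = json.loads(json.dumps(SCRAPELESS_ENTRY))  # deep copy
    merged["mcpServers"] = servers
    return merged


def unset_env_refs(entry: dict, environ: dict) -> list:
    """List variables referenced in entry['env'] that are missing from environ."""
    missing = []
    for value in entry.get("env", {}).values():
        for braced, bare in ENV_REF.findall(value):
            name = braced or bare
            if name not in environ and name not in missing:
                missing.append(name)
    return missing
```

A typical use would be to load ~/.qwen/settings.json with json.load, pass the result through add_scrapeless_server, and call unset_env_refs with dict(os.environ) before writing the file back, so a missing export SCRAPELESS_KEY=... is caught before the agent starts.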
You can also add the server from the CLI instead of editing JSON by hand:
```bash
qwen mcp add --transport stdio --scope user --env SCRAPELESS_KEY="$SCRAPELESS_KEY" --trust scrapeless npx -y scrapeless-mcp-server
```
4. Or use HTTP streamable mode
If the host cannot reliably spawn npx — a hosted dev container, a remote workspace, or a CI sandbox — point Qwen Code at the Scrapeless HTTP endpoint instead of the local process. For HTTP transport, Qwen Code uses the httpUrl key with an optional headers object:
```json
{
  "mcpServers": {
    "scrapeless": {
      "httpUrl": "https://api.scrapeless.com/mcp",
      "headers": { "x-api-token": "YOUR_SCRAPELESS_KEY" }
    }
  }
}
```
The same key value works in both modes; HTTP streamable passes it as the x-api-token header rather than the SCRAPELESS_KEY env var. Stdio is the right default on a developer workstation; HTTP streamable is the right default anywhere a long-lived child process is awkward to keep alive.
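The choice between the two transports can be made mechanically when a setup script runs on mixed hosts. This is a rough Python sketch under stated assumptions: scrapeless_entry and detect_npx are hypothetical helper names, and "npx is on PATH" is only a proxy for "the host can reliably spawn npx".

```python
import shutil


def scrapeless_entry(api_key: str, npx_available: bool) -> dict:
    """Build the mcpServers entry: stdio when npx can be spawned, HTTP otherwise."""
    if npx_available:
        # stdio: the key is referenced from the environment, not stored in the file.
        return {
            "command": "npx",
            "args": ["-y", "scrapeless-mcp-server"],
            "env": {"SCRAPELESS_KEY": "$SCRAPELESS_KEY"},
        }
    # HTTP streamable: the same key value travels as the x-api-token header.
    return {
        "httpUrl": "https://api.scrapeless.com/mcp",
        "headers": {"x-api-token": api_key},
    }


def detect_npx() -> bool:
    """True when an npx executable is found on PATH."""
    return shutil.which("npx") is not None
```

Calling scrapeless_entry(key, detect_npx()) yields one of the two blocks shown in steps 3 and 4. Note that the HTTP variant writes the literal key into the file, so treat that settings.json as a secret.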
5. Verify the connection
List the configured MCP servers:
```bash
qwen mcp list
```
The scrapeless server should report Connected, which means Qwen Code completed the MCP handshake, either with the stdio process it launched or with the HTTP endpoint if you chose step 4. From there the agent can enumerate the server's 21 tools: the Google data tools (google_search, google_trends), the one-shot page helpers (scrape_html, scrape_markdown, scrape_screenshot), and the 16 cloud-browser primitives (browser_create, browser_goto, browser_get_html, browser_get_text, browser_click, browser_type, browser_press_key, browser_scroll, browser_scroll_to, browser_screenshot, browser_snapshot, browser_wait, browser_wait_for, browser_go_back, browser_go_forward, browser_close).
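If you want to confirm the full surface rather than eyeball it, a small inventory check helps. The sketch below is illustrative Python: the tool names are copied from the list above, missing_tools is a hypothetical helper, and it assumes you have already collected the reported tool names by some means (for example from the agent's own tool listing).

```python
# Expected tool names, grouped as in the paragraph above (2 + 3 + 16 = 21).
EXPECTED_TOOLS = {
    "google": ["google_search", "google_trends"],
    "scrape": ["scrape_html", "scrape_markdown", "scrape_screenshot"],
    "browser": [
        "browser_create", "browser_goto", "browser_get_html", "browser_get_text",
        "browser_click", "browser_type", "browser_press_key", "browser_scroll",
        "browser_scroll_to", "browser_screenshot", "browser_snapshot",
        "browser_wait", "browser_wait_for", "browser_go_back",
        "browser_go_forward", "browser_close",
    ],
}


def missing_tools(reported):
    """Return expected tool names that are absent from the reported names."""
    seen = set(reported)
    return [
        name
        for group in EXPECTED_TOOLS.values()
        for name in group
        if name not in seen
    ]
```

An empty list means every expected tool was reported; anything else names the gap to investigate.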
How you actually use this: prompt your Qwen Code agent
After wiring the MCP server, you get live web data by talking to Qwen Code in the terminal — not by hand-writing tool calls. The agent reads the tool list the Scrapeless MCP server exposes and chooses google_search, scrape_markdown, or the browser_* tools as needed, composing them turn by turn from the natural-language prompt. There is no tool JSON to author on your side. Qwen Code runs prompts interactively in a session, or non-interactively by passing the prompt as a positional argument (or piping it on stdin) for one-shot runs and scripting.
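For scripting, the non-interactive mode can be wrapped in a few lines. This is a hedged Python sketch, not an official client: qwen_args and run_prompt are hypothetical helpers, the flags are the ones used in the worked example later in this post, the prompt is piped on stdin, and the 300-second timeout is an arbitrary assumption.

```python
import subprocess


def qwen_args(server: str = "scrapeless") -> list:
    """Flags for a headless run limited to one MCP server."""
    return ["qwen", "--approval-mode", "yolo", "--allowed-mcp-server-names", server]


def run_prompt(prompt: str, timeout_s: int = 300) -> str:
    """Pipe the prompt on stdin and return whatever the CLI prints to stdout."""
    done = subprocess.run(
        qwen_args(),
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return done.stdout
```

run_prompt requires the qwen CLI on PATH and a configured model provider; the returned text is the agent's final answer, which your script still has to parse.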
Prompts you can paste
| Prompt | What the agent does |
|---|---|
| "Find the top Google results for vector database benchmarks 2026 and return them as JSON." | google_search with q, hl, gl → typed result rows. |
| "What search topics are rising for developer tools in the US right now?" | google_trends. |
| "Pull the Qwen Code docs page at https://qwenlm.github.io/qwen-code-docs/en/users/overview/ as clean markdown." | scrape_markdown. |
| "Open https://pricing.example.com, it's a JavaScript app: render it and extract plan name, price, and features as JSON." | browser_create → browser_goto → browser_get_html → typed extract. |
| "Compare the pricing pages at https://a.example.com/pricing and https://b.example.com/pricing and tell me where they differ." | browser_create → browser_goto (A) → browser_get_html → browser_goto (B) → browser_get_html → diff. |
| "Take a full-page screenshot of https://example.com/landing." | scrape_screenshot. |
| "Grab the rendered HTML of https://example.com so I can read the markup." | scrape_html. |
| "Open https://example.com/jobs, wait for the listings to load, snapshot the page, then extract every job title and location as JSON." | browser_create → browser_goto → browser_wait_for → browser_snapshot → typed extract → browser_close. |
Worked example
You type (one-shot, prompt passed on stdin):
```bash
echo "Use the scrapeless google_search tool to find the top results for 'qwen code github' and return the top 3 as a JSON array of {title, link}." | qwen --approval-mode yolo --allowed-mcp-server-names scrapeless
```
The agent's plan (in plain English):
- Call google_search with q: "qwen code github", hl: "en", gl: "us".
- Receive an array of result rows and read the position, title, and link fields.
- Sort by position and keep the first three rows.
- Map each row to a {title, link} object.
- Return the JSON array to the terminal.
What you get back (illustrative shape — the agent works from rows like these):
```json
[
  { "title": "Qwen Code is an open-source AI agent for the terminal, ...", "link": "https://qwen.ai/qwencode" },
  { "title": "Qwen Code overview", "link": "https://qwenlm.github.io/qwen-code-docs/en/users/overview/" },
  { "title": "qwen-code/qwen-code-core", "link": "https://www.npmjs.com/package/@qwen-code/qwen-code-core" }
]
```
Field names match the google_search row shape; values are illustrative samples.
--allowed-mcp-server-names scrapeless scopes the run to the Scrapeless tools, and --approval-mode yolo lets the agent execute the trusted tool without an interactive prompt — handy for headless and scripted runs. The stateless data tools return their payload as a body prefixed with Response:\n\n; the agent unwraps that prefix before parsing the JSON, so you never see it in the answer.
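The selection steps in the agent's plan are simple enough to reproduce by hand if you ever post-process rows yourself. A minimal Python sketch, assuming rows have already been parsed into dictionaries with the position, title, and link keys named above; top_results is a hypothetical helper and the sample rows are made up for illustration.

```python
def top_results(rows, n=3):
    """Sort rows by position, keep the first n, and map each to {title, link}."""
    ordered = sorted(rows, key=lambda row: row["position"])
    return [{"title": row["title"], "link": row["link"]} for row in ordered[:n]]


# Illustrative rows, deliberately out of order; values are not real search results.
sample_rows = [
    {"position": 2, "title": "Second", "link": "https://example.org/b"},
    {"position": 1, "title": "First", "link": "https://example.com/a"},
    {"position": 4, "title": "Fourth", "link": "https://example.net/d"},
    {"position": 3, "title": "Third", "link": "https://example.net/c"},
]

top_three = top_results(sample_rows)  # First, Second, Third as {title, link} objects
```

Sorting explicitly, rather than trusting the incoming order, keeps the output stable even if rows arrive unordered.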
Shaping prompts
| Say this | Effect |
|---|---|
| "…from Germany" / "…German results" | Routes egress through proxyCountry and sets gl=de on the search. |
| "…as markdown, skip the nav and boilerplate" | Picks scrape_markdown for a clean text payload instead of raw HTML. |
| "…render it first, it's a single-page app" | Forces the browser_* path (browser_create → browser_goto → browser_get_html) so extraction runs against the hydrated DOM. |
| "…top 5 only" | Trims the returned array to the first five rows. |
| "…include the snippet for each result" | Keeps the snippet field in the output rows. |
| "…close the session when you're done" | Adds a final browser_close with the sessionId from browser_create. |
Everything below is the under-the-hood reference — the tool surface, the exact return shapes, and the behavior the agent handles for you.
The Scrapeless MCP tool surface
Once the server is connected, Qwen Code sees 21 tools spanning SERP data, stateless scraping, and full anti-detection cloud browser control.
| Tool | What it does |
|---|---|
| google_search | Runs a Google search (q, hl, gl) and returns structured organic result rows. |
| google_trends | Pulls Google Trends interest data for a query. |
| scrape_html | Fetches a URL and returns its rendered HTML. |
| scrape_markdown | Fetches a URL and returns clean Markdown for the page. |
| scrape_screenshot | Captures a screenshot of a target URL. |
| browser_create | Opens a session on the anti-detection cloud browser. |
| browser_goto | Navigates the session to a URL. |
| browser_click | Clicks an element in the live page. |
| browser_type | Types text into an input or editable field. |
| browser_get_text / browser_get_html | Reads the page's text or HTML. |
| browser_screenshot | Captures a screenshot of the live session. |
| browser_snapshot | Returns an accessibility/structure snapshot of the page. |
| browser_wait / browser_wait_for | Waits a fixed interval, or for a condition/element. |
| browser_scroll / browser_scroll_to | Scrolls the page, or to a specific element. |
| browser_go_back / browser_go_forward | Moves through session history. |
| browser_press_key | Sends a keyboard key to the page. |
| browser_close | Ends the cloud browser session. |
What You Get Back
A google_search call returns a JSON array of organic result rows. Each row carries the same keys, so the agent can map straight to title, link, and snippet:
```json
[
  {
    "position": 1,
    "title": "Web Scraping With Python: A Complete Guide",
    "link": "https://example.com/python-web-scraping",
    "snippet": "A step-by-step guide to scraping the web with Python and parsing HTML.",
    "source": "example.com"
  },
  {
    "position": 2,
    "title": "Scraping Dynamic Sites",
    "link": "https://example.org/dynamic-scraping",
    "snippet": "How to render JavaScript pages before extracting data.",
    "source": "example.org"
  }
]
```
Field names reflect the google_search tool output; values are illustrative samples.
A few honest observations once you start running prompts:
- Stateless tools like google_search and scrape_markdown return a body prefixed with Response:\n\n followed by the JSON payload; the agent unwraps that prefix automatically, so you work with the data, not the wrapper.
- The browser_* tools return plain text with no Response:\n\n prefix.
- Tool arguments are camelCase: pass sessionId, proxyCountry, and similar fields exactly as named.
- proxyCountry is a request, not a guarantee: it can defer to the region configured on your account, so confirm the egress region when geo-targeting matters.
- Values in tool output are content-dependent: result counts, ordering, and snippet text vary with the live query.
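If you consume tool output outside the agent, the first two observations translate into a small decoding step. The Python sketch below is an assumption-laden illustration: parse_tool_body is a hypothetical helper, the prefix is the one described above, and the fallback to plain text is a defensive guess for payloads that turn out not to be JSON.

```python
import json

PREFIX = "Response:\n\n"


def parse_tool_body(tool_name: str, body: str):
    """Decode a tool's text body: JSON for stateless tools, plain text for browser_*."""
    if tool_name.startswith("browser_"):
        return body  # browser tools return plain text with no prefix
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]  # drop the wrapper before parsing
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body  # not JSON after all; hand back the unwrapped text
```

Branching on the tool-name prefix keeps the rule in one place, so a caller never has to remember which family wraps its output.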
Conclusion: search, render, and browse from the terminal
The whole integration reduces to one MCP block plus natural-language prompts. With the scrapeless entry in ~/.qwen/settings.json and your key in the environment, Qwen Code gains live Google search, JavaScript rendering, and a full anti-detection cloud browser — exactly the web layer the agent does not ship on its own. You describe the task; the agent picks the tool.
If you are wiring up other agents, the same Scrapeless MCP server drops into them too: see the Google Antigravity and Pi Agent integrations, and the Scrapeless MCP server overview for the full tool reference. Keep your API key in SCRAPELESS_KEY, prefer stdio transport for local CLIs and HTTP-streamable for hosted agents, and let the agent pick the tools. Full reference at docs.scrapeless.com.
Ready to Build Your AI-Powered Data Pipeline?
Join our community to claim a free plan and connect with developers building Qwen Code + Scrapeless MCP agents: Discord · Telegram.
Sign up at Scrapeless for free Scraping Browser runtime and adapt the integration above to the SERPs, pages, and regions your team needs.
FAQ
Why does Qwen Code need an MCP server for web search at all?
Because it has no built-in web search. The web_search core tool was removed in an early version, and the official docs route web search through MCP servers instead. Connecting Scrapeless gives the agent that missing search capability, plus rendered page access and a full cloud browser.
How is this different from Qwen Code's built-in web_fetch?
web_fetch does a plain retrieval of a URL. It does not render JavaScript and carries no anti-detection or proxy layer, so it struggles on single-page apps and bot-protected sites. The Scrapeless tools add the missing search (google_search), clean rendered text (scrape_markdown), and a stateful anti-detection browser (browser_*) on residential proxies.
Which environment variable holds the Scrapeless key?
SCRAPELESS_KEY. This is the documented exception — the Scrapeless CLI and SDK read SCRAPELESS_API_KEY, but the MCP server reads SCRAPELESS_KEY. Qwen Code can expand it from your environment via $SCRAPELESS_KEY inside the config's env object.
Where does Qwen Code read MCP configuration from?
From ~/.qwen/settings.json for user scope, or .qwen/settings.json in a project root for project scope. Both use the mcpServers object. You can also add a server with qwen mcp add and inspect connections with qwen mcp list.
stdio vs HTTP streamable — when should you use each?
Use stdio when the server runs locally alongside the CLI: Qwen Code launches scrapeless-mcp-server as a child process and talks to it over standard input/output. Use HTTP streamable (the httpUrl key pointing at https://api.scrapeless.com/mcp with the x-api-token header) when the agent is hosted or remote and cannot spawn a local process.
Does proxyCountry always apply?
Not necessarily. proxyCountry is a preference that can defer to the region configured on your account. If geo-targeting matters, confirm the egress region rather than assuming the per-call value always wins.
Is web scraping via the agent legal?
Scraping publicly available data is generally permissible, but you are responsible for how you use it. Review each site's Terms of Service and respect robots.txt, and remember that rules around personal data and access vary by jurisdiction. When in doubt, get legal advice for your specific use case.
Can you use this without an AI agent?
Yes. The Scrapeless MCP server is a standard MCP server, so any MCP-compatible client can call it — or you can drive it directly over JSON-RPC (initialize, then tools/list and tools/call). The agent is a convenience, not a requirement.
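To make the direct route concrete, here is a sketch of the JSON-RPC 2.0 messages such a client would send, one JSON object per line as the MCP stdio transport expects. It is illustrative Python only: the helper names and clientInfo values are hypothetical, the protocolVersion string is an assumption you should replace with one your server supports, and the google_search arguments reuse the q, hl, and gl fields shown earlier. The notifications/initialized message is part of the MCP handshake even though the answer above names only the three requests.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    )


def notification(method: str) -> str:
    """Serialize a JSON-RPC 2.0 notification (no id, so no reply is expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method})


def handshake_and_search(query: str) -> list:
    """Messages for: initialize, initialized, tools/list, then one google_search call."""
    return [
        request("initialize", {
            "protocolVersion": "2025-03-26",  # assumption: match your server's version
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
        }),
        notification("notifications/initialized"),
        request("tools/list", {}),
        request("tools/call", {
            "name": "google_search",
            "arguments": {"q": query, "hl": "en", "gl": "us"},
        }),
    ]
```

A real client writes each line to the server's stdin and reads the initialize response before sending the rest; this sketch only builds the messages and leaves process handling and response parsing out.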
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.



