How to Scrape Qwen AI Answers With the Scrapeless Scraping Browser
Lead Scraping Automation Engineer
Key Takeaways:
- One composer, one send button, one answer node. chat.qwen.ai ("Qwen Studio") renders a single composer, textarea.message-input-textarea; type into it, click .message-input-right-button-send, then read the reply out of .response-message-content.phase-answer. End to end, the prompt "What is the capital of France?" returns "The capital of France is Paris."
- A guest session gets one turn; the next turn needs login. An anonymous Qwen session answers a single question, then a "Welcome / Log in" wall appears before the second turn. Conversation history, file upload, and image generation all sit behind that wall, so handle them as an authenticated prerequisite, not a guest call.
- Wait on the answer footer, not a stopwatch. Qwen streams the reply token by token. The reliable "done" signal is the response footer (the copy control .copy-response-button) mounting under the message; a fixed sleep captures a half-written sentence.
- The reasoning card is a separate node from the answer. Qwen renders a collapsible "Thought completed" card (.qwen-chat-thinking-status-card-title-text) above the answer body. Detect it on its own so reasoning text never bleeds into your answer field.
- Pin residential egress and keep each job in one shell. chat.qwen.ai personalizes and rate-limits by IP, and the Scraping Browser handles proxies, fingerprinting, and rendering as session-level concerns so your code only deals with selectors and waits.
- Free to start. New Scrapeless accounts include free Scraping Browser runtime; sign up at app.scrapeless.com.
Introduction: turning Qwen's answers into structured data
Qwen (Alibaba's Tongyi family) is one of the most widely used large-language-model assistants, and teams want its answers as data: model-eval and regression sets, brand-and-category answer monitoring, multilingual grounding corpora, and side-by-side prompt tests. The catch is that those answers do not live on an open HTML page. They are streamed into a hydrated React app at chat.qwen.ai — the interface labels itself "Qwen Studio" — and the reply only exists in the DOM after the app renders and finishes streaming it.
That makes a plain HTTP fetch useless: you get an empty application shell, no answer. The composer, the send control, and the answer container are all addressed by app-specific class names that shift when Qwen ships a UI update, and a second question runs into a login wall. So the real task is to drive the chat UI like a browser would, wait for the stream to settle, and pull the answer (and the reasoning card, when one rendered) out of the live DOM.
This post is a terminal-first walkthrough on top of the Scrapeless Scraping Browser. It mints a cloud session, opens Qwen Studio, types a prompt into the composer, waits for the reply to finish, and reads it back as JSON. Every selector and signal below is drawn from a real Scraping Browser run against chat.qwen.ai. A companion search-and-AI-answer guide is linked at the end.
What You Can Do With It
- Build Qwen eval datasets. Pin question/answer pairs per timestamp to track answer drift and feed model-regression suites.
- Brand and category monitoring. Watch how Qwen answers questions about your product, your space, or a regulated topic, and diff the responses over weeks.
- Multilingual answer grounding. Capture Qwen's replies in Chinese and English for cross-lingual retrieval evaluation.
- Prompt A/B testing. Run the same question across several phrasings and compare what comes back.
- Reasoning-trace flags. Record whether a query triggered a "Thought completed" reasoning pass, so you can separate fast answers from deliberated ones.
- Citation harvesting. When a prompt pushes Qwen to ground its answer with web sources, collect the source links it surfaces.
Why Scrapeless Scraping Browser
Scrapeless Scraping Browser is a customizable, anti-detection cloud browser designed for web crawlers and AI agents. For chat.qwen.ai specifically, it brings:
- Residential proxies in 195+ countries (--proxy-country, --proxy-state, --proxy-city): chat.qwen.ai personalizes and throttles by IP, so residential egress is the load-bearing primitive for sustained collection.
- JavaScript rendering in the cloud: Qwen Studio is a hydrated single-page app; the answer node only exists after the app mounts and streams it, which static HTML never sees.
- Anti-detection fingerprinting on every session: the cloud browser runs on self-developed Chromium and presents as real Chrome to the chat app.
- Session persistence and profiles: keep a logged-in Qwen cookie alive across calls, which is what unlocks the multi-turn path in Step 5.
- A single CLI surface: one scrapeless-scraping-browser binary drives navigation, typing, clicks, waits, and eval, so the whole flow lives in one shell.
Get your API key on the free plan at app.scrapeless.com. The Scraping Browser product page covers the runtime, and the full command set lives in the Scrapeless docs.
Prerequisites
- Node.js 18 or newer.
- A Scrapeless account and API key — sign up at app.scrapeless.com.
- jq for JSON parsing (optional; a grep fallback is shown below).
- For multi-turn extraction: a Qwen account you control. The single-turn guest flow needs no login; everything past the first answer does (Step 5).
- Basic familiarity with the terminal.
Install
The recipes below run on the scrapeless-scraping-browser CLI. Setup is four short steps — CLI users need #1, #2, and #4; AI-agent users add #3.
1. Install the CLI package
bash
npm install -g scrapeless-scraping-browser
This provides the scrapeless-scraping-browser binary that every step below calls. The skill does not ship its own runtime — it loads command patterns into your AI agent, but the CLI itself must be installed first.
2. Configure your API key
Get your token from app.scrapeless.com, then store it where the CLI can read it:
bash
scrapeless-scraping-browser config set apiKey your_api_token_here
scrapeless-scraping-browser config get apiKey # verify
The config file lives at ~/.scrapeless/config.json with access restricted to the current user, and it takes priority over the environment variable. For CI runners, prefer the env var instead:
bash
export SCRAPELESS_API_KEY=your_api_token_here
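On a CI runner it can help to fail before the first CLI call when the variable was never exported. A minimal sketch, assuming the key comes from the environment rather than the config file; require_api_key is a hypothetical helper name, not a CLI command:

```shell
# Hypothetical guard: stop the job early if the key was never exported.
# Only meaningful when the key is supplied through the environment.
require_api_key() {
  if [ -z "${SCRAPELESS_API_KEY:-}" ]; then
    echo "SCRAPELESS_API_KEY is not set" >&2
    return 1
  fi
}

# Usage sketch: require_api_key || exit 1
```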
3. Install the Scrapeless skill in your AI agent
This is a separate step from #1. Step 1 installed the CLI binary — the runtime your agent invokes. The skill is what teaches your agent how to invoke it correctly (the discover → submit → wait → extract pattern, the Qwen selectors, the wait strategy). They are two different things and you need both for the prompt-driven workflow.
The skill is a folder containing SKILL.md + skill.json + references/. The canonical source is the scrapeless-ai/scrapeless-agent-browser → skills/scraping-browser-skill repo on GitHub; the per-agent install commands are in the Scrapeless docs. Reload your agent after install so the skill becomes active.
4. Verify the install
Smoke-test with one safe prompt before touching Qwen:
"Using the Scrapeless skill, open https://example.com and tell me the page title."
Your agent should mint a session, open the page, and reply with "Example Domain". If that works, you are ready to drive Qwen Studio.
How you actually use this: prompt your agent
After install, you scrape Qwen by talking to your agent — not by copy-pasting bash. The skill loads the Qwen composer/send/answer selectors and the stream-completion check into the agent's context, so a one-line prompt returns clean answer JSON.
Prompts you can paste
| You say to your agent | What you get back |
|---|---|
| "Ask Qwen 'what is the capital of France?' and give me just the answer." | The answer string from the rendered message body |
| "Ask Qwen 'explain RAG in two sentences' and return JSON with answer + a reasoning flag." | { answer, reasoning, model } |
| "Run these 5 questions through Qwen and save qwen-eval.json." | One JSON file, one row per question/answer |
| "Ask Qwen 'what is the capital of France' in Chinese on a Singapore IP." | Session minted with --proxy-country SG, prompt sent in Chinese |
| "Ask Qwen 'latest on the James Webb telescope' and also grab any source links it cites." | { answer, citations: [...] } |
| "Did Qwen run a reasoning pass on 'prove that sqrt(2) is irrational'?" | reasoning: "Thought completed" or null |
Worked example: one Qwen answer as text
You type:
"Ask Qwen 'What is the capital of France? Answer in one short sentence.' and return just the answer as text."
The agent's plan (in plain English):
- Mint a residential session (US egress is a fine default for Qwen Studio).
- Open https://chat.qwen.ai/, then wait until the composer textarea.message-input-textarea is present.
- Type the question into the composer and click .message-input-right-button-send.
- Poll until the response footer (.copy-response-button) mounts; that means the stream finished.
- Read .response-message-content.phase-answer and return its text.
What you get back:
The capital of France is Paris.
That is the entire user-facing surface. The selector discovery, the completion poll, and the JSON shaping in Steps 1–4 below are what the skill makes the agent run — you do not type any of them.
Shaping prompts: how to control what comes back
| Phrasing | Effect |
|---|---|
| "…just the answer" / "…with a reasoning flag" | Which fields the agent returns |
| "…as JSON" / "…as plain text" | Output format |
| "…in Chinese" / "…in English" | Prompt language |
| "…on a Singapore IP" / "…from Germany" | Sets --proxy-country |
| "…save to qwen-eval.json" | Writes to file |
| "…run all 10 questions" | Loops — fresh session per question |
Steps 1–6 below are the under-the-hood reference — read them once to see how the open → submit → wait → extract pattern composes, then trust your agent to apply it. Scripting outside an agent works exactly as shown; the skill is just the faster path.
Step 1 — Connect to Scrapeless Scraping Browser
Mint a session with residential egress before opening any page. The proxy geography is fixed for the life of the session.
bash
SESSION=$(scrapeless-scraping-browser new-session \
  --name "qwen-us" \
  --ttl 1800 \
  --proxy-country US \
  --json | jq -r '.data.taskId')
echo "Session: $SESSION"
Portable fallback without jq:
bash
SESSION=$(scrapeless-scraping-browser new-session \
  --name "qwen-us" --ttl 1800 --proxy-country US --json \
  | grep -oE '"taskId":"[^"]*"' | cut -d'"' -f4)
US residential egress renders Qwen Studio cleanly. Qwen is a global product, so any stable residential country works; match the geography to the locale you want Qwen to answer in.
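The grep fallback matches only compact JSON with no space after the colon, and neither variant checks that an ID actually came back. A slightly more tolerant sketch, assuming the response keeps the taskId key shown above; extract_task_id is a hypothetical helper name, not part of the CLI:

```shell
# Hypothetical helper: pull the first "taskId" value out of JSON on stdin.
# Tolerates whitespace around the colon; assumes the ID has no escaped quotes.
extract_task_id() {
  grep -oE '"taskId"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | cut -d'"' -f4
}

# Usage sketch (commented out because it needs a live API key):
# SESSION=$(scrapeless-scraping-browser new-session \
#   --name "qwen-us" --ttl 1800 --proxy-country US --json | extract_task_id)
# [ -n "$SESSION" ] || { echo "no taskId in new-session output" >&2; exit 1; }
```

Failing on an empty SESSION keeps every later --session-id call from running against a blank ID.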
Step 2 — Open Qwen Studio and pick the right wait
Open chat.qwen.ai, then settle on a wait strategy. A chat app holds a live connection open for streaming, so --load networkidle rarely reaches a quiet window — it tends to hang. The reliable pattern is a fixed wait followed by a readiness check that counts the composer.
bash
scrapeless-scraping-browser --session-id "$SESSION" open "https://chat.qwen.ai/"
scrapeless-scraping-browser --session-id "$SESSION" wait 4000
# Readiness signal: the single composer textarea has mounted.
scrapeless-scraping-browser --session-id "$SESSION" eval \
  'document.querySelectorAll("textarea.message-input-textarea").length'  # expect 1
| Strategy | Behavior on Qwen Studio | Recommendation |
|---|---|---|
| wait --load networkidle | Streaming connection keeps the network busy; rarely settles | Avoid for chat.qwen.ai |
| wait 4000 (fixed) | Deterministic; the app has hydrated by then | Default |
| eval composer count === 1 | True readiness; the input is interactive | Use as the gate before typing |
The page title reads "Qwen Studio" and the banner shows the active model label (for example, Qwen3.7-Plus) next to a mode selector set to Auto. You do not need to change either to read an answer.
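A single eval right after the fixed wait can still miss a slow hydration, so the readiness gate can be wrapped in a bounded poll. A minimal sketch, assuming the CLI prints the eval result on the last line of its output (the same assumption the Step 4 poll makes); poll_until is a hypothetical helper name:

```shell
# Hypothetical helper: run a probe command until the last line of its output
# equals the expected value. Returns 0 on a match, 1 once attempts run out.
# Arguments: expected value, max attempts, delay in seconds, then the command.
poll_until() {
  expected=$1; attempts=$2; delay=$3; shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("$@" 2>/dev/null | tail -n 1 | tr -d '"') || out=""
    if [ "$out" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage sketch: gate typing on the composer count reaching 1.
# poll_until 1 10 1 scrapeless-scraping-browser --session-id "$SESSION" eval \
#   'document.querySelectorAll("textarea.message-input-textarea").length' \
#   || { echo "composer never mounted" >&2; exit 1; }
```

The same helper could gate Step 4 by probing for the copy control and expecting "done".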
Step 3 — Submit a prompt
Qwen Studio exposes exactly one composer, textarea.message-input-textarea. The send control arms only once the composer holds real keystrokes — a programmatic fill sets the value without firing the composer's input state, so the button stays inert. Type the prompt in character by character, give the UI a moment, then click send.
If you are wiring this against a fresh build, discover the selectors first rather than trusting these verbatim — Qwen rotates class names across releases:
bash
# Discover: confirm the composer and any send control before driving them.
scrapeless-scraping-browser --session-id $SESSION get html "main"
Then submit:
bash
PROMPT="What is the capital of France? Answer in one short sentence."
scrapeless-scraping-browser --session-id "$SESSION" type \
  "textarea.message-input-textarea" "$PROMPT"
# The send button arms once real keystrokes land in the composer.
scrapeless-scraping-browser --session-id "$SESSION" wait 600
scrapeless-scraping-browser --session-id "$SESSION" click ".message-input-right-button-send"
Pressing Enter in the focused composer submits the same prompt, so sending Enter after the type step is an equivalent path, handy when the send button has not armed yet.
On submit, Qwen moves the URL from / through /c/new-chat to /c/guest and renders your question as a user message, with the assistant reply mounting underneath.
Step 4 — Wait for the stream, then extract the answer
Qwen streams the reply token by token, so reading the DOM too early gives you a partial sentence. The clean completion signal is the per-message footer mounting — the copy control (.copy-response-button) appears only after the stream finishes. Poll for it, then run one extractor.
bash
# Completion poll: the copy control mounts when the answer is fully rendered.
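for i in 1 2 3 4 5 6 7 8; do
  DONE=$(scrapeless-scraping-browser --session-id "$SESSION" eval '
    document.querySelector(".chat-response-message .copy-response-button") ? "done" : "streaming"
  ' | tail -1 | tr -d '"')
  [ "$DONE" = "done" ] && break
  sleep 1
done
# Flag a timeout so a partial answer is never mistaken for a finished one.
[ "$DONE" = "done" ] || echo "warning: stream still running after 8 polls" >&2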
# Extract: answer body, reasoning flag, and any source links — guarded per field.
scrapeless-scraping-browser --session-id "$SESSION" eval '
(function () {
  const msg = document.querySelector(".qwen-chat-message-assistant, .chat-response-message");
  if (!msg) return JSON.stringify({ answer: null });
  const body = msg.querySelector(".response-message-content.phase-answer, .custom-qwen-markdown");
  const reasoning = msg.querySelector(".qwen-chat-thinking-status-card-title-text");
  const cites = Array.from(msg.querySelectorAll(".qwen-markdown a[href^=\"http\"]"))
    .map(a => ({ url: a.href, text: a.textContent.trim().slice(0, 80) }));
  return JSON.stringify({
    url: location.href,
    reasoning: reasoning ? reasoning.textContent.trim() : null,
    answer: body ? body.textContent.trim() : null,
    citations: cites,
  });
})()
'
On the verification run this returned the real answer text The capital of France is Paris., a reasoning value of Thought completed (the collapsed reasoning card was present), and an empty citations array — a short factual prompt does not push Qwen to ground with web sources.
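The extractor returns a JSON string, so depending on how the CLI prints eval results, the last output line may be an object or a quoted string wrapping one. A minimal sketch that handles both, assuming jq is installed and the result sits on the final line of output; decode_eval_json and has_answer are hypothetical helper names:

```shell
# Hypothetical helper: normalize the last line of eval output to a JSON object.
# Accepts either a bare object or a JSON-encoded string that wraps one.
decode_eval_json() {
  tail -n 1 | jq -c 'if type == "string" then fromjson else . end'
}

# Succeeds only when the decoded row carries a non-empty answer string.
has_answer() {
  jq -e '(.answer | type) == "string" and (.answer | length) > 0' >/dev/null
}

# Usage sketch:
# ROW=$(scrapeless-scraping-browser --session-id "$SESSION" eval "$EXTRACTOR_JS" | decode_eval_json)
# printf '%s\n' "$ROW" | has_answer || echo "no answer captured" >&2
```

Here EXTRACTOR_JS stands in for the extractor source above; it is a placeholder variable, not something the CLI defines.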
Selector notes:
- The assistant message wraps in .qwen-chat-message-assistant (also reachable via .chat-response-message); the answer text lives in .response-message-content.phase-answer, rendered as .qwen-markdown paragraphs.
- The reasoning card (.qwen-chat-thinking-status-card-title-text) is a sibling of the answer body, not a child of it; query it separately so the "Thought completed" label never lands in your answer field.
- Treat citations as nullable. The array is empty unless Qwen shows source links for a web-grounded prompt, and the key is absent when no assistant message was found.
Step 5 — Authenticated, multi-turn sessions (prerequisite)
The guest surface answers exactly one question. After the first reply, Qwen raises a "Welcome" modal — "Login or sign up to chat with Qwen, upload file and image, generation image or video, and more" — with Log in, Sign up, and Stay logged out buttons. "Stay logged out" lets you keep reading the single answer you already have, but a second turn, conversation history, file upload, and image or video generation all require an account.
That login wall is a prerequisite, not something to fake. To extract a multi-turn conversation:
- Persist the login state, because a session ends when its connection closes. Inject the Qwen session cookies you exported from an account you own with the CLI's cookies set, or save the credentials once to the Auth Vault (auth save <name>) and replay them with auth login <name> on a fresh session; see the Scrapeless docs for the cookie and auth flags.
- Reuse that state on every call instead of re-authenticating per turn.
- Drive the conversation with the same type → click → wait → extract loop from Steps 3–4; each new turn appends another .qwen-chat-message-assistant node you read the same way. For a multi-turn session that must outlive a single connection, the @scrapeless-ai/sdk TypeScript path holds the persistent connection the CLI does not.
Keep credentials in environment variables or your secret manager, never in the script. A single-turn guest answer needs none of this; reach for the authenticated flow only when the pipeline genuinely needs more than one turn.
Step 6 — Scaling: isolate per-worker CLI state
Running several Qwen jobs at once on one host needs care, because the CLI shares daemon state across shells. The primitives that hold up under parallel load:
- Single-shell chaining. Run a job's whole sequence in one atomic shell invocation so other workers cannot interleave between your steps. One caveat on the chain operator: open can exit non-zero even on a successful navigation (some pages make the underlying page.goto throw after the page is already usable), so separate it with ; rather than && and probe state with eval 'location.href' instead of gating on its exit code: new-session && open "https://chat.qwen.ai/" ; wait 4000 && type … && click … && eval …. That single-shell atomicity is the load-bearing primitive.
- Unique session names per worker. The daemon shares state across shells, so a unique session name keeps one worker's calls from routing into another's session.
- Cap at ~3 concurrent workers per host. Beyond that, contention compounds. For more fan-out, shard workers across separate hosts, because daemon state is per-host, not per-account.
For a steady eval pipeline, sequential-per-host is simple and plenty: one Qwen question at a time, queue the rest.
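If you do run workers in parallel, the cap and the unique names can be enforced in the launcher itself. A minimal sketch, assuming one question per line in an input file; session_name, run_job, and fan_out are hypothetical names, and run_job is a placeholder that only prints its arguments where a real job would run the chained CLI sequence:

```shell
MAX_WORKERS=3   # the per-host cap suggested above

# Build a worker-unique session name from the shell PID and the job index.
session_name() {
  printf 'qwen-%s-%s' "$$" "$1"
}

# Placeholder job ($1 = session name, $2 = question). A real job would run the
# whole chain here: new-session ... ; open ... ; wait 4000 && type ... && eval ...
run_job() {
  printf '%s\t%s\n' "$1" "$2"
}

# Read one question per line from the file in $1 and run jobs in batches of
# MAX_WORKERS, draining each batch before the next one starts.
fan_out() {
  n=0
  while IFS= read -r question; do
    n=$((n + 1))
    run_job "$(session_name "$n")" "$question" &
    if [ $((n % MAX_WORKERS)) -eq 0 ]; then
      wait
    fi
  done < "$1"
  wait
}

# Usage sketch: fan_out questions.txt > rows.tsv
```

Batching is cruder than a true worker pool, since one slow job holds up its whole batch, but it never exceeds the cap and needs nothing beyond the shell.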
What You Get Back
The Step 4 extractor returns url, reasoning, answer, and citations; the query, model, and authenticated fields are enriched around it — the prompt you sent, the banner model label read in Step 2, and whether the session was logged in. Every value below is from a real capture.
json
{
  "query": "What is the capital of France? Answer in one short sentence.",
  "url": "https://chat.qwen.ai/c/guest",
  "model": "Qwen3.7-Plus",
  "reasoning": "Thought completed",
  "answer": "The capital of France is Paris.",
  "citations": [],
  "authenticated": false
}
Honest observations:
- The answer streams in, so read it only after the response footer mounts (Step 4); otherwise you capture a half-written sentence.
- reasoning is null for many short factual prompts. Qwen only renders the "Thought completed" card when it ran a reasoning pass, so use it as a signal, not a guarantee.
- citations stays empty unless the prompt pushes Qwen to ground its answer with web sources. Guard the field as nullable downstream.
- authenticated: false is a valid state, not a failure. The guest surface answers one question, and the login wall (Step 5) is the boundary for anything more.
- The banner model label reflects whichever model Qwen Studio defaulted to; record it per row so a model swap does not silently mix your eval set.
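The enrichment step described in this section can be sketched with jq. This assumes the Step 4 extractor row arrives as a JSON object on stdin; enrich_row is a hypothetical helper name, and captured_at is an extra field added here for drift tracking, not part of the capture shown above:

```shell
# Hypothetical helper: wrap the extractor row (JSON object on stdin) with the
# prompt, the banner model label, a login flag, and a UTC capture timestamp.
# Arguments: $1 = prompt, $2 = model label, $3 = true or false.
enrich_row() {
  jq -c --arg query "$1" --arg model "$2" --argjson authenticated "$3" \
    --arg captured_at "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    '{query: $query, model: $model, authenticated: $authenticated, captured_at: $captured_at} + .'
}

# Usage sketch: append one row per question to a JSON Lines file.
# printf '%s\n' "$ROW" | enrich_row "$PROMPT" "$MODEL" false >> qwen-eval.jsonl
```

Passing the flag through --argjson keeps authenticated a real boolean rather than the string "false".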
Conclusion
Scraping Qwen comes down to treating chat.qwen.ai as the streaming app it is: mint a residential cloud session, open Qwen Studio, type into the one composer, wait for the answer footer to mount, and read the reply out of the hydrated DOM. Guard every field on a real signal — the answer body separate from the reasoning card, citations as nullable — so your schema stays trustworthy when Qwen ships a UI change. Remember the boundary: a guest session is good for one clean answer, and everything past it lives behind the login wall, so reach for the authenticated, cookie-persisted flow only when multi-turn is genuinely required. For the search-and-AI-answer side of the same pattern, see how to scrape Google Search results with the Scrapeless Scraping Browser, and compare runtimes and plans on the pricing page.
Ready to Build Your AI-Powered Data Pipeline?
Join our community to claim a free plan and connect with developers building LLM-answer extraction pipelines: Discord · Telegram.
Sign up at app.scrapeless.com for free Scraping Browser runtime and adapt the patterns above to the Qwen prompts, locales, and authenticated conversations your pipeline needs.
FAQ
Q: Is scraping Qwen legal?
A: Collecting publicly visible answers for analytics, model evaluation, and research is broadly permitted in most jurisdictions, but Qwen's and Alibaba's terms of service still apply, and laws vary by region. Review the target terms before commercial deployment and consult counsel, especially around storing generated content or anything that touches personal data.
Q: Do I need a proxy?
A: Yes. chat.qwen.ai personalizes and rate-limits by IP, and a single raw IP attracts throttling quickly. Pin residential egress with --proxy-country (Step 1); match the country to the locale you want Qwen to answer in.
Q: Qwen shows a "Welcome / Log in" wall — how do I get a clean render?
A: For a single answer, the guest surface works without login — warm the session by opening https://chat.qwen.ai/ first and confirming the composer mounted (Step 2) before you type, and keep residential egress pinned. Only the multi-turn path needs an account; that is the authenticated prerequisite in Step 5, where injected cookies or the Auth Vault hold the login across calls.
Q: The selectors stopped matching after a Qwen update — what now?
A: Qwen Studio rotates class names across releases. Re-discover the live DOM with get html "main" and tighten your selectors against what is actually rendered. Lean on the stable anchors: the single composer textarea.message-input-textarea, the assistant wrapper .qwen-chat-message-assistant, and the answer body .response-message-content.phase-answer.
Q: How many Qwen sessions can I run in parallel?
A: Keep it to about three workers per host, chain each job's CLI calls in a single shell, and give every worker a unique session name (Step 6). For more throughput, shard across hosts rather than stacking workers on one.
Q: Can I do this without an AI agent?
A: Yes. The bash above runs end to end on its own. The skill simply lets your agent drive the same open → submit → wait → extract loop from a one-line prompt, which is the recommended path but not a requirement.
Q: How do I capture Qwen's reasoning trace?
A: Detect the reasoning card with .qwen-chat-thinking-status-card-title-text; the collapsed "Thought completed" label marks that a reasoning pass ran. Query it separately from .response-message-content.phase-answer so the reasoning text and the final answer stay in distinct fields, and expand the card if you need the chain itself.
Q: Will Qwen answer in Chinese or English?
A: It depends on the prompt language and the account or proxy locale. Ask in the target language and pin a matching --proxy-country to keep answers consistent across a multilingual eval set.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.



