How to Monitor Amazon Rufus Recommendations Over Time
Expert Network Defense Engineer
Key Takeaways:
- Amazon Rufus recommendations are structured product data, not a chat transcript. One call to the scraper.amazon actor with type: rufus returns a products array, each entry carrying an ASIN, title, price, rating, and the section label Rufus grouped it under.
- Share-of-recommendation is the metric this pipeline produces. Tracking which ASINs Rufus surfaces for a fixed query set over time turns a conversational shelf into a measurable visibility signal: the Rufus equivalent of share-of-voice.
- Rufus splits its picks into labelled sections. The response groups products across content_blocks such as "Top Picks – Best ANC" and "Great Value Picks", so every product carries both an overall rank and the section heading that framed it.
- related_questions expands the seed query set. Each capture returns the follow-up questions Rufus suggests for the query, and those can feed the next run's query list.
- Every field is nullable and the answer is generated per session. A query can return no Rufus answer for a given region, and a product slot can arrive without a bought count, so the pipeline stores per-run snapshots and reads the series, never a single call.
- The pipeline reduces to append-only snapshots plus a diff. Each run writes one JSONL record keyed by query and capture time; diffing consecutive records reports which ASINs entered or dropped between runs.
- Free to start. New Scrapeless accounts include free trial credits; sign up at app.scrapeless.com.
Introduction: the buying shelf moved into the assistant
Amazon Rufus answers shopping questions with a ranked list of products. A shopper asks for the best noise-cancelling headphones, and Rufus returns grouped picks ("Top Picks", "Best for Apple Users", "Great Value Picks"), each with a price, a rating, and a buy link, inside the assistant and before any search-results page loads. For a brand, the question is no longer where a product ranks on the results grid; it is whether Rufus names the product at all, in which section, and at what rank.
That shelf is hard to watch over time. Rufus generates its answer per session, the picks change by query and by region, and the product cards resolve inside a conversational surface that fights automation. Reading it by eye once tells you nothing; the signal is how the recommendation set moves week over week.
This guide builds a monitoring pipeline on top of the Scrapeless Scraping API: a fixed query set goes in, the scraper.amazon actor returns Rufus's recommendations as structured products, and the pipeline extracts ASIN, rank, and section, maps each ASIN to a brand, stores a per-run snapshot, and diffs the snapshots to report share-of-recommendation. It pairs naturally with brand visibility in Google AI Overviews, which tracks the same recommendation question on the search side.
Pipeline at a glance
The whole system is six stages, end to end:
- Define a query set: a fixed list of buying questions, expanded by the related_questions each capture returns.
- Capture per query: POST each query to the scraper.amazon actor with type: rufus; an unsupported region returns no Rufus answer, which the pipeline records and skips.
- Extract products: walk content_blocks, pulling each product's ASIN, overall rank, and section label.
- Map ASIN → brand: resolve a brand from the product title with a small heuristic so share can be aggregated above the ASIN level.
- Store a per-run snapshot: append one JSONL record per query keyed by query and capture time; never overwrite.
- Diff over time: compare consecutive snapshots to report share-of-recommendation and which ASINs entered or dropped.
Capturing, extracting, and brand mapping run on every query in every cycle; storing and diffing turn those captures into a time series. The sections below build each stage in order, with brand mapping covered inside Stage 3.
Prerequisites
- Python 3.10 or newer (the code below uses only the standard library plus requests)
- A Scrapeless account and API key: sign up at app.scrapeless.com
- The key exported as SCRAPELESS_API_KEY
- Basic familiarity with the terminal and JSON
Stage 1: Define the query set
A monitoring program is only as good as its query set. Start with the buying questions that matter for the category you track, phrased the way a shopper asks Rufus, with clear purchase intent.
```python
SEED_QUERIES = [
    "best noise cancelling headphones",
    "best wireless earbuds for travel",
    "best budget over-ear headphones",
]
```
Each Rufus capture also returns a related_questions list: the follow-ups Rufus suggests for that query. Feeding those back into the set lets the query list grow toward the questions shoppers actually ask, instead of staying frozen at your initial guesses.
```python
def expand_queries(result: dict) -> list[str]:
    """Pull the follow-up questions Rufus suggested for a captured query."""
    return result.get("related_questions") or []
```
For the headphones query, that field came back as ["Compare Sony XM6 vs Bose QC Ultra 2", "Best headphones for travel and flights", "Best noise cancelling earbuds instead", "Are any of these on sale?"]. Add the questions you want to track to the seed set and dedupe; treat the rest as candidates to review before promoting them into the monitored set.
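The merge-and-dedupe step can be written out. The following is a minimal sketch. It assumes queries are compared case-insensitively after collapsing whitespace, and that new follow-ups go to a review queue rather than straight into the monitored set. new_candidates is a hypothetical helper, not part of the Scrapeless API.

```python
def new_candidates(monitored: list[str], related: list[str]) -> list[str]:
    """Return follow-up questions not already monitored, deduplicated, in original order.

    Hypothetical helper: queries are compared case-insensitively after
    collapsing runs of whitespace; the result is a review queue, not a
    replacement for the monitored set.
    """

    def norm(query: str) -> str:
        # "Best  Noise Cancelling" and "best noise cancelling" compare equal.
        return " ".join(query.split()).lower()

    seen = {norm(q) for q in monitored}
    candidates: list[str] = []
    for question in related:
        key = norm(question)
        if not key or key in seen:
            continue  # skip blanks, already-monitored queries, and repeats
        seen.add(key)
        candidates.append(question.strip())
    return candidates
```

A typical call is new_candidates(SEED_QUERIES, expand_queries(result)). Promoting items from that list into SEED_QUERIES stays a manual decision.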
Stage 2: Capture Rufus for each query
A single POST to /api/v1/scraper/request with the scraper.amazon actor and type: rufus returns the recommendation set. The actor renders the Rufus surface server-side and parses it into a JSON response, so there is no browser, proxy, or session to manage on your side.
```bash
# Amazon Rufus recommendations via the Scrapeless Scraping API (scraper.amazon, type: rufus).
# Requires SCRAPELESS_API_KEY in the environment.
curl -sS -X POST https://api.scrapeless.com/api/v1/scraper/request \
  -H "Content-Type: application/json" \
  -H "x-api-token: ${SCRAPELESS_API_KEY}" \
  -d '{
    "actor": "scraper.amazon",
    "input": {
      "type": "rufus",
      "keywords": "best noise cancelling headphones",
      "domain": "www.amazon.com"
    }
  }'
# Pipe to: | jq '.result.products' for the flat recommendation list.
```
The same call in Python reads the key from the environment and returns the result object. A query can come back without an answer: for an unsupported store, Amazon returns a region failure rather than recommendations. Treat an empty result as no Rufus answer for this query and region. Record it and move on to the next query; it is not something to re-send within the run.
```python
import os

import requests

ENDPOINT = "https://api.scrapeless.com/api/v1/scraper/request"


def capture_rufus(query: str, domain: str = "www.amazon.com") -> dict:
    """POST one Rufus query to the scraper.amazon actor and return its result object."""
    resp = requests.post(
        ENDPOINT,
        headers={
            "Content-Type": "application/json",
            "x-api-token": os.environ["SCRAPELESS_API_KEY"],
        },
        json={"actor": "scraper.amazon", "input": {"type": "rufus", "keywords": query, "domain": domain}},
        timeout=180,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    # Normalise a missing or null result to an empty dict.
    return resp.json().get("result", {}) or {}
```
If the region you target has no Rufus surface, the actor reports the failure for that store and the result is empty. Pin the region you actually want to monitor and compare like with like, because a US run and a non-US run are different datasets.
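One way to keep regions from mixing is to store the domain on every record, misses included, and filter on it when reading. The following is a minimal sketch. It assumes the Stage 4 snapshot record is extended with two hypothetical fields: a domain, and a status of "ok" or "skip". Those fields, and the append_record and load_region names, are additions made by this sketch. The actor does not return them.

```python
import json
import time


def append_record(path: str, query: str, domain: str, rows: list[dict]) -> dict:
    """Append one JSONL record; an empty rows list is stored as an explicit skip.

    Hypothetical schema: 'domain' and 'status' are added by this sketch.
    """
    record = {
        "query": query,
        "domain": domain,
        "captured_at": int(time.time()),
        "status": "ok" if rows else "skip",
        "products": rows,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_region(path: str, domain: str) -> list[dict]:
    """Read back only the records captured against one pinned domain."""
    with open(path, encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    return [r for r in records if r.get("domain") == domain]
```

Writing the skip as its own record means a missing answer shows up in the history, rather than as a silent gap between two dates.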
Stage 3: Extract products with rank and section
The recommendation set lives in two places in the response. result.products is the flat list of every recommended item; result.content_blocks is the same items grouped into the labelled sections Rufus rendered (type: "product_section", each with a category heading and its own products array). Reading the blocks preserves both the overall rank and the section that framed each pick.
```python
def extract_products(result: dict) -> list[dict]:
    """Flatten Rufus content_blocks into rows of asin, rank, section, brand."""
    rows = []
    rank = 0  # overall rank across all sections, in rendered order
    for block in result.get("content_blocks") or []:
        if block.get("type") != "product_section":
            continue  # only product sections carry recommendations
        section = block.get("category")
        for product in block.get("products") or []:
            asin = product.get("asin")
            title = product.get("title")
            if not asin or not title:  # half-resolved slot: treat as nullable, skip
                continue
            rank += 1
            rows.append({
                "asin": asin,
                "title": title,
                "section": section,
                "rank": rank,
                "price": product.get("price"),
                "rating": product.get("rating"),
                "brand": brand_from_title(title),  # defined in the next subsection
            })
    return rows
```
The bought, original_price, and delivery fields are present on some products and absent on others, so read each with .get() and treat a missing field as nullable rather than assuming it is there.
Map ASIN to brand
Share-of-recommendation is more useful aggregated to the brand than left at the ASIN, because one brand often appears several times across sections. A title-based heuristic covers the common case: match a known brand if the title names one, otherwise fall back to the first token of the title.
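```python
KNOWN_BRANDS = ("Sony", "Bose", "Apple", "Sennheiser", "Beats", "JBL", "Anker")


def brand_from_title(title: str | None) -> str | None:
    """Return a known brand named in the title, else the title's leading word."""
    if not title:
        return None
    lowered = title.lower()
    for brand in KNOWN_BRANDS:  # first match in list order wins
        if brand.lower() in lowered:
            return brand
    words = title.split()
    return words[0] if words else None  # fallback: leading word of the title
```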
Keep the KNOWN_BRANDS list scoped to the category you monitor; the fallback handles the long tail without a lookup table.
Get your API key on the free plan: app.scrapeless.com
Stage 4: Store a per-run snapshot
The pipeline is append-only: every capture writes one JSONL record keyed by the query and the capture time, and nothing is ever overwritten. That gives you the full history to diff against, and it means a bad or empty run never destroys an earlier good one.
```python
import json
import time


def append_snapshot(path: str, query: str, rows: list[dict]) -> dict:
    """Append one snapshot record to the JSONL history and return it."""
    record = {"query": query, "captured_at": int(time.time()), "products": rows}
    # Mode "a" appends, so earlier snapshots are never overwritten.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```
Using an integer epoch for captured_at keeps each record self-describing and sortable without a separate index. To load the history back for a given query, read the file line by line and filter on the query key β one pass yields every snapshot in capture order.
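That read path can be sketched minimally, assuming the record shape written by append_snapshot. load_snapshots is a hypothetical name. Skipping unparseable lines, such as a half-written final line from an interrupted run, is a defensive choice made by this sketch, not one the pipeline above specifies.

```python
import json


def load_snapshots(path: str, query: str) -> list[dict]:
    """Return every stored snapshot for one query, oldest first.

    Blank lines and lines that fail to parse are skipped rather than raising.
    """
    snapshots = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if record.get("query") == query:
                snapshots.append(record)
    # Appends are already in capture order; the stable sort guards against merged files.
    return sorted(snapshots, key=lambda r: r.get("captured_at", 0))
```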
Stage 5: Diff and report share-of-recommendation
Two read-side functions turn the snapshot history into the metrics. Share-of-recommendation counts how often each brand appears across a run and normalizes to a percentage; the diff compares two runs' ASIN sets to show movement.
```python
from collections import Counter


def share_of_recommendation(rows: list[dict]) -> dict[str, float]:
    """Percentage of recommended slots per brand, largest first."""
    counts = Counter(row["brand"] for row in rows if row.get("brand"))
    total = sum(counts.values())
    if not total:
        return {}
    return {brand: round(100 * n / total, 1) for brand, n in counts.most_common()}


def diff_runs(prev_rows: list[dict], curr_rows: list[dict]) -> dict[str, list[str]]:
    """ASINs that entered or dropped between two runs of the same query."""
    prev = {row["asin"] for row in prev_rows}
    curr = {row["asin"] for row in curr_rows}
    return {
        "entered": sorted(curr - prev),
        "dropped": sorted(prev - curr),
    }
```
Run share_of_recommendation per query to see which brands own the conversational shelf for that question, or across the whole query set for a category-wide view. Run diff_runs between a query's two most recent snapshots to catch the week a brand entered the picks or fell out of them; that is the moment worth alerting on.
Scheduling and scaling
The capture loop ties the stages together: for each query, capture, extract, snapshot, and record an empty result as a skip. Run it on a schedule β daily or weekly β and the JSONL file becomes the time series.
```python
if __name__ == "__main__":
    snapshot_path = "rufus_snapshots.jsonl"
    for query in SEED_QUERIES:
        result = capture_rufus(query)
        rows = extract_products(result)
        if not rows:
            # Empty result: report it as a skip and move on to the next query.
            print(f"{query}: no Rufus answer for this query/region")
            continue
        append_snapshot(snapshot_path, query, rows)
        share = share_of_recommendation(rows)
        leaders = ", ".join(f"{b} {pct}%" for b, pct in list(share.items())[:3])
        print(f"{query}: {len(rows)} products; top brands: {leaders}")
```
A few practical bounds when you scale the query set:
- Keep concurrency modest: a handful of queries in flight at once is plenty; a monitoring run is steady, not a burst.
- Pin the region per run so the series stays comparable; a query that returns no Rufus answer in one region is recorded as a skip, not mixed into another region's numbers.
- Scope the query set to what you act on. Each query is a billable call, so monitor the questions that drive decisions and let related_questions suggest the next ones to add. Plan the cadence against the Scrapeless pricing tiers.
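The standard library's ThreadPoolExecutor can keep "a handful" of captures in flight. The following is a minimal sketch. The default of four workers is an arbitrary choice, not a documented Scrapeless limit. capture is expected to be a function like capture_rufus. capture_all and the {"error": ...} entry it stores are hypothetical.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def capture_all(
    queries: list[str],
    capture: Callable[[str], dict],
    max_workers: int = 4,
) -> dict[str, dict]:
    """Run capture(query) for every query with at most max_workers calls in flight.

    A failing query is recorded as {"error": ...} instead of aborting the run.
    """
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {query: pool.submit(capture, query) for query in queries}
        for query, future in futures.items():
            try:
                results[query] = future.result()
            except Exception as exc:  # e.g. HTTP errors raised by raise_for_status()
                results[query] = {"error": str(exc)}
    return results
```

Check for the error key before passing a result to extract_products. Otherwise a failed request would be logged the same way as a genuine no-answer skip.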
What You Get Back
Each capture yields the flat products list plus the section-grouped content_blocks; the pipeline reduces both to snapshot rows and a share table. The shape below is what the actor returns, trimmed to one section.
Schema is what scraper.amazon (type: rufus) returns; field values are an illustrative sample from a live run (sections and products trimmed).
```json
{
  "metadata": { "type": "rufus", "rawUrl": "https://…" },
  "result": {
    "user_query": "best noise cancelling headphones",
    "content_blocks": [
      {
        "type": "product_section",
        "category": "Top Picks – Best ANC",
        "products": [
          {
            "asin": "B0GN4CFF6H",
            "title": "Sony WH-1000XM6/B Wireless Noise Canceling Headphones",
            "price": "$398.00",
            "original_price": "$428.00",
            "rating": "4.5",
            "reviews": "238",
            "delivery": "FREE delivery Fri, Jun 19",
            "url": "https://…"
          }
        ]
      }
    ],
    "products": [
      { "asin": "B0GN4CFF6H", "title": "Sony WH-1000XM6/B …", "category": "Top Picks – Best ANC" }
    ],
    "related_questions": ["Compare Sony XM6 vs Bose QC Ultra 2", "Are any of these on sale?"]
  }
}
```
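In the sample above, each entry in the flat products list also carries a category. If a response arrives without content_blocks, the same rows can therefore be rebuilt from result.products. The following is a minimal sketch of that fallback. It assumes the flat list follows the order in which Rufus rendered the picks, which the sample does not confirm. extract_from_flat is a hypothetical helper, and brand mapping can be added to each row exactly as in Stage 3.

```python
def extract_from_flat(result: dict) -> list[dict]:
    """Build asin/rank/section rows from result.products when content_blocks is absent."""
    rows = []
    for product in result.get("products") or []:
        asin = product.get("asin")
        title = product.get("title")
        if not asin or not title:
            continue  # same nullable-slot rule as the content_blocks walk
        rows.append({
            "asin": asin,
            "title": title,
            "section": product.get("category"),
            "rank": len(rows) + 1,  # position among the rows kept so far
        })
    return rows
```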
After Stages 3 through 5, one snapshot record and its share table look like this:
Pipeline output, illustrative sample (schema real, ASIN/brand/rank values illustrative): one append-only snapshot row plus the share-of-recommendation tally.
```json
{
  "snapshot": {
    "query": "best noise cancelling headphones",
    "captured_at": 1781716376,
    "products": [
      { "asin": "B0GN4CFF6H", "rank": 1, "section": "Top Picks – Best ANC", "brand": "Sony" },
      { "asin": "B0FDKR293G", "rank": 2, "section": "Top Picks – Best ANC", "brand": "Bose" },
      { "asin": "B0GSS4SGZR", "rank": 3, "section": "Best for Apple Users", "brand": "Apple" }
    ]
  },
  "share_of_recommendation": { "Sony": 33.3, "Bose": 33.3, "Apple": 33.3 }
}
```
A few honest observations from running it:
- The answer is per session. The same query can return a different recommendation set, and different related_questions, from one run to the next. Store captured_at on every record; the series over time is the signal, not any single capture.
- Sections frame the rank. A product's section (content_blocks[].category) explains why it ranked where it did: "Best for Apple Users" is a different shelf than "Great Value Picks". Carry the section, not just the position.
- Fields are nullable. bought, original_price, and delivery appear on some products and not others; a half-resolved slot can arrive without a title or asin. Read optional fields with .get(), and skip rows missing a title or asin rather than storing blanks.
- Region decides whether there is an answer at all. An unsupported store returns a region failure instead of products. Pin the region you monitor and record the misses as skips.
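A single capture is one draw from a per-session answer. A steadier read is how often each ASIN appears across several runs of the same query. The following is a minimal sketch. It assumes an oldest-first list of snapshot records for one query. presence_rate is a hypothetical helper, and the default window of 7 runs is an arbitrary choice.

```python
from collections import Counter


def presence_rate(snapshots: list[dict], window: int = 7) -> dict[str, float]:
    """Fraction of the last `window` snapshots in which each ASIN was recommended."""
    recent = snapshots[-window:]
    if not recent:
        return {}
    seen = Counter()
    for snap in recent:
        # Count each ASIN at most once per snapshot, even if it repeats across sections.
        seen.update({row["asin"] for row in snap.get("products") or []})
    return {asin: round(n / len(recent), 2) for asin, n in seen.most_common()}
```

A skip record with no products still counts as a run in the window. Filter skips out first if a missing answer should not dilute the rate.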
Handling this responsibly
This pipeline reads only the public product recommendations Rufus shows any shopper β ASINs, titles, prices, ratings, and the section labels. Keep it to that public surface: collect no personal data and no account-gated content, respect Amazon's terms of service and robots directives, and store only the product fields the monitoring program needs. Share-of-recommendation is a brand-visibility metric built from public listings, nothing more.
Conclusion: a conversational shelf as a time series
Monitoring Amazon Rufus reduces to one loop: capture each query against the scraper.amazon actor with type: rufus, extract ASIN plus rank plus section from content_blocks, map each ASIN to a brand, append a per-run snapshot, and diff the snapshots for share-of-recommendation. Pin the region, treat every field as nullable, record a missing answer as a skip, and let related_questions grow the query set. The same monitoring shape applies to the other answer surfaces: pairing Rufus with scraping Google's AI Overviews gives one program across both shopping assistants and search. The actor, endpoint, and field names here are confirmed against the live Scrapeless Scraping API reference.
Ready to Build Your AI-Recommendation Monitoring Pipeline?
Join our community to claim a free plan and connect with developers building AI-answer data pipelines: Discord · Telegram.
Sign up at app.scrapeless.com for free trial credits and point the query set above at the Rufus categories and regions your brand-visibility program tracks.
FAQ
Q: Is monitoring Amazon Rufus recommendations legal?
The data captured is the publicly visible product recommendations Rufus shows any shopper. As with any scraping, the legality depends on jurisdiction and use. Review the relevant terms and consult counsel before building on it, and collect only public product data, never personal or account-gated data.
Q: Why does a query return no Rufus answer?
Two causes. The query may not be transactional enough β phrase it as a buying question with clear product intent. Or the region you targeted has no Rufus surface, in which case the actor reports a region failure for that store; pin a supported region and record the miss as a skip.
Q: Do I need a proxy or a browser?
No. Rendering, region handling, and parsing run server-side. You send one POST with an x-api-token header and read JSON back; the actor returns the recommendation set already structured.
Q: How do I get the rank and the section a product appeared in?
Walk result.content_blocks: each product_section block carries a category heading and its own products array. Counting products as you flatten the blocks gives the overall rank, and the block's category gives the section β both worth storing per snapshot.
Q: What is share-of-recommendation?
It is the percentage of recommended slots a brand occupies across a query or a query set, aggregated from the captured products: the conversational-shopping equivalent of share-of-voice. Tracked over time, it shows whether a brand is gaining or losing presence on the Rufus shelf.
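Written out for one run (or a pooled query set), with n_b the number of recommended slots whose product maps to brand b:

```latex
\mathrm{SoR}_b = 100 \cdot \frac{n_b}{\sum_{b'} n_{b'}}
```

Slots with no resolved brand drop out of both the numerator and the denominator, which matches how share_of_recommendation skips rows without a brand. With three slots split one each across Sony, Bose, and Apple, each brand gets 100 · 1/3 ≈ 33.3.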
Q: Why store snapshots instead of a single current view?
Rufus generates its answer per session, so any one capture is a point in time. Append-only snapshots keyed by query and capture time give you the history to diff, so you can report which ASINs entered or dropped between runs rather than guessing from a single response.
Q: How many queries can I monitor at once?
Keep concurrency modest: a handful of queries in flight is enough for a steady monitoring run. Scope the set to the questions you act on and let related_questions suggest the next ones to add, so each billable call earns its place.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.



