How to Monitor Amazon Rufus Recommendations Over Time

Michael Lee

Expert Network Defense Engineer

17-Jun-2026

Key Takeaways:

Amazon Rufus recommendations are structured product data, not a chat transcript. One call to the scraper.amazon actor with type: rufus returns a products array, each entry carrying an ASIN, title, price, rating, and the section label Rufus grouped it under.
Share-of-recommendation is the metric this pipeline produces. Tracking which ASINs Rufus surfaces for a fixed query set over time turns a conversational shelf into a measurable visibility signal — the Rufus equivalent of share-of-voice.
Rufus splits its picks into labelled sections. The response groups products across content_blocks such as "Top Picks – Best ANC" and "Great Value Picks", so every product carries both an overall rank and the section heading that framed it.
related_questions expands the seed query set on its own. Each capture returns the follow-up questions Rufus suggests for the query, and those feed straight back into the next run's prompt list.
Every field is nullable and the answer is generated per session. A query can return no Rufus answer for a given region, and a product slot can arrive without a bought count — so the pipeline stores per-run snapshots and reads the series, never a single call.
The pipeline reduces to append-only snapshots plus a diff. Each run writes one JSONL record keyed by query and capture time; diffing consecutive records reports which ASINs entered or dropped between runs.
Free to start. New Scrapeless accounts include free trial credits — sign up at app.scrapeless.com.

Introduction: the buying shelf moved into the assistant

Amazon Rufus answers shopping questions with a ranked list of products. A shopper asks for the best noise-cancelling headphones, and Rufus returns grouped picks — "Top Picks", "Best for Apple Users", "Great Value Picks" — each with a price, a rating, and a buy link, inside the assistant and before any search-results page loads. For a brand, the question is no longer where a product ranks on the results grid; it is whether Rufus names the product at all, in which section, and at what rank.

That shelf is hard to watch over time. Rufus generates its answer per session, the picks change by query and by region, and the product cards resolve inside a conversational surface that fights automation. Reading it by eye once tells you nothing; the signal is how the recommendation set moves week over week.

This guide builds a monitoring pipeline on top of the Scrapeless Scraping API. A fixed query set goes in, and the scraper.amazon actor returns Rufus's recommendations as structured products. The pipeline then extracts ASIN, rank, and section, maps each ASIN to a brand, stores a per-run snapshot, and diffs the snapshots to report share-of-recommendation. It pairs naturally with tracking brand visibility in Google AI Overviews, which asks the same recommendation question on the search side.

Pipeline at a glance

The whole system is six steps, end to end:

1. Define a query set: a fixed list of buying questions, expanded by the related_questions each capture returns.
2. Capture per query: POST each query to the scraper.amazon actor with type: rufus; an unsupported region returns no Rufus answer, which the pipeline records and skips.
3. Extract products: walk content_blocks, pulling each product's ASIN, overall rank, and section label.
4. Map ASIN → brand: resolve a brand from the product title with a small heuristic so share can be aggregated above the ASIN level.
5. Store a per-run snapshot: append one JSONL record per query keyed by query and capture time; never overwrite.
6. Diff over time: compare consecutive snapshots to report share-of-recommendation and which ASINs entered or dropped.

Steps 1–4 run on every query in every cycle; steps 5–6 turn those captures into a time series. The sections below build them in order. Brand mapping (step 4) is handled inside Stage 3, so from there the stage headings run one number behind the steps above.

Prerequisites

Python 3.10 or newer (the code below uses only the standard library plus requests)
A Scrapeless account and API key — sign up at app.scrapeless.com
The key exported as SCRAPELESS_API_KEY
Basic familiarity with the terminal and JSON

Stage 1 — Define the query set

A monitoring program is only as good as its query set. Start with the buying questions that matter for the category you track — phrase them the way a shopper asks Rufus, with clear purchase intent.

python Copy

SEED_QUERIES = [
    "best noise cancelling headphones",
    "best wireless earbuds for travel",
    "best budget over-ear headphones",
]

Each Rufus capture also returns a related_questions list — the follow-ups Rufus suggests for that query. Feeding those back into the set lets the query list grow toward the questions shoppers actually ask, instead of staying frozen at your initial guesses.

python Copy

def expand_queries(result: dict) -> list[str]:
    """Pull the follow-up questions Rufus suggested for a captured query."""
    return result.get("related_questions") or []

For the headphones query, that field came back as ["Compare Sony XM6 vs Bose QC Ultra 2", "Best headphones for travel and flights", "Best noise cancelling earbuds instead", "Are any of these on sale?"]. Add the questions you want to track to the seed set and dedupe; treat the rest as candidates to review before promoting them into the monitored set.
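
One way to fold those follow-ups back in is sketched below. It assumes that a case-insensitive, whitespace-normalized match is a good enough definition of "duplicate"; the helper names normalize and merge_candidates are illustrative and not part of the Scrapeless API.

```python
def normalize(question: str) -> str:
    """Collapse whitespace and case so near-identical questions compare equal."""
    return " ".join(question.lower().split())


def merge_candidates(monitored: list[str], related: list[str]) -> list[str]:
    """Return related questions not already monitored, deduped, in first-seen order."""
    seen = {normalize(q) for q in monitored}
    candidates = []
    for question in related:
        key = normalize(question)
        if key and key not in seen:
            seen.add(key)
            candidates.append(question.strip())
    return candidates


monitored = ["best noise cancelling headphones", "best wireless earbuds for travel"]
related = [
    "Compare Sony XM6 vs Bose QC Ultra 2",
    "Best Wireless Earbuds for Travel",      # already monitored, differs only in case
    "Are any of these on sale?",
    "Compare Sony XM6 vs Bose QC Ultra 2",   # repeated across captures
]
print(merge_candidates(monitored, related))
# → ['Compare Sony XM6 vs Bose QC Ultra 2', 'Are any of these on sale?']
```

The function only proposes candidates; promoting one into SEED_QUERIES stays a manual review step, as described above.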

Stage 2 — Capture Rufus for each query

A single POST to /api/v1/scraper/request with the scraper.amazon actor and type: rufus returns the recommendation set. The actor renders the Rufus surface server-side and parses it into a JSON response, so there is no browser, proxy, or session to manage on your side.

bash Copy

# Amazon Rufus recommendations via the Scrapeless Scraping API (scraper.amazon, type: rufus).
# Requires SCRAPELESS_API_KEY in the environment.
curl -sS -X POST https://api.scrapeless.com/api/v1/scraper/request \
  -H "Content-Type: application/json" \
  -H "x-api-token: ${SCRAPELESS_API_KEY}" \
  -d '{
    "actor": "scraper.amazon",
    "input": {
      "type": "rufus",
      "keywords": "best noise cancelling headphones",
      "domain": "www.amazon.com"
    }
  }'
# Pipe to: | jq '.result.products'  for the flat recommendation list.

The same call in Python reads the key from the environment and returns the result object. A query can come back without an answer: Rufus generates its response per session, and an unsupported store yields a region failure rather than recommendations. Treat a persistently empty result as "no Rufus answer for this query/region", record it, and move on to the next query rather than retrying it indefinitely.

python Copy

import os
import requests

ENDPOINT = "https://api.scrapeless.com/api/v1/scraper/request"


def capture_rufus(query: str, domain: str = "www.amazon.com") -> dict:
    resp = requests.post(
        ENDPOINT,
        headers={
            "Content-Type": "application/json",
            "x-api-token": os.environ["SCRAPELESS_API_KEY"],
        },
        json={"actor": "scraper.amazon", "input": {"type": "rufus", "keywords": query, "domain": domain}},
        timeout=180,
    )
    resp.raise_for_status()
    return resp.json().get("result", {}) or {}

If the region you target has no Rufus surface, the actor reports the failure for that store and the result is empty — pin the region you actually want to monitor and compare like with like, because a US run and a non-US run are different datasets.

Stage 3 — Extract products with rank and section

The recommendation set lives in two places in the response. result.products is the flat list of every recommended item; result.content_blocks is the same items grouped into the labelled sections Rufus rendered (type: "product_section", each with a category heading and its own products array). Reading the blocks preserves both the overall rank and the section that framed each pick.

python Copy

def extract_products(result: dict) -> list[dict]:
    """Flatten Rufus content_blocks into rows of asin, rank, section, brand."""
    rows = []
    rank = 0
    for block in result.get("content_blocks") or []:
        if block.get("type") != "product_section":
            continue
        section = block.get("category")
        for product in block.get("products") or []:
            asin = product.get("asin")
            title = product.get("title")
            if not asin or not title:  # half-resolved slot — treat as nullable, skip
                continue
            rank += 1
            rows.append({
                "asin": asin,
                "title": title,
                "section": section,
                "rank": rank,
                "price": product.get("price"),
                "rating": product.get("rating"),
                "brand": brand_from_title(title),
            })
    return rows

The bought, original_price, and delivery fields are present on some products and absent on others, so read each with .get() and treat a missing field as nullable rather than assuming it is there.

Map ASIN to brand

Share-of-recommendation is more useful aggregated to the brand than left at the ASIN, because one brand often appears several times across sections. A title-based heuristic covers the common case: match a known brand if the title names one, otherwise fall back to the first token of the title.

python Copy

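import re

KNOWN_BRANDS = ("Sony", "Bose", "Apple", "Sennheiser", "Beats", "JBL", "Anker")


def brand_from_title(title: str) -> str | None:
    """Return the known brand named earliest in the title, else the leading word."""
    words = (title or "").split()
    if not words:  # None, empty, or whitespace-only title
        return None
    hits = []
    for brand in KNOWN_BRANDS:
        # Whole-word match so "Apple" does not fire on "pineapple".
        match = re.search(rf"\b{re.escape(brand)}\b", title, re.IGNORECASE)
        if match:
            hits.append((match.start(), brand))
    if hits:
        return min(hits)[1]  # earliest mention wins: "Beats ... for Apple" -> Beats
    return words[0]  # fallback: leading word of the title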

Keep the KNOWN_BRANDS list scoped to the category you monitor; the fallback handles the long tail without a lookup table.

Get your API key on the free plan: app.scrapeless.com

Stage 4 — Store a per-run snapshot

The pipeline is append-only: every capture writes one JSONL record keyed by the query and the capture time, and nothing is ever overwritten. That gives you the full history to diff against, and it means a bad or empty run never destroys an earlier good one.

python Copy

import json
import time


def append_snapshot(path: str, query: str, rows: list[dict]) -> dict:
    record = {"query": query, "captured_at": int(time.time()), "products": rows}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

Using an integer epoch for captured_at keeps each record timestamped and sortable without a separate index. To load the history back for a given query, read the file line by line and filter on the query key; because the file is append-only, one pass yields every snapshot in capture order.
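
A minimal sketch of that read path follows, assuming the record shape written by append_snapshot (query, captured_at, products). load_history is an illustrative name, and skipping unparseable lines is a design choice of this sketch rather than something the pipeline requires.

```python
import json
from pathlib import Path


def load_history(path: str, query: str) -> list[dict]:
    """Return every snapshot for one query, oldest first."""
    file = Path(path)
    if not file.exists():
        return []
    history = []
    with file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a truncated line from an interrupted write
            if isinstance(record, dict) and record.get("query") == query:
                history.append(record)
    # Appends are already chronological; the sort guards against merged files.
    return sorted(history, key=lambda r: r.get("captured_at", 0))
```

With the snapshot file from the capture loop, `load_history("rufus_snapshots.jsonl", "best noise cancelling headphones")[-2:]` would give the two most recent runs for that query.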

Stage 5 — Diff and report share-of-recommendation

Two read-side functions turn the snapshot history into the metrics. Share-of-recommendation counts how often each brand appears across a run and normalizes to a percentage; the diff compares two runs' ASIN sets to show movement.

python Copy

from collections import Counter


def share_of_recommendation(rows: list[dict]) -> dict[str, float]:
    counts = Counter(row["brand"] for row in rows if row.get("brand"))
    total = sum(counts.values())
    if not total:
        return {}
    return {brand: round(100 * n / total, 1) for brand, n in counts.most_common()}


def diff_runs(prev_rows: list[dict], curr_rows: list[dict]) -> dict[str, list[str]]:
    prev = {row["asin"] for row in prev_rows}
    curr = {row["asin"] for row in curr_rows}
    return {
        "entered": sorted(curr - prev),
        "dropped": sorted(prev - curr),
    }

Run share_of_recommendation per query to see which brands own the conversational shelf for that question, or across the whole query set for a category-wide view. Run diff_runs between a query's two most recent snapshots to catch the week a brand entered the picks or fell out of them — the moment worth alerting on.
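
The same comparison can be made at the brand level. The sketch below assumes two share tables shaped like the output of share_of_recommendation (brand to percentage); share_delta is a hypothetical helper, and the numbers are made up for illustration.

```python
def share_delta(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Percentage-point change in share per brand between two runs."""
    brands = set(prev) | set(curr)
    delta = {b: round(curr.get(b, 0.0) - prev.get(b, 0.0), 1) for b in brands}
    # Largest movers first; ties broken alphabetically for stable output.
    return dict(sorted(delta.items(), key=lambda kv: (-abs(kv[1]), kv[0])))


# Illustrative values only, not captured data.
last_week = {"Sony": 40.0, "Bose": 40.0, "Apple": 20.0}
this_week = {"Sony": 50.0, "Bose": 25.0, "JBL": 25.0}
print(share_delta(last_week, this_week))
# → {'JBL': 25.0, 'Apple': -20.0, 'Bose': -15.0, 'Sony': 10.0}
```

A brand missing from one run is treated as 0% there, so a new entrant shows up as a positive delta and a brand that fell out as a negative one.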

Scheduling and scaling

The capture loop ties the stages together: for each query, capture, extract, snapshot, and log an empty result as a skip (here, a printed line). Run it on a schedule, daily or weekly, and the JSONL file becomes the time series.

python Copy

if __name__ == "__main__":
    snapshot_path = "rufus_snapshots.jsonl"
    for query in SEED_QUERIES:
        result = capture_rufus(query)
        rows = extract_products(result)
        if not rows:
            print(f"{query}: no Rufus answer for this query/region")
            continue
        append_snapshot(snapshot_path, query, rows)
        share = share_of_recommendation(rows)
        leaders = ", ".join(f"{b} {pct}%" for b, pct in list(share.items())[:3])
        print(f"{query}: {len(rows)} products — {leaders}")
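
The loop above only prints a skip, and an HTTP error from one query would stop the whole run. One possible hardening, sketched under assumptions: skips go to a separate JSONL file so they never mix with the snapshots, and any exception from the capture call is treated as a skip. record_skip, safe_capture, and the skip-file layout are hypothetical, not part of the Scrapeless API.

```python
import json
import time


def record_skip(path: str, query: str, domain: str, reason: str) -> dict:
    """Append a skip marker to a separate JSONL file so misses stay out of the snapshots."""
    record = {"query": query, "domain": domain, "captured_at": int(time.time()), "reason": reason}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def safe_capture(capture, query: str, domain: str, skip_path: str) -> dict:
    """Call capture(query, domain); on an error or an empty answer, log a skip."""
    try:
        result = capture(query, domain) or {}
    except Exception as exc:  # broad on purpose in this sketch; narrow it in real use
        record_skip(skip_path, query, domain, f"error: {type(exc).__name__}")
        return {}
    if not result.get("content_blocks"):
        record_skip(skip_path, query, domain, "no Rufus answer")
    return result
```

In the loop, `result = safe_capture(capture_rufus, query, "www.amazon.com", "rufus_skips.jsonl")` would replace the direct capture_rufus call; the skip file then shows which query and region pairs consistently return nothing.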

A few practical bounds when you scale the query set:

Keep concurrency modest — a handful of queries in flight at once is plenty; a monitoring run is steady, not a burst.
Pin the region per run so the series stays comparable; a query that returns no Rufus answer in one region is recorded as a skip, not mixed into another region's numbers.
Scope the query set to what you act on. Each query is a billable call, so monitor the questions that drive decisions and let related_questions suggest the next ones to add. Plan the cadence against the Scrapeless pricing tiers.
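
The "modest concurrency" bound can be sketched with the standard library's thread pool. The capture callable is passed in, so the example makes no assumption about the API beyond "one call per query"; capture_all and the default of three workers are illustrative choices, not documented limits.

```python
from concurrent.futures import ThreadPoolExecutor


def capture_all(queries: list[str], capture, max_workers: int = 3) -> dict[str, dict]:
    """Run capture(query) for each query with at most max_workers calls in flight."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {query: pool.submit(capture, query) for query in queries}
        for query, future in futures.items():
            try:
                results[query] = future.result() or {}
            except Exception:  # one failed query should not sink the run
                results[query] = {}
    return results
```

Calling `capture_all(SEED_QUERIES, capture_rufus)` would return a mapping from each query to its result, with an empty dict standing in for a failed or empty capture.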

What You Get Back

Each capture yields the flat products list plus the section-grouped content_blocks; the pipeline reduces both to snapshot rows and a share table. The shape below is the schema scraper.amazon (type: rufus) returns, trimmed to one section; the field values are an illustrative sample from a live run.

json Copy

{
  "metadata": { "type": "rufus", "rawUrl": "https://…" },
  "result": {
    "user_query": "best noise cancelling headphones",
    "content_blocks": [
      {
        "type": "product_section",
        "category": "Top Picks – Best ANC",
        "products": [
          {
            "asin": "B0GN4CFF6H",
            "title": "Sony WH-1000XM6/B Wireless Noise Canceling Headphones",
            "price": "$398.00",
            "original_price": "$428.00",
            "rating": "4.5",
            "reviews": "238",
            "delivery": "FREE delivery Fri, Jun 19",
            "url": "https://…"
          }
        ]
      }
    ],
    "products": [
      { "asin": "B0GN4CFF6H", "title": "Sony WH-1000XM6/B …", "category": "Top Picks – Best ANC" }
    ],
    "related_questions": ["Compare Sony XM6 vs Bose QC Ultra 2", "Are any of these on sale?"]
  }
}

After Stages 3–5, one append-only snapshot record and its share-of-recommendation tally look like this. The schema is real; the ASIN, brand, and rank values are illustrative.

json Copy

{
  "snapshot": {
    "query": "best noise cancelling headphones",
    "captured_at": 1781716376,
    "products": [
      { "asin": "B0GN4CFF6H", "rank": 1, "section": "Top Picks – Best ANC", "brand": "Sony" },
      { "asin": "B0FDKR293G", "rank": 2, "section": "Top Picks – Best ANC", "brand": "Bose" },
      { "asin": "B0GSS4SGZR", "rank": 3, "section": "Best for Apple Users", "brand": "Apple" }
    ]
  },
  "share_of_recommendation": { "Sony": 33.3, "Bose": 33.3, "Apple": 33.3 }
}

A few honest observations from running it:

The answer is per session. The same query can return a different recommendation set, and different related_questions, from one run to the next. Store captured_at on every record; the series over time is the signal, not any single capture.
Sections frame the rank. A product's section (content_blocks[].category) explains why it ranked where it did — "Best for Apple Users" is a different shelf than "Great Value Picks". Carry the section, not just the position.
Fields are nullable. bought, original_price, and delivery appear on some products and not others; a half-resolved slot can arrive without a title or asin. Read each with .get(), keep a missing optional field as null, and skip a row only when it lacks an asin or title.
Region decides whether there is an answer at all. An unsupported store returns a region failure instead of products. Pin the region you monitor and record the misses as skips.

Handling this responsibly

This pipeline reads only the public product recommendations Rufus shows any shopper — ASINs, titles, prices, ratings, and the section labels. Keep it to that public surface: collect no personal data and no account-gated content, respect Amazon's terms of service and robots directives, and store only the product fields the monitoring program needs. Share-of-recommendation is a brand-visibility metric built from public listings, nothing more.

Conclusion: a conversational shelf as a time series

Monitoring Amazon Rufus reduces to one loop: capture each query against the scraper.amazon actor with type: rufus, extract ASIN plus rank plus section from content_blocks, map each ASIN to a brand, append a per-run snapshot, and diff the snapshots for share-of-recommendation. Pin the region, treat every field as nullable, record a missing answer as a skip, and let related_questions grow the query set. The same monitoring shape applies to the other answer surfaces — pairing Rufus with scraping Google's AI Overviews gives one program across both shopping assistants and search. The actor, endpoint, and field names here are confirmed against the live Scrapeless Scraping API reference.

Ready to Build Your AI-Recommendation Monitoring Pipeline?

Join our community to claim a free plan and connect with developers building AI-answer data pipelines: Discord · Telegram.

Sign up at app.scrapeless.com for free trial credits and point the query set above at the Rufus categories and regions your brand-visibility program tracks.

FAQ

Q: Is monitoring Amazon Rufus recommendations legal?
The data captured is the publicly visible product recommendations Rufus shows any shopper. As with any scraping, the legality depends on jurisdiction and use — review the relevant terms and consult counsel before building on it, and collect only public product data, never personal or account-gated data.

Q: Why does a query return no Rufus answer?
Two causes. The query may not be transactional enough — phrase it as a buying question with clear product intent. Or the region you targeted has no Rufus surface, in which case the actor reports a region failure for that store; pin a supported region and record the miss as a skip.

Q: Do I need a proxy or a browser?
No. Rendering, region handling, and parsing run server-side. You send one POST with an x-api-token header and read JSON back; the actor returns the recommendation set already structured.

Q: How do I get the rank and the section a product appeared in?
Walk result.content_blocks: each product_section block carries a category heading and its own products array. Counting products as you flatten the blocks gives the overall rank, and the block's category gives the section — both worth storing per snapshot.

Q: What is share-of-recommendation?
It is the percentage of recommended slots a brand occupies across a query or a query set, aggregated from the captured products. Tracked over time, it shows whether a brand is gaining or losing presence on the Rufus shelf — the conversational-shopping equivalent of share-of-voice.

Q: Why store snapshots instead of a single current view?
Rufus generates its answer per session, so any one capture is a point in time. Append-only snapshots keyed by query and capture time give you the history to diff, so you can report which ASINs entered or dropped between runs rather than guessing from a single response.

Q: How many queries can I monitor at once?
Keep concurrency modest — a handful of queries in flight is enough for a steady monitoring run. Scope the set to the questions you act on and let related_questions suggest the next ones to add, so each billable call earns its place.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
