Scrapeless LLM Chat Scraper

Michael Lee

Expert Network Defense Engineer

10-Dec-2025

As AI search takes over more of the work once done by traditional search engines, a growing share of user queries, content discovery, and decision-making happens inside models such as ChatGPT, Perplexity, Copilot, Gemini, and Google AI Overviews.
Brands and teams need a way to collect, analyze, and monitor real-time insights from these AI engines, including prompts, answers, citations, rankings, trends, and competitor mentions.

The LLM Chat Scraper API is built for exactly this purpose.

It provides a unified scraping interface for extracting structured, real-time data from the major AI models. You can use the results for GEO (Generative Engine Optimization), competitor monitoring, content strategy optimization, and search intelligence.


Getting Started

Using the LLM Chat Scraper API consists of two simple steps:


Step 1: Create a Task

Send a POST request to create a scraping task.
If webhook.url is specified, the result is pushed to that URL automatically when the task completes.

Request Example

```bash
curl '{api_host}/api/v2/scraper/request' \
--header 'Content-Type: application/json' \
--header 'x-api-token: {your_api_key}' \
--data '{
  "actor": "scraper.chatgpt",
  "input": {
    "prompt": "Most reliable proxy service for data extraction",
    "country": "US",
    "web_search": true
  },
  "webhook": {
    "url": "http://www.youwebhook.com"
  }
}'
```
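
The same request can also be assembled in Python. The following is a minimal sketch using only the standard library; the helper name `build_create_task_request` and the host `https://api.example.com` are placeholders chosen for illustration and are not part of the API. The request is built but not sent, so the payload can be inspected first.

```python
import json
import urllib.request


def build_create_task_request(api_host, api_key, actor, task_input, webhook_url=None):
    """Build (but do not send) the POST request that creates a scraping task."""
    payload = {"actor": actor, "input": task_input}
    if webhook_url:
        # The webhook object is optional; include it only when a URL is given.
        payload["webhook"] = {"url": webhook_url}
    return urllib.request.Request(
        url=f"{api_host}/api/v2/scraper/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-token": api_key},
        method="POST",
    )


request = build_create_task_request(
    api_host="https://api.example.com",  # placeholder for {api_host}
    api_key="YOUR_API_KEY",              # placeholder for your API token
    actor="scraper.chatgpt",
    task_input={
        "prompt": "Most reliable proxy service for data extraction",
        "country": "US",
        "web_search": True,
    },
)
# To actually send it: urllib.request.urlopen(request)
```

Passing `webhook_url` adds the `webhook` object to the body; leaving it out produces a request whose result has to be fetched manually.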

Step 2: Retrieve the Result

Send a GET request to the result endpoint with the task ID. Results are stored for 5 minutes, so fetch them promptly.

Request Example

```bash
curl --request GET '{api_host}/api/v2/scraper/result/{task_id}' \
--header 'Content-Type: application/json' \
--header 'x-api-token: {your_api_key}'
```
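
Because a task may still be running when you first ask for its result, retrieval is usually wrapped in a polling loop. The sketch below is one possible shape for such a loop, not an official client: `poll_result` and its parameters are hypothetical names, and the HTTP call is injected as a `fetch` function so the logic can be exercised without network access. The default timeout of 300 seconds mirrors the 5-minute retention window.

```python
import time


def poll_result(fetch, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll a task until it finishes, expires, or `timeout` seconds have been slept.

    `fetch(task_id)` is a caller-supplied function returning (status_code, body).
    """
    waited = 0.0
    while True:
        code, body = fetch(task_id)
        if code == 200:  # result is ready
            return body
        if code == 410:  # result expired and can no longer be fetched
            raise LookupError(f"task {task_id} has expired")
        if code not in (202, 429):  # anything other than "still running" or "slow down"
            raise RuntimeError(f"unexpected status {code} for task {task_id}")
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} not finished after {timeout} seconds")
        sleep(interval)
        waited += interval


# Stand-in for a real HTTP call: reports "still running" twice, then succeeds.
responses = iter([(202, {}), (202, {}), (200, {"status": "success", "task_result": {}})])
result = poll_result(lambda task_id: next(responses), "demo-task", sleep=lambda seconds: None)
```

In real use, `fetch` would perform the GET request shown above and return the HTTP status code together with the decoded JSON body.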

Common Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| actor | string | true | Scraper type (e.g., scraper.chatgpt) |
| webhook | object | false | Webhook configuration |
| webhook.url | string | false | URL to push task results to |
| input | object | true | Task-specific input fields |

Result Data Structure

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| status | string | true | Task status: pending / running / success / failed |
| message | string | false | Error message (if any) |
| task_result | object | false | Final result fields (vary by actor) |
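
To make the structure concrete, here is a small sketch of how a client might model this response. The `ScrapeResult` class and its `is_finished` property are illustrative names, not part of the API; only the three fields and the four status values come from the table above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = {"pending", "running", "success", "failed"}


@dataclass
class ScrapeResult:
    status: str                         # required
    message: Optional[str] = None       # error message, if any
    task_result: Optional[dict] = None  # actor-specific payload

    @classmethod
    def from_dict(cls, body):
        """Validate a decoded JSON body and wrap it in a ScrapeResult."""
        status = body["status"]  # raises KeyError if the required field is missing
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        return cls(status, body.get("message"), body.get("task_result"))

    @property
    def is_finished(self):
        """True once the task has reached a terminal state."""
        return self.status in {"success", "failed"}


parsed = ScrapeResult.from_dict({"status": "failed", "message": "example error"})
```

Treating `task_result` as an opaque dictionary keeps the model usable across actors, since its fields differ from one scraper to another.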

Webhook Push Format

If webhook.url is specified, the API sends the result via POST.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| task_id | string | true | Unique Task ID |
| status | string | true | success or failed |
| input | string | true | Original request parameters as JSON string |
| task_result | object | false | Result payload |
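
One detail worth noting is that `input` arrives as a JSON string rather than an object, so it needs a second decoding step. The sketch below shows one way a webhook receiver might handle this; `parse_webhook_payload` is a hypothetical helper, and the sample payload values (such as `demo-123`) are made up for illustration.

```python
import json

REQUIRED_FIELDS = ("task_id", "status", "input")


def parse_webhook_payload(raw_body):
    """Decode a webhook POST body and unpack the JSON-encoded `input` field."""
    payload = json.loads(raw_body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if payload["status"] not in ("success", "failed"):
        raise ValueError(f"unexpected status: {payload['status']!r}")
    if isinstance(payload["input"], str):
        # `input` is a JSON string, so decode it a second time.
        payload["input"] = json.loads(payload["input"])
    return payload


# Made-up sample body in the documented shape.
raw = json.dumps({
    "task_id": "demo-123",
    "status": "success",
    "input": json.dumps({"prompt": "hello", "country": "US"}),
    "task_result": {"result_text": "hi"},
})
event = parse_webhook_payload(raw)
```

After parsing, `event["input"]` is an ordinary dictionary, so the original prompt and country can be read directly.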

HTTP Status Codes

| Status Code | Description |
| --- | --- |
| 200 | Successfully retrieved result |
| 201 | Task created successfully |
| 202 | Task still running |
| 400 | Bad request |
| 410 | Task expired (stored for 12 hours) |
| 429 | Too many requests |
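
A client typically maps these codes to a next step. The sketch below is one possible mapping: the action labels and the backoff parameters are assumptions chosen for illustration, not values defined by the API.

```python
# Action names are illustrative labels, not values defined by the API.
ACTIONS = {
    200: "use_result",
    201: "start_polling",
    202: "keep_polling",
    400: "fix_request",
    410: "resubmit_task",
    429: "back_off",
}


def next_action(status_code):
    """Return a suggested next step for a documented status code."""
    return ACTIONS.get(status_code, "raise_error")


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff for 429 responses: base, 2*base, 4*base, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))


delays = [backoff_delay(attempt) for attempt in range(4)]
```

Capping the delay keeps a long run of 429 responses from pushing the wait beyond the window in which results can still be retrieved.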

Scrapers Overview

Below are the supported AI model scrapers and their data formats.


1. ChatGPT Scraper

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | true | User prompt |
| country | string | true | Country/Region |
| web_search | boolean | false | Enable built-in browser search |

Response Fields

| Field | Description |
| --- | --- |
| prompt | Original prompt |
| result_text | Markdown-formatted response |
| model | Model used (e.g., gpt-5-1) |
| web_search | Whether search was enabled |
| links | Extracted links |
| search_result | Web search results |
| content_references | Source citations |
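
For brand monitoring, a common first step is counting how often each brand appears in `result_text`. The sketch below is a minimal illustration that relies only on `result_text` being a Markdown string; the function `count_mentions`, the brand names, and the sample answer are all invented for the example.

```python
import re


def count_mentions(result_text, brands):
    """Count case-insensitive, whole-word mentions of each brand in the answer text."""
    counts = {}
    for brand in brands:
        # Lookarounds prevent matches inside longer words (e.g. "Acme" in "Acmeology").
        pattern = re.compile(rf"(?<!\w){re.escape(brand)}(?!\w)", re.IGNORECASE)
        counts[brand] = len(pattern.findall(result_text))
    return counts


# Invented sample in the documented shape; only result_text is used here.
task_result = {
    "prompt": "Most reliable proxy service for data extraction",
    "result_text": "**Acme Proxy** is often recommended. Globex is cheaper, "
                   "but Acme Proxy has more locations.",
}
mentions = count_mentions(task_result["result_text"], ["Acme Proxy", "Globex", "Initech"])
```

Running the same count across several prompts and countries gives a rough share-of-voice figure per brand.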

2. Perplexity Scraper

Key Response Fields

  • prompt
  • result_text
  • related_prompt (related questions)
  • web_results (title, URL, snippet)
  • media_items (videos, maps, images)
  • locations (lat/lng, description, categories, address)

Supports rich structured data for travel, local info, news, and trending topics.


3. Copilot Scraper

Supports multiple modes:
search, smart, chat, reasoning, study

Body Parameters

| Parameter | Description |
| --- | --- |
| prompt | Input prompt |
| country | Country/Region code (JP and TW are not supported) |
| mode | search / smart / chat / reasoning / study |
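
Since the Copilot scraper restricts both `mode` and `country`, it can help to validate the input before creating a task. The following sketch shows one way to do that; `build_copilot_input` is a hypothetical helper, and whether the API itself treats `mode` as required is not stated here, so the helper simply asks for it explicitly.

```python
COPILOT_MODES = {"search", "smart", "chat", "reasoning", "study"}
UNSUPPORTED_COUNTRIES = {"JP", "TW"}


def build_copilot_input(prompt, country, mode):
    """Return the `input` object for a Copilot task, rejecting unsupported values."""
    country = country.upper()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if country in UNSUPPORTED_COUNTRIES:
        raise ValueError(f"country {country} is not supported by the Copilot scraper")
    if mode not in COPILOT_MODES:
        raise ValueError(f"mode must be one of {sorted(COPILOT_MODES)}")
    return {"prompt": prompt, "country": country, "mode": mode}


copilot_input = build_copilot_input("Best hiking trails near Vienna", "at", "search")
```

Catching these cases locally avoids spending a request on a task that would be rejected.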

Response Fields

  • result_text
  • prompt
  • mode
  • links
  • citations

4. Gemini Scraper

Response Fields

  • result_text
  • prompt
  • citations (favicon, highlights, snippet, website_name)

Returns rich citation structures that mirror the citations shown in Google Gemini responses.


5. Google AI Mode Scraper

Used for scraping Google AI Overviews / AIO responses.

Response Fields

| Field | Description |
| --- | --- |
| result_text | Main AI answer |
| result_html | Raw HTML |
| raw_url | Source URL |
| citations | Citation data with thumbnails |
| search_result | Traditional search results (if available) |

Help & FAQ

Billing

If the result is generated but not retrieved within 5 minutes, the request is still billed.
To avoid waste:

  • Retrieve results immediately, or
  • Configure a webhook to auto-receive results

Data Source

We scrape only publicly accessible data that requires no login, ensuring compliance and privacy protection.


Supported Countries / Regions

(Partial list below)

| Country / Region | Code |
| --- | --- |
| Austria | AT |
| Australia | AU |
| Belgium | BE |
| Japan | JP |
| Singapore | SG |
| Taiwan | TW |
| United States | US |

The full list of 195+ countries is available on request.


Conclusion

The LLM Chat Scraper API gives teams the ability to:

  • Monitor brand mentions across all AI chat platforms
  • Track competitor presence and ranking in AI answers
  • Analyze model outputs, citations, and trends
  • Build GEO (Generative Engine Optimization) strategies
  • Automate real-time intelligence pipelines
  • Access structured data from the entire AI search ecosystem

It is more than a scraper—it's a data infrastructure layer for the AI Search Era.

Contact us to unlock the full GEO data solution, so that every piece of content is backed by data, aligned with algorithm behavior, and positioned for measurable growth.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
