How to Build a Competitive Pricing Pipeline: Tracking 5,000 SKUs Across 8 Competitors in a Day
Advanced Bot Mitigation Engineer
Key Takeaways:
- Competitive pricing is a basket problem, not a product problem. A pricing team tracking 5,000 SKUs across 8 competitors in 4 markets handles 160,000 reads (5,000 × 8 × 4) every day. The architecture that scales is a single canonical output schema with one render call per URL and egress pinned per market, not 160,000 ad-hoc fetches.
- Pin egress per market. Price, currency, and availability vary with region and IP reputation. Pinning the proxy country to the market being measured keeps every recorded price comparable; mixing US and EU egress for the same SKU produces a price history that means nothing.
- One consistent schema across competitors. Every retailer's DOM is different, but the warehouse table is the same. Normalize at extraction time to {your_sku, competitor, market, price_value, price_currency, availability, promo_state, captured_at}. Decisions read from the warehouse, not from raw HTML.
- Anti-detection is handled server-side. Each request is rendered inside the Scrapeless cloud with residential egress, JavaScript execution, and fingerprint randomization. The pipeline sends a URL and a country and receives rendered HTML; no browser binary, proxy-rotation logic, or third-party CDP client is needed on your machine.
- The pipeline ends at a diff, not at HTML. The raw rendered page is scratch storage. The signal the pricing team acts on is the diff between your price and the competitor's price, per market and per SKU, surfaced in repricing rules, Slack alerts, or analyst dashboards.
- Start for free. New Scrapeless accounts include free runtime; sign up at app.scrapeless.com.
Introduction: From Web Data to Competitive Pricing Decisions
Competitive pricing teams have long lived with the same constraint: prices change faster than the data feeds that inform pricing decisions. A retailer revises its shelf prices overnight; the BI tile refreshes 48 hours later; by the time an analyst sees the gap, the promotion window has closed. Web data closes that loop, but only if the collection layer keeps pace with the rate of change and feeds a schema the warehouse can join on.
The structural challenge is not scraping a product page. It is operating scraping across a basket of SKUs, a basket of competitors, and a basket of markets, every day, with the same accuracy in every market and at every retailer. Each retailer's DOM rotates, each market's prices are localized, and each request has to clear the retailer's anti-bot layer and return clean, rendered HTML. Octoparse's OptiGroup case study captured this pattern at scale: 50 subsidiaries, thousands of competitor sites, regional pricing, and a centralized pricing-decision layer.
This guide walks through the architecture and Python code for the collection layer of a price-intelligence pipeline on Scrapeless. The output is a normalized NDJSON stream that feeds a warehouse table; the input is a basket file defined by analysts. Read it once for the pattern, then reuse it for every competitor by changing only the per-retailer extractor.
What You Can Do With This
- Daily competitive basket reads. Track 5,000 SKUs across 8 competitors in 4 markets on a daily schedule, with bounded runtime and one canonical schema.
- Market-specific repricing. Pin the egress country for each market and capture the localized price local shoppers actually see, not a geo-fallback price.
- Promo-state monitoring. Capture both the displayed price and the promotion state (on sale, discount percentage, limited-time badge) so the warehouse can tell a regular price from a clearance push.
- MAP compliance audits. Compare retailers' displayed prices with your MAP (minimum advertised price) policy and notify the channel-management team of violations.
- New-product launch tracking. Watch for competitor SKUs appearing in a category for the first time; the pipeline doubles as a "did a competitor just launch X?" signal.
- Price-elasticity datasets. Ninety days of daily snapshots produce the time series revenue management uses to compute elasticity at the SKU level.
Scrapeless accesses only publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content of this post is for demonstration purposes only.
Why Scrapeless for Competitive Pricing
Scrapeless renders each target URL inside an anti-detection cloud browser built on its own in-house Chromium and returns the finished HTML in a single API call. For a price-intelligence pipeline in particular, that provides:
- Residential proxies in 195+ countries, with the country code pinned per request: egress geography is a single field per market.
- Cloud-side JavaScript rendering. Many retailers' product pages are React or Next.js apps, and the price element appears only after hydration. js_render=True means your pipeline reads the post-paint DOM, not the SSR shell.
- Server-side anti-detection. The UA, time zone, WebGL, canvas, and headless flags are randomized in the cloud on every request. There are no local stealth plugins to maintain and no browser binary to install.
- Stateless request shape. Each product page is an independent read: send a URL and a country, get rendered HTML back. That maps cleanly onto a basket of thousands of independent SKU reads.
- One API key for the whole pipeline. Rendering, residential proxies, and the SDK are all billed to the same Scrapeless account; there is no per-tier integration.
Get an API key on the free plan at app.scrapeless.com.
Prerequisites
- Python 3.10 or later
- A Scrapeless account and API key (sign up at app.scrapeless.com)
- Familiarity with requests-style HTTP and a CSS selector library
- A competitor list and a SKU basket file
Pipeline Architecture Overview
```
basket.yaml (analyst-defined input)
         │
         ▼
┌──────────────────┐
│   Orchestrator   │  one task per (market, competitor, SKU); bounded concurrency
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│    Scrapeless    │  client.universal.scrape(url, country) → residential egress,
│  (cloud render)  │  JS rendering, anti-detection, all server-side
└────────┬─────────┘
         │ rendered HTML
         ▼
┌──────────────────┐
│    Normalizer    │  per-retailer extractor → canonical schema
└────────┬─────────┘
         │
         ▼
prices.ndjson (one row per (SKU, competitor, market, day))
         │
         ▼
warehouse load + diff against your prices + alerts
```
Each stage is a Python module; the seven steps below build the pipeline from top to bottom.
Step 1: Install the Scrapeless SDK
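```bash
pip install scrapeless lxml cssselect pyyaml
```
scrapeless is the official Python SDK; because it renders pages in the cloud and returns HTML, you do not need to install a browser binary or a third-party automation library. lxml is the parser, and cssselect supplies the CSS-selector support that lxml's .cssselect() method depends on; pyyaml reads the basket configuration.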
Step 2: Define the Basket
The pricing team owns this file. Keep it boring: markets, competitors, and the SKU mapping. Each entry is one (your_sku, competitor, competitor URL, market) combination:
```yaml
# basket.yaml
markets:
  - US
  - GB
  - DE
  - JP
basket:
  - your_sku: SKU-1001
    name: "Acme Widget Pro"
    competitors:
      - retailer: target_competitor_a
        url:
          US: "https://competitor-a.com/p/widget-pro"
          GB: "https://competitor-a.co.uk/p/widget-pro"
          DE: "https://competitor-a.de/p/widget-pro"
          JP: "https://competitor-a.co.jp/p/widget-pro"
      - retailer: target_competitor_b
        url:
          US: "https://competitor-b.com/products/widget-pro"
          GB: "https://competitor-b.co.uk/products/widget-pro"
```
A 5,000-SKU basket has the same shape; the warehouse joins it on your_sku against your own price feed.
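Because every (your_sku, competitor URL, market) entry becomes one render call, it can help to flatten the parsed file into a task list and catch typos before anything is rendered. A minimal sketch, assuming the dict shape yaml.safe_load returns for the file above; BasketTask and expand_basket are hypothetical names, not part of the Scrapeless SDK.

```python
from typing import NamedTuple


class BasketTask(NamedTuple):
    """One render call: a single (your_sku, retailer, market, url) read."""
    your_sku: str
    retailer: str
    market: str
    url: str


def expand_basket(basket: dict) -> list[BasketTask]:
    """Flatten a parsed basket.yaml into one task per competitor URL per market.

    Hypothetical helper: raises ValueError when a URL is keyed to a market
    that is not declared in the top-level `markets` list.
    """
    declared = set(basket["markets"])
    tasks: list[BasketTask] = []
    for item in basket["basket"]:
        for comp in item["competitors"]:
            for market, url in comp["url"].items():
                if market not in declared:
                    raise ValueError(
                        f"{item['your_sku']}/{comp['retailer']}: undeclared market {market!r}"
                    )
                tasks.append(BasketTask(item["your_sku"], comp["retailer"], market, url))
    return tasks


# Same shape as the basket.yaml example, as yaml.safe_load would return it.
example = {
    "markets": ["US", "GB", "DE", "JP"],
    "basket": [{
        "your_sku": "SKU-1001",
        "name": "Acme Widget Pro",
        "competitors": [
            {"retailer": "target_competitor_a", "url": {
                "US": "https://competitor-a.com/p/widget-pro",
                "GB": "https://competitor-a.co.uk/p/widget-pro",
                "DE": "https://competitor-a.de/p/widget-pro",
                "JP": "https://competitor-a.co.jp/p/widget-pro"}},
            {"retailer": "target_competitor_b", "url": {
                "US": "https://competitor-b.com/products/widget-pro",
                "GB": "https://competitor-b.co.uk/products/widget-pro"}},
        ],
    }],
}

tasks = expand_basket(example)
print(len(tasks))  # 6
```

The length of this list is exactly the number of render calls one run will make, which is also the number that drives cost.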
Step 3: Render Product Pages Through Scrapeless
One render call per (market, SKU). The country pin sets residential egress, and js_render=True returns the post-hydration DOM:
```python
from scrapeless import Scrapeless
from scrapeless.types.universal import (
    UniversalScrapingRequest, UniversalJsRenderInput, UniversalProxy,
)

client = Scrapeless()  # reads SCRAPELESS_API_KEY from the environment


def scrape_rendered(url: str, market: str) -> str:
    """Render one product page in the Scrapeless cloud and return its HTML."""
    request = UniversalScrapingRequest(
        actor="unlocker.webunlocker",
        input=UniversalJsRenderInput(url=url, js_render=True, headless=True),
        proxy=UniversalProxy(country=market),  # egress pinned to the market being measured
    )
    return client.universal.scrape(request)  # returns the rendered HTML (str)
```
The country pin is the key field. The same product URL renders a different price, currency, and availability state depending on region, so pinning egress keeps every recorded price tied to the market it was captured for. js_render=True waits until the page has painted, so React/Vue/Next.js retailers return the price element instead of an empty shell.
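A cheap guard against the empty-shell case is to confirm that the rendered HTML actually contains the price container before handing it to an extractor. This sketch uses only the standard-library html.parser; the data-test="price" attribute is the same assumed selector the Step 5 extractor uses, and has_price_element is a hypothetical helper.

```python
from html.parser import HTMLParser


class _PriceFinder(HTMLParser):
    """Sets `found` when any tag carries the attribute data-test="price"."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by the parser.
        if ("data-test", "price") in attrs:
            self.found = True


def has_price_element(html: str) -> bool:
    """Return True if the rendered page contains the (assumed) price container."""
    finder = _PriceFinder()
    finder.feed(html)
    finder.close()
    return finder.found


ssr_shell = '<html><body><div id="__next"></div></body></html>'
hydrated = ('<html><body><div data-test="price">'
            '<span class="value">79.99</span><span class="currency">USD</span>'
            '</div></body></html>')
print(has_price_element(ssr_shell), has_price_element(hydrated))  # False True
```

A page that fails this check can be treated as a render miss instead of being stored as a None price, which keeps genuine out-of-stock nulls distinguishable from failed renders.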
Step 4: Walk the Basket
Because each SKU is an independent render call, walking the basket is a simple loop (or a bounded thread pool for concurrency). There is no session to keep and no home page to warm up; the cloud render clears the retailer's anti-bot layer on every request:
```python
import yaml


def load_basket(path: str = "basket.yaml") -> dict:
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def walk_basket(basket: dict):
    """Yield (your_sku, retailer, market, url, html) for every basket entry."""
    for item in basket["basket"]:
        for comp in item["competitors"]:
            for market, url in comp["url"].items():
                html = scrape_rendered(url, market)  # defined in Step 3
                yield item["your_sku"], comp["retailer"], market, url, html
```
For a 5,000-SKU basket, wrap scrape_rendered in a concurrent.futures.ThreadPoolExecutor and cap the number of workers at the level your account plan allows. Each call is stateless, so concurrency scales by adding workers; there is no shared session to contend for.
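One way to bound that fan-out is sketched below. It takes the render function as a parameter (in the pipeline, scrape_rendered from Step 3) so the sketch runs offline with a stand-in; render_basket, the default max_workers=8, and collecting failures instead of raising are illustrative choices, not Scrapeless recommendations.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable

# A task is (your_sku, retailer, market, url), the fields walk_basket iterates over.
Task = tuple[str, str, str, str]


def render_basket(
    tasks: Iterable[Task],
    render: Callable[[str, str], str],
    max_workers: int = 8,
) -> tuple[list[tuple[Task, str]], list[tuple[Task, Exception]]]:
    """Render every task on a bounded thread pool.

    Calls are stateless, so workers share no session; a failed render is
    collected rather than aborting the rest of the basket.
    """
    results: list[tuple[Task, str]] = []
    errors: list[tuple[Task, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render, t[3], t[2]): t for t in tasks}
        for fut in as_completed(futures):
            task = futures[fut]
            try:
                results.append((task, fut.result()))
            except Exception as exc:  # record the failure and keep going
                errors.append((task, exc))
    return results, errors


def fake_render(url: str, market: str) -> str:
    """Offline stand-in for scrape_rendered(url, market)."""
    if "broken" in url:
        raise RuntimeError("render failed")
    return f"<html data-market='{market}'>{url}</html>"


demo_tasks = [
    ("SKU-1001", "target_competitor_a", "US", "https://competitor-a.com/p/widget-pro"),
    ("SKU-1001", "target_competitor_a", "GB", "https://competitor-a.co.uk/p/widget-pro"),
    ("SKU-1002", "target_competitor_b", "US", "https://competitor-b.com/broken"),
]
ok, failed = render_basket(demo_tasks, fake_render, max_workers=2)
print(len(ok), len(failed))  # 2 1
```

Results arrive in completion order, not basket order; each result carries its task tuple, so that ordering does not matter downstream.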
Get an API key on the free plan: app.scrapeless.com
Step 5: Extract to a Canonical Schema
Every retailer's DOM is different, but the warehouse table is the same. The extractor's job is to turn whatever the retailer rendered into the same shape every time. The output schema is one row per (your_sku, competitor, market, captured_at):
```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

from lxml import html as lxml_html


@dataclass
class PriceRecord:
    your_sku: str
    competitor: str
    market: str
    url: str
    price_value: Optional[float]
    price_currency: Optional[str]
    availability: Optional[str]         # "in_stock" | "out_of_stock" | "preorder" | None
    promo_state: Optional[str]          # "none" | "on_sale" | "clearance" | None
    promo_discount_pct: Optional[float]
    captured_at: str                    # ISO-8601 UTC
```
Each per-retailer extractor plugs into the same return type:
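```python
def extract_competitor_a(html: str, your_sku: str, market: str, url: str) -> PriceRecord:
    doc = lxml_html.fromstring(html)
    # lxml's .cssselect() needs the separate `cssselect` package installed.
    price_el = doc.cssselect("[data-test='price'] .value")
    currency_el = doc.cssselect("[data-test='price'] .currency")
    availability_el = doc.cssselect("[data-test='availability']")
    promo_el = doc.cssselect("[data-test='promo-badge']")

    availability = (
        "in_stock" if availability_el and "In stock" in availability_el[0].text_content()
        else "out_of_stock" if availability_el
        else None
    )
    return PriceRecord(
        your_sku=your_sku,
        competitor="target_competitor_a",
        market=market,
        url=url,
        price_value=_to_float(price_el[0].text_content()) if price_el else None,
        price_currency=currency_el[0].text_content().strip() if currency_el else None,
        availability=availability,
        promo_state="on_sale" if promo_el else "none",
        promo_discount_pct=_to_float(promo_el[0].get("data-discount-pct")) if promo_el else None,
        captured_at=datetime.now(timezone.utc).isoformat(),
    )


def _to_float(text) -> Optional[float]:
    """Parse a rendered number such as "$1,299.99", "79,99 €", "¥12,800" or "15"."""
    if not text:
        return None
    cleaned = "".join(c for c in text if c.isdigit() or c in ".,")
    # Heuristic: the last separator is the decimal mark only when 1-2 digits follow it
    # ("79,99", "1,299.99"); otherwise every separator groups thousands ("12,800").
    last_sep = max(cleaned.rfind("."), cleaned.rfind(","))
    if last_sep != -1 and 1 <= len(cleaned) - last_sep - 1 <= 2:
        whole = cleaned[:last_sep].replace(".", "").replace(",", "")
        cleaned = f"{whole}.{cleaned[last_sep + 1:]}"
    else:
        cleaned = cleaned.replace(".", "").replace(",", "")
    try:
        return float(cleaned)
    except (ValueError, TypeError):
        return None
```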
Every retailer gets its own extract_<name> function, and every function returns the same PriceRecord. The orchestrator does not know which DOM each retailer uses; it only knows which function name to call.
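That dispatch can be as simple as a dict from the retailer name used in basket.yaml to its extractor. A minimal sketch; EXTRACTORS, register, and extract are hypothetical names, and the stand-in extractor returns a plain dict so the block runs on its own (a real one such as extract_competitor_a returns a PriceRecord).

```python
from typing import Callable, Dict

# Extractor signature from Step 5: (html, your_sku, market, url) -> record.
Extractor = Callable[[str, str, str, str], dict]

EXTRACTORS: Dict[str, Extractor] = {}


def register(retailer: str):
    """Decorator that files an extractor under the retailer name used in basket.yaml."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[retailer] = fn
        return fn
    return wrap


@register("target_competitor_a")
def extract_stub_a(html: str, your_sku: str, market: str, url: str) -> dict:
    # Stand-in for extract_competitor_a; a real extractor parses the DOM.
    return {"your_sku": your_sku, "competitor": "target_competitor_a",
            "market": market, "url": url}


def extract(retailer: str, html: str, your_sku: str, market: str, url: str) -> dict:
    """The orchestrator passes only the retailer name; it never sees the DOM details."""
    try:
        fn = EXTRACTORS[retailer]
    except KeyError:
        raise KeyError(f"no extractor registered for {retailer!r}") from None
    return fn(html, your_sku, market, url)


record = extract("target_competitor_a", "<html></html>", "SKU-1001", "US",
                 "https://competitor-a.com/p/widget-pro")
print(record["competitor"])  # target_competitor_a
```

Adding a competitor then means writing one decorated function; nothing in the orchestrator changes.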
Notes on selector design:
- Prefer the [data-test='...'] attributes a retailer exposes. They survive rotations of visual class names, whereas classes like .text-lg.font-semibold change with every release.
- Treat missing fields as nullable. A None price for an out-of-stock product is data, not a failure.
- Capture the currency string the retailer renders. Do not infer currency from the market: some retailers show USD on a .de domain for cross-border products. Store it exactly as it appears on the page.
Step 6: Stream to NDJSON for the Warehouse Load
Writing to NDJSON as a stream lets the pipeline survive a mid-run interruption without losing the records already written. Each line is one rendered SKU read, and the file is append-only:
```python
import json
from pathlib import Path

# PriceRecord and asdict come from Step 5.


def append_records(records: list[PriceRecord], out_path: str = "prices.ndjson"):
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "a", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(asdict(r)) + "\n")
```
NDJSON loads directly into Snowflake (COPY INTO ... FILE_FORMAT = (TYPE = JSON)), BigQuery (bq load --source_format=NEWLINE_DELIMITED_JSON), Redshift, ClickHouse, and DuckDB. Pick whichever your business-intelligence stack already uses; the schema is the same.
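Before a warehouse load, it can be useful to read the file back and drop a trailing partial line, which is what an append-only file looks like if a run is killed mid-write. A standard-library sketch; read_records is a hypothetical helper and the required-field set is taken from PriceRecord.

```python
import json
import os
import tempfile
from typing import Iterator

REQUIRED = {"your_sku", "competitor", "market", "url", "price_value",
            "price_currency", "availability", "promo_state",
            "promo_discount_pct", "captured_at"}


def read_records(path: str) -> Iterator[dict]:
    """Yield complete records from an NDJSON file.

    Lines that do not parse (such as a truncated tail from an interrupted
    run) and records missing schema fields are skipped, not fatal.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(rec, dict) and REQUIRED.issubset(rec):
                yield rec


row = dict.fromkeys(REQUIRED)
row.update(your_sku="SKU-1001", competitor="target_competitor_a", market="US")
with tempfile.NamedTemporaryFile("w", suffix=".ndjson", delete=False, encoding="utf-8") as tmp:
    tmp.write(json.dumps(row) + "\n")
    tmp.write(json.dumps(row) + "\n")
    tmp.write('{"your_sku": "SKU-10')  # truncated tail from an interrupted run
loaded = list(read_records(tmp.name))
os.remove(tmp.name)
print(len(loaded))  # 2
```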
Step 7: Compute the Diff and Route Pricing Decisions
The signal the pricing team acts on is not the raw price; it is the gap between the competitor's price and yours, per market and per SKU. The diff lives in the warehouse, not in the scraper:
```sql
-- Daily price gap per SKU, per competitor
WITH yours AS (
    SELECT sku, market, list_price, currency, captured_date
    FROM your_internal_prices
    WHERE captured_date = CURRENT_DATE
),
theirs AS (
    SELECT your_sku, competitor, market, price_value, price_currency,
           availability, promo_state, CAST(captured_at AS DATE) AS captured_date
    FROM competitor_prices
    WHERE CAST(captured_at AS DATE) = CURRENT_DATE
)
SELECT
    t.your_sku,
    t.competitor,
    t.market,
    y.list_price AS our_price,
    t.price_value AS their_price,
    ROUND(100.0 * (y.list_price - t.price_value) / NULLIF(t.price_value, 0), 2)
        AS price_gap_pct,
    t.availability,
    t.promo_state
FROM theirs t
JOIN yours y
  ON y.sku = t.your_sku
 AND y.market = t.market
 AND y.currency = t.price_currency  -- compare like-for-like currencies only; FX is a separate step
WHERE y.list_price IS NOT NULL
  AND t.price_value IS NOT NULL
ORDER BY price_gap_pct DESC;
```
Route rows whose price_gap_pct crosses a threshold defined in your pricing rules:
- Above your price threshold (for example, more than 5% above the leader) → price review.
- Below the MAP threshold → MAP-violation alert to channel management.
- Promo state changed since yesterday → competitor-promo notice to the category manager.
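The routing rules above can be expressed as a small pure function over one row of the diff query. A sketch under stated assumptions: the 5% threshold is the example from the list, map_price and yesterday_promo_state are hypothetical columns you would join in from your MAP policy and the previous day's snapshot, and the route names are placeholders for your own alerting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GapRow:
    your_sku: str
    competitor: str
    market: str
    our_price: float
    their_price: float
    promo_state: Optional[str]
    yesterday_promo_state: Optional[str] = None  # hypothetical: previous day's snapshot
    map_price: Optional[float] = None            # hypothetical: your MAP for this SKU/market


def price_gap_pct(our: float, their: float) -> Optional[float]:
    """Same formula as the SQL: 100 * (our - their) / their, rounded to 2 places."""
    if their == 0:
        return None  # mirrors NULLIF(t.price_value, 0)
    return round(100.0 * (our - their) / their, 2)


def route(row: GapRow, review_threshold_pct: float = 5.0) -> list[str]:
    """Return the placeholder destinations this row should be routed to."""
    routes: list[str] = []
    gap = price_gap_pct(row.our_price, row.their_price)
    if gap is not None and gap > review_threshold_pct:
        routes.append("price_review")
    if row.map_price is not None and row.their_price < row.map_price:
        routes.append("map_violation_alert")
    if row.yesterday_promo_state is not None and row.promo_state != row.yesterday_promo_state:
        routes.append("competitor_promo_notice")
    return routes


row = GapRow("SKU-1001", "target_competitor_a", "US",
             our_price=89.99, their_price=79.99, promo_state="on_sale",
             yesterday_promo_state="none", map_price=84.99)
print(price_gap_pct(row.our_price, row.their_price), route(row))
# 12.5 ['price_review', 'map_violation_alert', 'competitor_promo_notice']
```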
The diff query is the contract between collection and decision-making. As long as the warehouse schema stays stable, the pricing team's downstream BI tiles, alerts, and pricing rules do not change when a retailer rotates its DOM; the only thing that changes is the Step 5 per-retailer extractor.
What You Get Back
One NDJSON line per (your_sku, competitor, market, day), shaped like this:
```json
{
  "your_sku": "SKU-1001",
  "competitor": "target_competitor_a",
  "market": "US",
  "url": "https://competitor-a.com/p/widget-pro",
  "price_value": 79.99,
  "price_currency": "USD",
  "availability": "in_stock",
  "promo_state": "on_sale",
  "promo_discount_pct": 15.0,
  "captured_at": "<ISO-8601 UTC timestamp written at read time>"
}
```
Honest observations from running this pattern:
- Render timing matters more than DOM specificity. A selector run against the SSR shell returns an empty string because the price element has not been painted yet. js_render=True returns the later, hydrated DOM, where the price selector has something to resolve.
- Currency does not map one-to-one to market. A cross-border SKU can show a non-local currency even on a localized domain. Store the rendered string and let the warehouse layer normalize it.
- Promo state has at least three values. none, on_sale, and clearance behave differently in repricing rules: a clearance markdown signals the end of a product's lifecycle, not a promotional push.
- Availability is the second most actionable field. A 20% price gap on an out-of-stock SKU is a different competitive signal from the same gap on an in-stock SKU. Surface both to the decision layer.
- One canonical schema is the load-bearing decision. Per-retailer fields, currency conventions, and promo formats vary; the warehouse table does not. Push the variation into the extraction functions and keep the schema flat.
Conclusion: A Competitive Pricing Pipeline That Scales
The pipeline reduces to six stages: define the basket → render each SKU through Scrapeless with egress pinned per market → extract to the canonical schema → stream to NDJSON → load into the warehouse → diff against your own prices. Each stage is small enough to read, and together they handle 5,000 SKUs across 8 competitors and 4 markets from a once-a-day cron.
For a vendor-comparison view of price-related scraping (real-estate pricing in particular), the Best Zillow Scrapers of 2026 list evaluates 8 tools against a similar localized-price extraction problem. For loading the NDJSON output into a cloud warehouse, the Scrapeless + Snowflake data-ingestion guide walks through the COPY INTO and streaming paths.
Pin egress per market, render each SKU independently, normalize at extraction time, store one canonical row per SKU/competitor/market/day, and take the diff in the warehouse, not in the scraper.
Ready to build an AI-powered data pipeline?
Join our community to claim a free plan and connect with developers building competitive pricing pipelines: Discord · Telegram.
Sign up at app.scrapeless.com with free runtime and adapt the pattern above to the markets, competitors, and SKU basket your pricing pipeline needs. Pricing details are at scrapeless.com/en/pricing, the proxy-solutions product page is at scrapeless.com/en/product/proxy-solutions, and the full SDK reference is at docs.scrapeless.com.
FAQ
Q1: Is it legal to scrape competitors' prices?
Prices are public information on retailers' product pages, and price comparison is an established commercial practice. Legality depends on what you scrape, where, and under what terms. Publicly visible data is generally accessible, but the site's terms of service, regional privacy laws (GDPR, CCPA), and copyright still apply. Seek legal advice for material use cases. Scrapeless accesses only publicly available data.
Q2: Do I need proxies for competitive pricing?
Yes, and country pinning matters more than IP rotation. Retailers localize prices per market, so a request from the US to a .co.uk domain may return a fallback price, a redirect, or a geo-block. Use UniversalProxy(country=...) to pin the country to the market you are measuring. Scrapeless's residential proxies in 195+ countries cover typical pricing baskets without bringing a separate proxy provider into your stack.
Q3: How do I deal with anti-bot challenges and bot detection?
Rendering runs server-side in the Scrapeless cloud with residential egress, real JavaScript execution, and randomized fingerprints, so the request that reaches the retailer looks like an ordinary browser on a residential IP in the target market. Set js_render=True so the response is the post-hydration DOM rather than a pre-render shell, and pin the country to the market you are measuring.
Q4: How often should the pipeline run?
Daily is the baseline cadence for repricing decisions. For monitoring promotion windows where prices change intraday, an hourly cadence is realistic. Because the cost is bounded to one render call per SKU, competitor, and market, a 5,000-SKU basket at daily cadence fits within the budget of a single cron run. Higher cadence increases cost linearly, so choose the cadence your pricing decisions actually consume.
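Because cost scales with the number of render calls, the volume is simple arithmetic: SKUs × competitors × markets × runs per day. A sketch of that calculation, assuming every SKU is listed at every competitor in every market (the same upper bound behind the 160,000 figure in the takeaways); renders_per_day is a hypothetical helper, and no per-call price is assumed.

```python
def renders_per_day(skus: int, competitors: int, markets: int, runs_per_day: int = 1) -> int:
    """Upper bound on render calls: one call per (SKU, competitor, market) per run."""
    return skus * competitors * markets * runs_per_day


daily = renders_per_day(5_000, 8, 4)
hourly = renders_per_day(5_000, 8, 4, runs_per_day=24)
print(f"{daily:,} {hourly:,}")  # 160,000 3,840,000
```

Moving from daily to hourly multiplies the call volume by 24, which is what "cost increases linearly" means in practice.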
Q5: What happens when a retailer rotates its DOM?
Only the Step 5 per-retailer extractor changes. The underlying schema, warehouse tables, BI tiles, diff queries, and alert rules are all unaffected. Re-check selectors whenever a retailer ships a release, prefer [data-test='...'] attributes where available, and treat the extractor as the volatile layer and the schema as the stable one.
Q6: Can I run multiple retailers in parallel?
Yes. Each render call is stateless, so the orchestrator fans out (market, competitor, SKU) tasks across a thread pool and caps the worker count at the level your account plan allows. Concurrency scales by adding workers, not by sharing sessions, so there are no contended connections.
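Q7: How do I capture promo state and discount percentage?
The Step 5 extractor reads the promo badge directly from the rendered DOM and stores promo_state ("on_sale", "clearance", "none") and promo_discount_pct as separate fields. The example extractor emits only "on_sale" or "none"; a retailer that shows a clearance badge needs its own extractor to map it to "clearance". The warehouse joins both fields into the diff query so pricing rules can branch on "is the competitor on sale right now?" versus "what is the competitor's regular price?"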
Q8: What about international currencies and FX?
Store the rendered currency string on every record (USD, EUR, JPY, GBP). Currency conversion belongs in the warehouse layer, not in the scraper: keep the raw price, raw currency, and market in the NDJSON and join against a daily FX rate table on the BI side. That way, one bad FX rate cannot contaminate the entire history.
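Keeping the raw price and currency means conversion can be redone whenever a rate is corrected. A minimal sketch of that warehouse-side join, written in Python for illustration; the FxTable shape (a daily rate keyed by date and currency, into a reporting currency) is an assumption, the rates below are made up, and a missing rate yields None rather than a guessed value.

```python
from datetime import date
from typing import Optional

# Hypothetical daily FX table: (date, currency) -> units of reporting currency per 1 unit.
FxTable = dict[tuple[date, str], float]


def to_reporting_currency(
    price_value: Optional[float],
    price_currency: Optional[str],
    captured_on: date,
    fx_rates: FxTable,
    reporting_currency: str = "USD",
) -> Optional[float]:
    """Convert one record's raw price with that day's rate; the raw fields stay untouched."""
    if price_value is None or price_currency is None:
        return None  # a null price (e.g. out of stock) stays null
    if price_currency == reporting_currency:
        return price_value
    rate = fx_rates.get((captured_on, price_currency))
    if rate is None:
        return None  # no rate for that day: leave unconverted rather than guess
    return round(price_value * rate, 2)


# Illustrative rates only, not real quotes.
day = date(2024, 5, 1)
rates: FxTable = {(day, "EUR"): 1.10, (day, "JPY"): 0.01}

print(to_reporting_currency(79.99, "EUR", day, rates))
print(to_reporting_currency(59.0, "GBP", day, rates))  # None: no GBP rate loaded
```

Because the conversion is derived at query time, fixing a bad rate in the FX table repairs every affected row without re-scraping.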
Scrapeless accesses only publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content of this blog is for demonstration purposes only and does not involve any illegal or infringing activity. We make no guarantees and disclaim all liability for the use of information from this blog or from third-party links. Before engaging in any scraping activity, consult your legal advisor and review the target website's terms of service, or obtain the necessary permissions.



