How to Scrape Amazon Product Data with Proxies (2026)

Amazon Scraping Requirements

Getting past Amazon’s bot detection typically requires:

  1. Residential proxies (datacenter IP ranges are commonly blocked)
  2. Per-request IP rotation
  3. Realistic browser headers
  4. Request throttling (limit how often each IP requests the same ASIN)
  5. JavaScript execution for dynamic price loading

Provider | Block Rate (Amazon) | Pricing | Managed API
Bright Data | Being measured | ~$10.50/GB | ✓ Datasets + Scraping Browser
Smartproxy | Being measured | ~$8.50/GB | ✓ Site Unblocker
Oxylabs | Being measured | ~$12/GB | ✓ Web Scraper API

Amazon block rates are measured with a test harness; see /benchmark/.
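Requirements 2 and 4 above (per-request rotation and per-IP, per-ASIN throttling) can be sketched without tying them to any one provider. The minimal example below assumes you have a list of proxy endpoint URLs and keys the throttle on an (endpoint, ASIN) pair. `KeyedThrottle` and `proxy_rotation` are hypothetical helper names for illustration, not part of any provider's SDK.

```python
import itertools
import time
from typing import Callable, Dict, Hashable, Iterable, Iterator


class KeyedThrottle:
    """Enforce a minimum gap between requests that share a key,
    e.g. (proxy_endpoint, asin). Hypothetical helper, not a library API."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Dict[Hashable, float] = {}

    def wait(self, key: Hashable) -> float:
        """Block until `key` may be used again; return the seconds slept."""
        now = self.clock()
        last = self._last.get(key)
        delay = 0.0 if last is None else max(0.0, last + self.min_interval - now)
        if delay > 0:
            self.sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last[key] = now + delay
        return delay


def proxy_rotation(endpoints: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a requests-style proxies dict, cycling through endpoints round-robin."""
    for url in itertools.cycle(list(endpoints)):
        yield {"http": url, "https": url}
```

With a gateway that rotates the exit IP on every request behind a single hostname, you cannot tell which IP served a given request, so keying the throttle on the ASIN alone is the conservative choice. Adding random jitter to the interval, as the basic scraper does with `random.uniform(30, 60)`, makes the timing look less mechanical.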

Setup: Amazon Product Scraping

Basic product page (Smartproxy + BeautifulSoup)

```python
import random
import time

import requests
from bs4 import BeautifulSoup


def scrape_amazon_product(asin, proxy_user, proxy_pass):
    # Smartproxy gateway. The proxy URL keeps the http:// scheme for HTTPS
    # targets because requests tunnels HTTPS through the proxy via CONNECT.
    proxy_url = f"http://{proxy_user}:{proxy_pass}@gate.smartproxy.com:10000"
    proxies = {"http": proxy_url, "https": proxy_url}
    headers = {
        # Send a complete browser User-Agent (example string; keep it current).
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        # "br" is omitted: requests only decodes Brotli when the brotli package is installed.
        "Accept-Encoding": "gzip, deflate",
    }

    url = f"https://www.amazon.com/dp/{asin}"
    resp = requests.get(url, proxies=proxies, headers=headers, timeout=15)

    if resp.status_code != 200:
        return {"asin": asin, "error": f"Status {resp.status_code}"}

    soup = BeautifulSoup(resp.text, "html.parser")

    # Extract fields (Amazon changes these selectors from time to time)
    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen, #priceblock_ourprice")
    rating = soup.select_one("[data-hook='average-star-rating'] .a-size-base")
    review_count = soup.select_one("#acrCustomerReviewText")

    return {
        "asin": asin,
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "rating": rating.get_text(strip=True) if rating else None,
        "review_count": review_count.get_text(strip=True) if review_count else None,
    }


if __name__ == "__main__":
    # Rate limiting: pause 30-60 seconds between requests. This is stricter than
    # the per-IP, per-ASIN guideline, which is fine for small batches.
    for asin in ["ASIN_1", "ASIN_2"]:  # placeholder ASINs; replace with real ones
        print(scrape_amazon_product(asin, "PROXY_USER", "PROXY_PASS"))
        time.sleep(random.uniform(30, 60))
```
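The scraper returns raw display strings such as "$1,299.99", "4.5 out of 5 stars", or "12,345 ratings". A minimal sketch for turning them into numbers follows. It assumes amazon.com (US) formatting, where commas group thousands and a period marks decimals, so it would misread formats like "1.299,99 €" from other Amazon domains. The helper names are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumes US number formatting: "," groups thousands and "." marks decimals.
_DECIMAL = re.compile(r"\d[\d,]*(?:\.\d+)?")
_INTEGER = re.compile(r"\d[\d,]*")


def _first_number(pattern: re.Pattern, text: Optional[str]) -> Optional[str]:
    """Return the first numeric run in text with thousands separators removed."""
    if not text:
        return None
    match = pattern.search(text)
    return match.group().replace(",", "") if match else None


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """'$1,299.99' -> Decimal('1299.99'); Decimal avoids float rounding on money."""
    value = _first_number(_DECIMAL, text)
    return Decimal(value) if value is not None else None


def parse_rating(text: Optional[str]) -> Optional[float]:
    """'4.5 out of 5 stars' -> 4.5"""
    value = _first_number(_DECIMAL, text)
    return float(value) if value is not None else None


def parse_count(text: Optional[str]) -> Optional[int]:
    """'12,345 ratings' -> 12345"""
    value = _first_number(_INTEGER, text)
    return int(value) if value is not None else None
```

Apply these to the dictionary returned by `scrape_amazon_product`, for example `parse_price(result["price"])`. Each helper returns None when the field was missing or contained no number, so downstream code can tell "not found" apart from zero.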

Managed API for JS-rendered prices (Oxylabs)

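```python
import requests


def scrape_amazon_with_api(asin, username, password):
    resp = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(username, password),  # Oxylabs API credentials
        json={
            "source": "amazon_product",
            "query": asin,   # for the amazon_product source, the ASIN goes in "query"
            "domain": "com",
            "parse": True,   # return structured JSON instead of raw HTML
        },
        timeout=180,  # realtime jobs render JavaScript, so allow a generous timeout
    )
    resp.raise_for_status()  # surface auth or quota errors instead of a KeyError below
    return resp.json()["results"][0]["content"]
```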

What Data You Can Collect

Data Type | Accessibility | Notes
Product title | Public | Standard HTML
List price | Public | May require JS rendering
Buy Box price | Public | JS-rendered; use managed API
Prime price | Login required | Out of scope
Seller name | Public | From offer listing page
Review count | Public | Standard HTML
Review text | Public | Paginated; rate-limit carefully
Product images | Public | Direct URL from HTML

FAQ

How do I handle Amazon CAPTCHAs?

A CAPTCHA from Amazon usually means your request pattern triggered detection. Solutions: 1) reduce request frequency per IP, 2) make your headers match a real browser’s fingerprint more closely, 3) use a managed scraping API (Oxylabs Web Scraper API, Smartproxy Site Unblocker) that handles CAPTCHAs internally.
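Because Amazon often serves its CAPTCHA page with HTTP 200, a status-code check alone may not catch it. The sketch below pairs a simple content check with exponential backoff and jitter. The marker strings are assumptions based on commonly seen CAPTCHA pages, not a documented Amazon interface, so verify them against responses you actually receive; `looks_like_captcha` and `backoff_delay` are hypothetical names.

```python
import random
from typing import Optional

# Assumed markers from Amazon's CAPTCHA page (stored lowercase); confirm
# against real responses, since Amazon can change this page at any time.
CAPTCHA_MARKERS = (
    "/errors/validatecaptcha",
    "enter the characters you see below",
)


def looks_like_captcha(html: str) -> bool:
    """Heuristic check for the CAPTCHA interstitial, which may arrive as HTTP 200."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def backoff_delay(attempt: int, base: float = 30.0, cap: float = 600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with jitter. attempt starts at 0; the result is a
    random delay in [ceiling/2, ceiling], where ceiling = min(cap, base * 2**attempt)."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + rng.uniform(0, ceiling / 2)
```

On a CAPTCHA hit, sleep for `backoff_delay(attempt)` and retry through a different IP; after a few failed attempts, it is usually cheaper to hand that ASIN to a managed API than to keep retrying.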

Can I scrape Amazon reviews?

Amazon customer reviews are publicly accessible. Rate-limit carefully: one review page per IP per hour is a conservative pace that keeps block risk low. For large-scale review collection, consider Bright Data’s Amazon Datasets, which provide pre-collected, structured review data.

Is scraping Amazon legal?

Collecting publicly displayed product information (titles, public prices, public reviews, seller information) is generally considered lawful in many jurisdictions. Amazon’s ToS, however, restricts automated collection. In hiQ v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the US Computer Fraud and Abuse Act, but the district court later found that hiQ had breached LinkedIn’s User Agreement, so contract claims based on a site’s ToS remain a real risk. Consult legal counsel for your specific use case and jurisdiction.


This article was produced with AI assistance and reviewed by an editor. As of 2026-06-01. Benchmark figures: /benchmark/. Use proxies for legitimate purposes only.