How to Scrape Amazon Product Data with Proxies (2026)
Amazon Scraping Requirements
Getting past Amazon’s bot detection typically requires:
- Residential proxies (datacenter IPs are frequently blocked)
- Per-request IP rotation
- Realistic browser headers
- Request throttling (roughly one request per IP per ASIN every 30-60 seconds)
- JavaScript execution for dynamic price loading
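The throttling requirement can be sketched as a small helper that tracks when each (proxy session, ASIN) pair may next be used. This is a minimal illustration, not a provider feature: the class name `PerKeyThrottle`, the key layout, and the placeholder ASIN are assumptions, and the 30-60 second window is the figure used later in this article.

```python
import random
import time


class PerKeyThrottle:
    """Enforce a randomized minimum gap between requests that share a key."""

    def __init__(self, min_gap=30.0, max_gap=60.0, clock=time.monotonic):
        self.min_gap = min_gap
        self.max_gap = max_gap
        self.clock = clock        # injectable so the logic can be tested without sleeping
        self._next_allowed = {}   # key -> earliest time the next request may start

    def wait_time(self, key):
        """Return how many seconds the caller should sleep before using `key`."""
        now = self.clock()
        return max(0.0, self._next_allowed.get(key, now) - now)

    def mark(self, key):
        """Record a request for `key` and schedule the next allowed time."""
        gap = random.uniform(self.min_gap, self.max_gap)
        self._next_allowed[key] = self.clock() + gap


throttle = PerKeyThrottle()
key = ("session-1", "B000000000")  # hypothetical (proxy session, ASIN) pair
delay = throttle.wait_time(key)    # nothing recorded yet, so no wait is needed
throttle.mark(key)                 # the next request for this key must wait 30-60 s
```

In a real loop you would call `time.sleep(throttle.wait_time(key))` before each request and `throttle.mark(key)` after it.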
Recommended Proxies
| Provider | Block Rate (Amazon) | Pricing | Managed API |
|---|---|---|---|
| Bright Data | measuring | ~$10.50/GB | ✓ Datasets + Scraping Browser |
| Smartproxy | measuring | ~$8.50/GB | ✓ Site Unblocker |
| Oxylabs | measuring | ~$12/GB | ✓ Web Scraper API |
Amazon block rates are measured with our benchmark harness; entries marked "measuring" are still in progress. See /benchmark/.
Setup: Amazon Product Scraping
Basic product page (Smartproxy + BeautifulSoup)
    import random
    import time

    import requests
    from bs4 import BeautifulSoup


    def scrape_amazon_product(asin, proxy_user, proxy_pass):
        """Fetch one product page through the proxy and extract basic fields."""
        proxy_url = f"http://{proxy_user}:{proxy_pass}@gate.smartproxy.com:10000"
        proxies = {"http": proxy_url, "https": proxy_url}
        headers = {
            # Example desktop Chrome User-Agent; keep the version current.
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
            ),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            # "br" is omitted: requests decodes Brotli only if a brotli package is installed.
            "Accept-Encoding": "gzip, deflate",
        }
        url = f"https://www.amazon.com/dp/{asin}"
        resp = requests.get(url, proxies=proxies, headers=headers, timeout=15)
        if resp.status_code != 200:
            return {"error": f"Status {resp.status_code}"}

        soup = BeautifulSoup(resp.text, "html.parser")

        # Extract fields; a selector returns None if the element is missing.
        title = soup.select_one("#productTitle")
        price = soup.select_one(".a-price .a-offscreen, #priceblock_ourprice")
        rating = soup.select_one("[data-hook='average-star-rating'] .a-size-base")
        review_count = soup.select_one("#acrCustomerReviewText")

        return {
            "asin": asin,
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
            "rating": rating.get_text(strip=True) if rating else None,
            "review_count": review_count.get_text(strip=True) if review_count else None,
        }


    if __name__ == "__main__":
        # "B000000000", "USER", and "PASS" are placeholders.
        for asin in ["B000000000"]:
            print(scrape_amazon_product(asin, "USER", "PASS"))
            # Rate limiting: 1 request per IP per 30-60 seconds on the same ASIN.
            time.sleep(random.uniform(30, 60))
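The scraper above returns every field as raw text, so a normalization step is usually needed before the values can be compared or stored. The helpers below are a sketch under the assumption that values arrive in US formatting, such as "$1,299.99", "4.5 out of 5", and "1,234 ratings"; the function names are illustrative and other locales would need different handling.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a US-formatted price string such as '$1,299.99' to a Decimal."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))


def parse_count(text):
    """Convert a string such as '1,234 ratings' to an int."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group(0).replace(",", "")) if match else None


def parse_rating(text):
    """Convert a string such as '4.5 out of 5' to a float (first number found)."""
    if not text:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group(0)) if match else None
```

`Decimal` is used for prices to avoid floating-point rounding when values are later summed or compared.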
Managed API for JS-rendered prices (Oxylabs)
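    import requests


    def scrape_amazon_with_api(asin):
        """Request a parsed product page from the Oxylabs realtime endpoint."""
        resp = requests.post(
            "https://realtime.oxylabs.io/v1/queries",
            auth=("user", "pass"),  # placeholders for your API credentials
            json={
                "source": "amazon_product",
                # The amazon_product source takes the ASIN in the "query" field;
                # confirm against the current Oxylabs API reference.
                "query": asin,
                "domain": "com",
                "parse": True,
            },
            timeout=60,
        )
        resp.raise_for_status()  # surface auth or quota errors before parsing
        return resp.json()["results"][0]["content"]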
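Indexing straight into `results[0]["content"]` raises an exception whenever the response is an error body or an empty result list. A defensive accessor might look like the sketch below. It assumes only the response shape that the function above already relies on (a `results` list whose first item has a `content` key); `extract_content` is a hypothetical helper, not part of any provider SDK.

```python
def extract_content(payload):
    """Return payload["results"][0]["content"], or None if any level is missing."""
    if not isinstance(payload, dict):
        return None
    results = payload.get("results")
    if not isinstance(results, list) or not results:
        return None
    first = results[0]
    if not isinstance(first, dict):
        return None
    return first.get("content")
```

Callers can then treat `None` as "retry or log" instead of handling `KeyError` and `IndexError` separately.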
What Data You Can Collect
| Data Type | Accessibility | Notes |
|---|---|---|
| Product title | Public | Standard HTML |
| List price | Public | May require JS rendering |
| Buy Box price | Public | JS-rendered; use managed API |
| Prime price | Login required | Out of scope |
| Seller name | Public | From offer listing page |
| Review count | Public | Standard HTML |
| Review text | Public | Paginated; rate-limit carefully |
| Product images | Public | Direct URL from HTML |
FAQ
How do I handle Amazon CAPTCHAs?
CAPTCHAs from Amazon typically mean your request pattern triggered detection. Solutions: 1) Reduce request frequency per IP, 2) Improve headers to match browser fingerprint more closely, 3) Use a managed scraping API (Oxylabs Web Scraper, Smartproxy Site Unblocker) that handles CAPTCHAs internally.
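A CAPTCHA page can arrive with HTTP status 200, so checking the status code alone may miss it. The sketch below pairs a simple text check with exponential backoff so that a flagged IP is retried less often. The marker strings are assumptions based on phrases seen on Amazon's robot-check page and may change; the function names and the default delays are illustrative.

```python
import random

# Assumed markers: text and form path that have appeared on the robot-check page.
CAPTCHA_MARKERS = (
    "enter the characters you see below",
    "/errors/validatecaptcha",
)


def looks_like_captcha(html):
    """Return True if the page body contains any known CAPTCHA marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def backoff_delay(attempt, base=30.0, cap=900.0):
    """Exponential backoff with full jitter.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

A caller would check `looks_like_captcha(resp.text)` after each fetch and, on a hit, sleep for `backoff_delay(attempt)` before retrying on a fresh IP.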
Can I scrape Amazon reviews?
Amazon customer reviews are publicly accessible. Rate-limit carefully: one review page per IP per hour is a conservative starting point. For large-scale review collection, consider Bright Data’s Amazon Datasets, which provide pre-collected structured review data.
Is collecting Amazon data legal?
Collecting publicly displayed product information (titles, public prices, public reviews, seller information) is generally lawful in many jurisdictions, but the answer depends on where you operate and how you use the data. Amazon’s ToS restricts automated collection. In hiQ v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the US Computer Fraud and Abuse Act; that ruling does not stop a site from pursuing breach-of-contract or other claims. Consult legal counsel for your specific use case and jurisdiction.
This article was produced with AI assistance and reviewed by an editor. As of 2026-06-01. Benchmark figures: /benchmark/. Use proxies for legitimate purposes only.