Tags: amazon scraper, product variations, web scraping, playwright, anti-bot

How to Scrape Amazon Product Variations Without Getting Blocked

Scraping Amazon for variation-specific pricing is notoriously difficult. Here's a breakdown of what breaks, why it breaks, and the approaches that actually work at scale.

Aman Patel

Founder & CEO

2026-04-03 9 min read

Why Amazon Is So Hard to Scrape

Amazon is one of the most heavily scraped websites in the world, and it has built some of the most sophisticated anti-scraping infrastructure in response. Variation scraping is particularly hard for reasons rooted in how Amazon's product pages are built and served.

The Three Core Challenges

1. Anti-Bot Detection

Amazon runs multiple detection layers simultaneously:

  • TLS fingerprinting - Identifies the specific TLS client hello pattern of your HTTP client
  • Browser fingerprinting - JavaScript challenges detect headless browsers vs real users
  • Behavioral analysis - Unusual request patterns, timing, and navigation paths trigger blocks
  • IP reputation scoring - Data center IPs are immediately flagged; shared proxies are quickly burned

Getting past these typically requires residential proxies, realistic browser fingerprints, and human-like navigation patterns. A plain requests or curl call is usually blocked almost immediately.
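One piece of this, human-like pacing, can be sketched as a randomized pause between requests. This is a minimal illustration, not a proven evasion technique: the names `human_delay` and `paced` and the default base and jitter values are assumptions chosen for the example, not figures from Amazon or Pricium.

```python
import random
import time


def human_delay(base=2.0, jitter=1.5):
    """Return a pause length in seconds: base plus a random amount in [0, jitter]."""
    return base + random.uniform(0.0, jitter)


def paced(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a randomized interval between calls.

    `fetch` is whatever callable performs the request; `sleep` is injectable
    so the pacing logic can be exercised without actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(human_delay())  # no pause before the very first request
        results.append(fetch(url))
    return results
```

Pacing alone does not address TLS or browser fingerprinting; it only avoids the fixed-interval request pattern that behavioral analysis tends to flag.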

2. Dynamic JavaScript Rendering

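Amazon's product pages rely heavily on client-side JavaScript. Much of the variation data is not present in the markup a simple HTTP scraper receives - prices and availability for a selected variant are loaded and rendered by JavaScript after page load, so a plain HTTP client often sees an incomplete page.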

You need a full browser (Playwright, Puppeteer, or Selenium) to execute the JavaScript and see the rendered DOM.
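A quick way to tell whether a raw HTTP response is usable, or whether a browser is needed, is to check the HTML for the element you intend to read. The sketch below uses only the standard library. The `a-offscreen` class is the price selector used in the Playwright example in this post; the helper name `has_price_markup` is hypothetical.

```python
from html.parser import HTMLParser


class _ClassFinder(HTMLParser):
    """Records whether any element carries the target CSS class."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get('class') or '').split()
        if self.target in classes:
            self.found = True


def has_price_markup(html, css_class='a-offscreen'):
    """Return True if the HTML contains at least one element with the given class."""
    finder = _ClassFinder(css_class)
    finder.feed(html)
    return finder.found
```

If this returns False for a response fetched without a browser, the price was most likely rendered client-side (or the request was served a block page), and a rendered DOM is needed.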

3. Variation Data Isn't Just DOM-Visible

The trickiest part: variation prices often aren't rendered until you click a variant. Amazon stores variation data in JavaScript objects embedded in the page, including the Twister JSON (Amazon's internal variation mapping format). To get prices for every variation, you need to:

  1. Parse the Twister JSON from the page's embedded <script> tags
  2. Decode the variation-to-ASIN mapping
  3. Either click each variant and capture the price, or fetch each variant's ASIN page separately
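Steps 2 and 3 can be sketched as follows. The input shape here is an assumption made for illustration: a `dimensions` list plus an `asin_to_values` map are hypothetical key names, and the real embedded JSON may be organized differently and changes over time. The ASINs in the usage example are placeholders.

```python
def decode_variations(twister, base_url='https://amazon.com/dp/'):
    """Turn a variation mapping into one record per variant ASIN.

    Assumed (hypothetical) input shape:
        {"dimensions": ["size", "color"],
         "asin_to_values": {"<ASIN>": ["S", "Red"], ...}}
    """
    dimensions = twister['dimensions']
    variants = []
    for asin, values in twister['asin_to_values'].items():
        if len(values) != len(dimensions):
            continue  # skip malformed entries rather than mislabel attributes
        variants.append({
            'asin': asin,
            'attributes': dict(zip(dimensions, values)),
            'url': base_url + asin,  # page to fetch for this variant's price
        })
    return variants


sample = {
    'dimensions': ['size', 'color'],
    'asin_to_values': {
        'B0EXAMPLE1': ['S', 'Red'],
        'B0EXAMPLE2': ['M', 'Red'],
        'B0EXAMPLE3': ['M'],  # malformed: missing a value
    },
}
variant_pages = decode_variations(sample)
```

Each record in `variant_pages` carries the URL to fetch separately, which is the second option in step 3; the click-through option is shown in the Playwright example.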

The DIY Approach (For Reference)

Here's a simplified Playwright approach - this is educational, not production-ready:

import asyncio
import json
import re

from playwright.async_api import async_playwright


def extract_twister_data(html: str):
    """Return the object stored under "twisterData", or None if missing or malformed."""
    match = re.search(r'"twisterData"\s*:\s*(?=\{)', html)
    if not match:
        return None
    try:
        # raw_decode parses exactly one JSON value, so nested braces are handled
        # (a non-greedy regex would stop at the first closing brace)
        data, _ = json.JSONDecoder().raw_decode(html[match.end():])
        return data
    except json.JSONDecodeError:
        return None


async def scrape_variations(url: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=False,  # Headless is detected - use headed in stealth mode
            args=['--disable-blink-features=AutomationControlled']
        )
        try:
            context = await browser.new_context(
                user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
                viewport={'width': 1366, 'height': 768}
            )
            page = await context.new_page()

            # Add stealth script to mask Playwright signatures
            await page.add_init_script("""
                Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
            """)

            await page.goto(url, wait_until='networkidle', timeout=60000)

            # Try to extract Twister JSON from embedded scripts
            twister_data = extract_twister_data(await page.content())
            if twister_data is not None:
                return twister_data

            # Fallback: click each variant swatch and read the displayed price
            swatches = await page.query_selector_all('[id^="color_name_"] .swatch')
            results = []
            for swatch in swatches:
                await swatch.click()
                await page.wait_for_timeout(1500)  # crude wait for the price to update
                price_el = await page.query_selector('.a-price .a-offscreen')
                price = await price_el.inner_text() if price_el else 'N/A'
                results.append({'swatch': await swatch.get_attribute('title'), 'price': price})

            return results
        finally:
            await browser.close()  # always release the browser, even on errors


if __name__ == '__main__':
    print(asyncio.run(scrape_variations('https://amazon.com/dp/B0EXAMPLE')))

The honest reality: This approach gets blocked frequently, breaks with every Amazon UI update, requires constant maintenance, and doesn't handle geo-pricing or parallel variation enumeration well.

The Smarter Approach: Use the Pricium API

Pricium has built and maintains all this infrastructure for you:

import requests

response = requests.post(
    'https://api.pricium.store/scrape',
    json={'url': 'https://amazon.com/dp/B0EXAMPLE', 'location': 'US'},
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    timeout=60,  # avoid hanging indefinitely if the request stalls
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body

data = response.json()
for variation in data['variations']:
    stock = 'In stock' if variation['available'] else 'Out of stock'
    print(f"{variation['size']} / {variation['color']}: ${variation['price']} - {stock}")

One call. All variations. No blocks. No maintenance.
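Once the response is in hand, a typical next step is picking out the variation you care about. The sketch below finds the cheapest in-stock variation using the field names from the example above (`price`, `available`, `size`, `color`). The sample payload is made up, and treating `price` as a value that `float()` can parse is an assumption about the API, so check it against a real response.

```python
def cheapest_available(variations):
    """Return the lowest-priced in-stock variation, or None if nothing is in stock."""
    in_stock = [v for v in variations if v.get('available')]
    if not in_stock:
        return None
    # Assumes 'price' is a number or a numeric string without a currency symbol
    return min(in_stock, key=lambda v: float(v['price']))


# Illustrative payload only; not real API output
sample_variations = [
    {'size': 'S', 'color': 'Red', 'price': '19.99', 'available': True},
    {'size': 'M', 'color': 'Red', 'price': '17.49', 'available': False},
    {'size': 'L', 'color': 'Blue', 'price': '18.25', 'available': True},
]
best = cheapest_available(sample_variations)
```

Here `best` is the size L / Blue entry: the M / Red entry is cheaper but out of stock, so it is skipped.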

When to DIY vs. When to Use an API

Factor                  DIY Scraper     Pricium API
Setup time              Weeks           Minutes
Reliability             Low             High
Anti-bot maintenance    Ongoing         Handled for you
Geo-pricing support     Very hard       Built-in
Variation enumeration   Fragile         Complete
Cost                    Engineer time   API credits

For most builders, the API is the right call. The DIY path is only worth it if you need highly custom scraping logic for a retailer that Pricium doesn't yet support.


Skip the scraping headaches. Start with the Pricium API →
