From URL to Structured Data: How Pricium Transforms E-commerce Scraping

A technical deep-dive into how Pricium converts a single product URL into complete, variation-aware, geo-accurate structured JSON - and why that transformation is harder than it looks.

Aman Patel

Founder & CEO

2026-03-20 8 min read

The Deceptively Simple Interface

The Pricium API has the simplest possible interface:

POST /scrape
{
  "url": "https://amazon.com/dp/B0EXAMPLE",
  "location": "US"
}

One URL in. Complete structured product data out.
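
As a rough illustration, a client call could look like the TypeScript sketch below. Only the `POST /scrape` path and the `url` and `location` fields come from the request above; the base URL, the bearer-token header, and the `buildScrapeCall` and `scrapeProduct` helpers are assumptions made for the example.

```typescript
// Hypothetical client sketch: the base URL and auth scheme are placeholders,
// not documented Pricium values.
interface ScrapeRequest {
  url: string;
  location: string; // e.g. "US" or "UK"
}

// Build the fetch() arguments separately so they can be inspected or logged.
function buildScrapeCall(baseUrl: string, apiKey: string, req: ScrapeRequest) {
  return {
    endpoint: `${baseUrl.replace(/\/+$/, "")}/scrape`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      },
      body: JSON.stringify(req),
    },
  };
}

// Send the request and return the parsed JSON body.
async function scrapeProduct(
  baseUrl: string,
  apiKey: string,
  req: ScrapeRequest,
): Promise<unknown> {
  const { endpoint, init } = buildScrapeCall(baseUrl, apiKey, req);
  const res = await fetch(endpoint, init);
  if (!res.ok) throw new Error(`Scrape failed with HTTP ${res.status}`);
  return res.json();
}
```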

Under that simple interface is a significant amount of engineering. This post walks through what actually happens between input and output - and why getting it right is harder than it appears.

Step 1: Request Routing and Geo-Context Initialization

When a request arrives with "location": "UK", Pricium doesn't just change a header. It:

  • Routes the outbound request through a residential proxy pool in the UK
  • Sets up a browser session with UK locale, currency preferences, and a UK shipping address in session storage
  • Selects an appropriate browser fingerprint matching a common UK user profile (OS, browser version, screen resolution, timezone)

This multi-layer geo-context setup is what ensures the retrieved price is genuinely the UK price - not the US price with a UK flag attached.
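
One way to picture the geo-context setup is as a lookup from location code to browser-context settings. The sketch below is illustrative only: the field names `locale`, `timezoneId`, `proxy`, and `extraHTTPHeaders` mirror options accepted by Playwright's `browser.newContext()`, while the profile table, the proxy hostnames, and the `geoContextFor` helper are hypothetical.

```typescript
// Shape mirrors a subset of Playwright's browser.newContext() options.
interface GeoContextOptions {
  locale: string;
  timezoneId: string;
  proxy: { server: string };
  extraHTTPHeaders: Record<string, string>;
}

// Illustrative profiles only; the proxy hosts are placeholders.
const GEO_PROFILES: Record<string, { locale: string; timezoneId: string; proxyHost: string }> = {
  US: { locale: "en-US", timezoneId: "America/New_York", proxyHost: "us.proxy.example" },
  UK: { locale: "en-GB", timezoneId: "Europe/London", proxyHost: "uk.proxy.example" },
};

// Translate a request's location code into consistent context settings,
// so locale, timezone, language header, and egress IP all agree.
function geoContextFor(location: string): GeoContextOptions {
  const profile = GEO_PROFILES[location.toUpperCase()];
  if (!profile) throw new Error(`No geo profile for location "${location}"`);
  return {
    locale: profile.locale,
    timezoneId: profile.timezoneId,
    proxy: { server: `http://${profile.proxyHost}:8080` },
    extraHTTPHeaders: { "Accept-Language": `${profile.locale},en;q=0.8` },
  };
}
```

With Playwright, the result could be passed straight through, for example `await browser.newContext(geoContextFor("UK"))`. Keeping every signal in one profile avoids mismatches such as a UK proxy paired with a US timezone.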

Step 2: Anti-Bot Evasion

Major e-commerce platforms invest heavily in detecting non-human traffic. Before even loading the product page, Pricium's browser environment:

  • Configures a realistic TLS fingerprint (not the default Playwright TLS client hello)
  • Masks automation-related JavaScript properties (navigator.webdriver, etc.)
  • Loads the kinds of browser extensions and plugins that a real user would have
  • Sets up realistic browser history and cookie patterns

The goal: be indistinguishable from a real user with a real browser in a real location.

Step 3: Page Load and JavaScript Execution

The product URL is loaded in the configured browser context. Unlike HTTP-only scrapers, which see only the initial HTML response:

  • All JavaScript is executed (including framework rendering, lazy loading, and dynamic content)
  • The scraper waits for meaningful page events - not just DOMContentLoaded, but actual price elements rendering
  • Network requests are monitored for XHR/fetch calls that load pricing data asynchronously
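
The "wait for a meaningful event" idea can be sketched as a small polling helper. This is a generic illustration, not Pricium's implementation: `waitForValue` and its parameters are hypothetical, and the probe could be anything that returns `null` until the price is present, such as reading a price node's text or checking captured XHR responses for a pricing payload.

```typescript
// Poll a probe until it yields a non-null value or the deadline passes.
async function waitForValue<T>(
  probe: () => Promise<T | null> | T | null,
  timeoutMs = 10000,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for value`);
    }
    // Sleep briefly before probing again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```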

Step 4: Variation Data Extraction

This is the most technically challenging step. Pricium uses two strategies, trying the faster one first and falling back to the second when needed:

Strategy A: Embedded Data Parsing

Many platforms store variation data in the page's embedded JavaScript - Amazon's "Twister" JSON is a canonical example. This data structure contains a complete map of all variations and their attributes. Pricium parses these embedded objects directly:

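// Sketch of Twister extraction (the "twisterData" key is illustrative)
const pageContent = await page.content();
const keyMatch = /"twisterData":\s*\{/.exec(pageContent);
if (keyMatch) {
  // Scan to the matching "}" (a lazy regex stops at the first nested one; braces inside strings are ignored here)
  const start = keyMatch.index + keyMatch[0].length - 1;
  let depth = 0, end = start;
  do {
    depth += pageContent[end] === "{" ? 1 : pageContent[end] === "}" ? -1 : 0;
    end++;
  } while (depth > 0 && end < pageContent.length);
  const variationMap = JSON.parse(pageContent.slice(start, end));
  // Extract size/color/price mappings from variationMap
}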

When this succeeds, we get all variation data from a single page load, with no further page interaction.

Strategy B: Swatch Interaction

For platforms that don't expose variation data in embedded scripts, Pricium falls back to programmatic variation enumeration: identifying all interactive variation swatches, clicking each one, waiting for the DOM to update, and capturing the resulting price.

This is slower, even with the work parallelized where possible, but it is comprehensive.
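
The enumeration part of this fallback can be sketched in two pieces: listing every swatch combination, then visiting them with a cap on concurrency. Both helpers below (`enumerateCombinations` and `mapWithLimit`) are hypothetical names for illustration; the actual clicking and price capture would happen inside the task passed to `mapWithLimit`.

```typescript
// Enumerate every combination of swatch options, e.g. size x color.
function enumerateCombinations(
  dimensions: Record<string, string[]>,
): Array<Record<string, string>> {
  let combos: Array<Record<string, string>> = [{}];
  for (const [name, options] of Object.entries(dimensions)) {
    const next: Array<Record<string, string>> = [];
    for (const combo of combos) {
      for (const option of options) {
        next.push({ ...combo, [name]: option });
      }
    }
    combos = next;
  }
  return combos;
}

// Run an async task per item with at most `limit` tasks in flight,
// returning results in the same order as the input.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await task(items[i]);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}
```

In a browser-driven setup, each worker would probably need its own page or context, since clicks on a shared page would interfere with one another.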

Step 5: Data Normalization

Raw extracted data varies in structure across retailers. Pricium normalizes everything into a consistent schema:

interface ProductData {
  product_title: string;
  source_url: string;
  currency: string;
  variations: Array<{
    size?: string;
    color?: string;
    config?: string;         // For electronics: storage, RAM, etc.
    price: number;
    original_price?: number; // Pre-discount price if on sale
    available: boolean;
    rating?: number;
    review_count?: number;
  }>;
  geo_pricing?: Record<string, {
    price: number;
    currency: string;
    tax_included: boolean;
  }>;
  scraped_at: string; // ISO 8601
}

The schema is the same whether the source was Amazon, Flipkart, Nike.com, or a Shopify store.
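
A small piece of that normalization is turning a displayed price string into the numeric `price` and `currency` fields. The sketch below is a simplified assumption of how that might work: `parsePrice` and the symbol table are hypothetical, "$" is treated as USD, and only symbol-prefixed prices with comma thousands separators and dot decimals are handled (a format like "1.299,99 €" would need its own rule).

```typescript
// Illustrative symbol-to-ISO mapping; "$" is assumed to mean USD here.
const CURRENCY_SYMBOLS: Record<string, string> = {
  "$": "USD",
  "£": "GBP",
  "€": "EUR",
  "₹": "INR",
};

// Parse strings such as "$1,299.99" or "£44.99"; return null if unrecognized.
function parsePrice(raw: string): { price: number; currency: string } | null {
  const match = raw.trim().match(/^([$£€₹])\s*([\d,]+(?:\.\d{1,2})?)$/);
  if (!match) return null;
  const price = Number(match[2].replace(/,/g, ""));
  if (!Number.isFinite(price)) return null;
  return { price, currency: CURRENCY_SYMBOLS[match[1]] };
}
```

Returning `null` instead of guessing lets unparseable prices be flagged for a retry rather than silently written into the output.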

Step 6: Quality Validation

Before returning data, Pricium runs validation checks:

  • Are prices within reasonable bounds for the category? (Catches decoy prices served as a bot countermeasure)
  • Does the number of variations match the known product structure? (Catches incomplete captures)
  • Is the price fresh? (Checks the capture timestamp)
  • Is availability consistent with known stock patterns?

If validation fails, the request is retried with a different proxy and browser configuration.
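
The validate-then-retry loop could be sketched as follows. All names here (`validate`, `scrapeWithRetry`, `Expectations`) and the thresholds are hypothetical; the sketch covers the price-bounds, variation-count, and freshness checks and leaves out the stock-pattern check, which depends on historical data.

```typescript
interface Variation { price: number; available: boolean }
interface Scraped { variations: Variation[]; scraped_at: string }
interface Expectations {
  minPrice: number;
  maxPrice: number;
  expectedVariations: number;
  maxAgeMs: number;
}

// Return a list of human-readable problems; an empty list means the data passed.
function validate(data: Scraped, exp: Expectations, now: Date = new Date()): string[] {
  const problems: string[] = [];
  for (const v of data.variations) {
    if (!(v.price >= exp.minPrice && v.price <= exp.maxPrice)) {
      problems.push(`price ${v.price} outside [${exp.minPrice}, ${exp.maxPrice}]`);
    }
  }
  if (data.variations.length !== exp.expectedVariations) {
    problems.push(`expected ${exp.expectedVariations} variations, got ${data.variations.length}`);
  }
  const ageMs = now.getTime() - Date.parse(data.scraped_at);
  // NaN (unparseable timestamp) fails this check as well.
  if (!(ageMs >= 0 && ageMs <= exp.maxAgeMs)) {
    problems.push("scraped_at is stale or invalid");
  }
  return problems;
}

// Call `attempt` until validation passes; the attempt number lets the caller
// switch proxy and browser configuration between tries.
async function scrapeWithRetry(
  attempt: (n: number) => Promise<Scraped>,
  exp: Expectations,
  maxAttempts = 3,
): Promise<Scraped> {
  let lastProblems: string[] = [];
  for (let n = 1; n <= maxAttempts; n++) {
    const data = await attempt(n);
    lastProblems = validate(data, exp);
    if (lastProblems.length === 0) return data;
  }
  throw new Error(`Validation failed after ${maxAttempts} attempts: ${lastProblems.join("; ")}`);
}
```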

What Comes Out: A Real Example

{
  "product_title": "Levi's Men's 514 Straight Fit Jeans",
  "source_url": "https://amazon.com/dp/B00EXAMPLE",
  "currency": "USD",
  "scraped_at": "2026-03-20T14:32:01Z",
  "variations": [
    { "size": "30x30", "color": "Dark Stonewash", "price": 44.99, "available": true, "rating": 4.3 },
    { "size": "32x30", "color": "Dark Stonewash", "price": 44.99, "available": true, "rating": 4.3 },
    { "size": "34x32", "color": "Dark Stonewash", "price": 49.99, "available": false, "rating": 4.2 },
    { "size": "30x30", "color": "Medium Stonewash", "price": 39.99, "available": true, "rating": 4.5 }
  ]
}

Start to finish: typically under 8 seconds, including full variation enumeration.
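
Because every variation is a flat record, downstream code stays short. As an illustrative sketch (the `cheapestAvailable` helper is hypothetical), picking the lowest-priced in-stock variation from the example response might look like this:

```typescript
interface Variation {
  size?: string;
  color?: string;
  price: number;
  available: boolean;
  rating?: number;
}

// Return the lowest-priced variation that is in stock, if any.
function cheapestAvailable(variations: Variation[]): Variation | undefined {
  return variations
    .filter((v) => v.available)
    .reduce<Variation | undefined>(
      (best, v) => (best === undefined || v.price < best.price ? v : best),
      undefined,
    );
}

// Variations copied from the example response above.
const exampleVariations: Variation[] = [
  { size: "30x30", color: "Dark Stonewash", price: 44.99, available: true, rating: 4.3 },
  { size: "32x30", color: "Dark Stonewash", price: 44.99, available: true, rating: 4.3 },
  { size: "34x32", color: "Dark Stonewash", price: 49.99, available: false, rating: 4.2 },
  { size: "30x30", color: "Medium Stonewash", price: 39.99, available: true, rating: 4.5 },
];

const best = cheapestAvailable(exampleVariations);
// best is the 30x30 Medium Stonewash variation at 39.99
```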

Why This Matters

The transformation from "URL" to "complete structured product data" sounds simple but requires solving anti-bot, geo-routing, JavaScript rendering, variation enumeration, and data normalization problems simultaneously. Pricium does all of this so you don't have to - and exposes it through the simplest possible interface.


Start turning URLs into structured data with Pricium →
