The Deceptively Simple Interface
The Pricium API has the simplest possible interface:
POST /scrape
{
  "url": "https://amazon.com/dp/B0EXAMPLE",
  "location": "US"
}
One URL in. Complete structured product data out.
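A minimal sketch of building this request in TypeScript. Only the `POST /scrape` path and the `url` and `location` fields come from the example above; the base URL `https://api.pricium.example` and the lack of authentication headers are assumptions for illustration.

```typescript
// Hypothetical base URL; substitute the real API host.
const BASE_URL = "https://api.pricium.example";

interface ScrapeRequest {
  url: string;
  location: string;
}

interface HttpRequest {
  endpoint: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Builds the HTTP request for POST /scrape without sending it.
function buildScrapeRequest(req: ScrapeRequest): HttpRequest {
  return {
    endpoint: BASE_URL + "/scrape",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  };
}

const scrapeRequest = buildScrapeRequest({
  url: "https://amazon.com/dp/B0EXAMPLE",
  location: "US",
});
```

The result can be handed to any HTTP client, for example `fetch(scrapeRequest.endpoint, { method: scrapeRequest.method, headers: scrapeRequest.headers, body: scrapeRequest.body })`.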
Under that simple interface is a significant amount of engineering. This post walks through what actually happens between input and output - and why getting it right is harder than it appears.
Step 1: Request Routing and Geo-Context Initialization
When a request arrives with "location": "UK", Pricium doesn't just change a header. It:
- Routes the outbound request through a residential proxy pool in the UK
- Sets up a browser session with UK locale, currency preferences, and a UK shipping address in session storage
- Selects an appropriate browser fingerprint matching a common UK user profile (OS, browser version, screen resolution, timezone)
This multi-layer geo-context setup is what ensures the retrieved price is genuinely the UK price - not the US price with a UK flag attached.
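One way to picture the geo-context is as a single profile that every layer reads from, so the proxy, locale, timezone, and currency cannot drift apart. The sketch below is illustrative only: the profile values, the proxy hostnames, and the `buildGeoContext` helper are assumptions, not Pricium's configuration. The `contextOptions` object is shaped like the options Playwright's `browser.newContext()` accepts (`locale`, `timezoneId`, `proxy.server`).

```typescript
// Illustrative geo profiles; values and proxy hosts are assumptions.
interface GeoProfile {
  locale: string;
  timezoneId: string;
  currency: string;
  proxyServer: string;
}

const GEO_PROFILES: Record<string, GeoProfile> = {
  US: { locale: "en-US", timezoneId: "America/New_York", currency: "USD", proxyServer: "http://us.proxy.example:8080" },
  UK: { locale: "en-GB", timezoneId: "Europe/London", currency: "GBP", proxyServer: "http://uk.proxy.example:8080" },
};

// Resolves a location code into the settings every layer must agree on.
function buildGeoContext(locationCode: string) {
  const profile = GEO_PROFILES[locationCode.toUpperCase()];
  if (!profile) {
    throw new Error("Unsupported location: " + locationCode);
  }
  return {
    // Shaped like Playwright's browser.newContext() options.
    contextOptions: {
      locale: profile.locale,
      timezoneId: profile.timezoneId,
      proxy: { server: profile.proxyServer },
    },
    currency: profile.currency,
  };
}

const ukGeoContext = buildGeoContext("uk");
```

Failing loudly on an unknown location is a deliberate choice in this sketch: silently falling back to a default region would produce exactly the "US price with a UK flag" problem described above.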
Step 2: Anti-Bot Evasion
Major e-commerce platforms invest heavily in detecting non-human traffic. Before even loading the product page, Pricium's browser environment:
- Configures a realistic TLS fingerprint (not the default Playwright TLS client hello)
- Disables automation-related JavaScript properties (navigator.webdriver, etc.)
- Loads necessary browser extensions and plugins that a real user would have
- Sets up realistic browser history and cookie patterns
The goal: be indistinguishable from a real user with a real browser in a real location.
Step 3: Page Load and JavaScript Execution
The product URL is loaded in the configured browser context. Unlike HTTP-only scrapers:
- All JavaScript is executed (including framework rendering, lazy loading, and dynamic content)
- The scraper waits for meaningful page events - not just DOMContentLoaded, but actual price elements rendering
- Network requests are monitored for XHR/fetch calls that load pricing data asynchronously
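The network-monitoring step can be sketched as a filter over captured responses. This is a simplified illustration: the `CapturedResponse` shape and the rule "any numeric `price` field counts" are assumptions made for the example, and real pricing payloads vary by retailer.

```typescript
// Minimal record of a network response; field names are assumptions for this sketch.
interface CapturedResponse {
  url: string;
  resourceType: string; // e.g. "xhr", "fetch", "document", "image"
  contentType: string;
  body: string;
}

// Keeps XHR/fetch JSON responses whose payload has a numeric "price" field anywhere inside.
function findPricingResponses(responses: CapturedResponse[]): CapturedResponse[] {
  return responses.filter((res) => {
    const isAsync = res.resourceType === "xhr" || res.resourceType === "fetch";
    const isJson = res.contentType.indexOf("application/json") !== -1;
    if (!isAsync || !isJson) return false;
    try {
      return containsPrice(JSON.parse(res.body));
    } catch {
      return false; // Malformed JSON is ignored rather than failing the scrape.
    }
  });
}

// Recursively searches parsed JSON for a numeric "price" property.
function containsPrice(value: unknown): boolean {
  if (value === null || typeof value !== "object") return false;
  const obj = value as Record<string, unknown>;
  return Object.keys(obj).some(
    (key) => (key === "price" && typeof obj[key] === "number") || containsPrice(obj[key])
  );
}

const capturedResponses: CapturedResponse[] = [
  { url: "https://shop.example/p/1", resourceType: "document", contentType: "text/html", body: "<html></html>" },
  { url: "https://shop.example/api/offers", resourceType: "fetch", contentType: "application/json; charset=utf-8", body: '{"offers":[{"sku":"A1","price":44.99}]}' },
  { url: "https://shop.example/api/reviews", resourceType: "xhr", contentType: "application/json", body: '{"count":12}' },
];
const pricingResponses = findPricingResponses(capturedResponses);
```

Here only the `/api/offers` response survives the filter: the document is not an async request, and the reviews payload has no price field.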
Step 4: Variation Data Extraction
This is the most technically challenging step. Pricium uses two strategies, preferring the first and falling back to the second:
Strategy A: Embedded Data Parsing
Many platforms store variation data in the page's embedded JavaScript - Amazon's "Twister" JSON is a canonical example. This data structure contains a complete map of all variations and their attributes. Pricium parses these embedded objects directly:
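// Pseudocode of Twister extraction
const pageContent = await page.content();
const keyIndex = pageContent.indexOf('"twisterData":');
const start = keyIndex === -1 ? -1 : pageContent.indexOf('{', keyIndex);
if (start !== -1) {
  // Scan to the matching "}" (a non-greedy regex stops at the first one); braces inside strings are not handled
  let depth = 0, end = start;
  do {
    depth += pageContent[end] === '{' ? 1 : pageContent[end] === '}' ? -1 : 0;
    end++;
  } while (depth > 0 && end < pageContent.length);
  const variationMap = JSON.parse(pageContent.slice(start, end));
  // Extract size/color/price mappings from variationMap
}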
When this succeeds, all variation data comes from a single page load, with no further page interaction needed.
Strategy B: Swatch Interaction
For platforms that don't expose variation data in embedded scripts, Pricium falls back to programmatic variation enumeration: identifying all interactive variation swatches, clicking each one, waiting for the DOM to update, and capturing the resulting price.
This is slower, even with the work parallelized where possible, but it is comprehensive.
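The enumeration part of this fallback amounts to a Cartesian product over the swatch dimensions found on the page. The sketch below shows only that step; the `SwatchDimension` type and `enumerateSelections` helper are hypothetical names, and the clicking and DOM-waiting logic is left out.

```typescript
// One selectable dimension on the product page, e.g. size or color.
interface SwatchDimension {
  name: string;
  options: string[];
}

type Selection = Record<string, string>;

// Expands the dimensions into every combination that must be clicked through.
function enumerateSelections(dimensions: SwatchDimension[]): Selection[] {
  let selections: Selection[] = [{}];
  for (const dim of dimensions) {
    const next: Selection[] = [];
    for (const partial of selections) {
      for (const option of dim.options) {
        next.push({ ...partial, [dim.name]: option });
      }
    }
    selections = next;
  }
  return selections;
}

const swatchSelections = enumerateSelections([
  { name: "size", options: ["30x30", "32x30"] },
  { name: "color", options: ["Dark Stonewash", "Medium Stonewash"] },
]);
```

Two sizes and two colors yield four selections. The count grows multiplicatively with each added dimension, which is why this path is slower and benefits from parallelism.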
Step 5: Data Normalization
Raw extracted data varies in structure across retailers. Pricium normalizes everything into a consistent schema:
interface ProductData {
  product_title: string;
  source_url: string;
  currency: string;
  variations: Array<{
    size?: string;
    color?: string;
    config?: string; // For electronics: storage, RAM, etc.
    price: number;
    original_price?: number; // Pre-discount price if on sale
    available: boolean;
    rating?: number;
    review_count?: number;
  }>;
  geo_pricing?: Record<string, {
    price: number;
    currency: string;
    tax_included: boolean;
  }>;
  scraped_at: string; // ISO 8601
}
The same schema regardless of whether the source was Amazon, Flipkart, Nike.com, or a Shopify store.
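A small piece of this normalization is turning a displayed price string into the numeric `price` and `currency` fields. The sketch below is one possible approach, not Pricium's implementation: the symbol table is deliberately tiny, and the rule "the last separator is a decimal point only if exactly two digits follow it" is an assumption that would misread currencies with zero or three decimal places.

```typescript
// Illustrative symbol table; a production mapping would cover many more currencies.
const CURRENCY_SYMBOLS: Record<string, string> = { "$": "USD", "£": "GBP", "€": "EUR", "₹": "INR" };

interface ParsedPrice {
  price: number;
  currency: string | null;
}

// Parses strings such as "$44.99", "£1,299.00", or "1.299,00 €".
function parseDisplayPrice(raw: string): ParsedPrice | null {
  const symbol = Object.keys(CURRENCY_SYMBOLS).filter((s) => raw.indexOf(s) !== -1)[0];
  const digits = raw.replace(/[^0-9.,]/g, "");
  if (digits === "") return null;
  // Treat the last "." or "," as the decimal separator only when exactly two digits follow it.
  const lastSep = Math.max(digits.lastIndexOf("."), digits.lastIndexOf(","));
  const hasDecimals = lastSep !== -1 && digits.length - lastSep - 1 === 2;
  const whole = (hasDecimals ? digits.slice(0, lastSep) : digits).replace(/[.,]/g, "");
  const fraction = hasDecimals ? digits.slice(lastSep + 1) : "0";
  const price = Number(whole + "." + fraction);
  if (!isFinite(price)) return null;
  return { price, currency: (symbol && CURRENCY_SYMBOLS[symbol]) || null };
}
```

With this heuristic, `parseDisplayPrice("£1,299.00")` and `parseDisplayPrice("1.299,00 €")` both yield a price of 1299 (with currencies GBP and EUR respectively), while a string with no digits, such as "Currently unavailable", returns `null`.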
Step 6: Quality Validation
Before returning data, Pricium runs validation checks:
- Are prices within reasonable bounds for the category? (Detects misleading data served as a bot countermeasure)
- Does the number of variations match known product structure? (Detects incomplete captures)
- Is the price fresh? (Timestamp validation)
- Is availability consistent with known stock patterns?
If validation fails, the request is retried with a different proxy and browser configuration.
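The first three checks can be sketched as a function that returns the list of failed checks. Everything specific here is hypothetical: the threshold values, the failure labels, and the `validateResult` name are assumptions for illustration. The availability check is omitted because it depends on historical stock data that this example does not model.

```typescript
interface Variation {
  price: number;
  available: boolean;
}

interface ScrapeResult {
  variations: Variation[];
  scraped_at: string; // ISO 8601
}

// All thresholds are hypothetical and would be tuned per category.
interface ValidationRules {
  minPrice: number;
  maxPrice: number;
  expectedVariations: number;
  maxAgeMs: number;
}

// Returns the names of failed checks; an empty list means the result passes.
function validateResult(result: ScrapeResult, rules: ValidationRules, nowMs: number): string[] {
  const failures: string[] = [];
  const outOfBounds = result.variations.some(
    (v) => v.price < rules.minPrice || v.price > rules.maxPrice
  );
  if (outOfBounds) failures.push("price_out_of_bounds");
  if (result.variations.length < rules.expectedVariations) failures.push("incomplete_variations");
  const ageMs = nowMs - Date.parse(result.scraped_at);
  if (isNaN(ageMs) || ageMs < 0 || ageMs > rules.maxAgeMs) failures.push("stale_timestamp");
  return failures;
}

const validationFailures = validateResult(
  {
    variations: [
      { price: 44.99, available: true },
      { price: 0.01, available: true },
    ],
    scraped_at: "2026-03-20T14:32:01Z",
  },
  { minPrice: 5, maxPrice: 500, expectedVariations: 4, maxAgeMs: 300000 },
  Date.parse("2026-03-20T14:33:00Z")
);
```

In this sample the 0.01 price and the two missing variations are both flagged, while the timestamp passes. A non-empty list is what would trigger the retry with a different proxy and browser configuration.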
What Comes Out: A Real Example
{
  "product_title": "Levi's Men's 514 Straight Fit Jeans",
  "source_url": "https://amazon.com/dp/B00EXAMPLE",
  "currency": "USD",
  "scraped_at": "2026-03-20T14:32:01Z",
  "variations": [
    { "size": "30x30", "color": "Dark Stonewash", "price": 44.99, "available": true, "rating": 4.3 },
    { "size": "32x30", "color": "Dark Stonewash", "price": 44.99, "available": true, "rating": 4.3 },
    { "size": "34x32", "color": "Dark Stonewash", "price": 49.99, "available": false, "rating": 4.2 },
    { "size": "30x30", "color": "Medium Stonewash", "price": 39.99, "available": true, "rating": 4.5 }
  ]
}
Start to finish: typically under 8 seconds, including full variation enumeration.
Why This Matters
The transformation from "URL" to "complete structured product data" sounds simple but requires solving anti-bot, geo-routing, JavaScript rendering, variation enumeration, and data normalization problems simultaneously. Pricium does all of this so you don't have to - and exposes it through the simplest possible interface.
