
Launching next week

The web,
as an API.

Bot-proof scraping, intelligent extraction, and schema-validated JSON — in one endpoint your team will never have to maintain.

No credit card
Bot-proof fetching
Schema-validated JSON

Try the full demo

https://reddit.com/r/formula1

POST /v1/smartscraper

rendered page

r/formula1 · 4.1m members

▲ 9.2k  Verstappen wins chaotic Japanese GP
        1.4k comments

▲ ·     F1 TV Pro — stream every session
        dropped as noise (ad)

▲ 3.6k  Hamilton on the Suzuka comeback
        260 comments

+ 47 more posts on this page

response.json · 200 OK

{
  "url": "reddit.com/r/formula1",
  "posts": [
    {
      "title": "Verstappen wins chaotic Japanese GP",
      "upvotes": 9233,
      "comments": 1438
    },
    {
      "title": "Hamilton on the Suzuka comeback",
      "upvotes": 3612,
      "comments": 260
    }
  ],
  "count": 2
}

✓ schema valid · 412ms · 1 ad filtered

Drops into your stack — pick your weapon

$ pnpm add @webscrape/sdk

// platform telemetry live

Bot detection rate

<0%

12mo · n=1.2M req

HTML noise reduction

50–80%

with reduce_content

Preset recipes

for popular sites

Features

Built for developers who need reliable data

A complete extraction toolkit — every feature audited, versioned, and shipped from one endpoint.

stable

AI Agent Extraction

Describe what you want in plain English. Our agents understand the page semantically and return exactly the structured data you asked for.

user_prompt: "Extract top 5 stories with title, url, points"

stable

Bot-Proof Browser

Our own Chromium fork, fingerprint-patched at the source: not a bolt-on script that anti-bot systems can spot.

JA4 · HTTP/2 · Canvas · WebGL

stable

Schema Enforcement

Validate, repair, guarantee. JSON Schema in, conforming data out.

validate schema
✓ 5/5 fields matched
✓ no repair needed
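To make "JSON Schema in, conforming data out" concrete, here is a minimal sketch of the validation half: checking an extracted record against a small subset of JSON Schema (`type`, `required`, `properties`, `items`). The `validate` helper is hypothetical and covers only those keywords; it is not SmartScraper's validator and does no repair. A production client would more likely use a full JSON Schema library.

```python
from typing import Any

# JSON Schema primitive type names supported by this sketch, mapped to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return error messages; an empty list means the value conforms.

    Hypothetical helper: only `type`, `required`, `properties`, and `items` are checked.
    """
    errors: list[str] = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            return [f"{path}: expected {expected}, got boolean"]
        if not isinstance(value, _TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


story_schema = {
    "type": "object",
    "required": ["title", "url", "points"],
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}

good = {"title": "Open source alternative to Figma", "url": "https://example.com/penpot", "points": 175}
bad = {"title": "Open source alternative to Figma", "points": "175"}

print(validate(good, story_schema))  # []
print(validate(bad, story_schema))
```

For the second record the sketch reports both the missing `url` and the string-typed `points`, so a caller can decide whether to repair or reject it.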

stable

Intelligent Chunking

Long pages are split into chunks, processed in parallel, and deduplicated on merge.

split→parallel→merge→dedup
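The split, parallel, merge, dedup flow can be sketched as follows. This is an illustration under stated assumptions, not SmartScraper's implementation: chunks are fixed-size windows of lines with a small overlap, `extract_chunk` is a hypothetical stand-in for the per-chunk model call (it just keeps lines starting with "TITLE:"), and deduplication keys on the case-folded title.

```python
from concurrent.futures import ThreadPoolExecutor


def split(lines: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split lines into windows of `size`; consecutive windows share `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [lines[i : i + size] for i in range(0, max(len(lines) - overlap, 1), step)]


def extract_chunk(chunk: list[str]) -> list[dict]:
    """Hypothetical stand-in for the per-chunk extraction call: keeps 'TITLE:' lines."""
    return [
        {"title": line.removeprefix("TITLE:").strip()}
        for line in chunk
        if line.startswith("TITLE:")
    ]


def merge_dedup(results: list[list[dict]]) -> list[dict]:
    """Concatenate per-chunk results in page order, keeping the first copy of each title."""
    seen: set[str] = set()
    merged: list[dict] = []
    for records in results:
        for record in records:
            key = record["title"].casefold()
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


def extract_long_page(lines: list[str], size: int = 3, overlap: int = 1) -> list[dict]:
    chunks = split(lines, size, overlap)
    # Executor.map yields results in input order, so page order survives the parallel step.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(extract_chunk, chunks))
    return merge_dedup(results)


page = [
    "TITLE: Story A", "body text",
    "TITLE: Story B", "body text",
    "TITLE: Story C", "body text",
    "TITLE: Story D",
]
print([r["title"] for r in extract_long_page(page)])
```

With a window of 3 lines and an overlap of 1, Story B and Story C each land in two chunks; the overlap keeps items that straddle a boundary intact, and the dedup step is what removes the resulting second copies.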

stable

Any Input Source

URLs, raw HTML, Markdown, or PDFs — one API for all of them.

URL

HTML

PDF

beta

Content Reduction

A local NLP layer strips navbars, ads, and noise before extraction — cutting cost and latency by 50–80% with no accuracy loss.

noise removed: 50–80%
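As a rough illustration of reducing a page before extraction, the sketch below drops whole elements that usually carry navigation or boilerplate and reports how much of the input survived. It is a simple tag-based filter built on Python's standard `html.parser`, swapped in for illustration only; the card above describes an NLP layer, which this is not, and the `NOISE_TAGS` list is an assumption.

```python
from html.parser import HTMLParser

# Assumed set of elements treated as noise; a real filter would also score text blocks.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class NoiseFilter(HTMLParser):
    """Keeps text that is not nested inside any NOISE_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.noise_depth = 0  # number of currently open noise elements
        self.kept: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        if self.noise_depth == 0 and data.strip():
            self.kept.append(data.strip())


def reduce_html(html: str) -> tuple[str, float]:
    """Return the kept text and the fraction of input characters that was removed."""
    parser = NoiseFilter()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.kept)
    removed = 1 - len(text) / len(html) if html else 0.0
    return text, removed


page = (
    "<html><body>"
    "<nav>Home | News | Sport | Login</nav>"
    "<main><h1>Main headline</h1><p>Article body.</p></main>"
    "<aside>Sponsored: stream every session</aside>"
    "<footer>Copyright and legal links</footer>"
    "</body></html>"
)
text, removed = reduce_html(page)
print(text)
print(f"{removed:.0%} of characters removed")
```

Only the headline and body text reach the extraction step, which is the point of reducing first: the model sees fewer tokens and less distracting markup.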

How It Works

One request. Six stages. Clean JSON.

Every request flows through the same deterministic pipeline: request, fetch, clean, reduce, extract, validate. No black box.

~/wsai — pipeline.trace live

stage     component               time      output
URL       POST /v1/smartscraper   -         request
fetch     worker fetch            12 ms     405 KB html
clean     selectolax              3 ms      80 KB dom
reduce    nlp filter              23 ms     5 KB text
extract   vlm layer               480 ms    5 fields
JSON      schema · validated      2 ms      200 OK

▸ total   6 stages                ~520 ms
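The six-stage flow above can be sketched as an ordered list of functions, each fed the previous stage's output and timed with `time.perf_counter`. The stage bodies are hypothetical placeholders (no network, no parser, no model); only the shape of the pipeline and its per-stage trace follow the description, with stage names taken from it.

```python
import time
from typing import Any, Callable

# A stage is a (name, function) pair; each function receives the previous stage's output.
Stage = tuple[str, Callable[[Any], Any]]


def run_pipeline(payload: Any, stages: list[Stage]) -> tuple[Any, list[tuple[str, float]]]:
    """Run stages in order and record how long each one took, in milliseconds."""
    trace: list[tuple[str, float]] = []
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        trace.append((name, (time.perf_counter() - start) * 1000))
    return payload, trace


# Hypothetical placeholder stages: none of them touches the network or a model.
def fetch(request: dict) -> dict:
    return {**request, "html": "<nav>menu</nav><h1>Story A</h1>"}


def clean(doc: dict) -> dict:
    return {**doc, "dom": doc["html"].replace("<nav>menu</nav>", "")}


def reduce_text(doc: dict) -> dict:
    return {**doc, "text": doc["dom"].replace("<h1>", "").replace("</h1>", "")}


def extract(doc: dict) -> dict:
    return {"title": doc["text"]}


def validate_record(record: dict) -> dict:
    if not isinstance(record.get("title"), str):
        raise ValueError("title must be a string")
    return record


STAGES: list[Stage] = [
    ("request", lambda url: {"url": url}),
    ("fetch", fetch),
    ("clean", clean),
    ("reduce", reduce_text),
    ("extract", extract),
    ("validate", validate_record),
]

result, trace = run_pipeline("https://example.com", STAGES)
for name, ms in trace:
    print(f"{name:<9}{ms:8.3f} ms")
print(f"{'total':<9}{sum(ms for _, ms in trace):8.3f} ms")
print(result)
```

Keeping each stage a plain function of the previous output is what makes a trace like this cheap to produce: timing wraps every stage uniformly, and a failing stage (for example `validate_record` raising) stops the run at a named step.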

Use Cases

What you can build

Teams use SmartScraper to power product feeds, agents, dashboards, and entire data pipelines.

E-commerce monitoring

Track prices, stock, and reviews across thousands of product pages with consistent JSON schemas.

Lead generation

Extract names, emails, titles, and company info from directories and profiles at scale.

News & content intel

Pull articles, authors, dates, and entities from any publisher into clean, queryable data.

AI agent tooling

Plug structured web data into LangChain, n8n, or your own agents — no scrapers to maintain.

Why SmartScraper

Stop fighting with selectors and broken scripts

A side-by-side look at what you actually get out of the box.

	SmartScraper	DIY scraper	Headless browser
Works on any site without writing selectors
Schema-validated structured output
Bot-proof browser
Automatic chunking for long pages
50–80% noise removed before processing
PDF + HTML + Markdown input
Zero maintenance when sites change
No infrastructure to run

Developer Experience

One endpoint. Any language.

Integrate in minutes with any HTTP client. Schema-validated output means no post-processing.

~/wsai — request

# fetch the top 5 HN stories
curl -X POST https://api.webscrape.ai/v1/smartscraper \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "website_url": "https://news.ycombinator.com",
    "user_prompt": "Extract the top 5 stories with title, url, and points"
  }'

# fetch the top 5 HN stories
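import os

import requests

API_KEY = os.environ["API_KEY"]  # read the key from the environment

resp = requests.post(
    "https://api.webscrape.ai/v1/smartscraper",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={  # json= also sets Content-Type: application/json
        "website_url": "https://news.ycombinator.com",
        "user_prompt": "Extract the top 5 stories with title, url, and points",
    },
    timeout=60,  # client-side limit in seconds; 60 is an arbitrary choice
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = resp.json()
print(data["result"])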

// fetch the top 5 HN stories
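// assumes Node 18+ (global fetch) running as an ES module (top-level await)
const API_KEY = process.env.API_KEY;

const resp = await fetch("https://api.webscrape.ai/v1/smartscraper", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`,
  },
  body: JSON.stringify({
    website_url: "https://news.ycombinator.com",
    user_prompt: "Extract the top 5 stories with title, url, and points",
  }),
});

// surface HTTP errors instead of destructuring an error body
if (!resp.ok) {
  throw new Error(`Request failed: ${resp.status} ${resp.statusText}`);
}

const { result } = await resp.json();
console.log(result);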

// extraction pipeline

▸ fetch      12ms
▸ clean       3ms
▸ reduce     23ms
▸ extract   480ms
▸ validate    2ms

total ~520ms

~/wsai — response
response.json
// 200 OK · extracted in ~520ms
{
  "stories": [
    {
      "title": "Show HN: I built a real-time code editor",
      "url": "https://example.com/editor",
      "points": 342
    },
    {
      "title": "Why Rust is the future of systems programming",
      "url": "https://example.com/rust",
      "points": 281
    },
    {
      "title": "PostgreSQL 18 released with major improvements",
      "url": "https://postgresql.org/18",
      "points": 256
    },
    {
      "title": "A deep dive into WebAssembly garbage collection",
      "url": "https://example.com/wasm-gc",
      "points": 198
    },
    {
      "title": "Open source alternative to Figma",
      "url": "https://example.com/penpot",
      "points": 175
    }
  ]
}

Pricing

Simple, transparent pricing

Start free, scale as you grow. Pay only for what you use, with no hidden fees.

Free

Try the API with no commitment.

Get Started

500 starting credits
300 credits / month thereafter
1 concurrent request
10 requests / minute
7-day data retention
Limited SmartBrowse

Hobby

For side projects and prototypes.

$19

5,000 credits / month
10 concurrent requests
100 requests / minute
Standard proxy rotation
30-day data retention
20% off extra credits

Startup

Cost Efficient

For growing teams in production.

$79

30,000 credits / month
50 concurrent requests
500 requests / minute
Residential proxies
30-day data retention
Priority support
40% off extra credits

Enterprise

Tailored solutions for large organizations.

Custom

Contact Sales

Unlimited credits
Custom rate limits
Dedicated infrastructure
Premium proxy pool
99.9% SLA guarantee
Dedicated account manager
On-premise deployment

AI agent? Read the plain-text version at /pricing.md.
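For a rough comparison of the two fixed-price paid tiers, the effective price per 1,000 included credits follows directly from the listed price and credit allowance. This sketch assumes the listed prices are monthly and that every included credit is used; the base price of extra credits is not listed here, so the extra-credit discounts are left out.

```python
# Listed price (USD) and included monthly credits for the two fixed-price paid tiers.
TIERS = {"Hobby": (19, 5_000), "Startup": (79, 30_000)}


def price_per_1k_credits(price_usd: float, credits: int) -> float:
    """Effective cost of 1,000 included credits, assuming all included credits are used."""
    return price_usd / credits * 1_000


for name, (price, credits) in TIERS.items():
    print(f"{name}: ${price_per_1k_credits(price, credits):.2f} per 1,000 credits")
```

On these assumptions Hobby works out to about $3.80 and Startup to about $2.63 per 1,000 included credits. How many credits a single request consumes is not stated on this page, so this is not a per-request price.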

FAQ

Frequently asked questions

Everything teams ask before going to production.

// frequently asked · 6 entries

Ready to extract structured data?

Try the live playground or integrate the API in minutes. No credit card required.

webscrape · ~/extract live

Continue in playground · Read the docs
