Launching next week

The web, as an API.

Bot-proof scraping, intelligent extraction, and schema-validated JSON — in one endpoint your team will never have to maintain.
  • No credit card
  • Bot-proof fetching
  • Schema-validated JSON
reddit.com/r/formula1
POST /v1/smartscraper
rendered page
r/formula1 · 4.1m members

9.2k · Verstappen wins chaotic Japanese GP

3.6k · Hamilton on the Suzuka comeback

+ 47 more posts on this page
response.json · 200 OK
{
  "url": "reddit.com/r/formula1",
  "posts": [
    {
      "title": "Verstappen wins chaotic Japanese GP",
      "upvotes": 9233,
      "comments": 1438
    },
    {
      "title": "Hamilton on the Suzuka comeback",
      "upvotes": 3612,
      "comments": 260
    }
  ],
  "count": 2
}
schema valid · 412ms · 1 ad filtered

Drops into your stack — pick your weapon

$ pnpm add @webscrape/sdk
// platform telemetry live
Bot detection rate
<0%
12mo · n=1.2M req
HTML noise reduction
50–80%
with reduce_content
Preset recipes
+0
for popular sites
Features

Built for developers who need reliable data

A complete extraction toolkit — every feature audited, versioned, and shipped from one endpoint.
stable

AI Agent Extraction

Describe what you want in plain English. Our agents understand the page semantically and return exactly the structured data you asked for.

user_prompt: "Extract top 5 stories
               with title, url, points"
stable

Bot-Proof Browser

Our own Chromium fork, fingerprint-patched at the source, not a bolt-on script that anti-bot systems can spot.

JA4 · HTTP/2 · Canvas · WebGL
stable

Schema Enforcement

Validate, repair, guarantee. JSON Schema in, conforming data out.

validate schema
✓ 5/5 fields matched
✓ no repair needed
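The page does not show how a schema is passed to the API, so this sketch stays on the client side: a hypothetical `check_fields` helper that checks one extracted item against a small JSON-Schema-style dict (required fields and primitive types only), the kind of check summarized above as "5/5 fields matched". It is not the service's validator; a full JSON Schema validator supports far more keywords.

```python
# Client-side sketch only: checks "required" and primitive "type" keywords.
# Not the service's validator; a real setup would use a full JSON Schema library.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_fields(item: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means every field matched."""
    problems = []
    for name in schema.get("required", []):
        if name not in item:
            problems.append(f"missing field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in item:
            continue
        value = item[name]
        # bool is a subclass of int in Python, so reject it for non-boolean types
        if isinstance(value, bool) and spec["type"] != "boolean":
            problems.append(f"wrong type for {name}: bool")
        elif not isinstance(value, TYPE_MAP[spec["type"]]):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    return problems


story_schema = {
    "type": "object",
    "required": ["title", "url", "points"],
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}

story = {"title": "Show HN: I built a real-time code editor",
         "url": "https://example.com/editor", "points": 342}
print(check_fields(story, story_schema))  # []
print(check_fields({"title": "x", "points": "342"}, story_schema))
# ['missing field: url', 'wrong type for points: str']
```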
stable

Intelligent Chunking

Long pages split, processed in parallel, deduped on merge.

split · parallel · merge · dedup
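The split, parallel, merge, dedup flow can be sketched as follows, assuming fixed-size character windows whose overlap is at least as long as the longest item, so every item appears whole in at least one chunk. `extract_items` is a hypothetical stand-in for the model call (it only pulls out `<...>` markers); the service's actual chunk boundaries and merge rules are not described here.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_text(text: str, size: int, overlap: int) -> list:
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def extract_items(chunk: str) -> list:
    """Hypothetical stand-in for the model call: return complete <...> items only."""
    return re.findall(r"<([^<>]+)>", chunk)


def chunked_extract(text: str, size: int = 40, overlap: int = 15) -> list:
    chunks = split_text(text, size, overlap)
    # Process chunks concurrently; pool.map yields results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        per_chunk = list(pool.map(extract_items, chunks))
    # Merge in chunk order, then drop items seen twice because of the overlap.
    merged = [item for items in per_chunk for item in items]
    return list(dict.fromkeys(merged))


page = " ".join(f"<story {n}>" for n in range(1, 8))
print(chunked_extract(page))  # ['story 1', 'story 2', ..., 'story 7']
```

Dedup here is by value, so two genuinely identical items on one page would also collapse; keying on a stable field such as a URL is a common alternative.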
stable

Any Input Source

URLs, raw HTML, Markdown, or PDFs — one API for all of them.

URL
HTML
MD
PDF
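How the API tells these four inputs apart is not described here. As a hypothetical client-side sketch, a caller could classify an input before deciding how to send it: the `%PDF-` magic bytes identify PDFs, `urllib.parse` recognizes URLs, a tag check flags HTML, and everything else falls back to Markdown. `detect_input_kind` is an illustrative name, and the heuristics are assumptions.

```python
import re
from typing import Union
from urllib.parse import urlparse


def detect_input_kind(data: Union[str, bytes]) -> str:
    """Heuristically label an input as 'pdf', 'url', 'html', or 'markdown'."""
    if isinstance(data, bytes):
        # PDF files start with the "%PDF-" magic bytes.
        if data.startswith(b"%PDF-"):
            return "pdf"
        data = data.decode("utf-8", errors="replace")
    text = data.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc and " " not in text:
        return "url"
    # A doctype or a common HTML tag suggests raw HTML.
    if re.search(r"<(!doctype html|html|body|div|p|a)\b", text, re.IGNORECASE):
        return "html"
    return "markdown"


for sample in ("https://news.ycombinator.com", b"%PDF-1.7\n",
               "<!DOCTYPE html><html><body><p>Hi</p></body></html>",
               "# Title\n\n- item one\n- item two"):
    print(detect_input_kind(sample))
```

Markdown that embeds inline HTML such as `<a href=...>` would be labeled HTML by this heuristic.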
beta

Content Reduction

A local NLP layer strips navbars, ads, and noise before extraction — cutting cost and latency by 50–80% with no accuracy loss.

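The reduce step is described as an NLP layer. The sketch below swaps in a much simpler technique to show the idea of stripping noise before extraction: a tag-based filter on the standard library's `html.parser` that drops text inside `nav`, `header`, `footer`, `aside`, `script`, and `style`, then reports the share of input characters removed. The tag list is an assumption, and this is not how the service's filter works.

```python
from html.parser import HTMLParser

# Assumed boilerplate containers; a stand-in for the real NLP-based filter.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class NoiseStripper(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.depth = 0   # > 0 while inside a noise element
        self.parts = []  # text fragments that survive the filter

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.parts.append(data.strip())


def reduce_html(html: str) -> tuple:
    """Return (kept text, percent of input characters removed)."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    removed = 100 * (1 - len(text) / len(html)) if html else 0.0
    return text, removed


html = ("<html><body><nav>Home | About | Login</nav>"
        "<article><h1>Verstappen wins</h1><p>Race report.</p></article>"
        "<footer>Copyright 2025</footer></body></html>")
text, removed = reduce_html(html)
print(text)  # Verstappen wins Race report.
print(f"{removed:.0f}% of characters removed")
```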
How It Works

One request. Six stages. Clean JSON.

Every request flows through the same deterministic pipeline — fetch, clean, reduce, extract, validate. No black box.
~/wsai: pipeline.trace · live
trace · 6 stages
URL      POST /v1/smartscraper        → request
fetch    worker fetch        12ms     → 405 KB html
clean    selectolax           3ms     → 80 KB dom
reduce   nlp filter          23ms     → 5 KB text
extract  vlm layer          480ms     → 5 fields
JSON     schema · validated   2ms     → 200 OK
total    6/6 stages · ~520ms
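The trace reads as a chain in which each stage consumes the previous stage's output, and the total is the sum of the stage times (12 + 3 + 23 + 480 + 2 = 520 ms). A minimal sketch of that shape with toy placeholder stages; `run_pipeline`, `validate`, and every stage body are hypothetical, and no network call or model is involved.

```python
import re
import time
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(payload: Any, stages: List[Stage]) -> Tuple[Any, List[Tuple[str, float]]]:
    """Feed each stage the previous stage's output and record its wall time in ms."""
    trace = []
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        trace.append((name, (time.perf_counter() - start) * 1000))
    return payload, trace


def validate(item: dict) -> dict:
    """Toy validation: require exactly these keys and coerce points to int."""
    if set(item) != {"title", "points"}:
        raise ValueError(f"unexpected fields: {sorted(item)}")
    return {"title": item["title"], "points": int(item["points"])}


# Placeholder stages; a real fetch would issue an HTTP request instead.
stages: List[Stage] = [
    ("fetch", lambda url: "<nav>menu</nav><p>Story A|42</p>"),
    ("clean", lambda html: re.sub(r"<nav>.*?</nav>", "", html)),
    ("reduce", lambda html: re.sub(r"<[^>]+>", "", html).strip()),
    ("extract", lambda text: dict(zip(("title", "points"), text.split("|")))),
    ("validate", validate),
]

result, trace = run_pipeline("https://example.com", stages)
print(result)  # {'title': 'Story A', 'points': 42}
for name, ms in trace:
    print(f"{name:<9}{ms:.3f} ms")
print(f"total    {sum(ms for _, ms in trace):.3f} ms")
```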
Use Cases

What you can build

Teams use SmartScraper to power product feeds, agents, dashboards, and entire data pipelines.

E-commerce monitoring

Track prices, stock, and reviews across thousands of product pages with consistent JSON schemas.
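Consistent schemas are what make monitoring cheap: two snapshots of the same extraction can be compared field by field. A sketch assuming a hypothetical product schema with `url`, `price`, and `in_stock` fields (the page does not define one); `diff_snapshots` is an illustrative name.

```python
from typing import Dict, List

# Hypothetical schema: every product record carries url, price, and in_stock.
TRACKED_FIELDS = ("price", "in_stock")


def diff_snapshots(old: List[dict], new: List[dict], key: str = "url") -> List[Dict]:
    """Report tracked-field changes for products present in both snapshots."""
    previous = {item[key]: item for item in old}
    changes = []
    for item in new:
        before = previous.get(item[key])
        if before is None:
            continue  # newly seen product; nothing to compare against
        for field in TRACKED_FIELDS:
            if before[field] != item[field]:
                changes.append({key: item[key], "field": field,
                                "old": before[field], "new": item[field]})
    return changes


monday = [{"url": "https://shop.example/a", "price": 19.99, "in_stock": True},
          {"url": "https://shop.example/b", "price": 5.00, "in_stock": True}]
tuesday = [{"url": "https://shop.example/a", "price": 17.49, "in_stock": True},
           {"url": "https://shop.example/b", "price": 5.00, "in_stock": False}]

for change in diff_snapshots(monday, tuesday):
    print(change)
```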

Lead generation

Extract names, emails, titles, and company info from directories and profiles at scale.

News & content intel

Pull articles, authors, dates, and entities from any publisher into clean, queryable data.

AI agent tooling

Plug structured web data into LangChain, n8n, or your own agents — no scrapers to maintain.
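Whatever the agent framework, a scraping tool usually reduces to a plain function wrapping one HTTP call. A standard-library sketch built from the endpoint, `Authorization` header, and `website_url` / `user_prompt` body fields used in the request examples on this page; `build_request` and `smartscrape_tool` are hypothetical names, the response is assumed to be JSON, and registering the function as a LangChain or n8n tool is not shown.

```python
import json
import os
import urllib.request

API_URL = "https://api.webscrape.ai/v1/smartscraper"


def build_request(website_url: str, user_prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST as the curl example: a JSON body plus a Bearer token."""
    body = json.dumps({"website_url": website_url, "user_prompt": user_prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
    )


def smartscrape_tool(website_url: str, user_prompt: str) -> dict:
    """Tool entry point: send the request and return the parsed JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = build_request(website_url, user_prompt, os.environ["API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a network connection.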

Why SmartScraper

Stop fighting with selectors and broken scripts

A side-by-side look at what you actually get out of the box.
SmartScraper · DIY scraper · Headless browser
Works on any site without writing selectors
Schema-validated structured output
Bot-proof browser
Automatic chunking for long pages
50–80% noise removed before processing
PDF + HTML + Markdown input
Zero maintenance when sites change
No infrastructure to run
Developer Experience

One endpoint. Any language.

Integrate in minutes with any HTTP client. Schema-validated output means no post-processing.
~/wsai — request
# fetch the top 5 HN stories
curl -X POST https://api.webscrape.ai/v1/smartscraper \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "website_url": "https://news.ycombinator.com",
    "user_prompt": "Extract the top 5 stories with title, url, and points"
  }'
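# fetch the top 5 HN stories
import os

import requests

API_KEY = os.environ["API_KEY"]  # read the key from the environment, as the curl example does

resp = requests.post(
    "https://api.webscrape.ai/v1/smartscraper",
    headers={"Authorization": f"Bearer {API_KEY}"},  # json= also sets Content-Type
    json={
        "website_url": "https://news.ycombinator.com",
        "user_prompt": "Extract the top 5 stories with title, url, and points",
    },
    timeout=60,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = resp.json()
print(data["result"])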
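// fetch the top 5 HN stories (Node 18+, ES module for top-level await)
const API_KEY = process.env.API_KEY; // read the key from the environment

const resp = await fetch("https://api.webscrape.ai/v1/smartscraper", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`,
  },
  body: JSON.stringify({
    website_url: "https://news.ycombinator.com",
    user_prompt: "Extract the top 5 stories with title, url, and points",
  }),
});
if (!resp.ok) throw new Error(`HTTP ${resp.status}`); // fetch does not reject on 4xx/5xx

const { result } = await resp.json();
console.log(result);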
// extraction pipeline
  • fetch 12ms
  • clean 3ms
  • reduce 23ms
  • extract 480ms
  • validate 2ms
total ~520ms
~/wsai — response
response.json · 200 OK
// 200 OK · extracted in ~520ms
{
  "stories": [
    {
      "title": "Show HN: I built a real-time code editor",
      "url": "https://example.com/editor",
      "points": 342
    },
    {
      "title": "Why Rust is the future of systems programming",
      "url": "https://example.com/rust",
      "points": 281
    },
    {
      "title": "PostgreSQL 18 released with major improvements",
      "url": "https://postgresql.org/18",
      "points": 256
    },
    {
      "title": "A deep dive into WebAssembly garbage collection",
      "url": "https://example.com/wasm-gc",
      "points": 198
    },
    {
      "title": "Open source alternative to Figma",
      "url": "https://example.com/penpot",
      "points": 175
    }
  ]
}
Pricing

Simple, transparent pricing

Start free, scale as you grow. Pay only for what you use, no hidden fees.

Free

Try the API with no commitment.

  • 500 starting credits
  • 300 credits / month thereafter
  • 1 concurrent request
  • 10 requests / minute
  • 7-day data retention
  • Limited SmartBrowse

Hobby

For side projects and prototypes.

  • 5,000 credits / month
  • 10 concurrent requests
  • 100 requests / minute
  • Standard proxy rotation
  • 30-day data retention
  • 20% off extra credits

Enterprise

Tailored solutions for large organizations.

  • Unlimited credits
  • Custom rate limits
  • Dedicated infrastructure
  • Premium proxy pool
  • 99.9% SLA guarantee
  • Dedicated account manager
  • On-premise deployment

AI agent? Read the plain-text version at /pricing.md.
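To stay inside a plan's request-rate limit (for example the Free tier's 10 requests / minute), a client can throttle itself before each call. A minimal sliding-window sketch; `RateLimiter` is a hypothetical helper, not part of any SDK, and it does not enforce the concurrency limit (a single worker thread, or a `threading.Semaphore`, covers that).

```python
import threading
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Allow at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window
        self.lock = threading.Lock()

    def wait(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            with self.lock:
                now = self.clock()
                # Drop timestamps that have aged out of the window.
                while self.calls and now - self.calls[0] >= self.period:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                delay = self.period - (now - self.calls[0])
            self.sleep(delay)  # sleep outside the lock so other threads are not blocked


# Demo with a fake clock so it runs instantly: the 11th call waits a full minute.
class FakeClock:
    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeClock()
free_tier = RateLimiter(max_calls=10, period=60.0, clock=fake.now, sleep=fake.sleep)
for _ in range(11):
    free_tier.wait()
print(fake.t)  # 60.0
```

In real use, construct it with the default clock and call `wait()` before each request; the Hobby tier's 100 requests / minute would be `RateLimiter(100, 60.0)`.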

FAQ

Frequently asked questions

Everything teams ask before going to production.
// frequently asked · 6 entries

Ready to extract structured data?

Try the live playground or integrate the API in minutes. No credit card required.

webscrape · ~/extract live