POST https://api.geekflare.com/webscraping
Basic Scrape
Scrape a URL and get back LLM-ready markdown.
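
The basic call can be sketched in Python with only the standard library. The endpoint is the one given at the top of this page; the `x-api-key` header name and the helper names `build_scrape_request` and `scrape` are assumptions made for illustration, and the response body is returned as parsed JSON without assuming any field names.

```python
import json
import urllib.request

API_URL = "https://api.geekflare.com/webscraping"


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for LLM-ready markdown."""
    payload = {"url": target_url, "format": ["markdown-llm"]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth header name is not stated in this section.
            "x-api-key": api_key,
        },
        method="POST",
    )


def scrape(target_url: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body unchanged."""
    request = build_scrape_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# result = scrape("https://example.com", "YOUR_API_KEY")
# print(json.dumps(result, indent=2))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network traffic happens.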
Output Formats
Choose one or more output formats. You can request up to 3 formats in a single call.

| Format | Description |
|---|---|
| html | Raw HTML |
| markdown | Clean Markdown |
| json | Structured JSON |
| html-llm | HTML stripped for LLM consumption |
| markdown-llm | Markdown stripped for LLM consumption |
| text | Plain text |
| text-llm | Plain text stripped for LLM consumption |
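
A client-side check of the `format` list can catch mistakes before a request is sent. The sketch below uses the seven format names and the three-format limit from the table above; the helper name `validate_formats` is hypothetical, and dropping duplicates before counting is an assumption about how the limit applies.

```python
ALLOWED_FORMATS = {
    "html", "markdown", "json",
    "html-llm", "markdown-llm",
    "text", "text-llm",
}
MAX_FORMATS = 3


def validate_formats(formats: list[str]) -> list[str]:
    """Return the formats de-duplicated in order, or raise ValueError."""
    unique = list(dict.fromkeys(formats))  # drop repeats, keep first-seen order
    if not unique:
        raise ValueError("at least one format is required")
    unknown = [f for f in unique if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unknown format(s): {unknown}")
    if len(unique) > MAX_FORMATS:
        raise ValueError(f"at most {MAX_FORMATS} formats per call, got {len(unique)}")
    return unique


# Request body asking for three formats in a single call.
payload = {
    "url": "https://example.com",
    "format": validate_formats(["markdown-llm", "html", "text"]),
}
```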
File Output
Get a CDN URL instead of inline content. Useful for large pages or when you need to store the result.
JavaScript Rendering
JavaScript rendering is enabled by default. Set renderJS to false for faster scrapes on static pages.
Stealth Mode
Set stealth to true to bypass bot detection on protected pages. Slower but more reliable on heavily guarded sites.
Wait Time
Set waitTime to the number of seconds to wait after page load, to capture lazy-loaded content or bypass bot checks.
Proxy Routing
Set proxyCountry to a country ISO code (e.g. us, gb) to route the request through that country’s IP address, to bypass geo-blocks or scrape region-specific content.
Device Emulation
Set device to mobile to emulate a mobile device and scrape mobile-specific content.
Structured Extraction — CSS Schema
Extract specific fields from a page using CSS selectors. Returns structured JSON.
Structured Extraction — XPath Schema
Use XPath expressions for more precise extraction.
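
A request body for either schema mode can be sketched as below. Only the parameter names (`format`, `extractionMode`, `extractionSchema`) and the mode values come from this page. The helper name `build_extraction_payload` is hypothetical, and the flat field-name-to-selector layout used in the two example schemas is purely illustrative, since the schema format itself is not shown in this section.

```python
SCHEMA_MODES = {"cssSchema", "xpathSchema"}


def build_extraction_payload(url: str, mode: str, schema: dict) -> dict:
    """Build a request body for schema-based extraction."""
    if mode not in SCHEMA_MODES:
        raise ValueError(f"mode must be one of {sorted(SCHEMA_MODES)}")
    if not schema:
        raise ValueError("schema must be a non-empty object")
    return {
        "url": url,
        "format": ["json"],  # extractionMode is used when format includes json
        "extractionMode": mode,
        "extractionSchema": schema,
    }


# Hypothetical schema shapes: the field names and the flat name -> selector
# layout are illustrative only, not the documented schema format.
css_payload = build_extraction_payload(
    "https://example.com/product",
    "cssSchema",
    {"title": "h1", "price": ".price"},
)
xpath_payload = build_extraction_payload(
    "https://example.com/product",
    "xpathSchema",
    {"title": "//h1/text()", "price": "//span[@class='price']/text()"},
)
```

Keeping the mode and the schema in one helper makes it harder to send a schema without also requesting the json format.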
Default Extraction — Static Fields
Inject static metadata fields alongside scraped content.
All Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | Target URL |
| device | desktop \| mobile | desktop | Device to emulate |
| format | array | ["html-llm"] | Output format(s). Up to 3. |
| renderJS | boolean | true | Execute JavaScript before extracting |
| blockAds | boolean | true | Block ads during scrape |
| stealth | boolean | false | Bypass bot detection |
| waitTime | number | 0 | Seconds to wait after page load |
| fileOutput | boolean | false | Return CDN URL instead of inline data |
| proxyCountry | string | — | Route through country ISO code (e.g. us, gb) |
| extractionMode | default \| cssSchema \| xpathSchema | default | Extraction mode (used when format includes json) |
| extractionSchema | object | — | Schema for structured extraction |
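
The parameter table maps naturally onto a small typed container. The sketch below mirrors the names and defaults listed above; the class name `ScrapeOptions` and its `to_payload` method are hypothetical, and sending default values explicitly rather than omitting them is an assumption about what the API accepts.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ScrapeOptions:
    """Request options with the defaults listed in the parameter table."""

    url: str
    device: str = "desktop"
    format: list = field(default_factory=lambda: ["html-llm"])
    renderJS: bool = True
    blockAds: bool = True
    stealth: bool = False
    waitTime: float = 0
    fileOutput: bool = False
    proxyCountry: Optional[str] = None
    extractionMode: str = "default"
    extractionSchema: Optional[dict] = None

    def to_payload(self) -> dict:
        """Return a JSON-ready dict, omitting optional fields left unset."""
        if self.device not in ("desktop", "mobile"):
            raise ValueError("device must be 'desktop' or 'mobile'")
        if not 1 <= len(self.format) <= 3:
            raise ValueError("format must list between 1 and 3 entries")
        return {k: v for k, v in asdict(self).items() if v is not None}


# A guarded page scraped in stealth mode as a mobile device through a
# UK IP address, waiting 2 seconds after page load.
options = ScrapeOptions(
    url="https://example.com",
    device="mobile",
    stealth=True,
    waitTime=2,
    proxyCountry="gb",
)
payload = options.to_payload()
```

The resulting `payload` dict can be serialized with `json.dumps` and sent as the body of the POST request.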