format parameter.
Whether you are archiving full web pages, extracting structured data, or feeding Retrieval-Augmented Generation (RAG) pipelines, you can request the exact data structure you need.
The -llm Optimized Formats
For AI developers, we offer specialized -llm formats. These formats run our semantic cleaning engine before returning the response. They automatically strip out boilerplate, navigation bars, footers, sidebars, advertisements, and hidden elements, returning only the primary content of the page.
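Using -llm formats reduces noise in the model's input, which can improve LLM answer accuracy, and cuts token usage by roughly 60% to 85% compared with raw HTML (see the Avg. Token Savings column in the Format Reference table).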
Format Reference
You can pass one of the following string values into the format parameter of your API request.
| Format | Description | Best For | Avg. Token Savings* |
|---|---|---|---|
| markdown-llm | (Recommended for AI) The primary content of the page, converted to clean Markdown. | RAG pipelines, AI agents, LLMs. | ~75% vs raw HTML |
| text-llm | The primary content of the page as raw text. Strips all HTML tags, Markdown formatting, and structural data. | Vector embeddings, traditional NLP, maximum token efficiency. | ~85% vs raw HTML |
| html-llm | The primary content of the page in HTML format. Strips out all <script>, <style>, <nav>, and <footer> tags. | AI applications requiring DOM structure, semantic HTML parsing. | ~60% vs raw HTML |
| markdown | The entire rendered page converted to Markdown, including navigation links, sidebars, and footer text. | Full-page archiving, layout analysis. | ~60% vs raw HTML |
| text | The entire rendered page as raw text. Contains no HTML tags, but includes all boilerplate text. | Keyword density checks, regex matching. | ~70% vs raw HTML |
| html | The raw, unmodified HTML DOM of the page as it appears in the browser. | Traditional web scraping, republishing. | - |
| json | Structured JSON output alongside the DOM. | Data analysts, structured databases. | - |
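To make the savings column concrete, the following is a minimal sketch that validates a format value and estimates the token count for a page from its raw HTML token count. The percentages are the approximate averages from the table, so results for any single page will differ; the names `FORMAT_SAVINGS_PCT`, `validate_format`, and `estimate_tokens` are hypothetical helpers, not part of the API.

```python
from typing import Optional

# Average token savings vs raw HTML, in percent, copied from the table above.
# None means the table lists no savings figure for that format.
FORMAT_SAVINGS_PCT = {
    "markdown-llm": 75,
    "text-llm": 85,
    "html-llm": 60,
    "markdown": 60,
    "text": 70,
    "html": None,
    "json": None,
}


def validate_format(fmt: str) -> str:
    """Return fmt unchanged if it is a documented format value, else raise."""
    if fmt not in FORMAT_SAVINGS_PCT:
        allowed = ", ".join(sorted(FORMAT_SAVINGS_PCT))
        raise ValueError(f"Unknown format {fmt!r}; expected one of: {allowed}")
    return fmt


def estimate_tokens(raw_html_tokens: int, fmt: str) -> Optional[int]:
    """Estimate the token count after conversion, using the table's averages.

    Returns None when the table gives no savings figure for the format.
    """
    pct = FORMAT_SAVINGS_PCT[validate_format(fmt)]
    if pct is None:
        return None
    return raw_html_tokens * (100 - pct) // 100


print(estimate_tokens(10_000, "markdown-llm"))  # 2500
print(estimate_tokens(10_000, "text-llm"))      # 1500
```

Validating the value on the client side before sending a request is a design choice, not a requirement: it simply surfaces typos such as `markdown_llm` early.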
Example Usage
To request an LLM-optimized Markdown response, set the format parameter in your JSON payload.
API request to get Markdown LLM format:
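As a minimal sketch of such a request, the snippet below builds a JSON payload with format set to markdown-llm. Only the format parameter and its value come from this page. The endpoint `https://api.example.com/v1/scrape`, the `url` field, and the bearer-token header are placeholders assumed for illustration, so substitute the values from your own account and the API reference.

```python
import json
import urllib.request

# Hypothetical placeholders: the endpoint, the "url" field name, and the
# Authorization header scheme are assumptions, not confirmed by this page.
API_ENDPOINT = "https://api.example.com/v1/scrape"
API_KEY = "YOUR_API_KEY"


def build_request(target_url: str, fmt: str = "markdown-llm") -> urllib.request.Request:
    """Build (but do not send) a POST request whose JSON body sets `format`."""
    payload = {"url": target_url, "format": fmt}
    return urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_request("https://example.com/article")
print(request.data.decode("utf-8"))

# To actually send it, once the placeholders are replaced with real values:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.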
Choosing the Right Format for AI
If you are building an AI application, choosing between the -llm formats depends on your specific pipeline:
- Use markdown-llm when you need to chunk data for a vector database. Markdown preserves ## headings, which chunking algorithms use to keep related ideas together. It also preserves data tables.
- Use text-llm when you are batch processing at scale and need the lowest possible token count, or when you are only generating vector embeddings, where structural tags are unnecessary.
- Use html-llm when you are passing data to an LLM that has been specifically fine-tuned to read DOM structures and CSS classes.
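The heading-based chunking mentioned for markdown-llm can be sketched as follows. This is one simple approach, not the method any particular vector database or chunking library uses: it starts a new chunk at every level-2 heading and keeps tables with the section they belong to. The function name `chunk_by_heading` and the sample document are hypothetical.

```python
import re


def chunk_by_heading(markdown: str, level: int = 2) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at each heading of `level`.

    Limitation of this sketch: a line that looks like a heading inside a
    fenced code block is also treated as a chunk boundary.
    """
    heading = re.compile(rf"^{'#' * level} +(.*)$")
    chunks = []
    current_heading = None
    current_lines: list[str] = []

    def flush() -> None:
        text = "\n".join(current_lines).strip()
        # Skip an empty preamble, but keep every headed section.
        if current_heading is not None or text:
            chunks.append({"heading": current_heading, "text": text})

    for line in markdown.splitlines():
        match = heading.match(line)
        if match:
            flush()
            current_heading = match.group(1).strip()
            current_lines = []
        else:
            current_lines.append(line)
    flush()
    return chunks


sample = """Intro paragraph.

## Pricing
| Plan | Cost |
|---|---|
| Basic | $10 |

## Limits
100 requests per minute.
"""

for chunk in chunk_by_heading(sample):
    print(chunk["heading"], "->", len(chunk["text"]), "chars")
```

Each chunk carries its heading alongside the text, so the heading can be stored as metadata or prepended to the text before embedding.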