Website Scraper Service
Learn how to use the Website Scraper Service for extracting content from web pages.
Overview
The Website Scraper Service is a lightweight service for extracting content from web pages. It retrieves pages with asynchronous HTTP requests, parses the HTML, and returns either the full page content or only the elements matched by a CSS selector.
Endpoints
Website Content Scraping
The scrape endpoint performs web content extraction using aiohttp and BeautifulSoup.
API Endpoint
Endpoint Parameters
The scrape endpoint accepts the following request parameters and returns the following response fields.
Request
content
(string or object, required)
Either a direct URL string or an object containing:
url
(string): URL to scrape
selector
(string, optional): CSS selector to extract specific content
context
(object, optional)
Optional context information:
selector
(string, optional): Alternative way to specify CSS selector
config
(object, optional)
Optional configuration parameters:
include_html
(boolean, default: true): Whether to include HTML content in the response
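The request fields above can be assembled as in the following minimal sketch. The helper name build_scrape_request and its arguments are hypothetical and not part of the service. Only the field names (content, context, config, url, selector, include_html) come from the parameter list. The sketch does not send anything, since the endpoint path is not given here.

```python
from typing import Any, Dict, Optional


def build_scrape_request(
    url: str,
    selector: Optional[str] = None,
    include_html: Optional[bool] = None,
    selector_in_context: bool = False,
) -> Dict[str, Any]:
    """Build a request body from the documented fields (hypothetical helper)."""
    body: Dict[str, Any] = {}

    if selector and not selector_in_context:
        # Object form of `content`: URL plus CSS selector.
        body["content"] = {"url": url, "selector": selector}
    else:
        # String form of `content`: just the URL.
        body["content"] = url

    if selector and selector_in_context:
        # Alternative location for the selector, per the `context` field.
        body["context"] = {"selector": selector}

    if include_html is not None:
        # Only send `config` when the caller overrides the default (true).
        body["config"] = {"include_html": include_html}

    return body
```

Calling build_scrape_request("https://example.com", include_html=False) yields a body with a plain URL in content and include_html set to false in config. Serialize the returned dictionary with json.dumps before sending it.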
Response
status
(string)
Status of the scraping process (success or error).
result
(object)
Extraction results with the following properties:
title
(string): Page title
url
(string): URL that was scraped
text
(string): Extracted text content from the page
html
(string, optional): HTML content if include_html is true
selected_content
(array, optional): Text content from selected elements if a selector was provided
selected_html
(array, optional): HTML from selected elements if a selector was provided
metadata
(object)
Processing metadata:
processed_at
(number): Timestamp of when the request was processed
instance_id
(string): Service instance identifier
error
(string)
Error message if status is 'error'.
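A client might handle the response fields above along these lines. This is a sketch under assumptions: the response is taken to be an already-decoded dictionary, and ScrapeError and summarize_response are hypothetical names introduced for illustration. Optional fields (html, selected_content, selected_html) are read defensively because they may be absent.

```python
from typing import Any, Dict


class ScrapeError(Exception):
    """Raised when the service reports status 'error' (hypothetical class)."""


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a decoded response into the fields a caller usually needs."""
    if response.get("status") == "error":
        # The `error` field carries the message when status is 'error'.
        raise ScrapeError(response.get("error", "unknown error"))

    result = response.get("result", {})
    metadata = response.get("metadata", {})

    return {
        "title": result.get("title"),
        "url": result.get("url"),
        "text": result.get("text", ""),
        # Absent when include_html is false.
        "html": result.get("html"),
        # Absent when no selector was provided.
        "selected_content": result.get("selected_content", []),
        "selected_html": result.get("selected_html", []),
        "processed_at": metadata.get("processed_at"),
        "instance_id": metadata.get("instance_id"),
    }
```

Raising an exception on an error status keeps the success path free of status checks. Callers that prefer to inspect the status themselves can read the status and error fields directly instead.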
Examples
Below are examples demonstrating different ways to use the website scraper service.
Basic URL Scraping
Request
Response
Using CSS Selectors
Request
Response
Excluding HTML Content
Request
Response
Error Response Example
Response