Enterprise Web Scraping & Data Extraction

Extract the web. Automate data harvesting.

A production-ready framework with 57 plugins (47 built-in core plugins plus 10 optional extensions), a modern plugin architecture, session management, and a professional web IDE.

57 plugins • 3,091 passing tests • 15+ years

Professional Web IDE

Develop, test, and debug your scrapers directly in the browser

Monaco Editor

VS Code's powerful editor with XML syntax highlighting and auto-completion

Real-Time Updates

WebSocket-based live streaming of logs, progress, and results

Multi-Tab Support

Per-tab logs, results, and session tracking (v2.2)

Session Metrics

Track duration, tokens, and performance in real time

scraper.xml
<config xmlns="http://org.webharvest/schema/2.1/core">
  <!-- Fetch HTML -->
  <def var="html">
    <http url="https://example.com"/>
  </def>
  
  <!-- Parse to XML -->
  <def var="page">
    <html-to-xml>
      <get var="html"/>
    </html-to-xml>
  </def>
  
  <!-- Extract Title -->
  <def var="title">
    <xpath expression="//title/text()">
      <get var="page"/>
    </xpath>
  </def>
</config>

Enterprise Features

Production-ready tools for professional web scraping

Advanced Parsing

HTML/XML parsing with XPath, XQuery, and CSS selectors. Multi-strategy parsing for maximum compatibility.
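To make the html-to-xml and xpath steps concrete, here is a minimal sketch that uses only the JDK's built-in DOM and XPath APIs, not WebHarvest itself. It assumes the page is already well-formed XHTML, because the HTML cleanup that html-to-xml performs is not part of the JDK; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Plain-JDK stand-in for the html-to-xml -> xpath steps; NOT WebHarvest's API.
// Assumes the input is already well-formed XHTML (no HTML cleanup is done here).
class XPathSketch {

    // Parse a well-formed XML/XHTML string into a DOM document.
    static Document parse(String xml) {
        try {
            // Namespace-unaware (the factory default), so "//title" matches un-prefixed elements.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Evaluate an XPath expression and return the text of every matched node.
    static List<String> select(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent()); // text node data or attribute value
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document page = parse("<html><head><title>Example Domain</title></head>"
                + "<body><a href='/a'>A</a><a href='/b'>B</a></body></html>");
        System.out.println(select(page, "//title/text()")); // [Example Domain]
        System.out.println(select(page, "//a/@href"));      // [/a, /b]
    }
}
```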

HTTP Client

Full-featured HTTP client with connection pooling, cookies, authentication, and custom request headers.
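These features map onto standard JDK building blocks. The sketch below uses `java.net.http.HttpClient` (Java 11+), not WebHarvest's http processor: one shared client reuses its pooled connections and keeps cookies in a `CookieManager`, and each request carries its own headers. The class name and header values are placeholders.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// JDK java.net.http sketch of pooled, cookie-aware fetching; NOT WebHarvest's http processor.
class HttpSketch {

    // One shared client: it maintains its own connection pool and cookie store,
    // so reusing it across requests reuses connections and session cookies.
    // (HttpClient.Builder also offers .authenticator(...) for HTTP authentication.)
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .cookieHandler(new CookieManager())          // stores Set-Cookie, replays Cookie
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects except HTTPS -> HTTP
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Build a GET request with custom headers; the header values are placeholders.
    static HttpRequest request(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .header("Accept", "text/html")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = CLIENT.send(
                request("https://example.com", "example-bot/1.0"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```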

Plugin System

Modern plugin architecture with dependency injection, auto-discovery, and 47 built-in plugins.
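As a rough illustration of how a plugin registry with constructor injection can work, the sketch below wires plugins to a shared services object and looks them up by element name. Every type in it (`Processor`, `Services`, `ProcessorRegistry`) is hypothetical and is not WebHarvest's plugin API. Real auto-discovery would typically scan the classpath, for example with `java.util.ServiceLoader`, instead of the explicit factory list used here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plugin contract: one config element name, one transformation.
interface Processor {
    String elementName();
    String execute(String input);
}

// Shared services handed to plugins at construction time (constructor injection).
final class Services {
    final Map<String, String> variables = new HashMap<>();
}

// A plugin that needs Services receives it through its constructor.
final class GetProcessor implements Processor {
    private final Services services;
    GetProcessor(Services services) { this.services = services; }
    @Override public String elementName() { return "get"; }
    @Override public String execute(String varName) { return services.variables.getOrDefault(varName, ""); }
}

// A plugin with no dependencies.
final class UpperProcessor implements Processor {
    @Override public String elementName() { return "upper"; }
    @Override public String execute(String input) { return input.toUpperCase(Locale.ROOT); }
}

final class ProcessorRegistry {
    private final Services services = new Services();
    private final Map<String, Processor> byName = new HashMap<>();

    // The registry, not the plugin, supplies dependencies by calling each factory with Services.
    ProcessorRegistry(List<Function<Services, Processor>> factories) {
        for (Function<Services, Processor> factory : factories) {
            Processor processor = factory.apply(services);
            byName.put(processor.elementName(), processor);
        }
    }

    Services services() { return services; }

    // Dispatch to the plugin registered for a config element name.
    String run(String elementName, String input) {
        Processor processor = byName.get(elementName);
        if (processor == null) {
            throw new IllegalArgumentException("No plugin registered for <" + elementName + ">");
        }
        return processor.execute(input);
    }
}

class PluginDemo {
    public static void main(String[] args) {
        ProcessorRegistry registry = new ProcessorRegistry(
                List.of(GetProcessor::new, services -> new UpperProcessor()));
        registry.services().variables.put("title", "Example Domain");
        System.out.println(registry.run("upper", registry.run("get", "title"))); // EXAMPLE DOMAIN
    }
}
```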

Session Tracking

Built-in session management, metrics, token tracking, and performance monitoring for production use.

Unified Settings (new in v2.2.0)

The CLI and GUI share one configuration. Settings use the modern -option syntax, are discovered automatically, and can be overridden from the command line.
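A minimal sketch of the layering this implies, assuming settings merge in increasing precedence (built-in defaults, then the shared settings file, then command-line arguments) and assuming a `-name=value` argument form. The option names are invented for the example, and WebHarvest's real flags may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical option names and "-name=value" syntax; WebHarvest's real flags may differ.
class SettingsSketch {

    // Merge in increasing precedence: built-in defaults < shared settings < command-line args.
    static Properties resolve(Properties defaults, String sharedSettings, String[] args) {
        Properties merged = new Properties();
        merged.putAll(defaults);

        // Shared settings: the text of the one file both the CLI and the GUI would read.
        Properties shared = new Properties();
        try {
            shared.load(new StringReader(sharedSettings));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        merged.putAll(shared);

        // Command-line overrides of the form "-name=value"; other arguments are ignored here.
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-") && eq > 1) {
                merged.setProperty(arg.substring(1, eq), arg.substring(eq + 1));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("threads", "1");
        defaults.setProperty("log.level", "info");
        String shared = "threads=4\nworkdir=/tmp/harvest\n";
        Properties settings = resolve(defaults, shared,
                new String[] {"-log.level=debug", "scraper.xml"});
        System.out.println(settings.getProperty("threads"));   // 4 (shared settings)
        System.out.println(settings.getProperty("log.level")); // debug (command-line override)
    }
}
```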

12 Essential Core Plugins

The most commonly used processors, all included in webharvest-core

HTTP

Fetch web pages & APIs

HTML-to-XML

Parse HTML content

XPath

Extract data from XML

JSON-to-XML

Parse JSON data

Def

Define variables

Get

Retrieve variables

Template

Generate output

Loop

Iterate collections

XQuery

Advanced XML queries

Regexp

Pattern matching

Script

JavaScript & Groovy

File

Read & write files

Plus 35 more: if, while, xslt, xml-to-json, tokenize, case, try-catch, function, include, and more
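To show how a few of these processors fit together, here is a plain-JDK analogue of def/get, regexp, loop, and template: a variable map, a regex whose matches are iterated, and `${name}` substitution for each output line. It only mirrors the concepts; it is not how WebHarvest implements them, and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK analogue of def/get, regexp, loop and template; NOT how WebHarvest implements them.
class ProcessorFlowSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // "template": replace each ${name} with the variable's value (empty if undefined).
    static String template(String text, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // quoteReplacement keeps "$" or "\" in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(vars.getOrDefault(m.group(1), "")));
        }
        m.appendTail(out);
        return out.toString();
    }

    // "regexp" inside a "loop": for each match, set index/item variables and render one line.
    static List<String> extract(String source, String regex, String lineTemplate) {
        Map<String, String> vars = new HashMap<>(); // "def"/"get" variable scope
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        int index = 0;
        while (m.find()) {
            index++;
            vars.put("index", Integer.toString(index));
            vars.put("item", m.group(1));
            lines.add(template(lineTemplate, vars));
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<li>Apple $1.20</li><li>Pear $0.95</li>";
        for (String line : extract(html, "<li>([^<]+)</li>", "${index}. ${item}")) {
            System.out.println(line);
        }
        // 1. Apple $1.20
        // 2. Pear $0.95
    }
}
```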


Start Building Today

Download WebHarvest v2.2.0 and automate your data extraction workflows

Java 11+ • Apache License 2.0 • Production Ready