WebHarvest
Plugin Ecosystem

57 production-ready plugins for web scraping

47 core plugins built-in, 10 optional extensions for specialized needs. Modern architecture with automatic discovery and dependency injection.

WebHarvest Plugins

Looking for something specific?

Technical Documentation

Complete reference for all 57 plugins with examples, parameters, and usage details.

View Plugin Reference →

Getting Started

New to WebHarvest? Start with our step-by-step tutorial and examples.

Start Tutorial →

Powerful, Modular, Production-Ready

47 core plugins built-in, optional extensions when you need them

WebHarvest delivers 47 production-ready core plugins right out of the box - everything you need for web scraping with zero external dependencies.

The framework's modular architecture allows you to add any of the 10 optional extension modules only when your project needs them.

Why this architecture? This keeps your core lightweight and lets you add only the dependencies you actually need - smaller JARs, faster startup, and reduced complexity.

Core Plugins (47)

Ready to use immediately

Everything you need for web scraping:

HTTP/HTTPS requests
XPath & XQuery
Regular expressions
JSON/XML transform
Control flow
Variables & functions
JavaScript/Groovy
File I/O

Extension Modules (10)

Optional add-ons with specific dependencies

Specialized capabilities for enterprise use:

Database (JDBC)
Email (SMTP/IMAP)
FTP/FTPS
ZIP archives
Browser automation

Why Choose WebHarvest's Plugin Architecture?

Proven in production, trusted by developers worldwide

47
Built-in Plugins
Zero setup required
15+
Years in Production
Battle-tested
2.6k+
Test Cases
Comprehensive coverage
10
Extension Modules
Enterprise capabilities

Developer Benefits

  • Start scraping in minutes, not hours
  • No dependency hell or version conflicts
  • Clean, intuitive XML configuration
  • Extensive documentation and examples

Production Benefits

  • Proven reliability in enterprise deployments
  • Small footprint, fast deployment
  • Stable core with evolving extensions
  • Production-ready with comprehensive testing

Extension Modules

Optional modules with heavy dependencies - add only what your project needs

Ready to extend your scraping power? Each module is a single Maven dependency that adds specialized enterprise capabilities to your WebHarvest projects.

Tip: Add only the modules you need - keep your project lightweight and focused.

Database Plugin

Module: webharvest-database

Connect to any database seamlessly. Execute SQL queries, store scraped data, and integrate with your existing data infrastructure. Perfect for data warehouses, analytics pipelines, and enterprise applications.

View Documentation

Key Features:

  • Universal JDBC support - MySQL, PostgreSQL, Oracle, SQL Server, SQLite
  • Connection pooling - Efficient resource management
  • Named connections - Reuse across multiple queries
  • XML result mapping - Seamless integration with WebHarvest data flow
  • Transaction support - ACID compliance for critical operations

FTP Plugin

Module: webharvest-ftp

Secure file transfers made simple. Upload scraped data, download resources, and manage files on remote servers. Essential for data pipelines, backup systems, and distributed scraping architectures.

View Documentation

Key Features:

  • FTP & FTPS support - Secure file transfers with encryption
  • Directory operations - Create, list, and navigate remote directories
  • Binary & text modes - Handle any file type correctly
  • Connection modes - Passive/active for firewall compatibility
  • Progress tracking - Monitor large file transfers

Mail Plugin

Module: webharvest-mail

Automate email workflows with style. Send notifications, reports, and alerts with rich HTML formatting. Perfect for monitoring systems, data delivery pipelines, and automated customer communications.

View Documentation

Key Features:

  • Universal SMTP support - Gmail, Office365, custom servers
  • Rich content - HTML emails with embedded images and styling
  • Attachment support - Send scraped data as files
  • Authentication - OAuth2, SSL/TLS, and basic auth
  • Template system - Dynamic content with variables

ZIP Plugin

Module: webharvest-zip

Compress, organize, and distribute data efficiently. Create archives from scraped content, extract downloaded files, and manage data packages. Essential for backup systems, data distribution, and storage optimization.

View Documentation

Key Features:

  • Create ZIP archives
  • Extract ZIP contents
  • List archive contents
  • Compression levels
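
The plugin's own XML syntax is not shown on this page, so the following is only a sketch of the four operations listed above (create, extract, list, compression level) written against the JDK's java.util.zip classes. The class name ZipSketch and its methods are hypothetical, and using java.util.zip is an assumption about how such operations can be done, not a description of the plugin's internals.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; illustrates the operations, not the plugin's API.
class ZipSketch {

    /** Creates an in-memory ZIP archive from name -> text pairs at the given compression level (0-9). */
    static byte[] create(Map<String, String> files, int level) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.setLevel(level);
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the archive is complete.
        return buffer.toByteArray();
    }

    /** Extracts every entry; the key set of the result doubles as the archive listing. */
    static Map<String, String> extract(byte[] archive) {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                // readAllBytes() stops at the end of the current entry.
                contents.put(entry.getName(), new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    public static void main(String[] args) {
        Map<String, String> scraped = new LinkedHashMap<>();
        scraped.put("page1.html", "<html>first</html>");
        scraped.put("data/items.csv", "id,name\n1,example\n");

        byte[] archive = create(scraped, Deflater.BEST_COMPRESSION);
        System.out.println("Entries: " + extract(archive).keySet());
    }
}
```

Working on byte arrays keeps the sketch free of file-system side effects; a real pipeline would more likely stream to and from files.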

Web Browser Plugin

Module: webharvest-webbrowser

Handle modern web applications with ease. Execute JavaScript, interact with dynamic content, and scrape Single Page Applications (SPAs). Perfect for React, Vue, Angular apps, and sites with complex AJAX interactions.

View Documentation

Key Features:

  • Headless browser automation
  • JavaScript execution
  • DOM interaction
  • Screenshot capture

How the Plugin System Works

Auto-discovery, registration, and dependency injection

Auto-Discovery

  • Classpath Scanning: Plugins discovered automatically at startup
  • Annotations: @Autoscanned + @Definition
  • No Configuration: Just add JAR to classpath
  • Namespace Support: Multiple XML namespaces
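
To make the discovery idea above concrete, here is a minimal sketch of annotation-driven registration in plain Java. It does not use WebHarvest's real @Autoscanned and @Definition annotations, whose attributes are not shown on this page; Discoverable and ElementDefinition are hypothetical stand-ins, the namespaces are made up, and the candidate classes are listed by hand where a real scanner would walk the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PluginScanSketch {

    // Hypothetical discovery marker; not WebHarvest's real annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Discoverable {}

    // Hypothetical definition carrying the XML element name and its namespace.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ElementDefinition {
        String name();
        String namespace() default "urn:example:core";
    }

    @Discoverable
    @ElementDefinition(name = "http")
    static class HttpPlugin {}

    @Discoverable
    @ElementDefinition(name = "zip", namespace = "urn:example:zip")
    static class ZipPlugin {}

    // Lacks both annotations, so the registry skips it.
    static class NotAPlugin {}

    /** Builds a "namespace#name" -> class registry from the annotated candidates. */
    static Map<String, Class<?>> register(List<Class<?>> candidates) {
        Map<String, Class<?>> registry = new LinkedHashMap<>();
        for (Class<?> candidate : candidates) {
            ElementDefinition definition = candidate.getAnnotation(ElementDefinition.class);
            if (candidate.isAnnotationPresent(Discoverable.class) && definition != null) {
                registry.put(definition.namespace() + "#" + definition.name(), candidate);
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> registry =
                register(List.of(HttpPlugin.class, ZipPlugin.class, NotAPlugin.class));
        registry.forEach((key, type) -> System.out.println(key + " -> " + type.getSimpleName()));
    }
}
```

Keying the registry by namespace plus element name is one simple way to let plugins from different XML namespaces reuse the same element name without colliding.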

Dependency Injection

  • Guice Integration: Full DI support via @Inject
  • Service Location: InjectorHelper for manual lookup
  • HttpService: Shared HTTP client with connection pooling
  • Lifecycle Management: Automatic initialization and cleanup
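
The pattern behind these points can be sketched without any framework. The example below hand-rolls constructor injection with a tiny singleton container, which is roughly the wiring that Guice automates through @Inject. Everything in it is hypothetical: HttpClientService is a stand-in for a shared HTTP service (the real HttpService API is not shown on this page), and Container is not InjectorHelper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class InjectionSketch {

    // Hypothetical stand-in for a shared HTTP service.
    interface HttpClientService {
        String get(String url);
    }

    // Stub that returns canned content instead of touching the network.
    static class StubHttpClientService implements HttpClientService {
        @Override
        public String get(String url) {
            return "<html>stub for " + url + "</html>";
        }
    }

    // The plugin receives its collaborator through the constructor instead of creating it.
    static class PageLengthPlugin {
        private final HttpClientService http;

        PageLengthPlugin(HttpClientService http) {
            this.http = http;
        }

        int fetchLength(String url) {
            return http.get(url).length();
        }
    }

    // Tiny singleton-scoped container: each bound type is built once, then reused.
    static class Container {
        private final Map<Class<?>, Supplier<?>> providers = new HashMap<>();
        private final Map<Class<?>, Object> singletons = new HashMap<>();

        <T> void bind(Class<T> type, Supplier<? extends T> provider) {
            providers.put(type, provider);
        }

        <T> T get(Class<T> type) {
            Object instance = singletons.get(type);
            if (instance == null) {
                Supplier<?> provider = providers.get(type);
                if (provider == null) {
                    throw new IllegalStateException("No binding for " + type.getName());
                }
                instance = provider.get();
                singletons.put(type, instance);
            }
            return type.cast(instance);
        }
    }

    /** Declares the bindings; the plugin's dependency is resolved from the same container. */
    static Container wire() {
        Container container = new Container();
        container.bind(HttpClientService.class, StubHttpClientService::new);
        container.bind(PageLengthPlugin.class,
                () -> new PageLengthPlugin(container.get(HttpClientService.class)));
        return container;
    }

    public static void main(String[] args) {
        Container container = wire();
        PageLengthPlugin plugin = container.get(PageLengthPlugin.class);
        System.out.println("Length: " + plugin.fetchLength("https://example.com/"));
    }
}
```

Because the plugin depends only on the interface, a test can bind a stub as above while production code binds a pooled client, and every plugin resolved from the container shares that one instance.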

XSD Generation

  • Automatic Schemas: XSD generated from plugin annotations
  • IDE Support: Auto-completion in XML editors
  • Validation: Compile-time configuration checking
  • Documentation: Self-documenting plugins
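
As an illustration of what schema-based validation buys you, the sketch below checks a configuration fragment against an XSD using the JDK's standard javax.xml.validation API. The schema and the <fetch> element are made up for the example and are not WebHarvest's generated schema; only the JDK calls are real.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationSketch {

    // Hypothetical schema: a single <fetch> element with a required "url" attribute.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "  <xs:element name='fetch'>"
          + "    <xs:complexType>"
          + "      <xs:attribute name='url' type='xs:anyURI' use='required'/>"
          + "    </xs:complexType>"
          + "  </xs:element>"
          + "</xs:schema>";

    /** Returns true when the XML conforms to the schema above, false on any schema violation. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors surface as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<fetch url='https://example.com/'/>"));
        System.out.println(isValid("<fetch/>"));
        System.out.println(isValid("<fetch url='https://example.com/' retries='3'/>"));
    }
}
```

The second and third inputs fail for different reasons (a missing required attribute and an undeclared one), which is the kind of mistake a schema-aware XML editor can also flag while you type.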

Want to Build Custom Plugins?

Extend WebHarvest with your own functionality using the modern @CorePlugin architecture

Learn @CorePlugin architecture • Automatic discovery • Real examples • Best practices

Ready to Get Started?

From zero to scraping in minutes

Start with Core

Begin with the 47 built-in plugins. No setup required - just start scraping immediately.

Try Examples

Add Extensions

Need databases or email? Add the modules you need with a single Maven dependency.

Read Docs

Build Custom

Create your own plugins with the @CorePlugin annotation. Integrate any API or service.

Plugin Development Guide

Ready to Extend?

Download extension modules or build your own custom plugins

All Plugins • Apache License 2.0 • Production Ready