Complete plugin reference
57 plugins for web scraping: 47 built-in core plugins + 10 extension modules for mail, FTP, ZIP, database, and browser automation.
WebHarvest 2.2 uses a modern plugin architecture with automatic discovery and dependency injection:
Core plugins: built-in plugins marked with the @CorePlugin annotation and included in webharvest-core. No external dependencies required.
External plugins: separate plugin modules (mail, ftp, zip, webbrowser) that require additional dependencies. Add them as Maven/Gradle dependencies.
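As a rough illustration of annotation-driven discovery, the sketch below registers only classes that carry a marker annotation. It is not WebHarvest's implementation: `DemoPlugin`, `Plugin`, and the sample plugin classes are hypothetical names invented here, the candidate classes are passed in explicitly rather than found by classpath scanning, and dependency injection is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class PluginDiscoverySketch {

    // Hypothetical stand-in for @CorePlugin; the real annotation's attributes are not shown in this reference.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DemoPlugin {
        String name();
    }

    // Hypothetical plugin contract used only by this sketch.
    interface Plugin {
        String execute(String input);
    }

    @DemoPlugin(name = "upper")
    static class UpperCasePlugin implements Plugin {
        public String execute(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    @DemoPlugin(name = "trim")
    static class TrimPlugin implements Plugin {
        public String execute(String input) {
            return input.trim();
        }
    }

    // Carries no annotation, so discovery skips it.
    static class UnmarkedPlugin implements Plugin {
        public String execute(String input) {
            return input;
        }
    }

    // Registers every candidate that carries @DemoPlugin, keyed by the annotation's name.
    static Map<String, Plugin> discover(List<Class<? extends Plugin>> candidates) {
        Map<String, Plugin> registry = new LinkedHashMap<>();
        for (Class<? extends Plugin> type : candidates) {
            DemoPlugin marker = type.getAnnotation(DemoPlugin.class);
            if (marker == null) {
                continue; // not marked as a plugin
            }
            try {
                registry.put(marker.name(), type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        List<Class<? extends Plugin>> candidates =
                List.of(UpperCasePlugin.class, TrimPlugin.class, UnmarkedPlugin.class);
        Map<String, Plugin> registry = discover(candidates);
        System.out.println(registry.keySet());
        System.out.println(registry.get("upper").execute("scrape"));
    }
}
```

Keeping the plugin name in the annotation means the registry never needs a hand-maintained list of names; adding a plugin is a matter of adding one annotated class.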
Essential plugins for basic scraping
HTTP requests with full client features
Query XML/HTML with XPath expressions
Pattern matching with regular expressions
Split text into tokens
While loop iteration
Define reusable functions
Call defined functions
Parse HTML to clean XML
Read, write and list files
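To make the XPath and regular-expression entries above concrete, here is a plain-JDK sketch of the two operations they name: selecting nodes with an XPath expression, then pulling values out of the selected text with a regex. It does not use WebHarvest's API or configuration syntax; the class name, method names, and sample XML are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRegexSketch {

    // Returns the text content of every node matched by the XPath expression.
    static List<String> selectText(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression or XML input", e);
        }
    }

    // Returns capture group 1 of every regex match found in the text.
    static List<String> firstGroups(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> groups = new ArrayList<>();
        while (matcher.find()) {
            groups.add(matcher.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample input; stands in for a page that has already been cleaned to well-formed XML.
        String xml = "<items><item>Price: 12 EUR</item><item>Price: 7 EUR</item></items>";
        List<String> items = selectText(xml, "//item");
        List<String> prices = firstGroups(String.join("\n", items), "Price: (\\d+)");
        System.out.println(prices); // prints [12, 7]
    }
}
```

XPath is the better fit for structure (which elements to visit), while the regex handles the free text inside them, which is why scraping pipelines often chain the two in this order.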
Key plugins for advanced workflows
Define variables
Retrieve variable values
Set variable values
Iterate over collections
Execute JavaScript/Groovy/BeanShell
Generate dynamic content
Convert JSON to XML
Convert XML to JSON
Conditional execution
Advanced XQuery 3.1 queries
Include external configurations
Commonly used utility plugins
Advanced and specialized plugins
Configuration element
Extract node values
XML processing
XQuery expression body
XQuery parameters
XSLT transformations
XSLT stylesheet
HTTP request headers
HTTP parameters
Function call parameters
Else clause for if
Regex pattern definition
Regex source text
Regex match results
Create lists of items
Process body content
Optional extension modules (require separate dependencies)
Send emails with attachments via SMTP
FTP connection and operations
Download files from FTP server
Upload files to FTP server
List files on FTP server
Create and extract ZIP archives
Access individual ZIP archive entries
Automated browser interactions