Built by developers, for developers.

Nearly two decades of innovation in web data extraction.

From its creation in 2006 to today's modern v2.2, WebHarvest has evolved through community contributions, architectural innovations, and a complete revival in 2024-2025.

WebHarvest Community

Our Journey

2006-2010

The Beginning & Foundation

Vladimir Nikic creates WebHarvest in 2006 as an open-source solution for web data extraction. He designs the processor-based architecture, the XML configuration system, and the Swing-based IDE. The project quickly gains traction in the Java community.

2010-2013

Community Growth & Active Development

Active development continues with contributions from Alexander Wajda (2010-2012), Robert Bala (2012-2013), Piotr Dyraga (2012-2013), and Maciej Czapiewski (2012-2013). Development of a new plugin architecture proceeds alongside major new features and community contributions. WebHarvest becomes a trusted tool for web scraping projects worldwide.

2013-2024

Maintenance & Dormancy

After the active development phase concluded in 2013, the project entered maintenance mode. The codebase remained stable, but its dependencies and architecture gradually became outdated, setting the stage for a comprehensive modernization.

2024

Revival & Architecture Completion

After years of dormancy, the project returns to active development. Robert Bala completes version 2.1.0, finishing the migration from the old processor architecture to the new auto-discovered plugin system. However, the dependencies and the IDE remain at their 2012 level and require further modernization.

2025

Complete Modernization & New IDE

Major breakthrough: the new plugin architecture is fully implemented, the core is completely rewritten, and most of the project is refactored from scratch. A modern web-based IDE replaces the legacy Swing GUI. Other changes include migration to Apache HttpClient 5 and Java 11+, comprehensive test coverage (2,600+ tests), updated dependencies, and extensive documentation. Work continues on test improvements and bug fixes. WebHarvest v2.2 enters the modern era.

Contributors

WebHarvest is the result of dedication and expertise from talented developers around the world

Vladimir Nikic

Original Author

Created WebHarvest in 2006. Designed the original processor-based architecture, XML configuration system, and Swing-based IDE. Led the project through its formative years and established the foundation that built its reputation in the Java community.

2006-2013

Robert Bala

Project Admin & Lead Developer

Core contributor since 2012. Designed the modern plugin architecture, event-driven system, and cloud-ready infrastructure that power today's framework. Led the comprehensive modernization (2024-2025): Java 11+ migration, web-based IDE, HttpClient 5 upgrade, and extensive test coverage improvements.

2012-2025

Alexander Wajda

Developer

Contributed to core functionality and plugin system development during the growth phase of the project.

2010-2012

Piotr Dyraga

Developer

Enhanced the framework with valuable contributions to various components and features.

2012-2013

Maciej Czapiewski

Developer

Contributed to the development and improvement of the WebHarvest framework.

2012-2013

Key Achievements

19 Years of Development
47+ Core Plugins
2,500+ Test Cases
100% Open Source

License & Legal

WebHarvest is licensed under the BSD License, which allows free use, modification, and distribution.

Original Work:
Copyright © 2006-2013, Vladimir Nikic

Modified Work:
Copyright © 2006-2025, the original author or authors

View Full Attribution (NOTICE)

Get WebHarvest

Download the latest version and join our community