Nearly two decades of innovation in web data extraction.
From its creation in 2006 to today's modern v2.2, WebHarvest has evolved through community contributions, architectural innovations, and a complete revival in 2024-2025.
Vladimir Nikic creates WebHarvest in 2006 as an open-source solution for web data extraction. He designs the processor-based architecture, the XML configuration system, and the Swing-based IDE. The project quickly gains traction in the Java community.
Active development continues with contributions from Alexander Wajda (2010-2012), Robert Bala (2012-2013), Piotr Dyraga (2012-2013), and Maciej Czapiewski (2012-2013). This period brings work on a new plugin architecture, major new features, and further community contributions. WebHarvest becomes a trusted tool for web scraping projects worldwide.
After the active development phase concludes in 2013, the project enters maintenance mode. The codebase remains stable, but its dependencies and architecture gradually become outdated, setting the stage for a comprehensive modernization.
After years of dormancy, the project returns to active development. Robert Bala completes work on version 2.1.0, finishing the migration from the old processor architecture to the new auto-discovered plugin system. However, the dependencies and the IDE remain at their 2012 level, requiring further modernization.
Major breakthrough: the new plugin architecture is fully implemented, the core is completely rewritten, and most of the project is refactored from scratch. A modern web-based IDE replaces the legacy Swing GUI. The project migrates to Apache HttpClient 5 and Java 11+, gains comprehensive test coverage (2600+ tests), and receives updated dependencies and extensive documentation. Work continues on test improvements and bug fixes as WebHarvest v2.2 enters the modern era.
WebHarvest is the result of dedication and expertise from talented developers around the world
Vladimir Nikic
Original Author
Created WebHarvest in 2006. Designed the original processor-based architecture, XML configuration system, and Swing-based IDE. Led the project through its formative years and established the foundation that built its reputation in the Java community.
Robert Bala
Project Admin & Lead Developer
Core contributor since 2012. Designed the modern plugin architecture, event-driven system, and cloud-ready infrastructure that power today's framework. Led the comprehensive modernization (2024-2025): Java 11+ migration, web-based IDE, HttpClient 5 upgrade, and extensive test coverage improvements.
Developer
Contributed to core functionality and plugin system development during the growth phase of the project.
Developer
Enhanced the framework with valuable contributions to various components and features.
Developer
Contributed to the development and improvement of the WebHarvest framework.
WebHarvest is licensed under the BSD License, allowing free use, modification, and distribution
Original Work:
Copyright © 2006-2013, Vladimir Nikic
Modified Work:
Copyright © 2006-2025, the original author or authors