Professional Web IDE for WebHarvest

Develop, test, and debug scrapers directly in your browser

Built on Monaco Editor, with real-time execution, per-tab workspaces, session tracking, and an Apple HIG-inspired interface.


IDE Features

Everything you need for productive scraper development

Monaco Editor

VS Code's powerful editor engine with XML syntax highlighting, auto-completion, bracket matching, and multi-cursor support.

Syntax Highlighting • Auto-Complete

Real-Time Execution

WebSocket-based live streaming of logs, progress bars, variable values, and execution results as your scraper runs.

Live Logs • Progress

Multi-Tab Workspaces

Apple HIG-inspired design: each tab is an isolated workspace with dedicated logs, results, variables, and session state.

Per-Tab State • Apple HIG

Session Management

Built-in session tracking with UUID-based IDs, lifecycle states, duration metrics, and visualizations for HTTP metrics and plugin breakdown (both simulated in v2.2.0).

Metrics • Tracking • History

One-Click Execution

Run configurations directly from the editor with instant feedback. Pause execution to inspect variables, resume to continue, or stop to cancel completely.

Quick Run • Pause/Resume • Stop

Embedded Server

Standalone executable JAR with an embedded Jetty server. Beyond a Java 11+ runtime there are no external dependencies and no complex setup: just run it.

Jetty • Standalone

Installation

Get up and running in 3 simple steps

1. System Requirements

  • Java: Java 11 or higher (OpenJDK or Oracle JDK)
  • Memory: Minimum 512 MB RAM (1 GB recommended)
  • Browser: Chrome, Firefox, Safari, or Edge (modern versions)
  • Ports: Port 8080 available (or configure a custom port)

2. Download & Extract

Download the IDE distribution from SourceForge:

terminal
wget https://sourceforge.net/projects/web-harvest/files/webhervest/2.2.0/webharvest-ide-2.2.0.jar/download -O webharvest-ide-2.2.0.jar

3. Launch IDE

Start the IDE server:

terminal
java -jar webharvest-ide-2.2.0.jar

Or use the provided launcher script:

terminal
./start-ide.sh    # Linux/macOS
start-ide.bat      # Windows

Then open your browser at: http://localhost:8080

Using the IDE

Quick guide to key workflows

Creating Configurations

  1. Click the New Tab button in the tab bar
  2. Enter a configuration name
  3. Start typing your XML configuration
  4. Press Ctrl/Cmd + Space for auto-completion
  5. Save with Ctrl/Cmd + S

Running Scrapers

  1. Open or create a configuration in a tab
  2. Click the Run button (or press F5)
  3. Watch real-time logs in the Logs panel
  4. View results in the Results panel
  5. Check session metrics in the Session panel

Debugging

  1. Add <echo> tags to output debug info
  2. Check the Variables panel for current values
  3. Review the Logs panel for errors and warnings
  4. Use the Session panel for performance metrics
  5. Stop execution with the Stop button (or Shift + F5) if needed

Multi-Tab Workflow

  1. Open multiple configurations in separate tabs
  2. Each tab maintains its own execution state
  3. Switch tabs to view different logs/results
  4. Run multiple scrapers without losing context
  5. Close tabs with the × button

Saving & Loading

  1. Save: Press Ctrl/Cmd + S to save the current tab
  2. Load: Click the Load button to open a file
  3. Export: Right-click a tab → Export
  4. Files are saved to ~/.webharvest/configs/
  5. Use the Recent menu for quick access

Keyboard Shortcuts

  • Ctrl/Cmd + S - Save configuration
  • Ctrl/Cmd + N - New tab
  • F5 - Run scraper
  • Shift + F5 - Stop execution
  • Ctrl/Cmd + / - Toggle comment
  • Ctrl/Cmd + Space - Auto-complete

Enhanced Metrics Status (v2.2.0)

The IDE now includes HTTP Metrics and Plugin Breakdown visualization in the Session panel. Here's what you need to know:

Production-Ready Metrics

  • Duration - Measured in milliseconds (real data, not simulated)
  • Elements Processed - Actual variable count (real data, not simulated)
  • Processing Rate - Calculated from the measured values (real data, not simulated)
  • Session History - Complete audit trail, available through the REST API (not yet persisted across restarts)
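
As a rough illustration of how a processing rate can be derived from the two measured values, the sketch below divides the element count by the duration. The helper name processing_rate and the elements-per-second unit are assumptions made for this example; the exact formula the IDE uses is not documented here.

```shell
# Hypothetical helper: elements per second from an element count ($1)
# and a duration in milliseconds ($2). The formula is an assumption,
# not taken from the IDE's source.
processing_rate() {
  awk -v n="$1" -v ms="$2" 'BEGIN {
    if (ms <= 0) { print "0.00"; exit }   # avoid division by zero
    printf "%.2f\n", n * 1000 / ms
  }'
}

processing_rate 250 5000   # 250 elements in 5 seconds, prints 50.00
```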

Simulated Metrics (UI Demonstration)

  • HTTP Metrics - Shows example values (~5 KB, 150 ms response time)
  • Plugin Breakdown - Estimated times based on XML analysis

Why simulated? Real-time tracking requires deep EventBus integration with HttpService and all plugin lifecycle events. This involves modifying 50+ classes and extensive testing.

Coming in v2.3.0 (Q1 2026)

  • Real HTTP Tracking - Actual request/response metrics via EventBus
  • Real Plugin Timing - Precise timing for each plugin execution
  • Memory Metrics - JVM heap usage and GC events
  • Persistent History - SQLite database (survives restart)

UI is production-ready - All visualizations are implemented; HTTP Metrics and Plugin Breakdown currently display simulated values
🔄 Real data coming soon - Foundation complete, integration in progress

Architecture

How the IDE works under the hood

Components

  • Frontend: Monaco Editor, WebSocket client, React-like state management
  • Backend: Jetty server, WebSocket endpoint, execution manager
  • Core: WebHarvest session API, token tracker, event system

Session API (v2.2)

  • UUID-based IDs: Unique identifier for each execution
  • Lifecycle: PENDING → RUNNING → COMPLETED/FAILED/CANCELLED
  • Metrics: Start time, end time, duration, element count, processing rate
  • Thread-safe: Concurrent session registry with atomic operations
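
The lifecycle above can be read as a small state machine. The following is a minimal sketch of a transition check, not the Session API itself: the function name can_transition is hypothetical, and it allows only the transitions listed above. Whether a PENDING session can be cancelled directly is not stated, so the sketch rejects it.

```shell
# Hypothetical transition check for the session lifecycle.
# Returns 0 (success) if moving from state $1 to state $2 is allowed.
can_transition() {
  case "$1>$2" in
    "PENDING>RUNNING") return 0 ;;
    "RUNNING>COMPLETED"|"RUNNING>FAILED"|"RUNNING>CANCELLED") return 0 ;;
    *) return 1 ;;   # everything else, including leaving a terminal state
  esac
}

if can_transition RUNNING COMPLETED; then echo "allowed"; fi
if ! can_transition COMPLETED RUNNING; then echo "rejected"; fi
```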

Token Tracking

  • HTTP_REQUEST: Count of HTTP requests made
  • HTTP_BYTES: Total bytes transferred
  • CPU_TIME: Processing time in milliseconds
  • MEMORY_PEAK: Peak memory usage

Event System

  • SessionCreatedEvent: Session initialized
  • SessionStartedEvent: Execution began
  • SessionCompletedEvent: Finished successfully
  • SessionFailedEvent: Error occurred
  • SessionCancelledEvent: User cancelled

Advanced Configuration

Customize IDE behavior

Server Settings

Configure via ide.properties or JVM arguments:

ide.properties
# Port (default: 8080)
server.port=8080

# Context path (default: /)
server.contextPath=/

# Max threads (default: 200)
server.maxThreads=200

# Session timeout in seconds (default: 1800, i.e. 30 minutes)
server.sessionTimeout=1800

JVM arguments:

terminal
java -Dserver.port=9090 -jar webharvest-ide-2.2.0.jar
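
If you launch the IDE from your own script, it can help to validate the port before passing it to the JVM. The sketch below only assembles the command line from the flags shown in this guide (-Dserver.port and -Xmx); the helper names valid_port and launch_cmd are made up for this example and are not part of the distribution.

```shell
# Hypothetical helpers, not shipped with the IDE.

# Succeeds only for an integer in the TCP port range 1-65535.
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Prints the launch command for a port (default 8080) and an optional max heap.
launch_cmd() {
  port=${1:-8080}
  heap=${2:-}
  if ! valid_port "$port"; then
    echo "invalid port: $port" >&2
    return 1
  fi
  cmd="java"
  if [ -n "$heap" ]; then
    cmd="$cmd -Xmx$heap"
  fi
  echo "$cmd -Dserver.port=$port -jar webharvest-ide-2.2.0.jar"
}

launch_cmd 9090 2g   # prints the command; wrap in eval or copy it to run
```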

Execution Settings

Configure WebHarvest runtime behavior:

ide.properties
# Max HTTP connections (default: 20)
http.maxConnections=20

# Connection timeout in milliseconds (default: 30000, i.e. 30 s)
http.connectionTimeout=30000

# Enable token tracking (default: true)
tracking.enabled=true

# Log level (default: INFO)
logging.level=INFO
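
When scripting around the IDE, you may want to read a value back out of ide.properties and fall back to the documented default when the key is absent. The following is a minimal sketch under the assumption that the file uses plain key=value lines as shown above; get_prop is a hypothetical helper, not a tool shipped with the IDE. Note the differing units in the file: server.sessionTimeout is in seconds, while http.connectionTimeout is in milliseconds.

```shell
# Hypothetical helper: print the value of key $2 from properties file $1,
# or the default $3 if the file or key is missing.
get_prop() {
  file=$1; key=$2; default=$3
  # Escape dots so "server.port" matches literally, not "serverXport".
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  value=$(sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$file" 2>/dev/null | tail -n 1)
  echo "${value:-$default}"
}

# Falls back to 8080 when ide.properties is absent or has no server.port line.
get_prop ide.properties server.port 8080
```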

Security

For production deployments, consider:

  • Authentication: Put the IDE behind a reverse proxy (nginx, Apache) that enforces authentication
  • HTTPS: Terminate TLS at the reverse proxy with a valid SSL certificate
  • Firewall: Restrict access to trusted networks
  • CORS: Configure allowed origins in ide.properties

Troubleshooting

Common issues and solutions

Port Already in Use

Problem: Port 8080 is already in use

Solution:

terminal
# Option 1: Find and stop the process holding the port
lsof -i :8080
kill <PID>        # use kill -9 <PID> only if the process ignores the default signal

# Option 2: Use different port
java -Dserver.port=9090 -jar webharvest-ide-2.2.0.jar

Execution Not Starting

Problem: Clicking Run does nothing

Solution:

  • Check the browser console for errors (F12)
  • Verify the WebSocket connection (a green dot indicates it is connected)
  • Check the IDE logs for errors
  • Hard refresh the browser (Ctrl/Cmd + Shift + R)

Browser Cache Issues

Problem: The old UI or styles still appear after an update

Solution:

  • Hard refresh the browser: Ctrl/Cmd + Shift + R
  • Or clear the cache manually: Settings → Privacy → Clear Browsing Data → Cached Images/Files

Java Version Error

Problem: Unsupported class file major version

Solution:

terminal
# Check Java version
java -version  # Should be 11+

# Install Java 11+ if needed
# Ubuntu/Debian:
sudo apt install openjdk-11-jdk

# macOS (Homebrew):
brew install openjdk@11
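
To automate the version check, for example at the top of a launcher script, the major version can be parsed from the `java -version` banner. This is a sketch, not part of the provided start-ide.sh; the function names java_major and check_java are hypothetical. It handles both the legacy "1.8.0_292" numbering and the modern "11.0.2" numbering.

```shell
# Hypothetical helper: extract the major version from a `java -version` banner.
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;   # legacy scheme: 1.8.0_292 -> 8
    *)   echo "${ver%%[.+-]*}" ;;              # modern scheme: 11.0.2 -> 11
  esac
}

# Hypothetical check: succeeds only if the java on PATH is version 11 or higher.
check_java() {
  # `java -version` writes its banner to stderr, hence 2>&1.
  major=$(java_major "$(java -version 2>&1)")
  if [ "${major:-0}" -ge 11 ]; then
    echo "Java $major OK"
  else
    echo "Java 11+ required (found: ${major:-none})" >&2
    return 1
  fi
}

# Usage: check_java && java -jar webharvest-ide-2.2.0.jar
```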

Memory Issues

Problem: OutOfMemoryError or slow performance

Solution:

terminal
# Increase heap size
java -Xmx2g -jar webharvest-ide-2.2.0.jar

# Or configure in start script
export JAVA_OPTS="-Xmx2g -Xms512m"
./start-ide.sh

Session Not Showing

Problem: Session panel empty after execution

Solution:

  • Ensure execution completed (check status)
  • Check Logs panel for errors
  • Try switching to another tab and back
  • Verify IDE version is v2.2+ (session tracking added in v2.2)

Ready to Start?

Download WebHarvest IDE and experience professional web scraping development

Java 11+ Required • Apache License 2.0 • 15+ MB Download