Session History

Track and analyze all scraper executions

Session History is a complete audit trail of all scraper executions in the WebHarvest IDE. Every execution is tracked with full metrics, performance data, and results.

Overview


v2.2.0 Metrics Status

Production-Ready: Duration, Elements, Processing Rate, Session History

Simulated (Demo): HTTP Metrics, Plugin Breakdown (real tracking planned for v2.3.0)

Persistent Storage

All sessions are stored in memory for as long as the IDE is running

Performance Tracking

Duration, elements processed, processing rate

Advanced Filtering

By status, pagination, time range

REST API

Programmatic access via HTTP endpoint

Quick Start

View History in IDE

  1. Open IDE: http://localhost:8080
  2. Run some configurations to build history
  3. Click Session tab in right sidebar
  4. Click View All Sessions button
  5. Browse sessions - newest first
  6. Click any session to see full details

Access via API

bash
# Get last 20 sessions (default limit)
curl "http://localhost:8080/api/sessions/history"

# Get only completed sessions
curl "http://localhost:8080/api/sessions/history?status=COMPLETED"

# Pagination (quote the URL so the shell does not interpret ? and &)
curl "http://localhost:8080/api/sessions/history?limit=50&offset=0"

API Reference

GET /api/sessions/history

Returns a list of executed sessions with full metrics.

Query Parameters

Parameter  Type     Default  Description
limit      integer  20       Maximum sessions to return (max: 100)
offset     integer  0        Skip first N sessions (pagination)
status     string   -        Filter by status (COMPLETED, FAILED, etc.)
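Because `limit` is capped at 100, reading a longer history means walking it page by page with `offset`. The following is a minimal sketch of that loop, not part of WebHarvest itself: `historyUrl` and `fetchAllSessions` are hypothetical helper names, and the loop assumes that `total` in the response counts the sessions matching the request. The fetch function is passed in so the helper can be exercised without a running IDE.

```javascript
// Hypothetical helper: builds the query string for /api/sessions/history.
// The limit is clamped to 100, the documented maximum.
function historyUrl({ limit = 20, offset = 0, status } = {}) {
    const params = new URLSearchParams({
        limit: String(Math.min(limit, 100)),
        offset: String(offset),
    });
    if (status) params.set('status', status);
    return `/api/sessions/history?${params}`;
}

// Hypothetical helper: collects every page of the history.
// Assumption: `total` reflects the number of sessions matching the request.
async function fetchAllSessions(
    status,
    fetchJson = (url) => fetch(url).then((r) => r.json())
) {
    const all = [];
    let offset = 0;
    while (true) {
        const page = await fetchJson(historyUrl({ limit: 100, offset, status }));
        all.push(...page.sessions);
        offset += page.sessions.length;
        // Stop on an empty page or once everything has been read
        if (page.sessions.length === 0 || offset >= page.total) break;
    }
    return all;
}
```

In the IDE's browser context, `await fetchAllSessions('FAILED')` would return every failed session under these assumptions.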

Response Format

JSON Response
{
  "sessions": [
    {
      "sessionId": "abc123-def456-789...",
      "timestamp": "2025-10-13T12:00:00Z",
      "status": "COMPLETED",
      "duration": 150,
      "configName": "data-scraper.xml",
      "elementsProcessed": 25,
      "processingRate": 166.67
    }
  ],
  "total": 42,
  "limit": 20,
  "offset": 0
}

Session Fields

Field              Type               Description
sessionId          string (UUID)      Unique session identifier
timestamp          string (ISO 8601)  Session start time (UTC)
status             enum               COMPLETED, FAILED, RUNNING, PENDING, CANCELLED
duration           long               Execution duration in milliseconds
configName         string             Name of executed configuration file
elementsProcessed  integer            Number of variables/elements processed
processingRate     double             Elements per second
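The sample values in this document are consistent with `processingRate` being `elementsProcessed` divided by the duration in seconds. That relationship is inferred from the examples, not stated by the API, so treat the sketch below as an assumption; `processingRate` here is a hypothetical local function.

```javascript
// Assumed relationship: elements per second = elements / (milliseconds / 1000).
// Returns 0 for a non-positive duration to avoid dividing by zero.
function processingRate(elementsProcessed, durationMs) {
    if (durationMs <= 0) return 0;
    return elementsProcessed / (durationMs / 1000);
}

// Sample response above: 25 elements in 150 ms
console.log(processingRate(25, 150).toFixed(2)); // 166.67
```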

Use Cases

1. Performance Monitoring

Track execution times across multiple runs:

JavaScript
// Fetch history
const response = await fetch('/api/sessions/history?limit=100');
const data = await response.json();

// Calculate average duration (0 when no sessions were returned)
const count = data.sessions.length;
const avgDuration = count
    ? data.sessions.reduce((sum, s) => sum + s.duration, 0) / count
    : 0;

console.log(`Average execution time: ${avgDuration.toFixed(1)}ms`);

2. Debugging Failed Executions

Filter only failed sessions:

bash
curl "http://localhost:8080/api/sessions/history?status=FAILED" | jq
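Once the failed sessions are fetched, counting them per configuration can show which configs fail most often. A minimal sketch, assuming the session shape shown under Response Format; `countFailuresByConfig` is a hypothetical helper, not part of WebHarvest.

```javascript
// Counts FAILED sessions per configName and returns [configName, count]
// pairs sorted by count, highest first.
function countFailuresByConfig(sessions) {
    const counts = new Map();
    for (const s of sessions) {
        if (s.status !== 'FAILED') continue;
        counts.set(s.configName, (counts.get(s.configName) || 0) + 1);
    }
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

The input can be the `sessions` array of any history response; sessions with other statuses are ignored, so the `status=FAILED` filter is optional here.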

3. Audit Trail

Export complete execution history:

JavaScript
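// Get all sessions, 100 per request (the documented maximum for limit)
const sessions = [];
for (let offset = 0; ; offset += 100) {
    const page = await fetch(`/api/sessions/history?limit=100&offset=${offset}`)
        .then(r => r.json());
    sessions.push(...page.sessions);
    if (page.sessions.length < 100) break; // last page reached
}

// Export to CSV
const csv = ['Timestamp,Config,Status,Duration,Elements']
    .concat(sessions.map(s => 
        `${s.timestamp},${s.configName},${s.status},${s.duration},${s.elementsProcessed}`
    ))
    .join('\n');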

// Download
const blob = new Blob([csv], {type: 'text/csv'});
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'session-history.csv';
a.click();
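Joining raw values with commas produces a malformed file if a config name contains a comma or a quote. The sketch below applies the usual CSV quoting rule (wrap the field in double quotes and double any embedded quotes); `csvField` and `sessionsToCsv` are hypothetical helper names.

```javascript
// Quotes a single CSV field only when it contains a comma, quote, or newline.
function csvField(value) {
    const text = String(value ?? '');
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Builds the same five-column export as above, with each field escaped.
function sessionsToCsv(sessions) {
    const header = 'Timestamp,Config,Status,Duration,Elements';
    const rows = sessions.map(s =>
        [s.timestamp, s.configName, s.status, s.duration, s.elementsProcessed]
            .map(csvField)
            .join(',')
    );
    return [header, ...rows].join('\n');
}
```

The result of `sessionsToCsv` can be passed to `new Blob([...], {type: 'text/csv'})` exactly as in the download step.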

4. Success Rate Analysis

Calculate success/failure ratio:

JavaScript
const history = await fetch('/api/sessions/history?limit=100')
    .then(r => r.json());

// Divide by the number of sessions actually returned. `total` counts the
// whole history and can exceed the 100 sessions fetched here.
const fetched = history.sessions;
const completed = fetched.filter(s => s.status === 'COMPLETED').length;
const failed = fetched.filter(s => s.status === 'FAILED').length;
const count = fetched.length || 1; // avoid dividing by zero on an empty history

console.log(`Success rate: ${(completed / count * 100).toFixed(1)}%`);
console.log(`Failure rate: ${(failed / count * 100).toFixed(1)}%`);
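Time-range filtering is listed among the features, but the API reference above documents no time-range query parameter. One option, sketched here as an assumption and not as the IDE's own mechanism, is to filter the returned sessions on `timestamp` in the client before computing a rate. `inTimeRange` is a hypothetical helper.

```javascript
// Keeps sessions whose start time falls in [from, to).
// `from` and `to` are ISO 8601 strings, like the `timestamp` field.
function inTimeRange(sessions, from, to) {
    const start = Date.parse(from);
    const end = Date.parse(to);
    return sessions.filter(s => {
        const t = Date.parse(s.timestamp);
        return t >= start && t < end;
    });
}
```

For example, `inTimeRange(history.sessions, '2025-10-13T14:00:00Z', '2025-10-13T15:00:00Z')` keeps only sessions started during that hour.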

Best Practices

Tip: Monitor Long-Running Jobs

Set up periodic checks for jobs that run too long:

JavaScript
setInterval(async () => {
    try {
        const response = await fetch('/api/sessions/history?limit=1');
        const latest = await response.json();
        const session = latest.sessions[0]; // undefined if no sessions exist yet

        if (session && session.status === 'RUNNING' && session.duration > 60000) {
            console.warn('Job running >60s:', session.sessionId);
            // Alert or take action
        }
    } catch (err) {
        // Keep polling even if one request fails
        console.error('Session history check failed:', err);
    }
}, 5000); // Check every 5 seconds

Tip: Performance Baseline

Compare current runs to historical average:

JavaScript
async function isSlowerThanUsual(currentDuration, configName) {
    const history = await fetch('/api/sessions/history?limit=50')
        .then(r => r.json());
    
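    const similar = history.sessions
        .filter(s => s.configName === configName && s.status === 'COMPLETED');

    // No completed runs of this config in the last 50 sessions: no baseline
    if (similar.length === 0) return false;

    const avgDuration = similar.reduce((sum, s) => sum + s.duration, 0)
        / similar.length;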
    
    return currentDuration > avgDuration * 1.5; // 50% slower
}

Limitations

Future Enhancements

Planned for v2.3.0

Planned for v2.4.0

UI Screenshots

Session History List View

┌─────────────────────────────────────────────┐
│  Session History        6 sessions │
├─────────────────────────────────────────────┤
│ ✓ COMPLETED    2025-10-13 14:20:00         │
│ 📄 05_data_pipeline.xml                     │
│ ⏱ 165ms  📦 9 elements  ⚡ 54.5 elem/s    │
│ 🔑 97266a7e...                              │
├─────────────────────────────────────────────┤
│ ✓ COMPLETED    2025-10-13 14:18:30         │
│ 📄 03_json_api.xml                          │
│ ⏱ 320ms  📦 15 elements  ⚡ 46.9 elem/s   │
│ 🔑 37e37905...                              │
├─────────────────────────────────────────────┤
│ ✗ FAILED       2025-10-13 14:15:10         │
│ 📄 invalid_config.xml                       │
│ ⏱ 5ms  📦 0 elements  ⚡ 0.0 elem/s        │
│ 🔑 fa860c00...                              │
├─────────────────────────────────────────────┤
│         [🔄 Refresh]                        │
└─────────────────────────────────────────────┘

Related Documentation