Token Tracking & Resource Monitoring

Track usage, enforce quotas, enable billing

Granular resource tracking for HTTP requests, bandwidth, CPU time, and memory with built-in support for quota enforcement and billing integration.

Resource Types

What tokens measure and why

HTTP_REQUEST

Unit: Count

  • Tracked: Every <http> call
  • Use Case: Rate limiting, API quotas
  • Example: 1,250 requests for product catalog scrape
  • Billing: $0.001 per request (typical pricing)

HTTP_BYTES

Unit: Bytes

  • Tracked: Download + upload bandwidth
  • Use Case: Data transfer monitoring, CDN costs
  • Example: 2.5 GB for data lake ingestion
  • Billing: $0.10 per GB

CPU_TIME

Unit: Milliseconds

  • Tracked: XPath, XQuery, Regex, Script processing
  • Use Case: Performance optimization, cost allocation
  • Example: 180,000 ms (3 minutes) for complex transformation
  • Billing: $0.02 per CPU-hour

MEMORY_PEAK

Unit: Bytes

  • Tracked: Maximum heap usage during execution
  • Use Case: Resource planning, instance sizing
  • Example: 1.2 GB peak for large XML processing
  • Billing: $0.01 per GB-hour
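As a rough illustration of how the four units translate into charges, the sketch below prices the example figures above at the listed rates. The class `CostEstimate` and its `estimate` method are hypothetical helpers, not part of WebHarvest. Treating the memory peak as held for the whole run is also an assumption, made only so that a peak in GB can be priced per GB-hour.

```java
// Hypothetical helper, not part of WebHarvest: prices one job at the rates listed above.
class CostEstimate {
    static final double PER_REQUEST = 0.001;   // $ per HTTP request
    static final double PER_GB = 0.10;         // $ per GB transferred
    static final double PER_CPU_HOUR = 0.02;   // $ per CPU-hour
    static final double PER_GB_HOUR = 0.01;    // $ per GB-hour of memory
    static final double MILLIS_PER_HOUR = 3_600_000.0;

    static double estimate(long requests, double transferGb, long cpuMillis,
                           double peakMemoryGb, double hoursHeld) {
        double cost = requests * PER_REQUEST;
        cost += transferGb * PER_GB;
        cost += (cpuMillis / MILLIS_PER_HOUR) * PER_CPU_HOUR;
        // Assumption: the peak is treated as held for the whole duration.
        cost += peakMemoryGb * hoursHeld * PER_GB_HOUR;
        return cost;
    }

    public static void main(String[] args) {
        // 1,250 requests, 2.5 GB, 180,000 ms of CPU, 1.2 GB peak held for one hour
        double cost = estimate(1_250, 2.5, 180_000, 1.2, 1.0);
        System.out.println("Estimated cost (USD): " + cost);
    }
}
```

With these inputs the estimate comes to roughly $1.51, almost all of it from the request and bandwidth charges. Three minutes of CPU time contributes only a tenth of a cent.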

TokenTracker API

Core classes and usage patterns

TokenTracker Interface

Package: org.webharvest.runtime.tracking

TokenTracker.java
public interface TokenTracker {
    // Increment tokens
    void increment(ResourceType type);
    void increment(ResourceType type, 
                   long amount);
    
    // Query usage
    long getTokenCount(ResourceType type);
    TokenUsage getUsage(); // Snapshot
    
    // Reset (for testing)
    void reset();
    void reset(ResourceType type);
}
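
To make the contract above more concrete, here is a minimal in-memory sketch of the increment, query, and reset methods. `InMemoryTokenTracker` is a hypothetical class written for this page, and the local `ResourceType` enum is a stand-in so the example compiles on its own. It treats every resource type as an additive counter, which may not match how WebHarvest records MEMORY_PEAK, and it omits `getUsage()`.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Local stand-in for the documented enum so the sketch is self-contained.
enum ResourceType { HTTP_REQUEST, HTTP_BYTES, CPU_TIME, MEMORY_PEAK }

// Hypothetical implementation; not the tracker shipped with WebHarvest.
class InMemoryTokenTracker {
    // The map is fully populated in the constructor and never resized,
    // so concurrent reads of it are safe; LongAdder handles the counting.
    private final Map<ResourceType, LongAdder> counters =
        new EnumMap<>(ResourceType.class);

    InMemoryTokenTracker() {
        for (ResourceType type : ResourceType.values()) {
            counters.put(type, new LongAdder());
        }
    }

    public void increment(ResourceType type) {
        increment(type, 1L);
    }

    public void increment(ResourceType type, long amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be non-negative");
        }
        counters.get(type).add(amount);
    }

    public long getTokenCount(ResourceType type) {
        return counters.get(type).sum();
    }

    public void reset() {
        counters.values().forEach(LongAdder::reset);
    }

    public void reset(ResourceType type) {
        counters.get(type).reset();
    }

    public static void main(String[] args) throws InterruptedException {
        InMemoryTokenTracker tracker = new InMemoryTokenTracker();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    tracker.increment(ResourceType.HTTP_REQUEST);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // Four threads, 1000 increments each
        System.out.println(tracker.getTokenCount(ResourceType.HTTP_REQUEST)); // prints 4000
    }
}
```

`LongAdder` is chosen here because counters are written far more often than they are read; an `AtomicLong` per type would work equally well for a sketch of this size.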

TokenUsage (Immutable)

Package: org.webharvest.runtime.tracking

TokenUsage.java
public final class TokenUsage {
    // Accessors
    long getHttpRequests();
    long getHttpBytes();
    long getCpuTimeMillis();
    long getMemoryPeakBytes();
    
    // Convenience
    double getHttpBytesGB();
    double getCpuTimeHours();
    double getMemoryPeakGB();
    
    // Export
    String toJson();
    Map toMap();
}
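
The convenience accessors are plain unit conversions, which the following sketch spells out. `UsageSnapshot` is a hypothetical stand-in for `TokenUsage`, and the binary gigabyte (1 GB = 1024³ bytes) is an assumption suggested by the sample byte counts on this page rather than something the API listing states. The JSON key names follow the sample output shown under Basic Token Tracking.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for TokenUsage; accessor names mirror the listing above.
final class UsageSnapshot {
    private static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0; // assumption: binary GB
    private static final double MILLIS_PER_HOUR = 3_600_000.0;

    private final long httpRequests;
    private final long httpBytes;
    private final long cpuTimeMillis;
    private final long memoryPeakBytes;

    UsageSnapshot(long httpRequests, long httpBytes,
                  long cpuTimeMillis, long memoryPeakBytes) {
        this.httpRequests = httpRequests;
        this.httpBytes = httpBytes;
        this.cpuTimeMillis = cpuTimeMillis;
        this.memoryPeakBytes = memoryPeakBytes;
    }

    long getHttpRequests() { return httpRequests; }
    long getHttpBytes() { return httpBytes; }
    long getCpuTimeMillis() { return cpuTimeMillis; }
    long getMemoryPeakBytes() { return memoryPeakBytes; }

    double getHttpBytesGB() { return httpBytes / BYTES_PER_GB; }
    double getCpuTimeHours() { return cpuTimeMillis / MILLIS_PER_HOUR; }
    double getMemoryPeakGB() { return memoryPeakBytes / BYTES_PER_GB; }

    // Read-only view with a stable key order, so toJson() output is deterministic.
    Map<String, Long> toMap() {
        Map<String, Long> map = new LinkedHashMap<>();
        map.put("httpRequests", httpRequests);
        map.put("httpBytes", httpBytes);
        map.put("cpuTime", cpuTimeMillis);
        map.put("memoryPeak", memoryPeakBytes);
        return Collections.unmodifiableMap(map);
    }

    // Hand-rolled JSON is enough here because every value is a plain number.
    String toJson() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Long> e : toMap().entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        UsageSnapshot usage = new UsageSnapshot(1250, 47_185_920L, 12_500, 268_435_456L);
        System.out.println(usage.toJson());
        System.out.println(usage.getMemoryPeakGB()); // prints 0.25
    }
}
```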

ResourceType Enum

Package: org.webharvest.runtime.tracking

ResourceType.java
public enum ResourceType {
    HTTP_REQUEST("http.request", "count"),
    HTTP_BYTES("http.bytes", "bytes"),
    CPU_TIME("cpu.time", "milliseconds"),
    MEMORY_PEAK("memory.peak", "bytes");
    
    String getMetricName();
    String getUnit();
}

Usage Examples

Tracking, quotas, and billing scenarios

1. Basic Token Tracking

TokenUsage.java
// Execute with token tracking
ScraperSession session = service.executeAsync(config, client).get();
session.awaitCompletion();

// Get token usage
TokenTracker tracker = session.getTokenTracker();
TokenUsage usage = tracker.getUsage();

System.out.println("HTTP Requests: " + usage.getHttpRequests());
System.out.println("Bandwidth: " + usage.getHttpBytesGB() + " GB");
System.out.println("CPU Time: " + usage.getCpuTimeHours() + " hours");
System.out.println("Memory Peak: " + usage.getMemoryPeakGB() + " GB");

// Export to JSON
String json = usage.toJson();
// {"httpRequests":1250,"httpBytes":47185920,"cpuTime":12500,"memoryPeak":268435456}

2. Quota Enforcement (Pre-Execution)

QuotaPolicy.java
// Define quota policy
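public class QuotaPolicy {
    private final long maxHttpRequests;
    private final long maxHttpBytes;

    public QuotaPolicy(long maxHttpRequests, long maxHttpBytes) {
        this.maxHttpRequests = maxHttpRequests;
        this.maxHttpBytes = maxHttpBytes;
    }

    // Returns false once the client's accumulated usage reaches either limit,
    // so callers can branch on the result instead of catching an exception.
    // `registry` is the session registry used throughout these examples.
    public boolean isWithinQuota(ClientContext client, Config config) {
        // Snapshot each historical session's usage once
        List<TokenUsage> history =
            registry.getSessionsByClientId(client.getClientId()).stream()
                .map(ScraperSession::getTokenTracker)
                .filter(Objects::nonNull)
                .map(TokenTracker::getUsage)
                .collect(Collectors.toList());

        long totalRequests = history.stream()
            .mapToLong(TokenUsage::getHttpRequests)
            .sum();
        long totalBytes = history.stream()
            .mapToLong(TokenUsage::getHttpBytes)
            .sum();

        // Check both quotas
        return totalRequests < maxHttpRequests
            && totalBytes < maxHttpBytes;
    }
}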

// Use before execution
if (quotaPolicy.isWithinQuota(client, config)) {
    service.executeAsync(config, client);
} else {
    System.err.println("Quota exceeded - upgrade plan or wait");
}

3. Billing Integration (Stripe Example)

StripeIntegration.java
// Listen for completed sessions
@Subscribe
public void onSessionCompleted(SessionCompletedEvent event) {
    ScraperSession session = registry.getSession(event.getSessionId()).get();
    TokenUsage usage = session.getTokenTracker().getUsage();
    
    // Memory is priced per GB-hour, so scale the peak by the session duration.
    // This treats the peak as held for the whole session (an upper bound).
    double hours = session.getMetrics().getDuration().toMillis() / 3_600_000.0;
    
    // Calculate cost
    double cost = 0.0;
    cost += usage.getHttpRequests() * 0.001;          // $0.001 per request
    cost += usage.getHttpBytesGB() * 0.10;            // $0.10 per GB
    cost += usage.getCpuTimeHours() * 0.02;           // $0.02 per CPU-hour
    cost += usage.getMemoryPeakGB() * hours * 0.01;   // $0.01 per GB-hour
    
    // Create Stripe invoice line item.
    // Assumes the client ID doubles as the Stripe customer ID.
    InvoiceItemCreateParams params = InvoiceItemCreateParams.builder()
        .setCustomer(session.getClientId())
        .setAmount(Math.round(cost * 100)) // Convert to cents, rounding to nearest
        .setCurrency("usd")
        .setDescription(String.format(
            "WebHarvest Session %s: %d requests, %.2f GB",
            session.getSessionId(),
            usage.getHttpRequests(),
            usage.getHttpBytesGB()))
        .putMetadata("sessionId", session.getSessionId())
        .putMetadata("httpRequests", String.valueOf(usage.getHttpRequests()))
        .putMetadata("httpBytes", String.valueOf(usage.getHttpBytes()))
        .build();
    
    try {
        // InvoiceItem.create throws the checked StripeException
        InvoiceItem.create(params);
        System.out.printf("Billed client %s: $%.4f%n", 
            session.getClientId(), cost);
    } catch (StripeException e) {
        System.err.printf("Billing failed for session %s: %s%n",
            session.getSessionId(), e.getMessage());
    }
}

4. Multi-Tenant Usage Aggregation

UsageReporter.java
// Generate monthly usage report
public class UsageReporter {
    
    public Map<String, TokenUsage> getMonthlyUsageByClient(
            YearMonth month) {
        
        Map<String, List<ScraperSession>> sessionsByClient = 
            registry.getAllSessions().stream()
                .filter(s -> isInMonth(s, month))
                .collect(Collectors.groupingBy(
                    ScraperSession::getClientId));
        
        Map<String, TokenUsage> aggregated = new HashMap<>();
        
        for (Map.Entry<String, List<ScraperSession>> entry : 
                sessionsByClient.entrySet()) {
            
            String clientId = entry.getKey();
            List<ScraperSession> sessions = entry.getValue();
            
            // Sum all tokens for this client
            long httpRequests = 0;
            long httpBytes = 0;
            long cpuTime = 0;
            long memoryPeak = 0;
            
            for (ScraperSession session : sessions) {
                TokenUsage usage = session.getTokenTracker().getUsage();
                httpRequests += usage.getHttpRequests();
                httpBytes += usage.getHttpBytes();
                cpuTime += usage.getCpuTimeMillis();
                memoryPeak = Math.max(memoryPeak, usage.getMemoryPeakBytes());
            }
            
            TokenUsage total = new TokenUsage(
                httpRequests, httpBytes, cpuTime, memoryPeak);
            aggregated.put(clientId, total);
        }
        
        return aggregated;
    }
}

// Generate report
Map<String, TokenUsage> usage = reporter.getMonthlyUsageByClient(
    YearMonth.of(2025, 10));

usage.forEach((client, tokens) -> {
    System.out.printf("Client: %s%n", client);
    System.out.printf("  HTTP Requests: %,d%n", tokens.getHttpRequests());
    System.out.printf("  Bandwidth: %.2f GB%n", tokens.getHttpBytesGB());
    System.out.printf("  CPU Time: %.2f hours%n", tokens.getCpuTimeHours());
});

5. Real-Time Quota Monitoring

TokenMonitor.java
// Monitor token usage during execution
public class TokenMonitor {
    private final long quotaLimit = 10_000; // 10k requests
    
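    public void monitorSession(ScraperSession session) {
        Thread monitor = new Thread(() -> {
            boolean warned = false;
            try {
                while (!session.getStatus().isTerminal()) {
                    TokenTracker tracker = session.getTokenTracker();
                    long requests = tracker.getTokenCount(
                        ResourceType.HTTP_REQUEST);
                    
                    // Hard limit
                    if (requests >= quotaLimit) {
                        System.err.println(
                            "QUOTA EXCEEDED - Cancelling session");
                        session.cancel();
                        break;
                    }
                    
                    // Warn once when approaching the limit
                    if (!warned && requests > quotaLimit * 0.8) {
                        System.err.printf(
                            "WARNING: 80%% quota used (%d/%d)%n",
                            requests, quotaLimit);
                        warned = true;
                    }
                    
                    Thread.sleep(1000); // Poll once per second
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop monitoring
                Thread.currentThread().interrupt();
            }
        });
        monitor.setDaemon(true); // Do not keep the JVM alive for monitoring
        monitor.start();
    }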
}

6. Export to Analytics Platform

MetricsExporter.java
// Export to Prometheus metrics
@Subscribe
public void onSessionCompleted(SessionCompletedEvent event) {
    ScraperSession session = registry.getSession(event.getSessionId()).get();
    TokenUsage usage = session.getTokenTracker().getUsage();
    String clientId = session.getClientId();
    
    // Prometheus counters
    httpRequestsTotal.labels(clientId).inc(usage.getHttpRequests());
    httpBytesTotal.labels(clientId).inc(usage.getHttpBytes());
    cpuMillisecondsTotal.labels(clientId).inc(usage.getCpuTimeMillis());
    
    // Prometheus gauges
    memoryPeakBytes.labels(clientId).set(usage.getMemoryPeakBytes());
    sessionDurationSeconds.labels(clientId).set(
        session.getMetrics().getDuration().getSeconds());
}

// Export to DataDog
// Assumes `statsd` is a DogStatsD StatsDClient, which takes tags
// as "key:value" strings rather than a Map
String[] tags = {
    "client_id:" + session.getClientId(),
    "session_id:" + session.getSessionId(),
    "status:" + session.getStatus().name()
};

statsd.count("webharvest.http.requests", usage.getHttpRequests(), tags);
statsd.gauge("webharvest.http.bytes", usage.getHttpBytes(), tags);
statsd.histogram("webharvest.cpu.time", usage.getCpuTimeMillis(), tags);

Billing Scenarios

Real-world pricing models and implementations

Architecture Example

Note: This section demonstrates how WebHarvest's token tracking can support billing and quota enforcement. The code illustrates integration patterns that can be adapted for cloud platforms. This is not a commercial service offering: WebHarvest is an open-source project, and the tiers and prices below are examples of pricing models you could build when deploying WebHarvest as a hosted service.

WebHarvest Cloud Pricing Tiers

FREE TIER

$0/month
100 HTTP requests/day
100 MB bandwidth/day
10 min CPU time/day
5 concurrent sessions
Email support

STARTER

$29/month
10,000 HTTP requests/month
10 GB bandwidth/month
5 hours CPU time/month
20 concurrent sessions
Overage: $0.001/request
Email support

PROFESSIONAL (Popular)

$99/month
50,000 HTTP requests/month
50 GB bandwidth/month
20 hours CPU time/month
50 concurrent sessions
Overage: $0.0008/request
Priority queue • SLA 99.5%
Priority email + chat

ENTERPRISE

$499/month
250,000 HTTP requests/month
500 GB bandwidth/month
100 hours CPU time/month
Unlimited sessions
Overage: $0.0005/request
Dedicated instances • SLA 99.9%
24/7 phone + engineer

USAGE-BASED (Pay-as-you-go)

HTTP Requests: $0.001 each
Bandwidth: $0.10 per GB
CPU Time: $0.02 per CPU-hour
Memory: $0.01 per GB-hour
Minimum: $10/month
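
The tier arithmetic can be sketched in a few lines. `TierBilling` and `Plan` below are hypothetical names, the figures are copied from the tiers above, and only request overage is modelled because the tiers list no overage rate for bandwidth or CPU time. Using `double` for money is a simplification for illustration; a real billing system would more likely use integer cents or `BigDecimal`.

```java
// Hypothetical sketch of the tier pricing above; not part of WebHarvest.
class TierBilling {
    static final class Plan {
        final String name;
        final double monthlyFee;         // base subscription in USD
        final long includedRequests;     // HTTP requests covered by the fee
        final double overagePerRequest;  // USD per request beyond the allowance

        Plan(String name, double monthlyFee,
             long includedRequests, double overagePerRequest) {
            this.name = name;
            this.monthlyFee = monthlyFee;
            this.includedRequests = includedRequests;
            this.overagePerRequest = overagePerRequest;
        }
    }

    static final Plan STARTER = new Plan("STARTER", 29.0, 10_000, 0.001);
    static final Plan PROFESSIONAL = new Plan("PROFESSIONAL", 99.0, 50_000, 0.0008);
    static final Plan ENTERPRISE = new Plan("ENTERPRISE", 499.0, 250_000, 0.0005);

    // Base fee plus overage for requests beyond the monthly allowance.
    static double monthlyCharge(Plan plan, long requestsUsed) {
        long overage = Math.max(0L, requestsUsed - plan.includedRequests);
        return plan.monthlyFee + overage * plan.overagePerRequest;
    }

    // Pay-as-you-go (requests only): usage cost, but never below the $10 minimum.
    static double payAsYouGoCharge(long requestsUsed) {
        return Math.max(10.0, requestsUsed * 0.001);
    }

    public static void main(String[] args) {
        // 12,500 requests on STARTER: 2,500 over the allowance
        System.out.println(monthlyCharge(STARTER, 12_500));
        // 4,000 requests on pay-as-you-go would cost $4, so the minimum applies
        System.out.println(payAsYouGoCharge(4_000));
    }
}
```

At 12,500 requests the Starter plan comes to about $31.50, while the same volume on pay-as-you-go would be about $12.50, so the subscription tiers only pay off when the bandwidth, CPU, and concurrency allowances matter.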

AWS Marketplace Integration

Bill through AWS Marketplace with automatic metering:

  • Report usage hourly via AWS Metering API
  • Dimensions: HttpRequests, Bandwidth, CPUTime
  • Customer billed on AWS invoice
  • WebHarvest receives revenue share
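
Hourly reporting implies accumulating usage per dimension between reports and clearing it once sent. The sketch below shows only that bookkeeping; `HourlyMeter` is a hypothetical class, the units behind each dimension are left to the caller, and the actual submission to AWS is deliberately not shown. Keying each report by the start of its hour is an assumption of this sketch.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical accumulator for hourly metering; it does not call any AWS API.
class HourlyMeter {
    private final Map<String, Long> pending = new LinkedHashMap<>();

    // Adds a quantity to a dimension such as "HttpRequests" or "Bandwidth".
    synchronized void record(String dimension, long quantity) {
        pending.merge(dimension, quantity, Long::sum);
    }

    // Returns everything accumulated so far and clears it,
    // so each unit of usage is reported at most once.
    synchronized Map<String, Long> drain() {
        Map<String, Long> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }

    // Assumption: a report is keyed by the start of the hour it covers.
    static Instant hourBucket(Instant timestamp) {
        return timestamp.truncatedTo(ChronoUnit.HOURS);
    }

    public static void main(String[] args) {
        HourlyMeter meter = new HourlyMeter();
        meter.record("HttpRequests", 100);
        meter.record("HttpRequests", 50);
        meter.record("Bandwidth", 3);

        Instant bucket = hourBucket(Instant.parse("2025-10-03T14:37:12Z"));
        // A scheduled job would hand this batch to the metering client here.
        System.out.println(bucket + " " + meter.drain());
    }
}
```

In a deployment, a scheduler would call `drain()` once per hour and submit one record per dimension. If a submission fails, the batch would need to be re-queued rather than dropped, which this sketch does not handle.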

Stripe Subscription

Monthly subscriptions with usage-based billing:

  • Base subscription: $29-$499/month
  • Overage charges: Metered billing API
  • Invoice line items per session
  • Customer portal for usage visibility

Internal Cost Allocation

Track costs per project/department:

  • ClientContext metadata: project, department, cost-center
  • Aggregate usage by metadata tags
  • Generate chargeback reports
  • Export to ERP/accounting systems
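
Aggregating by metadata tag is a group-and-sum over finished sessions. The sketch below assumes each session has already been reduced to a cost and a department label; `ChargebackReport` and `SessionCost` are hypothetical names, and how the label is read from ClientContext metadata is not shown because that accessor is not documented here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical chargeback aggregation; not part of WebHarvest.
class ChargebackReport {
    // One row per finished session, already priced and tagged.
    static final class SessionCost {
        final String department;
        final String project;
        final double cost; // USD

        SessionCost(String department, String project, double cost) {
            this.department = department;
            this.project = project;
            this.cost = cost;
        }
    }

    // Sums session costs per department; TreeMap keeps the report sorted by name.
    static Map<String, Double> totalByDepartment(List<SessionCost> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            row -> row.department,
            TreeMap::new,
            Collectors.summingDouble(row -> row.cost)));
    }

    public static void main(String[] args) {
        List<SessionCost> rows = List.of(
            new SessionCost("data", "catalog-scrape", 1.5),
            new SessionCost("data", "lake-ingest", 2.0),
            new SessionCost("ops", "uptime-checks", 0.25));

        totalByDepartment(rows).forEach((department, total) ->
            System.out.println(department + ": " + total));
    }
}
```

Grouping by `project` or a cost-center field would follow the same pattern with a different key function.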

Integration Examples

Connect token tracking to external systems

CloudWatch Logs (AWS)

TokenUsage.java
@Subscribe
public void onSessionCompleted(SessionCompletedEvent e) {
    ScraperSession s = registry.getSession(e.getSessionId()).get();
    TokenUsage u = s.getTokenTracker().getUsage();
    
    Map<String, Object> log = new HashMap<>();
    log.put("sessionId", s.getSessionId());
    log.put("clientId", s.getClientId());
    log.put("httpRequests", u.getHttpRequests());
    log.put("httpBytes", u.getHttpBytes());
    log.put("cpuTime", u.getCpuTimeMillis());
    log.put("duration", s.getMetrics().getDuration().toString());
    
    cloudwatchLogs.putLogEvent("webharvest-sessions", 
        new Gson().toJson(log));
}

Prometheus Metrics

TokenUsage.java
// Define metrics
Counter httpRequests = Counter.build()
    .name("webharvest_http_requests_total")
    .help("Total HTTP requests")
    .labelNames("client_id")
    .register();

Histogram cpuTime = Histogram.build()
    .name("webharvest_cpu_seconds")
    .help("CPU time in seconds")
    .labelNames("client_id")
    .register();

// Update on completion
@Subscribe
public void onCompleted(SessionCompletedEvent e) {
    ScraperSession s = ...;
    TokenUsage u = ...;
    
    httpRequests.labels(s.getClientId())
        .inc(u.getHttpRequests());
    cpuTime.labels(s.getClientId())
        .observe(u.getCpuTimeMillis() / 1000.0);
}

Custom Webhook

TokenUsage.java
@Subscribe
public void onSessionCompleted(SessionCompletedEvent e) {
    ScraperSession s = ...;
    TokenUsage u = ...;
    
    // Build webhook payload
    Map<String, Object> payload = Map.of(
        "sessionId", s.getSessionId(),
        "clientId", s.getClientId(),
        "status", s.getStatus().name(),
        "usage", u.toMap(),
        "duration", s.getMetrics().getDuration().toString()
    );
    
    // POST to webhook URL
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.example.com/webhooks/usage"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(
            new Gson().toJson(payload)))
        .build();
    
    httpClient.sendAsync(request, 
        HttpResponse.BodyHandlers.ofString());
}