Complete guide to debugging WebHarvest configurations
Learn 5 powerful debugging patterns to inspect responses, variables, and execution flow when scraping complex websites.
Common debugging challenges in web scraping
"I am developing a scraping tool for a complex javascript website... frequently, I am getting the wrong response. However, it is not easy to spot such instances. Debugging a configuration is really complicated without having a possibility to look at what WebHarvest is actually processing."
The simplest and most effective debugging technique
Save every response to a file so you can inspect it in your browser or text editor. This lets you see exactly what HTML WebHarvest is processing.
<!-- Fetch page -->
<def var="response">
<http url="https://complex-website.com/api/data">
<http-header name="X-API-Key" value="${apiKey}"/>
</http>
</def>
<!-- 🐛 DEBUG: Save response to file -->
<file path="debug/response-${sys.timestamp()}.html" action="write">
${response}
</file>
<!-- Now process it -->
<def var="data">
<xpath expression="//div[@class='result']">
<html-to-xml>${response}</html-to-xml>
</xpath>
</def>
Tip: Use ${sys.timestamp()} or ${index} to create unique filenames for each request. This prevents files from overwriting each other and lets you see the sequence of requests.
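For example, when fetching many pages in a loop, ${index} gives each iteration its own file. A sketch in the same simplified syntax as the examples in this guide (urlList is an illustrative variable, not part of any real configuration):

```xml
<!-- 🐛 DEBUG: one response file per loop iteration -->
<!-- Sketch only — assumes a list variable named urlList -->
<loop item="pageUrl" index="i">
    <list>${urlList}</list>
    <body>
        <def var="response">
            <http url="${pageUrl}"/>
        </def>
        <!-- Writes debug/response-1.html, debug/response-2.html, ... -->
        <file path="debug/response-${i}.html" action="write">${response}</file>
    </body>
</loop>
```

Numbered files also make it easy to correlate a bad result with the exact request that produced it.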
See what's actually in your variables
Use <log> to print variable values at key points in your scraper.
<!-- Extract data -->
<def var="productName">
<xpath expression="//h1[@class='product-title']/text()">${page}</xpath>
</def>
<!-- 🐛 DEBUG: Print variable -->
<log message="Product name: ${productName}"/>
<!-- Continue processing... -->
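A small refinement worth adopting: wrap the value in literal delimiters so that leading/trailing whitespace and empty strings become visible in the log output (the brackets are ordinary characters, not special syntax):

```xml
<!-- 🐛 DEBUG: brackets expose hidden whitespace and empty values -->
<log message="Product name: [${productName}]"/>
<!-- e.g. logs: Product name: [ Acme Widget ] — the stray spaces are now obvious -->
```

An empty extraction then shows up unmistakably as `Product name: []`.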
Save output at each transformation step
For complex pipelines (fetch → parse → transform → extract), save output after each step.
<!-- Step 1: Fetch -->
<def var="raw">
<http url="https://example.com"/>
</def>
<file path="debug/1-raw.html" action="write">${raw}</file>
<!-- Step 2: Parse HTML to XML -->
<def var="xml">
<html-to-xml>${raw}</html-to-xml>
</def>
<file path="debug/2-xml.xml" action="write">${xml}</file>
<!-- Step 3: Extract data -->
<def var="data">
<xpath expression="//div[@class='result']">${xml}</xpath>
</def>
<file path="debug/3-data.txt" action="write">${data}</file>
<!-- Step 4: Transform -->
<def var="clean">
<regexp pattern="&lt;[^&gt;]+&gt;" replace="">${data}</regexp>
</def>
<file path="debug/4-clean.txt" action="write">${clean}</file>
Log only when something unexpected happens
Use <if> to log only when conditions fail, reducing noise.
<!-- Extract product price -->
<def var="price">
<xpath expression="//span[@class='price']/text()">${page}</xpath>
</def>
<!-- 🐛 DEBUG: Log if price is empty or invalid -->
<if condition="${empty(price) or price == ''}">
<log level="ERROR" message="⚠️ Price extraction failed!"/>
<file path="debug/failed-page.html" action="write">${page}</file>
</if>
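You can take the same pattern further and capture everything needed to reproduce the failure later: a timestamped copy of the page plus a log entry naming the URL. A sketch (currentUrl is a hypothetical variable you would set when fetching the page):

```xml
<!-- 🐛 DEBUG: capture full failure context (currentUrl is hypothetical) -->
<if condition="${empty(price)}">
    <log level="ERROR" message="⚠️ Price extraction failed for ${currentUrl}"/>
    <!-- Timestamped name so one failure never overwrites another -->
    <file path="debug/failed-page-${sys.timestamp()}.html" action="write">${page}</file>
</if>
```

With the URL in the log and the page on disk, you can re-run the failing XPath against the saved file instead of re-scraping.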
Toggle debugging on/off with a single variable
Create a DEBUG_MODE variable to control all debugging output.
<!-- Set debug mode (true/false) -->
<def var="DEBUG_MODE">true</def>
<!-- Your scraper logic -->
<def var="data">
<http url="https://example.com"/>
</def>
<!-- Conditional debug save -->
<if condition="${DEBUG_MODE}">
<file path="debug/response.html" action="write">${data}</file>
<log message="DEBUG: Saved response to debug/response.html"/>
</if>
Tip: Set DEBUG_MODE to true during development and to false in production.
Essential steps when debugging complex websites
Always save HTTP responses to files using <file path="debug/...">
Use <log> to print variable values at key transformation points
Save output after each step in complex transformation pipelines
Use DEBUG_MODE variable to control all debugging output
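Putting it together, a minimal scraper skeleton combining the patterns above might look like this (a sketch in the same simplified syntax used throughout this guide; the URL and XPath are placeholders):

```xml
<!-- Debug toggle -->
<def var="DEBUG_MODE">true</def>

<!-- Step 1: Fetch -->
<def var="raw">
    <http url="https://example.com/product"/>
</def>
<if condition="${DEBUG_MODE}">
    <file path="debug/1-raw-${sys.timestamp()}.html" action="write">${raw}</file>
</if>

<!-- Step 2: Parse and extract -->
<def var="xml">
    <html-to-xml>${raw}</html-to-xml>
</def>
<def var="price">
    <xpath expression="//span[@class='price']/text()">${xml}</xpath>
</def>

<!-- Step 3: Log the value, and keep the evidence when extraction fails -->
<log message="Price: [${price}]"/>
<if condition="${empty(price)}">
    <log level="ERROR" message="⚠️ Price extraction failed!"/>
    <file path="debug/failed-${sys.timestamp()}.html" action="write">${raw}</file>
</if>
```

Flip DEBUG_MODE to false before deploying and only the error path continues to write files.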