Best practices for respectful web scraping
Learn how to control scraping speed, avoid overloading servers, and implement responsible scraping patterns.
Protecting servers and ensuring reliable scraping
Too many requests can crash small websites or trigger rate limits
Aggressive scraping can get your IP address blocked permanently
Good scraping citizens don't abuse free services
Slower scraping means fewer errors and more successful extractions
Add pauses between requests
Use <sleep> to pause execution for a specified time (in milliseconds) between requests.
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
  <loop item="url">
    <list>${urlList}</list>
    <body>
      <!-- Fetch page -->
      <http url="${url}"/>
      <!-- Wait 2 seconds before the next request -->
      <sleep time="2000"/>
    </body>
  </loop>
</config>
Vary delay times to appear more human
Random delays (1–3 seconds) look less like an automated client than fixed intervals do.
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
  <loop item="url">
    <list>${urlList}</list>
    <body>
      <http url="${url}"/>
      <!-- Random delay between 1 and 3 seconds; cast to int so <sleep> gets whole milliseconds -->
      <def var="randomDelay">
        <script><![CDATA[ (int) (1000 + Math.random() * 2000) ]]></script>
      </def>
      <sleep time="${randomDelay}"/>
    </body>
  </loop>
</config>
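Beyond per-request delays, some scrapers also add a longer "cool-down" pause after every batch of pages, which further reduces sustained load on the target server. The sketch below illustrates the idea; the loop's index attribute and the condition expression are assumptions about your Web-Harvest setup and scripting language, so verify them against your version before use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
  <loop item="url" index="i">
    <list>${urlList}</list>
    <body>
      <http url="${url}"/>
      <!-- Short pause after every page -->
      <sleep time="1500"/>
      <!-- Hypothetical: take a 30-second break after every 25th page -->
      <case>
        <if condition="${i.toInt() % 25 == 0}">
          <sleep time="30000"/>
        </if>
      </case>
    </body>
  </loop>
</config>
```

The batch size (25) and break length (30 seconds) are illustrative; tune both to the size and capacity of the site you are scraping.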