CSS Selectors in XPath

Modern HTML querying with familiar CSS syntax

WebHarvest 2.2 supports CSS selectors as a simpler alternative to XPath for querying HTML, powered by the jsoup library.

What's New

CSS selector support in the XPath plugin

W3C Standard

Uses standard CSS selector syntax, not proprietary extensions

Simpler Syntax

a.link vs //a[@class='link'] (the CSS form also matches elements that carry additional classes)

Familiar

Same syntax as jQuery and CSS, so web developers already know it

Backward Compatible

XPath still works! CSS selectors are an optional alternative.

Basic Usage

How to use CSS selectors in WebHarvest

css-selector-basic.xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- These examples assume ${html} already holds the HTML to query -->

    <!-- CSS Selector Mode: type="css" -->
    <xpath type="css" expression="a.product-link">
        ${html}
    </xpath>

    <!-- Extract text from all matching elements -->
    <xpath type="css" expression="h1.title">
        ${html}
    </xpath>

    <!-- Get attribute value -->
    <xpath type="css" expression="img.product" attribute="src">
        ${html}
    </xpath>
</config>
💡 Key Point: Add the type="css" attribute to the xpath element to use CSS selectors instead of XPath.
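
To make the switch concrete, here is a minimal side-by-side sketch of the two modes. It assumes that leaving out the type attribute keeps the existing XPath behaviour and that ${html} already holds content both modes can parse. The file name is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- CSS mode: matches <a> elements whose class list includes "link" -->
    <xpath type="css" expression="a.link">
        ${html}
    </xpath>

    <!-- XPath mode (assumed default when type is omitted):
         matches <a> elements whose class attribute is exactly "link" -->
    <xpath expression="//a[@class='link']">
        ${html}
    </xpath>
</config>
```

The two queries are close but not identical: an element with class="link external" is matched by the CSS selector and not by this XPath expression.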

Common Patterns

CSS selector examples for typical scraping tasks

By Class

selector-by-class.xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- Any element with class "product-title" -->
    <xpath type="css" expression=".product-title">${html}</xpath>
    <!-- Only <div> elements with class "result" -->
    <xpath type="css" expression="div.result">${html}</xpath>
</config>
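
Class selectors can also be chained or combined with a descendant selector. The sketch below uses only standard CSS syntax; the class name "featured" and the file name are hypothetical, and ${html} is assumed to hold the page content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- <div> elements that carry both classes "result" and "featured" -->
    <xpath type="css" expression="div.result.featured">${html}</xpath>

    <!-- Elements with class "product-title" anywhere inside a div.result -->
    <xpath type="css" expression="div.result .product-title">${html}</xpath>
</config>
```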

By ID

selector-by-id.xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- The element whose id is "main-content" -->
    <xpath type="css" expression="#main-content">${html}</xpath>
</config>
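
An ID selector is often most useful as a starting point for narrowing a query to one region of the page. A short sketch, assuming the same ${html} variable and reusing the a.product-link class from the basic example. The file name is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- <p> elements that are direct children of #main-content
         (a raw ">" is legal inside an XML attribute value) -->
    <xpath type="css" expression="#main-content > p">${html}</xpath>

    <!-- Product links at any depth inside #main-content -->
    <xpath type="css" expression="#main-content a.product-link">${html}</xpath>
</config>
```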

By Attribute

selector-by-attribute.xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
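    <!-- <a> elements whose href contains the substring "product" -->
    <xpath type="css" expression="a[href*='product']">${html}</xpath>
    <!-- <input> elements whose type attribute equals "submit" -->
    <xpath type="css" expression="input[type='submit']">${html}</xpath>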
</config>
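
Besides substring (*=) and exact (=) matching, standard CSS also offers prefix (^=), suffix ($=), and presence tests. The sketch below combines one of them with the attribute option shown under Basic Usage. It assumes that option works with any selector; the ".pdf" and "https://" values and the file name are only illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://org.webharvest/schema/2.1/core">
    <!-- <a> elements whose href starts with "https://" -->
    <xpath type="css" expression="a[href^='https://']">${html}</xpath>

    <!-- <img> elements that have an alt attribute, whatever its value -->
    <xpath type="css" expression="img[alt]">${html}</xpath>

    <!-- href values of links ending in ".pdf"
         (assumes attribute="..." behaves as in the basic example) -->
    <xpath type="css" expression="a[href$='.pdf']" attribute="href">${html}</xpath>
</config>
```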

Related Resources