Plug-in mechanism enabled - users may develop their own processors and seamlessly
integrate them into Web-Harvest.
New processors developed:
database: perform select/insert/update/delete operations
against specified database (JDBC driver is required on classpath).
mail: send emails with optional attachments.
zip: create ZIP archives with specified content.
ftp: access FTP server and perform common operations: list, get, put, del, etc.
tokenize: split text into a list of elements.
json-to-xml: convert JSON formatted value to XML.
xml-to-json: convert XML to JSON formatted value.
file processor updated with action to list files with specified name filter.
http processor updated to support multipart forms (enabling file uploads).
delimiter attributes added to
empty attribute added to
in order to prevent the accumulation of large results that may exhaust memory. This is a replacement for the
empty processor inside the loop body.
Several new attributes added to
regexp processor to enable regular expression fine-tuning.
Complete access to http response headers.
Simple debugging added: user may define breakpoints where execution pauses and
runtime values can be seen.
Charset selection enabled in settings dialog for configuration files.
Editor auto-completion improvements - auto-completion is available for attribute
values wherever possible.
Editor improvements: copying lines/selection, deleting lines, (un)commenting xml fragments.
List of recently opened files added to File menu.
Dependency libraries updated:
HtmlCleaner updated to version 2.1.
Saxon updated to version 9 (XSLT 2.0, XQuery 1.0, XPath 2.0).
A number of new attributes supported in
A number of bug fixes.
Java 1.4 is no longer supported - JRE 1.5 or higher is required.
A graphical user interface is introduced, providing an environment for easier configuration
development and testing.
html-to-xml processor, which is based on HtmlCleaner, now exposes attributes
for controlling cleaner's behaviour.
Now it is possible to choose the favourite scripting engine or even mix them in a single
Web-Harvest configuration. This option is supported by adding new attributes to
Access to the HTTP client is supported by introducing the implicit context variable
Now it is possible to check important HTTP response values, like
or even to obtain instance of
http.client and manipulate it in the runtime.
cookie-policy added to the
specifying the way HttpClient manages cookies.
Command-line use is improved by adding several new parameters.
For more comfortable use of Web-Harvest context variables in the script engines'
runtime scopes, several handy methods are added to the class
previous versions of Web-Harvest).
Several useful methods added in implicit Web-Harvest context variable
sys.defineVariable(varName, varValue, [overwrite]).
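As a minimal sketch, the method could be called from a script processor body like this. The variable name and value are hypothetical, and the third argument is the optional overwrite flag from the signature above.

```xml
<config>
    <script><![CDATA[
        /* Hypothetical name and value. Passing true asks for an existing
           variable with the same name to be overwritten. */
        sys.defineVariable("pageTitle", "Untitled", true);
    ]]></script>
</config>
```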
overwrite added in the
giving the possibility to specify whether existing variables with the specified name
will be overwritten or not.
<exit condition=... message=.../> is introduced
in order to support conditional execution break.
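A minimal sketch of a conditional break follows. The URL, the variable name pageText and the message text are illustrative only, and the condition assumes that calling toString() on a Web-Harvest variable inside a template expression yields its text value.

```xml
<config>
    <!-- Download a page into a variable (placeholder URL). -->
    <var-def name="pageText">
        <http url="http://www.example.com/"/>
    </var-def>

    <!-- Break execution when the downloaded content is empty. -->
    <exit condition="${pageText.toString().length() == 0}" message="Empty page downloaded"/>
</config>
```

Execution continues with the processors after exit when the condition does not hold.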
Encoding selection in
http processor is changed - if the charset attribute is not explicitly
specified, the charset given in the HTTP response is used
instead to read downloaded text content.
NTLM proxy authentication scheme is supported.
Performance improvements and bug fixes.
html-to-xml parser is changed - HtmlCleaner is used instead of TagSoup. The
downside is that some existing Web-Harvest configurations may need
corrections to their XPath or XQuery processors. On the other hand, many
previously existing problems are now solved.
Script processor is introduced. It adds scripting support based on
BeanShell scripting language.
Check the more detailed description in the User manual, and see an
example illustrating its power.
template processor is now also based on BeanShell instead of
OGNL, making it possible to share the same variables and methods
with script processing.
type is now added to
node(). It specifies type of external XQuery
parameter. Up to Web-Harvest 0.5 this parameter was implicitly declared
at the beginning of the XQuery expression and was always of
type. From now on, for each parameter defined
xq-param the matching explicit declaration inside
is required (
declare variable $var_name as var_type external;).
For more details see User manual and
the example showing the usage of XQuery
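The explicit declaration described above can be sketched as follows. The URL and the names page and doc are illustrative, and the xq-expression element is assumed here to be the place where the declaration goes. The type attribute on xq-param and the declared type in the expression have to agree.

```xml
<config>
    <!-- Download a page and clean it into XML (placeholder URL). -->
    <var-def name="page">
        <html-to-xml>
            <http url="http://www.example.com/"/>
        </html-to-xml>
    </var-def>

    <xquery>
        <!-- External parameter passed to the query, with its type stated. -->
        <xq-param name="doc" type="node()"><var name="page"/></xq-param>
        <xq-expression><![CDATA[
            declare variable $doc as node() external;
            for $link in $doc//a return string($link/@href)
        ]]></xq-expression>
    </xquery>
</config>
```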
A couple of new constructors are added to the class
allowing loading configuration from URL or from arbitrary input stream.
include processors now support both absolute
and relative paths. File paths are regarded as absolute if they begin with
\, where X is a letter.
In order to avoid ambiguity in exchanging values with
template processing, Web-Harvest variables are case-sensitive
from this version.