All Classes
Class |
Description |
AbstractConfigSource |
Base class for more specialized implementations of the ConfigSource
interface.
|
AbstractDatabasePlugin |
|
AbstractElementDef |
|
AbstractProcessor<TDef extends IElementDef> |
Base processor that contains common processor logic.
|
AbstractProcessorDecorator<TDef extends IElementDef> |
Abstract implementation of the Processor interface which realizes the
Decorator design pattern and overrides almost all of Processor's methods
(except the #run() method) in the default way.
|
AbstractRefreshableResolver |
Abstract ConfigurableResolver implementation serving as a base class
for definition resolvers supposed to support multiple refresh operations.
|
AbstractRegistry<K,V> |
Abstract class implementing Registry interface.
|
AlreadyBoundException |
Checked exception thrown if an object with the given name is already bound in
the registry.
|
AnnotatedPluginsPostProcessor |
|
AnnotatedPluginsPostProcessorBuilder |
Builder for creating AnnotatedPluginsPostProcessor instances.
|
Assert |
Created by IntelliJ IDEA.
|
AttributeHolder |
Implementors of this interface can serve as the backing store for
objects that are scoped within a (subclass of) AttributeHolderScope.
|
AttributeHolderScope<AHT extends AttributeHolder> |
A Scope that uses an AttributeHolder as the backing store for
its scoped objects.
|
AuthSSLInitializationError |
|
AuthSSLProtocolSocketFactory |
AuthSSLProtocolSocketFactory can be used to validate the identity of the HTTPS
server against a list of trusted certificates and to authenticate to the HTTPS
server using a private key.
|
AuthSSLX509TrustManager |
AuthSSLX509TrustManager can be used to extend the default X509TrustManager
with additional trust decisions.
|
Autoscanned |
Indicates that the annotated plugin should be automatically registered in
the system.
|
BaseException |
Basic Scraper exception.
|
BaseTemplater |
Simple templater - replaces ${expression} sequences in string with evaluated expressions.
|
BodyProcessor |
Processor which executes only body and returns variables list.
|
BodyProcessor.Builder |
|
BufferConfigSource |
Implementation of ConfigSource that uses a plain string as the source
of configuration.
|
Cache<K,V> |
Cache containing semi-persistent, key-value mappings.
|
CallDef |
Definition of function call.
|
CallParamDef |
Definition of function call parameter.
|
CallParamProcessor |
Variable definition http param processor.
|
CallProcessor |
Function call processor.
|
CallProcessor10 |
Function call processor.
|
CaseDef |
Definition of case processor.
|
CaseProcessor |
Conditional processor.
|
CatchProcessor |
|
ClassLoaderUtil |
Class loading utility - used for loading JDBC driver classes and plugin
classes.
|
ClassPathScanner |
Component capable of scanning types available on Java class path that meet
certain criteria.
|
ClassPathScannerImpl |
|
CommonUtil |
Basic evaluation utilities
|
CommonUtil.IntPair |
Contains pair of integer values
|
Config |
Represents a lazily loaded configuration object that is loaded from the associated
ConfigSource on demand with a call to the Config.reload() method.
|
Config.Version |
|
ConfigDef |
Web Harvest's definition of the <config> element.
|
ConfigFactory |
Represents simple Config factory.
|
ConfigLocationVisitor |
Represents an object implementing the Visitor pattern.
|
ConfigLocationVisitor.VisitableLocation |
|
ConfigParser |
Created by rba on 16.07.2017.
|
ConfigProcessor |
|
ConfigSource |
Represents source of XML configuration object.
|
ConfigSource.Location |
Just a marker interface to indicate the actual type of location, e.g. file,
URL, or any other.
|
ConfigSourceFactory |
Represents a factory object capable of instantiating ConfigSource
objects from a URL, a File, or raw XML configuration.
|
ConfigurableResolver |
Interface to be implemented by the web harvest configuration elements
definition resolvers.
|
ConfigurationException |
Configuration exception - occurs while parsing a configuration file etc.
|
ConnectionFactory |
Factory responsible for creating SQL Connections based on the
provided parameters.
|
ConnectionProxy |
|
ConstantDef |
Definition of constant processor.
|
ConstantProcessor |
Constant processor.
|
ContextFactory |
|
ContextHolder |
|
DatabaseException |
Database execution exception - occurred while executing the database processor.
|
DatabaseJNDIPlugin |
Web Harvest plugin supporting database operations within a JNDI-enabled
environment.
|
DatabasePlugin |
Support for database operations.
|
DbColumnDescription |
Information about database record columns.
|
DbParamPlugin |
DB param plugin - can be used only inside database plugin.
|
DbRowVariable |
Special variable used for database records
|
DebugFileLogger |
|
DefaultDriverManager |
Default implementation of the DriverManager interface.
|
DefaultHandlerHolder |
|
DefaultHarvest |
Default implementation of Harvest interface.
|
DefaultPluginRegistrationStrategy |
Default implementation of PluginRegistrationStrategy.
|
DefaultProcessorExecutionStrategy |
Default implementation of processor execution strategy.
|
DefaultProcessorFactory |
Created by rbala on 13.07.2017.
|
Definition |
Annotation on Processor classes
specifying the IElementDef this particular
processor is described by.
|
DefinitionResolver |
Class contains information and logic to validate and create definition classes for
parsed XML nodes from Web-Harvest configurations.
|
DefinitionResolverProvider |
Created by rba on 22.07.2017.
|
DefVarPlugin |
Support for database operations.
|
DriverManager |
DriverManager allows registering database drivers placed in
arbitrary locations.
|
DynamicScopeContext |
Created by IntelliJ IDEA.
|
EasySSLProtocolSocketFactory |
EasySSLProtocolSocketFactory can be used to create SSL Sockets
that accept self-signed certificates.
|
EasyX509TrustManager |
EasyX509TrustManager, unlike the default X509TrustManager, accepts
self-signed certificates.
|
ElementInfo |
|
ElementInfoFactory |
Deprecated.
|
ElementInfoPluginRegistrationStrategy |
ElementInfo-based implementation of PluginRegistrationStrategy.
|
ElementName |
Name of a single element (tag in configuration XML).
|
ElementsRegistry |
Registry containing all valid Web Harvest definition elements.
|
ElementsRegistryImpl |
|
ElseProcessor |
|
EmptyDef |
Definition of empty element.
|
EmptyProcessor |
Empty processor = executes body and returns empty variable.
|
EmptyVariable |
Empty variable.
|
EnvironmentException |
Thrown when current JVM does not support a specific required feature.
|
ErrMsg |
Error messages utility
|
EventBasedStatusHolder |
StatusHolder implementation which is based on Scraper's events.
|
EventBusTypeListener |
Guice TypeListener implementation that is responsible for
registration of objects managed by Guice in the singleton EventBus.
|
EventHandler<T> |
Represents an object that is handler for particular type of event.
|
EventSink |
Dispatches events to registered listeners.
|
ExitDef |
Definition of exit processor.
|
ExitProcessor |
Exit processor.
|
FileConfigSource |
Implementation of ConfigSource that uses a file system as
source of XML configurations.
|
FileDef |
Definition of file processor.
|
FileException |
File management exception.
|
FileListIterator |
|
FileProcessor |
File processor.
|
FtpDelPlugin |
Ftp Del plugin - can be used only inside ftp plugin for deleting a file in a remote directory.
|
FtpGetPlugin |
Ftp Get plugin - can be used only inside ftp plugin for retrieving a file from a remote directory.
|
FtpListPlugin |
Ftp List plugin - can be used only inside ftp plugin for listing files in the working remote directory.
|
FtpMkdirPlugin |
Ftp Mkdir plugin - can be used only inside ftp plugin for creating a directory in a remote directory.
|
FtpPlugin |
FTP processor
|
FtpPluginException |
Runtime exception for FtpPlugin
|
FtpPutPlugin |
Ftp Put plugin - can be used only inside ftp plugin for storing a file to a remote directory.
|
FtpRmdirPlugin |
Ftp Rmdir plugin - can be used only inside ftp plugin for removing a subdirectory of a remote directory.
|
FunctionDef |
Definition of user-defined function.
|
FunctionException |
Function processor exception.
|
FunctionProcessor |
Function definition processor.
|
GetVarPlugin |
|
HandlerHolder |
Represents an object that serves as storage for EventHandlers
(supporting different types of events).
|
Harvest |
Web-Harvest application facade that provides control over the creation of
scraping processors (Harvester) and the dispatching of scraping events.
|
Harvester |
Represents scraping session object that is associated with particular
configuration and can be executed multiple times.
|
Harvester.ContextInitCallback |
Context initialization callback that is invoked for all newly created
context objects shortly before the scraping session.
|
HarvesterEvent |
Represents an event object that is either addressed to a particular
Harvester instance or represents a state change that happened
on it.
|
HarvesterEventSink |
Implementation of EventSink intended to guarantee scraping scope's
events delivery.
|
HarvesterFactory |
Guice dynamic factory helper interface that helps to instantiate
Harvester objects.
|
HarvestLoadCallback |
Callback interface representing successfully loaded scraping configuration
that is projected as collection of IElementDef objects.
|
HasReader |
Represents an object holding a character stream Reader.
|
HtmlToXmlDef |
Definition of HTML to XML transformation task.
|
HtmlToXmlProcessor |
Advanced HTML to XML processor using Chain of Responsibility pattern
with Strategy pattern for different HTML parsing strategies.
|
HttpClientManager |
HTTP client functionality.
|
HttpClientManager.ProxySettings |
|
HttpClientManager.ProxySettings.Builder |
|
HttpDef |
Definition of HTTP processor.
|
HttpException |
Http exception - occurs during http requests.
|
HttpHeaderDef |
Definition of HTTP header.
|
HttpHeaderProcessor |
Variable definition http header processor.
|
HttpInfo |
Class offers access to HTTP client and response details to the user.
|
HttpModule |
Google Guice module containing bindings for Web-Harvest's HTTP-related
components.
|
HttpParamDef |
Definition of HTTP parameter.
|
HttpParamInfo |
Information about http request parameter.
|
HttpParamProcessor |
Variable definition http param processor.
|
HttpProcessor |
Http processor.
|
HttpResponseWrapper |
Class defines http server response.
|
IElementDef |
Marker for element definition.
|
IfDef |
Definition of conditional processor.
|
IfProcessor |
|
IncludeDef |
Definition of include element.
|
IncludeProcessor |
Include processor.
|
IncludeVisitor |
|
InjectorHelper |
Guice static injector helper.
|
JNDIConnectionFactory |
|
JsonToXmlPlugin |
Converter from JSON to XML
|
JSRScriptEngineAdapter |
Adapter design pattern implementation.
|
JSRScriptEngineFactory |
|
KeyValuePair<T> |
|
ListProcessor |
|
ListVariable |
List variable - String wrapper.
|
LockedRegistry<K,V> |
Generic locking registry implementation that follows 'decorator' design
pattern.
|
LoopDef |
Definition of loop processor.
|
LoopProcessor |
Loop list processor.
|
MailAttachPlugin |
Mail attachment plugin - can be used only inside mail plugin.
|
MailPlugin |
Mail sending processor.
|
MailPluginException |
Runtime exception for MailPlugin
|
NestedContextFactory |
|
NodeVariable |
Node variable - Single node wrapper.
|
ParserException |
General parsing exception.
|
PluginDef |
|
PluginDefinitionBuilder |
Builder for creating WebHarvestPluginDef instances.
|
PluginException |
Runtime exception occurred during plugin processors registration or creation.
|
PluginFactory |
Factory for creating plugin instances.
|
PluginRegistrationStrategy |
Strategy interface for plugin registration.
|
PostConstructListener |
TypeListener implementation enabling Guice support for JSR-250
@PostConstruct annotation.
|
Processor<TDef extends IElementDef> |
|
ProcessorExecutionContext |
Context object that holds execution state and configuration.
|
ProcessorExecutionStrategy |
Strategy interface for processor execution.
|
ProcessorFactory |
Created by rbala on 13.07.2017.
|
ProcessorReferenceGenerator |
Generator for processor reference documentation from source code annotations.
|
ProcessorStartEvent |
Event informing that the specified Processor has been started.
|
ProcessorStopEvent |
Event informing that the specified Processor has successfully
finished its work.
|
RealBodyProcessor |
|
RegexpDef |
Definition of regular expression processor.
|
RegexpPatternProcessor |
|
RegexpProcessor |
Regular expression replace processor.
|
RegexpResultProcessor |
|
RegexpSourceProcessor |
|
Registry<K,V> |
Generic registry interface following 'registry' design pattern.
|
ResolverPostProcessor |
Allows for custom modification of the web harvest configuration element
definition resolvers.
|
ResourcePathToURITransformer |
An implementation of the Transformer interface which supports
transformation from a given resource name (resource path as string) to its
URI.
|
ReturnDef |
Definition of function's return statement.
|
ReturnProcessor |
Function's return value processor.
|
RunningStatusController<TDef extends IElementDef> |
AbstractProcessorDecorator implementation which decorates the
Processor#run(Scraper, DynamicScopeContext) method so that it
enters the Monitor using a Monitor.Guard verifying that processing
is not paused.
|
RunningStatusGuard |
Implementation of Monitor.Guard verifying that current status of
configuration's processing is 'running'.
|
RuntimeConfig |
Facade for runtime objects needed for specific processors' execution.
|
SAXConfigParser |
Created by rba on 16.07.2017.
|
SchemaComponentFactory |
|
SchemaFactory |
Factory creating an instance of Schema, which is the basis of the XML
validation process.
|
SchemaFactoryImpl |
|
SchemaResolver |
Interface to be implemented by the web harvest XML schema sources resolvers.
|
SchemaResolverPostProcessor |
Allows for custom modification of the web harvest XML schema sources
resolvers.
|
SchemaResourcesPostProcessor<T> |
SchemaResolverPostProcessor implementation capable of transforming
specified XML schema resources, which could be e.g. paths to these resources
or some resource objects.
|
SchemaSource |
A POJO which contains an XML schema's Source.
|
ScopeAttributeHolder |
Represents an object that implements AttributeHolder and is intended
to serve as a container for Guice scope's beans.
|
Scraper |
Basic runtime class.
|
ScraperContext |
Context of scraper execution.
|
ScraperContext10 |
Deprecated.
|
ScraperExecutionContinuedEvent |
|
ScraperExecutionEndEvent |
Event informing that the execution of WebScraper has been
successfully completed.
|
ScraperExecutionErrorEvent |
Event informing that during the execution of Scraper some exception
has occurred.
|
ScraperExecutionExitEvent |
Event informing that the execution of configuration has exited.
|
ScraperExecutionPausedEvent |
|
ScraperExecutionStartEvent |
|
ScraperExecutionStoppedEvent |
Event informing that the execution of Harvester has been stopped.
|
ScraperModule |
Guice module for Web-Harvest configuration.
|
ScraperScope |
Scraping scope container.
|
ScraperState |
An enum containing all available Scraper's states.
|
ScraperXPathException |
XPath exception - occurred while executing the xpath processor.
|
ScraperXQueryException |
XQuery exception - occurred while executing the xquery processor.
|
Scraping |
Guice helper annotation used to indicate methods expected to be
invoked in exclusive scraping scope.
|
ScrapingAware |
Interface to be implemented by any object that wishes to be notified of
the scraping scope it may run in.
|
ScrapingAwareTypeListener |
Implementation of TypeListener that is responsible for registration
of detected ScrapingAware instances.
|
ScrapingHarvester |
Default implementation of Harvester interface aimed to perform data
extraction from remote websites.
|
ScrapingInterceptor |
Guice AOP interceptor responsible for taking action for method annotated with
Scraping annotation.
|
ScrapingInterceptor.ScrapingAwareHelper |
Guice aware helper class that maintains collection of registered
ScrapingAware listeners.
|
ScrapingScope |
Guice helper annotation used to indicate types to be instantiated and
kept in scraping scope.
|
ScriptDef |
Definition of script processor.
|
ScriptEngine |
Interface providing scripting functionality.
|
ScriptEngineException |
Script engine exception - thrown when there is a problem with a script engine itself, not a script source.
|
ScriptEngineFactory |
|
ScriptException |
Script execution exception - occurred during script compilation or evaluation.
|
ScriptingLanguage |
Created by IntelliJ IDEA.
|
ScriptingVariable |
These variables are unwrapped when passed into script engines
and preserve mutable collections in their original state when passed through the Scraper.
|
ScriptProcessor |
Script processor - executes script defined in the body.
|
ScriptSource |
Created by IntelliJ IDEA.
|
SetVarPlugin |
Support for database operations.
|
SleepPlugin |
|
Stack<T> |
Simple Stack (LIFO queue).
|
StandaloneConnectionPool |
|
StatusHolder |
Component responsible for providing information about the current status of
the configuration being processed.
|
StoppedOrExitedProcessor<TDef extends IElementDef> |
|
StrictSSLProtocolSocketFactory |
A SecureProtocolSocketFactory that uses JSSE to create
SSL sockets.
|
StylesheetProcessor |
|
SystemUtilities |
Collection of useful constants and functions that are available in each
scraper context.
|
TargetNamespace |
This annotation may be used on the web harvest plugin class to indicate
one or more target XML namespaces for the plugin.
|
TemplateDef |
Definition of template task.
|
TemplateException |
Template exception - occurred while executing the template processor.
|
TemplateProcessor |
Template processor.
|
TemplaterException |
Templater exception.
|
TextDef |
Definition of text processor.
|
TextProcessor |
Text processor.
|
ThreadLocalCache<K,V> |
Cache implementation based on ThreadLocal , that is, allowing
each thread to have separate cache bindings.
|
TokenizePlugin |
Support for database operations.
|
TransformationException |
Checked exception thrown if transformation process has failed.
|
Transformer<I,O> |
A component which is capable of transforming object from one type to another
type.
|
TransformerPair<I,T,O> |
Implementation of the Transformer interface connecting two other
Transformers where the output type of the first one is the same as
the input type of the second one.
|
TryDef |
Definition of try-catch element.
|
TryProcessor |
OnError processor - sets .
|
TypeMatchers |
|
Types |
Variable types.
|
URIToSchemaSourceTransformer |
An implementation of the Transformer interface which supports
transformation from a given resource URI to an appropriate instance of
SchemaSource.
|
URLConfigSource |
Implementation of ConfigSource that uses the HTTP protocol as
the source of XML configurations.
|
UserException |
Exception explicitly thrown by a user.
|
ValueOfPlugin |
|
VarDef |
Definition of variable call.
|
VarDefDef |
Definition of variable.
|
VarDefProcessor |
Deprecated.
|
Variable |
Variables Interface.
|
VariableException |
Variable processor exception.
|
VariableName |
|
VarProcessor |
Deprecated.
|
WebBrowserJavascriptPlugin |
Evaluates JavaScript on the page inside a headless web browser.
|
WebBrowserLoadPlugin |
Loads a page inside a headless web browser.
|
WebBrowserlPluginException |
Runtime exception for WebBrowserPlugin
|
WebBrowserPlugin |
Support for a headless web browser backed by the PhantomJS open source project.
|
WebBrowserRenderPlugin |
Evaluates JavaScript on the page inside a headless web browser.
|
WebHarvestPlugin |
Base for all user-defined plugins.
|
WebHarvestPluginDef |
Definition of all plugin processors.
|
WebScraper |
|
WHConstants |
Created by IntelliJ IDEA.
|
WhileDef |
Definition of while loop processor (while-empty and while-not-empty).
|
WhileProcessor |
Conditional processor.
|
WorkingDir |
Guice binder helper annotation for scraper's working directory path
(indicates where temporary files are kept).
|
XmlAttribute |
Information about single xml attribute
|
XMLConfig |
Implementation of ConfigSource capable of working with XML-based
configurations.
|
XmlNode |
|
XmlNodeWrapper |
|
XMLProcessor |
|
XmlToJsonPlugin |
Converter from XML to JSON
|
XmlUtil |
XML utils - contains common logic for XML handling
|
XmlValidator |
|
XPathDef |
Definition of XPath processor.
|
XPathProcessor |
XPath processor.
|
XQExpression |
|
XQParamProcessor |
|
XQueryDef |
Definition of XQuery processor.
|
XQueryExpressionPool |
Class represents a simple pool for XQuery expressions.
|
XQueryExternalParamDef |
Definition of XQuery external parameter.
|
XQueryProcessor |
XQuery processor.
|
XsltDef |
Definition of XSLT processor.
|
XsltException |
Template exception - occurred while executing the XSLT processor.
|
XsltProcessor |
XSLT processor.
|
ZipEntryPlugin |
Zip entry plugin - can be used only inside zip plugin.
|
ZipPlugin |
ZIP processor
|
ZipPluginException |
Runtime exception for ZipPlugin
|
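
The BaseTemplater entry above describes replacing ${expression} sequences in a string with evaluated expressions. The following is a minimal sketch of that general technique, not the library's actual implementation: the class name TemplaterSketch, the render method, and the evaluator callback (a stand-in for whatever scripting engine evaluates the expression) are all hypothetical names introduced for illustration. It assumes Java 9 or later.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Web-Harvest API.
final class TemplaterSketch {

    // Matches ${...} and captures the text between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    // Replaces every ${expression} in the template with the evaluator's result.
    // The evaluator is assumed to return a non-null string.
    public static String render(String template, Function<String, String> evaluator) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String expression = matcher.group(1).trim();
            String value = evaluator.apply(expression);
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references by appendReplacement.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Here "evaluation" is a plain variable lookup; a real templater
        // would hand the expression to a script engine instead.
        Map<String, String> vars = Map.of("name", "World");
        System.out.println(render("Hello ${name}!", e -> vars.getOrDefault(e, "")));
        // prints "Hello World!"
    }
}
```

Passing the evaluator as a function keeps the placeholder scanning separate from expression evaluation, so the same loop would work with a variable map, a script engine, or any other lookup.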