Example: Google images

As in Example #2, this example uses a user-defined function from the included configuration file functions.xml. The function download-multipage-list collects the URLs of all images returned by a Google Images search for the specified keyword, following at most 5 result pages. A loop processor then iterates over the collected URLs, downloads each image, and saves it locally.
<?xml version="1.0" encoding="UTF-8"?>

<!-- Expects following initial variable: search - search expression -->
<!-- Updated on February, 9th, 2011 -->

<config charset="UTF-8">
    <include path="functions.xml"/>

    <!-- defines search keyword and start URL -->
    <var-def name="search" overwrite="false">banana</var-def>
    <var-def name="url"><template>http://images.google.com/images?q=${search}&amp;hl=en</template></var-def>

    <!-- collects all image URLs -->
    <var-def name="imgLinks">
        <call name="download-multipage-list">
            <call-param name="pageUrl"><var name="url"/></call-param>
            <call-param name="nextXPath">//a[@id="pnnext"]/@href</call-param>
            <call-param name="itemXPath">//img[contains(@src, 'images?q=tbn')]/@src</call-param>
            <call-param name="maxloops">5</call-param>
        </call>
    </var-def>

    <!-- downloads images and saves them to files -->
    <loop item="link" index="i" filter="unique">
        <list>
            <var name="imgLinks"/>
        </list>
        <body>
            <file action="write" type="binary" path="google_images/${search}_${i}.gif">
                <http url="${sys.fullUrl(url, link)}"/>
            </file>
        </body>
    </loop>
</config>

The result of the extraction is a collection of 100 image files stored on the file system.
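The configuration above calls download-multipage-list without showing how it is defined. The following is a minimal sketch of how such a function could be written with the standard Web-Harvest processors, assuming the four call parameters used above (pageUrl, nextXPath, itemXPath, maxloops). The internal variable names content and nextLinkUrl are illustrative, and the functions.xml distributed with the examples may differ from this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of functions.xml; the distributed file may differ. -->
<config charset="UTF-8">
    <!-- Expects call parameters: pageUrl, nextXPath, itemXPath, maxloops -->
    <function name="download-multipage-list">
        <return>
            <!-- Keep fetching while there is a page URL, for at most maxloops pages -->
            <while condition="${pageUrl.toString().length() != 0}" maxloops="${maxloops}" index="i">
                <!-- "empty" runs its body but contributes nothing to the result -->
                <empty>
                    <!-- Download the current page and clean it into well-formed XML -->
                    <var-def name="content">
                        <html-to-xml>
                            <http url="${pageUrl}"/>
                        </html-to-xml>
                    </var-def>
                    <!-- Look up the link to the next result page -->
                    <var-def name="nextLinkUrl">
                        <xpath expression="${nextXPath}">
                            <var name="content"/>
                        </xpath>
                    </var-def>
                    <!-- Resolve it against the current URL for the next iteration -->
                    <var-def name="pageUrl">
                        <template>${sys.fullUrl(pageUrl.toString(), nextLinkUrl.toString())}</template>
                    </var-def>
                </empty>
                <!-- Items matched on this page are added to the returned list -->
                <xpath expression="${itemXPath}">
                    <var name="content"/>
                </xpath>
            </while>
        </return>
    </function>
</config>
```

In this sketch the while processor returns the items collected from every page it visits, which is why the calling configuration can assign the result of the call directly to imgLinks and pass it to the loop processor.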
Copyright © 2006-2013 by vnikic at users.sourceforge.net