This is an engine for crawling data from any website, provided you supply a proper config file and a corresponding processor that inherits from the PageProcessor class.
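The idea of a site-specific processor can be sketched as below. This is only an illustration in Python, not the engine's actual code: the `PageProcessor` base class here is a hypothetical stand-in, and the `process` method name, its signature, and the `TitleProcessor` example are all assumptions made for the sketch.

```python
from abc import ABC, abstractmethod
import re


class PageProcessor(ABC):
    """Hypothetical stand-in for the engine's PageProcessor base class."""

    @abstractmethod
    def process(self, html: str) -> dict:
        """Extract structured data from one downloaded page (assumed signature)."""


class TitleProcessor(PageProcessor):
    """Example site-specific processor: pulls the <title> text out of a page."""

    _TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

    def process(self, html: str) -> dict:
        match = self._TITLE.search(html)
        # Return None for the title when the page has no <title> element.
        return {"title": match.group(1).strip() if match else None}
```

A real processor would implement whatever members the engine's own PageProcessor class requires; the point is only that each target site gets its own subclass holding the extraction logic.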

After creating everything required, add the link config entry to the crawler-config.xml file, then insert a record into the Scheduler table. Run the program and, when asked whether to crawl data from the new scheduler record, press 'n' to do so; otherwise press 'o'.
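The startup prompt described above might be handled along these lines. This is a minimal sketch, not the program's actual code: the function name `wants_new_record` is hypothetical, and rejecting any key other than 'n' or 'o' is an assumption.

```python
def wants_new_record(answer: str) -> bool:
    """Map the user's key press to a decision (hypothetical helper).

    'n' -> crawl from the new scheduler record
    'o' -> do not use the new record
    """
    key = answer.strip().lower()
    if key == "n":
        return True
    if key == "o":
        return False
    # Assumption: any other input is treated as invalid rather than guessed at.
    raise ValueError(f"expected 'n' or 'o', got {answer!r}")
```

For example, `wants_new_record(input("Crawl from the new scheduler record? (n/o) "))` would return `True` when the user presses 'n'.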

Last edited Mar 19, 2014 at 7:04 AM by khailq, version 3