Stephen E. Arnold: Discover the Open Source Alternative to the Autonomy Crawler Tool

Stephen E. Arnold

Discover the Open Source Alternative to the Autonomy Crawler

February 7, 2014

Whatever one makes of the claims about Autonomy's commercial success, its software is proprietary and carries a large price tag. The average small business or individual user cannot afford HP Autonomy's IDOL Crawler. Open source is the obvious alternative, but for a long time nothing comparable to the IDOL Crawler was available. Norconex says that has changed in its article, "An Open Source Crawler For Autonomy IDOL." Norconex has released an HP Autonomy IDOL Committer for Norconex HTTP Collector, its open source Web crawler.

The HTTP Collector is available on GitHub, and the developer encourages people to download it and contribute to the project. Its feature set largely matches that of HP Autonomy HTTP Connector.

The article states:

“Most key features of HP Autonomy HTTP Connector are available in Norconex HTTP Collector, including document changes detection on incremental crawls and purging documents from IDOL for deleted web pages. New ones are introduced, such as having different hit interval at different time of the day and the ability to overwrite pretty much every part of the web crawling flow with your own implementation logic. The IDOL Committer has been tested on diverse public and internal web sites with great performance.”

The open source community offers a lesson here: if the software you want does not exist, you can wait for a developer to build it or take the initiative and build it yourself.

Whitney Grace, February 07, 2014

Sponsored by ArnoldIT.com, developer of Augmentext