Stephen E. Arnold: Web Crawling ToolKit

Stephen E. Arnold

Short Honk: Crawl the Web at Scale

Short honk: I read “Aduana: Link Analysis to Crawl the Web at Scale.” The write-up explains an open source project that can copy content “dispersed all over the Web.” Keep in mind that the approach focuses primarily on text. Aduana is a specialized back end for the developer’s crawl-acceleration tool, and it is built on top of a data management system. Read more.
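For readers wondering what “link analysis” buys a crawler, here is a minimal sketch in Python. It is not Aduana's code, and nothing in it comes from the write-up: the function names `pagerank` and `crawl_order`, the damping value, and the toy link graph are all assumptions made for illustration. The idea shown is a generic one: score pages by a simple PageRank over the links discovered so far, then fetch the highest-scoring uncrawled pages first.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simple PageRank by power iteration (illustrative, not Aduana's method).

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping every known page to its score.
    """
    # Collect every page seen as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's score evenly across its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Pages with no outgoing links spread their score everywhere.
                dangling += damping * rank[p]
        for p in pages:
            new_rank[p] += dangling / n
        rank = new_rank
    return rank


def crawl_order(links, crawled):
    """Return uncrawled pages, highest score first (ties broken by name)."""
    scores = pagerank(links)
    pending = [p for p in scores if p not in crawled]
    return sorted(pending, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    # Hypothetical toy graph: page names stand in for URLs.
    toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(crawl_order(toy_links, crawled={"a"}))
```

In this toy graph, page "c" is linked from three other pages, so it lands at the front of the queue. A production system would also need to keep the link graph on disk and update scores incrementally, which is presumably where the data management layer mentioned in the write-up comes in.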

Direct to Tool: Scrapinghub Platform
