Neal Rauhauser: Mining Data Science Central

IO Mapping
Neal Rauhauser

Mining Data Science Central

I was browsing LinkedIn a little while ago and I noticed the Data Science Central group. One of my contacts had shared something from it and the charter looked interesting, so I clicked ‘join’.

Data Science Central is the industry’s online resource for big data practitioners. From Analytics to Data Integration to Visualization, the Data Science Central approach is to provide a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends – and industry job opportunities.

This got me a notice that I’d have to sign up on DataScienceCentral’s website. This isn’t that unusual; I got a similar pitch from Rapid7 a few days ago, which led to fresh installs of Nessus and Metasploit, neither of which I’d touched in several years. Once I signed up, the site wanted me to make a profile. I used to be really resistant to this sort of thing, but it is an undeniable trend in professional networking sites.

My profile URL was very straightforward: it included my account name, NealRauhauser. I poked around for a few minutes and found there are 21 members per page and 559 pages in total, so nearly 12,000 professional profile URLs are embedded in these pages. I opened a shell and wrote a little script; if my math is right, I will have them all by around 21:30 Eastern, at a fetch rate that won’t cause their server to melt down.
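The page arithmetic and the throttled fetch can be sketched roughly as follows. This is a minimal Python sketch, not the shell script mentioned above; the listing URL, its `page` parameter, and the five-second delay are assumptions made for illustration and do not come from the site.

```python
import time
import urllib.request

MEMBERS_PER_PAGE = 21
TOTAL_PAGES = 559

# Hypothetical listing URL; the real site's URL pattern is not given in the post.
LISTING_URL = "https://example.com/members?page={page}"


def expected_profiles(pages=TOTAL_PAGES, per_page=MEMBERS_PER_PAGE):
    """Upper bound on the profile count (the last page may be only partly full)."""
    return pages * per_page


def estimated_minutes(pages=TOTAL_PAGES, delay_seconds=5.0):
    """Rough crawl time from the delay alone, ignoring download time."""
    return pages * delay_seconds / 60


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(pages=TOTAL_PAGES, delay_seconds=5.0, fetcher=fetch, sleeper=time.sleep):
    """Yield (page_number, html) one page at a time, pausing between requests."""
    for page in range(1, pages + 1):
        yield page, fetcher(LISTING_URL.format(page=page))
        if page < pages:
            # Assumed politeness delay so the server is never hit back to back.
            sleeper(delay_seconds)
```

Here `expected_profiles()` returns 11,739, which is where the "nearly 12,000" figure comes from. The `fetcher` and `sleeper` parameters exist only so the loop can be exercised without touching the network.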

I’ll have to parse the pages and then decide what to do with the resulting URLs. I could feed them to OpenCalais or Alchemy via Maltego, but 12,000 at once would swamp those Named Entity Recognition services, at least as reached through Maltego’s public transform servers, and would probably exhaust my computer’s memory in the process.
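One way to approach the parsing step, and to avoid handing everything over at once, is to pull the profile links out of each saved page and work through them in small batches. The sketch below uses only the Python standard library; the `/profile/` path fragment and the batch size of 50 are hypothetical choices, since the post does not give the actual link format or any service limits.

```python
from html.parser import HTMLParser
from itertools import islice


class ProfileLinkParser(HTMLParser):
    """Collect href values that look like member profile links."""

    def __init__(self, marker="/profile/"):
        super().__init__()
        self.marker = marker  # hypothetical path fragment, not from the post
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each matching link once, in the order it first appears.
        if href and self.marker in href and href not in self.links:
            self.links.append(href)


def extract_profile_urls(html, marker="/profile/"):
    """Return the de-duplicated profile links found in one page of HTML."""
    parser = ProfileLinkParser(marker)
    parser.feed(html)
    return parser.links


def batches(items, size=50):
    """Yield lists of at most `size` items so no step handles all URLs at once."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each batch could then be submitted on its own, with a pause between batches, instead of loading all of the URLs into a single graph.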

I did a trial run with the first seven featured members …
