Stephen E. Arnold: Uncurated Big Data via Data Access Proxy (DAP) and Data Tilling Service (DTS)

Stephen E. Arnold

Brown Dog Chases Answer to Uncurated Data

Depending on one’s field, it may seem as though every bit of information in existence is now just an Internet search away. However, as researchers well know, a wealth of potentially crucial information is still difficult to access. In fact, GCN reports that market research firm IDC estimates up to 90 percent of “big data” falls into this category. GCN also points to a potential solution in “Brown Dog Digs Into the Deep, Dark Web.”

Brown Dog is a project of the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. In 2013, the team received a five-year, $10 million award from the National Science Foundation for the project. The team has already developed two services that make uncurated data collections easier to access. The write-up reports:

“The first, called Data Access Proxy (DAP), transforms unreadable files into readable ones by linking together a series of computing and translational operations behind the scenes. Similar to an Internet gateway, the configuration of the DAP would be entered into a user’s machine settings. Thereafter, data requests over HTTP would first be examined by the proxy to determine if the native file format is readable on the client device.

“The second tool, the Data Tilling Service (DTS), lets individuals search collections of data, using an existing file to discover similar files in the data. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return images in the collection that also contain three people. If the DTS encounters a file format it is unable to parse, it would use the Data Access Proxy to make the file accessible. It also indexes the data and extracts and appends metadata to files to give users a sense of the type of data they are encountering.”
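The quoted descriptions stay at a high level. As a rough illustration only, the Python sketch below models the two ideas under stated assumptions: the DAP’s “series of computing and translational operations” as a breadth-first search over a hypothetical registry of format converters, and the DTS’s “three people” example as matching on metadata that a hypothetical extractor has already appended to each file. The format names, `CONVERTERS`, `Record`, `conversion_chain`, and `similar_files` are all invented for this example; none of it is Brown Dog’s actual code or API.

```python
from collections import deque
from dataclasses import dataclass, field

# --- DAP-style sketch: find a chain of conversions to a client-readable format ---

# Hypothetical converter registry (illustrative formats, not Brown Dog's tools):
# each key maps a file format to the formats some converter can produce from it.
CONVERTERS = {
    "dwg": ["dxf"],
    "dxf": ["svg"],
    "svg": ["png"],
    "xls": ["csv"],
}


def conversion_chain(native, readable, converters=CONVERTERS):
    """Breadth-first search for the shortest chain of formats from `native`
    to any format in `readable`; returns None if no chain exists."""
    if native in readable:
        return [native]  # the client can already open the file as-is
    queue = deque([[native]])
    seen = {native}
    while queue:
        path = queue.popleft()
        for nxt in converters.get(path[-1], []):
            if nxt in seen:
                continue
            candidate = path + [nxt]
            if nxt in readable:
                return candidate
            seen.add(nxt)
            queue.append(candidate)
    return None


# --- DTS-style sketch: query by example over extracted metadata ---

@dataclass
class Record:
    name: str
    # Metadata a hypothetical extractor appended to the file, e.g. {"people": 3}.
    metadata: dict = field(default_factory=dict)


def similar_files(example, collection, keys):
    """Return names of records whose metadata matches `example` on each of
    `keys` that the example actually has; empty list if it has none."""
    target = {k: example.metadata[k] for k in keys if k in example.metadata}
    if not target:
        return []
    return [
        r.name
        for r in collection
        if r is not example and all(r.metadata.get(k) == v for k, v in target.items())
    ]


if __name__ == "__main__":
    print(conversion_chain("dwg", {"png", "pdf"}))  # ['dwg', 'dxf', 'svg', 'png']
    photos = [
        Record("beach.jpg", {"people": 3}),
        Record("office.png", {"people": 2}),
        Record("party.jpg", {"people": 3}),
    ]
    query = Record("query.jpg", {"people": 3})
    print(similar_files(query, photos, ["people"]))  # ['beach.jpg', 'party.jpg']
```

Breadth-first search is one plausible choice because it returns the shortest chain, meaning the fewest conversions run on the user’s behalf; a real service might instead weight each conversion by cost or fidelity.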

The article notes that Brown Dog’s makers are building on previous software development, and that they hope to “bring together every possible source of automated help already in existence.” That’s some goal! Not surprisingly, the prospective tools have been likened to a time machine of sorts. Kenton McHenry, one of the project’s leaders, reminds us that the world’s first web browser, Mosaic, was also developed at NCSA; his team hopes to leave a similarly significant legacy.

Cynthia Murrell, November 10, 2014

Sponsored by, developer of Augmentext
