Truth Finding on the Deep Web: Is the Problem Solved?
Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava.
In VLDB, 2013.
Big data integration & linkages. Xin Luna Dong and Divesh Srivastava. Tutorial in Proceedings of the VLDB Endowment (PVLDB), 6(11): 1188-1189, 2013.
Bryan Alexander takes off from Jane Hart's personal knowledge management routine to describe his own method of handling information overload, which he calls “information wrangling.” He works through channels and sources daily, reflects, and shares. Alexander details each of these processes in his blog.
Jane Hart describes her daily personal knowledge management (or PKM) routine. It’s an inspiring yet practical workflow for information curation. Or information wrangling, as I like to call it:
I like this framework for various purposes, starting with how it describes a way of handling information overload. It’s also a good model for helping people transition from an analog (print, in-person) set of habits to one including the digital world.
Inspired by this, I’d like to describe my own.
Every day I work through a series of channels and sources (Hart’s “Seek” category), reflect on what I find (“Sense”), then share those reflections (“Share”). I’ll break it down into three aspects, but keep in mind that there’s a lot of back-and-forth across them.
With the Internet, it is easier than ever to plagiarize by either stealing or buying someone’s work. The Internet is a double-edged sword, however, because there are tools available to check a work for veracity and originality. Unless you are a teacher or in some form of academia, you might not be aware of the Web sites that serve as plagiarism checkers. Through our own research, we have compiled a list:
Plagiarisma—Available in different languages with other useful features and downloadable apps.
Copyscape—Has a unique feature, Copysentry, that allows users to monitor plagiarism on the Web.
Plagium—Like many of the other checkers, but has a beta version to check social media.
There is an expression that says, “there are no original ideas anymore.” New ideas spring up all the time, but it takes a lot more work to create something new than it does to copy something that already exists. Plagiarism does not benefit anyone, especially the stealer. Use the plagiarism tools to improve your work quality and come up with something new.
For Obama’s 2012 re-election campaign, his team broke down data silos and moved all the data to a cloud repository. The team built Narwhal, a shared data store interface for all of the campaign’s applications. Narwhal was dubbed “Obama’s White Whale,” because it is an almost mythical technology that federal agencies have been trying to develop for years. While Obama may be hanging out with Queequeg and Ishmael, there is a more viable solution for the cloud, says GCN’s article, “Big Metadata: 7 Ways To Leverage Your Data In the Cloud.”
Data silo migration may appear to be a daunting task, but it is not impossible to do. The article states:
If you want to fire up your neurons here at year-end, I recommend reading over the now annual release of “emerging ethical dilemmas and policy issues in science and technology” from the University of Notre Dame’s Reilly Center.
Mark P. Mills
(Full disclosure, I’m on the Center’s Advisory Board – and though I wish I could take credit for it, I had no input on the list.)
Even though the list from Notre Dame is more provocative than IBM’s, each and every technology has already been demonstrated or deployed. So while for the uninitiated some of the following may seem like science fiction, there is the old adage that “truth is stranger than fiction.” In fact, much of what’s on this list has inspired novels and movies. And the Reilly team has helpfully provided links to articles and resources to dig deeper into each domain’s state of affairs.
Following is the Reilly top 10, along with a sampling of the associated ethical questions.
My issue with any visualization, however brilliant, is that it assumes that it will engage consensus and action.
Missing is the recognition that even if we had a hard data visualization that one million people were to die tomorrow somewhere — even in New York — the level of engagement would be low. Who says so? Is it a scam? Is it spin? etc
There is a glass ceiling effect which is not addressed, irrespective of the nature of the crisis. Central Africa currently offers an example. Increasingly, who cares, or why should I care? I walk past beggars every day. What is not evident from such visualization is what anyone is expected to do and why. My first take on this was:
using networks of international organizations, world problems, strategies, and values
Abstract: The paper reports briefly on the ongoing process of systematic information collection and web presentation by the UIA of networks of over 30,000 international organizations, 56,000 perceived world problems, 32,000 advocated action strategies, and some 3,000 values — resulting in a total of 800,000 hyperlinks. These different entities constitute an interesting focal sub-system of whatever is to be understood by an emerging global brain – for which the “problems” might be understood as “neuroses”, if not “tumours”.
But I do think that the capacity to do anything with big data is very limited. We used fancy software (Netmap, as used by security services), but so what. See fancy graphic screenshots in:
The eWeek article titled “Google Translate Adds Support for More World Languages” announces Google’s addition of nine languages to its service, bringing the total to 80 languages. These include several African languages spoken in Nigeria, Somalia and South Africa. There are motions in progress to add Mongolian, Nepali, Punjabi and Maori. The last was only made possible by New Zealanders, as the article explains: