Automatically Extracting Disaster-Relevant Information from Social Media
My team and I at QCRI have just had this paper (PDF) accepted at the World Wide Web (WWW 2013) conference in Rio next month. The paper relates directly to our Artificial Intelligence for Disaster Response (AIDR) project. One of our main missions at QCRI is to develop open-source, freely available, next-generation humanitarian technologies to better manage Big (Crisis) Data. Over 20 million tweets and half a million Instagram pictures were posted during Hurricane Sandy, for example. In Japan, more than 2,000 tweets were posted every second the day after the devastating earthquake and tsunami struck the eastern coast. Recent empirical studies have shown that a significant percentage of tweets posted during disasters are informative and even actionable. The challenge before us is how to find those proverbial needles in the haystack, and to do so in as close to real time as possible.
So we analyzed disaster tweets posted during Hurricane Sandy (2012) and the Joplin Tornado (2011). We demonstrate that disaster-relevant information can be automatically extracted from these datasets. The results indicate that 40% to 80% of tweets that contain disaster-related information can be automatically detected. We also demonstrate that we can correctly identify the type of disaster information 80% to 90% of the time. Because these classifiers are developed using machine learning, they become more accurate as they are given more data to learn from. This explains why we are building AIDR. Our aim is not to replace human involvement and oversight but to significantly lessen the load on humans.
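
As a rough illustration of the two steps described above (first detecting whether a tweet is informative, then identifying the type of information it carries), here is a toy sketch in Python. It assumes a simple naive Bayes text classifier, which is a stand-in chosen for brevity and not necessarily the approach used in the paper. The class and function names, the label names, and the handful of training tweets are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the label
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: is the tweet informative? (invented examples, hypothetical labels)
relevance_model = NaiveBayes()
relevance_model.train([
    ("tornado warning issued for joplin take shelter now", "informative"),
    ("red cross needs blood donations for storm victims", "informative"),
    ("power lines down and houses damaged on main street", "informative"),
    ("i am so bored tonight lol", "not_informative"),
    ("cannot wait for the game this weekend", "not_informative"),
    ("thinking of everyone stay strong", "not_informative"),
])

# Step 2: what type of information is it? (hypothetical categories)
type_model = NaiveBayes()
type_model.train([
    ("tornado warning issued take shelter now", "caution"),
    ("flood warning avoid the coast tonight", "caution"),
    ("red cross needs blood donations for victims", "donation"),
    ("donate food and water to the relief fund", "donation"),
    ("power lines down and houses damaged", "damage"),
    ("bridge collapsed and roads flooded downtown", "damage"),
])


def extract(tweet):
    """Return the information type of a tweet, or None if it is not informative."""
    if relevance_model.predict(tweet) != "informative":
        return None
    return type_model.predict(tweet)


print(extract("take shelter tornado warning"))
print(extract("please donate blood to the red cross"))
print(extract("so bored tonight"))
```

With only a dozen training tweets this is far too small to be useful in practice, but it shows the shape of the pipeline: every additional labeled tweet passed to `train` updates the word counts that the predictions rely on, and a human can still review whatever the filter lets through.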