After the EF5 tornado in Moore, Oklahoma, map editors at Waze used the service to route drivers around the damage. While Uber raised its car service fares during Hurricane Sandy, it could have modified its app to encourage the shared use of Uber cars to fill unused seats. That would have taken some work, but Airbnb did modify its platform overnight so that over 1,400 kindhearted New Yorkers could offer free housing to victims of the hurricane. SeeClickFix was also used to report over 800 issues in just 24 hours after Sandy made landfall, including the precise locations of power outages, flooding, downed trees, downed power lines, and other storm damage. Following the Boston Marathon bombing, SeeClickFix was used to quickly find emergency housing for those affected by the tragedy.
My colleague Kalev Leetaru recently co-authored a comprehensive study on the various sources and accuracy of geographic information on Twitter, the first detailed study of its kind. The analysis, which runs some 50 pages, has important implications for the use of social media in emergency management and humanitarian response. If you don’t have time to read the full study, this blog post highlights the most important and relevant findings.
Crowds, rather than lone individuals, are increasingly bearing witness to disasters large and small. Instagram users, for example, snapped 800,000 #Sandy pictures during the hurricane last year. One way to make sense of this vast volume and velocity of multimedia content (Big Data) during disasters is with Photosynth, as blogged here. Another, perhaps more sophisticated, approach is CrowdOptic, which automatically zeroes in on the specific location that eyewitnesses are looking at when they use their smartphones to take pictures or record videos.
“Once a crowd’s point of focus is determined, any content generated by that point of focus is automatically authenticated, and a relative significance is assigned based on CrowdOptic’s focal data attributes […]. These include: (1) Number of Viewers; (2) Location of Focus; (3) Distance to Epicenter; (4) Cluster Timestamp, Duration; and (5) Cluster Creation, Dissipation Speed.” CrowdOptic can also be used on live streams and on archival images and videos. Once a cluster is identified, the best images and videos pointing to it are automatically selected.
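CrowdOptic’s actual algorithm is proprietary, but the underlying geometry can be sketched: each geotagged photo defines a line of sight from the photographer’s position along the compass bearing of the shot, and a focal cluster emerges where many of those lines intersect. Here is a minimal, purely illustrative Python sketch (all function names and the outlier-rejection heuristic are my own assumptions, working in a local flat x/y projection rather than lat/lon):

```python
import math
from itertools import combinations

def intersect(p1, b1, p2, b2):
    """Intersect two lines of sight given as (x, y) origins plus compass
    bearings in degrees (0 = north). Returns the crossing point, or None
    if the lines are (near-)parallel."""
    d1 = (math.sin(math.radians(b1)), math.cos(math.radians(b1)))
    d2 = (math.sin(math.radians(b2)), math.cos(math.radians(b2)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def focal_cluster(photos, radius=50.0):
    """photos: list of ((x, y), bearing). Returns an estimated point of
    focus: the centroid of pairwise line-of-sight intersections, after
    crudely discarding intersections far from the initial centroid."""
    points = [q for (p1, b1), (p2, b2) in combinations(photos, 2)
              if (q := intersect(p1, b1, p2, b2)) is not None]
    if not points:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    near = [p for p in points if math.dist(p, (cx, cy)) <= radius]
    if not near:
        return None
    return (sum(x for x, _ in near) / len(near),
            sum(y for _, y in near) / len(near))
```

Three eyewitnesses at different street corners all aiming their phones at the same building would yield pairwise intersections piled on top of that building, which is the cluster the analysis keys on.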
My colleague Fernando Diaz has continued working on an interesting Wikipedia project since he first discussed the idea with me last year. Since Wikipedia is increasingly used to crowdsource live reports on breaking news such as sudden-onset humanitarian crises and disasters, why not mine these pages for structured information relevant to humanitarian response professionals?
In computing-speak, Sequential Update Summarization is a task that generates useful, new and timely sentence-length updates about a developing event such as a disaster. Value Tracking, in contrast, tracks the value of important event-related attributes such as fatalities and financial impact. Fernando and his colleagues will be using both approaches to mine and analyze Wikipedia pages in real time. Other attributes worth tracking include injuries, the number of displaced individuals, infrastructure damage and perhaps disease outbreaks. Pictures of the disaster uploaded to a given Wikipedia page may also be of interest to humanitarians, along with metadata such as the number of edits made to a page per minute or hour and the number of unique editors.
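To make Value Tracking concrete, here is a deliberately naive sketch that tracks a single attribute (fatalities) across successive revisions of a page and emits an update whenever the reported value changes. The regex and function names are my own illustrative assumptions; the actual TREC task uses far more robust extraction than a hand-written pattern:

```python
import re

# Hand-written pattern for the "fatalities" attribute (illustrative only;
# real value-tracking systems use trained extractors, not one regex).
FATALITY_RE = re.compile(
    r"(?:at least\s+)?(\d[\d,]*)\s+(?:people\s+)?(?:dead|killed|fatalities|deaths)",
    re.IGNORECASE)

def track_fatalities(revisions):
    """Given Wikipedia revision texts in chronological order, emit
    (revision_index, value) each time the reported death toll changes."""
    updates, last = [], None
    for i, text in enumerate(revisions):
        m = FATALITY_RE.search(text)
        if not m:
            continue
        value = int(m.group(1).replace(",", ""))
        if value != last:
            updates.append((i, value))
            last = value
    return updates
```

Run over a page’s edit history, this yields exactly the kind of time-stamped attribute trail (toll first reported, then revised) that responders would want to monitor.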
Fernando and his colleagues have recently launched this tech challenge to apply these two advanced computing techniques to disaster response based on crowdsourced Wikipedia articles. The challenge is part of the Text Retrieval Conference (TREC), which is being held in Maryland this November. As part of this applied research and prototyping challenge, Fernando et al. plan to use the resulting summarization and value tracking from Wikipedia to verify related crisis information shared on social media. Needless to say, I’m really excited about the potential. So Fernando and I are exploring ways to ensure that the results of this challenge are appropriately transferred to the humanitarian community. Stay tuned for updates.
See also: Web App Tracks Breaking News Using Wikipedia Edits [Link]
That was the unexpected question that my World Bank colleague Johannes Kiess asked me the other day. I was immediately intrigued. So I did some preliminary research and offered to write up a blog post on the idea to solicit some early feedback. According to recent statistics, international tourist arrivals numbered over 1 billion in 2012 alone. Of this population, the demographic that Johannes is interested in comprises those intrepid and socially conscious backpackers who travel beyond the capitals of developing countries. Perhaps the time is ripe for a new form of tourism: Tourism for Social Good.
There may be a real opportunity to engage a large crowd: travelers, and backpackers in particular, are smartphone savvy, have time on their hands, want to do something meaningful, and are eager to get off the beaten track and explore places where others do not typically trek. Johannes believes this approach could be used to map critical social infrastructure and/or to monitor development projects. Consider a simple smartphone app, perhaps integrated with existing travel guide apps or TripAdvisor. The app would ask travelers to record the quality of the roads they take (using the GPS of their smartphone) and to rate the condition, e.g., bumpy, even, etc., every 50 miles or so.
They could be asked to find the nearest hospital and take a geotagged picture, a scavenger hunt for development (as Johannes calls it); Geocaching for Good? Note that governments often do not know exactly where schools, hospitals and roads are located. The app could automatically alert travelers to a nearby development project or road financed by the World Bank or another international donor. Travelers could be prompted to take (automatically geotagged) pictures that would then be forwarded to development organizations for visual analysis, which could easily be carried out using microtasking. Perhaps a very simple, 30-second, multiple-choice survey could even be presented to travelers who pass by certain donor-funded development projects. For quality control purposes, these pictures and surveys could easily be triangulated. Simple gamification features could also be added to the app; travelers could earn points for social good tourism. Collect 100 points and get your next Lonely Planet guide for free? Perhaps if you’re the first person to record a road within the app, it could be named after you (with its official name noted, of course). Even Photosynth could be used to create panoramas of visual evidence.
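To illustrate the triangulation idea, here is a hypothetical sketch in which a facility only counts as confirmed once several distinct travelers report it within a small radius of one another. Every name, threshold and the greedy clustering heuristic below are invented for illustration, not a proposed implementation:

```python
import math

def confirmed_facilities(reports, radius_m=100.0, min_reports=3):
    """Triangulate traveler reports: a location counts as confirmed once
    `min_reports` distinct travelers report it within `radius_m` of the
    first sighting. reports: list of (traveler_id, lat, lon)."""
    def dist_m(a, b):
        # Small-distance flat-earth approximation; fine at ~100 m scales.
        dlat = (a[0] - b[0]) * 111_000
        dlon = (a[1] - b[1]) * 111_000 * math.cos(math.radians(a[0]))
        return math.hypot(dlat, dlon)

    clusters = []  # each entry: (anchor_point, set_of_traveler_ids)
    for tid, lat, lon in reports:
        for anchor, travelers in clusters:
            if dist_m(anchor, (lat, lon)) <= radius_m:
                travelers.add(tid)
                break
        else:
            clusters.append(((lat, lon), {tid}))
    return [anchor for anchor, ts in clusters if len(ts) >= min_reports]
```

Three backpackers independently geotagging the same hospital would confirm it, while a single stray report (honest mistake or gaming for points) would not.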
Humanitarian organizations and emergency management offices are increasingly interested in capturing multimedia content shared on social media during crises. Last year, the UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) to identify and geotag pictures and videos shared on Twitter that captured the damage caused by Typhoon Pablo, for example. So I’ve been collaborating closely with my colleague Hemant Purohit to analyze the multimedia content shared in the millions of tweets posted after the EF5 tornado devastated the city of Moore, Oklahoma on May 20th. The results are shared below along with details of a project I am spearheading at QCRI to provide disaster responders with relevant multimedia content in real time during future disasters.
For this preliminary multimedia analysis, we focused on the first 48 hours after the Tornado and specifically on the following multimedia sources/types: Twitpic, Instagram, Flickr, JPGs, YouTube and Vimeo. JPGs refers to URLs shared on Twitter that include “.jpg”. Only ~1% of tweets posted during the 2-day period included URLs to multimedia content. We filtered out duplicate URLs to produce the following unique counts depicted above and listed below.
- Twitpic = 784
- Instagram = 11,822
- Flickr = 33
- JPGs = 347
- YouTube = 5,474
- Vimeo = 88
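The dedup-and-classify step described above can be sketched in a few lines of Python. The hostname table and function name below are my own illustrative stand-ins, not the pipeline we actually used:

```python
from urllib.parse import urlparse

# Hostname → source label; ".jpg" paths are caught separately, as in the
# counts above (table is illustrative, not exhaustive).
SOURCES = {
    "twitpic.com": "Twitpic", "instagram.com": "Instagram",
    "instagr.am": "Instagram", "flickr.com": "Flickr",
    "flic.kr": "Flickr", "youtube.com": "YouTube",
    "youtu.be": "YouTube", "vimeo.com": "Vimeo",
}

def count_unique_media(urls):
    """Classify tweet URLs by multimedia source, counting each unique URL once."""
    seen, counts = set(), {}
    for url in urls:
        if url in seen:          # drop duplicate URLs (retweets, reposts)
            continue
        seen.add(url)
        host = (urlparse(url).hostname or "").removeprefix("www.")
        label = SOURCES.get(host)
        if label is None and urlparse(url).path.lower().endswith(".jpg"):
            label = "JPG"
        if label:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Feeding in the URLs extracted from the two-day tweet collection would produce per-source unique counts like those listed above.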
Clearly, Instagram and YouTube are important sources of multimedia content during disasters. The graphs below (click to enlarge) depict the frequency of each multimedia type by hour during the first 48 hours after the tornado. Note that we were only able to collect about 2 million tweets during this period using the Twitter Streaming API; we expect that millions more were posted, which is why access to the Twitter Firehose is important and why I’m a strong advocate of Big Data Philanthropy for Humanitarian Response.
Thanks to the excellent work carried out by my colleagues Hemant Purohit and Professor Amit Sheth, we were able to collect 2.7 million tweets posted in the aftermath of the EF5 tornado that devastated Moore, Oklahoma. Hemant, who recently spent half a year with us at QCRI, kindly took the lead on some preliminary analysis of the disaster data. He sampled 2.1 million tweets posted during the first 48 hours for the analysis below. Read full post.
FACT: Over half a million pictures were shared on Instagram and more than 20 million tweets were posted during Hurricane Sandy. The year before, over 100,000 tweets per minute were posted following the Japan Earthquake and Tsunami. Disaster-affected communities are now more likely than ever to be on social media, which dramatically multiplies the amount of user-generated crisis information posted during disasters. Welcome to Big Data: Big Crisis Data.
Humanitarian organizations and emergency management responders are completely unprepared to deal with this volume and velocity of crisis information. Why is this a problem? Because social media can save lives. Recent empirical studies have shown that a significant percentage of social media reports contain valuable, informative and actionable content for disaster response. Finding those reports, however, is like searching for needles in a haystack. Identifying the most urgent tweets in a stream of over 20 million (in real time) is indeed a major challenge. Read full post.
As part of QCRI’s Artificial Intelligence for Monitoring Elections (AIME) project, I liaised with Kaggle to work with a top-notch data scientist on a proof-of-concept study. As I’ve blogged in the past, crowdsourced election monitoring projects are starting to generate “Big Data” that cannot be managed or analyzed manually in real time. Using the crowdsourced election reporting data recently collected by Uchaguzi during Kenya’s elections, we therefore set out to assess whether machine learning could automatically tag user-generated reports by topic, such as election violence. The purpose of this post is to share the preliminary results from this innovative study, which we believe is the first of its kind.
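The models from the Kaggle study are not reproduced here, but the basic idea of tagging reports by topic can be sketched with a tiny multinomial Naive Bayes classifier. This is a stand-in of my own, not the study’s actual method, and the toy training data is invented:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Tiny multinomial Naive Bayes with Laplace smoothing, for tagging
    short election reports by topic (illustrative stand-in only)."""
    def fit(self, reports, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(reports, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def log_prob(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            lp = math.log(self.label_counts[label] / sum(self.label_counts.values()))
            for w in tokenize(text):
                lp += math.log((counts[w] + 1) / (total + len(self.vocab)))
            return lp
        return max(self.label_counts, key=log_prob)
```

Trained on a handful of labeled Uchaguzi-style reports, such a model assigns each incoming report to the most probable topic, which is exactly the tagging task the study automated at scale.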
My colleague Hemant Purohit at QCRI has been working with us on automatically extracting needs and offers of help posted on Twitter during disasters. When the two-mile-wide EF5 tornado struck Moore, Oklahoma, he immediately began to collect relevant tweets about the tornado’s impact and applied the algorithms he developed at QCRI to extract needs and offers of help.
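Hemant’s classifiers are built with supervised learning; purely for illustration, here is a crude pattern-based stand-in (my own patterns and names, not his algorithms) that separates tweets expressing needs from those expressing offers:

```python
import re

# Hand-written cue phrases; a rough simplification of what trained
# need/offer classifiers learn from labeled disaster tweets.
NEED_RE = re.compile(
    r"\b(need|needs|needed|looking for|require[sd]?|urgently)\b", re.I)
OFFER_RE = re.compile(
    r"\b(offer(?:ing)?|donat(?:e|ing)|can provide|volunteer(?:ing)?|giving away)\b",
    re.I)

def classify_tweet(text):
    """Label a tweet as a 'need', an 'offer', or neither (None)."""
    is_need = bool(NEED_RE.search(text))
    is_offer = bool(OFFER_RE.search(text))
    if is_need and not is_offer:
        return "need"
    if is_offer and not is_need:
        return "offer"
    return None  # ambiguous or irrelevant
```

Matching needs against offers (water requested here, water available there) is the step that turns this extraction into actionable coordination for responders.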