Patrick Meier: Data Mining Wikipedia in Real Time for Disaster Response [or Any Current Event]

Crowd-Sourcing, Data, Geospatial, Governance, Innovation, P2P / Panarchy, Resilience
Patrick Meier

Data Mining Wikipedia in Real Time for Disaster Response

My colleague Fernando Diaz has continued working on an interesting Wikipedia project since he first discussed the idea with me last year. Since Wikipedia is increasingly used to crowdsource live reports on breaking news such as sudden-onset humanitarian crises and disasters, why not mine these pages for structured information relevant to humanitarian response professionals?

In computing-speak, Sequential Update Summarization is a task that generates useful, new and timely sentence-length updates about a developing event such as a disaster. In contrast, Value Tracking tracks the value of important event-related attributes such as fatalities and financial impact. Fernando and his colleagues will be using both approaches to mine and analyze Wikipedia pages in real time. Other attributes worth tracking include injuries, number of displaced individuals, infrastructure damage and perhaps disease outbreaks. Pictures of the disaster uploaded to a given Wikipedia page may also be of interest to humanitarians, along with meta-data such as the number of edits made to a page per minute or hour and the number of unique editors.
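The page meta-data mentioned above (edits per hour and number of unique editors) could be tallied roughly as follows. This is only a minimal sketch, not the method Fernando and his colleagues use: the `edit_activity` function, the (timestamp, editor) input format and the sample records are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime


def edit_activity(revisions):
    """Return (edits per hour, number of unique editors).

    `revisions` is a hypothetical list of (timestamp, editor) pairs,
    with timestamps written like "2013-05-20T20:05:00Z".
    """
    per_hour = Counter()
    editors = set()
    for timestamp, editor in revisions:
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        # Bucket each edit by the hour in which it was made.
        per_hour[when.strftime("%Y-%m-%d %H:00")] += 1
        editors.add(editor)
    return dict(per_hour), len(editors)


# Invented sample records, for illustration only.
sample = [
    ("2013-05-20T20:05:00Z", "EditorA"),
    ("2013-05-20T20:40:00Z", "EditorB"),
    ("2013-05-20T21:10:00Z", "EditorA"),
]
per_hour, unique_editors = edit_activity(sample)
```

A sudden jump in edits per hour, or in the number of distinct editors, is the kind of signal that could flag a page as covering a developing event.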


Fernando and his colleagues have recently launched this tech challenge to apply these two advanced computing techniques to disaster response based on crowdsourced Wikipedia articles. The challenge is part of the Text Retrieval Conference (TREC), which is being held in Maryland this November. As part of this applied research and prototyping challenge, Fernando et al. plan to use the resulting summarization and value tracking from Wikipedia to verify related crisis information shared on social media. Needless to say, I’m really excited about the potential. So Fernando and I are exploring ways to ensure that the results of this challenge are appropriately transferred to the humanitarian community. Stay tuned for updates.

See also: Web App Tracks Breaking News Using Wikipedia Edits [Link]

Patrick Meier: Could Lonely Planet Render World Bank Projects More Transparent?

Access, Crowd-Sourcing, Geospatial, Innovation, Resilience
Patrick Meier

Could Lonely Planet Render World Bank Projects More Transparent?

That was the unexpected question that my World Bank colleague Johannes Kiess asked me the other day. I was immediately intrigued. So I did some preliminary research and offered to write up a blog post on the idea to solicit some early feedback. According to recent statistics, international tourist arrivals numbered over 1 billion in 2012 alone. Of this population, the demographic that Johannes is interested in comprises those intrepid and socially-conscious backpackers who travel beyond the capitals of developing countries. Perhaps the time is ripe for a new form of tourism: Tourism for Social Good.

There may be a real opportunity to engage a large crowd because travelers, and in particular the backpacker type, are smartphone savvy, have time on their hands, want to do something meaningful, and are eager to get off the beaten track and explore new spaces where others do not typically trek. Johannes believes this approach could be used to map critical social infrastructure and/or to monitor development projects. Consider a simple smartphone app, perhaps integrated with existing travel guide apps or Tripadvisor. The app would ask travelers to record the quality of the roads they take (with the GPS of their smartphone) and provide feedback on the condition, e.g., bumpy, even, etc., every 50 miles or so.

They could be asked to find the nearest hospital and take a geotagged picture—a scavenger hunt for development (as Johannes calls it); Geocaching for Good? Note that governments often do not know exactly where schools, hospitals and roads are located. The app could automatically alert travelers of a nearby development project or road financed by the World Bank or other international donor. Travelers could be prompted to take (automatically geo-tagged) pictures that would then be forwarded to development organizations for subsequent visual analysis (which could easily be carried out using microtasking). Perhaps a very simple, 30-second, multiple-choice survey could even be presented to travelers who pass by certain donor-funded development projects. For quality control purposes, these pictures and surveys could easily be triangulated. Simple gamification features could also be added to the app; travelers could gain points for social good tourism—collect 100 points and get your next Lonely Planet guide for free? Perhaps if you’re the first person to record a road within the app, then it could be named after you (of course with a notation of the official name). Even Photosynth could be used to create panoramas of visual evidence.


Eagle: Will Crushing Student Loans and Worthless College Degrees Politicize the Millennial Generation?

Crowd-Sourcing, Education, Politics
300 Million Talons...

Will Crushing Student Loans and Worthless College Degrees Politicize the Millennial Generation?   (May 31, 2013)

The existing social and financial order is crumbling because it is unsustainable on multiple levels. The central state is not the Millennials' friend, it is their oppressor. No generation of young people is ever politicized by hunger in distant lands or issues of the elderly. It's no rap on youth that self-interest defines what issues have the potential to radically transform their political consciousness; the transformative cause must reveal the system is broken for them and that it intends to sacrifice their generation to uphold the Status Quo.

The Millennial generation, also known as Gen-Y (the generation after Gen-X), is generally defined as those born between 1982 and 2004.

The oldest Millennials were children during the first Iraq War in 1991 (Desert Storm) and just coming of age in 2001 (9/11 and the war in Afghanistan) and the start of the second Iraq War (2003).


The Millennials have entered adulthood in an era characterized by permanent low-intensity wars and central-bank/state managed financial bubbles–2001 to the present. In other words, the only experience they have is of centralized state mismanagement on a global scale.

The gross incompetence of the government and central bank–not to mention the endless power grabs by these centralized authorities–has not yet aroused a political consciousness that the system is irrevocably broken, not just for older generations but most especially for them.

Anecdotally, it appears the Millennial generation is still operating on the fantasy that all they need to do to get a secure, good-paying job and a happy life is go to college and enter the Status Quo machine of government/corporate America.

There are two fatal flaws in this fantasy: the $1+ trillion student loan industry and a transforming economy. The higher education industry in the U.S. operates as a central state-enabled and funded cartel, limiting supply while demand (based on the fantasy that a college degree has critical value) soars. This enables the cartel to keep raising prices even as the value of its product (a diploma) sinks to near-zero.

Read full post with more graphics and links.


Jean Lievens: Global Sharing Day 2 June – the Mind-Shift Begins

Crowd-Sourcing, Culture, Economics/True Cost, P2P / Panarchy, Resilience
Jean Lievens


Patrick Meier: Analysis of Multimedia Shared on Twitter After Tornado — Instagram Rules

Crowd-Sourcing, Geospatial, P2P / Panarchy, Resilience, Software, Transparency
Patrick Meier

Analysis of Multimedia Shared on Twitter After Tornado

Humanitarian organizations and emergency management offices are increasingly interested in capturing multimedia content shared on social media during crises. Last year, for example, the UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) to identify and geotag pictures and videos shared on Twitter that captured the damage caused by Typhoon Pablo. So I’ve been collaborating closely with my colleague Hemant Purohit to analyze the multimedia content shared in millions of tweets posted after the EF5 tornado devastated the city of Moore, Oklahoma on May 20th. The results are shared below along with details of a project I am spearheading at QCRI to provide disaster responders with relevant multimedia content in real time during future disasters.


For this preliminary multimedia analysis, we focused on the first 48 hours after the Tornado and specifically on the following multimedia sources/types: Twitpic, Instagram, Flickr, JPGs, YouTube and Vimeo. JPGs refers to URLs shared on Twitter that include “.jpg”. Only ~1% of tweets posted during the 2-day period included URLs to multimedia content. We filtered out duplicate URLs to produce the following unique counts depicted above and listed below.

  • Twitpic = 784
  • Instagram = 11,822
  • Flickr = 33
  • JPGs = 347 
  • YouTube = 5,474
  • Vimeo = 88
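The counting step described above (group shared URLs by multimedia source, then drop duplicates) can be sketched as follows. This is an illustrative sketch rather than the pipeline we actually ran: the host-to-source mapping in `HOSTS`, the function names and the example URLs are assumptions. Only the ".jpg" rule comes from the description above.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed mapping from URL host to multimedia source (illustrative only).
HOSTS = {
    "twitpic.com": "Twitpic",
    "instagram.com": "Instagram",
    "flickr.com": "Flickr",
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
    "vimeo.com": "Vimeo",
}


def classify(url):
    """Return the multimedia source for a URL, or None if it is not multimedia."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in HOSTS:
        return HOSTS[host]
    # "JPGs" are any other URLs that include ".jpg".
    if ".jpg" in url.lower():
        return "JPGs"
    return None


def unique_counts(urls):
    """Count unique URLs per source, ignoring exact duplicate URLs."""
    seen = defaultdict(set)
    for url in urls:
        kind = classify(url)
        if kind is not None:
            seen[kind].add(url)
    return {kind: len(group) for kind, group in seen.items()}


# Made-up URLs, for illustration only.
urls = [
    "http://instagram.com/p/abc/",
    "http://instagram.com/p/abc/",
    "http://www.youtube.com/watch?v=x1",
    "http://example.com/photo.jpg",
    "http://example.com/page",
]
counts = unique_counts(urls)
```

Note that this sketch only removes exact duplicate URLs. Shortened links that point to the same picture or video would need to be expanded first to be counted once.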

Clearly, Instagram and YouTube are important sources of multimedia content during disasters. The graphs below (click to enlarge) depict the frequency of individual multimedia types by hour during the first 48 hours after the Tornado. Note that we were only able to collect about 2 million tweets during this period using the Twitter Streaming API but expect that millions more were posted, which is why access to the Twitter Firehose is important and why I’m a strong advocate of Big Data Philanthropy for Humanitarian Response.

Read full post with more graphs.

Patrick Meier: Crowdsourcing Syrian Crisis via Twitter API

Crowd-Sourcing, Innovation, Knowledge
Patrick Meier

Crowdsourcing Crisis Information from Syria: Twitter API vs Firehose

Over 400 million tweets are posted every day. But accessing 100% of these tweets (say for disaster response purposes) requires access to Twitter’s “Firehose”. The latter, however, can be prohibitively expensive and also requires serious infrastructure to manage. This explains why many (all?) of us in the Crisis Computing & Humanitarian Technology space use Twitter’s “Streaming API” instead. But how representative are tweets sampled through the API vis-a-vis overall activity on Twitter? This important question is posed and answered in this new study, which used Syria as a case study.

The analysis focused on “Tweets collected in the region around Syria during the period from December 14, 2011 to January 10, 2012.” The first dataset was collected using Firehose access while the second was sampled from the API. The tag clouds above (click to enlarge) display the most frequent terms found in each dataset. The hashtags and geoboxes used for the data collection are listed in the table below.


. . . . .

In terms of social network analysis, the authors were able to show that “50% to 60% of the top 100 key-players [can be identified] when creating the networks based on one day of Streaming API data.” Aggregating more days’ worth of data “can increase the accuracy substantially. For network level measures, first in-depth analysis revealed interesting correlation between network centralization indexes and the proportion of data covered by the Streaming API.”
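To make the key-player comparison concrete, here is a rough sketch of how the overlap between two top-k lists could be measured. It is not the study's method: it assumes key players are ranked by simple degree (the number of mention or retweet edges a user appears in) as a stand-in for whatever centrality measures the authors used, and the function names and toy edge lists are made up for illustration.

```python
from collections import Counter


def top_players(edges, k):
    """Return the k users with the highest degree in an edge list.

    Degree is an assumed stand-in for the centrality measures in the study.
    Ties are broken alphabetically so the result is deterministic.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    ranked = sorted(degree, key=lambda user: (-degree[user], user))
    return set(ranked[:k])


def top_k_overlap(full_edges, sample_edges, k):
    """Fraction of the full network's top-k users also in the sample's top-k."""
    full_top = top_players(full_edges, k)
    sample_top = top_players(sample_edges, k)
    return len(full_top & sample_top) / k


# Toy edge lists standing in for Firehose and Streaming API networks.
firehose = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("e", "f")]
streaming = [("a", "b"), ("c", "d"), ("c", "e")]
overlap = top_k_overlap(firehose, streaming, 2)
```

With k set to 100 and one day of real data in each edge list, a result between 0.5 and 0.6 would correspond to the range quoted above.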

Read full post with graphs and other links.
