Filtering—or the lack thereof—presented the single biggest challenge when we tested MicroMappers last week in response to the Pakistan Earthquake. As my colleague Clay Shirky notes, the challenge with “Big Data” is not information overload but rather filter failure. We need to make damned sure that we don’t experience filter failure again in future deployments. To ensure this, I’ve decided to launch a stand-alone and fully interoperable platform called MicroFilters. My colleague Andrew Ilyas will lead the technical development of the platform with support from Ji Lucas. Our plan is to launch the first version of MicroFilters before the CrisisMappers conference (ICCM 2013) in November.
A web-based solution, MicroFilters will allow users to upload their own Twitter data for automatic filtering purposes. Users will have the option of uploading this data using three different formats: text, CSV and JSON. Once uploaded, users can elect to perform one or more automatic filtering tasks from this menu of options:
[ ] Filter out retweets
[ ] Filter for unique tweets
[ ] Filter tweets by language [English | Other | All]
[ ] Filter for unique image links posted in tweets [Small | Medium | Large | All]
[ ] Filter for unique video links posted in tweets [Short | Medium | Long | All]
[ ] Filter for unique image links in news articles posted in tweets [S | M | L | All]
[ ] Filter for unique video links in news articles posted in tweets [S | M | L | All]
Note that “unique image and video links” refer to the long URLs not shortened URLs like bit.ly. After selecting the desired filtering option(s), the user simply clicks on the “Filter” button. Once the filtering is completed (a countdown clock is displayed to inform the user of the expected processing time), MicroFilters provides the user with a download link for the filtered results. The link remains live for 10 minutes after which the data is automatically deleted. If a CSV file was uploaded for filtering, the file format for download is also in CSV format; likewise for text and JSON files. Note that filtered tweets will appear in reverse chronological order (assuming time-stamp data was included in the uploaded file) when downloaded. The resulting file of filtered tweets can then be uploaded to MicroMappers within seconds.
In sum, MicroFilters will be invaluable for future deployments of MicroMappers. Solving the “filter failure” problem will enable digital humanitarians to process far more relevant data and in a more timely manner. Since MicroFilters will be a standalone platform, anyone else will also have access to these free and automatic filtering services. In the meantime, however, we very much welcome feedback, suggestions and offers of help, thank you!