Kevjn Lim is an independent writer and Middle East foreign policy analyst at Open Briefing: The Civil Society Intelligence Agency. In the latter half of 2013, he was Turkey representative for the Syria Needs Analysis Project (SNAP), covering the Syrian crisis in the northern governorates and Turkey. From 2007 to 2011, he served as a delegate with the International Committee of the Red Cross (ICRC) in the Palestinian Territories, Sudan’s Darfur region, Iraq, Gaddafi’s Libya and Afghanistan. He also taught modern languages (Hebrew, Arabic, French and Spanish) at both Trinity and Queen’s Colleges, University of Melbourne (2004-2006), and served as an intelligence officer with the Singapore Armed Forces (2001-2004). Kevjn holds a BA and an Honours Degree (First Class) in political science and Jewish-Islamic studies from the University of Melbourne. He is currently doing postgraduate work in strategic studies. Among other languages, he is fluent in Arabic, Hebrew and Persian and speaks some Turkish. He is currently based in the Middle East.
I am familiar with your critique of the existing collection and analysis cycle, but I didn't know that even at this stage the US intelligence community doesn't capitalize on big data per se, or they don't do it in any efficient manner (how about NSA's PRISM? I know that some federal organizations are using software like Geofeedia to eavesdrop on socmint traffic).
STEELE: NSA appears to process less than 1% of what it collects — in high priority target domains, perhaps 5% on focused elements. This is consistent with all other “big data” collections. While NSA is very good at precision interception for specific purposes (e.g. supporting insider trading to create off-budget disposable income) it is largely worthless when it comes to global situational awareness, anomaly detection, real time warning, and pattern analysis across all mission areas. My views are based on my own public reading, not on classified sources and methods. The US “system” is designed to throw money at technology for collection, and is not held accountable for failing to process what it collects.
I suppose then, my interest would be in understanding how big data might fit effectively into the current intelligence structure, for strategic level intelligence, in terms of:
1) methodology (I argue for big data's use in generating hypotheses, and then refuting them – not corroborating)
STEELE: Certainly I agree that big data should be used to detect patterns and anomalies that can then be investigated more rigorously with applied human intelligence and expertise. A MAJOR problem we have today with the crappy analytic tools that we have been given is that the data goes into the software never to be seen again, and we lose the “picture” that we used to have when we were plotting data on a map with acetate. Big data today is divorced from reality — it is based on old collection models, old mind-sets, and very narrow incomplete objectives that are generally not supported by the existing information technology infrastructure (rotten feeds and speeds, no residual real time processing power).
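The idea of using big data to surface anomalies that humans then investigate can be illustrated with a minimal sketch. It assumes a simple z-score rule over a made-up series of daily report counts; the function name `flag_anomalies`, the threshold, and the data are all hypothetical and are not drawn from any tool mentioned in this interview. The output is a list of candidates for human review, not a conclusion.

```python
from statistics import mean, stdev

def flag_anomalies(counts, threshold=2.0):
    """Return indices of observations whose z-score exceeds the threshold.

    The flagged indices are hypotheses for an analyst to test and, where
    possible, refute. They are not findings in themselves.
    """
    mu = mean(counts)
    sigma = stdev(counts)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical daily report counts for one location (illustrative only).
daily_reports = [12, 14, 11, 13, 12, 15, 13, 48, 12, 14]
candidates = flag_anomalies(daily_reports)
print(candidates)  # indices handed to a human analyst
```

A rule this crude will flag noise as well as signal, which is the point of pairing it with human expertise rather than treating it as an answer.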
In 1994 I was invited by the National Research Council (NRC) to comment on the US Army's multi-billion dollar future communications architecture. The title of my briefing to them was “DATA MINING: Don't Buy or Build Your Shovel Until You Know What You're Digging Into.”
The key point here is that “big data” as it exists today suffers from multiple sucking chest wounds, among which are a) indiscriminate collection in the digital arena with generally no collection in the analog and human arena; b) absence of analytic model(s) to focus collection on what matters and to identify gaps where gap-based iterative collection is needed; c) complete absence of geospatial attributes for all big data (less consumer behavior that is tagged by store location and zip code); and d) absence of a holistic analytic global game that allows all information in all languages — and all human minds engaged with that information — to operate at exoscale.
2) its place in the intelligence cycle or process
STEELE: The old intelligence cycle is retarded, a holdover from the Weberian bureaucracy era of stovepipes and linear development. In South Africa in 1997 I listened to the director of the National Intelligence Agency talk about a DNA spiral, and many years later Mike Hayden, then director of NSA, realized that we have to make sense of data at the point of ingestion, not just at the finished intelligence production level. His chart, as radically improved by Detective Constable Steve Edwards of Scotland Yard and myself, can be seen here to the side, followed by my concept for a new intelligence cycle that is pervasive, constant, and self-referencing (reflexive). The signal problem we have is that we do not collect everything on everything (e.g. poverty, infectious disease, environmental degradation) and we do not process anything on anything. Here below is my model for 21st Century intelligence (decision-support); see also the two chapters and two posts below the graphic.
3) how it should fit with/complement the subject matter expert (and even game theory),
STEELE: Subject matter experts (SMEs) are generally not expert, and the mandarins of various organizations always make the mistake of having a few favorites and taking them at face value. Unless you are harvesting the knowledge of all SMEs, and of local observers who are not SMEs but have local knowledge, you are not getting the whole picture.
It is of course insane to try to understand any topic using only one language or even a few. My standard today is 33 languages; it was never less than 29. For any given country, one must master both the published and the unpublished human understanding in both indigenous and adjacent country languages. This is the minimum. Add to that great power and criminal network languages (i.e. all stakeholders) and you have a foundation for actually learning in a holistic manner.
I particularly like the work by David Weinberger in this regard, see for instance his two books as I have reviewed them.
I continue to be stunned by the fact that neither the secret world nor the open world (academics, corporations, NGOs) does citation analytics, nor does either make full use of information brokers, private investigators, and graduate students in the nearest relevant academic departments. At the same time the various academic disciplines and the various industries and the various government departments are all isolated from one another. No one brings it all together, in part because most so-called professional intelligence managers have no idea what they are missing.
Here is one graphic of how badly fragmented the sciences are — the situation is much worse in the social sciences especially when cultural, ethnic, historical, and linguistic nuances are added.
Game theory in my lay view is a waste of time and totally wrong. I completely agree with Elinor Ostrom when she trashes game theory in her book Governing the Commons. In my experience most game theory is half-assed, and most games are childish empty self-fulfilling daydreams. Serious Games have come a long way but they are isolated from one another and lack inter-operability standards. Medard Gabel, co-creator with Buckminster Fuller of the analog World Game, created at my expense a budget and plan for creating a digital EarthGame(TM) that would properly factor in all threats, all policies, all costs, all relationships, and be infinitely scalable and of course comprehensive across all mission areas and all locations. That is where we need to be, rooted in ground truth with sparse matrix of all forms of data in all languages, all marked up for a geospatial foundation that can also be viewed across time.
4) and how it can respond to strategic intelligence requirements (war, peace, instability etc),
STEELE: In 1976 I did the original model on predicting revolution across political-military, socio-economic, ideo-cultural, techno-demographic, and natural-geographic domains. While humble, the model has not been equaled to this day. The fact is that as long as we do not hold governments, banks, corporations, and NGOs accountable for outcomes, we will continue to privatize profit and externalize true costs to future generations. There is no lack of intelligence, in the sense of understanding, of what is needed; what is lacking is integrity. Here is the matrix, with revolutionary conditions in the USA (and most countries in the West or led by dictators) shown in red.
For example, it is clearly documented by the United Nations in its 2004 report, A More Secure World: Our Shared Responsibility, that the ten high-level threats to humanity that need to be addressed in order to create a prosperous world at peace are these, in this order:
01 Poverty
02 Infectious Disease
03 Environmental Degradation
04 Inter-State Conflict
05 Civil War
06 Genocide
07 Other Atrocities
08 Proliferation
09 Terrorism
10 Transnational Crime
As Medard Gabel's graphic shown to the side makes clear, for one third of what we spend on war, we could create a prosperous world at peace. The problem is that war and pestilence are profitable for the 1% and create concentrated wealth, whereas peace is vastly more profitable for the 99% and creates vastly more wealth, but that wealth is not concentrated. Hence, our signal challenge is what some call the pedagogy of freedom: we have to mobilize the masses so they both understand the details and can act as a whole in their best interests, as a community preserving the commons. Virtually everything about how we do secret intelligence today is wrong, because it is based on the wrong assumptions and the wrong objectives. I address this in my two most recent chapters.
What this all boils down to are the words integrity and legitimacy. Absent integrity, no organization can be trusted to get it right. Absent legitimacy, no government or organization can expect to be sustainable. Inevitably nature and the public bat last, and the government will fall. I have reviewed many books along these lines; see the lists of lists at Book Lists & Reviews.
In terms of big data, the three greatest errors that I can see are:
01 No holistic analytic model and no commitment to true cost big data; the latter is huge. If you do not understand the role of water in detail (for example, in the Middle East Israel is stealing all the water from the Jordanian aquifer with long underground pipes that cross the subterranean border with impunity), you cannot truly appreciate the looming catastrophe for Lebanon, Jordan, and others.
02 No appreciation of the fact that any model and any data ecology must include all eight of the information tribes as collectors, producers, and consumers of SHARED big data.
03 No appreciation of the fact that most monitoring and most help desk functions should be totally transparent and equally accessible by all eight tribes as well as the public, which is a phenomenal gap filler. Without an engaged educated public, big data is a phantom concept.
5) How, in tandem with greater emphasis on OSINT, it militates for the shift towards an Open Society.
STEELE: Big Data right now is a variation of Silicon Valley snake oil, and/or the cult of secrecy: “if you only knew what I know.” It is another form of information asymmetry, in which the ignorant lie to the ignorant, and decisions are made on the basis of fluff and bias, not on the basis of facts. OSINT has been totally corrupted in the past quarter century since I started this fight in 1988. Instead of SMEs and local knowledge on demand, we have US citizens with clearances, “butts in seats,” and hundreds of millions if not billions wasted on corporate overhead. People are not being held accountable for ethical evidence-based decision-support. Lies, and the failure of due diligence, kill our comrades in arms and dishonor the public we are supposed to be serving.
Open Society demands an ecology of opens such as I addressed in my recent keynote to the Liberation Technology NYC offering during Internet Week in NYC. As long as Big Data remains a very expensive proposition rooted in flawed premises and dishonored in relation to data analytics, Big Data is simply another arrow in the quiver of the 1%, and hence toxic to the public. IF, HOWEVER, the public can begin to merge the concepts of Neighborhood Watch, holistic analytics, outreach, true cost economics, and all the other possibilities, then we are rapidly approaching the day when the public can put an entire corporation out of business overnight. Microsoft Office and Coca Cola come to mind, along with Wal-Mart.
6) Big data is meant to increase analysis time. But will it encumber it instead, because of analytical incompetence (the US Navy collects 200 terabytes every 48 hours, but what percentage of all that can it interpret and analyze)?
STEELE: Less than 1% of the data is analyzed. This has been documented across the board, not just in the secret world. The pipes are not designed for big data, nor will they ever be as long as we fail to treat dark fiber as a public good. I would also emphasize that the human brain is a million times more power efficient than any computer, and that after a quarter century and a trillion dollars, NSA computers are still very disappointing.
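To put the figures in this exchange side by side, here is a back-of-envelope calculation. It takes the 200 terabytes per 48 hours from the question and treats "less than 1%" as an upper bound of exactly 1%; decimal terabytes (10^12 bytes) are an assumption, as are all the variable names.

```python
# Figures from the interview: 200 TB collected every 48 hours, under 1% analyzed.
collected_tb = 200
window_hours = 48
analyzed_percent = 1  # upper bound; the interview says "less than 1%"

# Assumption: decimal units, 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
ingest_rate_gb_per_s = collected_tb * 10**12 / (window_hours * 3600) / 10**9
analyzed_tb = collected_tb * analyzed_percent / 100
unanalyzed_tb = collected_tb - analyzed_tb

print(f"sustained ingest: {ingest_rate_gb_per_s:.2f} GB/s")
print(f"analyzed at most: {analyzed_tb:.0f} TB, left untouched: {unanalyzed_tb:.0f} TB")
```

On these assumptions, each 48-hour window leaves at least 198 of the 200 terabytes unexamined.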
It is also a fact that no one (IBM, Microsoft, Oracle, and Google in particular) has done anything truly worthwhile in the all-source analytic domain. Some of us pioneered the generic requirements in the 1980s and we still do not have the fundamentals in place: we don't have the data collection, we don't have the geospatial tagging, we don't have the back office or desktop analytic tool-kits.
I make additional comments in my review of Analytics in a Big Data World: The Essential Guide to Data Science and its Applications; the views of myself and others are also offered below. Among several gems in this book is the following link to polls and tables, http://www.kdnuggets.com/polls/2013/, within which the following are particularly noteworthy:
We've known all this since 1985-1989. In 1992 Andy Shepard pointed out that analysts at CIA were spending 40% of their time finding data and 40% of their time responding to editing demands from across a management spectrum that ranged from political to timid to ignorant. Only 20% of the analysts' time was spent in thinking and crafting the analytic report.
Hopefully other governments do not make the mistake of hiring children and military retirees as their analytic cadre — “hire to payroll” is code for get them cheap and assume they will be successful because they have access to secrets. Nothing could be further from the truth.
Big data does not exist in the secret world or in the US Government — if it did the Office of Management and Budget (OMB) would be easily identifying the 50% of the government that is waste. Data mining does work — my colleague Dr. Bert Little routinely provides a 20 to 1 return on investment — $4M will identify and eliminate $80 million in crop insurance fraud — and others have identified tens of billions of dollars in import-export tax fraud ($100 pencils imported from China, $300 rocket engines exported to Brazil) and so on. The fact is that there is no constituency in the USA for public intelligence (decision-support) in the public interest so no one does this.
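The return-on-investment figure quoted above can be checked with simple arithmetic. The sketch uses only the two dollar amounts from the text; the variable names are illustrative.

```python
# Figures quoted in the interview for crop insurance fraud detection.
investment_usd = 4_000_000
fraud_eliminated_usd = 80_000_000

# Gross return per dollar spent, and the net saving after the cost of the work.
return_ratio = fraud_eliminated_usd / investment_usd
net_saving_usd = fraud_eliminated_usd - investment_usd

print(f"{return_ratio:.0f} to 1 return, net saving ${net_saving_usd:,}")
```

The two figures are consistent with the stated 20 to 1 ratio, and the net saving after the cost of the analysis is $76 million.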
This leads to one final thought: I believe we need to move toward two levels of big data analytics. At one level, the analyst should be able to access and exploit all data for their particular small pond, be it migratory fish or pirates or whatever. At another level, the level of the EarthGame or Global Game, we need to be able to do meta big data, with algorithms showing that when the big data for a small pond changes by X, this will impact another pond by Y. Water acidity, for example, affects fish fertility. Right now we are children stumbling around in the dark. Creating a Smart Nation demands the mature integration and nurturing of education, intelligence, and research in a holistic manner that is rooted in absolute integrity.
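The "meta big data" level described above, where a change of X in one pond produces a change of Y in another, might be sketched as a table of cross-pond couplings that is followed outward from a changed indicator. Everything here is hypothetical: the indicator names, the coefficients, the linear first-order model, and the `propagate` function are illustrative assumptions, not part of any EarthGame design.

```python
# Hypothetical coupling coefficients: a unit change in the source indicator
# shifts the target indicator by this amount. Values are illustrative only.
COUPLINGS = {
    ("water_acidity", "fish_fertility"): -0.8,
    ("fish_fertility", "fishery_income"): 0.5,
}

def propagate(source, delta, couplings, max_depth=3):
    """Follow couplings outward from one changed indicator.

    Assumes each effect is linear and additive. max_depth bounds the walk
    so that circular couplings cannot loop forever.
    """
    effects = {}
    frontier = [(source, delta, 0)]
    while frontier:
        name, change, depth = frontier.pop()
        if depth >= max_depth:
            continue
        for (src, dst), coeff in couplings.items():
            if src == name:
                impact = change * coeff
                effects[dst] = effects.get(dst, 0.0) + impact
                frontier.append((dst, impact, depth + 1))
    return effects

# A one-unit rise in water acidity, traced through the downstream ponds.
effects = propagate("water_acidity", 1.0, COUPLINGS)
print(effects)
```

In practice the coefficients would have to come from domain experts and observed data for each pond, which is where the first level of analytics feeds the second.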
If there is one area where I have become ruthless in the past ten years, that area is counterintelligence. Traitors, the corrupt, and the persistently stupid need to be eradicated from government operations. IMHO.