Bottom line up front: a superb book and a truly great overview that is easily understandable to me except for the fraction of the book that is math.

Right away I like the structure of the book in relation to analytics. Use Amazon’s Inside the Book feature to see that structure. I also appreciate the clarity and integrity the author demonstrates in touching on major obstacles to big data analytics, among which are past biases in policy and collection and the absence of critical values needed to test NEW hypotheses. The author is brutal in a low-key manner (which is to say, very professional) in evaluating the different types of data streams and the problems with each of them. Getting the raw data is a challenge; cleaning that data is a greater challenge; making sense of Swiss cheese data with a host of underlying intellectual cancers is the greatest challenge of all.

The graphics and tables are superb, and they were the first thing that confirmed for me that this is a very good book. Four in particular are worthy of mention in this quick overview:

Figure 1.1 Size of Data Sets Analyzed (95% in Terabyte or less size, only 5% above 101 Terabytes toward Over 100 Petabytes)

Table 1.1 Example Analytics Applications across marketing, risk management, government, web, logistics, and other

Table 7.2 Commercial and Open Source Tools Used in Analytics in 2012-2013

Table 7.10 Data Quality Dimensions

What I do not see in this book is environmental or precautionary principle risk, true cost economics, or the challenges of dealing with big data sets that cannot be moved over our impoverished pipes (10-100MB) or analyzed as a whole beyond petabytes toward exascale computing. This is “the” book for quickly understanding big data analytics focused exclusively on consumer behavior. I would love to see the author (or the excellent Wiley series) take on the missing pieces.

In a closing observation I would express concern about the lack of cyber-infrastructure in the USA: all the carrier-installed fiber is on its last legs, and the emergent dark (carrier-neutral) fiber marketplace is on thin ice because most managers have no idea how much better off they are going direct to dark fiber. I am equally concerned about the lack of analytic centers with a sufficiency of data storage and processing necessary to do petabyte-level calculations. More often than not “big data” is a term that is simply not understood by managers or even their senior IT folks. To make the point, a petabyte of data would take three years to send along one of today’s existing pipes, and could not be processed at all in most data centers. As we move toward the promise of exascale computing, I consider it essential that our leaders across all mission areas learn what is really required in the way of investment, consistency, and integrity. Of course I would urge them to consider embracing the open source everything manifesto, since open source is the only form of information technology that is affordable, interoperable, and scalable.
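
For readers who want to sanity-check the three-year figure for a petabyte, here is a minimal back-of-the-envelope sketch. It rests on my own assumptions, not on anything stated in the book: that the "10-100MB" pipes mentioned earlier mean a sustained 10 to 100 megabytes per second, that a petabyte is 10^15 bytes, and that the link runs flat out with no overhead. The function name and constants are purely illustrative.

```python
# Rough transfer-time estimate for one petabyte.
# Assumptions (not from the book): decimal units, a fully saturated link,
# and "10-100MB" read as 10 to 100 megabytes per second.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
PETABYTE_BYTES = 10**15
MEGABYTE_BYTES = 10**6


def transfer_years(size_bytes, rate_bytes_per_second):
    """Return the years needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for rate_mb in (10, 100):
        years = transfer_years(PETABYTE_BYTES, rate_mb * MEGABYTE_BYTES)
        print(f"{rate_mb} MB/s: {years:.2f} years")
```

Under those assumptions the slow end of the range comes out a little over three years, which is in line with the figure above, while the fast end still takes several months.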

