Stephen E. Arnold: What Is a Big Data Lake?

IO Impotency
Stephen E. Arnold

Chasing Non Swimmers from the Data Lake

If you are one of the Big Data believers, you will find “Clearing Up Muddied Waters in the ‘Data Lakes’” a reminder about the plasticity of concepts and their connotations. The write up addresses a clever phrase used to describe a storage pool into which

You store raw data at its most granular level so that you can perform any ad-hoc aggregation at any time. The classic data warehouse and data mart approaches do not support this.
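To make the quoted idea concrete, here is a minimal sketch, not anything from the cited write up. The event records, field names, and the `aggregate` helper are all hypothetical, invented for illustration. It shows why keeping the raw, granular rows matters: a pre-built summary answers only the question it was built for, while the raw rows can be regrouped along any field after the fact.

```python
from collections import defaultdict

# Hypothetical raw events kept at the finest grain (one row per sale).
# All field names and values are invented for illustration.
RAW_EVENTS = [
    {"day": "2014-11-20", "region": "east", "product": "a", "amount": 10.0},
    {"day": "2014-11-20", "region": "west", "product": "b", "amount": 5.0},
    {"day": "2014-11-21", "region": "east", "product": "b", "amount": 7.5},
    {"day": "2014-11-21", "region": "east", "product": "a", "amount": 2.5},
]


def aggregate(events, key):
    """Sum `amount` grouped by whichever field is chosen at query time."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += event["amount"]
    return dict(totals)


# Data mart style: one summary decided in advance. If only this table
# were kept, the per-product and per-day detail would be gone.
MART_BY_REGION = aggregate(RAW_EVENTS, "region")

# Data lake style: the raw rows are still there, so a question nobody
# planned for can be answered later.
by_product = aggregate(RAW_EVENTS, "product")
by_day = aggregate(RAW_EVENTS, "day")

print(MART_BY_REGION)
print(by_product)
print(by_day)
```

The sketch trades storage for flexibility: nothing is thrown away at load time, so the grouping decision moves from the person who built the store to the person asking the question.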

The write up points out that the original notion of a data lake has been prodded, stretched, and pulled. Not surprisingly, after the verbal chiropractic, data lake is just not its old self.

Who are the perpetrators of this conceptual improvement? A “real” journalist and—no big surprise—several Big Data experts laboring away at a mid tier consulting firm.

So what? The coiner of the phrase points me and other readers to the original write up about data lakes here. Worth revisiting? Are the “real” journalist or the mid tier consultants likely to read the source document? I would guess not.

Stephen E Arnold, November 22, 2014

Phi Beta Iota: CTOs did not get to be CTOs by actually understanding the guts of Information Technology. They are decades removed from having programmed code, and they have little understanding of the totality of the digital and analog worlds, of holistic analytics, of true cost economics (for which databases generally do not exist), or of open source everything engineering and why that matters in terms of affordability, interoperability at the code and datum levels, and of course scale, crossing all boundaries and borders. The myths and malpractice that most CTOs accept about Big Data are quite astonishing. Below is the key quote from James Dixon's 2010 blog post that created the Data Lake concept (which is as far removed from Oracle or any other structured “big data” repository as one can get):

If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

See Also:

Analytics @ Phi Beta Iota

Big Data @ Phi Beta Iota

Open Source @ Phi Beta Iota

True Cost @ Phi Beta Iota

 

Yoda: CIOs Need to Be Digital Leaders?

IO Impotency
Got Crowd? BE the Force!

Big leap!

CIOs Need to Become Digital Leaders

Most CIOs know they need to become digital leaders, but don’t feel that they are there yet, according to Gartner’s study of 2,300 CIOs. Gartner refers to this shift toward digitalization as the “third era” of enterprise IT.

Phi Beta Iota: Worth a complete read. One quarter of the “IT” budget is outside of CIO control (probably more, since they are not counting manufacturing processes that should be migrating to open source everything engineering). Open source is all the rage, but no one, anywhere, has conceptualized the mix of open geo, open data, and open tools that is needed to unleash the full potential of all of us.

See Also:

Analytics . Big Data . OSE Spanish . Open Source . True Cost

Stephen E. Arnold: Court Rules for Google on Corrupt Search — Precision, Recall, Relevance are “Irrelevant” — Paid Outcomes are “Legal”

Corruption, Idiocy, IO Impotency, Law Enforcement
Stephen E. Arnold

Google Free and Clear to Rank Search Results Any Way It Wants

Well, bad news for those who want to force Google to modify the order in which search results appear. If I understand “Court Rules Google Can Arrange Search Results Any Way It Wants,” relevance is what Google wants. Period.

Continue reading “Stephen E. Arnold: Court Rules for Google on Corrupt Search — Precision, Recall, Relevance are “Irrelevant” — Paid Outcomes are “Legal””

Stephen E. Arnold: Amazon (and CIA) Breaking Oracle — the Surveillance State Goes Open Source

Data, IO Impotency
Stephen E. Arnold

Amazon and Oracle: The Love Affair Ends

I recall turning in a report about Amazon’s use of Oracle as its core database. The client, a bank type operation, was delighted that zippy Amazon had the common sense to use a name brand database. For the bank types, recognizable names used to be indicators of wise technological decisions.

I read “Amazon: DROP DATABASE Oracle; INSERT Our New Fast Cheap MySQL Clone.” Assuming the write up is spot on, Amazon and Oracle have fallen out of love, or at least the beefy payments from Amazon for the sort of old Oracle data management system are ending. This comment becomes quite interesting to me:

Continue reading “Stephen E. Arnold: Amazon (and CIA) Breaking Oracle — the Surveillance State Goes Open Source”

Stephen E. Arnold: Enterprise Search Fails to Meet Legal Standards

IO Impotency
Stephen E. Arnold

Information Governance Standards Group Suggests Caution in Approaching eDiscovery

The records management group ARMA International weighs in about search with an article in its Information Management magazine: “Enterprise Search vs E-Discovery Search: Same or Different?” The short answer, not surprisingly, is “different.” Writer Kamal Shah explains:

“To date, most enterprises have used the same search technologies for both tasks. However, a recent trend among large and small enterprises suggests that a significant divergence is occurring between enterprise searches and e-discovery searches. Both start by entering a search term in a search box, but that’s where the similarities end. The business requirements are different and, as a result, each needs different capabilities.”

The article goes on to elaborate on the reasons traditional enterprise search is not sufficient for most eDiscovery needs.

Continue reading “Stephen E. Arnold: Enterprise Search Fails to Meet Legal Standards”

Stephen E. Arnold: Is LinkedIn Replacing Unethical Expensive Ineffective Media and Academic Archives??

IO Impotency
Stephen E. Arnold

Why Traditional Print and Database Publishers Are in Even More Trouble Than Thought

EXTRACT

This means that LinkedIn may benefit from “real” newspapers and magazines charging for inclusions. As LinkedIn’s audience grows, it, not the publishers or the intermediating database folks, will get the big paydays necessary to live high on the hog.

Read full post.
