Wednesday, January 08, 2014

Big Data and Antifragile Data Handlers

The problem:

A key problem of Big Data can be described as follows: in the old days, data was represented through highly structured, well-defined schemas. Nowadays, data comes from many sources, and pre-defined schemas increasingly cannot represent its meaning or capture its value as well as they used to.
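To make this concrete, here is a minimal, hypothetical sketch (field names invented for illustration): a handler built around a pre-defined schema has nowhere to put the fields that newer sources send, so it silently drops the value they carry.

```python
# Hypothetical illustration: a pre-defined schema meets free-form input.
FIXED_SCHEMA = ("user_id", "timestamp", "amount")  # agreed upon long ago

def load_with_fixed_schema(record: dict) -> tuple:
    # Anything outside the schema is silently dropped.
    return tuple(record.get(field) for field in FIXED_SCHEMA)

# A newer source adds context the schema never anticipated.
incoming = {"user_id": 42, "timestamp": "2014-01-08T10:00:00",
            "amount": 9.99, "device": "mobile", "geo": "NL"}

print(load_with_fixed_schema(incoming))
# (42, '2014-01-08T10:00:00', 9.99)  <- 'device' and 'geo' are lost
```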

The solution:

Let's look at this through the lens of fragile and antifragile systems, using the concepts coined by Nassim Nicholas Taleb. A fragile data handler could be described as one that controls data modelling rigidly by dictating how data must be received. An antifragile data handler, by contrast, thrives on the unpredictability of data input and the risks it carries. It assumes that failures to understand data are inevitable, but that each failure is an opportunity to make data handling more intelligent over time.

Our Big Data handler should be antifragile in exactly that sense. We want it to take advantage of all the free-form data out there.
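What could that look like in practice? The following is only a sketch under assumptions of my own (the class name, the quarantine and the promotion rule are all hypothetical): instead of rejecting input it cannot model, the handler keeps it, counts what it failed to understand, and uses those failures to extend its own schema.

```python
from collections import Counter

class AntifragileHandler:
    """Sketch of a handler that treats unmodelled data as a learning signal."""

    def __init__(self, known_fields):
        self.known_fields = set(known_fields)
        self.unknown_field_counts = Counter()  # what we failed to understand
        self.quarantine = []                   # raw records kept for later

    def ingest(self, record: dict) -> dict:
        known = {k: v for k, v in record.items() if k in self.known_fields}
        unknown = [k for k in record if k not in self.known_fields]
        if unknown:
            self.unknown_field_counts.update(unknown)
            self.quarantine.append(record)     # never throw data away
        return known

    def learn(self, promote_threshold: int = 3) -> None:
        # Fields we keep stumbling over are promoted into the schema:
        # the handler becomes more capable because it was stressed.
        for field, count in self.unknown_field_counts.items():
            if count >= promote_threshold:
                self.known_fields.add(field)
```

The point of the sketch is the feedback: every batch of stress leaves the handler able to model a little more of the incoming data than before.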

This leads us to the Big Question. What does it mean for data handler design? We have placed a kind of conceptual marker. Now we have to try and see the shape of the thing.

Taleb identifies evolution as the bona fide antifragile process. When we place the data system under stress, the weaker data units and handlers die, and the survivors determine the next generation.
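A toy version of that selection loop, with made-up handlers and scoring, might look like this: a population of handlers is scored against a batch of messy input, the weakest are culled, and the survivors plus slightly varied copies of them form the next generation.

```python
import random

class ToyHandler:
    """Hypothetical handler whose only trait is its tolerance for messy records."""
    def __init__(self, tolerance):
        self.tolerance = tolerance

    def fitness(self, stress_batch):
        # Score: how many records in the batch this handler copes with.
        return sum(1 for messiness in stress_batch if messiness <= self.tolerance)

    def mutate(self):
        return ToyHandler(self.tolerance + random.uniform(-0.1, 0.1))

def evolve(population, stress_batch, survival_rate=0.5, generations=10):
    for _ in range(generations):
        ranked = sorted(population, key=lambda h: h.fitness(stress_batch), reverse=True)
        survivors = ranked[:max(1, int(len(ranked) * survival_rate))]
        offspring = [random.choice(survivors).mutate()
                     for _ in range(len(population) - len(survivors))]
        population = survivors + offspring
    return population

stress_batch = [random.random() for _ in range(100)]       # simulated messy input
population = [ToyHandler(random.random()) for _ in range(20)]
survivors = evolve(population, stress_batch)
```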

Our antifragile data handler runs feedback loops at high speed and primarily makes sense of its own learning rather than of the data itself. Data is a static given that cannot be controlled; learning can.

As with evolution, diversity is the key. A single strategy is bound to fail in the long run, and data handlers should not rely on human intervention to decide how to adapt to changing data sources.
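One way to read that, sketched here with invented strategy functions, is to keep a deliberately diverse pool of parsing strategies and let their own success rates, rather than a human, decide which one is tried first on the next record.

```python
import csv
import io
import json
from collections import defaultdict

# A deliberately diverse pool of strategies; any of them may fail on a given record.
def parse_json(raw):
    return json.loads(raw)

def parse_csv(raw):
    return next(csv.DictReader(io.StringIO(raw)))

def parse_keyvalue(raw):
    return dict(pair.split("=", 1) for pair in raw.split(";"))

STRATEGIES = [parse_json, parse_csv, parse_keyvalue]
successes = defaultdict(int)

def handle(raw_record):
    # Strategies that have worked most often are tried first; the ranking
    # adapts on its own as the mix of data sources changes.
    for strategy in sorted(STRATEGIES, key=lambda s: successes[s.__name__], reverse=True):
        try:
            result = strategy(raw_record)
            successes[strategy.__name__] += 1
            return result
        except Exception:
            continue
    return None  # nothing understood this record yet

for raw in ['{"user": 1, "amount": 9.99}', "user=2;amount=3.50", "not data at all"]:
    print(handle(raw))
```

A single strategy degrades silently when its source dries up; a pool like this keeps working and re-ranks itself without anyone deciding anything.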

We could look at Neural Darwinism and other related fields for inspiration. One thing is for sure: traditional control-freak strategies will be only one of many, and their survival is not assured.
