Saturday, March 2, 2019

Big Data – What is it? (a new approach)

What is Big Data?
The first step to understanding Big Data is to agree on what it is not: Big Data is not a lot of ordinary data. As terabytes turn into petabytes, they do not suddenly transform into Big Data. A corollary is that as computers become more powerful, they do not help us solve the problem of Big Data. Quite the opposite, in fact: they generate more diverse and complex data, exponentially. Computing power and ubiquity are the cause of Big Data, not its solution.
So what is it, then?
The usual definition is the following: data that is too large, complex and dynamic for any conventional data system to capture, store, manage and analyze. It is also commonly defined by the three Vs: Volume, Variety and Velocity. But experience shows that this definition falls short of really explaining what Big Data is, so here is a better one:
Big Data is an emergent phenomenon created by the convergence and combination of different types of data within a complex system which generates more data and meta-data than the input. It is therefore dependent on a system, real or virtual, which not only collects and processes data but also creates new data.
If the data sources are multiple and the system is very complex, Big Data can emerge from a relatively small amount of data. Conversely, a very large amount of unprocessed data will remain just “data” with none of the characteristics of Big Data.
Where do we find Big Data?
Big Data has always been around us: the best Big Data processing machine we know is the human brain. It can accept a huge amount of information from inside and outside our body and make it understandable in a simple and accessible way to our consciousness. This is the ultimate Big Data system.
Of course, if we set the goal that high, everything else we create looks simple, and to some extent, it is.
In marketing, Big Data is a relatively new phenomenon mostly related to the Internet and the very large amount of information that Google, Facebook, Twitter and the like generate, compounded by the iPhone.
But some of our clients, such as "smart home" system providers, will soon create an even larger amount of Big Data thanks to the Internet of Things (IoT). This data will need to be organized and conveyed: the fridge reads the RFIDs of the objects it contains and informs the "home manager", which automatically sends an order over the Internet while notifying us for confirmation or authorization. These systems will soon need to replicate most of the simple and complex functions we perform daily without giving them a second thought. Artificial intelligence will develop concomitantly.
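As an illustration, here is a minimal sketch, in Python, of the kind of fridge-to-"home manager"-to-order flow described above. Every name in it (SmartFridge, HomeManager, place_order, the re-order threshold) is a hypothetical stand-in, not a real smart-home API.

```python
# Hypothetical sketch only: SmartFridge, HomeManager and place_order are
# invented names, not a real smart-home API.

LOW_STOCK_THRESHOLD = 1  # assumed re-order point per item


class SmartFridge:
    """Pretend appliance that 'reads' the RFID tags of the items it contains."""

    def __init__(self, rfid_tags):
        self.rfid_tags = rfid_tags  # e.g. ["milk", "butter", "butter"]

    def inventory(self):
        counts = {}
        for tag in self.rfid_tags:
            counts[tag] = counts.get(tag, 0) + 1
        return counts


class HomeManager:
    """Central 'home manager' that turns raw readings into actions."""

    def __init__(self, ask_user):
        self.ask_user = ask_user  # callback asking for confirmation/authorization

    def check_and_reorder(self, fridge):
        for item, count in fridge.inventory().items():
            if count <= LOW_STOCK_THRESHOLD and self.ask_user(f"Re-order {item}?"):
                self.place_order(item)

    def place_order(self, item):
        # A real system would call an online shop's API here.
        print(f"Order sent over the Internet for: {item}")


# Usage: the user auto-approves here; a real system would notify a phone app.
manager = HomeManager(ask_user=lambda question: True)
manager.check_and_reorder(SmartFridge(["milk", "butter", "butter"]))
```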
But conversely, why is the emergence of Big Data so slow? Since we understand the concept, surely applications should follow in droves.
Visualizing and understanding Big Data
This is a difficult question to answer, but it seems that one of the main obstacles to a more widespread use of Big Data is the lack of visualization tools, and therefore our inability to grasp complex answers to simple questions.
To take the example of marketing, we now have access to a huge amount of disparate data but mostly struggle to make sense of it beyond what is already proven and well known. The concepts of clusters, social groups and one-to-one marketing are progressing slowly, but mostly in a haphazard way based on trial and error. The main difference compared to 20 years ago is that the cycles have accelerated tremendously and we now learn faster than ever with instant testing and feedback.
But for most companies, the main tools to display and analyze data remain Excel or related systems such as Tableau and various types of dashboards.
Some companies use our GIS (Geographic Information Systems) to analyze client data, but very few go beyond that, simply because the tools do not exist yet. GIS systems are among the very few which allow a company to visualize different layers of data over a map, but not all client data can be geo-localized, and almost all non-geographic systems are far less advanced at displaying complex data.
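To make the idea of layered data concrete, here is a toy sketch of "layers over a map": each layer maps a coarse grid cell to a value, and layers are stacked per cell. The grid, the layer names and the figures are all invented for illustration; a real GIS is vastly richer.

```python
# Toy illustration of stacking data layers over a map. All keys, layer names
# and figures are made up; a real GIS handles projections, geometries, etc.

def cell(lat, lon, precision=1):
    """Bucket a coordinate into a coarse grid cell."""
    return (round(lat, precision), round(lon, precision))

# Two independent data layers keyed by grid cell (hypothetical values).
client_density = {cell(48.86, 2.35): 120, cell(45.76, 4.83): 45}
average_basket = {cell(48.86, 2.35): 38.0, cell(45.76, 4.83): 52.5}

def stack_layers(**layers):
    """Merge named layers into one record per cell, like stacked map overlays."""
    stacked = {}
    for name, layer in layers.items():
        for key, value in layer.items():
            stacked.setdefault(key, {})[name] = value
    return stacked

print(stack_layers(client_density=client_density, average_basket=average_basket))
```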
We are currently working on this subject and I will come back to it with innovative solutions in future posts.
The problem of privacy and security
Eventually, as most of our lives become digital, or to put it another way, as our digital footprint grows larger and larger, we will need to improve both security and privacy to ensure that we can trust these systems. We are currently doing far too little concerning privacy, and this should be a major concern, as many innovative applications will lag or may never be developed at all if we do not find appropriate answers.
Likewise, although the issues concerning security are well known, we are currently far too "cavalier" in protecting Big Data, especially in the form of meta-data. Eventually, most of the data transiting over the Internet will pass through applications and will therefore be "meta-data" with little or no data content. Already, a link to a YouTube video on Facebook generates nothing but meta-data besides the video itself. The Internet of Things will take this concept to a new level, where a piece of information, the outside temperature for example, will generate a cascade of automatic and inferred consequences far beyond the initial data input.
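A small, purely hypothetical sketch of such a cascade: one temperature reading triggers inferred events and actions, and the resulting meta-data quickly outweighs the single number that started it. The rules below are invented for illustration.

```python
# Invented rules, for illustration only: one reading fans out into inferred
# events and actions whose meta-data dwarfs the original data point.

def infer(temperature_c):
    events = [("reading", temperature_c)]  # the only "real" measurement
    if temperature_c < 5:
        events.append(("inference", "frost risk tonight"))
        events.append(("action", "close the greenhouse vents"))
        events.append(("action", "pre-heat the car at 07:30"))
    elif temperature_c > 28:
        events.append(("inference", "heat wave"))
        events.append(("action", "lower the blinds on south-facing windows"))
    # Each action may in turn be logged, forwarded and analyzed elsewhere.
    return events

for kind, payload in infer(2):
    print(kind, "->", payload)
```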
Privacy will need to be defined far more precisely, and the correct level of security must apply. Personal data, for example, needs to be protected by biometrics, but biometrics itself needs a higher level of security, since it is quite difficult to "recover" from a breach of biometric data. Less important data will need less drastic "passwords", down to our public "access" points, which must be easy to interface with.
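One way to picture these tiers, as a rough sketch only: a table mapping data sensitivity to the strength of the check required. The tiers and methods below are assumptions for illustration, not a real security framework.

```python
# Assumed tiers, for illustration only; not a real security framework.

AUTH_FOR_SENSITIVITY = {
    "public": "none",                      # open "access" points, easy to use
    "low": "PIN or password",
    "personal": "biometrics",              # protects personal data...
    "biometric": "hardware key + revocable template",  # ...and is itself
                                                        # protected even harder
}

def required_auth(sensitivity):
    # Anything we have not classified is refused rather than exposed.
    return AUTH_FOR_SENSITIVITY.get(sensitivity, "deny by default")

print(required_auth("personal"))  # biometrics
print(required_auth("unknown"))   # deny by default
```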
In this respect, we must once again learn from natural systems, which are more than 2 billion years ahead as far as "technology" and usability are concerned. DNA, the body's way of storing information, shows us that a system cannot be both closed and flexible, and therefore able to evolve. Nature could not get rid of viruses or hacking, and neither will we. But DNA shows us how to keep the damage local and confined most of the time. We will need to replicate this knowledge if we want to succeed. Large, centralized databases will be accessed and compromised; it is just a matter of time. We will therefore need to learn how to build distributed systems which communicate just the right amount of information to the right processing level.
This is in fact a key factor in the development of Big Data. Sending all the data over the Internet to be processed in data centers and sending "orders" back for execution is a recipe for failure, or more likely hacking. One of the challenges of Big Data is therefore to understand which data must be processed locally, which data must be sent, to which system, at which level, with what level of privacy and security, and whether and where it should be stored. This complexity alone could be another definition of Big Data.
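Here is a minimal sketch of that routing question: for each type of data, decide where it is processed, what (if anything) leaves the device, and whether it is stored remotely. The policy table is purely illustrative and every entry in it is an assumption.

```python
# Purely illustrative policy table: which data stays on the device, which
# goes to a local hub, which is sent to the cloud, and what is stored.

ROUTING_POLICY = {
    # data type:     (processed at, what leaves the device,      stored remotely?)
    "raw_sensor":    ("device",     "nothing",                   False),
    "room_presence": ("local_hub",  "anonymous occupancy",       False),
    "energy_summary":("cloud",      "daily aggregate only",      True),
    "voice_command": ("device",     "the intent, not the audio", False),
}

def route(data_type):
    # Unknown data defaults to staying local: the safest of the three options.
    processed_at, shared, stored = ROUTING_POLICY.get(
        data_type, ("device", "nothing", False))
    return {"processed_at": processed_at, "shared": shared, "stored_remotely": stored}

print(route("voice_command"))
print(route("raw_sensor"))
```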
The future of Big Data
The iPhone is already unleashing a torrent of Big Data which now covers the globe, from the richest to the poorest countries. The Internet of Things, which is just being born as we speak, has the potential to lead us much farther down the road of Big Data. Within a few years our environment will first become "aware" of our presence, then able to communicate proactively with us.
This can easily become a nightmare of unwelcome marketing intrusion and government surveillance if we do nothing about it. Conversely, we can significantly increase the complexity of our lives in the background while simplifying our interface with everything around us, and therefore improve our well-being. The most likely outcome is of course a mixture of both worlds. But let's hope that we learn quickly from our mistakes and find the right "mix", knowing that there is no single correct answer or optimum balance between all the factors.
The Big Data technologies we are currently working on sound and look very much like science fiction although they are only a few short years away from applications. The road ahead is still uncharted but a whole new continent of data is waiting for adventurous minds to explore and “map”.
