What is Big Data?
The first step to understanding Big Data is to agree on what it is
not: Big Data is not simply a lot of ordinary data. As terabytes grow into
petabytes, they do not suddenly transform into Big Data. A corollary is
that as computers become more powerful, they do not help us solve the
problem of Big Data. Quite the opposite, in fact: they generate ever more
diverse and complex data, at an exponential rate. Computing power and
ubiquity are the cause of Big Data, not its solution.
So what is it, then?
The usual definition is the following: data that
is too large, complex and dynamic for any conventional data system to
capture, store, manage and analyze. It is also commonly defined by the
three Vs: Volume, Variety and Velocity. But experience shows that this
definition falls short of capturing what Big Data really is, so
here's a better one:
Big Data is an emergent phenomenon created by the convergence and
combination of different types of data within a complex system which
generates more data and metadata than it takes in. It therefore depends on a system, real or virtual, which not only collects and processes data but also creates new data.
If the data sources are multiple and the system is very complex, Big
Data can emerge from a relatively small amount of data. Conversely, a
very large amount of unprocessed data will remain just “data” with none
of the characteristics of Big Data.
Where do we find Big Data?
Big Data has always been around us: the best Big Data processing
machine we know is the human brain. It can take in a huge amount of
information from inside and outside our body and make it understandable,
in a simple and accessible way, to our consciousness. This is the
ultimate Big Data system.
Of course, if we set the bar that high, everything else we create looks simple, and to some extent it is.
In marketing, Big Data is a relatively new phenomenon mostly related
to the Internet and the very large amount of information that Google,
Facebook, Twitter and the like generate, compounded by the iPhone.
But some of our clients, such as “smart home” system providers, will
soon create an even larger amount of Big Data thanks to the Internet of
Things (IoT). This data will need to be organized and conveyed, from the
fridge reading the RFID tags of objects to inform the “home manager”,
which will send an order automatically over the Internet while asking us
for confirmation or authorization. These systems will soon need to
replicate most of the simple and complex functions we perform daily
without giving them a second thought. Artificial intelligence will
develop concomitantly.
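To make the idea concrete, here is a minimal sketch, in Python, of such a flow: a fridge scan triggers a hypothetical “home manager” which queues an order and waits for our confirmation before sending it. The class and function names (HomeManager, on_fridge_scan, the stock thresholds) are illustrative assumptions, not a real IoT API.

```python
# Illustrative sketch of the "smart home" flow described above: the fridge
# reads item RFID tags, a hypothetical HomeManager decides what is running
# low, and an order is held until the user confirms it.

from dataclasses import dataclass, field

@dataclass
class Item:
    rfid: str
    name: str
    quantity: int

@dataclass
class HomeManager:
    minimum_stock: dict                       # e.g. {"milk": 2}
    pending_orders: list = field(default_factory=list)

    def on_fridge_scan(self, items: list) -> list:
        """Compare scanned stock to thresholds and queue reorder requests."""
        counts = {}
        for item in items:
            counts[item.name] = counts.get(item.name, 0) + item.quantity
        for name, minimum in self.minimum_stock.items():
            if counts.get(name, 0) < minimum:
                self.pending_orders.append(
                    {"product": name, "status": "awaiting confirmation"})
        return self.pending_orders

    def confirm(self, product: str) -> None:
        """User authorizes the order; only then would it be sent over the Internet."""
        for order in self.pending_orders:
            if order["product"] == product:
                order["status"] = "sent"

manager = HomeManager(minimum_stock={"milk": 2, "eggs": 6})
scanned = [Item("rfid-001", "milk", 1), Item("rfid-002", "eggs", 12)]
print(manager.on_fridge_scan(scanned))   # milk is low -> order awaits confirmation
manager.confirm("milk")
```

The key design point, echoed later in this post, is that the decision is made locally and only the confirmed order leaves the home.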
Yet why is the emergence of Big Data so slow? Since we
understand the concept, surely applications should follow in droves.
Visualizing and understanding Big Data
This is a difficult question to answer, but it seems that one of the
main obstacles to more widespread use of Big Data is the lack of
visualization tools, and therefore our inability to grasp complex answers
to simple questions.
To take the example of marketing, we now have access to a huge amount
of disparate data but mostly struggle to make sense of it beyond
what is already proven and well known. The concepts of clusters, social
groups and one-to-one marketing are progressing slowly, but mostly in a
haphazard way based on trial and error. The main difference compared to
20 years ago is that the cycles have accelerated tremendously and we now
learn faster than ever with instant testing and feedback.
But for most companies, the main tools to display and analyse data
remain Excel or related systems such as Tableau and various kinds of
dashboards.
Some companies use our GIS (Geographic Information Systems) to
analyze client data, but very few go beyond that, simply because the tools
do not exist yet. GIS systems are among the very few which allow a
company to visualize different layers of data over a map, but not all
client data can be geolocated, and almost all the non-geographic
systems are far less advanced at displaying complex data.
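As a rough illustration of the layering idea, here is a minimal sketch that overlays two layers of geolocated client data on the same axes. The coordinates and layer names are invented, and a real GIS would add base maps, projections and many more layers.

```python
# Two layers of point data drawn over the same coordinate space:
# stores on top of customers. Purely illustrative data.

import matplotlib.pyplot as plt

# Layer 1: customer locations (longitude, latitude)
customers = [(2.33, 48.85), (2.37, 48.88), (2.30, 48.84)]
# Layer 2: store locations
stores = [(2.35, 48.86), (2.29, 48.87)]

fig, ax = plt.subplots()
ax.scatter(*zip(*customers), c="steelblue", s=20, label="customers")
ax.scatter(*zip(*stores), c="red", marker="^", s=80, label="stores")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend()
plt.show()
```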
We are currently working on this subject and I will come back to it with innovative solutions in future posts.
The problem of privacy and security
Eventually, as most of our lives become digital, or to put it
another way, as our digital footprint grows larger and larger, we will
need to improve both security and privacy to ensure that we can trust
these systems. We are currently doing far too little about privacy,
and this should be a major concern: many innovative applications will
lag, or may never be developed at all, if we do not develop appropriate
answers.
Likewise, although the issues concerning security are well known, we
are currently far too “cavalier” in protecting Big Data, especially
in the form of metadata. Eventually, most of the data transiting
over the Internet will pass through applications and will therefore be
“metadata” with little or no data content. Already, a link to a
YouTube video on Facebook generates nothing but metadata and the video
itself. The Internet of Things will take this concept to a new level,
where a single piece of information, the outside temperature for example, will
generate a cascade of automatic and inferred consequences far beyond the initial data input.
Privacy will need to be defined far more precisely, and the correct
level of security must apply. Personal data, for example, needs to be
protected by biometrics, but biometrics itself needs a higher level of
security since it is quite difficult to “recover” from a breach of
biometric data. Less important data will need less drastic “passwords”,
down to our public “access” points, which must be easy to interface with.
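A minimal sketch of this tiered approach, assuming three illustrative sensitivity levels and an invented mapping to authentication factors (not a standard):

```python
# Match the strength of authentication to the sensitivity of the data.
# Tiers and mappings below are illustrative assumptions.

from enum import Enum

class Sensitivity(Enum):
    PUBLIC = 1        # public "access" points, easy to interface with
    PERSONAL = 2      # ordinary personal data
    BIOMETRIC = 3     # hard to "recover" from a breach

REQUIRED_AUTH = {
    Sensitivity.PUBLIC: ["none or simple token"],
    Sensitivity.PERSONAL: ["password", "biometrics"],
    Sensitivity.BIOMETRIC: ["biometrics", "second factor", "hardware key"],
}

def required_auth(level: Sensitivity) -> list:
    """Return the authentication factors required for a sensitivity level."""
    return REQUIRED_AUTH[level]

print(required_auth(Sensitivity.BIOMETRIC))
```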
In this respect, we must once again learn from natural systems, which
are more than 2 billion years ahead of us as far as “technology” and usability
are concerned. DNA, the body's way of storing information, shows us
that it is not possible for a system to be both flexible, and therefore
able to evolve, and closed. Nature could not get rid of viruses or hacking,
and neither will we. But DNA shows us how to keep the damage local
and confined most of the time. We will need to replicate this knowledge
if we want to succeed. Large, centralized databases will be accessed and
compromised; it is just a matter of time. We will therefore need to
learn how to build distributed systems which communicate just the right
amount of information to the right processing level.
This is in fact a key factor in the development of Big Data. Sending
all the data over the Internet to be processed in data centers and
sending “orders” back for execution is a recipe for failure or, more
likely, hacking. One of the challenges of Big Data is therefore to
understand what data must be processed locally, what data must be sent,
to what system, at what level, with what level of privacy and security,
and whether and where it should be stored. This complexity alone
could be another definition of Big Data.
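To illustrate the kind of routing decision involved, here is a minimal sketch, under invented rules, of how each piece of data could be assigned a processing location and a privacy and security level:

```python
# Decide, per record, whether it stays local or is sent to a remote system,
# and how it should be protected. The rules are invented for illustration.

from dataclasses import dataclass

@dataclass
class Record:
    kind: str          # e.g. "temperature", "presence", "health"
    personal: bool     # does it identify a person?

def route(record: Record) -> dict:
    """Decide where a record is processed and how it is protected."""
    if record.personal:
        # personal data stays local; only an anonymized summary may leave
        return {"process": "local", "send": "aggregate only",
                "privacy": "high", "security": "encrypted at rest"}
    # non-personal sensor data can be sent for central analysis
    return {"process": "remote", "send": "raw",
            "privacy": "low", "security": "encrypted in transit"}

print(route(Record(kind="presence", personal=True)))
print(route(Record(kind="temperature", personal=False)))
```

The point of the sketch is not the specific rules but that the routing logic itself becomes part of the system's design.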
The future of Big Data
The iPhone is already unleashing a torrent of Big Data which is now
covering the globe from the richest to the poorest countries. The
Internet of Things, which is just being born as we speak, has the
potential to lead us much farther down the road of Big Data. Within a
few years our environment will first become “aware” of our presence,
then able to communicate proactively with us.
This can easily become a nightmare of unwelcome marketing intrusion
and government surveillance if we do nothing about it. Conversely, we
can significantly increase the complexity handled in the background of
our lives while simplifying our interface with everything around us, and
therefore improve our well-being. The most likely outcome is of course a
mixture of both worlds. But let's hope that we learn quickly from our
mistakes and find the right “mix”, knowing that there is no single
correct answer or optimum balance between all the factors.
The Big Data technologies we are currently working on sound and look
very much like science fiction, although they are only a few short years
away from application. The road ahead is still uncharted, but a whole
new continent of data is waiting for adventurous minds to explore and “map”.