Can we still protect privacy in the age of
Google, Facebook and the Internet?
The answer should be simple and clear. And
to some extent it is, but privacy is slowly being eroded and melting
away. A combination of technology, concentration of the Internet,
centralization of databases and legal vacuum almost guarantees that sooner rather than
later privacy will be gone. So to worry about privacy on the Internet is not
paranoia. Even if reality does not yet corroborate our worst fears, as we shall
see, the trend is not our friend and asking harsh questions about privacy is
legitimate.
But to approach the subject rationally and bring meaningful answers, let’s first look at the way privacy is
protected in the current “free”, advertising-based Internet. To do so, it is
essential to understand that just as the way software is built is very
different from the way it is used, the perception of privacy on the Internet as
people experience it is very different from the reality as companies deal with it.
We experience privacy literally and
personally at our individual level whenever companies request information or
more viscerally whenever we find on a Google search or any other database,
personal information that we believe should not be there. This is the
perception aspect of privacy, the one we sense and the one we react to. Why do
“they” know this? Where did “they” get the information from?
But for corporations, the landscape of
“private” information is very different. In spite of the “one-to-one” image
being projected, very few companies (none?) look at their clients individually
simply because it is impossible. Mostly, they approach clients statistically as
clusters in order to maximize sales. As such, individual information, which
we call PII (Personally Identifiable Information), only has value in a specific
context rather than by itself, and this value for marketing purposes is very
specific: it is to increase the accuracy of statistical analysis in order to
optimize sales. Nothing more, nothing less.
As such it is rather innocuous. Marketing, even
if sometimes irritating, has existed in a technical sense for over a century
without garnering much concern. The reason is that information was collected,
used... and mostly forgotten or lost. This is of course what has changed with the
Internet: we do not “lose” or “forget” information anymore. Everything
collected can be stored, retrieved and, more worryingly, shared almost forever.
But beyond our perception that this is so, is it really the case, and for what purpose?
Here, we need to understand what is really
going on behind the curtain, how information is processed and used to bring a
meaningful answer to the question.
We live in the age of the Internet which
means that data is easy to collect and use in vast quantities, so much so that
most companies are drowning in data. But the purpose of most companies is not
to gather data, it is to use the data skillfully to get information and insight
into the behavior of their clients in order to increase sales of products and
services. The definition of marketing.
With a very few exceptions, most companies
only have a very limited range of products and services as efficiency is
normally increased by producing fewer products in a greater number. Likewise,
as companies need to understand how people react to, buy and use their products,
they go through a similar process of optimization of their marketing and
therefore approach their clients as discrete groups with similar characteristics
and behavior, which we call clusters. These clusters can then be analyzed with
statistical tools in order to increase sales and optimize interactions.
This quantitative approach to marketing and
behavior is what has really changed with the Internet and the reason why data
has suddenly become so valuable. The slow increase of efficiency of marketing
which was almost imperceptible before has greatly accelerated online creating a
whole new ecosystem of data users and providers, analytics and data exchanges
which has grown almost overnight to improve and optimize the use of data.
Because of its complexity, it is difficult
to give a complete overview of the online data marketplace which is in any case
not the purpose of this article, but it is important to understand how data navigates
this ecosystem.
Companies collect data on their clients. We
call this first party data. It usually consists of names, addresses,
telephone numbers, internet addresses and transaction information. This is the
core of their database but it is usually not enough to get real marketing
insight into clients’ behavior. To analyze a database effectively and
make sense of the data you already have, you need more data to understand “context”.
This can be any public data you can put your hands on: a telephone directory,
census data, geographic data... or this can be other companies’ data, which we call
third party data.
Third party data gives you the ability to
know more about your clients (what other products do they buy for example) and
to learn something about people who are not yet clients but potentially could
be. This is of course the data people are most concerned about.
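To make the idea of adding "context" to first party data more concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the client records, the field names and the choice of a postcode as the key that links the two sources. Real data providers match on many different keys, so this is only one possible way to picture the process.

```python
# Hypothetical first party data: what a company knows about its own clients.
first_party = [
    {"client_id": 1, "name": "A. Martin", "postcode": "10001", "last_purchase": "shoes"},
    {"client_id": 2, "name": "B. Lopez", "postcode": "94105", "last_purchase": "laptop"},
    {"client_id": 3, "name": "C. Wong", "postcode": "60601", "last_purchase": "coffee"},
]

# Hypothetical third party data: context bought or obtained elsewhere, keyed by postcode.
third_party = {
    "10001": {"income_band": "high", "urban": True},
    "94105": {"income_band": "high", "urban": True},
}


def enrich(clients, context, key="postcode"):
    """Return copies of the client records with context fields added when a match exists."""
    enriched = []
    for client in clients:
        extra = context.get(client[key], {})  # no match: nothing is added
        enriched.append({**client, **extra})
    return enriched


enriched_clients = enrich(first_party, third_party)
```

Note that the company never asked its clients for their income band. It appears in the enriched records simply because two data sources shared a common key.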
In most countries, third party data is
restricted and cannot be exchanged freely. But this is not the case in the US
and this matters since the core of the Internet is and remains an
American-based network. This lack of legislation in the US is one of the main reasons
why there is so much concern globally about privacy. Without limits, isn’t it
likely that most companies will abuse privacy if it gives them an edge in
marketing?
In reality, mostly, the answer is “no”. This
is only partly related to the fact that in the long term abusing your clients
is not the best way to keep them but more importantly it is because there are
practical limits to what you can do with data in marketing terms. As we have seen,
for marketing purposes, we approach clients as clusters and as such companies
can only handle a limited, optimum number of such clusters. So companies will
offer a range of products and for each product, they will look at a number of
distinct behaviors or clusters. If you have for example 100 products and you
look at 10 distinct clusters for each product, you are already dealing with
1,000 different possible strategies to interact with your clients. This is a
lot of analytics and few companies can do it efficiently. In reality, many
companies have more than 100 products and in many cases it is easy to
distinguish more than 10 clusters of behavior for each product. This is not
practical, which is why traditionally companies would limit themselves to a
limited number of advertising strategies to optimize cost and efficiency. This
is what is rapidly changing with the introduction of new technologies.
Traditional advertising companies were more
“art” than science, especially data science. They had an overall understanding
of clients and of their reaction based on countless advertising campaigns built
over the years. An invaluable but also worthless experience! By the time you
know how your targeted clusters reacted to a campaign, it is too late. The
environment has changed and earlier success may easily engender future failures
due to uncontrollable factors.
Data exchange and instant feedback from
internet advertising have shrunk this gap to almost nothing. With simple A/B
testing on a web page for example, you can learn what works and what does not very
quickly and adjust your targeting. This would be bad enough if there were
thousands of companies competing for this service online. But there are not.
More than two thirds of internet advertising is controlled by two giant
companies: Google and Facebook. This gives them unheard-of power over companies
that want to advertise and over web pages and sites that want to monetize space on
the pages they display. And, more crucially for privacy, over what people
actually experience.
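For readers who wonder what the "simple A/B testing" mentioned above looks like in practice, here is a minimal Python sketch. It assumes the most basic setup: two versions of a page, a count of visitors and of conversions for each, and a standard two-proportion z-test to decide whether the difference is likely to be real. The visitor numbers are invented, and advertising platforms may well use different statistical methods.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test. Returns (rate_a, rate_b, z, two_sided_p_value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Invented numbers: 10,000 visitors saw each version of the page.
rate_a, rate_b, z, p = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

A small p-value suggests that version B really does convert better than version A, so the advertiser can switch to it within hours rather than waiting for the end of a campaign.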
Data in this respect has become the
currency of the 21st century as without it you are hopelessly in the dark and run the
risk of being left behind while your competitors surf a web of improved targeting
and better ROI. The pressure is immense but the tools limited. Which is of
course when people start cutting corners and looking for low-hanging fruit
even when it is forbidden.
Privacy until very recently has gravitated
around names in relation to addresses (which are public) and telephone numbers,
which can be public (phone directories) or private. So using your client list
for marketing purposes is considered acceptable, and adding “data” to the list as
cluster data is fine too (“you” live in a rich neighborhood based on your
address, so we add a flag or an index to your name, which is translated and can be used as a cookie). But transferring or giving access to a
list of clients for marketing (or political) purposes, as Facebook did last year,
is not acceptable.
This is clear or at least it would be if
privacy could be defined solely by a name attached to other data. But what happens
when data alone defines you? This problem has long been identified and this is
why people who live in census tracts with only a few households are pooled together and aggregated at a higher level. It is simply too
easy to find the name of a person living in the countryside when there are only
one or two households in the area and link whatever data the census gives you
back to that person (income, for example).
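The pooling of small census areas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold of five households, the field names and the figures are all invented, and real statistical offices apply their own, more elaborate disclosure rules.

```python
from collections import defaultdict

MIN_HOUSEHOLDS = 5  # assumed threshold, chosen for illustration only


def aggregate_small_tracts(tracts, min_households=MIN_HOUSEHOLDS):
    """Publish large tracts as they are and pool small ones at the region level.

    `tracts` is a list of dicts with the keys:
    tract, region, households, total_income.
    """
    published = []
    pooled = defaultdict(lambda: {"households": 0, "total_income": 0})
    for t in tracts:
        if t["households"] >= min_households:
            published.append({
                "area": t["tract"],
                "households": t["households"],
                "avg_income": t["total_income"] / t["households"],
            })
        else:
            # Too few households: hide the tract inside a regional total.
            pooled[t["region"]]["households"] += t["households"]
            pooled[t["region"]]["total_income"] += t["total_income"]
    for region, agg in pooled.items():
        if agg["households"] == 0:
            continue
        # A real system would keep merging upward if this pool were still too small.
        published.append({
            "area": f"{region} (pooled)",
            "households": agg["households"],
            "avg_income": agg["total_income"] / agg["households"],
        })
    return published


# Invented example: one large tract and two very small ones in the same region.
tracts = [
    {"tract": "T1", "region": "North", "households": 120, "total_income": 6_000_000},
    {"tract": "T2", "region": "North", "households": 2, "total_income": 150_000},
    {"tract": "T3", "region": "North", "households": 4, "total_income": 210_000},
]
rows = aggregate_small_tracts(tracts)
```

In the published rows, the two small tracts no longer appear on their own, so the income of a two-household area can no longer be read directly from the table.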
The problem with the internet is that this
issue has been multiplied a hundred times as data points on households went
over the years from hundreds to thousands then millions. It is easy to
understand why. If you have 10 types of data on one million people in a city,
everyone’s privacy is fairly well protected. But if you have millions of data points,
clearly individual patterns tend to emerge and become recognizable. Until
recently, no one was able to combine huge amounts of data to identify patterns.
Now of course with big data, this is exactly what has become possible.
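A small simulation can illustrate why more data points make individuals stand out. The Python sketch below builds a purely synthetic population (random values, no real data) and measures what fraction of people become unique as more attributes are combined. The population size, the number of attributes and the ten possible values per attribute are arbitrary assumptions.

```python
import random
from collections import Counter


def unique_fraction(records, n_attributes):
    """Fraction of records whose first n_attributes values match no other record."""
    keys = [tuple(r[:n_attributes]) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Synthetic population: 10,000 people, 12 attributes, 10 possible values each.
rng = random.Random(0)
population = [tuple(rng.randrange(10) for _ in range(12)) for _ in range(10_000)]

# How many people are one of a kind when 1, 2, 4, 6 or 8 attributes are combined?
fractions = {n: unique_fraction(population, n) for n in (1, 2, 4, 6, 8)}
```

With one or two attributes, everyone should share their profile with many others. As attributes are added, the share of unique profiles can only grow, and with enough of them almost everyone in this synthetic population ends up with a profile that belongs to no one else, even though no name was ever stored.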
Is this a risk? In marketing: Not really.
As we saw earlier, marketing understands people as clusters, not as individuals.
And no company is advanced enough to create clusters of “one”! But it is not
very difficult to imagine many other nefarious ways to use the technology to “follow”
people individually. Furthermore, as artificial intelligence makes progress,
more sophisticated marketing actions can be implemented. With simple if/then
rules, we can already have a significant impact on people’s behavior. With more
advanced rules, where exactly is the limit?
And this raises another more important
question: Can you follow all the current rules and still breach privacy? With
technology, the answer is yes! Already today, you do not need to know the name
of a person in order to gather valuable information about someone and create a
strategy based on the information. In fact, slowly, as information becomes more
plentiful, personal information becomes less and less relevant. Your face
captured in a public place is a much better “point” to link other data together
and can easily replace a name!
To summarize, artificial intelligence will
quickly erode privacy thanks to its ability to recognize patterns in huge data
sets and make them unique as well as its potential to identify “you” without
any reference to your name or other traditional PII data. A pattern such as the
way you walk, or a specific buying behavior can just as readily identify you.
It is important to understand why this is
happening and that it is not inevitable. More than anything, the strength of the
Internet is its ability to link data and databases together to create a space
of knowledge. But at the same time, it is also this ability which is
destroying privacy. In other words, what is destroying privacy is not the end
result that people see but the mere fact of linking databases together, which is
one of the basic principles of the Internet.
Is this an issue? And should privacy be
protected above anything else?
In the end, this is the question we must
answer. How much do we value privacy?
In the past, living in a village, almost
everyone would know everything about everybody else. There were “secrets” but
you had to be careful as there was no privacy to speak of. Your neighbors knew
at what time you woke up and what you were doing, the store owner knew
everything you were buying and often the state of your finances, etc… No
privacy!
Large cities have given us the illusion of
privacy as a “right” for a while or at least something worth defending but now
data is catching up and very soon, the global IoT will know in real time where
you are and what you are doing. Worse, it will broadcast the information everywhere
to whoever cares to know. The information will then be recorded, more or less
forever, in multiple databases to be combined with other data to get deep insight
into your habits and behavior for god-knows-what purpose. It is difficult to imagine
a more total loss of privacy, although this is exactly the future we are rushing
into with smart cars and smart cities, which in the end will know so much about
us that the few “unknowns” left will be easy to deduce, reducing privacy to
nothing.
Is there an alternative to this dystopian
vision? Although this is not the route we are on right
now, I think there is. But it requires understanding that the
problem of privacy is not in the data as people believe but in the databases and
their ability to enhance data. This is the “fight” we are currently witnessing
with crypto-currencies. Distributed systems guarantee anonymity. Centralized systems
bring the opposite. Should decentralized crypto-currencies become the norm, the
anonymity of money will be maintained. The alternative is for money in your
wallet to be remotely controlled, allowing you to buy or not to buy a plane
ticket for example. (The same as the Chinese system of social credit but
applied more broadly if it can be linked to money! A doctor could ideally
restrain a diabetic patient from buying a “toxic” sugary drink! More likely, he
would delegate the task to your mobile assistant.)
What is amazing and also frightening is
that most of these “decisions” will be taken by default by tech companies as
they implement new technologies. Sometimes the technology will be rejected such
as Google Glass, which was both a fantastic and dreadful idea, but sometimes
it will just get through without much discussion, such as Facebook’s face
recognition on pictures. Just another feature among many others.
So in the end, will privacy soon belong to
the past like vinyl records and paper pictures? Hard to tell at this stage but we
are rushing head-on towards such a future. The problem is that the technology
is exponential and our ability to react arithmetic. When your smart assistant
knows your name and tells the fridge, which tells Amazon, which tells everybody else
related, when a beacon on a smart machine identifies your phone and broadcasts
your location in a smart city, when every red light is equipped with cameras
and sensors, when smart keys open your car and your door and give you access to your
office, you will become that much more efficient, but eventually just a cog in
the machine with no privacy left. A technology utopia or, more likely, dystopia depending on
the way you look at it.