The Data Insider: Facebook, PII Data and the GDPR!

Now that the Facebook controversy is behind us, it is time to look back, understand what all this was really about, and ask what “privacy” means on an Internet where the product is “us”!

When Facebook launched in February 2004, the motto was: “It’s free and always will be!” Not as daunting as Google’s “Don’t be evil!” but quite bold nevertheless, since it is hard to imagine a less commercial statement from a company that would go on to list on a stock exchange.

Since then, of course, we’ve learned better, and everybody now knows the deeper truth of the Internet: “When it’s free, you are the product!” The economic model is in fact so overwhelming that the whole Internet has developed on this principle without much thought about the consequences. Whenever we install a new “free” application, we just say “yes” to 10 pages of legal gibberish which state in plain words the actual price of the service, usually on page 7 to make sure that nobody reads that far, and explain what data will be collected and then monetized to pay for the application.

It is a model without options (or rather it was, before the implementation of Europe’s GDPR, the General Data Protection Regulation, as we will see later). You can say “no”, in which case you are left with no application and no service, so in reality there is no choice. That is why nobody reads the terms of use! But it works. A whole ecosystem has developed around this principle, introduced as “applications” by Apple, and it is now here to stay. Most people have forgotten that computers, as initially designed, were “universal” machines and could therefore be adapted to any use. We now have a system where applications restrict the possibilities in exchange for stability, ease of use and an invisible price.

Conversely, what the applications do for their providers has exploded, and the one thing they do better than any other software is collect information. It is no coincidence that at the same time computers migrated from our desktops to our pockets, where they can follow and track our every move, action and thought and digitize everything for quantitative and qualitative analysis.

The initial and obvious use of the technology was marketing, targeting and advertising. These commercial activities have traditionally been more “art” than science, and the sudden explosion of “data” allowed for a completely new and more accurate approach to people, transforming them from clients and consumers into users.

But to go further in the understanding of the controversy, it is necessary to digress and have a quick look behind the curtain to understand what is really going on.

Traditional advertising is simple. Put your advertisement somewhere (on a wall, on TV, on a train) and try to measure who sees it and what impact it has. It is an impossible task! We use panels, interviews and other techniques, but the result is “nobody knows!”, as illustrated by the famous saying: “Half of advertising is useless, but we cannot know which half!” On the Internet it is different, because everything can be measured, not just statically but dynamically, and improvement therefore becomes possible.

Based on this principle, a whole ecosystem has developed to put the right advertising in front of the right eyeballs. It is a complex ecosystem with many actors and two giants at the top, Google and Facebook, who control the system.

Initially, Internet advertising developed around banners and pop-ups based on very simple metrics: location and type of article. But the limits of the system were quickly reached. Then Internet companies had the great idea to “follow” people around with cookies, and suddenly the same irritating advertisement appeared on every site you visited, with obvious negative consequences for people’s perception of being “followed”. Advertisers had decided to measure efficiency in “clicks”, since actual sales were difficult to link to on-line behavior. They then realized that to improve it, they not only needed to exchange data with other sites but actually had to buy “other” data to make sense of context. Third party data was born.

To do this, complex systems were developed in which client data was first anonymized, then exchanged as data “clusters”, so that valuable information could be traded without exposing individual information. In this way, you could now “know” that within your list of 10,000 clients, 2,000 had great potential (cluster A), 5,000 not so much (cluster B) and 3,000 none (cluster C). A valuable and flexible insight, made even more useful by its instant, automatic application.
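
To make the idea concrete, here is a minimal sketch in Python of what such a pipeline might look like. It is not how any particular ad platform works: the salt, the 0-99 “potential” score and the thresholds for clusters A, B and C are all invented for illustration. Note also that a salted hash is closer to pseudonymization than to true anonymization; in this sketch it is really only the cluster sizes that carry no individual information.

```python
import hashlib
from collections import Counter

# Hypothetical salt kept secret by the data owner; not from any real system.
SALT = "replace-with-a-secret-value"


def pseudonymize(client_id: str, salt: str = SALT) -> str:
    """Replace a raw client identifier with a salted SHA-256 digest."""
    return hashlib.sha256((salt + client_id).encode("utf-8")).hexdigest()


def assign_cluster(score: int) -> str:
    """Map a hypothetical 0-99 'potential' score to a cluster label."""
    if score >= 80:
        return "A"  # great potential
    if score >= 30:
        return "B"  # not so much
    return "C"      # none


def build_clusters(scores: dict[str, int]) -> tuple[dict[str, str], Counter]:
    """Return per-pseudonym labels plus the cluster sizes that get traded."""
    labels = {pseudonymize(cid): assign_cluster(s) for cid, s in scores.items()}
    return labels, Counter(labels.values())


# Synthetic list of 10,000 clients with evenly spread, made-up scores.
scores = {f"client-{i}": i % 100 for i in range(10_000)}
labels, sizes = build_clusters(scores)
print(dict(sizes))
```

With these made-up scores the list splits into the same 2,000 / 5,000 / 3,000 clusters as in the example above, and no raw client identifier appears in what is handed over.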

The combination of all these technologies, on-line and off-line data, as well as actual feedback, has created a dynamic ecosystem where advertising, which was mostly “art”, is suddenly becoming a measurable science with infinite scope for improvement. And this is of course where the nature of marketing changes and morphs into “manipulation”. If you know exactly how a certain cluster of people will react to a certain type of advertising, you can tune your approach to have exactly the desired effect on your target.

As long as this is used for commercial applications, it seems to be acceptable. After all, this has always been the stated goal of advertising. But why restrict such powerful tools to marketing? If you can manipulate people into buying a product, you can probably also manipulate them into buying ideas, and from product to politics the gap is small. This too is of course not a very new idea. Already in the early 20th century, pioneers of public relations like Edward Bernays realized that much could be achieved in this field provided the right techniques were implemented. Soon the tools were developed and political science became more “technical”. But, just as with advertising, the measurement tools were crude and the “science” was likewise more art than science.

That changed with the advent of the Internet, when large-scale “improvements” became possible and money could suddenly buy elections (which had always been the case), but in a more complex and apparently “democratic” way, undermining the foundations of our social system. And this is where Facebook crossed the Rubicon!

That was when it became apparent that not only were they “monetizing” data, but that the data was being actively used to undermine the political and economic system for monetary and ideological gain… with no real limits in sight.

So now what?

It is in fact extremely difficult to answer this question at this stage.

Facebook has issued a “mea culpa” and promised to stop using third party data. Cosmetically they will, but in reality they cannot, because the whole “free” Internet system as we know it is based on this principle. Moreover, there is nothing wrong with third party data! Third party data as it is currently used does not breach any privacy law, including the GDPR, since it is anonymous and index based. As long as no personal data is exchanged, which is the case for most marketing applications, there should be no problem. In many cases, the population census is the base of third party data, and no country is planning to cancel its census or give people an opt-out! So just putting restrictions on some types of applications should do the trick and ensure that the current system can stay in place without inviting further controversies. In this respect, it would be wise to rename third party data “context data”, which would be a better description.

If only!

The problem with the Internet is that nothing is static. Technology is progressing at a breakneck pace and transforming the world in front of our eyes.

Two technologies in particular will completely change marketing on the Internet in the next few years. The first one is the IoT (Internet of Things) and the arrival of smart, always-online objects which exchange information and communicate with other objects, creating the potential for a huge and permanent breach of privacy. Think about it: how much privacy is left when every object knows where you are and broadcasts your presence in real time to anyone who cares to know? Beacon technology is the extreme case.

The second technology which will change everything is the arrival of AI (Artificial Intelligence) applications, which will tremendously improve the efficiency of targeting thanks to their ability to “learn” and adjust in real time, and which will eventually create clusters of “one” with optimal results.

Seen from this point of view, it is clear that the question raised by the Facebook controversy is an important one. Can we still protect individual privacy in the 21st century? And more fundamentally, what exactly is “privacy” in the digital age?

The answer to this question is far less obvious than it seems, and it will take many years of trial and error to find solutions which are both commercially workable and socially acceptable. Facebook, because it was so far in front, was the first company to be confronted with these questions.

The implementation of the GDPR (General Data Protection Regulation) by the European Union, which applies to all databases that store data on people in the European Union, wherever those databases are located, will oblige us to confront these complex problems.

What exactly is “personal” data?

Who can use it and for what purpose?

Who can sell it and with what restrictions?

To illustrate how complex the question can be, let’s take the answer of the NSA as an example: “We do not collect telephone data, only meta-data!”
What does this mean? The content of the conversation is not recorded (really?), only who you call, when, for how long, etc… But although the distinction works for a telephone call, it does not for the Internet, where there is no difference between a call and a mail, and where you can attach links, pictures and documents.
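
The distinction can be sketched in a few lines of Python. The record layout below is purely hypothetical (it is not the format used by any operator or agency), but it shows how dropping the “content” still leaves a log from which a pattern of contacts can be read directly.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call record: four metadata fields plus the content."""
    caller: str
    callee: str
    started_at: datetime
    duration: timedelta
    audio: bytes  # the "data": what was actually said


def metadata_only(record: CallRecord) -> dict:
    """Drop the content and keep who called whom, when and for how long."""
    fields = asdict(record)
    del fields["audio"]
    return fields


# Made-up calls; the audio is a placeholder.
calls = [
    CallRecord("alice", "clinic", datetime(2019, 3, 1, 9, 0), timedelta(minutes=12), b"..."),
    CallRecord("alice", "clinic", datetime(2019, 3, 2, 9, 5), timedelta(minutes=7), b"..."),
    CallRecord("alice", "bob", datetime(2019, 3, 2, 20, 30), timedelta(minutes=2), b"..."),
]

log = [metadata_only(call) for call in calls]

# Even without any audio, the log shows who talks to whom and how often.
contacts = Counter((entry["caller"], entry["callee"]) for entry in log)
print(contacts.most_common(1))
```

Nothing of what was said survives in the log, yet the repeated morning calls to the same number are plain to see.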

Eventually, as our lives move more and more on-line, data disappears and gives rise to a meta-data only world. When a person with (a, b, c) characteristics does an X transaction with a person with (d, e, f) characteristics on the net, there will be no “data” left, just meta-data. The transaction is of course recorded in 10 different places for different purposes (identification, authentication, analysis, etc…). What is acceptable and what is not?

To answer this question, we need to take a look at the architecture of the Internet and of the centralized databases which have been built to take advantage of the opportunities offered by this extraordinary network. This will be the subject of a follow-up post. But just as a hint of a possible answer, the implementation of structured distributed networks could offer a local solution to a global problem.

The Data Insider

Sunday, March 3, 2019
