Friday, June 28, 2019

The end of privacy - How AI will destroy our privacy





Can we still protect privacy in the age of Google, Facebook and the Internet?

The answer should be simple and clear. And to some extent it is, but privacy is slowly being eroded and melting away. A combination of technology, concentration of the Internet, centralization of databases and a legal vacuum almost guarantees that sooner rather than later privacy will be gone. So worrying about privacy on the Internet is not paranoia. Even if reality does not yet corroborate our worst fears, as we shall see, the trend is not our friend and asking harsh questions about privacy is legitimate.

But to approach the subject rationally and bring meaningful answers, let’s first look at the way privacy is protected in the current “free”, advertising-based Internet. To do so, it is essential to understand that, just as the way software is built is very different from the way it is used, the perception of privacy on the Internet as people experience it is very different from the reality as companies deal with it.

We experience privacy literally and personally at our individual level whenever companies request information or, more viscerally, whenever we find in a Google search or any other database personal information that we believe should not be there. This is the perception aspect of privacy, the one we sense and the one we react to. Why do “they” know this? Where did “they” get the information from?

But for corporations, the landscape of “private” information is very different. In spite of the “one-to-one” image being projected, very few companies (none?) look at their clients individually, simply because it is impossible. Mostly, they approach clients statistically, as clusters, in order to maximize sales. As such, individual information, which we call PII (Personally Identifiable Information), only has value in a specific context rather than by itself, and this value for marketing purposes is very specific: it is to increase the accuracy of statistical analysis in order to optimize sales. Nothing more, nothing less.

As such it is rather innocuous. Marketing, even if sometimes irritating, has existed in a technical sense for over a century without garnering much concern. The reason is that information was collected, used... and mostly forgotten or lost. This is of course what has changed with the Internet: we do not “lose” or “forget” information anymore. Everything collected can be stored, retrieved and, more worryingly, shared almost forever. But beyond our perception that it is, is it really, and for what purpose?

Here, we need to understand what is really going on behind the curtain, how information is processed and used to bring a meaningful answer to the question.

We live in the age of the Internet, which means that data is easy to collect and use in vast quantities, so much so that most companies are drowning in data. But the purpose of most companies is not to gather data. It is to use the data skillfully to get information and insight into the behavior of their clients in order to increase sales of products and services. This is the definition of marketing.

With very few exceptions, most companies only have a very limited range of products and services, as efficiency is normally increased by producing fewer products in greater numbers. Likewise, as companies need to understand how people react to, buy and use their products, they go through a similar process of optimization of their marketing and therefore approach their clients as discrete groups with similar characteristics and behavior, which we call clusters. These clusters can then be analyzed with statistical tools in order to increase sales and optimize interactions.

This quantitative approach to marketing and behavior is what has really changed with the Internet and the reason why data has suddenly become so valuable. The slow increase in the efficiency of marketing, which was almost imperceptible before, has greatly accelerated online, creating a whole new ecosystem of data users and providers, analytics and data exchanges which has grown almost overnight to improve and optimize the use of data.

Because of its complexity, it is difficult to give a complete overview of the online data marketplace, which is in any case not the purpose of this article, but it is important to understand how data navigates this ecosystem.

Companies collect data on their clients. We call this data first party data. It usually consists of names, addresses, telephone numbers, internet addresses and transaction information. This is the core of their database, but it is usually not enough to get real marketing insight into clients’ behavior. To analyze a database effectively and make sense of the data you already have, you need more data to understand “context”. This can be any public data you can get your hands on (a telephone directory, census data, geographic data...) or it can be other companies’ data, which we call third party data.

Third party data gives you the ability to know more about your clients (what other products do they buy for example) and to learn something about people who are not yet clients but potentially could be. This is of course the data people are most concerned about.

In most countries, third party data is restricted and cannot be exchanged freely. But this is not the case in the US, and this matters since the core of the Internet is and remains an American-based network. This lack of legislation in the US is one of the main reasons why there is so much concern globally about privacy. Without limits, isn’t it likely that most companies will abuse privacy if it gives them an edge in marketing?

In reality, mostly, the answer is “no”. This is only partly related to the fact that in the long term abusing your clients is not the best way to keep them. More importantly, it is because there are practical limits to what you can do with data, marketing-wise. As we have seen, for marketing purposes we approach clients as clusters, and companies can only handle a limited, optimum number of such clusters. So companies will offer a range of products and, for each product, they will look at a number of distinct behaviors, or clusters. If you have for example 100 products and you look at 10 distinct clusters for each product, you are already dealing with 1,000 different possible strategies to interact with your clients. This is a lot of analytics and few companies can do it efficiently. In reality, many companies have more than 100 products and in many cases it is easy to distinguish more than 10 clusters of behavior for each product. This is not practical, which is why traditionally companies would restrict themselves to a small number of advertising strategies to optimize cost and efficiency. This is what is rapidly changing with the introduction of new technologies.
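The arithmetic above can be written out as a small Python sketch. The function name `strategy_count` and the second, larger example (500 products, 25 clusters) are assumptions added for illustration and do not come from any real company.

```python
def strategy_count(products: int, clusters_per_product: int) -> int:
    """Number of product/cluster pairs a marketing team would have to handle."""
    return products * clusters_per_product


# The example from the text: 100 products, 10 clusters each.
print(strategy_count(100, 10))  # 1000

# A hypothetical larger catalogue, to show how quickly the number grows.
print(strategy_count(500, 25))
```

The count grows with the product of the two numbers, which is why a modest increase in either one quickly becomes unmanageable by hand.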

Traditional advertising companies were more “art” than science, especially data science. They had an overall understanding of clients and of their reaction based on countless advertising campaigns built over the years. An invaluable but also worthless experience! By the time you know how your targeted clusters reacted to a campaign, it is too late. The environment has changed and earlier success may easily engender future failures due to uncontrollable factors.

Data exchange and instant feedback from internet advertising have shrunk this gap to almost nothing. With simple A/B testing on a web page, for example, you can learn very quickly what works and what does not, and adjust your targeting. This would be bad enough if there were thousands of companies competing for this service online. But there are not. More than two thirds of internet advertising is controlled by two giant companies: Google and Facebook. This gives them unheard-of power over companies who want to advertise, over web pages and sites who want to monetize space on the pages they display and, more crucially for privacy, over what people actually experience.
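To make the A/B testing idea concrete, here is a minimal sketch in Python of one common way to compare two page variants, a two-proportion z-test. The function name `ab_test` and the visitor and conversion numbers are hypothetical, and real advertising platforms may well use different statistics.

```python
import math
from typing import Tuple


def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided two-proportion z-test. Returns (z, p_value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: 10,000 visitors saw each variant.
z, p = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be chance, and this is the kind of feedback that used to take a whole campaign to obtain.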

Data in this respect has become the currency of the 21st century, as without it you are hopelessly in the dark and run the risk of being left behind while your competitors surf a web of improved targeting and better ROI. The pressure is immense but the tools are limited. This is of course when people start cutting corners and looking for low-hanging fruit, even when it is forbidden.

Privacy until very recently has gravitated around names in relation to addresses (which are public) and telephone numbers, which can be public (phone directories) or private. So using your client list for marketing purposes is considered acceptable, and adding “data” to the list as cluster data is fine too (“you” live in a rich neighborhood based on your address, so we add a flag or an index to your name which is translated and can be used as a cookie). But transferring or giving access to a list of clients for marketing (or political) purposes, as Facebook did last year, is not acceptable.

This is clear, or at least it would be if privacy could be defined solely by a name attached to other data. But what happens when data alone defines you? This problem has long been identified, and this is why people who live in census tracts with only a few households are pooled together and aggregated at a higher level. It is simply too easy to find the name of a person living in the countryside when there are only one or two households in the area and link whatever data the census gives you (income, for example) back to that person.
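The pooling of small census areas can be sketched as follows. This is only an illustration in Python: the function `pool_small_tracts`, the threshold of 10 households and the sample figures are assumptions, not the actual rules of any census office.

```python
def pool_small_tracts(tracts, min_households=10):
    """Merge neighboring tracts until every published group is large enough.

    tracts: list of (tract_id, households, total_income), in geographic order.
    Returns a list of (tract_ids, households, total_income) groups.
    """
    groups = []
    ids, households, income = [], 0, 0.0
    for tract_id, tract_households, tract_income in tracts:
        ids.append(tract_id)
        households += tract_households
        income += tract_income
        if households >= min_households:
            groups.append((ids, households, income))
            ids, households, income = [], 0, 0.0
    if ids:
        if groups:
            # A leftover that is still too small is folded into the last group.
            last_ids, last_households, last_income = groups[-1]
            groups[-1] = (last_ids + ids, last_households + households,
                          last_income + income)
        else:
            groups.append((ids, households, income))
    return groups


# Hypothetical rural area: tracts A, B and E are far too small to publish alone.
sample = [("A", 2, 90_000.0), ("B", 1, 40_000.0), ("C", 9, 380_000.0),
          ("D", 15, 700_000.0), ("E", 3, 120_000.0)]
for ids, households, income in pool_small_tracts(sample):
    print(ids, households, round(income / households))
```

Only the averages of the pooled groups are published, so the income of the single household in tract B can no longer be read directly from the table.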

The problem with the internet is that this issue has been multiplied a hundred times, as data points on households went over the years from hundreds to thousands, then millions. It is easy to understand why. If you have 10 types of data on one million people in a city, everyone’s privacy is fairly well protected. But if you have millions of data points, individual patterns clearly tend to emerge and become recognizable. Until recently, no one was able to combine huge amounts of data to identify patterns. Now of course, with big data, this is exactly what has become possible.
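A small simulation can illustrate why more data points per person make individuals stand out. The sketch below, in Python, uses purely synthetic random data. The population size, the number of attributes and the number of values per attribute are arbitrary assumptions, and real data is correlated rather than uniformly random, so the figures are only indicative.

```python
import random
from collections import Counter


def unique_fraction(records, n_attributes):
    """Share of records whose first n_attributes values match no other record."""
    keys = [tuple(record[:n_attributes]) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def simulate(n_people=20_000, n_attributes=12, n_values=5, seed=0):
    """Uniqueness rate as more and more attributes are known about each person."""
    rng = random.Random(seed)
    records = [[rng.randrange(n_values) for _ in range(n_attributes)]
               for _ in range(n_people)]
    return [unique_fraction(records, k) for k in range(1, n_attributes + 1)]


for k, fraction in enumerate(simulate(), start=1):
    print(f"{k:2d} attributes known: {fraction:.1%} of people are unique")
```

With only a few attributes nobody is unique, but the share of unique records can only go up as attributes are added, and no name is needed at any point.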

Is this a risk? In marketing: Not really. As we saw earlier, marketing understands people as clusters, not as individuals. And no company is advanced enough to create clusters of “one”! But it is not very difficult to imagine many other nefarious ways to use the technology to “follow” people individually. Furthermore, as artificial intelligence makes progress, more sophisticated marketing actions can be implemented. With simple if/then rules, we can already have a significant impact on people’s behavior. With more advanced rules, where exactly is the limit?

And this raises another, more important question: Can you follow all the current rules and still breach privacy? With technology, the answer is yes! Already today, you do not need to know the name of a person in order to gather valuable information about them and create a strategy based on that information. In fact, as information becomes more plentiful, traditional personal information slowly becomes less and less relevant. Your face captured in a public place is a much better “point” to link other data together and can easily replace a name!
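As an illustration of linking data without a name, the Python sketch below merges two hypothetical data sets on a shared identifier that is not a name. The field `pseudo_id`, standing in for something like a face-recognition match or a device identifier, and all the records are invented for the example.

```python
from collections import defaultdict


def link_by_pseudonym(*datasets):
    """Merge rows from several data sets that share the same 'pseudo_id'."""
    profiles = defaultdict(dict)
    for dataset in datasets:
        for row in dataset:
            key = row["pseudo_id"]
            # Copy every field except the linking key into the profile.
            profiles[key].update(
                {field: value for field, value in row.items()
                 if field != "pseudo_id"}
            )
    return dict(profiles)


# Hypothetical sources: a street camera log and a shop's purchase log.
camera_log = [{"pseudo_id": "face-7f3a", "seen_at": "Main St, 08:05"}]
purchase_log = [{"pseudo_id": "face-7f3a", "bought": "insulin test strips"},
                {"pseudo_id": "face-91c2", "bought": "newspaper"}]

for pseudo_id, profile in link_by_pseudonym(camera_log, purchase_log).items():
    print(pseudo_id, profile)
```

Neither source contains a name or an address, yet the merged profile already says where a specific person was and hints at a medical condition.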

To summarize, artificial intelligence will quickly erode privacy thanks to its ability to recognize patterns in huge data sets and make them unique, as well as its potential to identify “you” without any reference to your name or other traditional PII data. A pattern such as the way you walk or a specific buying behavior can just as readily identify you.

It is important to understand why this is happening and that it is not inevitable. More than anything, the strength of the Internet is its ability to link data and databases together to create a space of knowledge. But at the same time, it is also this ability which is destroying privacy. In other words, what is destroying privacy is not the end result that people see but the mere fact of linking databases together, which is one of the basic principles of the Internet.

Is this an issue? And should privacy be protected above anything else?

In the end, this is the question we must answer. How much do we value privacy?

In the past, living in a village, almost everyone would know everything about everybody else. There were “secrets”, but you had to be careful as there was no privacy to speak of. Your neighbors knew what time you woke up and what you were doing, and the store owner knew everything you were buying and often the state of your finances, etc. No privacy!

Large cities have given us, for a while, the illusion of privacy as a “right” or at least as something worth defending, but now data is catching up and very soon the global IoT will know in real time where you are and what you are doing. Worse, it will broadcast the information everywhere to whoever cares to know. The information will then be recorded, more or less forever, in multiple databases to be combined with other data to get deep insight into your habits and behavior for god-knows-what purpose. It is difficult to imagine a more total loss of privacy, although this is exactly the future we are rushing into with smart cars and smart cities, which in the end will know so much about us that the few “unknowns” left will be easy to deduce, reducing privacy to nothing.

Is there an alternative to this dystopian vision? Although this is not the route we are on right now, I think there is. But it requires understanding that the problem of privacy is not in the data, as people believe, but in the databases and their ability to enhance data. This is the “fight” we are currently witnessing with crypto-currencies. Distributed systems guarantee anonymity. Centralized systems bring the opposite. Should decentralized crypto-currencies become the norm, the anonymity of money will be maintained. The alternative is for the money in your wallet to be remotely controlled, allowing you to buy or not to buy a plane ticket, for example. (The same as the Chinese system of social credit but applied more broadly if it can be linked to money! A doctor could ideally restrain a diabetic patient from buying a “toxic” sugary drink! More likely, he would delegate the task to your mobile assistant.)

What is amazing and also frightening is that most of these “decisions” will be taken by default by tech companies as they implement new technologies. Sometimes the technology will be rejected, such as Google Glass, which was both a fantastic and a dreadful idea, but sometimes it will just get through without much discussion, such as Facebook’s face recognition on pictures. Just another feature among many others.

So in the end, will privacy soon belong to the past, like vinyl records and paper pictures? Hard to tell at this stage, but we are rushing head-on towards such a future. The problem is that the technology is exponential and our ability to react arithmetic. When your smart assistant knows your name and tells the fridge, which tells Amazon, which tells everybody else related, when a beacon on a smart machine identifies your phone and broadcasts your location in a smart city, when every red light is equipped with cameras and sensors, when smart keys open your car and your door and give you access to your office, you will become that much more efficient, but eventually just a cog in the machine with no privacy left. A technological utopia or, more likely, a dystopia, depending on the way you look at it.
