Saturday, July 5, 2025

How is your data coded? (and stored)

   This is an old slide, so you can easily imagine that a lot of progress has been made since. Still, it is important to understand, for this is how your data and everybody else's data is coded by the NSA and other agencies whose job is to know everything.

 

  The first challenge is to gather everything, which is why the Internet is free. Otherwise it would cost an absolute fortune to run such a vast network. (Did someone forget to tell you that nothing is free in life?) The next step is to convince you to provide as much data as possible through social networks. (This was truly genius!)

  The second challenge is to organize the data efficiently. Most of it, at any given time "t", has no value or interest whatsoever. It just needs to be stored. But when you need it, it has to be retrieved instantly. So a good classification system is needed. The system shown above looks old, but it was good enough 20 years ago. It must be an order of magnitude more advanced by now.

  The third and most complex challenge is to skillfully and smoothly merge different types of information: text, video, audio, slides, etc. At some stage this was my job, and it later became almost everybody's job if you had anything to do with data. We call it "enrichment", which mostly consists of merging different databases and getting insight from the data. This is where most of the progress has been made over the last 20 years.
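  To make "enrichment" a little more concrete, here is a minimal sketch of the idea: several unrelated data sources are attached to one profile through a shared key. Everything in it is an assumption made up for illustration (the `person_id` key, the `enrich` helper, the sample records); it is not how any real agency or vendor does it, only the general shape of the merge.

```python
# Hypothetical records, invented for illustration only.
profiles = [
    {"person_id": 1, "name": "Alice"},
    {"person_id": 2, "name": "Bob"},
]
photo_tags = [
    {"person_id": 1, "place": "park"},
    {"person_id": 1, "place": "cafe"},
]
purchases = [
    {"person_id": 2, "item": "coffee"},
]


def enrich(profiles, **sources):
    """Attach the rows of each extra source to the profile sharing its person_id."""
    # Start every profile with an empty list per source.
    by_id = {
        p["person_id"]: {**p, **{name: [] for name in sources}}
        for p in profiles
    }
    for name, rows in sources.items():
        for row in rows:
            target = by_id.get(row["person_id"])
            if target is None:
                continue  # no matching profile: the row is simply ignored here
            # Keep everything except the join key itself.
            extra = {k: v for k, v in row.items() if k != "person_id"}
            target[name].append(extra)
    return list(by_id.values())


enriched = enrich(profiles, photo_tags=photo_tags, purchases=purchases)
for record in enriched:
    print(record)
```

  Each source on its own says very little. The value appears once the rows sit side by side under the same person.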

  Finally, 5 years ago, AI, which until then was a rather cumbersome and hard-to-use tool called "machine learning", made an extraordinary breakthrough thanks to transformer technology, and everything changed. The job of organizing and understanding data became much simpler as we entered the next phase: synthetic data.

  Synthetic data is the data you do not have but can build or deduce from other data. Its realm is infinite. Let's take a simple example: you post a picture on Facebook of you reading a letter to someone. Plain and innocent enough, right? 10 years ago, it would have been.

  Now, I can enhance the picture and read, in reverse, what's written on the letter thanks to the reflection in the mirror behind you. Because the picture was 4K, I can see that you are reading the letter to your little sister, who is reflected in the pupils of your eyes. Thanks to the metadata, I know when the picture was taken. I can see a house outside the window which can be used to geo-locate the place. There is a person walking outside the house. Scan the face: John Hattaway. Strange, it's 3pm, he is supposed to be at work downtown at this hour. I check John Hattaway's profile. It looks like he regularly calls a house a little further down the row. (Remember that metadata, where and who you call, is officially fair game in the US.) A romantic encounter? It looks like John is calling regularly on Friday around lunchtime... Which happens to be very interesting information, because the person living in that house is a lawyer dealing with sensitive cases...
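  The "calling regularly on Friday around lunchtime" step needs no call content at all, only who called whom and when. A minimal sketch of that deduction, assuming a made-up list of call records and a hypothetical `recurring_slots` helper (the names, dates and threshold are all invented for illustration):

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical call metadata: (caller, callee, timestamp). No content, only who and when.
calls = [
    ("caller_a", "house_12", "2025-06-06T12:10"),
    ("caller_a", "house_12", "2025-06-13T12:25"),
    ("caller_a", "house_12", "2025-06-20T12:05"),
    ("caller_a", "office", "2025-06-18T09:00"),
]


def recurring_slots(calls, min_count=3):
    """Count calls per (caller, callee, weekday, hour) and keep the repeated ones."""
    counts = Counter()
    for caller, callee, stamp in calls:
        t = datetime.fromisoformat(stamp)
        counts[(caller, callee, DAYS[t.weekday()], t.hour)] += 1
    return {slot: n for slot, n in counts.items() if n >= min_count}


print(recurring_slots(calls))
```

  A simple counter is enough to turn a pile of harmless-looking records into a habit with a day and an hour attached.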

  Do you see where this is going? In the early 2000s, the technology was shown in movies such as The Bourne Identity. It was possible but difficult to implement and, somewhat reassuringly, a single well-trained individual could easily outsmart the system. That was when humans were doing the intelligence work. Now imagine today, as we outsource everything to AI: the data, the metadata, the construction of the synthetic data and the deduction process. Add millions, then billions of data sources: public cameras, sensors, alarms, home appliances scanning your house...

  Now try to imagine "privacy" in such a world. It simply has no meaning. Soon enough the environment around you will be aware of your presence. Forget your mobile phone, which has been broadcasting "I am here" for over 15 years now. About 10 years ago, I worked with a company installing "beacons" on vending machines in the streets and shopping malls, whose sole purpose was collecting a confirmation of your presence and reselling it (for marketing purposes). This is low-value data, but add it to other data and suddenly you have something actionable. Not yet the Matrix, but most certainly well on our way!

