How and why do we accumulate so much data?
Authors and date
- Submitted on: April 12th 2021
- Laurent Devernay ; Referent Trainers Occitania ; simplon.co
By definition, data is what is known and can be used as a basis for reasoning. Over the course of human history, it quickly became essential to manage data in order to gather, compare and transmit it. This is also why the media used to record it have evolved.
One of the initial promises of the web was to let everyone access information more easily and, above all, for free. However, a growing volume of data requires processing, formatting, centralization and sharing, all of which are time-consuming and resource-intensive. The volume of data keeps multiplying: according to IBM, 90% of the data produced by humanity was produced in the last two years [1], and this statement is said to have held true for more than 30 years. Yet today, 90% of stored data is reported to be useless [2].
Consider, for example, that each autonomous car collects about 4 TB of data per day for roughly 1.5 hours of driving, even if only part of it goes through the network [3]. With tens of billions of connected objects around the world also collecting and transferring data, the problem becomes significant. The environmental impact of transferring and storing all this information is far from negligible [4].
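To get a feel for the order of magnitude, the autonomous car figure quoted above can be converted into an average data rate. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes) and that the 4 TB are collected evenly over the 1.5 hours of driving, neither of which is stated in the source.

```python
# Back-of-the-envelope conversion of the figure quoted above.
# Assumptions (not from the source): decimal terabytes, and data
# collected at a constant rate during the 1.5 hours of driving.
TB = 10**12  # bytes in one decimal terabyte
MB = 10**6   # bytes in one decimal megabyte

data_per_day_bytes = 4 * TB
driving_seconds = 1.5 * 3600

# Average amount of data collected per second of driving.
rate_mb_per_s = data_per_day_bytes / driving_seconds / MB

print(f"{rate_mb_per_s:.0f} MB per second of driving")
```

Under these assumptions, the car collects on the order of several hundred megabytes for every second on the road.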
The example of photography
Photography is a good illustration of how our habits have changed with digital technology. The transition from film to digital was a revolution: lower costs (no more need for film or development), new possibilities and new needs (printing your own pictures, but with specific paper and sometimes dedicated printers; easy storage). With smartphones, all this has become, on the surface, even easier. Taking several shots to get a picture right costs nothing, and filters can easily be added. It has never been so easy to share photos via messaging or social networks. There is little risk of losing them, since they are automatically sent to the cloud by default.
All this is free, but only if you leave out the price of the equipment used and the actual (environmental) cost of storage. Until recently, cloud storage was mostly free of charge. Then Google realized that 4 billion photos and videos were being sent to its servers every day [5], and beyond a certain limit the service became a paid one.
More generally, dematerialization and free access have shaped our view of the digital world. We need to keep in mind that the cloud is still a set of physical equipment and that everything we store has a cost: not necessarily a financial cost, but a cost for the planet (equipment that must be manufactured, powered, cooled and regularly replaced).
Yet we rely more and more on digital services in our daily lives, and not only for administrative procedures. Paperless sales receipts are offered (or even imposed) as good practice for the environment, completely ignoring the digital pollution they generate (and the issue of personal data) [6]. The trend is also spreading to invoices and bank statements. After a thorough analysis, La Poste found that some of its promotional communications had a greater environmental impact in digital form than on paper [7].
The case of personal data
With the web, one type of data in particular has become more valuable: personal data. Personal data is information about a natural person. It may be your first or last name, but also anything relating to your religion, your political opinions, the programs you watch on TV, etc. On the web [8], this information is collected mainly in three ways:
- Information that you willingly give, for example when you fill out a form to make an online purchase.
- Cookies, which are small text files that websites store in the user's browser to keep data.
- Trackers, which are bits of code designed to capture information about the user, such as what pages they visit, what they search for on the web, etc.
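To make the notion of a cookie more concrete, here is a minimal sketch using Python's standard `http.cookies` module. The header value, the cookie name `visitor_id` and its content are invented for illustration; real sites choose their own names and attributes.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie as a server might send it in a Set-Cookie header.
# The name "visitor_id" and its value are made up for this example.
header = "visitor_id=abc123; Max-Age=31536000; Path=/"

cookie = SimpleCookie()
cookie.load(header)

morsel = cookie["visitor_id"]
print(morsel.value)       # the identifier stored in the browser
print(morsel["max-age"])  # lifetime in seconds (here, one year)
print(morsel["path"])     # part of the site the cookie applies to
```

A cookie is little more than a name, a value and a few attributes, but a long-lived identifier like this one is enough to recognize the same browser from one visit to the next.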
Thanks to all this, it is possible to build a profile for each Internet user. Some large companies have built their business model on collecting and reselling this data, as well as on algorithms that infer new information about someone from their personal data (if you liked this book, you will probably like this other one). Among these companies, data brokers have made personal data a business in its own right [9].
However, this collection of personal data is problematic. The GDPR [10] has applied since 2018 to protect European citizens. As a result, data about European Internet users, or data gathered by European organizations, can only be collected with:
- the explicit consent of Internet users
- full transparency (on the nature of the data collected and the use made of it, especially if it is sold to third parties)
- a right to be forgotten (making it possible to ask a company to delete the data it has collected about you)
In addition, tools to protect oneself from trackers are emerging (Ghostery, Blacklight, etc.). In any case, it is important today to ask ourselves what happens to our data.