AI and the tyranny of the data commons

The myth of data as a non-rival resource needs to be abandoned.

An AI (artificial intelligence) sign is seen at the World Artificial Intelligence Conference (WAIC) in Shanghai, China on July 6, 2023 [File: Aly Song/Reuters]

I am here to tell you the sad but true story of the demise of the sharing economy. Remember how we were told, back in the 1990s and 2000s, that we were contributing to the creation of the largest commons known to humanity?

Well, to paraphrase The Lord of the Rings, we were all of us deceived, for another ring was made. Artificial intelligence (AI) is making that clearer than ever.

The free data we generated by spending thousands of hours on Big Tech’s platforms has been appropriated and converted into training data for AI models. To add insult to injury, the corporations that carried out this appropriation are now pretending to be as concerned as we are about AI’s disruptive power, and are even theatrically begging us to regulate their industry as they rake in the profits.

How did we get here? The seeds of the wholesale appropriation of our data were planted long ago, when economists and media theorists declared data a non-rival resource – the basis for a sharing economy in which ownership does not matter and consumers are free to create and distribute goods outside the market system precisely because those goods are non-rivalrous.

An example of a rivalrous good is a cake. If I eat the cake, no one else can eat it. A non-rival resource, on the other hand, can be consumed by many people without diminishing its value. Think of a digital picture of a cake. If I use it on a website or in a social media post, that would not prevent others from doing the same, and it would not diminish the quality or value of the digital picture.

We were told that the commons that was given shape by these non-rival goods represented nothing less than a new mode of production, an alternative to the exploitative mechanisms of capitalism. Data wanted to be free, and networks were inexhaustible sources of wealth, they said.

But is data a cost-free good? For a digital picture of a cake to exist, there has to be a real cake in the first place, or at least an artistic representation of a cake. This labour is made invisible in the data economy.

Even if the baker or the photographer of the cake willingly donates their labour (which they might do, if they believe they are contributing to a commons), other costs that go into the production, transmission and repurposing of that picture need to be accounted for.

What about the energy costs related to the circulation and storage of the picture, and the associated pollution costs? What about the human labour (often extracted under exploitative conditions) of labelling that cake picture, along with millions of others, to train AI models? On the surface, data may appear to be a non-rival good, but behind it, there is human effort, creativity and resources that are definitely rivalrous goods.

This tension has been somewhat managed by instituting two separate standards for valuing data. Individual data – like our health data or our browsing data – is legally protected, at least in theory (in practice, not so much). The same is true of data produced and owned by corporations: treat that as a common good, and you will be labelled a thief or a pirate.

But public data, our data commons, has been declared “free”, without an owner, just there for the taking. Its accumulation by corporations is an instance of what Nick Couldry and I call data colonialism, and its repurposing to train AI models represents not a tragedy, but a tyranny of the commons.

There is a reason why I am using “tyranny” rather than “tragedy”, which is what usually comes to mind when we think of the commons. The idea of the tragedy of the commons was popularised in a 1968 article by Garrett Hardin, an ecologist who was concerned with overpopulation.

Hardin used the allegory of a pasture that was not privately owned but used communally by shepherds to exemplify the dangers of unmanaged population growth. In his narrative, Hardin considers what happens when a shepherd decides to add one sheep to their flock. And then another, and another.

This act obviously brings gains to the individual shepherd, but when all shepherds do the same, the resources of the pasture are strained to the point of disaster. The lesson is that because the environmental costs are borne by the whole community while the gains accrue to individuals, the natural resource ends up being abused until the system collapses. Hardin argued that privatisation or state control were the only ways to avoid this collapse.

Many have taken issue with Hardin’s tragic model, including Nobel Prize-winning economist Elinor Ostrom, who provided counter-examples of actual commons – from forests in Switzerland and Japan to irrigation systems in Spain and the Philippines – that were managed quite effectively by communities on their own terms.

These ideas about the positive power of the commons, which became popular around the same time the internet was coming of age, greatly influenced the idealism behind the sharing economy. We were led to believe that there was no tragedy in this commons, that it was OK to give our data to corporations because data was a non-rival good. We were encouraged to spend as much of our lives as possible in this digital land of plenty, where all benefitted equally.

Unfortunately, this idea, despite its aspirational beauty, has not served us well. This is because while corporations have been publicly persuading us to believe in the data commons and encouraging us to contribute to it, behind closed doors they have been doing everything in their power to privatise and monetise it. That’s where the tyranny comes in.

As the communities of Facebook, Twitter, Reddit and other social media platforms have found out, the data we generate does not belong to us at all. It belongs to corporations that care more about profit than about community.

In short, the sharing economy’s world without money was built on top of a world where money was everything, and the bill has come due. Our data has not only been appropriated, but is increasingly being used against us. It has become the fuel for AI models whose power and influence over our lives we are only beginning to fathom, but which we can already see are not entirely positive, or not entirely to our advantage – especially for those most vulnerable in our societies.

Big Tech will continue to cling to the idea of data as a non-rival good, claiming that data extractivism is all done for our benefit. They may even promise that their AI models will be open-source public goods, which supposedly means that our stolen data will come back to us as a more useful product, capable of solving the world’s problems.

We must see these moves for what they are: not the altruistic actions of benevolent corporations, but a way to avoid lawsuits, delay attempts at serious regulation and, more importantly, justify the privatisation of the commons.

There are precedents for this kind of deceit. One such precedent is the creation of nature reserves and national parks in the United States during the 20th century. Once the land had been stolen from its rightful owners, the Indigenous peoples of the continent, some of it was declared a public good for the enjoyment of everyone, as a way to conceal the original act of dispossession. We must prevent this from happening again in new contexts and new forms.

The views expressed in this article are the author’s own and do not necessarily reflect Al Jazeera’s editorial stance.