Google’s retiring of Internet archiving tool draws ire of China researchers

The search giant’s cached links long helped researchers keep track of China’s heavily censored internet.

Google has decided to retire its cache function [Roman Pilipey/EPA-EFE]

Taipei, Taiwan – For researchers of China, keeping up with the country’s politics or economy is hard enough due to its opaque leadership and pervasive censorship.

Now they face a challenge from an unexpected source: Google.

Late last year, Google began quietly removing links to cached pages from its search results, a function that had allowed internet users to view old versions of web pages.

Danny Sullivan, Google’s public liaison for search, confirmed earlier this month that the function had been discontinued.

“It was meant for helping people access pages when way back, you often couldn’t depend on a page loading. These days, things have greatly improved. So, it was decided to retire it,” Sullivan said in a post on X.

Although originally introduced to improve internet performance, Google’s cache function had the unintended effect of boosting transparency and became an invaluable resource for researchers.

Academics, journalists and others used cached pages to view past incarnations of websites and deleted content – a particularly useful tool for China’s internet, which Beijing carefully edits to avoid embarrassment and ward off potential dissent.

“The loss of the Google cache function will be a blow to China researchers who have long leaned on this function to preserve access to information that may later be removed, particularly in research citations,” Kendra Schaefer, the head of tech policy research at Trivium China, told Al Jazeera.

A Google spokesperson confirmed the change to Al Jazeera.

“Google’s cached page feature was born over two decades ago, at a time when pages might not be dependably available. The web – and web serving as a whole – has greatly improved since then, making the need for cached pages less necessary,” the spokesperson said by email.

China’s “Great Firewall” means that popular sites from Wikipedia to Facebook are inaccessible without a virtual private network, while its government censors trawl the web for sensitive content to remove.

Taboo topics

In addition to taboo topics such as the 1989 Tiananmen Square crackdown and criticism of Chinese President Xi Jinping, censors have taken aim at targets ranging from the socially conscious Chinese rock band Slap to comments made by the late Premier Li Keqiang about strengthening HIV/AIDS prevention work.

Throughout the COVID-19 pandemic, Beijing closely monitored and removed undesirable content, and it has since tried to rewrite the post-pandemic narrative by suppressing politically inconvenient scientific studies and international news reports.

There are alternatives to Google’s cached pages, namely the non-profit Internet Archive’s Wayback Machine.

But Google’s removal of cached links makes it harder to know what is missing in the first place, said Dakota Cary, a non-resident fellow at the Atlantic Council’s Global China Hub.

“We’re not going to know how much we are missing because we can’t measure what was lost, because it’s not something we can see any more,” Cary told Al Jazeera.

Even dead links in Google’s search results could give researchers pointers or show how a website had been changed, he said.

“Now you have to expand the ways in which you might think about doing or looking for certain items and maybe ask people who specialise in a particular place if they have access or have a backup of a particular document. The way that research is conducted is going to be a lot more difficult,” Cary added.

China’s internet is subject to heavy government censorship [Andy Wong/AP]

Graham Webster, the editor-in-chief of the DigiChina Project at Stanford University, said he was less concerned about the impact, mainly because Western sites like Google and the Wayback Machine had not scoured the Chinese internet as thoroughly as they had other domains.

“Cached pages have at times been a resource for China researchers to access deleted pages for usually a short period after they come down. [The Internet Archive] Archive.org generally was not crawling the net as thoroughly and sometimes, it would not grab the key parts of a page but it’s still a resource if you know the URL you’re looking for,” Webster told Al Jazeera.

Cary said Google’s decision to step away from “backing up the internet” raises questions about whose responsibility it should be to keep a record going forward.

“Archiving is an incredibly useful function and given the way that so much of our lives has transformed into this digital medium, I don’t know if we’ve really taken steps to preserve the information that’s put out and published on the internet.”

Cary said inspiration could be taken from the US government, which does extensive work archiving online content produced by foreign governments and other sources.

“There’s a whole system for that and it seems like maybe this is a place where our systems could kind of adapt to the age that we now live in.”

Source: Al Jazeera
