According to a Pew Research Center study, 38% of webpages that existed in 2013 are no longer accessible. The finding underscores how fleeting online content can be, and the study goes on to examine how this digital decay affects different kinds of online platforms.
The implications are considerable, because digital decay undermines the reliability and permanence of online information. The study took a conservative approach to measuring whether a webpage is accessible, which suggests the losses it reports over the past decade reflect a real trend rather than a measurement artifact.
Government and News Websites
The study sampled nearly 1 million webpages from the Common Crawl web repository, spanning each year from 2013 to 2023. It found that 25% of all pages from this period are now inaccessible. That figure breaks down into two groups: 16% of sampled pages are individually unreachable while their root domains remain active, and 9% are inaccessible because the root domain itself is defunct.
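The split between pages that are missing on a live domain and pages lost along with their domain can be illustrated with a small sketch. This is not the study's actual procedure. The function names, the labels, and the caller-supplied `is_reachable` check are all hypothetical, and the sketch uses the URL's host as a stand-in for the root domain.

```python
from collections import Counter
from urllib.parse import urlsplit

LABELS = ("accessible", "page_gone_domain_live", "domain_defunct")


def root_url(url):
    """Return scheme://host/ for a page URL (host used as a proxy for the root domain)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def classify(url, is_reachable):
    """Label one URL using a caller-supplied reachability check (hypothetical)."""
    if is_reachable(url):
        return "accessible"
    if is_reachable(root_url(url)):
        return "page_gone_domain_live"
    return "domain_defunct"


def summarize(urls, is_reachable):
    """Return the share of URLs that falls under each label."""
    if not urls:
        raise ValueError("need at least one URL")
    counts = Counter(classify(u, is_reachable) for u in urls)
    return {label: counts[label] / len(urls) for label in LABELS}


# Made-up example data: a stand-in for real network checks.
live = {"https://a.example/", "https://a.example/ok"}
sample = [
    "https://a.example/ok",
    "https://a.example/missing",
    "https://b.example/page",
    "https://a.example/ok",
]
shares = summarize(sample, lambda u: u in live)
print(shares)
```

In a real measurement, `is_reachable` would wrap an HTTP request and a rule for which responses count as inaccessible. That rule is exactly where a conservative or a liberal definition changes the totals.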
Government websites are also affected by this trend. Of approximately 500,000 pages sampled from government sites, 21% contain at least one broken link. Across all of the links on these pages, 6% are no longer functional. Local government pages, particularly those at the city level, are especially prone to broken links.
News websites show similar patterns of digital decay. An analysis of 500,000 pages from 2,063 news websites found that 23% of these pages contain at least one broken link. The problem appears on both high-traffic and low-traffic news sites: 25% of pages on highly trafficked sites and 26% of pages on less trafficked sites have broken links.
Wikipedia and Social Media
Wikipedia, a widely used online encyclopedia, is not immune to this issue. The study sampled 50,000 English-language Wikipedia pages and found that 54% of them contain at least one broken reference link. Overall, 11% of the reference links examined are no longer accessible.
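The two Wikipedia figures measure different things: one counts pages with at least one dead reference, the other counts dead references among all references. A minimal sketch of the distinction, using made-up data and a hypothetical `link_rot_metrics` helper, shows how the page-level share can be far higher than the link-level share.

```python
def link_rot_metrics(pages):
    """Compute both link-rot measures.

    pages: list of pages, each a list of booleans (True = that link is broken).
    """
    total_links = sum(len(links) for links in pages)
    if not pages or total_links == 0:
        raise ValueError("need at least one page and one link")
    broken_links = sum(sum(links) for links in pages)
    pages_with_broken = sum(1 for links in pages if any(links))
    return {
        "share_of_pages_with_broken_link": pages_with_broken / len(pages),
        "share_of_links_broken": broken_links / total_links,
    }


# Made-up example: 4 pages with 5 links each, and 2 broken links in total.
example_pages = [
    [True, False, False, False, False],
    [False, False, False, False, False],
    [True, False, False, False, False],
    [False, False, False, False, False],
]
metrics = link_rot_metrics(example_pages)
print(metrics)
```

Here half of the pages have a broken link even though only one link in ten is broken. A single dead link is enough to mark a whole page, so pages with many references are more likely to be counted.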
Social media platforms are also experiencing digital decay. The research tracked nearly 5 million tweets posted between March 8 and April 27, 2023, on X (formerly Twitter) and found that 18% were no longer publicly visible by June 15, 2023. In 60% of those cases, the account that posted the tweet had been made private, suspended, or deleted. In the remaining 40%, the individual tweet had been deleted while the account remained active.
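To put the tweet percentages in absolute terms, the arithmetic can be worked through as a rough sketch. It assumes a round 5,000,000 as a stand-in for the "nearly 5 million" tweets tracked, so the resulting counts are approximations rather than figures from the study. The function and key names are hypothetical.

```python
# Assumption: a round 5,000,000 stands in for "nearly 5 million" tracked tweets.
TWEETS_TRACKED = 5_000_000


def tweet_breakdown(total, pct_gone=18, pct_account_level=60):
    """Split no-longer-visible tweets into account-level and tweet-level causes."""
    gone = total * pct_gone // 100
    account_level = gone * pct_account_level // 100
    tweet_level = gone - account_level
    return {
        "no_longer_visible": gone,
        "account_private_suspended_or_deleted": account_level,
        "tweet_deleted_account_active": tweet_level,
    }


breakdown = tweet_breakdown(TWEETS_TRACKED)
for label, count in breakdown.items():
    share = count / TWEETS_TRACKED
    print(f"{label}: {count:,} ({share:.1%} of all tracked tweets)")
```

Under that assumption, roughly 900,000 tweets were no longer visible: about 540,000 because of changes at the account level and about 360,000 because the individual tweet was deleted. As shares of all tracked tweets, that is about 10.8% and 7.2%.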