Monthly Archives: August 2010

Analysis: Half of published URLs are dysfunctional a decade later

As a small aside while setting up a local mirror of Medline, I extracted the 15,915 URLs mentioned in the abstracts. Checking them revealed that 12,354 (78%) were functional, which may not seem that bad. However, plotting the percentage of dysfunctional URLs as a function of publication year reveals a less pleasant trend:

[Figure: percentage of dysfunctional URLs by publication year]

After just 10 years, half of all published URLs are no longer functional and do not redirect to the new location of the service (if one exists). The fairly high overall success rate is merely a consequence of most URLs having been published within the last few years. Unless the persistence of URLs is improving (and I see no sign of that in the plot), we can expect the published literature to contain thousands of URLs that are no longer valid.

Edit: Andrew Lang pointed out a similar study of URLs cited in communications journals.

Edit: Duncan Hull pointed out a paper on URL decay in Medline by Jonathan Wren, which reminded me of an even earlier paper on the topic.