Link rot

Link rot (or linkrot) refers to the process by which hyperlinks on individual websites or the Internet in general point to web pages, servers or other resources that have become permanently unavailable. The phrase also describes the effects of failing to update out-of-date web pages that clutter search engine results.

Terminology

Link rot is also called "link death", "link breaking" or "reference rot". A link that no longer works is called a "broken link", "dead link", or "dangling link". Formally, this is a form of dangling reference: the target of the reference no longer exists.

Causes

One of the most common reasons for a broken link is that the web page to which it points no longer exists. This frequently results in a 404 error, which indicates that the web server responded but the specific page could not be found. Another type of dead link occurs when the server that hosts the target page stops working or relocates to a new domain name. The browser may return a DNS error or display a site unrelated to the content originally sought. The latter can occur when a domain name lapses and is reregistered by another party. Other reasons for broken links include:

Prevalence

The 404 "Not Found" response is familiar to even the occasional web user. A number of studies have examined the prevalence of link rot on the web, in academic literature, and in digital libraries.[1] In a 2003 experiment, Fetterly et al. discovered that about one link out of every 200 disappeared from the Internet each week. McCown et al. (2005) discovered that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003; Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year. In 2014, Maciej Cegłowski, owner of the bookmarking site Pinboard, reported a “pretty steady rate” of 5% link rot per year.[2]
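The weekly and yearly loss rates quoted above can be compared with a little arithmetic. The following is a minimal sketch, assuming that links disappear independently at a constant rate per period; this is a simplification that the cited studies do not themselves claim, and the function names are illustrative only.

```python
import math


def surviving_fraction(loss_per_period, periods):
    """Fraction of links still alive after `periods`, assuming a constant
    and independent loss rate per period (a simplifying assumption)."""
    return (1.0 - loss_per_period) ** periods


def half_life(loss_per_period):
    """Number of periods after which half of the links are gone,
    under the same constant-rate assumption."""
    return math.log(0.5) / math.log(1.0 - loss_per_period)


if __name__ == "__main__":
    weekly = 1 / 200   # "one link out of every 200 ... each week"
    yearly = 0.05      # "5% link rot per year"
    print("survival after 52 weeks at the weekly rate:",
          round(surviving_fraction(weekly, 52), 3))
    print("half-life at the weekly rate (weeks):", round(half_life(weekly), 1))
    print("half-life at the yearly rate (years):", round(half_life(yearly), 1))
```

Under this model the weekly figure implies that roughly a quarter of links vanish within a year, far more than the 3% and 5% yearly figures. The gap suggests that the studies measured quite different populations of links, so their rates should not be treated as interchangeable.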

A 2014 Harvard Law School study by Jonathan Zittrain, Kendra Albert and Lawrence Lessig determined that approximately 50% of the URLs in U.S. Supreme Court opinions no longer link to the original information.[3] They also found that in a selection of legal journals published between 1999 and 2011, more than 70% of the links no longer functioned as intended. A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters’ Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.[4] In August 2015, Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found that overall 24.5% of the links cited were no longer available.[5]

Discovering

Broken links can be discovered manually or automatically. Automated methods, including plug-ins for WordPress, Drupal and other content management systems, can detect broken URLs. An alternative is a dedicated broken-link checker such as Xenu's Link Sleuth. However, a URL that returns an HTTP 200 (OK) response may be accessible even though the contents of the page have changed and are no longer relevant, so automated checks cannot fully replace manual review. Some web servers also return a soft 404: they report to clients that the link works (typically with a 200 response) even though the requested page does not exist. Bar-Yossef et al. (2004)[6] developed a heuristic for automatically discovering soft 404s.
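An automated check of this kind can be sketched as follows. The example uses only Python's standard library, and all function names and labels are hypothetical. The soft-404 probe requests a random URL that almost certainly does not exist next to the target. It is a simplified illustration loosely inspired by the idea of probing for soft 404s, not the heuristic published by Bar-Yossef et al.

```python
import secrets
import urllib.error
import urllib.request
from urllib.parse import urljoin


def classify_status(status):
    """Map an HTTP status code to a coarse, illustrative label."""
    if status in (404, 410):
        return "broken"
    if 200 <= status < 300:
        # Reachable, but the content may have changed or be a soft 404.
        return "reachable"
    return "needs-review"


def fetch_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the final HTTP status for url, or None if no response arrived.
    Redirects are followed by urlopen, so the status is that of the final page."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, for example with 404
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def random_sibling(url):
    """Build a URL next to url that almost certainly does not exist."""
    return urljoin(url, "linkcheck-" + secrets.token_hex(16))


def check_link(url, fetch=fetch_status):
    """Label a link; `fetch` is injectable so the logic can be tested offline."""
    status = fetch(url)
    if status is None:
        return "unreachable"
    label = classify_status(status)
    if label == "reachable":
        probe = fetch(random_sibling(url))
        if probe is not None and 200 <= probe < 300:
            # The server claims success even for a nonsense path, so its
            # 200 responses cannot be trusted as proof that a page exists.
            return "possible-soft-404"
    return label
```

A result of "reachable" still says nothing about whether the page content is what the link originally referred to, and some servers reject HEAD requests, which this sketch reports as "needs-review" rather than retrying with GET.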

Combating

There are numerous solutions for tackling broken links: some work to prevent them in the first place, while others try to resolve them once they have occurred. Numerous tools have also been developed to help combat link rot.

Authoring

Server side

User side

Web archiving

Main article: Web archiving

To combat link rot, web archivists are actively engaged in collecting the Web or particular portions of the Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. The goal of the Internet Archive is to maintain an archive of the entire Web, taking periodic snapshots of pages that can then be accessed for free via the Wayback Machine. In January 2013 the organization announced that it had reached the milestone of 240 billion archived URLs.[11] National libraries, national archives and other organizations are also involved in archiving culturally important Web content.
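Looking up a Wayback Machine snapshot for a dead link can be sketched as below. The endpoint, the `web.archive.org/web/<timestamp>/<url>` address pattern, and the JSON field names reflect the Internet Archive's public availability interface as commonly described. They are assumptions here and should be checked against the current documentation. The helper names are hypothetical, and the code performs no network access itself.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def snapshot_url(url, timestamp):
    """Build a direct snapshot address in the web.archive.org/web/ pattern."""
    return "https://web.archive.org/web/{}/{}".format(timestamp, url)


def closest_snapshot(response_text):
    """Extract the closest snapshot URL from a JSON response, or None.
    Assumes the shape {"archived_snapshots": {"closest": {...}}}."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

A caller would fetch `availability_query(dead_url)` with any HTTP client and pass the response body to `closest_snapshot`; a return value of None means that no archived copy was reported.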

Individuals may use a number of tools that allow them to archive web resources that may go missing in the future:

However, such archiving systems may themselves suffer intermittent service interruptions, during which the preserved URLs are temporarily unavailable.[16]

See also

Further reading

Link rot on the Web

In academic literature

In digital libraries

References

  1. Habibzadeh, P. (2013-01-01). "Decay of References to Web sites in Articles Published in General Medical Journals: Mainstream vs Small Journals". Applied Clinical Informatics. Schattauer GmbH - Publishers for Medicine and Natural Sciences. 4 (4). doi:10.4338/aci-2013-07-ra-0055.
  2. Cegłowski, Maciej (9 September 2014). "Web Design: The First 100 Years". Retrieved 22 July 2015.
  3. Zittrain, Jonathan; Albert, Kendra; Lessig, Lawrence (12 June 2014). "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations". Legal Information Management. Retrieved 16 January 2015.
  4. Hennessey, Jason; Xijin Ge, Steven (2013). "A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques". BMC Bioinformatics. Retrieved 16 January 2015.
  5. "All-Time Weblock Report". August 2015. Retrieved 12 January 2016.
  6. Bar-Yossef, Ziv; Broder, Andrei Z.; Kumar, Ravi; Tomkins, Andrew (2004). "Sic transit gloria telae: towards an understanding of the web's decay". Proceedings of the 13th conference on World Wide Web - WWW '04. p. 328. doi:10.1145/988672.988716. ISBN 158113844X.
  7. Kille, Leighton Walter (8 November 2014). "The Growing Problem of Internet "Link Rot" and Best Practices for Media and Online Publishers". Journalist’s Resource, Harvard Kennedy School. Retrieved 16 January 2015.
  8. Rønn-Jensen, Jesper (2007-10-05). "Software Eliminates User Errors And Linkrot". Justaddwater.dk. Retrieved 5 October 2007.
  9. Tim Berners-Lee (1998). "Cool URIs don't change". Retrieved 7 October 2013.
  10. Mueller, John (2007-12-14). "FYI on Google Toolbar's Latest Features". Google Webmaster Central Blog. Retrieved 9 July 2008.
  11. "Wayback Machine: Now with 240,000,000,000 URLs | Internet Archive Blogs". Blog.archive.org. 2013-01-09. Retrieved 2014-04-16.
  12. "Internet Archive: Digital Library of Free Books, Movies, Music & Wayback Machine". Archive.org. 2001-03-10. Retrieved 7 October 2013.
  13. "Hiberlink". Hiberlink.org. Retrieved 15 January 2015.
  14. "Memento: Time Travel for the Web". Memento. Retrieved 15 January 2015.
  15. "Harvard University's Berkman Center Releases Amber, a "Mutual Aid" Tool for Bloggers & Website Owners to Help Keep the Web Available | Berkman Center". cyber.law.harvard.edu. Retrieved 2016-01-28.
  16. Habibzadeh, Parham (2015-07-30). "Are current archiving systems reliable enough?". International Urogynecology Journal: 1–1. doi:10.1007/s00192-015-2805-7. ISSN 0937-3462.

External links

The Wikibook Authoring Webpages has a page on the topic of: Preventing link rot
This article is issued from Wikipedia - version of the 11/23/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.