Proposal:Have WMF involved in archiving online citations
Please note: This proposal appears to be a duplication of meta:WikiScholar. - 16:24, 8 February 2011 (UTC)
The English Wikipedia alone is estimated to contain on the order of 17 million external links. They are used pervasively for referencing, for further reading, and for countless other purposes; some Wikipedia readers have mentioned that they find the external links more useful than the articles themselves. Unfortunately, dead external links (i.e. WP:LINKROT) in citation templates are a major problem on Wikipedia, affecting its reliability and validity. The Internet Archive estimates that the average lifespan of a link is no more than 2 years, so the number of dead links will grow significantly over the coming decades.
Many efforts are under way to combat this problem, but most of them rely on outside companies, non-profits, and similar organizations that are not under our control, and they are far from an ideal solution.
The Wikimedia Foundation should be actively involved with this issue in an effort to have more control over this problem rather than being at the mercy of other institutions.
Initial financial cost estimates range from $1,600 to $15,000 for hardware and $70 to $500 per month for operations.
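To make these ranges easier to compare, the short sketch below combines the one-time hardware figure with the recurring operations figure over a chosen number of months. The 12-month horizon and the pairing of the low hardware estimate with the low operations estimate (and high with high) are assumptions made for illustration; the proposal itself does not say how the figures combine.

```python
# Figures quoted in the proposal, in US dollars.
HARDWARE = (1_600, 15_000)   # one-time cost: (low estimate, high estimate)
MONTHLY_OPS = (70, 500)      # recurring cost per month: (low, high)


def total_cost(months):
    """Return (low, high) total cost after the given number of months.

    Assumption: low pairs with low and high pairs with high.
    """
    low = HARDWARE[0] + MONTHLY_OPS[0] * months
    high = HARDWARE[1] + MONTHLY_OPS[1] * months
    return low, high


first_year = total_cost(12)
print(first_year)  # (2440, 21000)
```

Under those assumptions the first year would cost somewhere between $2,440 and $21,000.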
- See WebCiteBOT Replacement Task Force under WikiProject External links
- Local discussion is at village pump#Wikimedia_Foundation_adoption_of_WebCiteBot_7046
Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Have WMF involved in archiving online citations.
The Internet Archive stops serving old archived pages when a site's robots.txt blocks access to the site. This means that when a web site dies and its domain is taken over, the old content can suddenly be removed by someone other than the original publisher. This happens quite frequently because the inbound links to such sites have value for advertising.
If we were involved in archiving, we would be able to avoid this problem. However, we should probably still provide a way for the original author of a site to redact content. Hopefully it would be used very rarely, but the reasoning at Archive-it should be examined, and some sort of mechanism will almost certainly be needed. 126.96.36.199 22:27, 6 February 2011 (UTC)
I think we should archive content as soon as possible after a new citation is found, and then mark the citation so that the archive can be viewed. I believe this archive should not be overwritten by later archives of the same citation unless an editor explicitly requests a new one, and in that case it should still be possible to get back to the earlier archive if the later one was requested by mistake. 188.8.131.52 22:27, 6 February 2011 (UTC)
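The policy suggested in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `CitationArchive`, `Snapshot`, and every method name are hypothetical, storage is a plain in-memory dictionary, and nothing here reflects how WebCiteBOT or any existing archiver actually works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    url: str
    content: bytes
    captured_at: datetime


@dataclass
class CitationArchive:
    """Hypothetical in-memory store; each URL maps to its snapshots, oldest first."""
    snapshots: dict = field(default_factory=dict)

    def archive_new_citation(self, url, content):
        """Capture a URL when it is first cited; never overwrite an existing archive."""
        if url in self.snapshots:
            return self.current(url)
        return self._append(url, content)

    def request_new_archive(self, url, content):
        """Explicit editor request: add a newer snapshot and keep the older ones."""
        return self._append(url, content)

    def current(self, url):
        """Snapshot that the mark on the citation would point to."""
        return self.snapshots[url][-1]

    def history(self, url):
        """All snapshots, so an earlier one can be recovered after a wrong request."""
        return list(self.snapshots[url])

    def _append(self, url, content):
        snap = Snapshot(url, content, datetime.now(timezone.utc))
        self.snapshots.setdefault(url, []).append(snap)
        return snap


store = CitationArchive()
cited = "http://example.org/article"  # placeholder URL
first = store.archive_new_citation(cited, b"original page")
again = store.archive_new_citation(cited, b"changed page")   # ignored, first is kept
newer = store.request_new_archive(cited, b"changed page")    # explicit request
```

Keeping every snapshot in an append-only list is one simple way to satisfy both requirements at once: automatic captures never replace the first archive, and a mistaken explicit request can be undone by pointing the citation back at an earlier entry from `history`.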
Web page capture failure
Some content might be difficult to capture. If the archiver doesn't capture what an editor wants captured, a backup plan might be for the editor to print the web page to PDF and upload that, with some annotation, to the archiver. 184.108.40.206 22:27, 6 February 2011 (UTC)