Dusty, Forgotten and Neglected

An Analysis of Wikipedia’s Least-Updated Articles

Wikipedians often tout the website’s article count (currently over 5.7 million on the English version) as one of its advantages over traditional encyclopedias. Many of the pages included in this count are actually short “stubs,” lists, disambiguation pages, templates, and (depending on who you ask) redirects, but in addition, large numbers of article pages are simply added, often en masse by automated processes, and rarely updated or viewed by humans. The 500 most egregious examples of this are listed in a report, helpfully produced each week by Wikipedia developers, called “Forgotten articles” — and their fates (or lack thereof) may provide a preview of Wikipedia’s possible decline.

 

by T.M. Ming

The tracking of Wikipedia articles which haven’t been recently edited goes back to 2005, when it was a component of the Neglected articles effort. That was a general cleanup project, though, and a report specifically listing articles by the age of their most recent edit did not appear until October 2008, when the first “Dusty articles” report was run. This report was run every day, and it listed the 100 least-recently edited articles, ignoring redirects and, eventually, disambiguation pages.

It seems to have had some errors, and in February 2010 it stopped running altogether; it was replaced by a new version at the end of June. This version was run more or less weekly, and ran until June 2015. It was then replaced by the “Forgotten Articles” report, which was first run in March 2012, though consistently accurate runs did not begin until April 2016 (it has been run weekly ever since). This report finds the 500 articles with the oldest most-recent update, excluding redirects and disambiguation pages (this last being not entirely accurate, as will be explained a bit later). In the initial report, the most recent update was on 16 April 2009, whereas the oldest update in the 3 Sept 2018 report (which was used as the basis for this analysis) was on 16 Dec 2009, so no article appears on both lists.

Nonetheless, there is some commonality: a series of stubby articles on Swiss towns, all created by User:Geschichte, appears in both reports, so that in the most recent report we find Dippishausen, and in the first report we find Uesslingen, both created on the same day. But the latter was tagged as unreferenced a month after that first report, and was then updated this spring in a drive-by edit which was then reverted, so that it still reads now as it did in 2009.

The first report also shows a long run of place names in Senegal, all created by User:Dr. Blofeld, which were obviously templated in from a list generated elsewhere. These were all edited in late 2017 by a bot which objected to the formatting of the coordinates, thus taking them out of later listings. In many cases, this is the only update to these articles after their creation. That’s a theme that will recur in the analysis of the most recent report — a long series of cookie-cutter texts inserted by one user and largely ignored ever since.

The report lists the articles, their last-edit date, and the total edits done on each article. It does not list the creation date, but the edit dates put a low ceiling on that. The study report gives last-edit dates between 16 Dec 2009 and 1 Aug 2010; the median lies in June, which reflects a paucity of edits in the first two months of the year, but also long strings of edits on the same day to articles of the same type. The number of edits per article ranges from 2 to 49, and the mean is slightly higher than the median of 11, but a graph shows that few have over 20 edits. These are perhaps not so much “forgotten” as neglected.

To be fair, it’s also possible that there isn’t anything further to do with some of these articles. The biggest single chunk of articles by subject is a set of 106 lists of US Supreme Court decisions, by volume as recorded in the United States Reports, constituting 21% of the total. Most of these contain a table of cases, almost all red-linked, with the date and a link to a summary of the decision in openjurist.org under the docket number. Articles for those red links might eventually be written, if anyone summons up the will to do so — but unless there is some change in categorization, there may be no reason to make further changes to the article itself.

There are 21 other list articles in the report, including a set of ten surname articles which are essentially disambiguation pages without the tag, bringing this class to a bit over a quarter of all articles reported. The next largest subject class is for organizations, led off by 33 boilerplate articles on scouting by country and 30 on unions in more-or-less obscure places. Somewhat surprisingly, there are only eleven articles on companies. After the organizations, we have places, mostly towns in African countries or Switzerland, though there are 16 articles on mountains and other features in British Columbia.
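The shares quoted for the list articles can be checked with a few lines of arithmetic. This is only a sketch: the counts (106 Supreme Court lists, 21 other lists, 500 articles in the report) come from the text above, while the names REPORT_SIZE and share are made up for illustration.

```python
# Counts taken from the text; the names are illustrative, not from the report.
REPORT_SIZE = 500            # articles listed in the "Forgotten articles" report
supreme_court_lists = 106    # lists of US Supreme Court decisions by volume
other_lists = 21             # remaining list articles, including the surname lists


def share(count, total=REPORT_SIZE):
    """Return count as a percentage of the report."""
    return 100.0 * count / total


scotus_share = share(supreme_court_lists)
all_lists_share = share(supreme_court_lists + other_lists)

print(f"Supreme Court lists: {scotus_share:.1f}% of the report")
print(f"All list articles:   {all_lists_share:.1f}% of the report")
```

The first figure rounds to the 21% given above, and the second is the "bit over a quarter" for lists as a class.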

At this point we’ve accounted for about two-thirds of the articles; the other groups are smaller, though there are runs of musical subjects (mostly albums), obscure mathematics, elections, and comic book characters. One subject which doesn’t make a significant showing is biography, with only six articles on people, and that is defining the term very loosely (many are mythological or legendary). There are also few articles on buildings, and none on bands.

Almost without exception there is little text in any of these articles: they tend to be either short stubs or contain mechanically-tabulated listings (all those albums again). They tend to lack references; indeed, one way they escape from the list is through being tagged for that very reason.

Then there’s the matter of turnover. For this, I went back and identified the 32 articles (6.4% of the total) which dropped from the report during the preceding four weeks. Aside from the fact that all of the list articles dropped were for individual surnames, there doesn’t appear to be much commonality in terms of what these articles were about. What is more illuminating is the means by which they exited the list. Two were deleted (one by PROD), and one survived a deletion attempt; a fourth was turned into a redirect. The others were mostly minor edits: eight were tagged for some fault, two were untagged, four had category changes, and all but two of the rest had various copy-edits, either of the text or the markup. None of the edits were done by bots.
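The turnover figures above can be tallied as follows. This is a rough sketch rather than anything taken from the report itself: the counts are the ones given in the text, the variable names are invented, and reading "all but two of the rest" as "the articles left over after the listed categories" is an assumption.

```python
# Counts taken from the text; names and grouping are illustrative assumptions.
REPORT_SIZE = 500
dropped = 32          # articles that left the report over four weeks
weeks = 4

# Exit routes that the text counts explicitly.
exits = {
    "deleted": 2,
    "survived deletion attempt": 1,
    "turned into redirect": 1,
    "tagged for some fault": 8,
    "untagged": 2,
    "category change": 4,
}

turnover_pct = 100.0 * dropped / REPORT_SIZE
per_week = dropped / weeks

# "All but two of the rest had various copy-edits" (assumed reading).
rest = dropped - sum(exits.values())
copy_edited = rest - 2

print(f"Turnover: {turnover_pct:.1f}% in {weeks} weeks, about {per_week:.0f} per week")
print(f"Explicitly counted exits: {sum(exits.values())}")
print(f"Remaining: {rest}, of which copy-edited: {copy_edited}")
```

On that reading, roughly a dozen of the 32 departures came from copy-edits alone, which fits the picture of housekeeping rather than improvement.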

A month later, thirty-five more articles have fallen off the list, and again this was not because of substantial improvements, nor through bot edits. Two articles had links added; three articles had references tagged as “dead.” Three Russian place names were detagged as no longer orphaned. One article had an abbreviation spelled out. Three had wikilinks resolved to bypass redirects. One had a category added, and another had a category update due to a CfD. One had formatting changes on its images. Two were tagged for notability. One had a stub tag added. One was incorrectly tagged for a project in the article itself, which was reverted, though the reverter didn’t bother to move the tag to the talk page. One was converted to a redirect. One was converted from a name article to a proper disambiguation. One of the surname articles was cleaned up. One was tagged for references, which were added a week later; another had coordinates added through a GNIS link. A list of mayors for a town in Luxembourg had the incumbent added; he took office in 2011, but the article had been last updated the previous year. (All the entries are red-links, by the way.) One name list had a red-link delinked. Finally, one article which had been added to the end of the list as others were removed due to updates got some extremely minor edits from an IP user, and was removed.

As you can see, none of this was terribly substantial. No article saw significant expansion, and most of the changes could be classed as “housekeeping.” Of their replacements, the only article of any length is List of United States Supreme Court cases, volume 383, which like similar lists is a sea of red links, clearly generated from some listing.

It should be no surprise that the age of the oldest article in the report is increasing. Back when the first “dusty articles” report was run, the oldest article was about four years old; now, eight years later, the oldest article is twice that age, so that with each passing year the oldest article reported is about six months older than the one reported the year before. The processes for removing articles are not adequate to deal with the queue of articles that might be in need of updates. Of course, there is no guarantee that this linear aging will continue, and there have been short periods when the age of the oldest article did decrease, but never by more than a few months.
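The six-months-per-year figure follows directly from the numbers in the paragraph above, as this short sketch shows. The function names are invented for illustration, and the projection simply assumes that the linear trend continues, which, as noted, is not guaranteed.

```python
def aging_rate_months_per_year(age_then_years, age_now_years, elapsed_years):
    """Average growth in the oldest article's age, in months per calendar year."""
    return 12.0 * (age_now_years - age_then_years) / elapsed_years


def projected_age_years(age_now_years, years_ahead, rate_months_per_year):
    """Extrapolate the oldest article's age, assuming the linear trend holds."""
    return age_now_years + years_ahead * rate_months_per_year / 12.0


# Figures from the text: about four years old at the first report,
# about eight years old eight years later.
rate = aging_rate_months_per_year(age_then_years=4, age_now_years=8, elapsed_years=8)
print(f"Aging rate: {rate:.1f} months per year")

# Hypothetical extrapolation four years ahead, for illustration only.
print(f"Projected age in four years: {projected_age_years(8, 4, rate):.1f} years")
```

A rate of zero would mean the cleanup processes were keeping pace with the backlog; anything above zero means the tail of the report keeps getting older.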

All in all, the report provides a picture of the future of the lower end of Wikipedia, particularly as editing declines. Eventually, one has to think, the report will almost cease to change, as the few reasons people find to make minor updates to these articles will come up more and more rarely.
