For more than six years, Wikipedia named an innocent man, Joe Streater, as a key culprit in the 1978–79 Boston College basketball point shaving scandal. Thanks to the detective work of Ben Koo at sports blog Awful Announcing, the world now knows (again!) that Joe Streater had no involvement in the affair. He couldn’t have, because he didn’t even play for the team in the 1978–79 season.
Entering the Wikipedia wormhole
In his article, Guilt by Wikipedia: How Joe Streater Became Falsely Attached To The Boston College Point Shaving Scandal, Ben Koo describes how he fell “down this wormhole” that ended at an anonymous Wikipedia edit made over six years ago.
It began like this: Koo had reviewed a 30 for 30 documentary on the Boston College point shaving scandal for Awful Announcing. In this review, he remarked on the curious fact that one of the four players eventually tied to the scandal wasn’t mentioned in the film at all.
This prompted a puzzled email inquiry from a former Boston College player who’d been involved in the affair: Which player did Koo mean? Koo replied that he had found it curious that Joe Streater hadn’t been mentioned in the documentary, given that all the articles he had read as part of his background research had named Streater as one of the sportsmen involved. The reply he got from the former Boston player astonished him:
“Joe Streater wasn’t even on the team that infamous year as he had left school the year before.”
At first, Koo was incredulous. How could this be? Streater was mentioned in Wikipedia and so many other articles on the web. But the player’s personal testimony could not be discounted: he’d been there. So Koo decided to investigate. He checked the Boston College Men’s Basketball Guide. Sure enough, Streater was only listed as a player in the 1977–78 season. The 1981 Sports Illustrated article that first broke the story did not mention Streater. Contemporaneous news clippings confirmed: Streater took part in only 11 games in the 1977–78 season, and after that never played for the team again. And finally it dawned on Koo: the reason Streater was mentioned in Wikipedia and in every other article he had read was – because it was in Wikipedia.
Koo tried to locate Streater; his searches were unsuccessful. But he established that Streater’s name had been inserted into the Wikipedia article on the scandal in August 2008, by an anonymous user using a mail.goodwillmass.org IP address. Koo satisfied himself that none of the books and press articles published on the incident before August 2008 ever mentioned Streater’s name. Yet since then, Streater had become widely associated with the scandal through newspaper and TV reports as well as countless blogs and fan sites. Even an Associated Press article (carried by Yahoo!, for example) names Streater as one of the culprits to this day, as do many other publications listed in Koo’s article.
The migration of spurious Wikipedia facts into other sources has grown so common that the process has been immortalized in a famous xkcd cartoon, which coined the word “citogenesis” to describe it. People may think it’s a joke. It isn’t.
A recent blog post on this site covered a multitude of documented cases, from a Wikipedia article on a wholly fictitious war that won a Wikipedia quality award (and retained it for five years) to the invention of a new name for the coati: the “Brazilian aardvark”, memorably debunked in The New Yorker.
A week after our blog post appeared, E. J. Dickson at The Daily Dot reported on the Amelia Bedelia hoax – a piece of wholly spurious information she herself had added to Wikipedia five years ago, as a stoned sophomore, only to find it quoted on Twitter this summer by Jay Caspian Kang. Kang, ironically, is the science and technology editor at The New Yorker, demonstrating that not even the journalists in charge of the publication with the best reputation for fact checking in the world are immune to the Wikipedia bug.
Wikipedia insiders have long been aware of the “circular referencing” problem. The site has for years now had a dedicated policy section for such cases, WP:CIRCULAR. But when a “fact” has the stamp of approval of an authority like the Associated Press, who would doubt its veracity?
So fix it!
Wikipedians are usually quite sanguine about any errors found in Wikipedia, on the grounds that once an error is identified, it can be corrected instantly, converting dismay into satisfaction. Wikipedia has been improved!
It’s an article of faith with Wikipedians that Wikipedia is “always improving”. But Koo noted that the article on the Boston College point shaving scandal had actually deteriorated in some ways over the years:
[…] one thing that sticks out is the original Wikipedia page back in 2007 listed several great sources including Porter’s book, court documents, and coverage from the Globe […]. Since 2008, the Wikipedia article has somewhat regressed in terms of sourcing and has this message at the very top now:
“This article does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (September 2008)”
Koo went on to make the salient observation that –
Streater’s name stayed attached to the scandal for over six years and would have likely persisted if not for the documentary’s airing and perhaps some of my outreach on the matter.
The standard Wikipedian response greeting the critic who points out an inaccuracy on Wikipedia is, “So fix it!” But the fact that an error can be easily fixed should not excuse a reference site from hosting it in the first place, for more than half a decade. After all, it is just as easy for an anonymous user to insert an error as it is to fix one!
This particular instance of libel took over six years to discover (in what was surely no coincidence, it was corrected by an anonymous IP editor the day before Koo’s article appeared). That it has been remedied now is little consolation to all the readers who read and believed it during the past six years, and to all the journalists who propagated it, compromising their reporting. And like other spurious facts spawned on Wikipedia, it is bound to live on across the internet for years to come.
Who tracks changes to Wikipedia?
Wikipedia’s volunteer contributors keep track of articles they have an interest in by means of a “watchlist” that alerts them to any recent changes to these entries. It’s a little-known fact that among the English Wikipedia’s 4.7 million articles, there are hundreds of thousands that no editor has on their watchlist. These articles, which are sitting ducks for subtle vandalism as well as the insertion of well-meaning but erroneous content, are in a special category, Special:UnwatchedPages.
This category is inaccessible to ordinary users, and mostly inaccessible even to the site’s administrators, because it is truncated (possibly for performance reasons). Administrators can only see the first 3,000 entries in this alphanumerical list. The 3,000th entry is an article beginning with the characters “2000”.
In other words, the 3,000 entries that are visible to administrators don’t even reach the letter A. In fact, they don’t even reach the number 3, making the list pretty well useless for quality control purposes.
Based on extrapolation from other large article categories, the total number of articles nobody is watching is probably well over half a million. This August, an administrator suggested,
[…] the number is doubtlessly somewhere between 100,000 & one million.
These are articles that are on nobody’s watchlist. At all. But this is not the end of it, because in practice, users tend to have so many articles on their watchlists that they ignore most of their notifications. Other users may not log in for weeks; even the watchlists of retired users who have not checked them for months or years still exist, waiting for their owners to return. Bearing this in mind, the number of effectively unwatched articles is likely to be far greater still.
This is partly a consequence of the fact that while the number of Wikipedia articles continues to grow, the number of active Wikipedia contributors continues to drop. In May 2007, when the point shaving article was created, the English-language Wikipedia contained 1.7 million articles and had 4,736 “very active” contributors (defined as contributors making more than 100 article edits a month). By August 2014, the number of articles had risen to 4.7 million, while the number of “very active” contributors had dropped to 3,130. While there were 2.8 very active editors per 1,000 articles in 2007, there are less than 0.7 now – the ratio has dropped to less than a quarter of what it was.
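The ratio quoted above can be checked with a few lines of arithmetic. What follows is a minimal sketch in Python that uses only the figures given in the paragraph. The function name and variable names are illustrative and do not come from any Wikimedia tool.

```python
def editors_per_thousand(very_active_editors: int, articles: int) -> float:
    """Return the number of very active editors per 1,000 articles."""
    return very_active_editors / articles * 1000


# Figures as quoted in the paragraph above (hypothetical variable names).
ratio_2007 = editors_per_thousand(4_736, 1_700_000)  # May 2007
ratio_2014 = editors_per_thousand(3_130, 4_700_000)  # August 2014

print(f"May 2007:    {ratio_2007:.2f} very active editors per 1,000 articles")
print(f"August 2014: {ratio_2014:.2f} very active editors per 1,000 articles")
print(f"2014 ratio as a share of 2007: {ratio_2014 / ratio_2007:.0%}")
```

The results agree with the rounding in the text: about 2.8 very active editors per 1,000 articles in 2007 against roughly 0.67 in 2014, a little under a quarter of the earlier ratio.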
There is one other safety net against subtle vandalism: Wikipedia’s recent changes display, showing all edits made to Wikipedia as they occur. But recent changes patrollers often review hundreds of edits an hour. They do not have the time to check sources, and generally catch only edits that are very obviously problematic, even to someone who has no familiarity with an article’s subject matter. Many edits are never looked at, simply because they’re coming in so thick and fast.
The German-language Wikipedia and a few others have a system, known as “Pending Changes” or “Flagged Revisions”, whereby any edit from an IP address has to be checked by a more experienced contributor before it is accepted and displayed to the public. This system might well have stopped the unsourced addition of Streater’s name, for example. But the English Wikipedia chose not to implement this system, fearing that it might create bottlenecks and reduce participation. It traded reliability for quantity.
As accurate as Britannica?
Even in the face of mounting evidence to the contrary, tech writers are still fond of citing a 2005 “study” by Nature that found “Wikipedia (almost) as accurate as Britannica”. No one seems to remember these days that the Nature piece was not a rigorous peer-reviewed study, but a journalistic piece that only looked at a small sample of articles on science topics – including some fairly obscure ones like the “kinetic isotope effect” or “Meliaceae” that might just elude the grasp of the average vandal. And in some cases, Nature compared excerpts of Britannica articles to their Wikipedia counterparts, and then counted “omissions” in Britannica as errors – even though the “missing” facts were contained in article sections Nature had discarded. As Britannica pointed out in their rebuttal:
One Nature reviewer was sent only the 350-word introduction to Encyclopædia Britannica’s 6,000-word article on lipids. For Nature to have represented Britannica’s extensive coverage of the subject with this short squib was absurd, and it invalidated the findings of omissions alleged by the reviewer, since those matters were covered in sections of the article he or she never saw.
Nature rejected these complaints, many of which hinged on fine points of detail and emphasis. And Encyclopædia Britannica did acknowledge there had been some errors on its pages. But while Britannica may be imperfect, it is quite safe to say that it did not and does not contain false information inserted by anonymous people for fun or for financial gain, that it does not contain anonymous hatchet jobs written by people’s rivals, and that it is not full of puff-pieces companies and individuals have written about themselves.
Quality problems? What quality problems!
One response that critics of Wikipedia often face when they point out errors in Wikipedia is this: that nobody ever claimed that Wikipedia is perfect, and that people are regularly warned that Wikipedia may contain errors. It has a disclaimer! Everyone knows that it is free, crowdsourced and curated by volunteers who work on whatever it is they feel like working on.
Spokespersons for the Wikimedia Foundation (WMF), the non-profit organization operating the Wikipedia website, rarely express dissatisfaction with Wikipedia’s lack of reliability. Its public messages focus on the feel-good factor of its vision statement: “Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.” Studies pointing to reliability problems are commonly rubbished among Wikipedians, while those reporting positively are accepted uncritically and highlighted.
Jimmy Wales, for example, recently tweeted, in response to a Twitter user making a dismissive statement about Wikipedia’s reliability,
Actually academic studies show we are about as accurate as traditional encyclopedias and improving all the time.
At the recent Wikimania conference, a slide triumphantly announced a survey finding that British people now trust Wikipedia more than news organizations (the irony here being that Wikipedia articles are commonly based on news articles). Wikipedians cheered.
It often feels like this: if you complain about reliability problems, you are told you have no right to expect perfection, and moreover, you can fix them yourself; while in all other contexts, Wikipedia is styled as the first real wonder of the digital world and one of the greatest advances in human history. There is a distinct whiff of self-serving doublethink about this attitude.
Where does the Wikimedia Foundation stand on reliability?
Content reliability has not really been a discernible priority for the Wikimedia Foundation for a long time. In fact, to this day, the WMF does not even measure content quality – their staff admit freely that they have no idea how to do it. Instead, the Foundation measures and reports quantitative metrics such as the number of volunteer editors, articles, article edits and page views. In that sense it is no different from Facebook, except that it has another metric: the amount of donations flowing into its coffers.
With content generation and quality control being left squarely in the hands of Wikipedia’s unpaid, self-selecting and largely anonymous volunteers, the Wikimedia Foundation sees itself merely as a “technology and grantmaking organization”. Its priorities currently are to expand its software engineering staff and modernize the user interface of its sites, especially for mobile users, in order to prevent readers from flocking to rival portals and providers like Wikiwand that offer the same free, Creative Commons-licensed Wikipedia content in a more visually appealing setting. The Wikimedia Foundation’s newly-hired Vice-President of Engineering, Damon Sicore, put it like this in his first IRC Office Hour:
I see us having to scale to a size that enables us to compete with the engineering shops that are trying to kill us. That means we need to double down on recruiting top talent, and steal the engineers from the sources they use… because… well… they are REALLY GOOD.
While the Wikimedia Foundation’s fundraising banners say they need funds to keep Wikipedia “online and ad-free”, I reckon that this is where most of the money raised in the fundraising drives will be going.
Ben Koo – whose article is a phenomenal read – pointed out that the IP address that added Streater’s name to the Wikipedia article on the Boston College point shaving scandal had not made any edit since 2009. That is so. But mail.goodwillmass.org has other IP addresses, too. One of them is 188.8.131.52. On 3 July 2013, it changed the Massachusetts Lottery win of Whitey Bulger, a Boston crime figure who once ranked next to Osama bin Laden on the FBI’s most-wanted list, from $14 million to $69 million, contradicting the cited source.
That change didn’t last six years. This time, it was just three weeks.