By greybeard and Kelly Martin
For the largest audience, one has to be careful about the definition of the word “crowdsourcing”.
Wikipedia is a failed example of crowdsourcing, but there are also successful examples. The failure of Wikipedia as a crowdsourcing project is very interesting, but if one is, or is perceived as, decrying crowdsourcing more generally, one walks into a tarpit of contradictory evidence and conclusions that weaken one’s primary point.
Wikipedia’s model fails for a number of reasons. One we can call “entropy”. No fact on Wikipedia is ever fully-established. If we crowdsource (e.g.) a catalog of birds or a map of actual-vs-scheduled train times, then the facts are never (or seldom) in dispute. These projects rely on individual and precise datapoints submitted by individuals, either volitionally or automatically. The crowdsourcing of earthquake data on people’s phones is considered successful as well. While an individual can “game” that system, that data gets drowned out in the larger datastream and becomes “experimental error”. On Wikipedia no fact is ever final, no page ever complete, and the data is forever mutable, at the finest granularity. If someone enters that Ludwig van Beethoven was born in 1770, that fact is never locked down, and someone can change it at any time to 1707 or 1907 or 7707. As we know, people may patrol the page, but more sparsely watched pages can exist in erroneous states indefinitely. Entropy prevails.
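To make the “drowned out” point concrete, here is a minimal sketch in Python. It assumes many independent readings of the same quantity, and the numbers are invented for illustration; they are not real earthquake data. A robust aggregate such as the median barely moves when one contributor games the system, whereas a single mutable field (like a birth year on a wiki page) has no such protection.

```python
from statistics import mean, median

# Hypothetical magnitude estimates from seven honest phones (made-up values).
honest = [4.1, 3.9, 4.0, 4.2, 4.0, 3.8, 4.1]

# The same stream with one deliberately bogus report appended.
gamed = honest + [9.9]

# The median shifts only slightly; the bogus value is treated as noise.
median_shift = abs(median(gamed) - median(honest))

# The plain average is dragged much further by the same single report.
mean_shift = abs(mean(gamed) - mean(honest))

print(f"median shift: {median_shift:.2f}, mean shift: {mean_shift:.2f}")
```

The choice of the median here is our own illustration, not a description of how any particular earthquake project aggregates its data.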
A second reason Wikipedia’s model fails is also well-known here. Wikipedia attracts zealots, partisans, and extremists from all parts of the spectrum, each wishing to see his or her opinion (or “version of the facts”) memorialized in an “encyclopedia”. Thus we get partisans on nationalist topics, ones on matters of taste and morality, and even partisans on nominally scientific topics. Fanboys of particular popular culture are a subset here. Sometimes these partisans manage to control a set of topics, sometimes they simply make page after page of trivia concerning their obsession, and sometimes they simply war away indefinitely on established topics. In each of these cases the result is that Wikipedia skews away from commonly-accepted academic, historical, scientific, or cultural dogma on any given topic, and toward the extremes. In some cases — the warred-over pages — you get a kind of soft mush of opposing opinions of the “on the one hand / on the other hand” variety that no scholarly book would accept. Successful crowdsourced projects almost universally have some editorial control at the top. Linus Torvalds controls Linux absolutely. On Wikipedia, “WP:OWNership” prevails.
A third reason Wikipedia’s model fails is the lack of what in Computer Science we would call a “goal function”. There is no objective measure of the success of an online encyclopedia. As a result, Wikipedia substitutes the measures that are available: completeness becomes number of pages, despite the lack of apparent correlation between the two; success becomes page views; and engagement becomes number of editors. Any initiative, change, or environmental factor seen as likely to diminish those measures is swiftly defeated. Thus Wikipedia does not remove problematic biographies, because that would reduce the number of pages and the perception of completeness. It will not prevent anonymous editing, because that would reduce the perception of user engagement. And it won’t institute safety practices like flagged revisions, because that would make the site less mutable and arguably diminish both page views and editors. So conservatism prevails.
None of these three problems (entropy, ownership, and conservatism; there are others) is inherent to crowdsourced projects. While they’re not unique to Wikipedia, it is arguably the largest and most visible project that suffers from them.
Finally, the popular and online media tend to conflate “crowdsourced” with “crowdfunded”. They are, of course, completely different. Once crowdfunders part with their money, their editorial control over the result is finished, except to the degree that they proselytize the product. They don’t contribute circuit design or software or artwork, and they certainly don’t war over it. Additionally, somewhat savvy large corporations have adopted PR and marketing campaigns under the rubric of “crowdsourcing”, which are basically contests that people enter, giving their free talent (such as it is) for a chance at winning something, whether a modicum of fame or something more tangible. This isn’t really crowdsourcing.
One needn’t look farther than the galaxy-identifying systems, the human OCR systems, the protein-folding systems, and other things like that to find good, useful, and successful examples of crowdsourcing, so one would be unwise to either paint those efforts with the same brush as Wikipedia, or weaken one’s argument by over-generalization.
The lack of an objective measure of quality is a key part of Wikipedia’s problem and needs to be remembered. One of us has volunteered, in the past, for CoCoRaHS, a crowdsourced precipitation measurement project for the United States. Pretty much anyone can volunteer and submit observations after completing a very simple training program. The CoCoRaHS coordinators, who are qualified hydrologists (the coordinator for one area is a professor of meteorological sciences at a local university, and the state level coordinator is also the retired director of the state climate research center), gather the data and validate them by comparing them to one another, to National Weather Service observations, and to radar data to determine if the reported precipitation is “reasonable”. Someone who reports 4 inches of rain when their neighboring stations all report no rain will get an email from the coordinator asking for an explanation. Someone who reports 11 inches of rain in the middle of a hurricane, though, will probably not, but they will likely get mentioned in the monthly newsletter.
The algorithms used for this validation process are similar to the ones the NWS uses to validate its own datasets, and there is a solid mathematical basis behind them. Because of this procedural rigor, CoCoRaHS has been incredibly successful in gathering a large corpus of fairly reliable precipitation data for large parts of the US that were previously not well-covered, and that data is used by hundreds of organizations for all sorts of purposes. It’s an example of successful crowdsourcing. It works because it targets people who have an interest in the field, provides training and easy access to resources, makes the process simple and nonconfrontational for volunteers, and provides qualified professional resources to supervise volunteers and validate the quality of volunteer data. Contrast this to what Wikipedia does in these areas, and you’ll see why Wikipedia fails.
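The neighbor-comparison idea described above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the actual CoCoRaHS or NWS procedure: the function name `flag_suspect_reports`, the station IDs, the neighbor lists, and the 2-inch tolerance are all hypothetical.

```python
from statistics import median

def flag_suspect_reports(reports, neighbors, tolerance_in=2.0):
    """Return station IDs whose report differs sharply from nearby stations.

    reports:      dict of station ID -> reported precipitation in inches
    neighbors:    dict of station ID -> list of neighboring station IDs
    tolerance_in: hypothetical cutoff, in inches, for "unreasonable"
    """
    flagged = []
    for station, amount in reports.items():
        # Collect reports from neighbors that actually submitted data.
        nearby = [reports[n] for n in neighbors.get(station, []) if n in reports]
        if not nearby:
            continue  # nothing to compare against, so no judgment is made
        # Compare against the neighbors' median so one odd neighbor
        # does not distort the reference value.
        if abs(amount - median(nearby)) > tolerance_in:
            flagged.append(station)
    return flagged

# Every station is treated as a neighbor of every other (an assumption).
stations = ["A", "B", "C", "D"]
neighbors = {s: [n for n in stations if n != s] for s in stations}

# One station reports 4 inches while its neighbors report none.
dry_day = {"A": 4.0, "B": 0.0, "C": 0.0, "D": 0.0}

# A hurricane: 11 inches is plausible because the neighbors agree.
hurricane = {"A": 11.0, "B": 10.2, "C": 9.6, "D": 10.8}

dry_day_flags = flag_suspect_reports(dry_day, neighbors)
hurricane_flags = flag_suspect_reports(hurricane, neighbors)
print(dry_day_flags, hurricane_flags)
```

In this sketch a flagged station is only a prompt for a human coordinator to ask for an explanation. The real process also checks reports against National Weather Service observations and radar data, which this example leaves out.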
The problem isn’t crowdsourcing itself. Rather, it’s how Wikipedia does crowdsourcing.
(This blog post was originally published on February 23, 2015)