Weasel Words Worry Wales

By Gregory Kohs

Wikipedia co-founder Jimmy Wales, like many other people, is very disappointed with the current state of professional journalism. But unlike other people, Mr. Wales has the reputational clout to be able to launch his own crowd- and Google-funded news organization called Wikitribune (formerly, Jimmy Group Ltd). Even though people and corporations willingly donate money to Wikitribune (as if it were a non-profit organization) with no direct consumer services in return, it is a for-profit corporation owned wholly by Jimmy Wales.

Recently, Mr. Wales was found complaining on Medium.com, a platform he’s elected to co-opt because even after months of time and hundreds of thousands of dollars gathered, Wikitribune still hasn’t developed an online publishing platform of its own. Wales lamented how modern journalism, to him, no longer seems to produce evidence-based news stories. He asked, “How often do we see phrases like this in the news?

• ‘Experts claim’
• ‘Studies show’
• ‘Top officials say’
• ‘According to a person familiar with the matter’”

And he continued, “On Wikipedia, people call these ‘weasel words’ because they weasel out of telling you the unadorned truth. These kinds of formulations are misleading, and I believe they should be avoided wherever possible.”

One might conclude that Jimmy Wales is laying down the challenge for himself — that he’ll want his Wikitribune content to be more like Wikipedia, and less like the mainstream media. But before that path goes too far traveled, how about a little bit of evidence-based analysis on the premise itself? To wit, just how often *do* we see phrases like “experts claim” and “studies show” in journalism? And how often do we see those same phrases in Wikipedia? Assuming Wikipedia is better at avoiding weasel words wherever possible, the incidence of weasel words in Wikipedia should be much lower than what’s seen in mainstream news media.

We decided to whip out the old calculator and fire up the web browser to find out. We tracked our findings in a spreadsheet, which interested readers can review. Take Wales’ first example, “experts claim”: according to a 30-day search of the Bing news service, that phrase appeared 246 times in the news media in the month prior to July 5th, which frankly is quite a lot. But the phrase endures in Wikipedia article pages, too: 98 pages, to be exact. The ratio is even more of a concern for “studies show”, a phrase that hit the news 35 times in a month but adorns some 2,375 article pages on the English-language Wikipedia. If Mr. Wales believes that we should avoid these weasel-word phrases wherever possible, it appears he has a lot of work ahead of him to clear his most famous project of the weaselly text.
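For readers who prefer code to calculators, the comparison above boils down to one division per phrase. What follows is only a minimal sketch using the counts quoted in this article; the function name `wiki_to_news_ratio` and the dictionary layout are our own illustrative choices, not part of the original spreadsheet.

```python
def wiki_to_news_ratio(wikipedia_pages: int, news_hits: int) -> float:
    """Wikipedia article pages per one news appearance of the same phrase."""
    if news_hits <= 0:
        raise ValueError("news_hits must be positive")
    return wikipedia_pages / news_hits


# Counts quoted in the article: English Wikipedia article pages
# versus a 30-day Bing news search.
counts = {
    "experts claim": {"wikipedia": 98, "news": 246},
    "studies show": {"wikipedia": 2375, "news": 35},
}

ratios = {
    phrase: wiki_to_news_ratio(c["wikipedia"], c["news"])
    for phrase, c in counts.items()
}

for phrase, ratio in ratios.items():
    print(f"{phrase!r}: {ratio:.2f} Wikipedia pages per news hit")
```

A ratio below 1 means the phrase turned up more often in a month of news than in all of Wikipedia; a ratio far above 1 means the reverse.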

It’s all relative

Admittedly, a ratio that compares the frequency of words or phrases in Bing news searches over 30 days with their frequency in Wikipedia articles that were written over the course of years (and could have been altered at any given moment) is hardly scientific. It’s an arbitrarily chosen measuring stick. Yet there is a kernel of directional truth in such an evidence-based examination, isn’t there? A serious study would not only compare the incidence of “weasel words” in the news and in Wikipedia; it would also compare the incidence of ordinary, more “random” words on both platforms, in order to see whether Wikipedia does a better or a worse job than the news media of limiting weasel words. We collected some interesting findings in the data related to this investigation.

For example, there’s the conundrum of how to select a representative “random phrase” whose incidence can be checked in Wikipedia and then in the Bing news search. Well, Wikipedia does have a nifty feature called the “random article” button that produces… you guessed it, a random Wikipedia article. From there, how to find a random phrase? We jumped to the second paragraph of the article, began at the fourth word (seems impartial enough?), and looked for a phrase that appeared at least three times in both the English Wikipedia and a Bing news search of the previous 30 days of news articles. Repeating this method for 25 different phrases, we produced a corpus of “random phrases”, using Wikipedia as the source material. Thus, we obtained phrases like “arts degree in Psychology”, “fourth tallest”, and “he swam for”.
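The selection rule can be written down as a short routine. This is a rough sketch, not the procedure as actually carried out (that was done by hand in a browser). It assumes paragraphs are separated by blank lines, that a phrase runs two to four words, and that hit counts come from caller-supplied lookup functions. The names `candidate_phrases`, `pick_phrase`, `wiki_count`, and `news_count`, and the sample counts, are all hypothetical.

```python
from typing import Callable, Optional


def candidate_phrases(article_text: str, min_words: int = 2, max_words: int = 4) -> list[str]:
    """Phrases that start at the fourth word of the second paragraph."""
    paragraphs = [p.strip() for p in article_text.split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []
    tail = paragraphs[1].split()[3:]  # fourth word onward
    return [" ".join(tail[:n]) for n in range(min_words, max_words + 1) if len(tail) >= n]


def pick_phrase(
    article_text: str,
    wiki_count: Callable[[str], int],
    news_count: Callable[[str], int],
    min_hits: int = 3,
) -> Optional[str]:
    """First candidate found at least `min_hits` times in both sources, else None."""
    for phrase in candidate_phrases(article_text):
        if wiki_count(phrase) >= min_hits and news_count(phrase) >= min_hits:
            return phrase
    return None


# Made-up article text and made-up hit counts, for illustration only.
sample_article = "An opening paragraph.\n\nThe tower is fourth tallest in the region."
fake_wiki_hits = {"fourth tallest": 120}
fake_news_hits = {"fourth tallest": 5}

chosen = pick_phrase(
    sample_article,
    wiki_count=lambda p: fake_wiki_hits.get(p, 0),
    news_count=lambda p: fake_news_hits.get(p, 0),
)
print(chosen)
```

Passing the counts in as functions keeps the sketch independent of any particular search service, since no real search API is assumed here.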

With this small corpus of 25 phrases, we were able to see which phrases are more likely to appear in mainstream news than in Wikipedia (e.g., “global nuclear”), along with those phrases that Wikipedia is much more likely to present to the reader than a news site would (e.g., “the three most” or “the development of”). All in all, though, we were interested in the average ratio of Wikipedia-to-Bing-news for a typical phrase. We removed the three lowest and three highest outliers from the list, to arrive at a mean ratio for 19 random Wikipedia phrases: 62.5 appearances in Wikipedia for every one appearance in Bing news.
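Dropping the three lowest and three highest values before averaging is a trimmed mean. Here is a minimal sketch of that step; the 25 input values are placeholders chosen to make the effect visible, not the ratios from our spreadsheet, and the function name `trimmed_mean` is our own.

```python
def trimmed_mean(values: list[float], trim: int = 3) -> float:
    """Mean of the values left after dropping the `trim` lowest and `trim` highest."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


# Placeholder ratios: 22 modest values plus three extreme ones.
placeholder_ratios = [float(i) for i in range(1, 23)] + [500.0, 900.0, 4000.0]

plain = sum(placeholder_ratios) / len(placeholder_ratios)
trimmed = trimmed_mean(placeholder_ratios, trim=3)

print(f"plain mean of {len(placeholder_ratios)} values: {plain:.2f}")
print(f"trimmed mean of the middle {len(placeholder_ratios) - 6}: {trimmed:.2f}")
```

With 25 phrases and three trimmed from each end, 19 remain, which is where the count of 19 in the paragraph above comes from.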

Snapshot of our analysis. Source: Google Sheets.

But wait, that’s using Wikipedia as the base from which the random phrases are chosen. What if we used Bing news sources as the base to choose phrases? We tried that, too, using the same method of jumping to the second paragraph and trying phrases from the fourth word onward — and noticed quite a difference. Once again, there were phrases much more likely to appear in the news than in Wikipedia (things like “shout-out to” and “Asheville police”), and phrases that dominate more in Wikipedia than in the news (such as “confirms that” or “had hoped”). All in all, the mean ratio for this corpus of 19 random news phrases was much lower than the Wikipedia-based corpus: 6.4 appearances in Wikipedia for every one appearance in Bing news.

So, if we blend the two sources, we get an overall mean of 34.45. That’s how much more likely you’ll find an ordinary phrase in all of Wikipedia than in all of Bing’s news archives for the past 30 days.

What about those weasel words that Jimmy Wales warned us about? We generated a list of 25 such phrases, including “analysts suggest”, “critics claim”, and “it has been found that”, and we started running the numbers. Weasel words that appear a lot more in the news than on Wikipedia include “according to insiders” and “according to sources”. But there are a good number of weasel words that appear on Wikipedia even more prolifically than we would expect based on the overall mean ratio (34.45) of ordinary phrases. The following phrases all surpassed the expected mean: “others say”, “studies show”, “according to officials”, “studies suggest”, “some people say”, “it has been claimed”, and “studies have found”. In fact, when you crunch the numbers the way we did, you’ll find that weasel-word phrases (with their 37.6 ratio) are just as likely to turn up on Wikipedia, relative to the mainstream news, as any ordinary, randomly selected phrase.
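The blended baseline and the final comparison are simple enough to check by machine. This sketch uses only figures quoted in this article; the variable names and the helper `exceeds_baseline` are our own, and the equal weighting of the two corpora simply mirrors the blending described above.

```python
# Mean ratios reported in the article (Wikipedia appearances per Bing news appearance).
wikipedia_based_mean = 62.5   # random phrases drawn from Wikipedia articles
news_based_mean = 6.4         # random phrases drawn from news articles
weasel_mean = 37.6            # the 25 weasel-word phrases

# Equal-weight blend of the two random-phrase corpora.
baseline = (wikipedia_based_mean + news_based_mean) / 2

# How the weasel-word ratio compares with the ordinary-phrase baseline.
relative = weasel_mean / baseline


def exceeds_baseline(wikipedia_pages: int, news_hits: int, baseline_ratio: float) -> bool:
    """True when a phrase is more Wikipedia-heavy than an ordinary phrase."""
    return wikipedia_pages / news_hits > baseline_ratio


print(f"baseline ratio: {baseline:.2f}")
print(f"weasel words relative to baseline: {relative:.2f}x")
print("'studies show' exceeds baseline:", exceeds_baseline(2375, 35, baseline))
print("'experts claim' exceeds baseline:", exceeds_baseline(98, 246, baseline))
```

A relative figure close to 1 is the point of the paragraph above: by this measuring stick, weasel words are no scarcer on Wikipedia than any other phrase.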

A qualitative difference

Granted, there is a crucial difference in intent and quality if a publication says “studies have found” but doesn’t elaborate on the specific study names or authors, versus one that does go on to cite the various studies by name. We predict that Wikipedia probably does a better job than the standard news of providing such detail; after all, Wikipedia has endnotes, and most news stories do not. However, picking one weasel phrase from our analysis, “according to an unnamed source”, out of the sixteen Wikipedia articles where this phrase is used, fourteen have merely linked to a newspaper or magazine source that described an unnamed source. How is that any better than the news media, if Wikipedia is just regurgitating the news media? Of the remaining two Wikipedia mentions of “according to an unnamed source”, one has not been linked to any reference source at all, and one has provided a link, but one that is entirely broken and doesn’t point anywhere anymore.

Whatever the case, all of this talk about how bad weasel words are seems to lack a real understanding of how investigative journalism gets done. Would news journalism remain nearly as successful as it is if every leaky human source were warned by the reporter, “You know, I’m going to have to publish your name, because I don’t want to use any weasel words”? How many Watergates, Enrons, and White House staffing shuffles would ever get reported soon enough to matter, if no weasels were willing to spill the beans on the record? One hopes that Jimmy Wales has figured out this problem, and that he’s hired an expert staff of seasoned journalists who will be able to simultaneously get the scoop and reveal their sources. But then again, judging by some of the earlier output of Jimmy’s crack team of reporters, there is considerable doubt about just how serious a news organization Wikitribune will actually be.

Maybe one of Wikitribune’s first stories when they finally start publishing could be an examination of just how weaselly Wikipedia has become, and how Wikitribune will accomplish greater triumphs of evidence-based journalism without the use of weasel words.

8 comments to Weasel Words Worry Wales

  • i can tell that these are not the cute kind of weasels. Nice analysis, Greg!

  • For me a new point of view, but interesting. I like the Grec analyses too, Zoloft, and I believe in the power of blog posts. Regards, Graaf Statler.

  • What’s the thesis of this piece again? It seems like you lost the message in the methodology — or at least I did. Certainly there are better lines of attack for the preposterous WikiTribune than incidence of a couple throwaway phrases uttered by St. Jimmy.

    We can’t analyze the disparity between theory and practice of WikiTribune on the matter of “weasel words” BECAUSE THERE IS AS YET NO WIKITRIBUNE — as you point out.

    We can analyze the disparity between theory and practice of JW claiming that his new cash cow is intended to be a news site with a hard agenda of debunking “fake news” on the one hand, while slowly hiring several grossly inexperienced and barely qualified part-timers to staff his still non-existent commercial news entity. He talks of “journalists” but has hired a couple of freelance writers and a political activist, by my count.

    He has intentionally, in the aftermath of Wikileaks-Wikipedia confusion, named his new “publication” in a manner designed to obfuscate the difference, thereby enabling him to personally cash in on donations that might ostensibly be targeted for WMF.

    THOSE are the obvious lines of attack.


  • Kingsindian

    A nice little calculation, clearly described. Reminds me of a “Fermi estimate”.

    Of course, not a proof, but sufficient to me to demonstrate that Wales doesn’t know what he’s talking about.

  • Vance Frickey

    The issue, of course, is that getting the weasel words out of Wikipedia articles is a volunteer activity. Someone’s got to figure it’s worth the work, personal time and effort to do so. Then, after you make the change, if another editor decides to revert it, that’s a content dispute, and a determined editor can go to the mat with you three times and make it a three-revert rule change, and only THEN do admins usually get involved.

    Jimmy Wales isn’t directly to blame for any of that, but he did sit benignly back and let wikipediocracy evolve to the point that wikipedia represents less the sum of human knowledge gleaned from secondary and tertiary sources, and more the outcome of political fights (what RfCs have devolved into). WP:OWN is a bad thing, supposedly, but no one enforces the guideline, and a clique can guard a given wikipedia article from any changes they don’t like – and if you don’t have the time available to go to AN/I and ask for a consensus there (or the clique who own that article is made of influential people), then WP:OWN is as much a dead letter as WP:CIVIL if a popular editor or admin’s involved.

  • “However, picking one weasel phrase from our analysis, “according to an unnamed source”, out of the sixteen Wikipedia articles where this phrase is used, fourteen have merely linked to a newspaper or magazine source that described an unnamed source. How is that any better than the news media, if Wikipedia is just regurgitating the news media? Of the remaining two Wikipedia mentions of “according to an unnamed source”, one has not been linked to any reference source at all, and one has provided a link, but one that is entirely broken and doesn’t point anywhere anymore.”

    That’s not a “bug”, that’s a “feature”. WP:THETRUTH counsels us to prefer verifiability in “reliable sources” to “truthiness” (the quality of seeming or being felt to be true, even if not necessarily true) – or the actual truth, apparently. The recent coverage of alleged Russian influence in last year’s US Presidential elections assures that the Wikipedia article “Russian interference in the 2016 United States elections” regurgitates what the larger newspapers and electronic media outlets say on the subject. Cite a non-majority source such as the Miami Herald which contradicts the consensus the “owners” of the article desire said, and suddenly a McClatchy affiliate in a major national market isn’t “notable” – or so the guideline-gamers say. And patrolling for dead links is what the rest of us do – we’re actually discouraged from making new articles by the short list of editors who we’re asked to consult before moving our articles into namespace.

    I’m actually ashamed of myself on that particular article. I dropped out because the article’s “owners” allowed a few of their own to make a travesty of WP:CIVIL and WP:WIAPA instead of standing up for those guidelines no matter who broke them – and it’s just not worth MY time and effort to point out that the guidelines apply to everyone (it seems that the minority viewpoint editors were lectured every time they broke a guideline). Since among wikipedia editors, Trump-hating is what the cool kids do and anyone who speaks up for objectivity gets called a Trump supporter, we’re going to very soon have wikipedia become a mirror site for Democrats.org.

    Three things occur after that: the perception of wikipedia’s reliability as a source of information (a) goes dramatically up among Democrats (and points left), (b) crashes among Republicans (and points right) and (c) begins to be unrelated to its actual reliability and heads back down to the wikipedia of the 1990s. Many of us who patrolled for dead links and worked free for Jimmy Wales to locate the Wayback Machine links for those sites are going to say “let the cool kids do it – not my problem any more”.

    • thekohser2

      But technically, shouldn’t verifiability command the editor to write instead of “according to an unnamed source” something along the lines of “according to a New York Gazette story that cited an unnamed source”?