Wikipedia: Sources & Methods

How tweet it is…

by sashi

It all started when I noticed a badly spun tweet being added to a biography on Wikipedia, sourced to a click-baity headline from Politico. Now, a month later, the decontextualized tweet has been removed after much discussion, and an exclusive article that the subject of the biography had written for the Daily Mail has been disappeared without any discussion. The biographical entry remained under full-protection lockdown throughout, because earlier manipulation of the article had led to bad press for Wikipedia and an Arbitration Committee case.[1]

This affair, along with recent highly publicized furors over public figures’ pithy snark, got me wondering just how many tweets were sufficiently notable to be included in Wikipedia. A fellow exile taught me the proper syntax for searching inside citation templates (insource:"web.site"), and ever since I’ve enjoyed watching the unexpected portrait of an elephant emerge as I investigate the source-linking data.

Blind monks examining an elephant, Hanabusa Itchō (1652–1724)

There were 35,735 links to Twitter in the elephant’s belly that day. Since then, it has been fed just under a dozen tweets a day, so by now the number will have grown to over thirty-six thousand. No worries, though: the internal pressure has simultaneously been reduced each day by shedding half a dozen references to the Daily Mail. (This is because, back in February 2017, 50 people decided that the publication should be banned from Wikipedia, at least in part because of its click-baity headlines.)

The English-language Wikipedia indulges in tweets much more than most other languages do. While the Spanish Wikipedia does link to Twitter almost 30% as often, both the German and French Wikipedias have limited themselves to fewer than a tenth as many Twitter links as the English site currently includes.

But just how important a source is Twitter, with those 36,000 links, in the pecking order of sources on the English Wikipedia overall?

(Note: Comparisons have been rounded to the nearest whole number and are based on statistics generated during the July–August 2018 timeframe.)

WikiSource Index

  • Factor by which theguardian.com is more linked to than twitter.com: 3
  • Factor by which nytimes.com is more linked to than theguardian.com: 2
  • Percentage by which links to nytimes.com outnumber its nearest competing news outlet (bbc.co.uk): 31
  • Factor by which the total number of links to nytimes.com is dwarfed by the most frequently linked source (archive.org): 7
  • Ratio of twitter.com links to links to the Australian educational domain (edu.au): 1:1
  • Percentage by which youtube.com is more frequently pointed to than the UK educational domain (edu.uk): 23
  • Ratio of links to tronc.com papers as compared to twitter.com: 7:2
  • Percent of the 124,483 links to tronc.com newspapers represented by the LA Times: 50
  • Percent of the 124,483 tronc.com targets that Europeans can read without using a proxy: 0
  • Percent of NY Daily News photographers fired in tronc.com’s July 2018 budget cuts: 100 (§)
  • Number of links to the Church of England’s website: 660
  • Relative frequency of links to The Daily Beast and to the C of E: 8:1
  • Number of links (in thousands) to 3 Mormon organizations: 12
  • Factor by which this exceeds the number of links to vatican.va: 3
  • Number of links (in thousands) to 3 brands located at the same address in Lehi, UT: 49
  • Number of links (in thousands) pointing to 1 company at 1 Hacker Way, Menlo Park: 56
  • Number of links (in thousands) pointing to 1 company at 1 Infinite Loop, Cupertino: 31
  • Ratio of apple.com links to:
    • links targeting samsung.com: 60:1
    • links targeting ten of its legacy competitors: 2:1
    • links to Amazon: 1:2
    • links to Amazon if the Internet Movie Database is added to Amazon’s numbers: 2:9
    • links to Google: 1:20
  • Relative population of the People’s Republic of China (CN) and the Republic of China (TW): 59:1
  • Relative frequency of links to gov.cn and to gov.tw: 2:1
  • Factor by which Google Books is more frequently linked to than the Library of Congress: 15
  • Percentage by which links to Breitbart exceed the number of links to icij.com (the consortium responsible for the Panama Papers & the Paradise Papers): 244
  • Relative population of Russia (RU) and Ukraine (UA): 16:5
  • Relative frequency of links to gov.ru and to gov.uk: 2:5
  • Percentage by which links to nato.int outnumber links to tass.com: 35
  • Relative population of India (IN) and Canada (CA): 36:1
  • Relative frequency of links to gov.in and to gc.ca: 1:1
  • Relative population of Singapore (SG) and Bangladesh (BD): 1:29
  • Relative frequency of links to gov.sg and to gov.bd: 3:2
  • Number of links (in thousands) by which a dozen video-game sites taken together surpass the entire UK government domain (gov.uk): 3
  • Factor by which the number of links to the US armed forces exceeds those pointing to the House, Senate, White House & Supreme Court: 2
  • Factor by which the number of links to the 10 most cited social media sites exceeds those pointing to the US armed forces: 7
  • Percentage by which links to these same 10 sites outnumber links to nytimes.com: 30
  • Number of Wikipedia entries (in thousands) tagged as completely unsourced: 196 [2]
  • Number of times wiki/Wikipedia:Wikipedia_Signpost was cited in a template in mid-August 2018: 18
  • Number of times both Wikitribune and The Gateway Pundit were (a bit earlier in August): 7
  • Number of times Wikipediocracy was: 3
  • Number of days Wikipedia pointed to its page “Enemy of the People” in a special “see also” section of a public official’s BLP: 14 (§)
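The index mixes three kinds of comparison: factors, percentages by which one count outnumbers another, and ratios reduced to small whole numbers. Below is a minimal sketch of that arithmetic. The definitions are assumptions (the article does not spell out its formulas), and the helper names are hypothetical; the only inputs are the two raw counts quoted in the text, 35,735 Twitter links and 124,483 tronc.com links.

```python
from fractions import Fraction

# Assumption: these are the definitions behind the index; the article does not state them.
# Note that Python's round() rounds exact halves to the nearest even number.

def factor(a: int, b: int) -> int:
    """How many times a outnumbers b, rounded to the nearest whole number."""
    return round(a / b)

def percent_more(a: int, b: int) -> int:
    """Percentage by which a outnumbers b, rounded to the nearest whole number."""
    return round((a - b) / b * 100)

def small_ratio(a: int, b: int, max_denominator: int = 10) -> str:
    """Closest fraction to a/b whose denominator is at most max_denominator, shown as n:d."""
    approx = Fraction(a, b).limit_denominator(max_denominator)
    return f"{approx.numerator}:{approx.denominator}"

# Raw counts quoted in the article (July-August 2018).
twitter_links = 35_735
tronc_links = 124_483

print(small_ratio(tronc_links, twitter_links))  # 7:2, matching the tronc.com entry
print(factor(tronc_links, twitter_links))       # 3
```

Reducing a ratio this way trades precision for readability: 124,483 / 35,735 is about 3.48, and 7:2 is the closest ratio with a second term of ten or less.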

Social Media

This is what the relevant Wikipedia editorial guideline has to say about blogs, tweets, Facebook posts, and other user-generated content:

[S]elf-published media are largely not acceptable. Self-published books and newsletters, personal pages on social networking sites, tweets, and posts on Internet forums are all examples of self-published media.

[…]

Content from websites whose content is largely user-generated is also generally unacceptable. Sites with user-generated content include personal websites, personal blogs, group blogs, internet forums, the Internet Movie Database (IMDb), Ancestry.com, content farms, most wikis including Wikipedia, and other collaboratively created websites.

Wikipedia:Identifying Reliable Sources (commonly abbreviated WP:RS)

The four sites specifically mentioned above (Twitter, IMDb, Ancestry.com, and Wikipedia itself) are all among the 100 most frequently linked sites on the English Wikipedia, ranked #13, #37, #91 and #19 respectively, according to an insource query. In reality, Wikipedia links to itself far more often than to any of the others, but not necessarily as a reference. Upon looking at the occurrences turned up by that insource query, one forum member characterized them as:

A collection of stuffed up templates, hidden comments, and circular citations, with a few bollocked internal links in the wrong format for good measure!

Dysklyver: source

Die Zwitscher-Maschine, Paul Klee, 1922

It is primarily, though not exclusively, for this reason that this source has been struck through in the spreadsheet accompanying this article. Google says Wikipedia points to itself 15.8 million times, which is much more realistic given the syntax of [[internal links]]. This sort of Wikipedia-as-social-media source is frequently gamed, as the last item in the index above (“Enemy of the People”) suggests. Other social media addresses, like those in the next paragraph, lead to citation templates rather than to hidden comments, so I consider the results of the insource:"wikipedia.org" search to be an anomaly caused by the internal-linking syntax.

The top ten social media sources in the index represent nearly 300,000 links, and sites such as LiveJournal (3,111), Google Groups (2,961), Baidu (2,541), Medium (2,472), Wikia (1,864), Reddit (1,815), Patheos (732), TV Tropes (587), DeviantArt (547), and Yelp (532) should help push that figure past the 300K mark quite soon. So it would seem there is either a problem with Wikipedia’s sourcing, or a problem with the policy not reflecting current practice.

But I’ve said enough: I would prefer that you save some energy for the source list! In it, you will find answers to all your sourcing-data questions. What’s more, it’s free, and someday I might even turn it into a sortable wikitable, or copy it into a Google Doc, so it can be sorted by country and keyword.

So, to whet your appetite:

  • Which sport do you think is the best represented on Wikipedia? Basketball? Football! It’s football already! Insider baseball? Cricket? Curling? Watch for the pale yellow highlights as you scroll.
  • Which churches? Scientology? AME? Watch for the pale orange-rose highlights.
  • Which benefactors? GlaxoSmithKline? Mozilla? Goldman Sachs? Apple? Google?[3]
  • Which American university is the most linked to? Harvard? UC Berkeley? Brigham Young? Stanford? MIT? Watch for the green highlights.

References

The complete list of Wikipedia sources studied so far (categorized version). Feel free to comment on the article with any major (or minor) oversights (the categorized version makes them easier to spot).

Methodology

For each website, three queries have been run: an insource:"web.site" search at Wikipedia (article namespace), a basic search at Wikipedia for "web.site" (article namespace), and a Google search for site:en.wikipedia.org +"web.site". This last search covers every namespace Google is allowed to index, including most article talk pages and their archives, and the Wikipedia namespace. Throughout this article I’ve spoken of “links” rather than “references” because this is a brute-force search for a string of characters: some occurrences of the string may appear in the “source” field of a reference, while others may occur in the “url” field. Because the search process is far from infallible, I’ve avoided searching for terms that could occur frequently in running text (e.g. scoop.it), though the data for some sites, like gov.in, should be viewed with some skepticism.
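The first of these queries can also be scripted. Below is a minimal sketch, assuming the public MediaWiki Action API on en.wikipedia.org, where `list=search` with `srinfo=totalhits` reports the total number of hits for an `insource:` search restricted to the article namespace (0). The helper names and the User-Agent string are hypothetical, and the API's counts may not match the Special:Search page or the figures in this article exactly. The Google query is not covered.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

def insource_query_url(site: str) -> str:
    """Build an API URL for an insource:"site" search in the article namespace (0)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'insource:"{site}"',
        "srnamespace": 0,       # article namespace only, as in the first query described above
        "srinfo": "totalhits",  # ask the API to report the total hit count
        "srprop": "",           # no per-result snippets needed
        "srlimit": 1,           # only the count matters, not the result list
        "format": "json",
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"

def total_hits(response: dict) -> int:
    """Extract the hit count from a decoded list=search response."""
    return response["query"]["searchinfo"]["totalhits"]

def count_links(site: str) -> int:
    """Fetch the live count for one site (requires network access)."""
    # Wikimedia asks API clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(
        insource_query_url(site),
        headers={"User-Agent": "sources-methods-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return total_hits(json.load(response))
```

Calling `count_links("twitter.com")` would return the current count, which will already differ from the July–August 2018 figures for the reason given in the next paragraph.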

It is also worth keeping in mind that you can’t query the same Wikipedia twice: on average, 32 citation templates are added to the site every hour.

Finally, with the exception of Oxford and Cambridge university presses (included for the sake of comparison), I have only searched for web addresses. Many of the best entries are referenced primarily to published books which, regardless of publisher, tend to be linked to Google Books. (The citations in the featured article on the Balfour Declaration offer a useful point of comparison.)

Postscript

On June 18, 2018, tronc announced that the sale of the LA Times and the San Diego Union-Tribune had been completed, arguing that relief from pension liabilities left it in a considerably stronger financial position. As of August 24, 2018, Europeans who tried to follow any of those links still landed on the GDPR page hosted by tronc.com.

Footnotes

1 A BBC interview with George Galloway (§); the ArbCom evidence phase of the Philip Cross “BLP issues” case (§); coverage of the case on the Wikipediocracy forum (§); and an “outlaw” thread about the Daily Mail (HTD headgear recommended) (§).
2 According to “Category:All articles lacking sources” (§).
3 Anony-zakat (§).

Zeitgeist: Opération Blockhaus

Peter Hitchens, Spectator, “War of Words: my battle to correct Wikipedia” (§), August 2018.

Annalisa Merelli, Quartz, “Seeking Disambiguation: Running for office is hard when you have a porn star’s name. This makes it worse” (§), 18 August 2018.

Acknowledgements

I would like to thank the members of the Wikipediocracy forum, the Wikipedia Sucks (and so do its critics) forum, and the Gender Desk blog for encouraging me in this enterprise, offering suggestions on earlier drafts, or including in their posts sources that ended up among those listed in this study. Finally, I would like to close by acknowledging that, concerning methods, I have left much of the work to the imagination of the reader.

3 comments to Wikipedia: Sources & Methods

  • […] would like to thank those members of Wikipediocracy, (where this article is also hosted), of Wikipedia Sucks (and so do its critics), and of the Gender Desk blog who, either encouraged me […]

  • Kingsindian

    Lots of good work, sashi.

    I looked at the links to major UK media, sorted in descending order of trust according to the latest Reuters poll. You might want to add these to your dataset. (All queries are of the form insource:"newsoutlet".)

    bbc.co.uk and bbc.com: 189,000
    itv.com: 4,253
    channel4.com: 4,608
    thetimes.co.uk: 6,804
    news.sky.com: 3,026
    theguardian.co.uk and theguardian.com: 108,660
    independent.co.uk: 50,000
    telegraph.co.uk: 61,000
    huffpost.co.uk: 3,000
    mirror.co.uk: 7,387
    thecanary.co: 21
    dailymail.co.uk: 27,219
    thesun.co.uk: 5,508

  • Thanks for the feedback, Kingsindian. I’ve updated the categorized version (now at 15 pages) to include all of these. I should probably delete the title rows and re-sort the spreadsheet by raw numbers again (maybe I’ll get that done today). Discussion about the HuffPost on the forum led me to look into the cross-linguistic and cross-wiki circulation of its links in articles (§). Similarly, I had a look at how Wiki-labels might be predictive of “circulation”. As a general rule (Brazil so far being the main exception), and regardless of physical circulation figures, center-left publications appear to have the highest currency in terms of Wikipedia linking (Germany / Japan).