Page view -to- Article length ratios

thekohser
Majordomo
Posts: 13410
Joined: Thu Mar 15, 2012 5:07 pm
Wikipedia User: Thekohser
Wikipedia Review Member: thekohser
Actual Name: Gregory Kohs
Location: United States

Page view -to- Article length ratios

Unread post by thekohser » Thu Jan 19, 2017 4:53 pm

Are there any Wikipedia tech gurus who would be able to run an analysis of all Wikipedia articles, taking the number of average daily page views per article (throwing out the weird outliers that get something like a million views one day, then twelve views the next day), and dividing that by the length of the article in words or in bytes?

I would be interested to know which are the articles on Wikipedia that some editors have greatly expanded to enormous lengths, but nobody cares to read them. And likewise, which articles get tons of traffic, but editors have barely fleshed them out.
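A minimal sketch of the calculation being asked for, assuming the daily counts have already been pulled from Wikimedia's page view statistics. The function names and the sample numbers are made up for illustration; the median stands in for "average daily views with the freak days thrown out", which is one of several reasonable ways to ignore spikes.

```python
from statistics import median


def typical_daily_views(daily_views):
    """Median of the daily counts; a one-day spike barely moves it."""
    return median(daily_views)


def views_per_kilobyte(daily_views, article_bytes):
    """Typical daily views divided by article length in kilobytes (1 KB = 1000 bytes)."""
    if article_bytes <= 0:
        raise ValueError("article_bytes must be positive")
    return typical_daily_views(daily_views) / (article_bytes / 1000)


# Hypothetical article: about a dozen views on most days, plus one freak spike.
spiky = [12, 15, 11, 1_000_000, 14, 13, 12]
ratio = views_per_kilobyte(spiky, article_bytes=50_000)
print(f"{ratio:.2f} typical daily views per KB")
```

Dividing by words instead of bytes would only change the second argument; the outlier handling stays the same.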
"...making nonsensical connections and culminating in feigned surprise, since 2006..."

Randy from Boise
Been Around Forever
Posts: 12277
Joined: Sun Mar 18, 2012 2:32 am
Wikipedia User: Carrite
Wikipedia Review Member: Timbo
Actual Name: Tim Davenport
Nom de plume: T. Chandler
Location: Boise, Idaho

Re: Page view -to- Article length ratios

Unread post by Randy from Boise » Thu Jan 19, 2017 7:20 pm

VIEWS (#) / LENGTH (bytes)

How is "views per byte" meaningful?

Doesn't every popular article have a high "views per byte" number?

Doesn't every unpopular article have a low "views per byte" number?


An article with a million views in a year that is 100K long would be 10 views per byte.

An article with ten views in a year that is 100K long would be 0.0001 views per byte.

That's just another way of saying that an article with 1 million views gets 100,000 times more views than one with 10 views.

So what? What hypothesis are you trying to test?


RfB


Re: Page view -to- Article length ratios

Unread post by thekohser » Thu Jan 19, 2017 8:40 pm

Randy from Boise wrote:VIEWS (#) / LENGTH (bytes)

How is "views per byte" meaningful?

It would suggest where readers are being "over-served" and "underserved" by content versus the demand for said content.

Doesn't every popular article have a high "views per byte" number?

Doesn't every unpopular article have a low "views per byte" number?

Have you ever seen a scatter plot? Note, it may be necessary to multiply or divide one of the measures by a constant, so that the ratios become more meaningful.
Appropriate example for the shoe salesman...

Image

Wouldn't you be curious to learn more about the 54-inch person who wears a size 7 shoe? Better yet, the 51-inch person in the size 12 shoe? How about the 80-inch person who squeezes into a size 10 -- maybe they lost their toes to frostbite?

My dear Tim, I am not interested in the fat middle of the distribution. I am interested in learning more about the outliers.
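One rough way to pick such outliers out of a pile of articles, sketched under assumptions: every title and number below is hypothetical, and the cut-off of one order of magnitude away from the median ratio is an arbitrary choice, not anything established in this thread. It also assumes every article has at least one daily view, since the logarithm of zero is undefined.

```python
import math
from statistics import median


def flag_outliers(articles, threshold=1.0):
    """Split out articles whose views-per-byte ratio is far from the median.

    articles:  iterable of (title, length_bytes, daily_views), daily_views > 0
    threshold: distance from the median log10 ratio, in orders of magnitude
    Returns (over_served, under_served) lists of titles.
    """
    log_ratio = {title: math.log10(views / size) for title, size, views in articles}
    centre = median(log_ratio.values())
    # Lots of text, few readers: ratio well below the median.
    over_served = [t for t, r in log_ratio.items() if r < centre - threshold]
    # Little text, many readers: ratio well above the median.
    under_served = [t for t, r in log_ratio.items() if r > centre + threshold]
    return over_served, under_served


# Hypothetical sample: four ordinary articles and two oddballs.
sample = [
    ("A", 5_000, 60),
    ("B", 10_000, 90),
    ("C", 20_000, 220),
    ("D", 40_000, 380),
    ("Long but unread", 100_000, 10),
    ("Short but busy", 3_000, 5_000),
]
over_served, under_served = flag_outliers(sample)
print("Over-served:", over_served)
print("Under-served:", under_served)
```

Working on the log of the ratio keeps the handful of enormous articles from swamping everything else, which is the same reason a scatter plot of this data would normally use log axes.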

Here's an example on Wikipedia:

The Empty Child (T-H-L) is about a Doctor Who episode. It is 22,046 bytes long. In a day, it receives about 240 page views. So, about 92 bytes of content per daily visitor.

IP address (T-H-L) is about the numerical label assigned to Internet devices. It is 28,921 bytes long, somewhat longer than The Empty Child. But each day, this article receives about 5,100 page views -- twenty-one times more than the article about a TV show episode.
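The arithmetic for these two articles, written out as a small sketch. The byte counts and daily views are the figures quoted in this post; the dictionary layout and the function name are just for illustration.

```python
# Figures as quoted in the post above.
articles = {
    "The Empty Child": {"bytes": 22_046, "daily_views": 240},
    "IP address": {"bytes": 28_921, "daily_views": 5_100},
}


def bytes_per_daily_view(info):
    """Bytes of article content per daily page view."""
    return info["bytes"] / info["daily_views"]


for title, info in articles.items():
    print(f"{title}: {bytes_per_daily_view(info):.1f} bytes per daily view")

# How many times more daily views the IP address article gets.
view_ratio = (
    articles["IP address"]["daily_views"]
    / articles["The Empty Child"]["daily_views"]
)
print(f"IP address gets {view_ratio:.2f}x the daily views")
```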

While these are just examples that I pulled manually, it tends to inform us that (perhaps) the content and quality of the IP address article is something that Wikipedians should be encouraged to tend to, since so many more Internet users are craving an explanation of this phenomenon. However, Wikipedians don't want to be bothered with such arguments, because they are too busy building out articles about individual episodes of Doctor Who.
"...making nonsensical connections and culminating in feigned surprise, since 2006..."


Re: Page view -to- Article length ratios

Unread post by thekohser » Thu Jan 19, 2017 8:54 pm

Another interesting comparison is to look at the size of the articles about the Cleveland Browns (T-H-L) and the Detroit Lions (T-H-L). I mean, really. They are both NFL teams that have been around for more than seven decades, both comparably hapless in performance on the field. The Browns article does get almost (but not quite) twice the viewership of the Lions article. Yet, Cleveland's article is nearly 8 times longer than Detroit's article. Why?

Your response may be, "Who cares? They both suck!" But, then, aren't we inspired by the Free Culture Movement to strive for knowledge, for its own sake?
"...making nonsensical connections and culminating in feigned surprise, since 2006..."

Poetlister
Genius
Posts: 25599
Joined: Wed Jan 02, 2013 8:15 pm
Nom de plume: Poetlister
Location: London, living in a similar way

Re: Page view -to- Article length ratios

Unread post by Poetlister » Thu Jan 19, 2017 9:59 pm

If you want to look up something in Wikipedia, you probably won't know in advance how long it is. At one time, the longest article on the site was a list of hundreds of people who could claim the throne of England if everyone above them died. That article is now vastly shorter. Is it getting more or fewer views than before?
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche

lilburne
Habitué
Posts: 4446
Joined: Thu Mar 15, 2012 6:18 pm
Wikipedia User: Nastytroll
Wikipedia Review Member: Lilburne

Re: Page view -to- Article length ratios

Unread post by lilburne » Fri Jan 20, 2017 12:42 am

Poetlister wrote:If you want to look up something in Wikipedia, you probably won't know in advance how long it is. At one time, the longest article on the site was a list of hundreds of people who could claim the throne of England if everyone above them died. That article is now vastly shorter. Is it getting more or fewer views than before?
This.

People do a Google search and get a link to wiki crap, they click the link, they locate what they were searching for and move on. What you need to look at is the time that people spend on each page and whether the longer the article is the more time they spend on it. I suspect that there is very little variation in time between a 100K and a 10K page, and that page size does not correlate with the length of time people spend on it.
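If time-on-page figures were ever available, the suspicion could be checked with a plain correlation coefficient. A sketch under loud assumptions: the page sizes and dwell times below are invented, no such per-page timing data is cited anywhere in this thread, and Pearson's r is only one possible measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented numbers: page size in KB and average seconds spent on the page.
page_kb = [10, 25, 50, 75, 100]
seconds_on_page = [40, 45, 38, 44, 42]

r = pearson(page_kb, seconds_on_page)
print(f"correlation between page size and time on page: {r:.2f}")
```

A value near zero would back up the hunch; a value near 1 would mean longer pages really do hold readers longer.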
They have been inserting little memes in everybody's mind
So Google's shills can shriek there whenever they're inclined


Re: Page view -to- Article length ratios

Unread post by Randy from Boise » Fri Jan 20, 2017 4:38 am

thekohser wrote:
Randy from Boise wrote:VIEWS (#) / LENGTH (bytes)

How is "views per byte" meaningful?

It would suggest where readers are being "over-served" and "underserved" by content versus the demand for said content.

Doesn't every popular article have a high "views per byte" number?

Doesn't every unpopular article have a low "views per byte" number?

Have you ever seen a scatter plot? Note, it may be necessary to multiply or divide one of the measures by a constant, so that the ratios become more meaningful.
Appropriate example for the shoe salesman...

Image

Wouldn't you be curious to learn more about the 54-inch person who wears a size 7 shoe? Better yet, the 51-inch person in the size 12 shoe? How about the 80-inch person who squeezes into a size 10 -- maybe they lost their toes to frostbite?

My dear Tim, I am not interested in the fat middle of the distribution. I am interested in learning more about the outliers.

Here's an example on Wikipedia:

The Empty Child (T-H-L) is about a Doctor Who episode. It is 22,046 bytes long. In a day, it receives about 240 page views. So, about 92 bytes of content per daily visitor.

IP address (T-H-L) is about the numerical label assigned to Internet devices. It is 28,921 bytes long, somewhat longer than The Empty Child. But each day, this article receives about 5,100 page views -- twenty-one times more than the article about a TV show episode.

While these are just examples that I pulled manually, it tends to inform us that (perhaps) the content and quality of the IP address article is something that Wikipedians should be encouraged to tend to, since so many more Internet users are craving an explanation of this phenomenon. However, Wikipedians don't want to be bothered with such arguments, because they are too busy building out articles about individual episodes of Doctor Who.
Obfuscate much?

RfB

Zoloft
Trustee
Posts: 14122
Joined: Wed Mar 14, 2012 11:54 pm
Wikipedia User: Stanistani
Wikipedia Review Member: Zoloft
Actual Name: William Burns
Nom de plume: William Burns
Location: San Diego

Re: Page view -to- Article length ratios

Unread post by Zoloft » Fri Jan 20, 2017 7:45 am

Image

My avatar is sometimes indicative of my mood:
  • Actual mug ◄
  • Uncle Cornpone
  • Zoloft bouncy pill-thing


Johnny Au
Habitué
Posts: 2620
Joined: Fri Jan 31, 2014 5:05 pm
Wikipedia User: Johnny Au
Actual Name: Johnny Au
Location: Toronto, Ontario, Canada

Re: Page view -to- Article length ratios

Unread post by Johnny Au » Fri Jan 20, 2017 4:00 pm

Even if the Cebuano Wikipedia were twice the size of the English Wikipedia, hardly anybody would read it.


Re: Page view -to- Article length ratios

Unread post by Poetlister » Fri Jan 20, 2017 4:35 pm

Johnny Au wrote:Even if the Cebuano Wikipedia were twice the size of the English Wikipedia, hardly anybody would read it.
Right, so it would get very few page views. But we're only discussing the English site here.
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche


Re: Page view -to- Article length ratios

Unread post by Randy from Boise » Fri Jan 20, 2017 4:59 pm

Let me save people the time of constructing a methodology and gathering and summarizing data.
Conclusion: There are many long articles on Wikipedia that get few readers. There are a few very popular articles that are not long.
Take, for example, Socialist Party of Washington (T-H-L) — weighing in at a hefty 116.8K and attracting a paltry 2,404 hits during all of 2016.

Indeed, literally every single popular article dwarfs the abysmal 0.02 Kohs Score™® for that piece.

We really have to stop paying content people by the word, it is not cost-effective for Wikipedia.

RfB
Last edited by Randy from Boise on Fri Jan 20, 2017 5:03 pm, edited 1 time in total.


Re: Page view -to- Article length ratios

Unread post by thekohser » Fri Jan 20, 2017 5:02 pm

lilburne wrote:People do a Google search and get a link to wiki crap, they click the link, they locate what they were searching for and move on. What you need to look at is the time that people spend on each page and whether the longer the article is the more time they spend on it. I suspect that there is very little variation in time between a 100K and a 10K page, and that page size does not correlate with the length of time people spend on it.
Lilburne makes an excellent point.

I am still interested in some Wiki-coding nerd who has the ability and access to get me what I'm looking for. In the interest of social science.
"...making nonsensical connections and culminating in feigned surprise, since 2006..."


Re: Page view -to- Article length ratios

Unread post by thekohser » Fri Jan 20, 2017 5:04 pm

Randy from Boise wrote:Take, for example, Socialist Party of Washington (T-H-L) — weighing in at a hefty 116.8K and attracting a paltry 2,404 hits during all of 2016.
Looks like someone should have written a copyrighted monograph and sold it to an academic publisher, rather than plunking it down on Wikipedia for free.
"...making nonsensical connections and culminating in feigned surprise, since 2006..."


Re: Page view -to- Article length ratios

Unread post by Randy from Boise » Fri Jan 20, 2017 5:13 pm

thekohser wrote:
Randy from Boise wrote:Take, for example, Socialist Party of Washington (T-H-L) — weighing in at a hefty 116.8K and attracting a paltry 2,404 hits during all of 2016.
Looks like someone should have written a copyrighted monograph and sold it to an academic publisher, rather than plunking it down on Wikipedia for free.
You think academic publishers write checks, do you? Ha!!!

Let me put it another way: I spent over 8 months going full out as co-editor on a book project that was put out by an illustrious academic publisher. I believe — I need to check, but I believe — that a total of 38 copies were sold in the first year. These are locked in a few libraries and if 10 copies have even been opened by readers, I would be surprised. For my work, I received a royalty of three books, a notch in my gunbelt, and prospects of a paperback edition this year that might sell in the low hundreds.

So now you tell me what the most effective information-transmission mechanism is: academic publishers or Wikipedia?

Compare and contrast: Lovestoneites (T-H-L) — 2,831 hits in 2016 / 62.75K = 0.045 Kohs Score™®

With this:
http://www.brill.com/products/book/amer ... es-1929-40

One is easily accessible by anyone anywhere with an interest or a question, the other is locked up by a greedhead publisher who will presumably turn over the manuscript to a Trotskyist publisher in Chicago shortly...
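For anyone checking the sums, the two scores quoted in this thread can be reproduced with a few lines. This is only a sketch: the function name is made up, and it assumes "K" means 1,000 bytes, which is the reading that matches the rounded figures given.

```python
def kohs_score(annual_hits, length_kb):
    """Annual page views per byte, taking 1 K as 1000 bytes (an assumption)."""
    return annual_hits / (length_kb * 1000)


# Hits for 2016 and article sizes as quoted in this thread.
scores = {
    "Socialist Party of Washington": kohs_score(2_404, 116.8),
    "Lovestoneites": kohs_score(2_831, 62.75),
}

for title, score in scores.items():
    print(f"{title}: {score:.3f}")
```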


tim


Re: Page view -to- Article length ratios

Unread post by thekohser » Fri Jan 20, 2017 8:15 pm

Randy from Boise wrote:One is easily accessible by anyone anywhere with an interest or a question, the other is locked up by a greedhead publisher who will presumably turn over the manuscript to a Trotskyist publisher in Chicago shortly...
And one I can go and mess with later tonight. :evilgrin:
"...making nonsensical connections and culminating in feigned surprise, since 2006..."


Re: Page view -to- Article length ratios

Unread post by Randy from Boise » Fri Jan 20, 2017 8:24 pm

thekohser wrote:
Randy from Boise wrote:One is easily accessible by anyone anywhere with an interest or a question, the other is locked up by a greedhead publisher who will presumably turn over the manuscript to a Trotskyist publisher in Chicago shortly...
And one I can go and mess with later tonight. :evilgrin:
I actually haven't had much trouble with that across the Wiki. One good thing about limiting oneself to esoteric shit: vandals get no bang out of their vandalism. If an idiot takes a dump in the woods, does anyone care?

RfB

Ming
the Merciless
Posts: 3002
Joined: Wed Apr 03, 2013 1:35 pm

Re: Page view -to- Article length ratios

Unread post by Ming » Fri Jan 20, 2017 9:01 pm

thekohser wrote:Another interesting comparison is to look at the size of the articles about the Cleveland Browns (T-H-L) and the Detroit Lions (T-H-L). I mean, really. They are both NFL teams that have been around for more than seven decades, both comparably hapless in performance on the field. The Browns article does get almost (but not quite) twice the viewership of the Lions article. Yet, Cleveland's article is nearly 8 times longer than Detroit's article. Why?
It took Ming about 90 seconds to work out why: both of the articles link off to a "History of" subarticle, but in the case of the Lions someone also deleted all that material from the main article. For the Browns, nobody did the same. It's also possible that the Browns subarticle might be longer because nobody bothered to try to move the Lions to Baltimore.


Re: Page view -to- Article length ratios

Unread post by Ming » Fri Jan 20, 2017 9:02 pm

The other thing is that most articles probably don't get read much beyond the lead, no matter how long they are.


Re: Page view -to- Article length ratios

Unread post by thekohser » Fri Jan 20, 2017 10:22 pm

Ming wrote:The other thing is that most articles probably don't get read much beyond the lead, no matter how long they are.
Would be nice if the great content management company, The Wikimedia Foundation, had some statistically-reliable research about that, wouldn't it?
"...making nonsensical connections and culminating in feigned surprise, since 2006..."

Textnyymi
Gregarious
Posts: 650
Joined: Mon Apr 21, 2014 1:29 pm
Wikipedia Review Member: Text
Actual Name: Anonyymi

Re: Page view -to- Article length ratios

Unread post by Textnyymi » Sat Jan 21, 2017 5:30 pm

If an idiot takes a dump in the woods, does anyone care?
Groundwater contamination could be a problem.

As fewer and fewer editors participate actively in cleaning up pages from vandalism, more outlier pages will retain pieces of vandalism, and consequently more pages in the middle will start receiving incorrect data. Just yesterday the page about right hand and left hand traffic had some incorrect text above the big template at the top, which persisted for about 9 hours. Broken windows theory - Vandalized page theory! :banana:


Re: Page view -to- Article length ratios

Unread post by lilburne » Sat Jan 21, 2017 9:27 pm

thekohser wrote:
Randy from Boise wrote:One is easily accessible by anyone anywhere with an interest or a question, the other is locked up by a greedhead publisher who will presumably turn over the manuscript to a Trotskyist publisher in Chicago shortly...
And one I can go and mess with later tonight. :evilgrin:
Personally, if I were interested, I'd go for the book. If I don't really give a shit and just have a momentary interest, then I'd take Wikipedia; I doubt I'd remember much about it later, and would most likely have this reaction to the Wikipedia article
They have been inserting little memes in everybody's mind
So Google's shills can shriek there whenever they're inclined


Re: Page view -to- Article length ratios

Unread post by Johnny Au » Sun Jan 22, 2017 12:26 am

Textnyymi wrote:
If an idiot takes a dump in the woods, does anyone care?
Groundwater contamination could be a problem.

As fewer and fewer editors participate actively in cleaning up pages from vandalism, more outlier pages will retain pieces of vandalism, and consequently more pages in the middle will start receiving incorrect data. Just yesterday the page about right hand and left hand traffic had some incorrect text above the big template at the top, which persisted for about 9 hours. Broken windows theory - Vandalized page theory! :banana:
Cluebot NG isn't omniscient.

In fact, it didn't fix the vandalism in History of immigration to Canada (T-H-L) that has an inaccurate lead.


Re: Page view -to- Article length ratios

Unread post by Poetlister » Mon Jan 23, 2017 8:11 pm

Johnny Au wrote:Cluebot NG isn't omniscient.

In fact, it didn't fix the vandalism in History of immigration to Canada (T-H-L) that has an inaccurate lead.
Fortunately, as I pointed out in another thread, we have Wikipediocracybot, which fixes any errors that people here mention.
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche