
Brewster's trillions: Internet Archive strives to keep web h

Posted: Sun Apr 28, 2013 9:43 pm
by EricBarbour
http://www.guardian.co.uk/technology/20 ... et-archive

Note:
Philosophical allies include www.wikimedia.org, Mozilla, the free software community, the Electronic Frontier Foundation, a digital rights advocacy group, and the internet activist Aaron Swartz, until his death in January.
I'd like to corner Brewster Kahle someday, and ask him if he's aware that his "allies" at Wikipedia and the Wikimedia Foundation have repeatedly censored their own databases, and are using "nofollow" on all Wikimedia sites partly to keep the Internet Archive from saving copies of the censored items.

Re: Brewster's trillions: Internet Archive strives to keep w

Posted: Mon Apr 29, 2013 11:04 am
by thekohser
I don't think "nofollow" prevents any sort of scraping or archiving mechanism. Are you maybe confusing it with robots.txt "Disallow"?

Re: Brewster's trillions: Internet Archive strives to keep w

Posted: Mon Apr 29, 2013 11:34 am
by Poetlister
thekohser wrote: I don't think "nofollow" prevents any sort of scraping or archiving mechanism. Are you maybe confusing it with robots.txt "Disallow"?
Can a robots.txt actually prevent scraping? I know that the major search engines observe these rules, but I was under the impression that this was no more than a gentleman's agreement.
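That impression is essentially right: robots.txt is advisory. The server does not enforce it. A crawler has to fetch the file and check its own requests against it, and a client that skips the check can still download the pages. The sketch below uses Python's standard-library parser to show what a "polite" crawler does. The robots.txt content, the agent name ExampleBot and the example.org URLs are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_allowed(agent: str, url: str) -> bool:
    """Return True if a crawler that chooses to honour robots.txt may fetch url.

    Nothing here blocks the request itself: the crawler has to call this
    check voluntarily before downloading anything.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(polite_allowed("ExampleBot", "https://example.org/private/page"))  # False
    print(polite_allowed("ExampleBot", "https://example.org/public/page"))   # True
```

A scraper that never calls anything like polite_allowed() is not stopped by the file at all. That is the "gentleman's agreement" part.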

Re: Brewster's trillions: Internet Archive strives to keep w

Posted: Mon Apr 29, 2013 12:32 pm
by lilburne
Currently commented out:

# Don't allow the wayback-maschine to index user-pages
#User-agent: ia_archiver
#Disallow: /wiki/User
#Disallow: /wiki/Benutzer
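To show what those commented-out lines would change, here is a rough sketch using Python's standard-library robots.txt parser. It assumes a crawler that identifies itself as ia_archiver and honours the file. The host example.org and the page names are placeholders, not the real site. As quoted, every rule is a comment, so the parser sees no rules and ia_archiver is allowed everywhere. With the leading "#" removed, user pages would be off-limits to ia_archiver only.

```python
from urllib.robotparser import RobotFileParser

# The excerpt quoted above, as posted: every rule line is a comment.
COMMENTED_OUT = """\
# Don't allow the wayback-maschine to index user-pages
#User-agent: ia_archiver
#Disallow: /wiki/User
#Disallow: /wiki/Benutzer
"""

# The same rules with the comment markers removed (hypothetical "re-enabled" case).
ENABLED = """\
User-agent: ia_archiver
Disallow: /wiki/User
Disallow: /wiki/Benutzer
"""


def archiver_may_fetch(robots_txt: str, path: str, agent: str = "ia_archiver") -> bool:
    """Check a path against robots.txt text, as a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.org is a placeholder host; only the path matters for matching.
    return parser.can_fetch(agent, "https://example.org" + path)


if __name__ == "__main__":
    print(archiver_may_fetch(COMMENTED_OUT, "/wiki/User:Example"))       # True
    print(archiver_may_fetch(ENABLED, "/wiki/User:Example"))             # False
    print(archiver_may_fetch(ENABLED, "/wiki/User:Example", "ExampleBot"))  # True
```

Note that "Disallow" matches by path prefix, so "/wiki/User" would also cover any other page whose path starts with that string.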