New 'vandalism detection tools' announced

We examine the less-than-successful efforts of the Wikimedia Foundation to create and use technology. The poster boy for this forum is Visual Editor.
Vigilant
Sonny, I've got a whole theme park full of red delights for you.
Posts: 31484
Joined: Thu Mar 29, 2012 8:16 pm
Wikipedia User: Vigilant
Wikipedia Review Member: Vigilant

New 'vandalism detection tools' announced

Unread post by Vigilant » Thu Jan 16, 2020 7:30 pm

https://lists.wikimedia.org/pipermail/w ... 94101.html

which leads to

https://meta.wikimedia.org/wiki/IP_Edit ... ving_tools

implying that IP editors are the single cause of vandalism...

There's plenty on that page to mock.


My favorite, from the "Risks" section:

"If we use Machine Learning to detect sockpuppets, it should be very carefully monitored and checked for biases in the training data. Over-reliance on the similarity-index score should be cautioned against. It is imperative that human review be part of the process."
Given how terrible nearly every piece of software you've developed is, I'd strongly suggest you tards stay away from automatically tagging someone as a sockpuppet.
Hello, John. John, hello. You're the one soul I would come up here to collect myself.

Giraffe Stapler
Habitué
Posts: 3075
Joined: Thu May 02, 2019 5:13 pm

Re: New 'vandalism detection tools' announced

Unread post by Giraffe Stapler » Thu Jan 16, 2020 8:41 pm

Vigilant wrote:
Thu Jan 16, 2020 7:30 pm
https://lists.wikimedia.org/pipermail/w ... 94101.html

which leads to

https://meta.wikimedia.org/wiki/IP_Edit ... ving_tools

implying that IP editors are the single cause of vandalism...
That's because they want to hide IP addresses. The only way that this will be acceptable to the Community(tm) is if they improve the tools (which would have to be reworked anyway when IP addresses get replaced by some pseudo-random identifier).

Poetlister
Genius
Posts: 25599
Joined: Wed Jan 02, 2013 8:15 pm
Nom de plume: Poetlister
Location: London, living in a similar way

Re: New 'vandalism detection tools' announced

Unread post by Poetlister » Thu Jan 16, 2020 9:05 pm

What would the tools do for the recent case where an alert IP removed a huge chunk of duplicated text and a bright spark "reverted" the correction? We know that it was the reversion that was the accidental vandalism, but how could WMF-type software tell?
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche

tarantino
Habitué
Posts: 4695
Joined: Thu Mar 15, 2012 7:19 pm

Re: New 'vandalism detection tools' announced

Unread post by tarantino » Thu Jan 16, 2020 9:23 pm

I followed a link from that meta page the other day to phab:T230436, Spike: Password Reset Project. There, MaxSem queried the database about Wikimedia user accounts. One of the queries, "have a confirmed email address shared by another account", shows that there is no limit on the number of accounts tied to one email address. In the output, the first column (matching) is the number of accounts sharing an address. The second column (num) is how many separate email addresses have that number of accounts.


SELECT matching, COUNT(*) AS num
FROM (
    SELECT COUNT(*) AS matching
    FROM globaluser
    WHERE gu_email IS NOT NULL AND gu_email <> ''
      AND gu_email_authenticated IS NOT NULL AND gu_email_authenticated <> ''
    GROUP BY gu_email
    HAVING matching > 1
) t1
GROUP BY matching
ORDER BY matching DESC, num DESC;
+----------+--------+
| matching | num    |
+----------+--------+
|      393 |      1 |
|      286 |      1 |
|      216 |      1 |
|      176 |      1 |
|      173 |      1 |
|      170 |      1 |
|      166 |      1 |
|      143 |      2 |
|      140 |      1 |
|      130 |      1 |
|      125 |      1 |
|      121 |      1 |
|      118 |      1 |
|      112 |      2 |
|      107 |      1 |
|      106 |      1 |
|      100 |      1 |
|       99 |      2 |
|       98 |      1 |
|       95 |      1 |
|       94 |      1 |
|       92 |      1 |
|       91 |      1 |
|       90 |      2 |
|       87 |      2 |
|       84 |      2 |
|       83 |      2 |
|       82 |      2 |
|       81 |      1 |
|       80 |      4 |
|       79 |      1 |
|       78 |      1 |
|       77 |      2 |
|       75 |      1 |
|       74 |      3 |
|       73 |      1 |
|       72 |      2 |
|       71 |      4 |
|       70 |      3 |
|       69 |      1 |
|       68 |      3 |
|       67 |      1 |
|       66 |      1 |
|       65 |      4 |
|       63 |      4 |
|       62 |      2 |
|       61 |      3 |
|       60 |      4 |
|       59 |      1 |
|       58 |      5 |
|       57 |      6 |
|       56 |      1 |
|       55 |      5 |
|       54 |      5 |
|       53 |      3 |
|       52 |      1 |
|       51 |      5 |
|       50 |      5 |
|       49 |      2 |
|       48 |      7 |
|       47 |      8 |
|       46 |      5 |
|       45 |      9 |
|       44 |     11 |
|       43 |     10 |
|       42 |      4 |
|       41 |      9 |
|       40 |      4 |
|       39 |      8 |
|       38 |      8 |
|       37 |      9 |
|       36 |     16 |
|       35 |      9 |
|       34 |     13 |
|       33 |     14 |
|       32 |     19 |
|       31 |     17 |
|       30 |     17 |
|       29 |     23 |
|       28 |     28 |
|       27 |     37 |
|       26 |     35 |
|       25 |     34 |
|       24 |     41 |
|       23 |     46 |
|       22 |     60 |
|       21 |     63 |
|       20 |     71 |
|       19 |     80 |
|       18 |    102 |
|       17 |    101 |
|       16 |    141 |
|       15 |    126 |
|       14 |    196 |
|       13 |    246 |
|       12 |    292 |
|       11 |    367 |
|       10 |    485 |
|        9 |    712 |
|        8 |   1018 |
|        7 |   1445 |
|        6 |   2530 |
|        5 |   5069 |
|        4 |  15037 |
|        3 |  81041 |
|        2 | 771294 |
+----------+--------+
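
For anyone who finds the nested GROUP BY hard to read, here is a rough sketch of the same "count of counts" idea in Python. It is only an illustration: the function name shared_email_histogram and the sample accounts are made up, and it assumes the input already contains only confirmed email addresses, which the real query checks through gu_email_authenticated.

```python
from collections import Counter


def shared_email_histogram(accounts):
    """Return (matching, num) pairs, largest group size first.

    accounts: iterable of (username, email) pairs. This sketch assumes
    every non-empty email is already confirmed (hypothetical input,
    not the real globaluser table).
    """
    # Inner query: how many accounts use each non-empty address.
    per_email = Counter(email for _, email in accounts if email)
    # HAVING matching > 1, then the outer GROUP BY: how many addresses
    # are shared by exactly that many accounts.
    sizes = Counter(n for n in per_email.values() if n > 1)
    # ORDER BY matching DESC
    return sorted(sizes.items(), reverse=True)


# Made-up sample data for illustration only.
sample = [
    ("a1", "x@example.org"), ("a2", "x@example.org"), ("a3", "x@example.org"),
    ("b1", "y@example.org"), ("b2", "y@example.org"),
    ("c1", "z@example.org"), ("c2", "z@example.org"),
    ("d1", "solo@example.org"),  # address used once, filtered out
    ("e1", None),                # no email, filtered out
]

for matching, num in shared_email_histogram(sample):
    print(matching, num)
```

With the sample above, one address is shared by three accounts and two addresses are shared by two accounts each, which is the same shape as the table: group size on the left, number of addresses with that group size on the right.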

Poetlister
Genius
Posts: 25599
Joined: Wed Jan 02, 2013 8:15 pm
Nom de plume: Poetlister
Location: London, living in a similar way

Re: New 'vandalism detection tools' announced

Unread post by Poetlister » Fri Jan 17, 2020 3:58 pm

Where it's only two accounts with the same address, they might be legitimate alternative accounts. Conceivably, you might have a school project where several users are using the same e-mail address. But where there are over 100, WP:AGF is stretched to breaking point!
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche
