
New 'vandalism detection tools' announced

Posted: Thu Jan 16, 2020 7:30 pm
by Vigilant
https://lists.wikimedia.org/pipermail/w ... 94101.html

which leads to

https://meta.wikimedia.org/wiki/IP_Edit ... ving_tools

implying that IP editors are the single cause of vandalism...

There's plenty on that page to mock.


My favorite
Risks:

If we use Machine Learning to detect sockpuppets, it should be very carefully monitored and checked for biases in the training data. Over-reliance on the similarity-index score should be cautioned against. It is imperative that human review be part of the process.
Given how terrible nearly every piece of software you've developed is, I'd strongly suggest you tards stay away from automatically tagging someone as a sockpuppet.

Re: New 'vandalism detection tools' announced

Posted: Thu Jan 16, 2020 8:41 pm
by Giraffe Stapler
Vigilant wrote:
Thu Jan 16, 2020 7:30 pm
https://lists.wikimedia.org/pipermail/w ... 94101.html

which leads to

https://meta.wikimedia.org/wiki/IP_Edit ... ving_tools

implying that IP editors are the single cause of vandalism...
That's because they want to hide IP addresses. The only way that this will be acceptable to the Community(tm) is if they improve the tools (which would have to be reworked anyway when IP addresses get replaced by some pseudo-random identifier).

Re: New 'vandalism detection tools' announced

Posted: Thu Jan 16, 2020 9:05 pm
by Poetlister
What would the tools do for the recent case where an alert IP removed a huge chunk of duplicated text and a bright spark "reverted" the correction? We know that it was the reversion that was the accidental vandalism, but how could WMF-type software tell?

Re: New 'vandalism detection tools' announced

Posted: Thu Jan 16, 2020 9:23 pm
by tarantino
I followed a link from that meta page the other day to phab:T230436, Spike: Password Reset Project. There, MaxSem queried the database about Wikimedia user accounts. One of the queries, "have a confirmed email address shared by another account", shows that there is no limit on the number of accounts tied to one email address. The first column is the number of accounts sharing an address. The second column is how many distinct email addresses have that many accounts.

select matching, count(*) num
from (
  select count(*) matching
  from globaluser
  where gu_email is not null and gu_email <> ''
    and gu_email_authenticated is not null and gu_email_authenticated <> ''
  group by gu_email
  having matching > 1
) t1
group by matching
order by matching desc, num desc;
+----------+--------+
| matching | num    |
+----------+--------+
|      393 |      1 |
|      286 |      1 |
|      216 |      1 |
|      176 |      1 |
|      173 |      1 |
|      170 |      1 |
|      166 |      1 |
|      143 |      2 |
|      140 |      1 |
|      130 |      1 |
|      125 |      1 |
|      121 |      1 |
|      118 |      1 |
|      112 |      2 |
|      107 |      1 |
|      106 |      1 |
|      100 |      1 |
|       99 |      2 |
|       98 |      1 |
|       95 |      1 |
|       94 |      1 |
|       92 |      1 |
|       91 |      1 |
|       90 |      2 |
|       87 |      2 |
|       84 |      2 |
|       83 |      2 |
|       82 |      2 |
|       81 |      1 |
|       80 |      4 |
|       79 |      1 |
|       78 |      1 |
|       77 |      2 |
|       75 |      1 |
|       74 |      3 |
|       73 |      1 |
|       72 |      2 |
|       71 |      4 |
|       70 |      3 |
|       69 |      1 |
|       68 |      3 |
|       67 |      1 |
|       66 |      1 |
|       65 |      4 |
|       63 |      4 |
|       62 |      2 |
|       61 |      3 |
|       60 |      4 |
|       59 |      1 |
|       58 |      5 |
|       57 |      6 |
|       56 |      1 |
|       55 |      5 |
|       54 |      5 |
|       53 |      3 |
|       52 |      1 |
|       51 |      5 |
|       50 |      5 |
|       49 |      2 |
|       48 |      7 |
|       47 |      8 |
|       46 |      5 |
|       45 |      9 |
|       44 |     11 |
|       43 |     10 |
|       42 |      4 |
|       41 |      9 |
|       40 |      4 |
|       39 |      8 |
|       38 |      8 |
|       37 |      9 |
|       36 |     16 |
|       35 |      9 |
|       34 |     13 |
|       33 |     14 |
|       32 |     19 |
|       31 |     17 |
|       30 |     17 |
|       29 |     23 |
|       28 |     28 |
|       27 |     37 |
|       26 |     35 |
|       25 |     34 |
|       24 |     41 |
|       23 |     46 |
|       22 |     60 |
|       21 |     63 |
|       20 |     71 |
|       19 |     80 |
|       18 |    102 |
|       17 |    101 |
|       16 |    141 |
|       15 |    126 |
|       14 |    196 |
|       13 |    246 |
|       12 |    292 |
|       11 |    367 |
|       10 |    485 |
|        9 |    712 |
|        8 |   1018 |
|        7 |   1445 |
|        6 |   2530 |
|        5 |   5069 |
|        4 |  15037 |
|        3 |  81041 |
|        2 | 771294 |
+----------+--------+
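
To make the two columns concrete, here is a minimal sketch that runs the same shape of query against a toy in-memory SQLite table. Every row and address below is made up for illustration. The gu_name column is included only to label the toy accounts, and the inner HAVING uses COUNT(*) instead of the alias so it does not depend on MySQL's alias handling. It is not the query as run on the real database.

```python
import sqlite3

# Toy stand-in for the globaluser table; every row here is hypothetical.
rows = [
    ("a1", "shared@example.org", "20200101000000"),
    ("a2", "shared@example.org", "20200101000000"),
    ("a3", "shared@example.org", "20200101000000"),
    ("b1", "pair@example.org", "20200101000000"),
    ("b2", "pair@example.org", "20200101000000"),
    ("c1", "duo@example.org", "20200101000000"),
    ("c2", "duo@example.org", "20200101000000"),
    ("d1", "solo@example.org", "20200101000000"),  # unique address: filtered out
    ("e1", "unconfirmed@example.org", None),       # never confirmed: filtered out
    ("e2", "unconfirmed@example.org", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE globaluser "
    "(gu_name TEXT, gu_email TEXT, gu_email_authenticated TEXT)"
)
conn.executemany("INSERT INTO globaluser VALUES (?, ?, ?)", rows)

# Inner query: accounts per confirmed address, keeping only shared addresses.
# Outer query: how many addresses fall into each "accounts per address" bucket.
histogram = conn.execute(
    """
    SELECT matching, COUNT(*) AS num
    FROM (
        SELECT COUNT(*) AS matching
        FROM globaluser
        WHERE gu_email IS NOT NULL AND gu_email <> ''
          AND gu_email_authenticated IS NOT NULL
          AND gu_email_authenticated <> ''
        GROUP BY gu_email
        HAVING COUNT(*) > 1
    ) AS t1
    GROUP BY matching
    ORDER BY matching DESC, num DESC
    """
).fetchall()

for matching, num in histogram:
    print(f"{num} address(es) shared by {matching} accounts each")
```

In the toy data one address is shared by three accounts and two addresses are shared by two accounts each, so the result has one row per bucket, just like the table above.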

Re: New 'vandalism detection tools' announced

Posted: Fri Jan 17, 2020 3:58 pm
by Poetlister
Where it's only two accounts with the same address, they might be legitimate alternative accounts. Conceivably, you might have a school project where several users are using the same e-mail address. But where there are over 100, WP:AGF is stretched to breaking point!
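
For a rough sense of scale, the "over 100" rows can be tallied straight from the table posted above. This is only a sketch: the pairs are copied by hand from that output, and the cutoff of 100 is just the figure mentioned in this post, not any official threshold.

```python
# (accounts per address, number of addresses) for every row above 100
# in the table posted earlier in this thread.
over_100 = [
    (393, 1), (286, 1), (216, 1), (176, 1), (173, 1), (170, 1), (166, 1),
    (143, 2), (140, 1), (130, 1), (125, 1), (121, 1), (118, 1), (112, 2),
    (107, 1), (106, 1),
]

# Distinct email addresses with more than 100 accounts attached.
addresses = sum(num for _, num in over_100)

# Total accounts attached to those addresses.
accounts = sum(matching * num for matching, num in over_100)

print(f"{addresses} addresses account for {accounts} accounts")
```

On those rows it comes to 18 addresses covering 2,937 accounts.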