New tool to identify sockpuppets based on writing style

Giraffe Stapler
Habitué
Posts: 3153
Joined: Thu May 02, 2019 5:13 pm


Post by Giraffe Stapler » Mon Sep 13, 2021 2:38 pm

New tool to identify sockpuppets based on writing style
Checkusers on the English Wikipedia will soon have access to a new tool aimed at identifying misuse of multiple accounts based on a person's writing style. masz, developed by Ladsgroup, uses natural language processing to create an individual 'fingerprint' of a user based on the way they use language on talk pages. Checkusers can log into a web interface to compare the fingerprints of two accounts or list accounts with similar fingerprints. The tool is already live on several projects and is expected to start running on enwiki after phab:T290793 is resolved. – Joe (talk) 07:15, 13 September 2021 (UTC)
Discussion here.

No comment. ;)

Zoloft
Trustee
Posts: 14076
Joined: Wed Mar 14, 2012 11:54 pm
Wikipedia User: Stanistani
Wikipedia Review Member: Zoloft
Actual Name: William Burns
Nom de plume: William Burns
Location: San Diego

Re: New tool to identify sockpuppets based on writing style

Post by Zoloft » Mon Sep 13, 2021 2:53 pm

Giraffe Stapler wrote:
Mon Sep 13, 2021 2:38 pm
New tool to identify sockpuppets based on writing style
Checkusers on the English Wikipedia will soon have access to a new tool aimed at identifying misuse of multiple accounts based on a person's writing style. masz, developed by Ladsgroup, uses natural language processing to create an individual 'fingerprint' of a user based on the way they use language on talk pages. Checkusers can log into a web interface to compare the fingerprints of two accounts or list accounts with similar fingerprints. The tool is already live on several projects and is expected to start running on enwiki after phab:T290793 is resolved. – Joe (talk) 07:15, 13 September 2021 (UTC)
Discussion here.

No comment. ;)
It's a hammer, given to a varied group of people. Some will use it to ensure their opinion sounds legitimate.

My avatar is sometimes indicative of my mood:
  • Actual mug ◄
  • Uncle Cornpone
  • Zoloft bouncy pill-thing


boom
Contributor
Posts: 19
Joined: Sun Jul 04, 2021 8:21 am

Re: New tool to identify sockpuppets based on writing style

Post by boom » Mon Sep 13, 2021 3:46 pm

Cool. This will allow us to take evidence fabrication to a whole new level.

Will the tool be subject to the same anti-fishing guidelines as the CU itself? I'm afraid the servers might not be able to withstand the load otherwise.

Poetlister
Genius
Posts: 25599
Joined: Wed Jan 02, 2013 8:15 pm
Nom de plume: Poetlister
Location: London, living in a similar way

Re: New tool to identify sockpuppets based on writing style

Post by Poetlister » Mon Sep 13, 2021 4:45 pm

Of course, the late SlimVirgin claimed to be able to detect sockpuppets by this method. I hope that this program is more reliable.
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche

Vigilant
Sonny, I've got a whole theme park full of red delights for you.
Posts: 31769
Joined: Thu Mar 29, 2012 8:16 pm
Wikipedia User: Vigilant
Wikipedia Review Member: Vigilant

Re: New tool to identify sockpuppets based on writing style

Post by Vigilant » Mon Sep 13, 2021 4:53 pm

Was it coded by Jehochmann and Durova?
Hello, John. John, hello. You're the one soul I would come up here to collect myself.

Midsize Jake
Site Admin
Posts: 9949
Joined: Mon Mar 19, 2012 11:10 pm
Wikipedia Review Member: Somey

Re: New tool to identify sockpuppets based on writing style

Post by Midsize Jake » Mon Sep 13, 2021 9:08 pm

Vigilant wrote:
Mon Sep 13, 2021 4:53 pm
Was it coded by Jehochmann and Durova?
Even better — it was coded by this guy:
Anyway, this sort of innovation was probably inevitable, but even if Mr. Sarabadani has actual talent, it probably can't help but make joe-jobbing much easier for people who are willing to read their opponents' posts for comprehension but can't afford to subscribe to multiple VPNs at once.

Bezdomni
Habitué
Posts: 2961
Joined: Wed Dec 28, 2016 9:07 pm
Wikipedia User: RosasHills
Location: Monster Vainglory ON (.. party HQ ..)

Re: New tool to identify sockpuppets based on writing style

Post by Bezdomni » Mon Sep 13, 2021 10:42 pm

Believe it or not, the pleonasm "has (...) received positive reception" has been added to the Cambridge English Dictionary (as an example, with Wikipedia as its source: §).

I wonder how many of the 500+ en.wp occurrences were typed by Cirt. :dry:
los auberginos

Vigilant
Sonny, I've got a whole theme park full of red delights for you.
Posts: 31769
Joined: Thu Mar 29, 2012 8:16 pm
Wikipedia User: Vigilant
Wikipedia Review Member: Vigilant

Re: New tool to identify sockpuppets based on writing style

Post by Vigilant » Mon Sep 13, 2021 11:30 pm

Midsize Jake wrote:
Mon Sep 13, 2021 9:08 pm
Vigilant wrote:
Mon Sep 13, 2021 4:53 pm
Was it coded by Jehochmann and Durova?
Even better — it was coded by this guy:
Anyway, this sort of innovation was probably inevitable, but even if Mr. Sarabadani has actual talent, it probably can't help but make joe-jobbing much easier for people who are willing to read their opponents' posts for comprehension but can't afford to subscribe to multiple VPNs at once.
I will boldly predict that this will be nearly as funny as that shite tool that was supposed to determine aggression in text.


Name escapes me at the moment.
Hello, John. John, hello. You're the one soul I would come up here to collect myself.

Poetlister
Genius
Posts: 25599
Joined: Wed Jan 02, 2013 8:15 pm
Nom de plume: Poetlister
Location: London, living in a similar way

Re: New tool to identify sockpuppets based on writing style

Post by Poetlister » Tue Sep 14, 2021 10:44 am

Midsize Jake wrote:
Mon Sep 13, 2021 9:08 pm
Vigilant wrote:
Mon Sep 13, 2021 4:53 pm
Was it coded by Jehochmann and Durova?
Even better — it was coded by this guy:
Anyway, this sort of innovation was probably inevitable, but even if Mr. Sarabadani has actual talent, it probably can't help but make joe-jobbing much easier for people who are willing to read their opponents' posts for comprehension but can't afford to subscribe to multiple VPNs at once.
We all know how good WMF developers can be. But did he also develop the algorithms? If so, does he have the necessary expertise in AI?
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche

ArmasRebane
Gregarious
Posts: 994
Joined: Wed Nov 18, 2015 7:04 pm

Re: New tool to identify sockpuppets based on writing style

Post by ArmasRebane » Tue Sep 14, 2021 3:09 pm

Poetlister wrote:
Mon Sep 13, 2021 4:45 pm
Of course, the late SlimVirgin claimed to be able to detect sockpuppets by this method. I hope that this program is more reliable.
It's probably marginally so. This kind of computer-synthesized analysis is a little less likely to be prone to human pattern-matching behaviors, but I'd be highly surprised if it could put out any sort of definitive link.

I imagine this will be used like other behavioral evidence. This isn't going to suddenly turn up a bunch of unknown socks, especially since smart people who have been trying to evade detection should have been changing their language anyhow.

As for enabling Joe-jobs, what, are people going to constantly run their sock accounts' edits into the machine to try and match someone else's output?

Giraffe Stapler
Habitué
Posts: 3153
Joined: Thu May 02, 2019 5:13 pm

Re: New tool to identify sockpuppets based on writing style

Post by Giraffe Stapler » Tue Sep 14, 2021 3:18 pm

Poetlister wrote:
Tue Sep 14, 2021 10:44 am
We all know how good WMF developers can be. But did he also develop the algorithms? If so, does he have the necessary expertise in AI?
I said "no comment" but I'm going to comment anyway. It's not clear to me whether there actually is any "AI" in this. It seems like straight statistical analysis of word use, but I've only seen the same couple of graphs everyone else has. (Word distributions of two users in fawiki 1.png and Word distributions of two users in fawiki 2.png)
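To make "straight statistical analysis of word use" concrete, here is a minimal Python sketch, assuming an account's fingerprint is nothing more than its relative word frequencies and that two accounts are compared by cosine similarity. This is an illustrative stand-in; nothing in the thread says masz works this way, and `word_distribution` and `cosine_similarity` are hypothetical names.

```python
import math
import re
from collections import Counter


def word_distribution(text: str) -> dict[str, float]:
    """Hypothetical fingerprint: relative frequency of each lowercase word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse frequency vectors (0.0 to 1.0)."""
    dot = sum(freq * b.get(word, 0.0) for word, freq in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty fingerprint matches nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    alice = word_distribution("I respectfully defer to the consensus at DIFF.")
    bob = word_distribution("I respectfully defer to the sources at DIFF.")
    carol = word_distribution("Nope. Revert. Not notable, sorry.")
    print(round(cosine_similarity(alice, bob), 3))    # 0.875
    print(round(cosine_similarity(alice, carol), 3))  # 0.0
```

A score near 1.0 only means the two word distributions overlap heavily. Real stylometry tools usually rely on richer features such as function-word rates or character n-grams, so treat this as a toy.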

The talk about restricting use to Checkusers made me laugh. Google "stylometry". There are plenty of papers on machine learning and stylometry and no shortage of projects implementing those papers. If you want to do a stylometric analysis of Wikipedia editors, you can already do it without this tool. It's a good project for someone, actually.

There are some things to be considered, though. Do you use everything that the editor has written on Wikipedia, or just what they have written outside of article space? I am quite sure that a fair percentage of what gets added to articles is just cut-and-pasted from the sources with minor edits like splicing two sentences together or leaving out an unnecessary clause. That's going to muddy up your "fingerprint". But if you only use non-article space edits, you might have trouble getting enough text for a meaningful analysis. Not all sockmasters are given to ranting on talk pages (although it does seem to be a common trait).
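The corpus question can be sketched too. This minimal Python example assumes each edit is available as a namespace number plus the text it added (the `Edit` record is hypothetical), drops article-space edits (MediaWiki's main namespace is 0), and refuses to build a fingerprint from fewer than an arbitrary, assumed `MIN_WORDS` words.

```python
from dataclasses import dataclass
from typing import Optional

ARTICLE_NAMESPACE = 0  # MediaWiki's main (article) namespace number
MIN_WORDS = 1000       # hypothetical cutoff for "enough text"; not from masz


@dataclass
class Edit:
    namespace: int   # namespace number of the page that was edited
    added_text: str  # text the editor added in this edit (hypothetical field)


def build_corpus(edits: list[Edit]) -> Optional[str]:
    """Join only non-article edits; return None if too little text remains."""
    text = " ".join(e.added_text for e in edits if e.namespace != ARTICLE_NAMESPACE)
    return text if len(text.split()) >= MIN_WORDS else None
```

Excluding article space avoids fingerprinting text pasted from sources, at the cost of leaving quiet editors with no usable corpus at all.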

There seems to be a suggestion that they are storing the "fingerprints" that this tool generates. I don't know why they would do that unless they intended to check them repeatedly. So the tool isn't for comparing two users like an editor interaction tool, it's for identifying users. It gives you the possibility of searching stored fingerprints for matches. And since this is based on Wikipedia edits, it means that data retention is no longer an issue. Lots of potential for past bad behaviour to be uncovered...
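The difference between comparing two users and searching stored fingerprints can be shown with a minimal sketch: once one fingerprint per account is saved, any new text can be ranked against all of them. The `FingerprintStore` class, its method names and the plain whitespace tokenization are hypothetical and are not taken from masz.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FingerprintStore:
    """Hypothetical store keeping one word-count fingerprint per account."""

    def __init__(self) -> None:
        self._prints: dict[str, Counter] = {}

    def add(self, account: str, text: str) -> None:
        # Accumulate counts so later edits extend an existing fingerprint.
        self._prints.setdefault(account, Counter()).update(text.lower().split())

    def closest(self, text: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the k stored accounts most similar to text, best first."""
        query = Counter(text.lower().split())
        scored = [(acct, cosine(query, fp)) for acct, fp in self._prints.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

For example, after `store.add("OldAccount", "I respectfully defer to the consensus")`, a call to `store.closest("i respectfully defer to the sources", k=1)` ranks `OldAccount` first, however long ago that account was active.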

:popcorn:

MrErnie
Habitué
Posts: 1172
Joined: Tue Jul 14, 2015 9:15 am

Re: New tool to identify sockpuppets based on writing style

Post by MrErnie » Tue Sep 14, 2021 5:14 pm

Let's all start saying "respectfully defer to," and "acknowledgement of my," and linking diffs by saying "at DIFF" and see how many of us get blocked as Cirtpuppets.

ArmasRebane
Gregarious
Posts: 994
Joined: Wed Nov 18, 2015 7:04 pm

Re: New tool to identify sockpuppets based on writing style

Post by ArmasRebane » Tue Sep 14, 2021 8:46 pm

MrErnie wrote:
Tue Sep 14, 2021 5:14 pm
Let's all start saying "respectfully defer to," and "acknowledgement of my," and linking diffs by saying "at DIFF" and see how many of us get blocked as Cirtpuppets.
This would seem to be the advantage of computer-synthesized stylometry over human recognition: someone trying to joe-job someone by using their trademark phrases is probably less likely to succeed, because those phrases are only part of the target's overall corpus.

Basically, they'd have to be much better at aping someone else's style to appear indistinguishable.

Midsize Jake
Site Admin
Posts: 9949
Joined: Mon Mar 19, 2012 11:10 pm
Wikipedia Review Member: Somey

Re: New tool to identify sockpuppets based on writing style

Post by Midsize Jake » Tue Sep 14, 2021 10:37 pm

ArmasRebane wrote:
Tue Sep 14, 2021 8:46 pm
This would seem to be the advantage of computer-synthesized stylometry over human recognition: someone trying to joe-job someone by using their trademark phrases is probably less likely to succeed, because those phrases are only part of the target's overall corpus.

Basically, they'd have to be much better at aping someone else's style to appear indistinguishable.
You may well be right — we'll probably just have to wait and see how well (if at all) the software works. My point earlier was that if you're being reasonably subtle about it, and really trying to get someone else in trouble via joe-jobbing, you might be more likely to be successful at casting suspicion against the targeted user because the software is more likely to notice what you're doing. It's going to be processing the edit samples much faster (and therefore in much greater volume) than a human can, and of course it also doesn't sleep, and perhaps more importantly, it isn't hindered by compassion or the nice person's tendency to give people the benefit of the doubt. (IOW, "oh no, he would never do such a terrible thing on Wikipedia, of all places.")

Anyhoo, if the software in question works properly, presumably that means there will be (relatively) few false positives. It will probably "score" new-ish users as it compares them to more established ones, and only report comparisons that produce scores over a certain threshold (say, 75% likely). So if you're joe-jobbing, it just becomes a question of how similar you have to be to reach the reporting threshold, right? IMO it really depends on how good the algorithm is, and like you say, how good the joe-jobber is. So (at the risk of repeating myself repetitively) we'll just have to wait and see, I guess.
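The threshold idea can be sketched in a few lines of Python: score every new account against every established one and report only the pairs at or above a cutoff. The 0.75 value is the "say, 75%" guess from the post, not a documented masz setting, and the Jaccard overlap of vocabularies used here is a deliberately simple, swapped-in measure.

```python
from itertools import product

REPORT_THRESHOLD = 0.75  # hypothetical cutoff taken from the post's example


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct words two accounts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_pairs(
    new_users: dict[str, set[str]],
    established: dict[str, set[str]],
    threshold: float = REPORT_THRESHOLD,
) -> list[tuple[str, str, float]]:
    """Compare each new account with each established one and keep only
    pairs whose score reaches the threshold, highest score first."""
    flagged = []
    for (new, new_words), (old, old_words) in product(new_users.items(), established.items()):
        score = jaccard(new_words, old_words)
        if score >= threshold:
            flagged.append((new, old, score))
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

Anything scoring below the cutoff is never reported at all, which is exactly why a joe-jobber only needs to push their similarity to the target just past that line.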

Tarc
Habitué
Posts: 1569
Joined: Sun Mar 18, 2012 1:31 am
Wikipedia User: Tarc

Re: New tool to identify sockpuppets based on writing style

Post by Tarc » Wed Sep 15, 2021 1:37 am

This triggers a random ANI memory. There was a user who was dragged there a few times for refusing to communicate on talk pages; they only ever left a few short words in edit summaries. They expressed a fear of being identifiable from their word patterns.
"The world needs bad men. We keep the other bad men from the door."

Poetlister
Genius
Posts: 25599
Joined: Wed Jan 02, 2013 8:15 pm
Nom de plume: Poetlister
Location: London, living in a similar way

Re: New tool to identify sockpuppets based on writing style

Post by Poetlister » Wed Sep 15, 2021 3:27 pm

Midsize Jake wrote:
Tue Sep 14, 2021 10:37 pm
Anyhoo, if the software in question works properly
We're talking about stuff produced by a WMF developer. No doubt it will work quite as well as, ... let's see, ... the visual editor?
"The higher we soar the smaller we appear to those who cannot fly" - Nietzsche

Ming
the Merciless
Posts: 2993
Joined: Wed Apr 03, 2013 1:35 pm

Re: New tool to identify sockpuppets based on writing style

Post by Ming » Thu Sep 16, 2021 6:12 am

Ming does feel that this needs to be tested out in the open where everyone can see how well it works before people start applying it as if it were a reliable oracle. For that matter, it needs to be entirely open-source.

Giraffe Stapler
Habitué
Posts: 3153
Joined: Thu May 02, 2019 5:13 pm

Re: New tool to identify sockpuppets based on writing style

Post by Giraffe Stapler » Thu Sep 16, 2021 3:22 pm

Ming wrote:
Thu Sep 16, 2021 6:12 am
Ming does feel that this needs to be tested out in the open where everyone can see how well it works before people start applying it as if it were a reliable oracle. For that matter, it needs to be entirely open-source.
Ah. It is open-source. If you could see it, which you can't, you would be allowed to use the code or create your own version of it, but it is understood that anyone allowed to see it will not do this. Open-source is really just licensing, although people usually and reasonably expect that open-source code is freely accessible.
