Winning the toxicity arms race

by Mike Pappas, CEO of Modulate

Automated content moderation has, since its earliest incarnations, been understood to be in an arms race of sorts. A well-intentioned developer might build a tool to recognise certain hateful behaviors, and mere moments later trolls and other bad actors will conceive of a way to work around it.

Trying to look for certain keywords? No problem, we’ll just replace the letter e with 3, o with 0, etc. Oh, you figured out common replacements? We’ll get fancier, then, and use more obscure replacement characters, like the Greek letter ο (that’s an omicron, not the letter ‘o’ – totally different to a computer program, but still completely readable to a person).

Text filters figure that one out, too? Well, we can just get fancier by strategically misspelling words, adding spaces or punctuation marks between the letters, or using other tricks. The best text moderation tools today handle all of these special cases and more … yet there are few forces more powerful than the ingenuity of people, even when the people in question are just really determined to type a racial slur. As such, any platform implementing text moderation needs to be constantly on watch for the next shot fired in this eternal war, and to invest heavily in evolving its tools over time as new evasion techniques are developed.
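
To make the cat-and-mouse game concrete, here is a minimal sketch of the kind of normalisation step a text filter might run before checking a blocklist. It is an illustration only, not how ToxMod or any particular product works: the `LOOKALIKES` table, the function names and the placeholder term "badword" are all assumptions made for this example, and real filters maintain far larger mappings.

```python
import re
import unicodedata

# Hypothetical lookalike table (an assumption for this sketch).
# Each entry maps an evasion character to the letter it imitates.
LOOKALIKES = str.maketrans({
    "3": "e", "0": "o", "1": "i", "4": "a", "5": "s", "7": "t",
    "@": "a", "$": "s",
    "\u03bf": "o",  # Greek small letter omicron
    "\u0435": "e",  # Cyrillic small letter ie
})


def normalize(text: str) -> str:
    """Undo the evasions described above: swaps, spacing, punctuation."""
    # Fold compatibility forms (e.g. full-width letters) and lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Map known lookalike characters back to plain letters.
    text = text.translate(LOOKALIKES)
    # Drop everything that is not a-z, including spaces and punctuation.
    return re.sub(r"[^a-z]", "", text)


def contains_blocked(text: str, blocklist: set[str]) -> bool:
    """Return True if any blocklisted term appears in the normalised text."""
    flat = normalize(text)
    return any(term in flat for term in blocklist)
```

With this sketch, `contains_blocked("b.4.d w\u03bfrd", {"badword"})` returns `True`, because the punctuation, the digit and the omicron are all undone. Swap in a lookalike the table has not seen yet, such as the Cyrillic "а" (U+0430), and the same message slips through until someone updates the table, which is the arms race in miniature. Stripping spaces also means a blocked term can appear to span two innocent words, so a real filter has to trade evasion resistance against false positives.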

Given this situation, it’s understandable that most people assume all content moderation efforts will be caught in this sort of perpetual cycle. Indeed, image and video moderation tend to suffer from the same challenges, as computers interpret pixels quite differently from the human eye, leaving plenty of room for the same sorts of tricks. But fascinatingly, there’s one medium in which these tricks just don’t work, one situation where the arms race is winnable and not just something to eternally manage: voice chat.


Why is voice chat so different? Voice is just spoken text, so shouldn’t the challenges of text chat exist here too? Well, consider the nature of these evasion techniques in text chat. Misspellings, character replacements, adding punctuation – the key insight is that these things have no real analogue in voice chat. Sure, you could in principle “misspell” (aka mispronounce) something – but doing so means you’re making yourself hard for other people to understand, not just hard for a computer to understand. The same goes if you tried a “character replacement” (aka enunciating certain syllables in a significantly weird way) or tried to “add punctuation” (aka conspicuously pausing in the middle of a word). All of these actions make your comments harder to understand – meaning that trolls and other bad actors, whose purpose is to get a rise out of others, won’t be able to leverage these techniques to their own ends.

Put differently, in order for the bad actor to “hide” from the moderation tools, they’d have to modify what they say until it’s no longer recognisable as something offensive even to other humans. (In other words: mission freakin’ accomplished!)

Now, we should be clear that this doesn’t mean automated voice moderation is infallible. Transcribing spoken language alone is tricky (though enormous strides have been made in recent years, especially in ensuring that accented speakers are transcribed as accurately as any other); adding emotion and nuance analysis to that only makes the problem harder. Indeed, even full-time human moderators, trained specifically to recognise harmful behavior, sometimes mistake friendly trash talk for something worse, or mistake a serious jab for playfulness. But the key point is that, even if automated voice moderation tools make some mistakes, they make them equally for all players; there’s no way for trolls or others to trick the system into making more errors when watching them, so there’s no way for these bad actors to ever feel safe that they’ve escaped the notice of a voice moderation system.

(A fun anecdote: whenever we launch our ToxMod proactive voice moderation platform in a new game, there are inevitably a few enterprising trolls who take it upon themselves to ‘test’ the system. They’ve developed this habit with text moderation – a new text filter gets launched, so they experiment with various obfuscation methods until their post goes through. Within minutes, these trolls might break these text moderation tools. But when we deploy ToxMod, something different happens. Since the trolls can’t obfuscate what they’re speaking aloud, we inevitably catch it – quickly pinpointing those players who are trying to trick the system, and even more importantly, preventing any players from uncovering any exploitable, repeatable weakness in the way that’s so easily done in text.)

The takeaway here is that, while voice moderation is indeed more complex than text moderation, it’s also more robust. Once platforms deploy voice moderation tools, they’ll still need to keep them updated with new vocabulary or political context; but they won’t need to invest the effort that text moderation tools require to continually prevent bad actors from rendering the whole thing moot. Given the increased social and empathetic impact of voice chat, the math is simple – prioritising voice as a social feature, and deploying voice moderation tools to accompany it, is the best way to foster an immersive, rich, and safe online community able to survive for the long haul.

For more insights from Mike and Modulate’s team of voice moderation experts, sign up for their Trust & Safety Lately newsletter at To learn more about ToxMod, visit
