Confusable Character Detection in Erlang

Richard Kallos
&
October 7, 2025
10
min. read
Table of Contents
TABLA DE CONTENIDOS
ÍNDICE DE CONTEÚDO

Grindr is the world-leading social networking app that connects gay, bi, trans, queer, and other people in almost every country in the world. With millions of users opening the app every day, Grindr’s popularity makes it an attractive platform for spammers.

Grindr’s use throughout the world necessitates support for transmitting text in any language, so we support sending characters defined by the Unicode standard, using the UTF-8 encoding. However, the Unicode standard specifies over 140,000 characters, and there are groups of characters that resemble one another visually. This fact is well-known by spammers and other bad actors, who are known to craft messages and names by substituting letters in words with characters that are visually confusable with the letter being substituted. This is a common way of stepping around simple spam filters, as well as uniqueness checks for names.

Grindr is committed to protecting its users from spam, and we employ several tactics to try and prevent spam from being transmitted on our platform. Grindr’s chat system is built using Erlang. We’re excited to announce that we have open-sourced erl-tr39, an Erlang library for determining whether pieces of text are visually confusable.

What does it mean for text to be visually confusable?

People can usually tell that “𐌁𝕠𝖇” spells “Bob”. However, computers generally cannot see the resemblance, because computers usually compare strings of text by their representation as bytes, rather than by their visual presentation. Consider the codepoints that comprise “Bob” and “𐌁𝕠𝖇”:

What’s important to note here is that “𐌁𝕠𝖇” is not the result of any trick using fancy fonts; it is comprised of Unicode characters that are distinct from the characters that make up “Bob”. As a result, “𐌁𝕠𝖇” has a very different byte representation than “Bob”, and any procedure for testing string equality using byte representations will consider “𐌁𝕠𝖇” and “Bob” to be different strings.

Spammers can and do leverage visually confusable characters in order to bypass word filters and name checks. In one case, 37,000 people downloaded a fake version of the popular Chrome extension AdBlock Plus, called “АdВlосk Рluѕ”, where several of the Latin letters have been replaced with identical Cyrillic letters. When encoded with any of the valid Unicode encodings, these two strings do not compare equal!

Thankfully, the Unicode Consortium has been openly working on Technical Standard #39, which describes different ways of detecting and avoiding security issues related to the use of Unicode. One of the contributions of this report is a method for detecting whether two strings of Unicode characters are visually confusable. One of the steps is to map visually confusable characters to their prototypes. This mapping is specified in a text file that exists alongside the report. Here’s a line from confusables.txt:

1D6D9 ; 03C8 ; MA # ( 𝛙 → ψ ) MATHEMATICAL BOLD SMALL PSI → GREEK SMALL LETTER PSI #

This line states that the prototype or ‘exemplar’ character for character 0x1D6D9, or MATHEMATICAL BOLD SMALL PSI, is character 0x03C8, or GREEK SMALL LETTER PSI.

TR39 specifies a function skeleton(X), which is comprised of the following 3 steps:

  1. Convert the input text to Normalization Form D (NFD). This basically means to convert characters with accents, such as 0x212B (Å), into a sequence of unaccented characters and combining accents, such as 0x41 (A) and 0x30A ( ̊).
  2. Concatenate the prototype character of each character from step 1
  3. Convert to NFD once more

Two strings X and Y are confusable if skeleton(X) == skeleton(Y).

Introducing erl-tr39

erl-tr39 is the result of a short escript parsing confusables.txt and producing an Erlang module that, among other things, exports a skeleton/1 function. Here is erl-tr39 in action:

Erlang/OTP 24 [erts-12.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]

Eshell V12.0  (abort with ^G)
1> tr39:skeleton("𐌁𝕠𝖇").
"Bob"
2> tr39:confusable("𐌁𝕠𝖇", "Bob").
true

Why Developing erl-tr39 was a treat

Developing erl-tr39 was a particularly pleasant experience thanks to some great Erlang features.

Pattern matching

Pattern matching is a very pleasant feature to have in a programming language, and Erlang has LOTS of pattern matching. In Erlang, you can both construct and match integers using different radices. In confusables.txt, characters codes are written in hexadecimal. This means the generated function heads can closely resemble lines from confusables.txt.

% 0340 ; 0300 ; MA # ( ̀ → ̀ ) COMBINING GRAVE TONE MARK → COMBINING GRAVE ACCENT #prototype(16#340) -> 16#0300;

Matching and constructing UTF-8 characters in binaries

UTF-8 is the most common encoding for Unicode characters. It uses between 1 and 4 bytes to represent a character, using the most significant bits of each byte to store some metadata. Here’s a handy table that shows how to encode characters using UTF-8.

Erlang supports constructing and matching UTF-8 in binaries. If you want to construct a UTF-8 encoded binary for character 9733 (BLACK STAR), you could write <<”★”/utf8>>, and you can use the same /utf8 suffix if you want to match a UTF-8 encoded character in a binary.

Thanks to this feature, I didn’t have to spend any of my “code beauty budget” on encoding and decoding UTF-8 characters.

Bit string comprehensions and bit string generators

Now that I can match UTF-8 characters, I am able to write a recursive function to map my tr39:prototype/1 function over every UTF-8 character in a binary.

convert(<<>>, Acc) ->    Acc;convert(<<H/utf8, T/binary>>, Acc) ->    convert(T, <<Acc/binary, (utf8_bytes(prototype(H)))/utf8>>).

(NOTE: While this code is nice and short, it might not perform very well on large binaries due to having to reallocate binaries as they grow. It might be better to construct an iolist, or to construct many small binaries and concatenate them at the end with something like iolist_to_binary/1. Benchmarking is your friend in cases like this!)

The above code is pretty clean, but it could become even cleaner with a bit string comprehension:

convert(Bin) ->    << (utf8_bytes(prototype(C))) || <<C/utf8>> <= Bin>>.

Erlang list comprehensions are pretty popular, because you can compose generators and filters on the right of the “||”, but it was a pleasant surprise to remember that bit string comprehensions exist.

What’s next?

We currently use the library in production, and, for now, it is sufficient for our purposes. That said, we’re committed to continuing development in the open, and we hope that others may find this library useful.

𝖋𝒾ո

Thanks to several nice Erlang features, creating erl-tr39 was a fun and interesting process. We’re delighted to share erl-tr39 with the FLOSS community, and we hope that others find it useful. Feedback and contributions to erl-tr39 will be met with appreciation and gratitude.

Share this article
Comparte este artículo
Compartilhe este artigo

Find & Meet Yours

Get 0 feet away from the queer world around you.
Thank you! Your phone number has been received!
Oops! Something went wrong while submitting the form.
We’ll text you a link to download the app for free.
Table of Contents
TABLA DE CONTENIDOS
ÍNDICE DE CONTEÚDO
Share this article
Comparte este artículo
Compartilhe este artigo
“A great way to meet up and make new friends.”
- Google Play Store review
Thank you! Your phone number has been received!
Oops! Something went wrong while submitting the form.
We’ll text you a link to download the app for free.
“A great way to meet up and make new friends.”
- Google Play Store review
Discover, navigate, and get zero feet away from the queer world around you.
Descubre, navega y acércate al mundo queer que te rodea.
Descubra, navegue e fique a zero metros de distância do mundo queer à sua volta.
Already have an account? Login
¿Ya tienes una cuenta? Inicia sesión
Já tem uma conta? Faça login

Browse bigger, chat faster.

Find friends, dates, hookups, and more

Featured articles

Artículos destacados

Artigos em Destaque

Related articles

Artículos relacionados

Artigos Relacionados

No items found.

Find & Meet Yours

Encuentra y conoce a los tuyos

Encontre o Seu Match Perfeito

4.6 · 259.4k Raiting
4.6 · 259.4k valoraciones
4.6 · 259.4k mil avaliações