(Misusing) Python Unicode Normalisation


Since PEP 3131, Python has normalised identifiers (to NFKC form) as part of its support for non-ASCII identifiers.

That means that if you write 𝚠 = 50, where 𝚠 is U+1D6A0 MATHEMATICAL MONOSPACE SMALL W, you can later refer to that variable as w (or, indeed, as any other character that normalises to w).
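A quick way to check this, as a minimal sketch using only the standard library. The names fancy_w and namespace are made up for this example, and exec is used only so that the snippet itself can stay pure ASCII:

```python
import unicodedata

fancy_w = "\U0001D6A0"  # MATHEMATICAL MONOSPACE SMALL W
print(unicodedata.name(fancy_w))               # MATHEMATICAL MONOSPACE SMALL W
print(unicodedata.normalize("NFKC", fancy_w))  # w

# Assign through the fancy character, then read the variable back as plain "w".
namespace = {}
exec(fancy_w + " = 50\nresult = w", namespace)
print(namespace["result"])  # 50
```

The variable is stored under its normalised name, so "w" is the key that ends up in namespace.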

So I wrote a program to randomly replace every character in some code with any character that normalises into it while trying not to break the program.

This post was inspired by

A correct implementation would need to parse the code to avoid replacing characters in non-identifiers (which are not normalised). Instead, I just included a list of characters not to modify, and tried to keep that list short by avoiding syntax items such as import, raise, with, and else in the program itself.
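In plain ASCII, the idea looks roughly like the sketch below. This is not the actual program: build_cache, obfuscate, and the skip parameter are names invented for illustration. The restriction to categories Ll and Lu plus the underscore follows the check in the transformed program.

```python
import random
import unicodedata
from collections import defaultdict


def build_cache():
    """Map each NFKC-normalised string to the characters that normalise to it."""
    cache = defaultdict(list)
    for codepoint in range(0x110000):
        ch = chr(codepoint)
        # Only cased letters and the underscore are considered.
        if unicodedata.category(ch) in ("Ll", "Lu") or ch == "_":
            cache[unicodedata.normalize("NFKC", ch)].append(ch)
    return cache


def obfuscate(source, skip, cache):
    """Replace each character not in `skip` with a random equivalent."""
    out = []
    for ch in source:
        candidates = cache.get(ch)
        if ch in skip or not candidates:
            out.append(ch)  # protected character, or nothing to swap in
        else:
            out.append(random.choice(candidates))
    return "".join(out)


cache = build_cache()
source = "width = 50\nresult = width * 2\n"
transformed = obfuscate(source, skip="", cache=cache)

# Every replacement normalises back to the original character.
assert unicodedata.normalize("NFKC", transformed) == source
```

The example source contains no keywords, so skip can be empty. For real code, skip has to contain every character that appears in a keyword the program uses. The sketch also rewrites letters inside string literals and comments, and string contents are not normalised, so the input has to be written with that in mind.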

Below is the program (transformed, of course). I’m also providing the pure-ASCII source here.

The program takes the input file as the first argument, and the output file as the second argument.

Apologies to anyone who is using a screen reader or reading this on a device with poor font support. The plain ASCII source linked above will be far more readable.

syss = int.to_bytes(7567731, 3, int.fro𝙢_𝗯ytes.__𝗱oc__[385:388])
S = __i𝖒port__(syss.𝘥eco𝓭e())
U = __i𝙢port__(𝑏ytes.𝓭eco𝚍e.__𝒹oc__[271:279].𝒍ower() + "ata")
io = __import__(open.__𝙢o𝒅𝘂𝙡e__)
ran𝔡o𝓂 = __import__(io.B𝙪ffere𝖽Rando𝖒.__name__[8:].𝚕o𝘸er())
ections = U.𝕓idirectiona𝚕.__na𝘮e__[5:-2]
C = __i𝗆port__(compi𝗅e.__na𝑚e__[:2] + co𝑚pi𝗅e.__na𝚖e__[5]*2 + ections + "s")
nor𝕞cac𝖍e = C.𝑑efa𝘶𝓵t𝒅ict(𝗅ist)

𝗅 = C.__na𝕞e__[2]
𝔲 = Unico𝘥eDeco𝘥eError.__name__.𝐥o𝘄er()
L𝘭 = (𝑙*2).tit𝘭e()
L𝘂 = (𝘭+u).tit𝓵e()

for _ in ran𝒈e(0, 0x110000):
        if U.cate𝒈ory(c𝘩r(_)) in [Ll, Lu] or cℎr(_) in "_":
            nor𝓂a𝚕ise𝒅 = U.nor𝑚ali𝐳e(nf𝓴c, str(c𝙝r(_)))
    except Unico𝑑eDecoⅆeError: pass

f = open(S.arg𝐯[1])
𝖜 = U.east_asian_𝕨i𝒹t𝙝.__na𝗺e__[-5]
of = open(S.ar𝒈𝓋[2], 𝐰)
i = f.rea𝐝()

ie = In𝑑exError.𝑤it𝙝_trace𝗯ac𝘬.__doc__[:6].𝗹o𝘄er()
c = "afryso" + int.__na𝐦e__ + ie

for cℎ in i:
        if c𝗵 not in c:
                s = ran𝖉o𝘮.c𝗵oice(nor𝖒cac𝔥e[str(c𝘩)])
                assert U.nor𝕞alize(nf𝚔c, s) == c𝐡
            except IndexError:

    except: pass