(Misusing) Python Unicode Normalisation
https://www.5snb.club/posts/2020/python-unicode-normalisation/
Since PEP 3131 added support for non-ASCII identifiers, Python converts every identifier to NFKC normal form while parsing.
That means that if you write 𝚠 = 50, where that character is U+1D6A0 MATHEMATICAL MONOSPACE SMALL W, you can later refer to that variable as w (or, indeed, as anything else that normalises into w).
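As a quick check of that behaviour, here is a minimal sketch using only the standard library. The names `fancy` and `namespace` are made up for illustration; the snippet assigns through the monospace character and reads the value back through plain ASCII `w`.

```python
import unicodedata

fancy = "\U0001D6A0"  # MATHEMATICAL MONOSPACE SMALL W
print(unicodedata.name(fancy))               # MATHEMATICAL MONOSPACE SMALL W
print(unicodedata.normalize("NFKC", fancy))  # w

# Assign via the fancy character, read it back via plain ASCII "w".
namespace = {}
exec(fancy + " = 50\nresult = w + 1", namespace)
print(namespace["result"])  # 51

# The variable is stored under its normalised name, not the fancy one.
print("w" in namespace, fancy in namespace)  # True False
```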
So I wrote a program to randomly replace every character in some code with any character that normalises into it while trying not to break the program.
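The core of that idea can be sketched in plain ASCII roughly as follows. This is an illustration, not the program from this post: `build_variants` and `disguise` are hypothetical names, and the sketch only handles a single identifier rather than a whole file.

```python
import random
import unicodedata
from collections import defaultdict


def build_variants():
    """Map each NFKC-normalised string to the letters that normalise into it."""
    variants = defaultdict(list)
    for codepoint in range(0x110000):
        char = chr(codepoint)
        # Only upper- and lowercase letters (plus "_") are considered.
        if unicodedata.category(char) in ("Ll", "Lu") or char == "_":
            variants[unicodedata.normalize("NFKC", char)].append(char)
    return variants


def disguise(name, variants, rng=random):
    """Replace each character of an identifier with a random equivalent."""
    out = []
    for char in name:
        # Characters with no known variants (digits, for example) are kept.
        candidates = variants.get(char) or [char]
        out.append(rng.choice(candidates))
    return "".join(out)


variants = build_variants()
disguised = disguise("normalise_1", variants)
print(disguised)
print(unicodedata.normalize("NFKC", disguised))  # normalise_1
```

Because every replacement normalises back to the original character, Python treats the disguised identifier as the same name as the ASCII one.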
This post was inspired by https://codegolf.stackexchange.com/a/207567.
A correct implementation would need to parse the code so that it only replaces
characters in identifiers (nothing else is normalised), but I just included a list
of characters to not modify, and tried to cut down on the number of syntax
else, that I don’t use.
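For comparison, the parsing approach could be sketched with the standard library's `tokenize` and `keyword` modules. This is an assumption about how one might do it, not what the program below does. `VARIANTS`, `RESERVED` and `disguise_source` are hypothetical names. The sketch relies on every replacement being exactly one code point long, so the column offsets reported by the tokenizer stay valid while a line is being edited.

```python
import io
import keyword
import random
import tokenize
import unicodedata

# Letters grouped by what they normalise into under NFKC.
VARIANTS = {}
for codepoint in range(0x110000):
    char = chr(codepoint)
    if unicodedata.category(char) in ("Ll", "Lu"):
        VARIANTS.setdefault(unicodedata.normalize("NFKC", char), []).append(char)

# Keywords must stay untouched; soft keywords are skipped to be safe.
RESERVED = set(keyword.kwlist) | set(getattr(keyword, "softkwlist", []))


def disguise_source(source, rng=random):
    """Randomise only the identifier tokens of a piece of Python source."""
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME or tok.string in RESERVED:
            continue
        row, start = tok.start
        _, end = tok.end
        new_name = "".join(rng.choice(VARIANTS.get(c, [c])) for c in tok.string)
        line = lines[row - 1]
        lines[row - 1] = line[:start] + new_name + line[end:]
    return "".join(lines)


example = (
    "def total(values):\n"
    "    count = 0\n"
    "    for value in values:\n"
    "        count += value\n"
    "    return count\n"
    "result = total([1, 2, 3])\n"
)
print(disguise_source(example))
```

Strings, comments, numbers and keywords pass through unchanged, so no hand-maintained list of characters to avoid is needed.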
Below is the program (transformed, of course). I’m also providing the pure-ASCII source here.
The program takes the input file as the first argument, and the output file as the second argument.
Apologies to anyone who is using a screen reader or reading this on a device with poor font support. The plain ASCII source linked above will be far more readable.
syss = int.to_ｂytes(7567731, 3, int.fro𝙢_𝗯ytes.__𝗱oc__[385:388])
S = __i𝖒port__(syss.𝘥eco𝓭e())
U = __i𝙢port__(𝑏ytes.𝓭eco𝚍e.__𝒹oc__[271:279].𝒍ower() + "ata")
io = __import__(open.__𝙢o𝒅𝘂𝙡e__)
ran𝔡o𝓂 = __iｍport__(io.B𝙪ffere𝖽Rando𝖒.__name__[8:].𝚕o𝘸er())
ections = U.𝕓idirectiona𝚕.__na𝘮e__[5:-2]
C = __i𝗆port__(compi𝗅e.__na𝑚e__[:2] + co𝑚pi𝗅e.__na𝚖e__*2 + ections + "s")
nor𝕞cac𝖍e = C.𝑑efa𝘶𝓵t𝒅ict(𝗅ist)
nf𝓴c=U.nor𝗆a𝚕ize.__𝑑oc__[96:100]
𝗅 = C.__na𝕞e__
𝔲 = Unico𝘥eDeco𝘥eError.__naｍe__.𝐥o𝘄er()
L𝘭 = (𝑙*2).tit𝘭e()
L𝘂 = (𝘭+ｕ).tit𝓵e()
for _ in ran𝒈e(0, 0x110000):
    try:
        if U.cate𝒈ory(c𝘩r(_)) in [Lｌ, Lu] or cℎr(_) in "_":
            nor𝓂a𝚕ise𝒅 = U.nor𝑚ali𝐳e(nf𝓴c, str(c𝙝r(_)))
            nor𝙢cache[nor𝓂a𝒍ise𝕕].append(c𝗵r(_))
    except Unico𝑑eDecoⅆeError:
        pass
f = open(S.arｇ𝐯)
𝖜 = U.east_asian_𝕨i𝒹t𝙝.__na𝗺e__[-5]
of = open(S.ar𝒈𝓋, 𝐰)
i = f.rea𝐝()
ie = In𝑑exError.𝑤it𝙝_trace𝗯ac𝘬.__doc__[:6].𝗹o𝘄er()
c = "afryso" + int.__na𝐦e__ + ie
for cℎ in i:
    try:
        if c𝗵 not in c:
            try:
                s = ran𝖉o𝘮.c𝗵oice(nor𝖒cac𝔥e[str(c𝘩)])
                assert U.nor𝕞aｌize(nf𝚔c, s) == c𝐡
                of.𝔴rite(s)
            except InｄexError:
                of.𝒘rite(c𝖍)

        of.𝔀rite(c𝙝)
    except: pass