Invisible Characters

orbital-decay · on Dec 4, 2022

My favorite is U+202E Right-to-Left Override, which doesn't appear to be listed there. A surprising number of UIs (apps, sites) can be broken with it, as they were never tested with right-to-left writing direction in mind. Even a Unicode reference website that I just used to recall the code is broken by it. [0] Entering RLO into arbitrary input forms for fun can bend spacetime, I swear.

[0] https://unicode-table.com/en/202E/
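Part of why RLO does so much damage is that an explicit override stays in effect until a matching U+202C POP DIRECTIONAL FORMATTING or the end of the paragraph, so an unterminated one in user input also reorders whatever UI text follows it. A minimal Python sketch (helper names are hypothetical) that uses the standard library's `unicodedata.bidirectional()` to find the explicit embedding, override and isolate controls and strip them before display:

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters:
# embeddings, overrides, isolates and their terminators.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def find_bidi_controls(text: str):
    """Return (index, 'U+XXXX NAME') for every explicit bidi control in text."""
    return [
        (i, f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]

def strip_bidi_controls(text: str) -> str:
    """Drop explicit bidi controls so the input cannot reorder text around it."""
    return "".join(ch for ch in text if unicodedata.bidirectional(ch) not in BIDI_CONTROL_CLASSES)

# The classic spoof: displayed roughly as "invoiceexe.png" by a bidi-aware UI.
filename = "invoice\u202egnp.exe"
print(find_bidi_controls(filename))   # [(7, 'U+202E RIGHT-TO-LEFT OVERRIDE')]
print(strip_bidi_controls(filename))  # invoicegnp.exe
```

Stripping is the bluntest option; it also removes controls that legitimate right-to-left text may rely on, so escaping or isolating user input may suit some UIs better.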

emsixteen · on Dec 5, 2022

I've blissfully ignored RTL so far, but I know that I shouldn't - I just can't imagine the pain of actually figuring out how to deal with it. :')

My favourite character is the Greek question mark;

iamevn · on Dec 5, 2022

> Even a Unicode reference website that I just used to recall the code is broken by it.

I like to imagine that sites like this have noticed the bug but have the sense of humor to choose not to fix it.

andreareina · on Dec 5, 2022

https://xkcd.com/1137/

interroboink · on Dec 4, 2022

This is another good reason to have a text editor you really trust, which can show you these things. Whether it's different line-endings or weird invisible space stuff, I know I can just open it in Vim and figure out what's really going on pretty quickly. Wasted a lot of time earlier in my life on that nonsense (:

TacticalCoder · on Dec 4, 2022

I agree with you.

I've got my Emacs set up to display any character that isn't part of a list of characters I consider acceptable in "bold, fluo foreground and a dark background underlined by a pink line" (yes, literally that obnoxious). And it's configured to show any "zero width" character as if it had a width. So any "invisible character", zero-width or not, appears as a black square underlined with a pink line.

And that for any buffer/file.
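The same idea (flag everything outside an allowlist instead of trying to enumerate every invisible character) works outside Emacs too. A minimal Python sketch, assuming a hypothetical allowlist of ASCII letters, digits, punctuation, space, tab and newline, that turns every other character into a visible `[U+XXXX]` marker:

```python
import string

# Hypothetical allowlist: ASCII letters, digits, punctuation, space, tab, newline
ALLOWED = frozenset(string.ascii_letters + string.digits + string.punctuation + " \t\n")

def mark_disallowed(text: str, allowed=ALLOWED) -> str:
    """Replace every character outside `allowed` with a visible [U+XXXX] marker."""
    return "".join(ch if ch in allowed else f"[U+{ord(ch):04X}]" for ch in text)

print(mark_disallowed("trailing\u00a0space, zwnj\u200chere, filler\u3164"))
# trailing[U+00A0]space, zwnj[U+200C]here, filler[U+3164]
```

Because it is an allowlist, characters nobody thought to list (U+3164, for instance, which Unicode classifies as a letter) are still caught. The cost is that legitimate non-ASCII text such as accented names gets flagged too, so in practice you'd extend the set for the languages you actually write.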

enchiridion · on Dec 4, 2022

Can you share that config? Sounds useful!

TacticalCoder · on Dec 4, 2022

Sure... For a start, I have my scratch buffer showing a few Unicode characters: one trailing whitespace character on purpose (so I can check that it's highlighted), a zero-width non-joiner U+200C and a Hangul filler U+3164 (may add some from TFA, btw). This helps me quickly verify, upon startup, that my setup is working.

I configured all that literally years ago so I don't remember where's what but here's what I've got:

    ;; probably cargo-culted from somewhere
    (update-glyphless-char-display 'glyphless-char-display-control '((format-control . empty-box) (no-font . empty-box)))
    
    ;; See https://emacs.stackexchange.com/questions/65108
    (set-face-background 'glyphless-char "purple")

And then I've got this too (requires markchars.el):

    (markchars-global-mode)

With:

    (defface markchars-heavy
      '((t :underline "magenta"))
      "Heavy face for `markchars-mode' char marking."
      :group 'markchars)

It should get you started.

(and, yup, I know it's overkill but I like it that way)

nervuri · on Dec 5, 2022

Vim does not display them all. Of the programs I checked, the only one that displays all such characters is `less -U`. You can test using this file:

https://gitlab.com/nervuri/nervuri.net/-/raw/master/gopher/z...

interroboink · on Dec 5, 2022

Thanks for this! Good to add to the ol' repertoire (:

Looks like the only one Vim misses is U+17B5? Though there could be more not listed there. Unicode is a deep dark forest.

----

For other readers, here's a non-gopher version of the article linked inside: https://nervuri.net/stega
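One reason tools disagree about what to show is that "invisible" is not a single Unicode property. A minimal Python sketch that reports each non-ASCII character with its general category and name makes this concrete: a heuristic keyed only on the format (`Cf`) and space separator (`Zs`) categories catches U+200B and U+061C, but U+3164 HANGUL FILLER is classified as a letter (`Lo`) and U+2800 BRAILLE PATTERN BLANK as a symbol (`So`), so it misses both. The heuristic here is an illustrative assumption, not what Vim or `less` actually do.

```python
import unicodedata

def naive_invisible(ch: str) -> bool:
    """Category-only heuristic: format characters plus non-ASCII space separators."""
    cat = unicodedata.category(ch)
    return cat == "Cf" or (cat == "Zs" and ch != " ")

def report(text: str):
    """List (code point, category, name, caught-by-heuristic) for each non-ASCII char."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"), naive_invisible(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]

for row in report("a\u200bb\u061cc\u3164d\u2800e"):
    print(*row)
# U+200B Cf ZERO WIDTH SPACE True
# U+061C Cf ARABIC LETTER MARK True
# U+3164 Lo HANGUL FILLER False
# U+2800 So BRAILLE PATTERN BLANK False
```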

nervuri · on Dec 6, 2022

Also U+061C, U+E0001, U+E0020...U+E007F. And probably others, yes. The list at https://invisible-characters.com/ might contain more.

> Unicode is a deep dark forest.

Oh, indeed.
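The E0020...E007F range is Unicode's tag block: each tag character mirrors an ASCII character at code point 0xE0000 plus the ASCII value (U+E007F is CANCEL TAG), and renderers normally draw nothing for them, which is how whole ASCII strings can ride along invisibly. They also have a legitimate use in emoji flag sequences such as the flag of Scotland. A minimal Python sketch of the mapping (function names are illustrative):

```python
import unicodedata

TAG_BASE = 0xE0000

def to_tags(ascii_text: str) -> str:
    """Map printable ASCII (0x20-0x7E) onto the corresponding tag characters."""
    if any(not 0x20 <= ord(c) <= 0x7E for c in ascii_text):
        raise ValueError("only printable ASCII has tag equivalents")
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)

def from_tags(text: str) -> str:
    """Recover ASCII hidden as tag characters, ignoring everything else."""
    return "".join(chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E)

carrier = "Nothing to see here." + to_tags("secret")
print(len(carrier), from_tags(carrier))  # 26 secret
print(unicodedata.name(to_tags("A")))     # TAG LATIN CAPITAL LETTER A
```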

userbinator · on Dec 5, 2022

I use a DOS text editor for this, where no Unicode support is an advantage. The majority of the time I'm dealing with plain ASCII anyway.

abrudz · on Dec 4, 2022

Great for doing tacit programming[1] in JavaScript:

  avg=ㅤ=>ㅤ.reduce((ㅤㅤ,ㅤㅤㅤ)=>ㅤㅤ+ㅤㅤㅤ)/ㅤ.length
  avg([3,1,4,1,5])
  2.8

[1] https://en.wikipedia.org/wiki/Tacit_programming
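The trick works because the blank-looking identifiers, assuming they are U+3164 HANGUL FILLER, are Unicode letters (category `Lo`), which JavaScript accepts in identifiers. A small Python sketch that renames each distinct run of fillers to a hypothetical readable name (`v1`, `v2`, ...) makes the one-liner legible again:

```python
import re

def reveal_filler_identifiers(source: str) -> str:
    """Rename each distinct run of U+3164 HANGUL FILLER to v1, v2, ... (hypothetical names)."""
    names = {}

    def rename(match):
        run = match.group(0)
        if run not in names:
            names[run] = f"v{len(names) + 1}"
        return names[run]

    return re.sub("\u3164+", rename, source)

F = "\u3164"
snippet = f"avg={F}=>{F}.reduce(({F * 2},{F * 3})=>{F * 2}+{F * 3})/{F}.length"
print(reveal_filler_identifiers(snippet))
# avg=v1=>v1.reduce((v2,v3)=>v2+v3)/v1.length
```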

shepherdjerred · on Dec 5, 2022

I love point-free functions in languages that support them, but please tell me you're not actually doing this.

csswizardry · on Dec 4, 2022

https://csswizardry.com/2014/01/use-zero-width-spaces-to-sto...

Mockapapella · on Dec 5, 2022

A while back I used these kinds of characters to encode programs into invisible text: https://www.thelisowe.com/sleeper-cell-a-method-of-embedding...

It doesn't do much on its own. I feel like it could, but the most effective use case I've come up with is that you can invisibly plant a piece of code in some piece of text, then later run another script that looks for that piece of code and runs it. I'm guessing that splitting the code up like this would make it harder to detect (not to mention that this code could even reside in other programs' comments undetected).
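The linked post describes its own scheme; as a generic illustration only (not necessarily what that post does), here is a minimal Python sketch that hides a UTF-8 payload as bits, assuming U+200B ZERO WIDTH SPACE for 0 and U+200C ZERO WIDTH NON-JOINER for 1, plus the "later" step that finds and decodes it:

```python
ZERO, ONE = "\u200b", "\u200c"  # assumed bit alphabet: ZWSP = 0, ZWNJ = 1

def hide(cover: str, payload: str) -> str:
    """Append the payload's UTF-8 bytes to the cover text as zero-width bits."""
    bits = "".join(f"{byte:08b}" for byte in payload.encode("utf-8"))
    return cover + "".join(ONE if b == "1" else ZERO for b in bits)

def reveal(text: str) -> str:
    """Collect zero-width bits from anywhere in the text and decode them as UTF-8."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")

stego = hide("Totally normal sentence.", "print('hi')")
print(stego == "Totally normal sentence.")  # False, although both usually look the same
print(reveal(stego))                         # print('hi')
```

Eight invisible characters per byte is bulky, which is one reason a payload like this is easy to spot with any of the viewers mentioned elsewhere in the thread.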

ttyprintk · on Dec 5, 2022

I’ve seen it used as a Bacon cipher in capture-the-flag challenges.

nervuri · on Dec 5, 2022

Zero-width characters can be used to covertly watermark text and to figure out who copied text from a page and pasted it somewhere else. Server software can encode a hidden number between every few words, which corresponds to a server log entry with your username (if logged in), IP address, browser fingerprint, etc. I wrote more about this here:

https://nervuri.net/stega
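The scheme described above can be sketched in a few lines. This hypothetical Python version is not taken from the linked article: it writes a fixed-width 16-bit reader ID as zero-width characters after every third word, so a copied excerpt of a few words typically still carries at least one complete copy.

```python
ZW0, ZW1 = "\u200b", "\u200c"  # assumed bit alphabet: ZWSP = 0, ZWNJ = 1
ID_BITS = 16                     # hypothetical fixed ID width

def watermark(text: str, reader_id: int, every: int = 3) -> str:
    """Append the reader's ID, encoded as zero-width bits, to every `every`-th word."""
    if not 0 <= reader_id < 2 ** ID_BITS:
        raise ValueError("reader_id does not fit in ID_BITS bits")
    mark = "".join(ZW1 if bit == "1" else ZW0 for bit in f"{reader_id:0{ID_BITS}b}")
    words = text.split(" ")
    return " ".join(w + mark if i % every == every - 1 else w for i, w in enumerate(words))

def recover_ids(text: str) -> set:
    """Decode every complete run of ID_BITS zero-width bits found in the text."""
    ids, run = set(), ""
    for ch in text + " ":  # trailing sentinel flushes a run at the very end
        if ch in (ZW0, ZW1):
            run += "1" if ch == ZW1 else "0"
            continue
        if len(run) == ID_BITS:
            ids.add(int(run, 2))
        run = ""
    return ids

page = watermark("the quick brown fox jumps over the lazy dog", reader_id=4242)
excerpt = page.split(" ", 3)[3]  # a reader copies only part of the text
print(recover_ids(excerpt))      # {4242}
```

A real system would map the ID back to a log entry server-side, as described above; the repetition is what lets partial copies still identify the source.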

I think the best solution to this type of problem would be a clipboard utility that warns you when you copy text which contains hidden characters, homoglyphs, rarely used whitespace characters, etc.
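The check itself, independent of any clipboard integration, might look like this Python sketch. The three rules are assumptions chosen to mirror that list: format characters (`Cf`) count as hidden, any space separator other than U+0020 as unusual whitespace, and a Cyrillic or Greek letter inside an otherwise Latin word as a possible homoglyph. The script guess from the first word of the character name is crude; Unicode's confusables data (UTS #39) is the thorough approach.

```python
import re
import unicodedata

def script_of(ch: str) -> str:
    """Crude script guess: first word of the Unicode name, e.g. 'LATIN' or 'CYRILLIC'."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

def clipboard_warnings(text: str) -> list:
    """Return human-readable warnings about characters that may not be what they seem."""
    warnings = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat == "Cf":
            warnings.append(f"hidden U+{ord(ch):04X} at {i}")
        elif cat == "Zs" and ch != " ":
            warnings.append(f"unusual whitespace U+{ord(ch):04X} at {i}")
    for m in re.finditer(r"\w+", text):
        scripts = {script_of(c) for c in m.group() if c.isalpha()}
        if "LATIN" in scripts and scripts & {"CYRILLIC", "GREEK"}:
            warnings.append(f"possible homoglyph in {m.group()!r}")
    return warnings

# "p\u0430ypal" uses CYRILLIC SMALL LETTER A in place of the Latin "a".
for w in clipboard_warnings("p\u0430ypal.com\u00a0login\u200b"):
    print(w)
```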

ludovicianul · on Dec 5, 2022

I've built a tool specifically to test whether these kinds of characters reach API backends: https://github.com/Endava/cats. My idea was that APIs should explicitly reject or sanitise input containing such characters.
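A minimal Python sketch of a sanitise-then-reject policy for something like a display name. `BLANK_LOOKING` is a hypothetical, deliberately incomplete set of characters that are neither format nor space characters yet usually render as blank. Dropping every `Cf` character is also a blunt choice: it removes the ZERO WIDTH JOINER used in emoji sequences and the ZERO WIDTH NON-JOINER that scripts such as Persian need.

```python
import unicodedata

# Hypothetical, non-exhaustive list of letters/symbols that usually render blank
BLANK_LOOKING = {"\u115f", "\u1160", "\u3164", "\uffa0", "\u2800"}

def sanitize(value: str) -> str:
    """Drop format chars and blank-looking fillers, fold exotic spaces, collapse runs."""
    kept = []
    for ch in value:
        cat = unicodedata.category(ch)
        if cat == "Cf" or ch in BLANK_LOOKING:
            continue
        kept.append(" " if cat == "Zs" else ch)  # NBSP, ideographic space, etc.
    return " ".join("".join(kept).split())

def validate_display_name(value: str) -> str:
    """Reject input that has nothing visible left after sanitising."""
    cleaned = sanitize(value)
    if not cleaned:
        raise ValueError("display name has no visible characters")
    return cleaned

print(sanitize(" A\u00a0\u3000 B "))  # A B
```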

thirtyseven · on Dec 5, 2022

So I guess the only future-proof solution to check for this is to render user input off screen and count the number of solid pixels, at least until "falsehoods programmers believe about names" gets updated to include "Names must consist of at least one readable glyph".

voiper1 · on Dec 5, 2022

"Names must consist of at least one visible glyph".

30minAdayHN · on Dec 4, 2022

Back in the 90s on Windows, our secret directory was named alt+255 (it looks like a space but isn't one, I think).

lifthrasiir · on Dec 4, 2022

That's a non-breaking space (U+00A0), which code page 437 maps to 0xFF. (You need a leading zero to get the Windows ANSI code page instead, like alt+0255 for ÿ, which is U+00FF.)
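The mapping is easy to confirm with Python's built-in codecs: byte 0xFF decodes to U+00A0 NO-BREAK SPACE under code page 437 (the OEM code page on US DOS and Windows consoles) and to U+00FF under Windows-1252, the ANSI code page on Western Windows systems.

```python
import unicodedata

for codec in ("cp437", "cp1252"):
    ch = bytes([255]).decode(codec)
    print(codec, f"U+{ord(ch):04X}", unicodedata.name(ch))
# cp437 U+00A0 NO-BREAK SPACE
# cp1252 U+00FF LATIN SMALL LETTER Y WITH DIAERESIS
```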

brunorsini · on Dec 5, 2022

I did the same thing. One more of those habits that didn't survive the transition to Mac OS X.

dezen0ts · on Dec 4, 2022

A great way for QAs to mess with developers

squaredot · on Dec 4, 2022

The problem is probably more when QA doesn't mess with developers than when it does.

bombcar · on Dec 4, 2022

If QA doesn’t do it, malicious users and crackers will.

8n4vidtmkvmk · on Dec 4, 2022

Yeah... please do eff around with the staging environment so that I can get traces and poke at it. Prod is too locked down.

franky47 · on Dec 4, 2022

𝅷𝅶 [1]

[1] https://twitter.com/fortysevenfx/status/1599483273864187904

Minor49er · on Dec 4, 2022

numlock86 · on Dec 4, 2022

𝅷𝅸𝅹𝅺

saliagato · on Dec 4, 2022

    ⁭⁭⁭

hamiltonians · on Dec 4, 2022

Useful for impersonation scammers, like on Twitter.