My favorite is U+202E Right-to-Left Override, which doesn't appear to be listed there. A surprising number of UIs (apps, sites) can be broken with it, as they were never tested with right-to-left writing direction in mind. Even a Unicode reference website that I just used to recall the code is broken by it. [0] Entering RLO into arbitrary input forms for fun can bend spacetime, I swear.
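For anyone who wants to check their own input handling, here is a minimal Python sketch that locates bidirectional control characters such as RLO in a string. The function name `find_bidi_controls` and the sample filename are made up for illustration; the set covers the explicit bidi marks, embeddings, overrides and isolates, and is not meant as a complete defence.

```python
# Bidirectional control characters: marks, embeddings, overrides and isolates.
BIDI_CONTROLS = {
    "\u061c",  # ARABIC LETTER MARK
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings, pop, overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates and pop isolate
}

def find_bidi_controls(text):
    """Return (index, 'U+XXXX') pairs for every bidi control character in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in BIDI_CONTROLS]

# Hypothetical filename: the RLO makes "fdp.exe" display reversed, as "exe.pdf".
sample = "invoice_\u202efdp.exe"
print(find_bidi_controls(sample))  # [(8, 'U+202E')]
```

Whether to strip, escape, or reject these characters depends on whether the form legitimately needs right-to-left text.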
This is another good reason to have a text editor you really trust, which can show you these things. Whether it's different line-endings or weird invisible space stuff, I know I can just open it in Vim and figure out what's really going on pretty quickly. Wasted a lot of time earlier in my life on that nonsense (:
I've got my Emacs set up to display in "bold, fluo foreground and a dark background underlined by a pink line" (yes, literally that obnoxious) any character which is not part of a list of characters I consider to be acceptable. And it's configured to show any "zero width" character as if it had a width. So any "invisible character" as well as any "invisible zero width character" does appear as a black square, underlined with a pink line.
Sure... For a start I have my scratch buffer showing a few Unicode characters, one trailing whitespace character on purpose (to be sure I can see it's highlighted), a zero-width non-joiner (U+200C) and a Hangul filler (U+3164) (may add some from TFA btw). This helps me quickly verify, upon startup, that my setup is working.
I configured all that literally years ago so I don't remember where's what but here's what I've got:
;; probably cargo-culted from somewhere
(update-glyphless-char-display 'glyphless-char-display-control '((format-control . empty-box) (no-font . empty-box)))
;; See https://emacs.stackexchange.com/questions/65108
(set-face-background 'glyphless-char "purple")
And then I've got this too (requires markchars.el):
(markchars-global-mode)
With:
(defface markchars-heavy
  '((t :underline "magenta"))
  "Heavy face for `markchars-mode' char marking."
  :group 'markchars)
It should get you started.
(and, yup, I know it's overkill but I like it that way)
It doesn't do much on its own. I feel like it could, but the most effective use case I've come up with is that you can invisibly plant a piece of code in some piece of text, then later on run another script that looks for that piece of code and runs it. I'm guessing that splitting the code up like this would make it harder to detect (not to mention that this code could even reside in other programs' comments undetected).
Zero-width characters can be used to covertly watermark text and to figure out who copied text from a page and pasted it somewhere else. Server software can encode a hidden number between every few words, which corresponds to a server log entry with your username (if logged in), IP address, browser fingerprint, etc. I wrote more about this here:
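To make the watermarking idea concrete, here is a rough Python sketch. It is an assumption-laden toy, not how any particular server does it: it hides one 16-bit number once, after the first word, using U+200B for a 0 bit and U+200C for a 1 bit. The names `embed_id` and `extract_id` and the sample sentence are invented for the example.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0 (arbitrary choice)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1 (arbitrary choice)

def embed_id(text, ident, bits=16):
    """Hide the low `bits` bits of `ident` as zero-width characters after the first word."""
    payload = "".join(ONE if (ident >> i) & 1 else ZERO for i in reversed(range(bits)))
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail

def extract_id(text):
    """Recover the hidden number, or None if the text carries no zero-width payload."""
    bits = [ch for ch in text if ch in (ZERO, ONE)]
    if not bits:
        return None
    return int("".join("1" if ch == ONE else "0" for ch in bits), 2)

original = "Quarterly results are confidential."
marked = embed_id(original, 48879)   # 48879 would be a key into the server log
print(len(marked) - len(original))   # 16
print(extract_id(marked))            # 48879
```

A real scheme would presumably repeat the payload every few words, as described above, so that a partial copy still carries it.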
I think the best solution to this type of problem would be a clipboard utility that warns you when you copy text which contains hidden characters, homoglyphs, rarely used whitespace characters, etc.
I've built a tool specifically to test if these kinds of characters will reach API backends: https://github.com/Endava/cats. My idea was that APIs should explicitly reject or sanitise input containing such characters.
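The checking half of such a utility could look roughly like the Python sketch below. It only inspects a string; hooking it up to an actual clipboard is left out, and homoglyph detection is not attempted. The `BLANK_LOOKING` list is a small hand-picked assumption, since characters like the Hangul filler are classed as letters and would slip past a category check.

```python
import unicodedata

# Characters that render blank but are not in the Cf or Zs categories.
# Hand-picked for illustration; not an exhaustive list.
BLANK_LOOKING = {"\u115f", "\u1160", "\u2800", "\u3164", "\uffa0"}

def suspicious_characters(text):
    """Return (index, code point, name) for each hidden or unusual-space character."""
    hits = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        unusual_space = category == "Zs" and ch != " "
        if category == "Cf" or unusual_space or ch in BLANK_LOOKING:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits

# Hypothetical clipboard content: zero width space, no-break space, Hangul filler.
clip = "pay\u200bment due\u00a0today\u3164"
for hit in suspicious_characters(clip):
    print(hit)
```

A warning dialog would then fire whenever the returned list is non-empty.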
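On the backend side, "reject or sanitise" might be sketched in Python like this. This is not taken from the linked tool; `sanitise` and `validate` are hypothetical helpers, and the policy (drop every format character, flatten every space separator) is an assumption that would be too aggressive for fields where ZWJ or ZWNJ are legitimate, such as emoji sequences or Persian text.

```python
import unicodedata

def sanitise(value):
    """Drop format characters (Cf) and turn any Unicode space separator (Zs) into ' '."""
    cleaned = []
    for ch in value:
        category = unicodedata.category(ch)
        if category == "Cf":
            continue  # zero-width and bidi controls are removed outright
        cleaned.append(" " if category == "Zs" else ch)
    return "".join(cleaned)

def validate(value):
    """Return value unchanged, or raise ValueError if sanitise() would alter it."""
    if sanitise(value) != value:
        raise ValueError("input contains hidden or non-standard whitespace characters")
    return value

print(repr(sanitise("ad\u200bmin\u00a0user")))  # 'admin user'
```

An API would pick one of the two: call `validate` to reject with an error, or store the result of `sanitise`.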
So I guess the only future-proof solution to check for this is to render user input off screen and count the number of solid pixels, at least until "falsehoods programmers believe about names" gets updated to include "Names must consist of at least one readable glyph".
Which is a non-breaking space (U+00A0), which is mapped to 0xFF in code page 437. (You need to put a leading zero to access Unicode code points, like Alt+0255 for U+00FF ÿ.)
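The two mappings are easy to confirm with Python's built-in codecs. One assumption here: the leading-zero form is modelled with Windows-1252, the "ANSI" code page on Western-European Windows setups, which agrees with Unicode for this byte.

```python
raw = bytes([0xFF])  # the byte value 255 typed on the numeric keypad

oem = raw.decode("cp437")    # legacy OEM code page, as used by Alt+255
ansi = raw.decode("cp1252")  # Windows-1252, assumed here for Alt+0255

print(f"U+{ord(oem):04X}", f"U+{ord(ansi):04X}")  # U+00A0 U+00FF
```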
[0] https://unicode-table.com/en/202E/