Is there any difference between Windows HANDLE and Linux file descriptor? Aren't...

cryptonector · on Feb 28, 2022

HANDLE values are opaque, and generally not reused. Imagine an implementation like this:

  typedef struct HANDLE_s {
    uintptr_t ptr;
    uintptr_t verifier;
  } HANDLE;

where `ptr` might be an index into a table (much like a file descriptor) or maybe a pointer in kernel-land (dangerous sounding!) and `verifier` is some sort of value that can be used by the kernel to validate the `ptr` before "dereferencing" it.

On Unix the semantics of file descriptors are dangerous. EBADF can be a symptom of a very dangerous bug where some thread closed a still-in-use FD then a open gets the same FD and now maybe you get file corruption. This particular type of bug doesn't happen with HANDLEs.

asveikau · on Feb 28, 2022

> This particular type of bug doesn't happen with HANDLEs.

This does not match my experience at all. Just like what you said about EBADF, Win32 error code 6 (ERROR_INVALID_HANDLE) is a huge red flag for a race condition where a HANDLE gets re-used and inappropriately called upon in some invalid context, possibly even with security or stability concerns. I used to chase these bugs a lot when I worked on Win32 code bases.

If anything this class of bug is worse in Windows because (1) multi-threaded programs are way more common on Windows and (2) HANDLEs are used for more things than file descriptors.

I guess fd reuse is more likely because they tend to get handed out by the kernel as integers in increasing order. But handle reuse absolutely does happen, and if you have this class of bug in a process with a lot of concurrent handle creation in many threads and in a commonly used program it absolutely will bite as a bug at some point.

cryptonector · on March 2, 2022

Ah, my mistake.

marwis · on Feb 28, 2022

Gotcha. But it looks like file descriptors could be made almost as safe by avoiding index reuse. Is there any reason why it is not done? Hashtable too costly costly vs array?

cryptonector · on Feb 28, 2022

File descriptor numbers have to be "small" -- that's part of their semantics. To ensure this, the kernel is supposed to always allocate the smallest available FD number. A lot of code assumes that FDs are "small" like this. Threaded code can't assume that "no FD numbers less than some number are available", but all code on Unix can assume that generally the used FD number space is dense. Even single-threaded code can't assume that "no FD numbers less than some number are available" because of libraries, but still, the assumption that the used FD number space is dense does get made. This basically forces the reuse of FDs to be a thing that happens.

For example, the traditional implementations of FD_SET() and related macros for select(3) assume that FDs are <1024.

Mind you, aside from select(), not much might break from doing away with the FDs-are-small constraint. Still, even so, they'd better be 64-bit ints if you want to be safe.

HANDLEs are just better.

zozbot234 · on March 1, 2022

io_uring allows you to associate arbitrary 64-bit data with any operation and match it on completion, so it looks like it should address these concerns.

cryptonector · on March 1, 2022

Sure, but how does that remediate existing code that uses select()?

neerajsi · on March 1, 2022

That's not true, unfortunately. Handle values are lifo without any uniquifier.