
As a professional C++ programmer I feel we, as a group, are constantly under-estimating the complexity of tasks, and over-estimating what can get done with the standard library.

C++ is a very powerful, unopinionated language, that gives you a lot of freedom to attack your problem domain the way you best see fit.

If you're writing a networked application, don't use POSIX sockets, which have an API designed for C, go and find a higher level library. If you're parsing complex text formats, don't iterate over buffers with char*'s, go pick up PEGTL[0]. If you're working on graphs, or need to properly index in-memory data, go pick up Boost[1][2]. If you need a GUI, go pick up Qt.

It's extremely common in C++, due to the lack of a universal package management solution, for people to try and "muddle through" and do shit themselves when it's far outside their core competency.

At one of my last employers, the core product was parsing JSON with std::regex, simply because they couldn't be bothered to integrate a JSON library (which can be done header-only).

[0] https://github.com/taocpp/PEGTL

[1] https://www.boost.org/doc/libs/1_76_0/libs/graph/

[2] https://www.boost.org/doc/libs/1_76_0/libs/multi_index/doc/i...



What's so wrong with POSIX sockets? It's maybe not an elegant API, but not because "it's designed for C". Its problems are, for example: 1) it does too much abstraction (sockets), and 2) you have to deal with some arcane data structures and ancient formats (endian conversions).
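
To make the "arcane data structures and endian conversions" complaint concrete, here is a minimal sketch of filling in an IPv4 address, assuming POSIX headers. The helper name make_ipv4_addr is made up for illustration.

```cpp
#include <arpa/inet.h>   // htons, inet_pton
#include <netinet/in.h>  // sockaddr_in
#include <sys/socket.h>  // AF_INET
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: builds an IPv4 socket address from dotted-quad text
// and a port given in host byte order.
std::optional<sockaddr_in> make_ipv4_addr(const char* ip, std::uint16_t port) {
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);  // the struct stores the port in network byte order
    // inet_pton returns 1 on success and writes the address in network byte order.
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return std::nullopt;
    return addr;
}
```

Forgetting the htons() call still compiles and runs; on a little-endian machine it just targets the wrong port, which is the kind of trap being described.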

But, it does its job well enough: allowing the user to send and receive packets from the network.

If you're writing a networked application that works, chances are you either don't give a sh*t what API to use as long as it lets you send and receive packets (and thus you go with the relatively portable POSIX sockets (at least for Linux / WinSock2 on Windows)), or you use a lower-level API (probably proprietary) to reduce syscall overhead / get more control.

If you're parsing text, chances are all you need is fread() to read in the next chunk from a file, and from this you'll build a "next_byte()" function and then a "next_token()" function on top.
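
A rough sketch of that layering, assuming whitespace-delimited tokens (a real format would need its own token rules). ByteReader is a hypothetical name; next_byte() and next_token() follow the naming in the comment above.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical chunked reader: refills a fixed buffer with fread() as needed.
struct ByteReader {
    std::FILE* file;
    unsigned char buf[4096];
    std::size_t pos = 0;
    std::size_t len = 0;
};

// Returns the next byte as 0..255, or -1 at end of file or on a read error.
int next_byte(ByteReader& r) {
    if (r.pos == r.len) {
        r.len = std::fread(r.buf, 1, sizeof r.buf, r.file);
        r.pos = 0;
        if (r.len == 0) return -1;
    }
    return r.buf[r.pos++];
}

// Reads one whitespace-delimited token into out.
// Returns false once the input is exhausted.
bool next_token(ByteReader& r, std::string& out) {
    out.clear();
    int c = next_byte(r);
    while (c != -1 && std::isspace(c)) c = next_byte(r);  // skip leading whitespace
    while (c != -1 && !std::isspace(c)) {
        out.push_back(static_cast<char>(c));
        c = next_byte(r);
    }
    return !out.empty();
}
```

Usage is a loop of the form `ByteReader r{file}; std::string tok; while (next_token(r, tok)) { ... }`.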

(I've done a lot of network code as well as parsing code, and the I/O API is among the least of my concerns).

All these fancy bottom-up kitchen sink libraries implementing "proper abstractions" or whatever do not provide any value past being able to be combined to form barely working and un-fixable applications where you will pull your hair out when you actually need some control over what's happening.

For something better, you'll need exactly this from external libraries: a clean programmatic (function call) interface that gives you control at a reasonable level of abstraction.


It does the job so well that in later releases of Apple, Google and Microsoft OSes, the new networking features aren't available through classical POSIX sockets.


Not disagreeing, that's kind of what I said. However it's still chugging along just fine and works well enough for basic applications (like 99% of applications?).

What features do you allude to if I may ask?

In any case, one can't solve any of these "problems" by abstracting over this API ;-)


POSIX sockets aren't portable, period.

Putting aside the super obvious problem that there's no common way to use them asynchronously across platforms, and that file descriptors are the wrong abstraction for TCP connections, they are riddled with more obscure issues:

- Linger behavior varies by platform

- Even simple non-blocking behavior varies by platform.

- Common options, like enabling TCP keep-alives or setting buffer sizes, vary by platform.

- More often than not, in modern times, you also want TLS... and that's not available portably across platforms either, and is a whole new awful API to learn (if you choose to use OpenSSL directly).

- No RAII, means resource leaks (in C++).
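
On the RAII point, the usual fix is a small move-only wrapper around the descriptor. A minimal sketch, assuming POSIX; UniqueFd and open_tcp_socket are hypothetical names, not part of any standard API.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner for a POSIX descriptor; closes it on destruction.
class UniqueFd {
public:
    explicit UniqueFd(int fd = -1) noexcept : fd_(fd) {}
    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;
    UniqueFd(UniqueFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) reset(std::exchange(other.fd_, -1));
        return *this;
    }
    ~UniqueFd() { reset(); }

    int get() const noexcept { return fd_; }
    bool valid() const noexcept { return fd_ >= 0; }

    // Closes the currently owned descriptor (if any) and takes ownership of fd.
    void reset(int fd = -1) noexcept {
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_;
};

// Opens a TCP/IPv4 socket; the returned object is invalid if socket() fails.
UniqueFd open_tcp_socket() {
    return UniqueFd(::socket(AF_INET, SOCK_STREAM, 0));
}
```

With this, an early return or an exception between socket() and close() no longer leaks the descriptor.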

Using the raw BSD sockets APIs as a starting point for any portable application in 2021 is fucking insane. There's a reason why Python has the 'asyncio' module now and Go has the net module and goroutines.


Between Windows and Unix/Linux, the general approach is portable at least. Across various Unix flavors, I'd expect some more portability. Can't say much more since I've never tried to use the same code on multiple platforms.

I'd expect you can easily code one backend per supported platform since the backend specific code can start out (and most likely, stay) fairly minimal, like 100 lines or so.

> Using the raw BSD sockets APIs as a starting point for any portable application in 2021 is fucking insane

I started a Linux POSIX sockets "embedded" server project in 2019 using BSD sockets API (TCP) that is rock-solid even though it has some critical low-latency components in the data path (~10ms).

I also worked on a Windows GUI project in 2020 using WinSock2 (TCP). Then I did several experimental projects on Linux POSIX sockets in 2021, building reliable streams on top of UDP. The platform is not that important, I used non-blocking sockets and moved from recvmsg()/sendmsg() to recvmmsg()/sendmmsg() as an optimization, which is maybe 20 lines more code on the backend.

I wasted several months with the wrong approaches on Windows first. I used WinSock2 with IOCP (asynchronous completion ports) and tried to be super clever with multi-threaded designs (roughly thread-per-connection models) and lots of synchronization, even going into "Fiber" approaches with custom scheduling.

That's all wrong, and I/O is very simple. You place buffers at the connections, then you pump data to/from the buffers on a regular basis. You write plain, simple, procedural code, no threading or any other cleverness needed. All you have to do, just like with files or any other I/O, is get rid of the expectation that you can write "nice" non-blocking code in any way. You just don't do that, it won't work out (except for scripts / batch programs).
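
A minimal sketch of the "pump data into a buffer" idea for the receive side, assuming a POSIX descriptor that has been switched to non-blocking mode. The names set_nonblocking, pump_in and PumpResult are hypothetical, and a real program would also cap the buffer size.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <vector>

enum class PumpResult { Ok, Closed, Error };

// Switches fd to non-blocking mode so read() returns immediately when empty.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Moves whatever the kernel currently holds for fd into inbox, never blocking.
// Ok means "nothing more right now", Closed means the peer closed the stream.
PumpResult pump_in(int fd, std::vector<char>& inbox) {
    char chunk[4096];
    for (;;) {
        ssize_t n = ::read(fd, chunk, sizeof chunk);
        if (n > 0) {
            inbox.insert(inbox.end(), chunk, chunk + n);
            continue;
        }
        if (n == 0) return PumpResult::Closed;
        if (errno == EAGAIN || errno == EWOULDBLOCK) return PumpResult::Ok;
        if (errno == EINTR) continue;  // interrupted by a signal, just retry
        return PumpResult::Error;
    }
}
```

The main loop then calls pump_in() for each connection on every iteration and parses messages out of inbox separately.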

I don't see a reason why the story with TLS should be any different (never tried though). It should just be a component that you put between the network buffers and your application code. Something arrives from the network, you shove it to the TLS module. Something arrives from the TLS module, you shove it to the network.

> No RAII, means resource leaks (in C++).

Don't worry - it's just the same as with file descriptors or most other resources. If you're declaring them inline in a stack, something is wrong. Usually there should be exactly one place in the codebase where you're creating / accepting sockets, and one place where you're closing them. There's really nothing to worry about. There's so much C++ RAII zealotry and resource leaking FUD in the wild, but with a systematic approach there's little that can go wrong, plus the code will be so much better structured for it.


> You write plain, simple, procedural code, no threading or any other cleverness needed

Using sockets in a synchronous fashion is one way to block for an indefinite period of time. Once a TCP connection is established, there are failure modes where nothing will notify you that the connection has been lost until you try to write(), and even then after minutes in the worst case. Using sockets without timeouts is nuts. The BSD sockets API doesn't give you timeouts.

> I wasted several months with the wrong approaches on Windows first. I used WinSock2 with IOCP (asynchronous completion ports)

If you'd used Boost ASIO you'd have gotten Windows IOCP under the covers for free.

I honestly don't see an argument here. Defaulting to these low level primitive APIs is an act of hubris. Boost has HTTP, TLS and Websockets as well, all under the same async I/O model. Even HTTP/2 is available under asio via nghttp2.


> there are failure modes where nothing will notify you that the connection has been lost until you try to write()

You need to either read() or write() on a connection to be informed that the connection was terminated or half-closed. My server application works perfectly, it reacts immediately to any state change. Did not require any special code, just monitor the read and write ends, which is what one does anyway. (Yep, this is API specific behaviour of course, but it's the only sane approach IMO, since the termination event must be sent in synchronization with the actual channel interaction).

Of course, if you're not checking for updates on both directions (read + write) because you're blocked on some blocking interface (either on the same socket or different I/O port or computation), your server won't react. The API is not at fault. The mistake was to write blocking code.

That is the difference between dirty batch scripts and systems programming.


Calling read() won't tell you anything if someone cut the wire or unplugged an Ethernet cable that wasn't directly connected to your machine.

write() won't fail until after a bunch of TCP re-transmit timeouts have passed.

TCP keepalives can help but you have to enable them and, as I said before, doing so is different on different platforms.
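
For reference, a minimal sketch of what "enable them" looks like and where the platform differences show up. SO_KEEPALIVE is the common part; the idle-time option is spelled TCP_KEEPIDLE on Linux and TCP_KEEPALIVE on macOS, so it is guarded here. The function name enable_keepalive is hypothetical, and Windows would need a different code path entirely.

```cpp
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_KEEPIDLE / TCP_KEEPALIVE where available
#include <sys/socket.h>   // setsockopt, SO_KEEPALIVE

// Turns on TCP keep-alive probes for fd and, where the platform exposes the
// option, sets how long the connection may sit idle before probing starts.
bool enable_keepalive(int fd, int idle_seconds) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0) return false;
#if defined(TCP_KEEPIDLE)     // e.g. Linux
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_seconds, sizeof idle_seconds) != 0) return false;
#elif defined(TCP_KEEPALIVE)  // e.g. macOS uses this name for the idle time
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPALIVE,
                   &idle_seconds, sizeof idle_seconds) != 0) return false;
#else
    (void)idle_seconds;       // no known idle-time option on this platform
#endif
    return true;
}
```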

Honestly, if you're doing anything remotely interactive or latency sensitive on the same thread as network I/O you need to go async.


For the record, I am of course doing async (Or rather non-blocking) I/O. I explicitly said that it's a mistake to write blocking code. (I made an error somewhere though where I wrote "non-blocking" instead of "blocking").

(And as I described, the "green threads" kind of blocking code is a mistake just as well. And the event-driven kind (i.e. callbacks, like in JavaScript) leads to messes as well; the only way I see it not becoming a mess is pushing stuff to buffers to process messages later in a separate message processing loop with some better context etc.)


> The BSD sockets API doesn't give you timeouts.

Of course you can get timeouts (using select() or any other standard event notification mechanism), and most importantly you can easily get non-blocking socket reads/writes, I did just that.
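
A minimal sketch of a read with a timeout built on select(), assuming POSIX. The function names and the -2 "timed out" return code are conventions invented for this example.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Waits up to timeout_ms for fd to become readable.
// Returns 1 if readable, 0 on timeout, -1 on error.
int wait_readable(int fd, int timeout_ms) {
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);  // select() only works for fd < FD_SETSIZE
        timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        int rc = ::select(fd + 1, &readfds, nullptr, nullptr, &tv);
        // Simplification: after a signal we retry with the full timeout again.
        if (rc < 0 && errno == EINTR) continue;
        return rc < 0 ? -1 : (rc > 0 ? 1 : 0);
    }
}

// Reads at most len bytes, giving up after timeout_ms.
// Returns bytes read, 0 on end of stream, -1 on error, -2 on timeout.
ssize_t read_with_timeout(int fd, void* buf, std::size_t len, int timeout_ms) {
    int ready = wait_readable(fd, timeout_ms);
    if (ready == 0) return -2;
    if (ready < 0) return -1;
    return ::read(fd, buf, len);
}
```

The same shape works with poll() or a platform-specific mechanism; only wait_readable() would change.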

> If you'd used Boost ASIO you'd have gotten Windows IOCP under the covers for free.

Well, I got Windows IOCP without the covers. Even better, since now I can integrate all IOCP parts in my application, and don't have to separate the ones that are covered (or might be? hard to see when covered, right?) by library A from those that are covered by library B.

But I'd like to see first whether IOCP is strictly needed anyway, synchronous non-blocking reads/writes might give you more than enough performance for most cases.

> Boost has HTTP, TLS and Websockets as well

I don't use Boost on principle. Maybe some of these libraries are usable, but Boost is a community of architecture astronauts. Another reason is that I avoid C++ if possible.

> Defaulting to these low level primitive APIs is an act of hubris.

BSD sockets is not low level, if anything it is too high-level. As said, it allows you to send and receive packets. What more could you want? Anything else is snakeoil.

Update: Yep, this seems to be some overarchitected junk that leads to unmaintainable messes: https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/ov... The basic primitive, receiving new updates, is not readily available. Instead, you're encouraged to do callback handlers, leading to temporal coupling and ravioli code.

All in the name of optimizing for short syntax in toy examples. Look how much you can do in just 5 lines with automatically inferred types, and praise the RAII! (Never mind that anything moderately complex will require twice the normal amount of code just to unwrap all the insanity).


> Of course you can get timeouts (using select() or any other standard event notification mechanism)

That's just it, there's no such thing as a 'standard event notification mechanism'. select() is terrible for performance, and all the best options are different on every single platform.

> Instead, you're encouraged to do callback handlers, leading to temporal coupling and ravioli code.

Callbacks are the simplest primitive for async code. If you're not comfortable with them then you're not going to go far with async I/O. Not to mention, ASIO also supports futures and coroutines.


By `standard` I mean a system that works on this platform for any kind of fd or handle.

select() is just fine for simple cases, but of course it has some known problems, such as the FD_SETSIZE limit on descriptor numbers. There are better APIs, and ultimately it was learned that ringbuffers between the user-process and the kernel (that remove the need for system calls) as an implementation of asynchronous I/O are a good idea. E.g. IOCP, io_uring, etc.

Often you don't need any of these APIs at all - in a system with a constant ("stochastic") load you don't really need any kind of event waiting system. Instead, you can process all incoming messages every N milliseconds or so.

> Callbacks are the simplest primitive for async code.

No, the simplest thing is to just use a plain old buffer. See e.g. the IOCP API or just any regular buffer code. One side pushes the message to a buffer. Some (arbitrary) time later, the other side (potentially, but not necessarily a different thread) pulls the message from the buffer and handles it.

It's just buffers, buffers are all that is needed, and buffers plainly are the best way to solve all issues related to event handling. No fancy abstract template insanity, no weird generic resource handling systems, no complex scheduling systems, not even a need to declare any kind of event handling function or interface. Just place a few statically allocated buffers at the connection points where threads of execution (OS threads, but also hardware / network etc.) meet.
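
A minimal sketch of such a buffer: a fixed-capacity FIFO with no dynamic allocation. It is single-threaded as written; using it between OS threads would need atomics or a lock around push/pop. RingBuffer is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical fixed-capacity FIFO. One side pushes messages, the other side
// pops them some arbitrary time later.
template <typename T, std::size_t N>
class RingBuffer {
public:
    // Returns false when the buffer is full; the caller decides what to do.
    bool push(const T& value) {
        if (count_ == N) return false;
        items_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    // Returns the oldest message, or std::nullopt when the buffer is empty.
    std::optional<T> pop() {
        if (count_ == 0) return std::nullopt;
        T value = items_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return value;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> items_{};
    std::size_t head_ = 0;   // index of the oldest element
    std::size_t count_ = 0;  // number of stored elements
};
```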

Callbacks are of course theoretically equivalent, since they can be made to do the same thing as buffers. You can trivially write a callback that only pushes the message to a buffer. In practice, the difference is significant because lots of callback boilerplate is created and temporal coupling (i.e. same thread, same code path) between enqueuing a message and handling the message is encouraged. This results in a lot of overly complex code, including custom green thread runtimes. I've seen it, I've tried to do the same, I've seen others try to do the same. It turns out to be a very, very bad idea, resulting in the creation of a whole parallel universe with separate green threads I/O implementations.
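
The "callback that only pushes the message to a buffer" case can be sketched like this. Message, Callback, make_enqueue_callback and drain are hypothetical names; the point is only that enqueuing and handling are separate steps.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

using Message = std::string;
using Callback = std::function<void(const Message&)>;

// Returns a callback whose only job is to enqueue the message.
// The inbox must outlive the returned callback.
Callback make_enqueue_callback(std::deque<Message>& inbox) {
    return [&inbox](const Message& m) { inbox.push_back(m); };
}

// Separate processing step, run whenever the application chooses.
// Returns the number of messages handled.
std::size_t drain(std::deque<Message>& inbox, const Callback& handle) {
    std::size_t handled = 0;
    while (!inbox.empty()) {
        Message m = std::move(inbox.front());
        inbox.pop_front();
        handle(m);
        ++handled;
    }
    return handled;
}
```

A callback-based library can be handed the result of make_enqueue_callback(), while the application calls drain() from its own loop.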

This is what the term "Callback Hell" was invented for.

Look at Windows Fibers API, it's widely recognized to be a dead end. You will find some good post-mortem material on that topic on the internet.


> Using sockets in a synchronous fashion is one way to block for an indefinite period of time.

I'm not saying to use sockets in a synchronous fashion (i.e. blocking I/O). That would, of course, potentially block the thread indefinitely.

"Plain, simple, procedural" does not imply "blocking I/O". What I mean is to use no fancy types, no callbacks, no crazy automatic scheduling magic. Very simply, there is nothing special required to handle events. Just a buffer.


For example, Network.framework on Apple platforms.

https://developer.apple.com/videos/play/wwdc2017/707

https://developer.apple.com/videos/play/wwdc2018/715

See slide 17 in the WWDC 2017 session.

Similarly on the UWP/WinRT based APIs, and on Android the NDK doesn't see the network APIs that are only exposed via Java APIs.



