
I read through the Rust book, and the problem I was having with it and the other docs is that it was hard to map the Rust concepts with what actually runs when it is compiled. For a language that touts "uncompromising performance", it was difficult for me to find performance characteristics of the underlying abstractions and std library (for example, are algebraic data structures just tagged unions or does the compiler do more fancy things with them? What about iterators?). I'd really like to see a "Rust for C/C++ devs" guide that helps you figure out if you were using [some C++ feature] the way to get that behavior/performance with idiomatic Rust.

Another thing that is still tricky for me is figuring out when I should use 'unsafe' blocks in my code. Is it to be avoided if at all possible, or should I go there any time the 'safe' part of the language is making it difficult to express what I want? The meme that Rust is C++ without SegFaults and or race conditions is a bit misleading since the actual guarantee is that you don't get SegFaults or Race conditions outside of Unsafe blocks, and any nontrivial project will make use of unsafe blocks.



> for example, are algebraic data structures just tagged unions

They're tagged unions with no implicit heap allocations. I guess we should at least document that in the reference (though we don't want to overspecify, because we do some tricks in the compiler to try to avoid leaving space for the tag if we can). But I don't think it'd be a good idea to document this straight away in the book: the goal is to make Rust easy to pick up, and adding more information than necessary when introducing enums (which a lot of folks will only see for the first time in Rust) isn't going to do people any favors.

> What about iterators?

The documentation for Iter explains this pretty well, I think: https://doc.rust-lang.org/stable/std/iter/

It even shows the exact implementation of (basically) Range, to give you an idea.

That said, it would probably be worth calling out that most iterators are guaranteed not to allocate. Note, though, that that isn't a hard-and-fast constraint that implementors of the trait have to abide by—you can make your own iterators and implement them however efficiently or inefficiently you like.
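
To make the "implement them however you like" point concrete, here is a minimal sketch of a hand-written iterator. `Countdown` and `countdown_sum` are hypothetical names invented for illustration, not anything from the standard library. All of its state lives in the struct, so stepping through it never allocates:

```rust
/// Counts down from `remaining` to 1. The only state is a single integer
/// stored inline in the struct, so iteration does not touch the heap.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

/// Adapters such as `sum` come for free once `next` is implemented.
fn countdown_sum(start: u32) -> u32 {
    Countdown { remaining: start }.sum()
}

fn main() {
    for n in (Countdown { remaining: 3 }) {
        println!("{}", n);
    }
    println!("sum = {}", countdown_sum(4));
}
```

Nothing stops an implementor from allocating inside `next` instead; the trait itself makes no promise either way.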

> The meme that Rust is C++ without SegFaults and or race conditions is a bit misleading since the actual guarantee is that you don't get SegFaults or Race conditions outside of Unsafe blocks, and any nontrivial project will make use of unsafe blocks.

They'll make use of unsafe blocks transitively, by using unsafe code in the standard library or well-tested crates. Think of these unsafe blocks as part of the compiler: you trust the compiler to generate correct machine code when you type "a + b", so likewise you trust the standard library to do the right thing when you say "HashMap::new()".

It is not the case that most projects should use unsafe themselves everywhere: that usually just makes life harder. The primary exception is if you're binding to C code, in which case unsafe is unavoidable.


Speaking of "guaranteed not to allocate", is there a way that you could express that in a type? Seems like that might be nice to have.


Not in the type system itself, but you could write a lint to forbid heap allocation. This way, you could annotate a function (with e.g. `#[forbid(allocations)]`) to get a compile error when your function (or code your function calls) tries to allocate. This might not be easy, though :)


Not in Rust's type system. In a pure language, you could have some sort of "Heap" monad similar to the "IO" type in Haskell.


So Rust has no way to mark side effects and global dependencies of functions? Allowing singletons in a language that is supposed to be safe sounds like a huge design flaw.


Mutable statics are unsafe to access or update. You can use interior mutability with something like a Mutex to get a mutable-but-not-to-rustc value, which is safe.

Systems programming languages need this kind of functionality.
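
A minimal sketch of the Mutex approach, assuming a toolchain where `Mutex::new` is usable in a `static` initializer (Rust 1.63 or later). `COUNTER` and `bump` are hypothetical names:

```rust
use std::sync::Mutex;

// A global that is mutable at runtime but never needs `unsafe` to touch.
// A plain `static mut` would require an `unsafe` block at every access.
static COUNTER: Mutex<u32> = Mutex::new(0);

/// Increments the global counter and returns the new value.
fn bump() -> u32 {
    // The lock guard gives exclusive access until it is dropped.
    let mut guard = COUNTER.lock().unwrap();
    *guard += 1;
    *guard
}

fn main() {
    bump();
    println!("counter = {}", bump());
}
```

This only addresses memory safety and data races; it is still a global, with everything that implies for reasoning about state.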


Putting a mutex around a global variable doesn't change the fact that it is still a global variable.

Memory access might be safe but you get spaghetti code and combinatorial state explosion due to all the potential side effects.

Allowing singletons for edge cases is fine, but with no proper way to enforce that except code review, you really have no idea what the underlying code might potentially do.


I agree with you that using globals as sparingly as possible is good, but your original claim was about safety, so that's what I focused on.


I am not sure if you have come across this "Rust tutorial for c/c++ programmers" https://github.com/nrc/r4cppp but I found it to be nice when I was first exploring Rust (I had prior experience with C++).

I haven't had to resort to "unsafe" blocks in the Rust I have written so far but "ffi" is one use case for unsafe blocks. Another resource that I have yet to read is "https://doc.rust-lang.org/nomicon/" which seems to explain how to write unsafe Rust code.


Is there an equivalent guide to the compile-time representation of important constructs for those trying to learn C/C++? I haven't seen anything like that and it seems like most devs in that realm instead rely on experience and tribal knowledge (which is, AFAICT, how it often works in Rust-land right now). I agree it'd be great for Rust to have clearer official docs about some of these things, but it doesn't seem to me like this is readily available for most languages or runtimes.

Re: unsafe, I think that's tough. My personal feeling is that many of Rust's selling points rely on minimizing the use of unsafe (i.e. limiting segfault-relevant portions of the code), and that there are frequently ways to make things work and also make them fast without using unsafe. What's an example where you found yourself thinking about using unsafe instead of a more complex safe construct?

(Somewhat related to this, and especially for anyone reading who might try Rust, I cannot recommend getting on IRC strongly enough. The Rust IRC channels are by and large incredibly friendly and helpful, and for better or worse that's where a lot of the knowledge in the community is currently collected, not as much SO or blogs)


> Is there an equivalent guide to the compile-time representation of important constructs for those trying to learn C/C++?

Definitely. It was taught in school and there are pretty good guides for it online (maybe not caught up to C++11 and beyond, but the fundamentals are there). You're right that it is not readily available for most languages, but when you need to get serious about performance you are either going to have a guide or spend a lot of time looking at assembly/bytecode. To be fair, I'd probably still have to inspect generated code sometimes, but it's nice to have good instincts for how things run to guide your design/implementation so you can spend less time looking at assembly.

http://www.agner.org/optimize/optimizing_cpp.pdf


I definitely haven't seen anything as comprehensive as the linked PDF for Rust (although that shouldn't be surprising given the extreme thoroughness of that guide and the age of C++). Probably a good project!

When doing very performance sensitive things in Rust, I usually find myself asking questions a lot on IRC and looking at disassembly in perf.

To answer some specific examples you cited above: I'm pretty sure that enums are (almost?) always equivalent to tagged unions. If you have an enum which doesn't contain any data, then I believe its representation is just the tag. Iterators are just structs with methods; the various generic functions they implement are monomorphized and then optimized by LLVM.
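
One way to check this on your own machine is to print the sizes. This is a rough sketch with made-up types (`Direction`, `Shape`). The layout of enums without an explicit `#[repr]` is not formally specified, so treat the numbers as what current rustc happens to do, not a guarantee:

```rust
use std::mem::size_of;

// No variant carries data, so only a tag needs to be stored.
#[allow(dead_code)]
enum Direction {
    North,
    East,
    South,
    West,
}

// Variants carry data, so the enum needs room for the largest payload
// plus something to tell the variants apart.
#[allow(dead_code)]
enum Shape {
    Circle(f64),
    Rect(f64, f64),
}

fn main() {
    println!("Direction: {} byte(s)", size_of::<Direction>());
    println!("Shape:     {} byte(s)", size_of::<Shape>());
    println!("(f64, f64): {} byte(s)", size_of::<(f64, f64)>());
}
```

With a recent rustc on a common 64-bit target, `Direction` should come out as a single byte and `Shape` as the two-`f64` payload plus a padded tag.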


> any nontrivial project will make use of unsafe blocks.

Sure! But that's okay. Just don't use 'quantity of unsafe blocks' as a metric of quality and you'll be all set. Think of it like so: don't use it until you have to and try not to have to. For me, that means consulting experts on IRC (etc), "How can I express this goal in idiomatic rust?" No different from learning C/C++ for the first time, IMO. And if no good way exists you may have to use unsafe blocks.

Unsafe blocks aren't bad, just like #pragma-disable-this-warning and --static-checker-I-did-it-this-way-by-design aren't bad. They mean that you've thought critically about the pros and cons and you are going into this decision well aware of the risk. On the flip side they should be the first blocks to closely examine in the face of failures like segfaults/races/etc.


> > any nontrivial project will make use of unsafe blocks.

I don't think that's actually true? Most projects make use of no unsafe outside of stdlib and a handful of crates.io crates.


This has been my experience also. I've written at least 40kloc of rust over the past couple years (including complex graphs with cycles, low-level DSP) and I could probably count the number of unsafe blocks I've needed on one hand.

edit: This is not counting FFI though.


Excluding FFI I might just be able to count the number of unsafe blocks I’ve needed on one hand. But honesty compels me to declare that for reasons of performance micro-optimisation, I’ve written a lot more.


How do you handle cycles? I've seen discussions on places like /r/rust where people didn't seem to have any pleasant answers.


I tend to use petgraph[1] when I need a graph-like data structure (the only time I've needed cycles). Super fast, distinguishes `Node`s from `Edge`s, lots of useful items for different kinds of traversal.

[1]: https://github.com/bluss/petgraph
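
For anyone wondering why an index-based graph sidesteps the ownership problem, here is a hand-rolled sketch. It is not petgraph's API; `Graph`, `add_node`, `add_edge`, and `reachable` are hypothetical names. Nodes are just positions in a `Vec`, so a cycle is only a few integers pointing at each other and the borrow checker has nothing to object to:

```rust
/// Adjacency-list graph where a node is identified by its index.
struct Graph {
    edges: Vec<Vec<usize>>,
}

impl Graph {
    fn new() -> Self {
        Graph { edges: Vec::new() }
    }

    /// Adds a node and returns its index.
    fn add_node(&mut self) -> usize {
        self.edges.push(Vec::new());
        self.edges.len() - 1
    }

    /// Adds a directed edge; both indices must come from `add_node`.
    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }

    /// Depth-first traversal that terminates even when the graph has cycles.
    fn reachable(&self, start: usize) -> Vec<usize> {
        let mut seen = vec![false; self.edges.len()];
        let mut stack = vec![start];
        let mut order = Vec::new();
        while let Some(node) = stack.pop() {
            if seen[node] {
                continue;
            }
            seen[node] = true;
            order.push(node);
            for &next in &self.edges[node] {
                stack.push(next);
            }
        }
        order
    }
}

fn main() {
    let mut g = Graph::new();
    let (a, b, c) = (g.add_node(), g.add_node(), g.add_node());
    g.add_edge(a, b);
    g.add_edge(b, c);
    g.add_edge(c, a); // closes the cycle a -> b -> c -> a
    println!("{:?}", g.reachable(a));
}
```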


About the only time I end up using it is for ffi and uninitialized arrays on the stack.
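
For the uninitialized-array case, a sketch along these lines should work on current stable Rust using `std::mem::MaybeUninit`. The function name `first_squares` is made up for the example:

```rust
use std::mem::MaybeUninit;

/// Fills a stack array without giving it initial values first.
fn first_squares() -> [u32; 4] {
    // `MaybeUninit<u32>` is `Copy`, so the repeat expression is allowed.
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];

    for (i, slot) in buf.iter_mut().enumerate() {
        // `write` stores a value without reading the old (uninitialized) one.
        slot.write((i * i) as u32);
    }

    // SAFETY: every element was written in the loop above, and
    // `MaybeUninit<u32>` has the same size and layout as `u32`.
    unsafe { std::mem::transmute::<[MaybeUninit<u32>; 4], [u32; 4]>(buf) }
}

fn main() {
    println!("{:?}", first_squares());
}
```

For an array this small the optimizer would very likely make a plain `[0u32; 4]` just as cheap, so this pattern tends to matter only for large buffers.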


stdlib has plenty of unsafe blocks inside it, as do many crates. Claiming one doesn't use unsafe blocks because none are visible in one's lib.rs or whatever doesn't mean they aren't there.


If you're going to consider unsafe blocks in other libraries (and especially the standard library) as just as "bad" as ones in your own, you have to include the compiler itself too ("oh your compiler generates machine code, that's unsafe!!!"). This logic of course applies to every language, as everything bottoms out in machine code/hardware, and thus it is a fairly uninteresting point.

The power of Rust is the ability to wrap dangerous code into safe abstractions without cost, and unsafe blocks are essentially a flag for "this is dangerous, make sure it's contained".
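
A small sketch of what "contained" looks like in practice. `middle` is a hypothetical helper, and in real code the safe `items.get(mid)` would do just as well; the point is only that the `unsafe` block is justified by a check a few lines above it, and callers never see it:

```rust
/// Returns the middle element of a slice, or `None` if the slice is empty.
/// The signature is entirely safe; callers cannot misuse it.
fn middle<T>(items: &[T]) -> Option<&T> {
    if items.is_empty() {
        return None;
    }
    let mid = items.len() / 2;
    // SAFETY: the slice is non-empty, so `len / 2` is strictly less than `len`.
    Some(unsafe { items.get_unchecked(mid) })
}

fn main() {
    println!("{:?}", middle(&[10, 20, 30]));
    println!("{:?}", middle::<i32>(&[]));
}
```

Auditing this for memory safety means reading one function, not every call site.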


I think of libstd as basically what would be part of the compiler or runtime in other languages. In some languages (e.g. Go), things like hash maps are in the language and implemented directly with unsafe code, and nobody thinks the language is less safe because of it.


Every language that exists, compiled or interpreted, typed or untyped, ultimately relies on code which could violate every guarantee that language makes. Most often, that code is written in C or C++, and is a part of the language's runtime or compiler toolchain.


There's certainly a difference between writing one's own unsafe blocks and relying on functionality which was implemented using unsafe in a community project (which has hopefully been vetted by some community members).


Most projects use C bindings; it's not unfair to say that the quality of many of the C bindings on crates.io doesn't match the quality of stuff in the standard library.

(ie sure, maybe you're not writing unsafe yourself, but you'll quite possibly hit an issue where you have to dig into a crate that does)


Most crates don't make use of unsafe, the stuff that I see that does use unsafe is mostly either embedded applications or stuff like the std lib.

And even if you do use unsafe, you can still build 'safe' abstractions on top of it. The idea is that your unsafe code is quarantined and abstracted, and you build on top of it. std::collections is a great example of this.


This is an area in which I haven't quite figured out how to communicate properly; I feel like I have a good understanding of how Rust maps to asm, but I don't know how to transfer that understanding to other people.

I'll certainly be reading your link below, thanks for that!


Would an updated and expanded Rust for C++ Programmers make sense as a companion to the book with references to concepts in the book along with low-level details on data structures or implementation? It would be nice to see that and the Nomicon more closely related and fully up to date.

It sounds like an informal "specification" of #[repr(C)] or #[repr(packed)] for common platforms would also be useful for FFI.
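
Until something like that exists, the effect of the two attributes can at least be observed directly. A sketch with hypothetical struct names, assuming a common platform where `u32` has 4-byte alignment:

```rust
use std::mem::{align_of, size_of};

// C layout: fields in declaration order, padded like a C compiler would.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,     // offset 0, followed by 3 bytes of padding
    length: u32, // offset 4
}

// Same fields with padding removed; `length` is now unaligned.
#[allow(dead_code)]
#[repr(C, packed)]
struct PackedHeader {
    tag: u8,
    length: u32,
}

fn main() {
    println!(
        "Header:       size {} align {}",
        size_of::<Header>(),
        align_of::<Header>()
    );
    println!(
        "PackedHeader: size {} align {}",
        size_of::<PackedHeader>(),
        align_of::<PackedHeader>()
    );
}
```

Note that taking a reference to a field of a packed struct is rejected by the compiler, since the field may be unaligned; copy the value out instead.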


Yeah it'd be great to have.


> it was hard to map the Rust concepts with what actually runs when it is compiled.

This. The primary reason to choose rust is performance -- that is, you want more advanced abstraction/safety capabilities than C++, and you want that with the same or better performance. And performance implies CPU cost and memory usage/layout control. There really isn't any point otherwise.

Therefore, going into at least a little bit of detail on the idioms and performance impact of those idioms is important. Rust is supposed to be a systems programming language to replace C; do not pretend it's as abstract as ML in the documentation.

What throws me for a loop when learning Rust isn't high level details like the borrow checker or sum types, it's what is happening with cpu & memory[1]. When things are copied, when they aren't, how much do the std derives cost, matching cost, sum type storage cost, etc. Because while the semantics are similar to C++, they are not the same. And you don't have to go nuts specifying it (compilers will differ), but at least give a general understanding/hint of what to do and what to avoid.

To be fair, the doc does have this scattered about, but it doesn't feel like a priority (there are many things I've had to search the web for, ask on #rust about, or just look at disassembly for).

[1] To take an extremely simple example, RVO (return value optimization) is something Rust supposedly does much more consistently than C++, thus returning a new struct by value, to be placed wherever the caller wants it, is idiomatic. However this isn't called out very clearly (the last time I checked) in the doc, and to a C programmer, it feels very wrong. Stuff like this is extremely important, otherwise we'd just ignore performance and use a JVM based language with the same (or better) abstraction features and a faster compiler. :-)


> Is it to be avoided if at all possible, or should I go there any time the 'safe' part of the language is making it difficult to express what I want?

The main reasons to use unsafe code are when you're doing FFI and have to regrettably talk to C/C++ libraries, or when designing new abstractions with a safe API boundary. It's tricky to ensure that the former is safe since you eventually have to trust C++ (but then, that's not Rust's fault). It's not hard to ensure that the latter is safe. Looking at a page of code and ensuring that it can't cause segfaults is a much easier task than doing it for the entire codebase.

This is almost all the unsafe code out there. There's a bit of it used for doing manual optimizations. When Rust doesn't let you do what you want, often there are abstractions like RefCell that have a small cost that you can use (and they contribute to the overall safety). In case this happens in performance critical code, you can use unsafe again, but this is very rare.
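
As a sketch of the RefCell route: the borrow rules are checked at runtime instead of compile time, which costs a small flag check per borrow but needs no unsafe in your own code. `Logger` and its methods are hypothetical names:

```rust
use std::cell::RefCell;

/// Collects log lines through a shared (`&self`) reference.
struct Logger {
    lines: RefCell<Vec<String>>,
}

impl Logger {
    fn new() -> Self {
        Logger { lines: RefCell::new(Vec::new()) }
    }

    /// Mutates through `&self`; `borrow_mut` panics if a borrow is already active.
    fn log(&self, msg: &str) {
        self.lines.borrow_mut().push(msg.to_string());
    }

    fn count(&self) -> usize {
        self.lines.borrow().len()
    }

    /// Reports whether a mutable borrow would currently succeed.
    fn can_write(&self) -> bool {
        self.lines.try_borrow_mut().is_ok()
    }
}

fn main() {
    let logger = Logger::new();
    logger.log("started");
    logger.log("finished");
    println!("{} lines logged", logger.count());
}
```

The aliasing mistake that unsafe code could turn into memory corruption becomes a panic (or an `Err` from `try_borrow_mut`) here.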

In Servo, for example, almost all of the unsafe code is of the first two kinds. I've been hacking on Servo for years and didn't write much unsafe code at all -- when I did, it had to do with talking to Spidermonkey, and even that was pretty rare. More recently I'm working on integrating Servo's style system into Firefox (which is C++), and only now have I been regularly writing unsafe code. Even for this project the unsafe code I'm writing abstracts away the inherent unsafety of Firefox's C++ so that others can talk to Firefox with safe Rust code.

But many projects have no unsafe code at all. It's not that common to have unsafe code.

> it was difficult for me to find performance characteristics of the underlying abstractions and std library (for example, are algebraic data structures just tagged unions or does the compiler do more fancy things with them? What about iterators?).

Note that a C++ book won't help here for C++ too. What is a switch compiled down to? Does it use a jump table? :)

But yeah, it would be nice to have a thing for this. I don't think it belongs in the official book, but it should exist :)

ADTs are tagged unions. When non-nullable pointers are involved, sometimes the tag is stored as a null pointer (e.g. `Option<Box<Foo>>` is a single pointer, and is None when null). Aside from that, nothing fancy.
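
The null-pointer case is easy to confirm with `size_of`. In this sketch `Foo` is a placeholder struct; the `Option<Box<T>>` equality is documented behaviour for sized `T`, while the `Option<u64>` comparison just shows what happens when there is no spare bit pattern to reuse:

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Foo {
    value: u64,
}

fn main() {
    // `Box<Foo>` can never be null, so `None` reuses the null bit pattern.
    assert_eq!(size_of::<Option<Box<Foo>>>(), size_of::<Box<Foo>>());
    assert_eq!(size_of::<Option<Box<Foo>>>(), size_of::<usize>());

    // Every bit pattern of `u64` is a valid value, so a separate tag is needed.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());

    println!("Option<Box<Foo>>: {} bytes", size_of::<Option<Box<Foo>>>());
    println!("Option<u64>:      {} bytes", size_of::<Option<u64>>());
}
```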

Iterators compile down to the equivalent for loops. I can't think of any stdlib iterator which implicitly allocates; they all operate on the stack. In general these are just zero-cost abstractions: they will compile down to the code you would have written with manual loops. This is a recurring theme with the stdlib and even crates from the ecosystem. "Extra" costs for abstractions are eschewed in Rust and will often be documented when they exist. So, as a rule of thumb, assuming that a random abstraction doesn't have an overhead unless explicitly mentioned is good.
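
To illustrate what "the equivalent for loop" means, here are two hypothetical functions that compute the same thing. Whether they produce identical machine code depends on the optimizer and build flags, so compare the release-mode assembly yourself before relying on it:

```rust
/// Iterator-chain version: no intermediate collection is built.
fn sum_even_squares_iter(values: &[u32]) -> u32 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0)
        .map(|&v| v * v)
        .sum()
}

/// Hand-written loop doing the same work.
fn sum_even_squares_loop(values: &[u32]) -> u32 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("{}", sum_even_squares_iter(&data));
    println!("{}", sum_even_squares_loop(&data));
}
```

The adapters are lazy structs wrapping the slice iterator, which is why the chain needs no heap allocation.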



