I have to disagree with the entire premise of the article. It's the fact that Box isn't unique that gives it this behavior. The author even says as much, but dismisses it because it gets in the way of their point:
> While it can be argued that box is like a T, but on the heap, and therefore moving it should invalidate pointers, since moving T definitely has to invalidate pointers to it, this comparison doesn’t make sense to me. While Box<T> usually behaves like a T, it’s just a pointer.
The "it's just a pointer" argument is moot (and pointers have nothing to do with the issue, it's a question of exclusive ownership and nothing more). It's a high-level object (i.e. not a pointer) to which Rust's very clear aliasing and single name rules apply. Thou shalt not have multiple live T that point to the same memory address. QED.
It's the same reason that taking an &mut reference to statically allocated data is unsafe. There we don't even have a pointer, only statically allocated data. We don't even create a second top-level instance of Foo, only an &mut reference to it. But it's unsafe because the compiler can't know whether or not it has exclusive access.
(As a refresher, in terms of "strength" of ownership from the most exclusively owned to the least so, it would go T -> &mut T -> &T.)
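The snippet this comment refers to is not present in the thread. As a rough sketch of the kind of code being described, assuming a hypothetical `Foo` struct stored in a `static mut` (the names `Foo`, `FOO`, and `bump` are made up for illustration):

```rust
// Hypothetical type and static, not taken from the original comment.
struct Foo {
    value: u32,
}

static mut FOO: Foo = Foo { value: 0 };

fn bump() -> u32 {
    // SAFETY: sound only if nothing else accesses FOO while `foo` is live.
    // The compiler cannot check that for a mutable static, which is why
    // the block has to be marked unsafe.
    unsafe {
        let foo: &mut Foo = &mut *std::ptr::addr_of_mut!(FOO);
        foo.value += 1;
        foo.value
    }
}
```

No second `Foo` is created here; the only thing produced is an `&mut Foo`, and the exclusivity that `&mut` promises is left to the programmer to uphold.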
I don't know why so many people in this thread try to pretend that there's no reason to see Box as a pointer, that no one has ever called it that, and that every user drawing a parallel is confused. The documentation for Box is literally (emph mine):
>> *A pointer type* that uniquely owns a heap allocation of type T.
Until that documentation changes I find the article's point quite valid.
I only meant "not a raw pointer" because Rust supports read and write operations on raw pointers with very different aliasing semantics.
It is an owned pointer, with emphasis on the "owned". You can have as many raw pointers to the same memory location as you like; you just can't have multiple native Rust objects pointing to that same memory alive at once. That also follows from the fact that Box<T> implements Drop: it's not something you can pass to a function in lieu of a pointer, and if you do pass it to a function, you can no longer make any assumptions about the lifetime or validity of any pointers to the same data.
The other overload, the one used in the docs for Box, Rc, and such, is basically "anything that implements Deref". People who say "Box is not a pointer" are referring to the fact that Box is not a first-class pointer type, i.e. it's neither a reference nor a raw pointer.
The problem with this, as the article argues in detail, is that it leaves a hole in functionality that people need but offers no clear benefit in return.
A movable owning pointer that exposes its pointer-ness in its semantics is a useful tool when you're doing lower-level, sometimes-unsafe stuff with memory layout. The author points out that, because Box does not currently provide this functionality, there are crates in the ecosystem that step in to provide it instead. This middle ground between "raw pointers for everything" and "aliased XOR mutable" is an important thing to support.
This might be acceptable if there were some benefit to Box behaving this way. Perhaps if it actually made a difference for the optimizer? The benchmarks the author did seem to suggest otherwise: many mutations wind up going through a reborrowed `&mut T` anyway. Perhaps the conceptual model of "like a T but on the heap" is enough of a benefit? But being more permissive here doesn't change that model for safe code anyway.
The author dismissed this for much more concrete reasons than "it gets in the way of their point."
I think the better approach would be to standardize some kind of feature that lets you annotate variables as "not noalias" so that Rust knows to elide the noalias for those values. That's a much more generic way to solve the ecosystem problem without changing Box semantics or introducing one-off types. That being said, there's already a solution in the ecosystem, which is to not use aliased pointers after giving the pointer to Box and moving it.
No, UnsafeCell doesn't change any rules around non-aliasing for &mut/Box. (E.g., if you move a Box<UnsafeCell<T>>, it will still invalidate any pointers to the data.) All it does is change the rules around immutability for & references. &T is "not noalias" regardless of UnsafeCell, but it's immutable in its absence.
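A minimal sketch of the one thing UnsafeCell does change, namely that data behind a shared `&` reference may be written to. The `Counter` type and its methods are hypothetical names used only for illustration; this does not demonstrate anything about moving a `Box`.

```rust
use std::cell::UnsafeCell;

// Hypothetical example type: a counter that can be bumped through `&self`.
struct Counter {
    hits: UnsafeCell<u32>,
}

impl Counter {
    fn new() -> Self {
        Counter { hits: UnsafeCell::new(0) }
    }

    // Takes `&self`, yet mutates. Without UnsafeCell, writing through a
    // shared reference like this would be undefined behavior.
    fn record(&self) -> u32 {
        // SAFETY: UnsafeCell makes Counter !Sync, so no other thread can
        // hold a reference, and no reference to the inner u32 escapes.
        unsafe {
            *self.hits.get() += 1;
            *self.hits.get()
        }
    }
}
```

Two shared references to the same `Counter` can both call `record`, which is exactly the relaxation of `&` immutability described above.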
Meanwhile, there is a magic "not noalias" mechanism currently recognized by Miri, in the form of !Unpin types. But this is considered a temporary hack to keep it from complaining about pinned futures that reference their own fields, in the absence of an actual language feature. Also, it only applies to accessing values through unique references, not to moving or writing to them by their binding.
&T is noalias - while there may be aliases none can write, which is the important thing - without UnsafeCell, and &UnsafeCell<T> loses noalias for that reason.
I strongly disagree with this article. Perhaps in the days before Miri it might have made sense, but it's pretty trivial right now to discover UB in unsafe code with a simple `cargo +nightly miri test` run.
It feels like the Rust team is a bit wary of introducing other optimizations for fear of breaking unsafe code that has lurking UB, but it's better to start working on fixing these problems _now_ rather than get stuck in the present state of limbo. It's only going to get harder to fix incorrect code (which we see an example of in this particular post).
Honestly Miri is a superpower and it needs to be the priority of the Rust team to stabilize it. There's nothing inherently wrong with unsafe code: it's unsound code that's the problem, and we have the tools to prevent this exact problem from the article.
This is why I firmly argue (and pretend) that references are not pointers. "References are pointers" results in the belief that references will behave like C pointers and results in things like this article.
At best they are a cousin of pointers.
I consider the fact that they are pointers an implementation detail, just like Box is a value with 'static semantics.
> This is why I firmly argue (and pretend) that references are not pointers. "References are pointers" results in the belief that references will behave like C pointers and results in things like this article.
This is my usual mental model as well. My thinking is that if tomorrow a new Rust version came out that used some other magical implementation of references that didn't use pointers under the hood, my code should still be correct. Maybe converting between references and raw pointers would be less efficient, but the semantics of my code shouldn't change.
I really wish Rust didn't make the guarantee that references are pointers; if I ever made a Rust 2.0, I wouldn't make this guarantee. It would make the `Copy` trait less needed and help with extra `&&&` coming up in generics. It could make references to packed and unaligned bits safe. It could also support offset-references for self types.
I'm using the glommio runtime & miri takes forever to just start the executor & then throws up with "can't call foreign function `sched_setaffinity` on OS `linux`".
Aside from what is clearly a very long tail of blockers to running it on non-trivial programs on tier 1 platforms, the slowness is a real usability problem. It's slower than valgrind afaict.
I think dismissing issues as "but we have miri" is very short sighted and it's not clear to me that Miri will ever reach the point of catching issues in substantial codebases (the standard library is substantial in number of lines, but not substantial in terms of exercising all the OS features).
I'm not aware of any UB that 1) can be caused by unique pointer violations from Box and 2) is undetectable by Miri (assuming good code coverage), but I might be wrong about this.
If you have an object that's !Unpin, then Miri will not apply uniqueness rules to anything containing it [0], including boxes and &mut references. (In the example code, replacing the PhantomPinned with a () will make Miri complain again.) This is considered a temporary (if long-lived) measure to allow async executors to manipulate pinned futures without invalidating all their internal borrows. Thus, it might be seen as undetected UB, in lieu of a permanent solution.
It's trivial to measure code coverage. It's definitely not trivial to achieve 100% code coverage.
This is the sort of "just do things perfectly" nonsense we get from C programmers. I'm surprised to see it from Rust devs, given the whole ethos of Rust is that it acknowledges that programmers are not perfect and helping them avoid bugs as much as possible is a good thing.
It's not nonsense. It's really not difficult to structure code for 100% coverage of unsafe code if you're thinking about it from the start.
You're also perfectly fine to write code that is free of `unsafe`, freeing you from this onerous task. We're pulling out Miri _because_ we're going outside the normal guardrails.
You also don't _need_ to get 100% coverage of all your unsafe code if you can be confident of the usage of unsafe. The most complex unsafe code should almost certainly be covered, but there are a lot of trivial uses of unsafe that can be shown to be correct through reasoning.
Where possible I prefer to split code into safe and unsafe portions, and test the unsafe portions under Miri with as much coverage as gives me confidence in the code.
I've made UB mistakes before with unsafe, but since adding Miri and code coverage, the number of mistakes I've made has dropped dramatically. No programmer is perfect, but one would be pretty foolish to ignore the tools at one's disposal.
>While we are many missing language features away from this being the case, the noalias case is also magic descended upon box itself, with no user code ever having access to it.
Well, they work on the compiler, so that's one reason I guess. Also, the fact that it's magic is no secret, and this is not the only way in which it is (the most important is probably the DerefMove behaviour that's mentioned in the article, too). There have been many discussions around this in the past.
The biggest annoying magic I found with respect to Box (and other std containers like Rc) is that they’re the only ones capable of storing fat dyn pointers. You can’t construct a hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
It’s annoying magic like that that bothers me.
Another example is async lifetimes: it’s frequently hard to properly express the lifetime of a borrow, resulting in a choice between an unnecessary Box::pin, unsafe, or even both. Here’s an example I ran into recently, and the author’s challenges there are similar to the ones I’ve run into in my own codebase [1].
Or how about bridging poll-based futures and async (eg if within my poll interface I want to call an async method). It’s weird how there’s a world of difference between the implicit future generated by async and an explicit type implementing Future. I understand the similarity to named function vs closure but I’m finding the distinction to have far more annoying sharp edges than I’ve experienced with closures.
The tooling around non-trivial programs is also unfortunate: with an io_uring async runtime, Miri fails to start (a noted limitation). Valgrind deadlocks for some reason as well, which means that only asan’s more limited techniques are usable.
My point is that soundness issues in unsafe code are an important but niche topic compared with what I’ve experienced writing a substantial program in Rust (~40k lines of code so far). It’s doable, but I find myself still fighting with the language just a bit too much.
Hopefully it’s completely different teams responsible for these kinds of work but, if not, I’d vote for stabilizing some of the ergonomic magic that std has access to and improving the borrow checker to recognize more definitely safe constructs so that users don’t need to do annoying hoop jumping. I know the std magic I referenced is being worked on but as with all things rust it’s impossible to predict what actually gets stabilized and when with the exception of marquee tentpole features they talk about on the blog.
> The biggest annoying magic I found with respect to Box (and other std containers like Rc) is that they’re the only ones capable of storing fat dyn pointers. You can’t construct a hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
It's perfectly possible to make a container capable of storing trait objects: just define the type parameter as <T: ?Sized>. The main issue is that unlike Box/Rc, the compiler won't give you an automatic coercion from MyRc<Type> to MyRc<dyn Trait>, so you have to write a method to explicitly perform that cast. It just isn't common for many existing third-party containers to support !Sized objects, since it takes tedious unsafe code to manipulate them in memory.
> The biggest annoying magic I found with respect to Box (and other std containers like Rc) is that they’re the only ones capable of storing fat dyn pointers. You can’t construct a hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
Anything can store fat dyn pointers, they're just like any other type in that regards.
Constructing them for a specific trait is easy and possible on stable (e.g. adding a `as_debug(MyBox<T>) -> MyBox<dyn Debug>` method).
Making it possible to construct them for any trait is special to the built in pointers... on stable. On nightly with unstable features it's possible (and easy) to make any smart pointer type do this.
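A rough sketch of the stable, per-trait conversion mentioned above. `MyBox` is a hypothetical minimal owning pointer, not the API of `hybrid_rc` or any other real crate; it relies only on the fact that raw pointers can be unsize-coerced on stable.

```rust
use std::fmt::Debug;
use std::ops::Deref;
use std::ptr::NonNull;

// Hypothetical smart pointer; `?Sized` lets it hold `dyn Trait` values.
struct MyBox<T: ?Sized> {
    ptr: NonNull<T>,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        // Borrow Box's allocation so no allocator code is needed here.
        let raw = Box::into_raw(Box::new(value));
        // SAFETY: Box::into_raw never returns null.
        MyBox { ptr: unsafe { NonNull::new_unchecked(raw) } }
    }
}

impl<T: Debug + 'static> MyBox<T> {
    // The explicit cast that Box and Rc get for free from the compiler.
    fn as_debug(self) -> MyBox<dyn Debug> {
        let thin: *mut T = self.ptr.as_ptr();
        std::mem::forget(self); // ownership moves to the returned value
        let fat: *mut dyn Debug = thin; // unsize coercion on a raw pointer
        // SAFETY: `fat` has the same non-null address as `thin`.
        MyBox { ptr: unsafe { NonNull::new_unchecked(fat) } }
    }
}

impl<T: ?Sized> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: the pointer is valid for as long as `self` is alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: ?Sized> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: the pointer came from Box::into_raw and is freed once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}
```

With this, `MyBox::new(42u32).as_debug()` yields a `MyBox<dyn Debug>`; the cost is one hand-written method per trait rather than a general coercion.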
First, Box lacks some ergonomics: it's a pain to deal with when matching.
Now, as for the article, I don't really follow their argument. Box<T> owns its contents. Hence why it drops its contents afterwards, unlike &T. If someone needs this aliasable box type, they should define a different DropPtr. "It's just a pointer" can also be said of `&T` & `&mut T` (where T: Sized)
That feature is almost assuredly not going to land on stable, but the more general "deref_patterns", which would allow matching on boxed values as well as on `Vec`s and `String`s, will. It is not anywhere close to finished, but I am convinced it will land.
There’s magic in the Box: ability to partially move content out of it, where any other type with Drop couldn’t handle it. You can implement traits for Box<Foo> even when Othertype<Foo> wouldn’t be allowed.
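As an illustration of the partial-move behavior described above, here is a small sketch. The `Pair` struct and `split` function are hypothetical names chosen for the example.

```rust
// Hypothetical struct with two independently owned fields.
struct Pair {
    name: String,
    tags: Vec<String>,
}

// Moves each field out of the Box separately. A user-defined smart
// pointer that only implements Deref/DerefMut could not allow this.
fn split(boxed: Box<Pair>) -> (String, Vec<String>) {
    let name = boxed.name; // first partial move out of the Box
    let tags = boxed.tags; // second partial move; the Box is now empty
    (name, tags)
}
```

Once both fields have been moved out, the compiler frees the heap allocation without running any destructor on the already-moved fields.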
But noalias is not very special for Rust. &mut and & have a bunch of limitations too.
But there’s no need to give up on them, because Rust has the UnsafeCell wrapper type for doing crimes with pointers. It selectively disables noalias, thread safety, etc. Instead of weakening guarantees of Box in general for all types, just insert UnsafeCell where you need to be clever with pointers.
One of the biggest struggles I have (and others have, judging from Stack Overflow) is how to generically handle accepting types of Box<T>, Rc<T>, T, Pin<T>, &T, &mut T, etc etc.
Of course you can write a function that is generic on <I, T: AsRef<I>> or something, but the moment you introduce function-coloring stuff like async, object safety, etc, things explode.
Is there an ELI5 video / tutorial for all things box-variables that you can recommend?
I understand pointers, I understand references, I understand ownership and mutability. I feel lost with Box things. The official documentation came across as cryptic to me and I had a hard time getting over the syntax. Like, what is "T" and why does it get passed into Box<> ... etc.
I wrote https://github.com/mmastrac/keepcalm as a way to help with this, especially for writing webserver-like code. For most cases this type of code can take a small perf hit in exchange for drastically simplifying the sharing of data in the system.
It requires supporting higher-kinded types, and Rust was reluctant to add them (although it’s slowly getting there with higher kinded lifetimes and generic associated types).
Yeah, I typically use `impl AsRef<T>`. It would be really nice to be able to say roughly the same thing, but have the `.as_ref()` call happen at the call site instead of within the function.
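A small sketch of the `impl AsRef<T>` pattern mentioned above, using `str` as the target type. The function name `shout` is made up for illustration.

```rust
use std::rc::Rc;

// Accepts anything that can be viewed as a `str`; the `.as_ref()` call
// happens inside the function rather than at the call site.
fn shout(text: impl AsRef<str>) -> String {
    text.as_ref().to_uppercase()
}

// The same function called with several different owner types.
fn shout_all() -> Vec<String> {
    vec![
        shout("borrowed"),
        shout(String::from("owned")),
        shout(Box::<str>::from("boxed")),
        shout(Rc::<str>::from("counted")),
    ]
}
```

This covers the plain synchronous case; it does not address the async or object-safety complications raised earlier in the thread.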