There's a far simpler and also very generic method that I'm surprised no one has really mentioned much, although it's well-known in some corners of the RE/Asm community and I believe some debuggers (but not gdb?) use it: scan the stack for values that look like valid code addresses, then disassemble backwards from there to see if they were actually return addresses written there by a call instruction. You will find a chain that leads back to the entry point.
Of course it won't work in edge cases like handwritten Asm that uses the stack in more clever ways, but when you're dealing with compiler output, it'll be fine. No need for all this complexity, and works in almost all cases.
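To make the idea concrete, here's a rough sketch of the first step (the scan itself), in Python over a simulated stack snapshot. Everything here — the addresses, the ranges, the function name — is made up for illustration; a real unwinder would operate on the live process or a memory dump.

```python
import struct

def candidate_return_addresses(stack_bytes, stack_base, code_ranges, word_size=8):
    """Yield (slot_address, value) pairs whose value lands in a code range."""
    fmt = "<Q" if word_size == 8 else "<I"
    for offset in range(0, len(stack_bytes) - word_size + 1, word_size):
        (value,) = struct.unpack_from(fmt, stack_bytes, offset)
        if any(lo <= value < hi for lo, hi in code_ranges):
            yield stack_base + offset, value

# Toy example: a four-slot stack where only one word points into .text.
code = [(0x400000, 0x500000)]
stack = struct.pack("<4Q", 0xdeadbeef, 0x401234, 0x7fffffff0000, 0x12)
print(list(candidate_return_addresses(stack, 0x7fffffffe000, code)))
```

The second step — verifying that each candidate is immediately preceded by a call instruction — is what separates this from the naive "scan for code addresses" approach.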
That method is pretty much guaranteed to give you completely broken stacks. We (Sentry) use it as a fallback if nothing else works, and the success rate is awful.
Are you also doing back-disassembly to link the call chain? I've had very good success with it. I agree that without analysing the potential call sites and making sure that they are actually possible, it won't work.
Since we do post-hoc stack walking our ability to actually look at the assembly is limited. In most cases where we have no CFI we also do not have the binary to begin with, so we're in random memory land.
If we find such an instruction, we assume it's an ip value that was pushed by the CALL instruction that created the current frame.
I'm not familiar with Rust, but if I'm reading it correctly, you're not doing back-disassembly and only checking if an address on the stack is in a code region, which certainly won't work well.
Minidumps only contain the stack, yes, along with additional metadata, but not the original executable. This is an old format that I think Microsoft originally designed and that Google extended for the breakpad project. It's used by Chrome and Mozilla also uses it for Firefox. It's essentially a crash reporting binary format.
I linked you to it elsewhere in this thread. Sentry is using a rust implementation of the stack walker.
GP mentioned in another comment that they don't actually follow the algorithm described by userbinator -- checking the return address for a preceding call -- which would increase the number of false positives a lot.
The thing that makes this tricky is that x86 is a variable length instruction set: you don't know where the preceding call instruction might begin, and you can't decode backwards. You can do it speculatively, but ambiguities are still possible.
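The speculative check can still be sketched, though: instead of decoding backwards, test whether any plausible call encoding ends exactly at the candidate return address. This heuristic only covers the two most common x86-64 call forms (E8 rel32, FF /2) and will accept false positives — e.g. an FF byte inside a displacement — which is exactly the ambiguity described above; a real implementation would use a full disassembler.

```python
def preceded_by_call(code: bytes, ret_offset: int) -> bool:
    """Heuristic: does a call instruction appear to end at ret_offset?"""
    # call rel32 (E8 xx xx xx xx) is always 5 bytes long.
    if ret_offset >= 5 and code[ret_offset - 5] == 0xE8:
        return True
    # call r/m64 (FF /2) is 2..7 bytes depending on the addressing mode;
    # loosely accept any FF byte followed by a ModRM whose reg field is 2.
    for length in range(2, 8):
        start = ret_offset - length
        if start >= 0 and code[start] == 0xFF and (code[start + 1] >> 3) & 0x7 == 2:
            return True
    return False
```

For example, `ff d0` (call rax) ending at a candidate address passes the check, while a run of NOPs does not.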
Given that the full technique apparently works well for userbinator, we can assume that this isn't a very significant issue. The chances of a phantom call instruction will be pretty low, and there's a fairly low maximum instruction length.
There's a chance of that, but in practice it's easy to filter them out, since only one of the chains will be complete from the very top of the stack to the current stack pointer. The others will almost certainly be partially overwritten by other stack activity and incomplete, or not match the current stack pointer.
This has been a fallback method for Google Breakpad's stackwalker for more than a decade. It generates basically useless stack traces, at least for stacks captured from Android where you're trying to trace through NDK code invoked from ART. Compounding the problem is a complete lack of available symbols for Android.
I dug around in that code a little, and although it contains a disassembler, it's not being used for inspecting the call chain. All it does is check that the address appears to be inside a code section.
Do you analyze the dynamic loader segments (to determine code-like addresses) or just do some hard-coded approximations per platform? Pretty cute idea and I agree it will often work.
It just needs to be on an executable page. The processor doesn't care about anything else, so this will work with things like JIT-generated code too. The key idea is that in essentially all cases of compiler-generated code, every return address will be immediately after the call that wrote it.
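On Linux, "is this on an executable page?" can be answered from the process's memory map. This sketch parses `/proc/<pid>/maps` text; the sample map below is fabricated, and the `rwxp` region stands in for JIT-generated code, which — as noted above — passes the same test as ordinary .text.

```python
def executable_ranges(maps_text):
    """Parse /proc/<pid>/maps lines into (start, end) ranges marked executable."""
    ranges = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and "x" in fields[1]:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            ranges.append((start, end))
    return ranges

def is_code_address(addr, ranges):
    return any(lo <= addr < hi for lo, hi in ranges)

sample = (
    "00400000-004a0000 r-xp 00000000 08:01 123 /usr/bin/demo\n"
    "004a0000-004b0000 rw-p 000a0000 08:01 123 /usr/bin/demo\n"
    "7f0000000000-7f0000001000 rwxp 00000000 00:00 0  [jit]\n"
)
ranges = executable_ranges(sample)
print(is_code_address(0x401234, ranges))  # inside the r-xp .text mapping
```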
I still hope we can revert that once we have complete, CPU-verified backtraces. As a linked list, frame pointers are still somewhat slow. Just copying the hardware address stack should be quite a bit faster, and more importantly, the CPU will enforce that the addresses are correct.
Yeah, and like I mentioned in the earlier comment, omitting the frame pointer reduces code size by 10% on RISC-V targets, which is huge when dealing with embedded flash: https://github.com/tock/tock/pull/1660
- ARM frame pointer handling is more compact (relative to the non-frame pointer code) than that on RISC-V.
- ARM code does less frame pointer handling.
The first might be true because ARM has the stmdb instruction (https://developer.arm.com/documentation/ddi0406/b/Applicatio...), which stores multiple registers to consecutive memory locations using an address from a base register. The consecutive memory locations end just below this address, and the address of the first of those locations can optionally be written back to the base register.
So, code can save registers on the stack _and_ decrease the stack pointer in a single instruction.
The second might be true because, of the compilers/compiler settings used for those measurements, the ARM one did more aggressive function inlining.
Yeah, mandating rbp on amd64 would have saved years of headache. There are still cases like tail calls that wouldn't work, but the vast majority of code doesn't need to care about dedicating a register to having usable backtraces.
Walking the stack via frame pointers is also faster than some other methods (https://blog.felixge.de/reducing-gos-execution-tracer-overhe...), it appears. This benefits use cases like profilers and tracers. In big deployments, continuous (sampled) profiling and tracing are common, helping not only with continuous performance improvements in fleets but also helping to root cause outages. Reducing the cost of such infrastructure can pay off.
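Part of why it's fast: with frame pointers, unwinding degenerates to following a linked list — each frame stores the caller's frame pointer at `[fp]` and the return address at `[fp+8]` (on x86-64). A sketch over a simulated little-endian memory image, with made-up addresses:

```python
def walk_frame_pointers(memory, fp, max_frames=64):
    """memory maps address -> word value; returns the chain of return addresses."""
    chain = []
    while fp and fp in memory and fp + 8 in memory and len(chain) < max_frames:
        saved_fp = memory[fp]          # [fp]   = caller's frame pointer
        return_addr = memory[fp + 8]   # [fp+8] = return address into the caller
        chain.append(return_addr)
        fp = saved_fp
    return chain

# Three stacked frames; the outermost saved a NULL frame pointer.
memory = {
    0x7000: 0x7100, 0x7008: 0x401111,   # innermost frame
    0x7100: 0x7200, 0x7108: 0x402222,
    0x7200: 0x0,    0x7208: 0x403333,   # outermost frame (saved fp = 0)
}
print([hex(a) for a in walk_frame_pointers(memory, 0x7000)])
```

Two loads per frame and no table lookups, which is what makes it attractive for sampling profilers.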
For the always-on open-source profiler we happen to work on at work we had to do similar things and it was even more involved since we base the whole thing on eBPF to lower the overhead and therefore needed to get the verifier to accept it. [1]
We really wish frame pointers were always present, but here we are.
Please just stop omitting frame pointers. You might lose out on 6 months of CPU speed advances, but from then on out you will reap the benefits of better production observability for years to come.
(I’m the author.) I won't, then. It surprises me that the JVM would handle null pointers by hoping that dereferencing one raises a segfault, considering that the JVM is written in C (AFAIK), where such a dereference invokes undefined behavior, not a guaranteed memory access at address 0 that would segfault most of the time.
I appreciate you replying, and my sympathies for the posting weirdness - I was worried that I had inadvertently caused a karma trap.
From what I recall (it's been ~ 5 years), at least on OpenJ9, the potential SIGSEGV's for null references occurred in code generated by our JIT compiler, or in hand-rolled assembly, and not in code generated by any C translation unit. This may well not satisfy the standard, but honestly, at that point, I'm not sure any FFI-linked code will.
If you find this distinction insufficient, and still require the use of a JVM, if my memory serves correctly, -Xrs on OpenJ9 will avoid using those trap handlers. I don't believe this is true of Hotspot.
If the null dereferencing happens in assembly code and not in C, it seems fine to me: the C standard is irrelevant for assembly; after all, it's the C language that introduces UB, not the machine code.
I think it's an optimization. Checking for a null pointer on every object access can become costly, since it's such a fundamental operation (everything is a pointer in Java). By handling SIGSEGV instead, you save time in the non-null case, at the expense of more complex code and, as you've mentioned, non-standard behavior.
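The tradeoff is loosely analogous to "look before you leap" vs "easier to ask forgiveness" in languages with cheap exceptions — this toy sketch is only an analogy for the control flow, not the JVM's actual SIGSEGV mechanism:

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def total_checked(node):
    """Explicit null check before every access (a branch per access)."""
    total = 0
    while node is not None:
        total += node.value
        node = node.next
    return total

def get_value_trapping(node):
    """No up-front check; recover only when the rare bad access actually
    faults, much like the JVM turns the SIGSEGV into NullPointerException."""
    try:
        return node.value
    except AttributeError:
        raise ValueError("null dereference") from None
```

The common (non-null) path in the second style carries no check at all, which is exactly where the JIT-level win comes from.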
Also wanted to mention your comment was [dead], had to vouch for it to resurrect it.
I understand the problem, but I am concerned by the solution. The problem is that C semantics are quite clear on the subject[1]: dereferencing a null pointer is undefined behavior, not "non-standard behavior", which would suggest correct albeit non-portable behavior. I don't know of any compiler switch that tells the compiler to "define" this behavior, like `-fno-strict-aliasing` or `-fwrapv` do.
I understand - I deliberately used the vague term "non-standard behavior", since the solution somehow works. There is a GCC flag "-fno-delete-null-pointer-checks" [0] which seems like it might enable this behavior, but I'm not sure.
From my understanding, I think that option only prevents the compiler from removing null checks when the pointer was dereferenced before. It was probably motivated by this incident in the Kernel[1] where an optimized-out null check condition (because it was dereferenced right before) caused a vulnerability. I like this story, since both sides (Linux vs GCC) have perfectly reasonable positions IMO.