See also Bubblewrap[1]. I use Bubblewrap all the time, because it's just useful, and a bit easier than doing things with unshare directly.
To compare and contrast: bubblewrap is lower level, and is great for embedding or using in one-off invocations, whereas firejail is more oriented towards adding a bit of hardening to everyday applications like Firefox via built-in and custom profiles.
For example, you could temporarily override a path doing something like this:
to, say, override the KDE plugins while testing. This is useful for me since it's rather challenging during development to actually get KDE apps to reliably load my plugins on NixOS: I think kio slaves are probably wrapped and getting other environments injected into them. Rather than bother with any tricky hacks, Linux namespaces make it relatively easy to test regardless.
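A minimal sketch of that kind of override, assuming a hypothetical local build directory and a hypothetical system plugin path (neither appears in the comment above). It binds the whole host root into the sandbox and then binds the test directory over the path to be replaced, so only the wrapped process sees the change:

```python
import os
import shlex


def bwrap_override(override_src, override_dest, command):
    """Build a bwrap argv that shows `override_src` at `override_dest`.

    The rest of the host stays visible; the override exists only inside
    the new mount namespace and disappears when the command exits.
    """
    return [
        "bwrap",
        "--dev-bind", "/", "/",                 # host root as-is, device nodes included
        "--bind", override_src, override_dest,  # applied later, so it shadows the path
        "--die-with-parent",                    # kill the sandbox if the launcher dies
        *command,
    ]


if __name__ == "__main__":
    argv = bwrap_override(
        os.path.expanduser("~/src/myplugin/build/plugins"),  # hypothetical path
        "/usr/lib/qt/plugins",                               # hypothetical path
        ["dolphin"],
    )
    # Print rather than exec, so the sketch is safe to run anywhere.
    print(shlex.join(argv))
    # os.execvp(argv[0], argv)  # uncomment to actually launch
```

The destination should already exist on the host, since bwrap would otherwise have to create the mount point inside the bound root.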
Bubblewrap is used internally by Flatpak and others.
I've been using it to wrap Node.js package managers in response to all the malware shipped through them over the past few years. A simple wrapper script like this works fine (symlinked as `npm`/`yarn`/`pnpm` into whatever directory comes first in $PATH):
It gets read-write access to the current path, and to its own copy of ~/.cache and ~/.local/share. It doesn't see anything else from your home directory, including any adjacent projects. It can easily be locked down further by allowing read-only access to the project, and read-write access to node_modules and nothing else.
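A rough sketch of a wrapper with those properties, not the original script. It assumes a merged-/usr distribution, that the real tool lives at a path such as /usr/bin/npm, and a hypothetical state directory holding the sandbox's own cache and share directories (all of these are illustrative assumptions):

```python
import os
import shlex


def npm_sandbox_argv(tool, tool_args, project_dir, home, state_dir):
    """Build a bwrap argv that runs `tool` with a nearly empty home directory."""
    cache = os.path.join(state_dir, "cache")  # must exist before running bwrap
    share = os.path.join(state_dir, "share")  # must exist before running bwrap
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",          # system tools and libraries, read-only
        "--ro-bind", "/etc", "/etc",
        "--symlink", "usr/bin", "/bin",       # merged-/usr layout (assumption)
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--tmpfs", home,                      # empty home: hides everything in it
        "--bind", cache, os.path.join(home, ".cache"),
        "--bind", share, os.path.join(home, ".local/share"),
        "--bind", project_dir, project_dir,   # read-write access to the project only
        "--chdir", project_dir,
        "--unshare-all", "--share-net",       # new namespaces, but keep the network
        "--die-with-parent",
        tool, *tool_args,
    ]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    argv = npm_sandbox_argv(
        "/usr/bin/npm",                            # hypothetical location of the real npm
        ["install"],
        os.getcwd(),
        home,
        os.path.join(home, ".sandbox", "npm"),     # hypothetical state directory
    )
    print(shlex.join(argv))
```

The project bind comes after the tmpfs over the home directory on purpose: bwrap applies its operations in order, so a project inside the home directory is mounted on top of the empty tmpfs instead of being hidden by it.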
node-gyp/C extensions will work with whatever you have installed, which is a nice plus. In containers you obviously only start with what is in the OCI image, and the node_modules will probably not work in the host machine unless the container closely matches the host environment.
My understanding: It's sandboxing to protect against exploits delivered via supply chain attacks, which often use low hanging fruit like hooks on install to steal tokens/etc. It's definitely not perfect, but it does not hurt either.
It does not; as far as I know, that is not what a supply chain attack is. This is to defend against npm executing arbitrary malicious code as your user when installing a compromised package.
Also, sandboxing npm really means sandboxing node running npm (which is JS).
The sandboxing capabilities of operating systems are surprisingly bad. It's very strange to me that there isn't an idiot-proof user-space function call for "start a subprocess, limited to these files/directories, this port, this much drive space, this much CPU, and this much memory".
Of course you can do all these things (on Linux at least), but it involves a dizzying array of technologies.
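As one small piece of that array, here is a sketch that assumes classic POSIX rlimits are good enough for the memory and CPU part. They cap a single process's address space and CPU time, say nothing about files, ports, or disk space, and are not a security boundary on their own. The function name is made up for illustration:

```python
import resource
import subprocess
import sys


def run_limited(argv, mem_bytes, cpu_seconds):
    """Run `argv` with an address-space cap and a CPU-time cap (Linux)."""

    def apply_limits():
        # Runs in the child between fork() and exec().
        for which, value in ((resource.RLIMIT_AS, mem_bytes),
                             (resource.RLIMIT_CPU, cpu_seconds)):
            _, hard = resource.getrlimit(which)
            if hard != resource.RLIM_INFINITY:
                value = min(value, hard)  # an unprivileged process cannot exceed the hard limit
            # Lower soft and hard limits together so the child cannot raise them again.
            resource.setrlimit(which, (value, value))

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # The child just reports the CPU limit it ended up with.
    child = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU))"
    result = run_limited([sys.executable, "-c", child],
                         mem_bytes=2 << 30, cpu_seconds=5)
    print(result.stdout, end="")
```

Restricting files, network, and disk space would each need a different mechanism (mount namespaces, network namespaces, quotas or cgroups), which is the point being made above.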
To get the best isolation you need to patch the source — the application needs to go through initial setup and then drop privileges to the absolute possible minimum. But it's easy to make custom wrappers for third-party applications — the above profiles taken from the OpenBSD ports tree are the proof.
That's because it tends to break every program that wasn't built to run in a sandbox. There are (bad, ultimately) reasons why sandboxing is hard unless the ecosystem and toolchains have been built from the ground up to run a program in a sandbox.
For example, any dynamically linked program that needs to spawn a subprocess using a command that it doesn't ship alongside itself is going to cause problems in your sandbox. That turns out to be the vast majority of programs that have been shipped in the FHS universe for the last few decades. It's only recently that sandboxing has come into vogue.
And the OSes that do a good job (like macOS) get a bad rap from both developers and users who are mad that their programs can't do what they want because they no longer have unfettered access to the filesystem.
But we can just not sandbox the programs that break inside a sandbox. We don't lose anything we currently have. Meanwhile, more and more programs will start supporting running in a sandbox.
Yes but sudo. You shouldn't have to escalate privileges to decrease privileges. Docker has the same problem. This has improved somewhat with user namespaces, but it's still complicated and requires root to set up.
Yes, I don't understand why systemd doesn't support it sudo-less. It's even designed to run units configured by unprivileged users, but those can't access sandboxing features at all.
The aforementioned user namespaces and accompanying complexity come into play here, and you're going to take a performance hit using FUSE for fs and slirp for networking. I'm not saying these are bad tools, just that we need something simpler without sacrificing performance.
There might be complexity under the hood, but on a fresh Fedora install I can do, out of the box:
$ podman run -it myimage bash
FUSE is rather irrelevant since I will mount volumes if I need to do I/O. And slirp has been benchmarked at 9+ Gbps. So sure, there's always room for improvement, but the current situation is pretty sweet imo.
Capability based OS like Fuchsia do that by default. Actually I'm not sure about memory or CPUs but for other stuff you can't access it unless the parent process explicitly grants you access.
This is what happens when you bolt new security models onto a massive heap of millions of lines of code.
When I started using Linux, the security model had features like root being allowed to pwn the kernel and just about anything else. Now /dev/mem is gone, lockdown exists, namespaces exist, there are many more capabilities than there used to be, seccomp and seccomp-bpf exist, etc.
A lot of security ideas on Linux also rely on trusting the software, a model that may very well make sense in the context of Linux distributions with repositories of open source software. In this case, software itself opts into sandboxing to harden itself. This is pretty much what goes on with stuff like Flatpak. This also sounds like the idea with Pledge.
Can we do better? Maybe. I think adding true, practical sandboxing onto existing kernels not designed with this in mind is possible with tradeoffs. gVisor is an interesting approach: it's not a panacea but certainly a step in the right direction, using a usermode kernel that is itself separated into modules that are heavily locked down using ordinary Linux security mechanisms. This adds an additional layer to the moat that in theory is quite strong.
Another approach is virtual machines. And yes, this approach has drawn some reasonable criticism: typical VMMs and virtual machine software are very complex, and they require trusting that the hardware was implemented correctly and that the design isn't flawed, something that some people never trusted and others have lost some faith in, with many hardware flaws and QEMU bugs coming to light. However, I still think lightweight virtual machines ought not be ignored entirely. Improvements have been made over time that make the idea of using virtual machines for this purpose a bit more within reason. For example, Firecracker and Ignite are pretty interesting tools for running software in a more sandboxed manner. And, even with virtual machines potentially having flaws, a lot of the time it may at least require root privileges or more to actually exploit some flaws, which makes this not entirely useless from a defense-in-depth standpoint.
But you just want an idiot proof way to call some binary and provide some limits, and have it be guaranteed to actually enforce those limits as a proper security boundary. Do we have it?... No. Will it be simple, something that can be done with some simple syscalls? Probably not. Could it be done with an unfortunately necessarily complicated piece of code? Probably yeah. I'd love to see a tool that can give you basically seamless sandboxing with simple options and backends for different sandboxing approaches like using VMMs or gVisor. Provide easy ways to run X11 or Wayland apps and automatically give you a sandboxed Xephyr instance or something to provide some sandboxing for Xorg apps. Will it happen? Dunno.
As for me, I'd really like a way to create a namespace where only some hosts on the network are accessible. It's possible to do today using various technologies, but it'd be a lot easier using something like the gVisor model with a usermode networking stack, I think. Maybe some day.
Last time I tried this, the overlayfs was disabled in response to a CVE. Once that's re-enabled, I do think this could be a useful tool with some better wrappers; running every application in its own firejail (assuming easy to set up) would be pretty sweet.
Agree, though I think it can be far simpler. The UI that they recommend is kind of janky, and 50% of users won't want to wade through configuration files. For example, I could imagine a UI that shows you all of your .desktop files and lets you drag anything you like into a "jailed" zone, then gives you basic options. Right-click any of the jailed icons for further configuration.
Isn't this the SUID binary that trusted the USER env variable for instant privilege escalation? As the kind of person who likes replacing sudo with doas, this is not something I'd install.
I’ve been using this regularly on Arch Linux and found out the hard way that the profile included for Firefox disabled hardware acceleration. To get it re-enabled I ended up creating a custom profile that extended the main Firefox profile. Overall I’m impressed with the level of granularity. I haven’t bothered to look into why that profile had hardware acceleration disabled. I guess it is a way to protect against crypto-mining exploits that use your GPU.
I played with this for an evening and in the end decided that if I needed this level of security (assuming browser as the target) I was better off running a ChromeOS Flex VM in qemu.
It works very well, boots quickly and is definitely well isolated. (Probably maybe)
If you want something different, but aren't as concerned about performance you can use User-Mode Linux to do this, which I used to use to isolate package building as well as working around bugs in BtrFS. I have a simple script here [0] (though it relies on some setup here [1]). Although I haven't looked back at it for a bit (2016).
Yes, buuut the way to do it without root is user namespaces, which have a history of causing their own security problems because the kernel historically assumed that the root user wasn't a threat and only root could access certain interfaces that just got exposed to any user who wanted them. It's getting better with time, thankfully.
bubblewrap does this and has been used by flatpak for years. firejail has a lot more features though (which is questionable for a security-critical application, sure), like setting up separate iptables rules, which you would have to do manually with bubblewrap.
The sandbox itself is a very small process. The setup is fast, typically several milliseconds. After an application is started, the sandbox process goes to sleep and doesn’t consume any resources. All of the security features invoked are implemented inside the kernel, and run at kernel speed with minimal overhead.
TL;DR: Firejail has much more comprehensive features than Flatpak (Bubblewrap). Firejail also has more comprehensive network support, support for AppArmor and SELinux, and easier seccomp filtering.
Compared to Snap (which uses AppArmor), Firejail is not only compatible with AppArmor but again goes above and beyond with a lot of additional features.
It isn't just Linux namespaces: you have mount namespaces, cgroups, seccomp filters, etc.
From what I remember it certainly isn't as simple as you present it, and additional features tend to involve additional attack surface.
Another problem in the past was that Firejail, because of its approach to capabilities and namespaces, runs as a setuid binary, which made it a route to gaining root privileges on the host.
Snaps, as I recall, had defined mount namespaces that didn't need this privilege escalation. Their AppArmor profiles are well defined and tunable via the connections mechanism, although I seem to recall that having some CVEs too.
In terms of trust, though, I do trust snaps a lot more than firejail, even if the latter is more tunable. I see that tunability as a downside relative to snaps, which already come confined and tuned.
There is also systemd itself which can be used in a similar way, and also has quite a nice tool for checking the exposure of a given sandbox.
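A small sketch of what that can look like, assuming the exposure tool meant here is `systemd-analyze security` and using a handful of real sandboxing properties passed to `systemd-run`. The unit name and the choice of properties are illustrative only, and running this against the system manager needs root:

```python
import shlex

# A few sandboxing directives from systemd.exec(5); the selection is illustrative.
SANDBOX_PROPERTIES = {
    "ProtectSystem": "strict",   # most of the filesystem becomes read-only
    "ProtectHome": "yes",        # home directories become inaccessible
    "PrivateTmp": "yes",         # private /tmp and /var/tmp
    "PrivateDevices": "yes",     # minimal /dev
    "NoNewPrivileges": "yes",    # setuid binaries cannot grant more privileges
}


def systemd_run_argv(command, unit="sandbox-demo"):
    """Build a systemd-run argv that starts `command` as a transient, sandboxed unit."""
    argv = ["systemd-run", f"--unit={unit}", "--wait", "--pipe"]
    for key, value in SANDBOX_PROPERTIES.items():
        argv += ["-p", f"{key}={value}"]
    return argv + list(command)


if __name__ == "__main__":
    print(shlex.join(systemd_run_argv(["ls", "/home"])))
    # While the unit is running, its exposure can be inspected with:
    print(shlex.join(["systemd-analyze", "security", "sandbox-demo.service"]))
```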
starting with the Ubuntu 20.04 package base, the Chromium package is indeed empty and acting, without your consent, as a backdoor by connecting your computer to the Ubuntu Store. Applications in this store cannot be patched, or pinned. You can’t audit them, hold them, modify them or even point snap to a different store. You’ve as much empowerment with this as if you were using proprietary software, i.e. none. This is in effect similar to a commercial proprietary solution, but with two major differences: It runs as root, and it installs itself without asking you.
The snap daemon runs as root, is a resources hog, and really only works well in Ubuntu. Oh and snaps in general suck sooo much, but that's of course just an opinion.
Urgh... that list of security features gives me a headache though. That's the problem with all these systems: Linux keeps growing more of them. Some of them overlap, some don't. "Best practice" tends to be "use all of them", but what's "all"?
We've got a real problem that the scope of things which have to be controlled keeps growing, and the ways to do that keep increasing as well.
And perhaps worse: there's precious little documentation of the implications of turning some of these features off. But that's far and away the most common thing you need to do because usually you have something that's broken, unless you disable a "security" feature.
[1] https://github.com/containers/bubblewrap