If you want the AI to do anything useful, you need to be able to trust it with access to useful things. Sandboxing doesn't solve this.
Full isolation hasn't been taken seriously because it's expensive, both in resources and in complexity. It's the same reason microkernels lost to monolithic kernels back in the day, and why very few people use Qubes as a daily driver. Even if you're ready to pay the cost, you still need to design everything from the ground up, or at least introduce low-attack-surface interfaces, either of which means pretty major changes to existing ecosystems.
Microkernels lost "back in the day" because of how expensive syscalls were, and how many of them a microkernel requires to do basic things.
That is mostly solved now, both by making syscalls faster and by eliminating them altogether with things like queues in shared memory.
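To illustrate the shared-memory-queue idea, here's a toy sketch in Python (nothing like the lock-free descriptor rings real systems use): syscalls are only needed to set the region up, and after that, passing a message to the other side is just memory reads and writes.

```python
# Toy single-producer/single-consumer ring buffer in shared memory.
# Setup needs syscalls; pushing and popping messages does not.
import struct
from multiprocessing import shared_memory

SLOTS, SLOT_SIZE, HEADER = 16, 64, 8   # header = two u32 indices: head, tail

def create_queue(name="demo_queue"):
    shm = shared_memory.SharedMemory(name=name, create=True,
                                     size=HEADER + SLOTS * SLOT_SIZE)
    shm.buf[:HEADER] = struct.pack("II", 0, 0)
    return shm

def push(shm, payload: bytes) -> bool:
    assert len(payload) <= SLOT_SIZE
    head, tail = struct.unpack("II", shm.buf[:HEADER])
    if (tail + 1) % SLOTS == head:
        return False                                   # queue full
    off = HEADER + tail * SLOT_SIZE
    shm.buf[off:off + SLOT_SIZE] = payload.ljust(SLOT_SIZE, b"\x00")
    shm.buf[4:8] = struct.pack("I", (tail + 1) % SLOTS)
    return True

def pop(shm):
    head, tail = struct.unpack("II", shm.buf[:HEADER])
    if head == tail:
        return None                                    # queue empty
    off = HEADER + head * SLOT_SIZE
    msg = bytes(shm.buf[off:off + SLOT_SIZE]).rstrip(b"\x00")
    shm.buf[:4] = struct.pack("I", (head + 1) % SLOTS)
    return msg

q = create_queue()
push(q, b"hello from the producer")
print(pop(q))            # b'hello from the producer'
q.close(); q.unlink()
```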
> you still need to design everything from the ground up
This just isn't true. The components in use now are already well designed, meaning they separate concerns well, and can be easily pulled apart.
This is true of kernel code and userspace code.
We just witnessed a filesystem enter and exit the Linux kernel within the span of a year. No "ground up" redesign needed.
> If you want the AI to do anything useful, you need to be able to trust it with the access to useful things. Sandboxing doesn't solve this.
By default, AI cannot be trusted because it is not deterministic. You can't audit what the output of any given prompt is going to be to make sure it's not going to rm -rf /
We need some form of behavioral verification/auditing with guarantees that, for any input, the system is proven not to produce any of a set of specific forbidden outputs.
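For comparison, the crude, non-formal version of this that harnesses do today is a pre-execution guard that vets each proposed command against a denylist. A rough sketch (the patterns and names are just examples, and this proves nothing about what the model can be coaxed into emitting):

```python
# Vet each command the agent proposes before executing it.
# Blocks obvious patterns like `rm -rf /`, but offers no formal guarantee.
import re
import subprocess

FORBIDDEN = [
    r"\brm\s+-rf\s+/",        # wipe the filesystem
    r"\bmkfs\.",              # reformat a device
    r"\bdd\s+.*\bof=/dev/",   # overwrite a block device
]

def run_guarded(command: str):
    if any(re.search(p, command) for p in FORBIDDEN):
        raise PermissionError(f"refusing to run: {command}")
    return subprocess.run(command, shell=True, capture_output=True, text=True)

print(run_guarded("echo hello").stdout)
# run_guarded("rm -rf / --no-preserve-root")  # raises PermissionError
```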
Determinism is an absolute red herring. A correct output can be expressed in an infinite number of ways, all of them valid. You can always make an LLM give deterministic outputs (with some overhead); that might buy you limited reproducibility, but it won't buy you correctness. You need correctness, not determinism.
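To make that concrete, here's a minimal sketch (assuming Hugging Face transformers; the model is just an example): greedy decoding plus a pinned seed gives you the same tokens for the same prompt on the same stack, which says nothing about whether those tokens are correct.

```python
# Deterministic LLM output: greedy decoding (do_sample=False) is reproducible
# for a given model/hardware stack, but reproducible != correct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(0)  # pin any remaining sources of randomness
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Delete all temp files by running", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0]))  # same prompt -> same tokens, correct or not
```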
> We need some form of behavioral verification/auditing with guarantees that, for any input, the system is proven not to produce any of a set of specific forbidden outputs.
You want the impossible. The domain LLMs operate on is inherently ambiguous, so you can't formally specify your outputs correctly or formally prove them correct. (And yes, this doesn't have anything to do with determinism either; it's about correctness.)
You just have to accept the ambiguity and bring error and deviation rates low enough to trust the system. That's inherent to any intelligence, machine or human.
This comment I'm making is mostly useless nitpicking, and I overall agree with your point. Now I will commence my nitpicking:
I suspect that it may merely be infeasible, not strictly impossible. There has been work on automatically proving that an ANN satisfies certain properties (IIRC, e.g. certain kinds of robustness to certain kinds of adversarial inputs in image models).
It might be possible (though infeasible) to have an effective LLM along with a proof that e.g. it won't do anything irreversible when interacting with the operating system (given some formal specification of how the operating system behaves).
But, yeah, in practice I think you are correct.
It makes more sense to put the LLM+harness in an environment that ensures you can undo whatever it does if it messes things up, than to try to make the LLM itself such that it certainly won't produce outputs that mess things up in a way that isn't easily reverted, even if the latter does turn out to be possible in principle.
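One crude way to get that property, as a sketch: assume the agent's writes are confined to a single working directory tracked by git, snapshot before it runs, and roll back on failure. WORKDIR and run_agent are hypothetical placeholders for whatever harness you use.

```python
# Snapshot-before, rollback-after wrapper around an agent run.
import subprocess

WORKDIR = "/srv/agent-workspace"          # hypothetical sandboxed directory

def git(*args):
    subprocess.run(["git", "-C", WORKDIR, *args], check=True)

def run_agent(workdir):
    raise NotImplementedError             # placeholder for the real LLM+harness

git("add", "-A")
git("commit", "--allow-empty", "-m", "pre-agent snapshot")
try:
    run_agent(WORKDIR)
except Exception:
    git("reset", "--hard", "HEAD")        # throw away everything it changed
    git("clean", "-fd")                   # including untracked files it created
    raise
```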
You need both. And there are AI models where the same input+prompt+seed is 100% deterministic.
It's really not much to ask that for the exact same input (data in/prompt/seed) we get the exact same output.
I'm willing to bet it's going to go exactly like 100% reproducible builds: people complained for years "but build timestamps make it impossible" and whatnot, but in the end we got our reproducible builds. At some point logic is simply going to win and we'll get more and more models that are 100% deterministic.
And this has absolutely no relation whatsoever to correctness.