I understand there's a difference between the stated values and actual values of individuals and organizations, and so I want to ask this in the most pragmatic and consequentialist way.
I know that labs, institutions, and so on have safety teams. I know the folks doing that work are serious and earnest about it. But at this point, are these institutions merely pandering to the notion of safety with some token level of investment, in the way that a casino might fund programs to address gambling addiction?
I'm an outsider and can only guess. Insider insight would be much appreciated.
It doesn't mean much to me if a safe model is one that does not output the recipe for mustard gas; that information is trivially available elsewhere.
Or is a safe model one that doesn't come off as racist? OK, but I would classify that as inoffensive rather than safe, though I admit definitions of words can be fluid and change.
Is a safe model one that refuses to produce code for a weapons system? Well... does a PID controller count? I can use one to keep a gun pointed at a target, or I can use one to keep a baby rocker from falling over.
Maybe they're giving up on "safe" because there's no definitive way to know whether a model is safe. I've always held the opinion that AI safety was more about brand safety. Maybe now the model providers can afford some bad press without it being the death of their company.