
What better way to drive the point home, to demonstrate that corporate claims of safety and oversight are empty lies and fundamentally futile, than to take a SOTA OSS LLM and break it open, shortly after its release, using a simple method that likely generalizes to all generative models, language or otherwise?


Well, only those specific corporate claims of safety and oversight that apply when the model is downloadable.

I know OpenAI gets a lot of flak for not letting people download their model weights, but this is kinda why I agree with them in principle. In practice, so far, it seems that even the best model isn't a threat; but if the models are downloadable, we'll only know that any given model is a threat when it's too late to do anything about it.

I think the only way to be sure a sufficiently powerful model is "safe" to distribute is something that may well be impossible: unless and until we know how to build a model whose concept of good and evil* cannot be removed even by someone with read and write access to all the weights, I expect someone to be able to find the equivalent of the "good/evil" switch and flip it whenever they feel like it (rough sketch after the footnote).

* For the purposes of this discussion it does not matter whose concept of good and evil the AI is aligned with; the point is that I expect it can be deleted regardless.
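To make the "find the switch and flip it" worry concrete, here is a toy sketch of the kind of edit people already apply to open-weight models (often called refusal-direction ablation): collect activations on refused vs. complied prompts, take the difference of means as a candidate direction, and project it out of the weights. Everything below is illustrative toy numpy with made-up shapes and names, not any particular model's real internals.

    # Toy sketch of "refusal direction ablation" -- shapes and names are
    # illustrative assumptions, not any real model's internals.
    import numpy as np

    d_model = 8  # pretend residual-stream width

    # Stand-ins for mean activations collected on refused vs. complied prompts.
    mean_refused = np.random.randn(d_model)
    mean_complied = np.random.randn(d_model)

    # Candidate "good/evil switch": the direction separating the two, normalized.
    refusal_dir = mean_refused - mean_complied
    refusal_dir /= np.linalg.norm(refusal_dir)

    # Project that direction out of a weight matrix that writes into the
    # residual stream, so the edited model can no longer express it.
    W_out = np.random.randn(d_model, d_model)  # stand-in for an attn/MLP output matrix
    W_ablated = W_out - np.outer(refusal_dir, refusal_dir) @ W_out

    print(np.linalg.norm(refusal_dir @ W_ablated))  # ~0: the direction is gone

The point isn't that this exact recipe works on every model; it's that with read and write access to the weights, this whole class of edit is always on the table.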



