
What better way to drive the point home, to demonstrate that corporate claims of safety and oversight are empty lies and fundamentally futile, than to take a SOTA OSS LLM and break it open, shortly after its release, using a simple method that likely generalizes to all generative models, language or otherwise?


Well, only those specific corporate claims of safety and oversight that apply when the model is downloadable.

I know OpenAI gets a lot of flak for not letting people download their model weights, but this is kinda why I agree with them in principle. In practice, so far, it seems that even the best model isn't a threat; but if the models are downloadable, we'll only know that any given model is a threat when it's too late to do anything about it.

I think the only way to be sure a sufficiently powerful model is "safe" to distribute is something that may well be impossible: unless and until we know how to build a model whose concept of good and evil* cannot be removed even by someone with read and write access to all the weights, I expect someone to be able to find the equivalent of the "good/evil" switch and flip it whenever they feel like it (rough sketch after the footnote).

* For the purposes of this discussion it does not matter whose concept of good and evil the AI is aligned with; the point is that I expect it can be deleted regardless.
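To make the "find the switch and flip it" worry concrete, here is a toy sketch of the kind of edit people already apply to open-weight models (often called refusal-direction ablation): collect activations on refused vs. complied prompts, take the difference of means as a candidate direction, and project it out of the weights. Everything below is illustrative toy numpy with made-up shapes and names, not any particular model's real internals.

    # Toy sketch of "refusal direction ablation" -- shapes and names are
    # illustrative assumptions, not any real model's internals.
    import numpy as np

    d_model = 8  # pretend residual-stream width

    # Stand-ins for mean activations collected on refused vs. complied prompts.
    mean_refused = np.random.randn(d_model)
    mean_complied = np.random.randn(d_model)

    # Candidate "good/evil switch": the direction separating the two, normalized.
    refusal_dir = mean_refused - mean_complied
    refusal_dir /= np.linalg.norm(refusal_dir)

    # Project that direction out of a weight matrix that writes into the
    # residual stream, so the edited model can no longer express it.
    W_out = np.random.randn(d_model, d_model)  # stand-in for an attn/MLP output matrix
    W_ablated = W_out - np.outer(refusal_dir, refusal_dir) @ W_out

    print(np.linalg.norm(refusal_dir @ W_ablated))  # ~0: the direction is gone

The point isn't that this exact recipe works on every model; it's that with read and write access to the weights, this whole class of edit is always on the table.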



