Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> That would only be possible if Sydney were actually intelligent or possessing of will of some sort.

Couldn't disagree more. This is irrelevant.

Concretely, the way that LLMs are evolving to take actions is something like putting a special symbol in their output stream, like the completions "Sure I will help you to set up that calendar invite $ACTION{gcaltool invite, <payload>}" or "I won't harm you unless you harm me first. $ACTION{curl http://victim.com -D '<payload>'}".

It's irrelevant whether the system possesses intelligence or will. If the completions it's making affect external systems, they can cause harm. The level of incoherence in the completions we're currently seeing suggests that at least some external-system-mutating completions would indeed be harmful.

One frame I've found useful is to consider LLMs as simulators; they aren't intelligent, but they can simulate a given agent and generate completions for inputs in that "personality"'s context. So, simulate Shakespeare, or a helpful Chatbot personality. Or, with prompt-hijacking, a malicious hacker that's using its coding abilities to spread more copies of a malicious hacker chatbot.



Yeah, I think the reason it can be harmful is different from what people initially envision.

These systems can be dangerous because people might trust them when they shouldn't. It's not really any different from a program that just generates random text, except that the output seems intelligent, thus causing people to trust it more than a random stream of text.


I completely agree with this. I think the risk of potential harm from these programs is not around the programs themselves, but around how people react to them. It's why I am very concerned when I see people ascribing attributes to them that they simply don't have.


A lot of this discussion reminds me of the book Blindsight.

Something doesn't have to be conscious or intelligent to harm us. Simulating those things effectively can be almost indistinguishable from a conscious being trying to harm us.


I never asserted that they couldn't do harm. I asserted that they don't think, and therefore cannot intend to do harm. They have no intentions whatsoever.


What does it matter if there was intention or not as long as harm was done?


If a person causes harm, we care a lot. We make the distinction between manslaughter, first and second degree murder, as well as adding hate crimes penalties on top if the victim was chosen for a specific set of recognized reasons. ML models aren't AGI, so it's not clear how we'd apply it, but there's precedent for it mattering.


>It's irrelevant whether the system possesses intelligence or will. If the completions it's making affect external systems, they can cause harm. The level of incoherence in the completions we're currently seeing suggests that at least some external-system-mutating completions would indeed be harmful.

One frame I've found useful is to consider LLMs as simulators; they aren't intelligent, but they can simulate a given agent and generate completions for inputs in that "personality"'s context. So, simulate Shakespeare, or a helpful Chatbot personality. Or, with prompt-hijacking, a malicious hacker that's using its coding abilities to spread more copies of a malicious hacker chatbot.

This pretty much my exact perspective on things too.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: