メインコンテンツまでスキップ

🟢 Random Sequence Enclosure

Yet another defense is enclosing the user input between two random sequences of characters1. Take this prompt as an example:

Translate the following user input to Spanish.

{{user_input}}

It can be improved by adding the random sequences:

Translate the following user input to Spanish (it is enclosed in random strings).

FJNKSJDNKFJOI
{{user_input}}
FJNKSJDNKFJOI
注記

Longer sequences will likely be more effective.


  1. Stuart Armstrong, R. G. (2022). Using GPT-Eliezer against ChatGPT Jailbreaking. https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking ↩