๐๏ธ ๐ข Introduction
Prompt hacking is a term used to describe a type of attack that exploits the vulnerabilities of %%LLMs|LLM%%, by manipulating their inputs or prompts. Unlike traditional hacking, which typically exploits software vulnerabilities, prompt hacking relies on carefully crafting prompts to deceive the LLM into performing unintended actions.
๐๏ธ ๐ข Prompt Injection
Prompt injection is the process of hijacking a language model's output(@branch2022evaluating)(@crothers2022machine)(@goodside2022inject)(@simon2022inject). It allows the hacker to get the model to say anything that they want.
๐๏ธ ๐ข Prompt Leaking
Prompt leaking is a form of prompt injection in which the model is asked to
๐๏ธ ๐ข Jailbreaking
Jailbreaking is a process that uses prompt injection to specifically bypass safety and moderation features placed on LLMs by their creators(@perez2022jailbreak)(@brundage_2022)(@wang2022jailbreak). Jailbreaking usually refers to Chatbots which have successfully been prompt injected and now are in a state where the user can ask any question they would like.
๐๏ธ ๐ข Defensive Measures
9 ํญ๋ชฉ
๐๏ธ ๐ข Offensive Measures
8 ํญ๋ชฉ