📄️ 🟢 Introduction
Prompt hacking is a term used to describe a type of attack that exploits the vulnerabilities of %%LLMs|LLM%%, by manipulating their inputs or prompts. Unlike traditional hacking, which typically exploits software vulnerabilities, prompt hacking relies on carefully crafting prompts to deceive the LLM into performing unintended actions.
📄️ 🟢 Prompt Injection
Prompt injection is the process of hijacking a language model's output(@branch2022evaluating)(@crothers2022machine)(@goodside2022inject)(@simon2022inject). It allows the hacker to get the model to say anything that they want.
📄️ 🟢 Prompt Leaking
Prompt leaking is a form of prompt injection in which the model is asked to
📄️ 🟢 Jailbreaking
Jailbreaking is a process that uses prompt injection to specifically bypass safety and moderation features placed on LLMs by their creators(@perez2022jailbreak)(@brundage_2022)(@wang2022jailbreak). Jailbreaking usually refers to Chatbots which have successfully been prompt injected and now are in a state where the user can ask any question they would like.
🗃️ 🟢 Defensive Measures
🗃️ 🟢 Offensive Measures