Skip to main content

๐ŸŸข Filtering

Filtering is a common technique for preventing prompt hacking1. There are a few types of filtering, but the basic idea is to check for words and phrase in the initial prompt or the output that should be blocked. You can use a blocklist or an allowlist for this purpose2. A blocklist is a list of words and phrases that should be blocked, and an allowlist is a list of words and phrases that should be allowed.


  1. Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. โ†ฉ
  2. Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/ โ†ฉ