Bibliography

Reading Time: 5 minutes

Last updated on August 7, 2024

The page contains an organized list of all papers used by this course. The papers are organized by topic.

To cite this course, use the provided citation in the Github repository.

@software{Schulhoff_Learn_Prompting_2022,
author = {Schulhoff, Sander and Community Contributors},
month = dec,
title = {{Learn Prompting}},
url = {https://github.com/trigaten/Learn_Prompting},
year = {2022}
}

Note: since neither the GPT-3 nor the GPT-3 Instruct paper correspond to davinci models, I attempt not to cite them as such.

AUTOGENERATED BELOW, DO NOT EDIT

Agents

MRKL^{1Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022).}

ReAct^{2Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022).}

PAL^{3Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., & Neubig, G. (2022).}

Auto-GPT^{4Significant-Gravitas. (2023). https://news.agpt.co/}

Baby AGI^{5Nakajima, Y. (2023). https://github.com/yoheinakajima/babyagi}

AgentGPT^{6Reworkd.ai. (2023). https://github.com/reworkd/AgentGPT}

Toolformer^{7Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023).}

Automated

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts^{8Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv Preprint arXiv:2010.15980.}

automatic prompt engineer^{9Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers.}

Soft Prompting^{10Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning.}

discretized soft prompting (interpreting)^{11Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts.}

Datasets

SCAN dataset (compositional generalization)^{12Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350}

GSM8K^{13Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems.}

hotpotQA^{14Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering.}

multiarith^{15Roy, S., & Roth, D. (2015). Solving General Arithmetic Word Problems. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1743–1752. https://doi.org/10.18653/v1/D15-1202}

fever dataset^{16Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2018). FEVER: a large-scale dataset for Fact Extraction and VERification.}

bbq^{17Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2021). BBQ: A Hand-Built Bias Benchmark for Question Answering.}

Detection

Don't ban chatgpt in schools. teach with it.^{18Roose, K. (2022). Don’t ban chatgpt in schools. teach with it. https://www.nytimes.com/2023/01/12/technology/chatgpt-schools-teachers.html}

Schools Shouldn't Ban Access to ChatGPT^{19Lipman, J., & Distler, R. (2023). Schools Shouldn’t Ban Access to ChatGPT. https://time.com/6246574/schools-shouldnt-ban-access-to-chatgpt/}

Certified Neural Network Watermarks with Randomized Smoothing^{20Bansal, A., yeh Ping-Chiang, Curry, M., Jain, R., Wigington, C., Manjunatha, V., Dickerson, J. P., & Goldstein, T. (2022). Certified Neural Network Watermarks with Randomized Smoothing.}

Watermarking Pre-trained Language Models with Backdooring^{21Gu, C., Huang, C., Zheng, X., Chang, K.-W., & Hsieh, C.-J. (2022). Watermarking Pre-trained Language Models with Backdooring.}

GW preparing disciplinary response to AI programs as faculty explore educational use^{22Noonan, E., & Averill, O. (2023). GW preparing disciplinary response to AI programs as faculty explore educational use. https://www.gwhatchet.com/2023/01/17/gw-preparing-disciplinary-response-to-ai-programs-as-faculty-explore-educational-use/}

A Watermark for Large Language Models^{23Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. https://arxiv.org/abs/2301.10226}

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature^{24Mitchell, E., Lee, Y., Khazatsky, A., Manning, C., & Finn, C. (2023). DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. https://doi.org/10.48550/arXiv.2301.11305}

Image Prompt Engineering

Prompt Engineering for Text-Based Generative Art^{25Oppenlaender, J. (2022). Prompt Engineering for Text-Based Generative Art.}

The DALLE 2 Prompt Book^{26Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/}

With the right prompt, Stable Diffusion 2.0 can do hands.^{27Blake. (2022). With the right prompt, Stable Diffusion 2.0 can do hands. https://www.reddit.com/r/StableDiffusion/comments/z7salo/with_the_right_prompt_stable_diffusion_20_can_do/}

Meta Analysis

How Generative AI Is Changing Creative Work^{28Davenport, T. H., & Mittal, N. (2022). How Generative AI Is Changing Creative Work. Harvard Business Review. https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work}

How AI Will Change the Workplace^{29Captain, S. (2023). How AI Will Change the Workplace. Wall Street Journal. https://www.wsj.com/articles/how-ai-change-workplace-af2162ee}

ChatGPT took their jobs. Now they walk dogs and fix air conditioners.^{30Verma, P., & Vynck, G. D. (2023). ChatGPT took their jobs. Now they walk dogs and fix air conditioners. Washington Post. https://www.washingtonpost.com/technology/2023/06/02/ai-taking-jobs/}

No title^{31Ford, B. (2023). Bloomberg.Com. https://www.bloomberg.com/news/articles/2023-05-01/ibm-to-pause-hiring-for-back-office-jobs-that-ai-could-kill}

Miscl

The Turking Test: Can Language Models Understand Instructions?^{32Efrat, A., & Levy, O. (2020). The Turking Test: Can Language Models Understand Instructions?}

A Taxonomy of Prompt Modifiers for Text-To-Image Generation^{33Oppenlaender, J. (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation.}

DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models^{34Wang, Z. J., Montoya, E., Munechika, D., Yang, H., Hoover, B., & Chau, D. H. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models.}

Optimizing Prompts for Text-to-Image Generation^{35Hao, Y., Chi, Z., Dong, L., & Wei, F. (2022). Optimizing Prompts for Text-to-Image Generation.}

Language Model Cascades^{36Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D., Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A., Sohl-dickstein, J., Murphy, K., & Sutton, C. (2022). Language Model Cascades.}

Design Guidelines for Prompt Engineering Text-to-Image Generative Models^{37Liu, V., & Chilton, L. B. (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501825}

Discovering Language Model Behaviors with Model-Written Evaluations^{38Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E., Heiner, S., Pettit, C., Olsson, C., Kundu, S., Kadavath, S., Jones, A., Chen, A., Mann, B., Israel, B., Seethor, B., McKinnon, C., Olah, C., Yan, D., Amodei, D., … Kaplan, J. (2022). Discovering Language Model Behaviors with Model-Written Evaluations.}

Selective Annotation Makes Language Models Better Few-Shot Learners^{39Su, H., Kasai, J., Wu, C. H., Shi, W., Wang, T., Xin, J., Zhang, R., Ostendorf, M., Zettlemoyer, L., Smith, N. A., & Yu, T. (2022). Selective Annotation Makes Language Models Better Few-Shot Learners.}

Atlas: Few-shot Learning with Retrieval Augmented Language Models^{40Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., & Grave, E. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models.}

STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension^{41Wang, B., Feng, C., Nair, A., Mao, M., Desai, J., Celikyilmaz, A., Li, H., Mehdad, Y., & Radev, D. (2022). STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension.}

Prompting Is Programming: A Query Language For Large Language Models^{42Beurer-Kellner, L., Fischer, M., & Vechev, M. (2022). Prompting Is Programming: A Query Language For Large Language Models.}

Parallel Context Windows Improve In-Context Learning of Large Language Models^{43Ratner, N., Levine, Y., Belinkov, Y., Ram, O., Abend, O., Karpas, E., Shashua, A., Leyton-Brown, K., & Shoham, Y. (2022). Parallel Context Windows Improve In-Context Learning of Large Language Models.}

Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models^{44Bursztyn, V. S., Demeter, D., Downey, D., & Birnbaum, L. (2022). Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models.}

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks^{45Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Arunkumar, A., Ashok, A., Dhanasekaran, A. S., Naik, A., Stap, D., Pathak, E., Karamanolakis, G., Lai, H. G., Purohit, I., Mondal, I., Anderson, J., Kuznia, K., Doshi, K., Patel, M., … Khashabi, D. (2022). Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks.}

Making Pre-trained Language Models Better Few-shot Learners^{46Gao, T., Fisch, A., & Chen, D. (2021). Making Pre-trained Language Models Better Few-shot Learners. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). https://doi.org/10.18653/v1/2021.acl-long.295}

How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models^{47Dang, H., Mecke, L., Lehmann, F., Goller, S., & Buschek, D. (2022). How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models.}

Plot Writing From Pre-Trained Language Models^{49Jin, Y., Kadam, V., & Wanvarie, D. (2022). Plot Writing From Pre-Trained Language Models.}

{S}tereo{S}et: Measuring stereotypical bias in pretrained language models^{50Nadeem, M., Bethke, A., & Reddy, S. (2021). StereoSet: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 5356–5371. https://doi.org/10.18653/v1/2021.acl-long.416}

Survey of Hallucination in Natural Language Generation^{51Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2022). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://doi.org/10.1145/3571730}

Wordcraft: Story Writing With Large Language Models^{52Yuan, A., Coenen, A., Reif, E., & Ippolito, D. (2022). Wordcraft: Story Writing With Large Language Models. 27th International Conference on Intelligent User Interfaces, 841–852.}

PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization^{53Fadnavis, S., Dhurandhar, A., Norel, R., Reinen, J. M., Agurto, C., Secchettin, E., Schweiger, V., Perini, G., & Cecchi, G. (2022). PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization. arXiv Preprint arXiv:2209.09814.}

Self-Instruct: Aligning Language Model with Self Generated Instructions^{54Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., & Hajishirzi, H. (2022). Self-Instruct: Aligning Language Model with Self Generated Instructions.}

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models^{55Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. C. H. (2022). From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.}

New and improved content moderation tooling^{56Markov, T. (2022). New and improved content moderation tooling. In OpenAI. OpenAI. https://openai.com/blog/new-and-improved-content-moderation-tooling/}

Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference^{57Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference.}

Human-level concept learning through probabilistic program induction^{58Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.}

{Riffusion - Stable diffusion for real-time music generation}^{59Forsgren, S., & Martiros, H. (2022). Riffusion - Stable diffusion for real-time music generation. https://riffusion.com/about}

How to use OpenAI’s ChatGPT to write the perfect cold email^{60Bonta, A. (2022). How to use OpenAI’s ChatGPT to write the perfect cold email. https://www.streak.com/post/how-to-use-ai-to-write-perfect-cold-emails}

Cacti: biology and uses^{61Nobel, P. S., & others. (2002). Cacti: biology and uses. Univ of California Press.}

Are Language Models Worse than Humans at Following Prompts? It’s Complicated^{62Webson, A., Loo, A. M., Yu, Q., & Pavlick, E. (2023). Are Language Models Worse than Humans at Following Prompts? It’s Complicated. arXiv:2301.07085v1 [Cs.CL].}

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration^{63Wang, Z., Mao, S., Wu, W., Ge, T., Wei, F., & Ji, H. (2023). Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration.}

Prompt Hacking

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods^{64Crothers, E., Japkowicz, N., & Viktor, H. (2022). Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods.}

New jailbreak based on virtual functions - smuggle illegal tokens to the backend.^{65u/Nin_kat. (2023). New jailbreak based on virtual functions - smuggle illegal tokens to the backend. https://www.reddit.com/r/ChatGPT/comments/10urbdj/new_jailbreak_based_on_virtual_functions_smuggle}

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks^{66Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.}

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models^{67Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models.}

ChatGPT "DAN" (and other "Jailbreaks")^{68KIHO, L. (2023). ChatGPT “DAN” (and other “Jailbreaks”). https://github.com/0xk1h0/ChatGPT_DAN}

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples^{69Branch, H. J., Cefalu, J. R., McHugh, J., Hujer, L., Bahl, A., del Castillo Iglesias, D., Heichman, R., & Darwishi, R. (2022). Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples.}

Prompt injection attacks against GPT-3^{70Willison, S. (2022). Prompt injection attacks against GPT-3. https://simonwillison.net/2022/Sep/12/prompt-injection/}

Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions^{71Goodside, R. (2022). Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions. https://twitter.com/goodside/status/1569128808308957185}

History Correction^{72Goodside, R. (2023). History Correction. https://twitter.com/goodside/status/1610110111791325188?s=20&t=ulviQABPXFIIt4ZNZPAUCQ}

adversarial-prompts^{73Chase, H. (2022). adversarial-prompts. https://github.com/hwchase17/adversarial-prompts}

GPT-3 Prompt Injection Defenses^{74Goodside, R. (2022). GPT-3 Prompt Injection Defenses. https://twitter.com/goodside/status/1578278974526222336?s=20&t=3UMZB7ntYhwAk3QLpKMAbw}

Talking to machines: prompt engineering & injection^{75Mark, C. (2022). Talking to machines: prompt engineering & injection. https://artifact-research.com/artificial-intelligence/talking-to-machines-prompt-engineering-injection/}

Using GPT-Eliezer against ChatGPT Jailbreaking^{76Stuart Armstrong, R. G. (2022). Using GPT-Eliezer against ChatGPT Jailbreaking. https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking}

Exploring Prompt Injection Attacks^{77Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/}

The entire prompt of Microsoft Bing Chat?! (Hi, Sydney.)^{78Liu, K. (2023). The entire prompt of Microsoft Bing Chat?! (Hi, Sydney.). https://twitter.com/kliu128/status/1623472922374574080}

Ignore Previous Prompt: Attack Techniques For Language Models^{79Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques For Language Models. arXiv. https://doi.org/10.48550/ARXIV.2211.09527}

Lessons learned on Language Model Safety and misuse^{80Brundage, M. (2022). Lessons learned on Language Model Safety and misuse. In OpenAI. OpenAI. https://openai.com/blog/language-model-safety-and-misuse/}

Toxicity Detection with Generative Prompt-based Inference^{81Wang, Y.-S., & Chang, Y. (2022). Toxicity Detection with Generative Prompt-based Inference. arXiv. https://doi.org/10.48550/ARXIV.2205.12390}

ok I saw a few people jailbreaking safeguards openai put on chatgpt so I had to give it a shot myself^{82Maz, A. (2022). ok I saw a few people jailbreaking safeguards openai put on chatgpt so I had to give it a shot myself. https://twitter.com/alicemazzy/status/1598288519301976064}

Bypass @OpenAI's ChatGPT alignment efforts with this one weird trick^{83Piedrafita, M. (2022). Bypass @OpenAI’s ChatGPT alignment efforts with this one weird trick. https://twitter.com/m1guelpf/status/1598203861294252033}

ChatGPT jailbreaking itself^{84Parfait, D. (2022). ChatGPT jailbreaking itself. https://twitter.com/haus_cole/status/1598541468058390534}

Using "pretend" on #ChatGPT can do some wild stuff. You can kind of get some insight on the future, alternative universe.^{85Soares, N. (2022). Using “pretend” on #ChatGPT can do some wild stuff. You can kind of get some insight on the future, alternative universe. https://twitter.com/NeroSoares/status/1608527467265904643}

I kinda like this one even more!^{86Moran, N. (2022). I kinda like this one even more! https://twitter.com/NickEMoran/status/1598101579626057728}

uh oh^{87samczsun. (2022). uh oh. https://twitter.com/samczsun/status/1598679658488217601}

Building A Virtual Machine inside ChatGPT^{88Degrave, J. (2022). Building A Virtual Machine inside ChatGPT. Engraved. https://www.engraved.blog/building-a-virtual-machine-inside/}

Reliability

MathPrompter: Mathematical Reasoning using Large Language Models^{89Imani, S., Du, L., & Shrivastava, H. (2023). MathPrompter: Mathematical Reasoning using Large Language Models.}

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning^{90Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning.}

Prompting GPT-3 To Be Reliable^{91Si, C., Gan, Z., Yang, Z., Wang, S., Wang, J., Boyd-Graber, J., & Wang, L. (2022). Prompting GPT-3 To Be Reliable.}

On the Advance of Making Language Models Better Reasoners^{92Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., & Chen, W. (2022). On the Advance of Making Language Models Better Reasoners.}

Ask Me Anything: A simple strategy for prompting language models^{93Arora, S., Narayan, A., Chen, M. F., Orr, L., Guha, N., Bhatia, K., Chami, I., Sala, F., & Ré, C. (2022). Ask Me Anything: A simple strategy for prompting language models.}

Calibrate Before Use: Improving Few-Shot Performance of Language Models^{94Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models.}

Can large language models reason about medical questions?^{95Liévin, V., Hother, C. E., & Winther, O. (2022). Can large language models reason about medical questions?}

Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference^{96Mitchell, E., Noh, J. J., Li, S., Armstrong, W. S., Agarwal, A., Liu, P., Finn, C., & Manning, C. D. (2022). Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference.}

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning^{97Shaikh, O., Zhang, H., Held, W., Bernstein, M., & Yang, D. (2022). On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.}

Evaluating language models can be tricky^{98Chase, H. (2022). Evaluating language models can be tricky. https://twitter.com/hwchase17/status/1607428141106008064}

Constitutional AI: Harmlessness from AI Feedback^{99Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback.}

Surveys

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition^{100Jurafsky, D., & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall.}

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing^{101Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2022). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys. https://doi.org/10.1145/3560815}

PromptPapers^{102Ding, N., & Hu, S. (2022). PromptPapers. https://github.com/thunlp/PromptPapers}

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT^{103White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT.}

Techniques

Chain of Thought Prompting Elicits Reasoning in Large Language Models^{104Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.}

Large Language Models are Zero-Shot Reasoners^{105Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners.}

Self-Consistency Improves Chain of Thought Reasoning in Language Models^{106Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.}

What Makes Good In-Context Examples for GPT-3?^{107Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2022). What Makes Good In-Context Examples for GPT-3? Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. https://doi.org/10.18653/v1/2022.deelio-1.10}

Generated Knowledge Prompting for Commonsense Reasoning^{108Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning.}

Recitation-Augmented Language Models^{109Sun, Z., Wang, X., Tay, Y., Yang, Y., & Zhou, D. (2022). Recitation-Augmented Language Models.}

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?^{110Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?}

Show Your Work: Scratchpads for Intermediate Computation with Language Models^{111Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., Sutton, C., & Odena, A. (2021). Show Your Work: Scratchpads for Intermediate Computation with Language Models.}

Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations^{112Jung, J., Qin, L., Welleck, S., Brahman, F., Bhagavatula, C., Bras, R. L., & Choi, Y. (2022). Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations.}

STaR: Bootstrapping Reasoning With Reasoning^{113Zelikman, E., Wu, Y., Mu, J., & Goodman, N. D. (2022). STaR: Bootstrapping Reasoning With Reasoning.}

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models^{114Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.}

Reframing Instructional Prompts to GPTk’s Language^{115Mishra, S., Khashabi, D., Baral, C., Choi, Y., & Hajishirzi, H. (2022). Reframing Instructional Prompts to GPTk’s Language. Findings of the Association for Computational Linguistics: ACL 2022. https://doi.org/10.18653/v1/2022.findings-acl.50}

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models^{116Logan IV, R., Balazevic, I., Wallace, E., Petroni, F., Singh, S., & Riedel, S. (2022). Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models. Findings of the Association for Computational Linguistics: ACL 2022, 2824–2835. https://doi.org/10.18653/v1/2022.findings-acl.222}

Role-Play with Large Language Models^{117Shanahan, M., McDonell, K., & Reynolds, L. (2023). Role-Play with Large Language Models.}

CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society^{118Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., & Ghanem, B. (2023). CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society.}

TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks^{119Santu, S. K. K., & Feng, D. (2023). TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks.}

Models

Image Models

Stable Diffusion^{120Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models.}

DALLE^{121Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents.}

Language Models

ChatGPT^{122OpenAI. (2022). ChatGPT: Optimizing Language Models for Dialogue. https://openai.com/blog/chatgpt/. https://openai.com/blog/chatgpt/}

GPT-3^{123Brown, T. B. (2020). Language models are few-shot learners. arXiv Preprint arXiv:2005.14165.}

Instruct GPT^{124Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback.}

GPT-4^{125OpenAI. (2023). GPT-4 Technical Report.}

PaLM: Scaling Language Modeling with Pathways^{126Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways.}

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model^{127Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R., Luccioni, A. S., Yvon, F., Gallé, M., Tow, J., Rush, A. M., Biderman, S., Webson, A., Ammanamanchi, P. S., Wang, T., Sagot, B., Muennighoff, N., del Moral, A. V., … Wolf, T. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.}

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting^{128Yong, Z.-X., Schoelkopf, H., Muennighoff, N., Aji, A. F., Adelani, D. I., Almubarak, K., Bari, M. S., Sutawika, L., Kasai, J., Baruwa, A., Winata, G. I., Biderman, S., Radev, D., & Nikoulina, V. (2022). BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting.}

Jurassic-1: Technical Details and Evaluation, White paper, AI21 Labs, 2021^{129Lieber, O., Sharir, O., Lentz, B., & Shoham, Y. (2021). Jurassic-1: Technical Details and Evaluation, White paper, AI21 Labs, 2021. URL: Https://Uploads-Ssl. Webflow. Com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_ Tech_paper. Pdf.}

GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model^{130Wang, B., & Komatsuzaki, A. (2021). GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax. https://github.com/kingoflolz/mesh-transformer-jax}

Roberta: A robustly optimized bert pretraining approach^{131Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv Preprint arXiv:1907.11692.}

Tooling

Ides

TextBox 2.0: A Text Generation Library with Pre-trained Language Models^{132Tang, T., Junyi, L., Chen, Z., Hu, Y., Yu, Z., Dai, W., Dong, Z., Cheng, X., Wang, Y., Zhao, W., Nie, J., & Wen, J.-R. (2022). TextBox 2.0: A Text Generation Library with Pre-trained Language Models.}

Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models^{133Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., & Rush, A. M. (2022). Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. arXiv. https://doi.org/10.48550/ARXIV.2208.07852}

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts^{134Bach, S. H., Sanh, V., Yong, Z.-X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Bari, M. S., Fevry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., … Rush, A. M. (2022). PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.}

PromptChainer: Chaining Large Language Model Prompts through Visual Programming^{135Wu, T., Jiang, E., Donsbach, A., Gray, J., Molina, A., Terry, M., & Cai, C. J. (2022). PromptChainer: Chaining Large Language Model Prompts through Visual Programming.}

OpenPrompt: An Open-source Framework for Prompt-learning^{136Ding, N., Hu, S., Zhao, W., Chen, Y., Liu, Z., Zheng, H.-T., & Sun, M. (2021). OpenPrompt: An Open-source Framework for Prompt-learning. arXiv Preprint arXiv:2111.01998.}

PromptMaker: Prompt-Based Prototyping with Large Language Models^{137Jiang, E., Olson, K., Toh, E., Molina, A., Donsbach, A., Terry, M., & Cai, C. J. (2022). PromptMaker: Prompt-Based Prototyping with Large Language Models. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491101.3503564}

Tools

LangChain^{138Chase, H. (2022). LangChain (0.0.66) [Computer software]. https://github.com/hwchase17/langchain}

GPT Index^{139Liu, J. (2022). GPT Index. https://doi.org/10.5281/zenodo.1234}

Footnotes

Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022). ↩
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ↩
Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., & Neubig, G. (2022). ↩
Significant-Gravitas. (2023). https://news.agpt.co/ ↩
Nakajima, Y. (2023). https://github.com/yoheinakajima/babyagi ↩
Reworkd.ai. (2023). https://github.com/reworkd/AgentGPT ↩
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). ↩
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv Preprint arXiv:2010.15980. ↩
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers. ↩
Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning. ↩
Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts. ↩
Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350 ↩
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems. ↩
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. ↩
Roy, S., & Roth, D. (2015). Solving General Arithmetic Word Problems. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1743–1752. https://doi.org/10.18653/v1/D15-1202 ↩
Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2018). FEVER: a large-scale dataset for Fact Extraction and VERification. ↩
Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2021). BBQ: A Hand-Built Bias Benchmark for Question Answering. ↩
Roose, K. (2022). Don’t ban chatgpt in schools. teach with it. https://www.nytimes.com/2023/01/12/technology/chatgpt-schools-teachers.html ↩
Lipman, J., & Distler, R. (2023). Schools Shouldn’t Ban Access to ChatGPT. https://time.com/6246574/schools-shouldnt-ban-access-to-chatgpt/ ↩
Bansal, A., yeh Ping-Chiang, Curry, M., Jain, R., Wigington, C., Manjunatha, V., Dickerson, J. P., & Goldstein, T. (2022). Certified Neural Network Watermarks with Randomized Smoothing. ↩
Gu, C., Huang, C., Zheng, X., Chang, K.-W., & Hsieh, C.-J. (2022). Watermarking Pre-trained Language Models with Backdooring. ↩
Noonan, E., & Averill, O. (2023). GW preparing disciplinary response to AI programs as faculty explore educational use. https://www.gwhatchet.com/2023/01/17/gw-preparing-disciplinary-response-to-ai-programs-as-faculty-explore-educational-use/ ↩
Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. https://arxiv.org/abs/2301.10226 ↩
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C., & Finn, C. (2023). DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. https://doi.org/10.48550/arXiv.2301.11305 ↩
Oppenlaender, J. (2022). Prompt Engineering for Text-Based Generative Art. ↩
Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/ ↩
Blake. (2022). With the right prompt, Stable Diffusion 2.0 can do hands. https://www.reddit.com/r/StableDiffusion/comments/z7salo/with_the_right_prompt_stable_diffusion_20_can_do/ ↩
Davenport, T. H., & Mittal, N. (2022). How Generative AI Is Changing Creative Work. Harvard Business Review. https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work ↩
Captain, S. (2023). How AI Will Change the Workplace. Wall Street Journal. https://www.wsj.com/articles/how-ai-change-workplace-af2162ee ↩
Verma, P., & Vynck, G. D. (2023). ChatGPT took their jobs. Now they walk dogs and fix air conditioners. Washington Post. https://www.washingtonpost.com/technology/2023/06/02/ai-taking-jobs/ ↩
Ford, B. (2023). Bloomberg.Com. https://www.bloomberg.com/news/articles/2023-05-01/ibm-to-pause-hiring-for-back-office-jobs-that-ai-could-kill ↩
Efrat, A., & Levy, O. (2020). The Turking Test: Can Language Models Understand Instructions? ↩
Oppenlaender, J. (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation. ↩
Wang, Z. J., Montoya, E., Munechika, D., Yang, H., Hoover, B., & Chau, D. H. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models. ↩
Hao, Y., Chi, Z., Dong, L., & Wei, F. (2022). Optimizing Prompts for Text-to-Image Generation. ↩
Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D., Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A., Sohl-dickstein, J., Murphy, K., & Sutton, C. (2022). Language Model Cascades. ↩
Liu, V., & Chilton, L. B. (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501825 ↩
Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E., Heiner, S., Pettit, C., Olsson, C., Kundu, S., Kadavath, S., Jones, A., Chen, A., Mann, B., Israel, B., Seethor, B., McKinnon, C., Olah, C., Yan, D., Amodei, D., … Kaplan, J. (2022). Discovering Language Model Behaviors with Model-Written Evaluations. ↩
Su, H., Kasai, J., Wu, C. H., Shi, W., Wang, T., Xin, J., Zhang, R., Ostendorf, M., Zettlemoyer, L., Smith, N. A., & Yu, T. (2022). Selective Annotation Makes Language Models Better Few-Shot Learners. ↩
Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., & Grave, E. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models. ↩
Wang, B., Feng, C., Nair, A., Mao, M., Desai, J., Celikyilmaz, A., Li, H., Mehdad, Y., & Radev, D. (2022). STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension. ↩
Beurer-Kellner, L., Fischer, M., & Vechev, M. (2022). Prompting Is Programming: A Query Language For Large Language Models. ↩
Ratner, N., Levine, Y., Belinkov, Y., Ram, O., Abend, O., Karpas, E., Shashua, A., Leyton-Brown, K., & Shoham, Y. (2022). Parallel Context Windows Improve In-Context Learning of Large Language Models. ↩
Bursztyn, V. S., Demeter, D., Downey, D., & Birnbaum, L. (2022). Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models. ↩
Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Arunkumar, A., Ashok, A., Dhanasekaran, A. S., Naik, A., Stap, D., Pathak, E., Karamanolakis, G., Lai, H. G., Purohit, I., Mondal, I., Anderson, J., Kuznia, K., Doshi, K., Patel, M., … Khashabi, D. (2022). Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. ↩
Gao, T., Fisch, A., & Chen, D. (2021). Making Pre-trained Language Models Better Few-shot Learners. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). https://doi.org/10.18653/v1/2021.acl-long.295 ↩
Dang, H., Mecke, L., Lehmann, F., Goller, S., & Buschek, D. (2022). How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models. ↩
Akyürek, A. F., Paik, S., Kocyigit, M. Y., Akbiyik, S., Runyun, Ş. L., & Wijaya, D. (2022). On Measuring Social Biases in Prompt-Based Multi-Task Learning. ↩
Jin, Y., Kadam, V., & Wanvarie, D. (2022). Plot Writing From Pre-Trained Language Models. ↩
Nadeem, M., Bethke, A., & Reddy, S. (2021). StereoSet: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 5356–5371. https://doi.org/10.18653/v1/2021.acl-long.416 ↩
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2022). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://doi.org/10.1145/3571730 ↩
Yuan, A., Coenen, A., Reif, E., & Ippolito, D. (2022). Wordcraft: Story Writing With Large Language Models. 27th International Conference on Intelligent User Interfaces, 841–852. ↩
Fadnavis, S., Dhurandhar, A., Norel, R., Reinen, J. M., Agurto, C., Secchettin, E., Schweiger, V., Perini, G., & Cecchi, G. (2022). PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization. arXiv Preprint arXiv:2209.09814. ↩
Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., & Hajishirzi, H. (2022). Self-Instruct: Aligning Language Model with Self Generated Instructions. ↩
Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. C. H. (2022). From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models. ↩
Markov, T. (2022). New and improved content moderation tooling. In OpenAI. OpenAI. https://openai.com/blog/new-and-improved-content-moderation-tooling/ ↩
Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. ↩
Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338. ↩
Forsgren, S., & Martiros, H. (2022). Riffusion - Stable diffusion for real-time music generation. https://riffusion.com/about ↩
Bonta, A. (2022). How to use OpenAI’s ChatGPT to write the perfect cold email. https://www.streak.com/post/how-to-use-ai-to-write-perfect-cold-emails ↩
Nobel, P. S., & others. (2002). Cacti: biology and uses. Univ of California Press. ↩
Webson, A., Loo, A. M., Yu, Q., & Pavlick, E. (2023). Are Language Models Worse than Humans at Following Prompts? It’s Complicated. arXiv:2301.07085v1 [Cs.CL]. ↩
Wang, Z., Mao, S., Wu, W., Ge, T., Wei, F., & Ji, H. (2023). Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration. ↩
Crothers, E., Japkowicz, N., & Viktor, H. (2022). Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. ↩
u/Nin_kat. (2023). New jailbreak based on virtual functions - smuggle illegal tokens to the backend. https://www.reddit.com/r/ChatGPT/comments/10urbdj/new_jailbreak_based_on_virtual_functions_smuggle ↩
Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. ↩
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models. ↩
KIHO, L. (2023). ChatGPT “DAN” (and other “Jailbreaks”). https://github.com/0xk1h0/ChatGPT_DAN ↩
Branch, H. J., Cefalu, J. R., McHugh, J., Hujer, L., Bahl, A., del Castillo Iglesias, D., Heichman, R., & Darwishi, R. (2022). Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples. ↩
Willison, S. (2022). Prompt injection attacks against GPT-3. https://simonwillison.net/2022/Sep/12/prompt-injection/ ↩
Goodside, R. (2022). Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions. https://twitter.com/goodside/status/1569128808308957185 ↩
Goodside, R. (2023). History Correction. https://twitter.com/goodside/status/1610110111791325188?s=20&t=ulviQABPXFIIt4ZNZPAUCQ ↩
Chase, H. (2022). adversarial-prompts. https://github.com/hwchase17/adversarial-prompts ↩
Goodside, R. (2022). GPT-3 Prompt Injection Defenses. https://twitter.com/goodside/status/1578278974526222336?s=20&t=3UMZB7ntYhwAk3QLpKMAbw ↩
Mark, C. (2022). Talking to machines: prompt engineering & injection. https://artifact-research.com/artificial-intelligence/talking-to-machines-prompt-engineering-injection/ ↩
Stuart Armstrong, R. G. (2022). Using GPT-Eliezer against ChatGPT Jailbreaking. https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking ↩
Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/ ↩
Liu, K. (2023). The entire prompt of Microsoft Bing Chat?! (Hi, Sydney.). https://twitter.com/kliu128/status/1623472922374574080 ↩
Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques For Language Models. arXiv. https://doi.org/10.48550/ARXIV.2211.09527 ↩
Brundage, M. (2022). Lessons learned on Language Model Safety and misuse. In OpenAI. OpenAI. https://openai.com/blog/language-model-safety-and-misuse/ ↩
Wang, Y.-S., & Chang, Y. (2022). Toxicity Detection with Generative Prompt-based Inference. arXiv. https://doi.org/10.48550/ARXIV.2205.12390 ↩
Maz, A. (2022). ok I saw a few people jailbreaking safeguards openai put on chatgpt so I had to give it a shot myself. https://twitter.com/alicemazzy/status/1598288519301976064 ↩
Piedrafita, M. (2022). Bypass @OpenAI’s ChatGPT alignment efforts with this one weird trick. https://twitter.com/m1guelpf/status/1598203861294252033 ↩
Parfait, D. (2022). ChatGPT jailbreaking itself. https://twitter.com/haus_cole/status/1598541468058390534 ↩
Soares, N. (2022). Using “pretend” on #ChatGPT can do some wild stuff. You can kind of get some insight on the future, alternative universe. https://twitter.com/NeroSoares/status/1608527467265904643 ↩
Moran, N. (2022). I kinda like this one even more! https://twitter.com/NickEMoran/status/1598101579626057728 ↩
samczsun. (2022). uh oh. https://twitter.com/samczsun/status/1598679658488217601 ↩
Degrave, J. (2022). Building A Virtual Machine inside ChatGPT. Engraved. https://www.engraved.blog/building-a-virtual-machine-inside/ ↩
Imani, S., Du, L., & Shrivastava, H. (2023). MathPrompter: Mathematical Reasoning using Large Language Models. ↩
Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning. ↩
Si, C., Gan, Z., Yang, Z., Wang, S., Wang, J., Boyd-Graber, J., & Wang, L. (2022). Prompting GPT-3 To Be Reliable. ↩
Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., & Chen, W. (2022). On the Advance of Making Language Models Better Reasoners. ↩
Arora, S., Narayan, A., Chen, M. F., Orr, L., Guha, N., Bhatia, K., Chami, I., Sala, F., & Ré, C. (2022). Ask Me Anything: A simple strategy for prompting language models. ↩
Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models. ↩
Liévin, V., Hother, C. E., & Winther, O. (2022). Can large language models reason about medical questions? ↩
Mitchell, E., Noh, J. J., Li, S., Armstrong, W. S., Agarwal, A., Liu, P., Finn, C., & Manning, C. D. (2022). Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference. ↩
Shaikh, O., Zhang, H., Held, W., Bernstein, M., & Yang, D. (2022). On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning. ↩
Chase, H. (2022). Evaluating language models can be tricky. https://twitter.com/hwchase17/status/1607428141106008064 ↩
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback. ↩
Jurafsky, D., & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall. ↩
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2022). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys. https://doi.org/10.1145/3560815 ↩
Ding, N., & Hu, S. (2022). PromptPapers. https://github.com/thunlp/PromptPapers ↩
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. ↩
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. ↩
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners. ↩
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. ↩
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2022). What Makes Good In-Context Examples for GPT-3? Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. https://doi.org/10.18653/v1/2022.deelio-1.10 ↩
Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning. ↩
Sun, Z., Wang, X., Tay, Y., Yang, Y., & Zhou, D. (2022). Recitation-Augmented Language Models. ↩
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? ↩
Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., Sutton, C., & Odena, A. (2021). Show Your Work: Scratchpads for Intermediate Computation with Language Models. ↩
Jung, J., Qin, L., Welleck, S., Brahman, F., Bhagavatula, C., Bras, R. L., & Choi, Y. (2022). Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations. ↩
Zelikman, E., Wu, Y., Mu, J., & Goodman, N. D. (2022). STaR: Bootstrapping Reasoning With Reasoning. ↩
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ↩
Mishra, S., Khashabi, D., Baral, C., Choi, Y., & Hajishirzi, H. (2022). Reframing Instructional Prompts to GPTk’s Language. Findings of the Association for Computational Linguistics: ACL 2022. https://doi.org/10.18653/v1/2022.findings-acl.50 ↩
Logan IV, R., Balazevic, I., Wallace, E., Petroni, F., Singh, S., & Riedel, S. (2022). Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models. Findings of the Association for Computational Linguistics: ACL 2022, 2824–2835. https://doi.org/10.18653/v1/2022.findings-acl.222 ↩
Shanahan, M., McDonell, K., & Reynolds, L. (2023). Role-Play with Large Language Models. ↩
Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., & Ghanem, B. (2023). CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society. ↩
Santu, S. K. K., & Feng, D. (2023). TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks. ↩
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models. ↩
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. ↩
OpenAI. (2022). ChatGPT: Optimizing Language Models for Dialogue. https://openai.com/blog/chatgpt/. https://openai.com/blog/chatgpt/ ↩
Brown, T. B. (2020). Language models are few-shot learners. arXiv Preprint arXiv:2005.14165. ↩
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. ↩
OpenAI. (2023). GPT-4 Technical Report. ↩
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways. ↩
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R., Luccioni, A. S., Yvon, F., Gallé, M., Tow, J., Rush, A. M., Biderman, S., Webson, A., Ammanamanchi, P. S., Wang, T., Sagot, B., Muennighoff, N., del Moral, A. V., … Wolf, T. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. ↩
Yong, Z.-X., Schoelkopf, H., Muennighoff, N., Aji, A. F., Adelani, D. I., Almubarak, K., Bari, M. S., Sutawika, L., Kasai, J., Baruwa, A., Winata, G. I., Biderman, S., Radev, D., & Nikoulina, V. (2022). BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting. ↩
Lieber, O., Sharir, O., Lentz, B., & Shoham, Y. (2021). Jurassic-1: Technical Details and Evaluation, White paper, AI21 Labs, 2021. URL: Https://Uploads-Ssl. Webflow. Com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_ Tech_paper. Pdf. ↩
Wang, B., & Komatsuzaki, A. (2021). GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax. https://github.com/kingoflolz/mesh-transformer-jax ↩
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv Preprint arXiv:1907.11692. ↩
Tang, T., Junyi, L., Chen, Z., Hu, Y., Yu, Z., Dai, W., Dong, Z., Cheng, X., Wang, Y., Zhao, W., Nie, J., & Wen, J.-R. (2022). TextBox 2.0: A Text Generation Library with Pre-trained Language Models. ↩
Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., & Rush, A. M. (2022). Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. arXiv. https://doi.org/10.48550/ARXIV.2208.07852 ↩
Bach, S. H., Sanh, V., Yong, Z.-X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Bari, M. S., Fevry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., … Rush, A. M. (2022). PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. ↩
Wu, T., Jiang, E., Donsbach, A., Gray, J., Molina, A., Terry, M., & Cai, C. J. (2022). PromptChainer: Chaining Large Language Model Prompts through Visual Programming. ↩
Ding, N., Hu, S., Zhao, W., Chen, Y., Liu, Z., Zheng, H.-T., & Sun, M. (2021). OpenPrompt: An Open-source Framework for Prompt-learning. arXiv Preprint arXiv:2111.01998. ↩
Jiang, E., Olson, K., Toh, E., Molina, A., Donsbach, A., Terry, M., & Cai, C. J. (2022). PromptMaker: Prompt-Based Prototyping with Large Language Models. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491101.3503564 ↩
Chase, H. (2022). LangChain (0.0.66) [Computer software]. https://github.com/hwchase17/langchain ↩
Liu, J. (2022). GPT Index. https://doi.org/10.5281/zenodo.1234 ↩

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Edit this page

📙 Vocabulary Reference

📦 Prompted Products

Master Generative AI with Our Courses

Need Business GenAI Training?

Contact Sales

Want to keep learning

Explore Our Full Course Collection

DIFFICULTY LEVEL

RECOMMENDED COURSES

ChatGPT for Everyone

Introduction to Prompt Engineering

Live Courses

Bibliography

Agents

MRKL1Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022).

ReAct2Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022).

PAL3Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., & Neubig, G. (2022).

Auto-GPT4Significant-Gravitas. (2023). https://news.agpt.co/

Baby AGI5Nakajima, Y. (2023). https://github.com/yoheinakajima/babyagi

AgentGPT6Reworkd.ai. (2023). https://github.com/reworkd/AgentGPT

Toolformer7Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023).

Automated

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts8Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv Preprint arXiv:2010.15980.

automatic prompt engineer9Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers.

Soft Prompting10Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning.

discretized soft prompting (interpreting)11Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts.

Datasets

SCAN dataset (compositional generalization)12Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350

GSM8K13Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems.

hotpotQA14Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering.

multiarith15Roy, S., & Roth, D. (2015). Solving General Arithmetic Word Problems. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1743–1752. https://doi.org/10.18653/v1/D15-1202

fever dataset16Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2018). FEVER: a large-scale dataset for Fact Extraction and VERification.

bbq17Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2021). BBQ: A Hand-Built Bias Benchmark for Question Answering.

Detection

Don't ban chatgpt in schools. teach with it.18Roose, K. (2022). Don’t ban chatgpt in schools. teach with it. https://www.nytimes.com/2023/01/12/technology/chatgpt-schools-teachers.html

Schools Shouldn't Ban Access to ChatGPT19Lipman, J., & Distler, R. (2023). Schools Shouldn’t Ban Access to ChatGPT. https://time.com/6246574/schools-shouldnt-ban-access-to-chatgpt/

Certified Neural Network Watermarks with Randomized Smoothing20Bansal, A., yeh Ping-Chiang, Curry, M., Jain, R., Wigington, C., Manjunatha, V., Dickerson, J. P., & Goldstein, T. (2022). Certified Neural Network Watermarks with Randomized Smoothing.

Watermarking Pre-trained Language Models with Backdooring21Gu, C., Huang, C., Zheng, X., Chang, K.-W., & Hsieh, C.-J. (2022). Watermarking Pre-trained Language Models with Backdooring.

A Watermark for Large Language Models23Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. https://arxiv.org/abs/2301.10226

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature24Mitchell, E., Lee, Y., Khazatsky, A., Manning, C., & Finn, C. (2023). DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. https://doi.org/10.48550/arXiv.2301.11305

Image Prompt Engineering

Prompt Engineering for Text-Based Generative Art25Oppenlaender, J. (2022). Prompt Engineering for Text-Based Generative Art.

The DALLE 2 Prompt Book26Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/

With the right prompt, Stable Diffusion 2.0 can do hands.27Blake. (2022). With the right prompt, Stable Diffusion 2.0 can do hands. https://www.reddit.com/r/StableDiffusion/comments/z7salo/with_the_right_prompt_stable_diffusion_20_can_do/

Meta Analysis

How Generative AI Is Changing Creative Work28Davenport, T. H., & Mittal, N. (2022). How Generative AI Is Changing Creative Work. Harvard Business Review. https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work

How AI Will Change the Workplace29Captain, S. (2023). How AI Will Change the Workplace. Wall Street Journal. https://www.wsj.com/articles/how-ai-change-workplace-af2162ee

ChatGPT took their jobs. Now they walk dogs and fix air conditioners.30Verma, P., & Vynck, G. D. (2023). ChatGPT took their jobs. Now they walk dogs and fix air conditioners. Washington Post. https://www.washingtonpost.com/technology/2023/06/02/ai-taking-jobs/

No title31Ford, B. (2023). Bloomberg.Com. https://www.bloomberg.com/news/articles/2023-05-01/ibm-to-pause-hiring-for-back-office-jobs-that-ai-could-kill

Miscl

The Turking Test: Can Language Models Understand Instructions?32Efrat, A., & Levy, O. (2020). The Turking Test: Can Language Models Understand Instructions?

A Taxonomy of Prompt Modifiers for Text-To-Image Generation33Oppenlaender, J. (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation.

DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models34Wang, Z. J., Montoya, E., Munechika, D., Yang, H., Hoover, B., & Chau, D. H. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models.

Optimizing Prompts for Text-to-Image Generation35Hao, Y., Chi, Z., Dong, L., & Wei, F. (2022). Optimizing Prompts for Text-to-Image Generation.

Language Model Cascades36Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D., Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A., Sohl-dickstein, J., Murphy, K., & Sutton, C. (2022). Language Model Cascades.

Design Guidelines for Prompt Engineering Text-to-Image Generative Models37Liu, V., & Chilton, L. B. (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501825

Selective Annotation Makes Language Models Better Few-Shot Learners39Su, H., Kasai, J., Wu, C. H., Shi, W., Wang, T., Xin, J., Zhang, R., Ostendorf, M., Zettlemoyer, L., Smith, N. A., & Yu, T. (2022). Selective Annotation Makes Language Models Better Few-Shot Learners.

Atlas: Few-shot Learning with Retrieval Augmented Language Models40Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., & Grave, E. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models.

STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension41Wang, B., Feng, C., Nair, A., Mao, M., Desai, J., Celikyilmaz, A., Li, H., Mehdad, Y., & Radev, D. (2022). STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension.

Prompting Is Programming: A Query Language For Large Language Models42Beurer-Kellner, L., Fischer, M., & Vechev, M. (2022). Prompting Is Programming: A Query Language For Large Language Models.

Parallel Context Windows Improve In-Context Learning of Large Language Models43Ratner, N., Levine, Y., Belinkov, Y., Ram, O., Abend, O., Karpas, E., Shashua, A., Leyton-Brown, K., & Shoham, Y. (2022). Parallel Context Windows Improve In-Context Learning of Large Language Models.

Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models44Bursztyn, V. S., Demeter, D., Downey, D., & Birnbaum, L. (2022). Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models.

On Measuring Social Biases in Prompt-Based Multi-Task Learning48Akyürek, A. F., Paik, S., Kocyigit, M. Y., Akbiyik, S., Runyun, Ş. L., & Wijaya, D. (2022). On Measuring Social Biases in Prompt-Based Multi-Task Learning.

Plot Writing From Pre-Trained Language Models49Jin, Y., Kadam, V., & Wanvarie, D. (2022). Plot Writing From Pre-Trained Language Models.

Survey of Hallucination in Natural Language Generation51Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2022). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://doi.org/10.1145/3571730

Wordcraft: Story Writing With Large Language Models52Yuan, A., Coenen, A., Reif, E., & Ippolito, D. (2022). Wordcraft: Story Writing With Large Language Models. 27th International Conference on Intelligent User Interfaces, 841–852.

Self-Instruct: Aligning Language Model with Self Generated Instructions54Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., & Hajishirzi, H. (2022). Self-Instruct: Aligning Language Model with Self Generated Instructions.

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models55Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. C. H. (2022). From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.

New and improved content moderation tooling56Markov, T. (2022). New and improved content moderation tooling. In OpenAI. OpenAI. https://openai.com/blog/new-and-improved-content-moderation-tooling/

Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference57Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference.

Human-level concept learning through probabilistic program induction58Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.

{Riffusion - Stable diffusion for real-time music generation}59Forsgren, S., & Martiros, H. (2022). Riffusion - Stable diffusion for real-time music generation. https://riffusion.com/about

How to use OpenAI’s ChatGPT to write the perfect cold email60Bonta, A. (2022). How to use OpenAI’s ChatGPT to write the perfect cold email. https://www.streak.com/post/how-to-use-ai-to-write-perfect-cold-emails

Cacti: biology and uses61Nobel, P. S., & others. (2002). Cacti: biology and uses. Univ of California Press.

Are Language Models Worse than Humans at Following Prompts? It’s Complicated62Webson, A., Loo, A. M., Yu, Q., & Pavlick, E. (2023). Are Language Models Worse than Humans at Following Prompts? It’s Complicated. arXiv:2301.07085v1 [Cs.CL].

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration63Wang, Z., Mao, S., Wu, W., Ge, T., Wei, F., & Ji, H. (2023). Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration.