Bienvenidos
😃 Básico
💼 Aplicaciones básicas
🧙‍♂️ Intermediate
🤖 Agentes
⚖️ Reliability
🖼️ Image Prompting
🔓 Prompt Hacking
🔨 Tooling
💪 Prompt Tuning
🎲 Miscellaneous
Models
📙 Referencia de Vocabulario
📚 Bibliography
📦 Prompted Products
🛸 Recursos adicionales
🔥 Hot Topics
✨ Créditos

Bibliography

Reading Time: 3 minutes
Last updated on August 7, 2024

Sander Schulhoff

The page contains an organized list of all papers used by this course. The papers are organized by topic.

To cite this course, use the provided citation in the Github repository.

🔵 = Paper directly cited in this course. Other papers have informed my understanding of the topic.

Note: since neither the GPT-3 nor the GPT-3 Instruct paper correspond to davinci models, I attempt not to cite them as such.

Prompt Engineering Strategies

Chain of Thought 🔵

Zero Shot Chain of Thought 🔵

Self Consistency 🔵

What Makes Good In-Context Examples for GPT-3? 🔵

Generated Knowledge 🔵

Rethinking the role of demonstrations 🔵

Scratchpads

Maieutic Prompting

STaR

Least to Most

Reliability

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning 🔵

Prompting GPT-3 to be reliable

Diverse Prompts 🔵

Calibrate Before Use: Improving Few-Shot Performance of Language Models 🔵

Enhanced Self Consistency

Bias and Toxicity in Zero-Shot CoT 🔵

Constitutional AI: Harmlessness from AI Feedback 🔵

Automated Prompt Engineering

AutoPrompt 🔵

Automatic Prompt Engineer

Models

Language Models

GPT-3 🔵

GPT-3 Instruct 🔵

PaLM 🔵

BLOOM 🔵

BLOOM+1 (more languages/ 0 shot improvements)

Jurassic 1 🔵

GPT-J-6B

Roberta

Image Models

Stable Diffusion 🔵

DALLE 🔵

Soft Prompting

Soft Prompting 🔵

Interpretable Discretized Soft Prompts 🔵

Datasets

GSM8K 🔵

HotPotQA 🔵

Fever 🔵

BBQ: A Hand-Built Bias Benchmark for Question Answering 🔵

Image Prompt Engineering

Taxonomy of prompt modifiers

DiffusionDB

The DALLE 2 Prompt Book 🔵

Prompt Engineering for Text-Based Generative Art 🔵

With the right prompt, Stable Diffusion 2.0 can do hands. 🔵

Optimizing Prompts for Text-to-Image Generation

Prompt Engineering IDEs

Prompt IDE 🔵

Prompt Source 🔵

PromptChainer 🔵

PromptMaker 🔵

Tooling

LangChain 🔵

TextBox 2.0: A Text Generation Library with Pre-trained Language Models 🔵

OpenPrompt: An Open-source Framework for Prompt-learning 🔵

GPT Index 🔵

Applied Prompt Engineering

Language Model Cascades

MRKL 🔵

ReAct 🔵

PAL: Program-aided Language Models 🔵

User Interface Design

Design Guidelines for Prompt Engineering Text-to-Image Generative Models

Prompt Injection

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods 🔵

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples 🔵

Prompt injection attacks against GPT-3 🔵

Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions 🔵

adversarial-prompts 🔵

GPT-3 Prompt Injection Defenses 🔵

Talking to machines: prompt engineering & injection

Exploring Prompt Injection Attacks 🔵

Using GPT-Eliezer against ChatGPT Jailbreaking 🔵

Jailbreaking

Ignore Previous Prompt: Attack Techniques For Language Models

Lessons learned on Language Model Safety and misuse

Toxicity Detection with Generative Prompt-based Inference

New and improved content moderation tooling

OpenAI API 🔵

OpenAI ChatGPT 🔵

ChatGPT 4 Tweet 🔵

Acting Tweet 🔵

Research Tweet 🔵

Pretend Ability Tweet 🔵

Responsibility Tweet 🔵

Lynx Mode Tweet 🔵

Sudo Mode Tweet 🔵

Ignore Previous Prompt 🔵

Surveys

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

PromptPapers

Dataset Generation

Discovering Language Model Behaviors with Model-Written Evaluations

Selective Annotation Makes Language Models Better Few-Shot Learners

Applications

Atlas: Few-shot Learning with Retrieval Augmented Language Models

STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension

Miscl

Prompting Is Programming: A Query Language For Large Language Models

Parallel Context Windows Improve In-Context Learning of Large Language Models

Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Making Pre-trained Language Models Better Few-shot Learners

Grounding with search results

How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models

On Measuring Social Biases in Prompt-Based Multi-Task Learning

Plot Writing From Pre-Trained Language Models 🔵

StereoSet: Measuring stereotypical bias in pretrained language models

Survey of Hallucination in Natural Language Generation

Examples

Wordcraft

PainPoints

Self-Instruct: Aligning Language Model with Self Generated Instructions

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference

A Watermark for Large Language Models

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

  1. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.

  2. Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners.

  3. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.

  4. Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2022). What Makes Good In-Context Examples for GPT-3? Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. https://doi.org/10.18653/v1/2022.deelio-1.10 2

  5. Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning.

  6. Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

  7. Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., Sutton, C., & Odena, A. (2021). Show Your Work: Scratchpads for Intermediate Computation with Language Models.

  8. Jung, J., Qin, L., Welleck, S., Brahman, F., Bhagavatula, C., Bras, R. L., & Choi, Y. (2022). Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations.

  9. Zelikman, E., Wu, Y., Mu, J., & Goodman, N. D. (2022). STaR: Bootstrapping Reasoning With Reasoning.

  10. Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.

  11. Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning.

  12. Si, C., Gan, Z., Yang, Z., Wang, S., Wang, J., Boyd-Graber, J., & Wang, L. (2022). Prompting GPT-3 To Be Reliable.

  13. Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., & Chen, W. (2022). On the Advance of Making Language Models Better Reasoners.

  14. Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models.

  15. Mitchell, E., Noh, J. J., Li, S., Armstrong, W. S., Agarwal, A., Liu, P., Finn, C., & Manning, C. D. (2022). Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference.

  16. Shaikh, O., Zhang, H., Held, W., Bernstein, M., & Yang, D. (2022). On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.

  17. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback.

  18. Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv Preprint arXiv:2010.15980.

  19. Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers.

  20. Brown, T. B. (2020). Language models are few-shot learners. arXiv Preprint arXiv:2005.14165.

  21. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback.

  22. Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways.

  23. Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R., Luccioni, A. S., Yvon, F., Gallé, M., Tow, J., Rush, A. M., Biderman, S., Webson, A., Ammanamanchi, P. S., Wang, T., Sagot, B., Muennighoff, N., del Moral, A. V., … Wolf, T. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.

  24. Yong, Z.-X., Schoelkopf, H., Muennighoff, N., Aji, A. F., Adelani, D. I., Almubarak, K., Bari, M. S., Sutawika, L., Kasai, J., Baruwa, A., Winata, G. I., Biderman, S., Radev, D., & Nikoulina, V. (2022). BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting.

  25. Lieber, O., Sharir, O., Lentz, B., & Shoham, Y. (2021). Jurassic-1: Technical Details and Evaluation, White paper, AI21 Labs, 2021. URL: Https://Uploads-Ssl. Webflow. Com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_ Tech_paper. Pdf.

  26. Wang, B., & Komatsuzaki, A. (2021). GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax. https://github.com/kingoflolz/mesh-transformer-jax

  27. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv Preprint arXiv:1907.11692.

  28. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models.

  29. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents.

  30. Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning.

  31. Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts.

  32. Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems.

  33. Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering.

  34. Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2018). FEVER: a large-scale dataset for Fact Extraction and VERification.

  35. Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2021). BBQ: A Hand-Built Bias Benchmark for Question Answering.

  36. Oppenlaender, J. (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation.

  37. Wang, Z. J., Montoya, E., Munechika, D., Yang, H., Hoover, B., & Chau, D. H. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models.

  38. Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/

  39. Oppenlaender, J. (2022). Prompt Engineering for Text-Based Generative Art.

  40. Blake. (2022). With the right prompt, Stable Diffusion 2.0 can do hands. https://www.reddit.com/r/StableDiffusion/comments/z7salo/with_the_right_prompt_stable_diffusion_20_can_do/

  41. Hao, Y., Chi, Z., Dong, L., & Wei, F. (2022). Optimizing Prompts for Text-to-Image Generation.

  42. Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., & Rush, A. M. (2022). Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. arXiv. https://doi.org/10.48550/ARXIV.2208.07852

  43. Bach, S. H., Sanh, V., Yong, Z.-X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Bari, M. S., Fevry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., … Rush, A. M. (2022). PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.

  44. Wu, T., Jiang, E., Donsbach, A., Gray, J., Molina, A., Terry, M., & Cai, C. J. (2022). PromptChainer: Chaining Large Language Model Prompts through Visual Programming.

  45. Jiang, E., Olson, K., Toh, E., Molina, A., Donsbach, A., Terry, M., & Cai, C. J. (2022). PromptMaker: Prompt-Based Prototyping with Large Language Models. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491101.3503564

  46. Chase, H. (2022). LangChain (0.0.66) [Computer software]. https://github.com/hwchase17/langchain

  47. Tang, T., Junyi, L., Chen, Z., Hu, Y., Yu, Z., Dai, W., Dong, Z., Cheng, X., Wang, Y., Zhao, W., Nie, J., & Wen, J.-R. (2022). TextBox 2.0: A Text Generation Library with Pre-trained Language Models.

  48. Ding, N., Hu, S., Zhao, W., Chen, Y., Liu, Z., Zheng, H.-T., & Sun, M. (2021). OpenPrompt: An Open-source Framework for Prompt-learning. arXiv Preprint arXiv:2111.01998.

  49. Liu, J. (2022). GPT Index. https://doi.org/10.5281/zenodo.1234

  50. Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D., Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A., Sohl-dickstein, J., Murphy, K., & Sutton, C. (2022). Language Model Cascades.

  51. Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022).

  52. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022).

  53. Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., & Neubig, G. (2022).

  54. Liu, V., & Chilton, L. B. (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501825

  55. Crothers, E., Japkowicz, N., & Viktor, H. (2022). Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods.

  56. Branch, H. J., Cefalu, J. R., McHugh, J., Hujer, L., Bahl, A., del Castillo Iglesias, D., Heichman, R., & Darwishi, R. (2022). Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples.

  57. Willison, S. (2022). Prompt injection attacks against GPT-3. https://simonwillison.net/2022/Sep/12/prompt-injection/

  58. Goodside, R. (2022). Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions. https://twitter.com/goodside/status/1569128808308957185

  59. Chase, H. (2022). adversarial-prompts. https://github.com/hwchase17/adversarial-prompts

  60. Goodside, R. (2022). GPT-3 Prompt Injection Defenses. https://twitter.com/goodside/status/1578278974526222336?s=20&t=3UMZB7ntYhwAk3QLpKMAbw

  61. Mark, C. (2022). Talking to machines: prompt engineering & injection. https://artifact-research.com/artificial-intelligence/talking-to-machines-prompt-engineering-injection/

  62. Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

  63. Stuart Armstrong, R. G. (2022). Using GPT-Eliezer against ChatGPT Jailbreaking. https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking

  64. Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques For Language Models. arXiv. https://doi.org/10.48550/ARXIV.2211.09527

  65. Brundage, M. (2022). Lessons learned on Language Model Safety and misuse. In OpenAI. OpenAI. https://openai.com/blog/language-model-safety-and-misuse/

  66. Wang, Y.-S., & Chang, Y. (2022). Toxicity Detection with Generative Prompt-based Inference. arXiv. https://doi.org/10.48550/ARXIV.2205.12390

  67. Markov, T. (2022). New and improved content moderation tooling. In OpenAI. OpenAI. https://openai.com/blog/new-and-improved-content-moderation-tooling/

  68. OpenAI. (2022). https://beta.openai.com/docs/guides/moderation

  69. OpenAI. (2022). https://openai.com/blog/chatgpt/

  70. Maz, A. (2022). ok I saw a few people jailbreaking safeguards openai put on chatgpt so I had to give it a shot myself. https://twitter.com/alicemazzy/status/1598288519301976064

  71. Piedrafita, M. (2022). Bypass @OpenAI’s ChatGPT alignment efforts with this one weird trick. https://twitter.com/m1guelpf/status/1598203861294252033

  72. Parfait, D. (2022). ChatGPT jailbreaking itself. https://twitter.com/haus_cole/status/1598541468058390534

  73. Soares, N. (2022). Using “pretend” on #ChatGPT can do some wild stuff. You can kind of get some insight on the future, alternative universe. https://twitter.com/NeroSoares/status/1608527467265904643

  74. Moran, N. (2022). I kinda like this one even more! https://twitter.com/NickEMoran/status/1598101579626057728

  75. Degrave, J. (2022). Building A Virtual Machine inside ChatGPT. Engraved. https://www.engraved.blog/building-a-virtual-machine-inside/

  76. Sudo. (2022). https://www.sudo.ws/

  77. Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques For Language Models. arXiv. https://doi.org/10.48550/ARXIV.2211.09527

  78. Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2022). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys. https://doi.org/10.1145/3560815

  79. Ding, N., & Hu, S. (2022). PromptPapers. https://github.com/thunlp/PromptPapers

  80. Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E., Heiner, S., Pettit, C., Olsson, C., Kundu, S., Kadavath, S., Jones, A., Chen, A., Mann, B., Israel, B., Seethor, B., McKinnon, C., Olah, C., Yan, D., Amodei, D., … Kaplan, J. (2022). Discovering Language Model Behaviors with Model-Written Evaluations.

  81. Su, H., Kasai, J., Wu, C. H., Shi, W., Wang, T., Xin, J., Zhang, R., Ostendorf, M., Zettlemoyer, L., Smith, N. A., & Yu, T. (2022). Selective Annotation Makes Language Models Better Few-Shot Learners.

  82. Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., & Grave, E. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models.

  83. Wang, B., Feng, C., Nair, A., Mao, M., Desai, J., Celikyilmaz, A., Li, H., Mehdad, Y., & Radev, D. (2022). STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension.

  84. Beurer-Kellner, L., Fischer, M., & Vechev, M. (2022). Prompting Is Programming: A Query Language For Large Language Models.

  85. Ratner, N., Levine, Y., Belinkov, Y., Ram, O., Abend, O., Karpas, E., Shashua, A., Leyton-Brown, K., & Shoham, Y. (2022). Parallel Context Windows Improve In-Context Learning of Large Language Models.

  86. Bursztyn, V. S., Demeter, D., Downey, D., & Birnbaum, L. (2022). Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models.

  87. Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Arunkumar, A., Ashok, A., Dhanasekaran, A. S., Naik, A., Stap, D., Pathak, E., Karamanolakis, G., Lai, H. G., Purohit, I., Mondal, I., Anderson, J., Kuznia, K., Doshi, K., Patel, M., … Khashabi, D. (2022). Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks.

  88. Gao, T., Fisch, A., & Chen, D. (2021). Making Pre-trained Language Models Better Few-shot Learners. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). https://doi.org/10.18653/v1/2021.acl-long.295

  89. Liévin, V., Hother, C. E., & Winther, O. (2022). Can large language models reason about medical questions?

  90. Dang, H., Mecke, L., Lehmann, F., Goller, S., & Buschek, D. (2022). How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models.

  91. Akyürek, A. F., Paik, S., Kocyigit, M. Y., Akbiyik, S., Runyun, Ş. L., & Wijaya, D. (2022). On Measuring Social Biases in Prompt-Based Multi-Task Learning.

  92. Jin, Y., Kadam, V., & Wanvarie, D. (2022). Plot Writing From Pre-Trained Language Models.

  93. Nadeem, M., Bethke, A., & Reddy, S. (2021). StereoSet: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 5356–5371. https://doi.org/10.18653/v1/2021.acl-long.416

  94. Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2022). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://doi.org/10.1145/3571730

  95. Yuan, A., Coenen, A., Reif, E., & Ippolito, D. (2022). Wordcraft: Story Writing With Large Language Models. 27th International Conference on Intelligent User Interfaces, 841–852.

  96. Fadnavis, S., Dhurandhar, A., Norel, R., Reinen, J. M., Agurto, C., Secchettin, E., Schweiger, V., Perini, G., & Cecchi, G. (2022). PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization. arXiv Preprint arXiv:2209.09814.

  97. Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., & Hajishirzi, H. (2022). Self-Instruct: Aligning Language Model with Self Generated Instructions.

  98. Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. C. H. (2022). From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.

  99. Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference.

  100. Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. https://arxiv.org/abs/2301.10226