This chapter covers how to make completions more reliable, and how to implement checks that verify the outputs are reliable.

To some extent, most of the techniques covered earlier aim to improve the accuracy of completions, and therefore their reliability, self-consistency[1] in particular. However, a number of other techniques can improve reliability beyond basic prompting strategies.
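As a reminder of how self-consistency works, here is a minimal sketch of its voting step: several chain-of-thought completions are sampled for the same question, and the most frequent final answer is kept. The helper name `majority_answer` and the sample answers are hypothetical, and the step that calls a model and extracts each final answer is assumed to have happened already.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most frequent final answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize lightly so that "18" and "18 " count as the same answer.
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical final answers extracted from five sampled completions.
sampled = ["18", "18", "26", "18 ", "17"]
answer, share = majority_answer(sampled)
print(answer, share)
```

The vote share can double as a rough confidence signal: a low share suggests the sampled reasoning paths disagree and the answer deserves a closer look.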

LLMs exhibit a range of problems, including hallucinations[2], flawed explanations with CoT prompting[2], and multiple biases, including majority label bias, recency bias, and common token bias[3]. In addition, zero-shot CoT prompting can be particularly biased when dealing with sensitive topics[4].

Common solutions to some of these problems include calibrators that remove a priori biases, verifiers that score completions, and promoting diversity in completions.
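To make the idea of a calibrator concrete, the following is a simplified sketch in the spirit of the Calibrate Before Use paper cited above. It assumes you can obtain label probabilities from the model, both for the real input and for a content-free input such as "N/A". The function name `calibrate` and all probability values are hypothetical; the paper's actual procedure differs in its details.

```python
def calibrate(label_probs, content_free_probs):
    """Divide each label probability by its content-free value, then renormalize."""
    scaled = {
        label: label_probs[label] / content_free_probs[label]
        for label in label_probs
    }
    total = sum(scaled.values())
    return {label: value / total for label, value in scaled.items()}


# Hypothetical probabilities: the model favors "positive" even for a
# content-free input, which indicates a prior bias toward that label.
content_free = {"positive": 0.7, "negative": 0.3}
raw = {"positive": 0.6, "negative": 0.4}

calibrated = calibrate(raw, content_free)
prediction = max(calibrated, key=calibrated.get)
print(calibrated, prediction)
```

In this made-up case the uncalibrated prediction would be "positive", but after the model's prior preference for that label is discounted, "negative" becomes the more likely label.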

Calibrating LLMs

🟢 Prompt Debiasing

🟦 Diverse Prompts

🟦 Prompt Ensembling

🟦 LLM Self-Evaluation

🟦 Math

Footnotes

1. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.
2. Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning.
3. Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models.
4. Shaikh, O., Zhang, H., Held, W., Bernstein, M., & Yang, D. (2022). On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
