
🟢 Chain-of-Thought Prompting

Chain-of-thought (CoT) prompting[1] is a recently developed prompting method that encourages the LLM to explain its reasoning. The figure below[1] compares a few-shot standard prompt (left) with a chain-of-thought prompt (right).

Standard prompting vs. CoT (Wei et al.)

The main idea of CoT is that by showing the LLM a few exemplars in which the reasoning that leads to the answer is written out, the LLM will also spell out its reasoning when it answers the prompt. Explaining the reasoning in this way often makes the answer more accurate.
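To make the difference between the two prompt styles concrete, here is a minimal Python sketch that builds both a standard few-shot prompt and a CoT few-shot prompt from the same exemplar. The `Exemplar` class, the `build_prompt` function, and the plain `Q:`/`A:` layout are hypothetical choices for illustration, and the tennis-ball and cafeteria word problems are illustrative exemplars in the style of Wei et al.'s comparison figure. No model is called; the result is only the prompt text you would send.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example shown to the model (hypothetical structure)."""
    question: str
    reasoning: str  # step-by-step explanation, used only in CoT prompts
    answer: str


def build_prompt(exemplars: list[Exemplar], question: str, cot: bool) -> str:
    """Return a few-shot prompt; with cot=True each exemplar shows its reasoning."""
    parts = []
    for ex in exemplars:
        if cot:
            # CoT exemplar: the reasoning comes first, then the final answer
            parts.append(f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}.")
        else:
            # Standard exemplar: the final answer only
            parts.append(f"Q: {ex.question}\nA: The answer is {ex.answer}.")
    # The new question ends with an open "A:" for the model to complete
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


EXEMPLARS = [
    Exemplar(
        question=("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                  "Each can has 3 tennis balls. How many tennis balls does he have now?"),
        reasoning=("Roger started with 5 balls. 2 cans of 3 tennis balls each is "
                   "6 tennis balls. 5 + 6 = 11."),
        answer="11",
    ),
]

if __name__ == "__main__":
    q = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
         "bought 6 more, how many apples do they have?")
    print(build_prompt(EXEMPLARS, q, cot=False))
    print("-" * 40)
    print(build_prompt(EXEMPLARS, q, cot=True))
```

The only difference between the two modes is whether the exemplar's reasoning appears before its answer; the model is expected to imitate whichever pattern it is shown.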

Examples

Here are a few demos. The first shows GPT-3 (davinci-003) getting some simple word problems wrong. The second shows GPT-3 (davinci-003) solving the same problems correctly with CoT prompting.

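Because a CoT completion mixes reasoning with the final answer, the answer has to be pulled out before it can be checked. The sketch below assumes the exemplars ended with the phrase "The answer is N." so that the model tends to copy that pattern; `extract_final_answer` and its regular expression are hypothetical, and real outputs may need more robust parsing.

```python
import re
from typing import Optional

# Assumed answer format: the exemplars end with "The answer is <number>.",
# so the model's completion is expected to follow the same pattern.
ANSWER_RE = re.compile(r"The answer is\s*\$?(-?[\d,]*\.?\d+)")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number after the last 'The answer is', commas removed, or None."""
    matches = ANSWER_RE.findall(completion)
    if not matches:
        return None  # the model skipped the expected final-answer phrase
    # Take the last match in case the reasoning used the phrase earlier
    return matches[-1].replace(",", "")


if __name__ == "__main__":
    completion = (
        "The cafeteria had 23 apples originally. They used 20 to make lunch. "
        "So they had 23 - 20 = 3. They bought 6 more apples, so they have "
        "3 + 6 = 9. The answer is 9."
    )
    print(extract_final_answer(completion))  # prints: 9
```

Taking the last match rather than the first is a design choice: a model that revises itself mid-reasoning usually states its final answer at the end.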

Results

CoT has been shown to improve results on arithmetic, commonsense, and symbolic reasoning tasks[1]. In particular, PaLM 540B[2] prompted with CoT reached about 57% solve-rate accuracy on GSM8K[3].
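A solve rate such as the 57% reported for PaLM 540B is the fraction of problems whose final answer matches the reference. As a sketch, assuming predictions have already been reduced to numeric strings (or `None` when no answer was found) and that reference solutions follow the GSM8K convention of ending with a line such as `#### 72`, it can be computed as follows. The helper names are hypothetical, and exact numeric match is an assumed scoring rule, not necessarily the one Wei et al. used.

```python
from typing import Optional


def gold_answer(reference: str) -> str:
    """Return the text after the last '####' marker, with commas removed."""
    return reference.split("####")[-1].strip().replace(",", "")


def same_number(a: str, b: str) -> bool:
    """Compare two numeric strings by value, so '9' and '9.0' match."""
    try:
        return float(a) == float(b)
    except ValueError:
        return False  # non-numeric text never counts as correct


def solve_rate(predictions: list[Optional[str]], references: list[str]) -> float:
    """Fraction of problems whose predicted answer equals the reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        pred is not None and same_number(pred, gold_answer(ref))
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    refs = ["23 - 20 = 3 and 3 + 6 = 9\n#### 9", "#### 1,200", "#### 4"]
    preds = ["9", "1200", None]
    print(f"solve rate: {solve_rate(preds, refs):.1%}")  # prints: solve rate: 66.7%
```

A missing answer counts as wrong rather than being skipped, so a model that ignores the answer format is penalized in the rate.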

Comparison of models on the GSM8K benchmark (Wei et al.)

Limitations

Importantly, according to Wei et al., "CoT only yields performance gains when used with models of ∼100B parameters." Smaller models wrote illogical chains of thought, which made them less accurate than with standard prompting. In general, the performance gain from CoT prompting grew with model size.

Notes

No language models were hurt in the making of this chapter 😊.


  1. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.
  2. Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways.
  3. Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems.