
🟢 Multiple Choice Questions

Last updated on August 7, 2024 by Sander Schulhoff

Let's use GPT to solve the following Law School Admission Test (LSAT)[1] question!

Below is an example LSAT question. Consider how you would answer it, and what your reasoning would be.

John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. Thus, the Korean sighting helps to confirm John of Worcester's sighting. Which one of the following, if true, most strengthens the argument?

a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week.
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did.
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea.
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds.
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

The correct answer is ... c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea.

Try pasting this problem into GPT yourself.

Why is my answer different?

Your answer may differ because of:

  1. Updates to the underlying model, GPT-3.
  2. Randomness in the text generation process. We can make the output more consistent by setting the temperature to 0.
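To make the role of temperature concrete, here is a minimal sketch of temperature-scaled sampling over a handful of candidate outputs. The scores in `logits` are hypothetical and the function `sample_token` is illustrative only; it is not how any particular GPT API is implemented, and treating temperature 0 as a plain argmax is an assumption made for this sketch.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Sample an index from `logits` after temperature scaling.

    Temperature 0 is treated here as greedy decoding (argmax), which is the
    limit of the softmax as the temperature approaches zero.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for the five answer choices a) to e).
logits = [1.2, 0.3, 1.0, 0.1, 0.8]
rng = random.Random(0)

greedy = {sample_token(logits, 0, rng) for _ in range(20)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(200)}
print("temperature 0 picks:", greedy)
print("temperature 1 picks:", sampled)
```

With temperature 0 the same index is chosen on every call, while a higher temperature spreads the picks across several choices, which is why repeated runs of the same prompt can disagree.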

The model failed. Does that mean it is incapable of answering this type of question? Not necessarily. We will dive into techniques we can use to improve the model's results.

The Magic Phrase

The standard prompt we used above gives little insight into the "reasoning" behind GPT's output. We can try adding the phrase "let's explain step by step" like this:

...
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

Let’s explain step by step

This phrase increases the verbosity of the model. You might get output like this:

Info

Notice how the model reasons through the problem step by step.

The technical term for this behavior is Chain of Thought; the model sequentially generates statements to reach an answer. This is similar to the concept of System 2 thinking (from Thinking, Fast and Slow); the model defaults to System 1 thinking, but can chain System 1 steps together to arrive at a more methodical answer.
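As a rough illustration, the prompt with the step-by-step phrase appended could be assembled like this. The helper `build_prompt` and its parameters are hypothetical names chosen for this sketch; the question text is copied from the example above, and sending the prompt to a model is left out.

```python
PASSAGE = (
    "John of Worcester, an English monk, recorded the sighting, on "
    "December 8, 1128, of two unusually large sunspots. Five days later a "
    "brilliant aurora borealis (northern lights) was observed in southern "
    "Korea. Sunspot activity is typically followed by the appearance of an "
    "aurora borealis, after a span of time that averages five days. Thus, "
    "the Korean sighting helps to confirm John of Worcester's sighting."
)
QUESTION = "Which one of the following, if true, most strengthens the argument?"
CHOICES = [
    "An aurora borealis can sometimes occur even when there has been no "
    "significant sunspot activity in the previous week.",
    "Chinese sources recorded the sighting of sunspots more than 1000 years "
    "before John of Worcester did.",
    "Only heavy sunspot activity could have resulted in an aurora borealis "
    "viewable at a latitude as low as that of Korea.",
    "Because it is impossible to view sunspots with the naked eye under "
    "typical daylight conditions, the sighting recorded by John of Worcester "
    "would have taken place under unusual weather conditions such as fog or "
    "thin clouds.",
    "John of Worcester's account included a drawing of the sunspots, which "
    "could be the earliest illustration of sunspot activity.",
]


def build_prompt(passage, question, choices, suffix=None):
    """Assemble a multiple choice prompt, optionally ending with a suffix."""
    lettered = [f"{chr(ord('a') + i)}) {text}" for i, text in enumerate(choices)]
    parts = [f"{passage} {question}", "\n".join(lettered)]
    if suffix:
        parts.append(suffix)
    return "\n\n".join(parts)


standard_prompt = build_prompt(PASSAGE, QUESTION, CHOICES)
cot_prompt = build_prompt(
    PASSAGE, QUESTION, CHOICES, suffix="Let's explain step by step"
)
print(cot_prompt)
```

The only difference between the two prompts is the final line, so any change in the model's behavior can be attributed to the added phrase.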

Improvements

Here are a few variations on our basic multiple choice question:

Reorder Question Items

We can reorder the items in the question:

...
a) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.
b) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds.
...
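A small sketch of how such a reordering could be automated while keeping track of where the correct answer ends up. The function `reorder_choices` and its `order` argument are hypothetical; the choice texts are those of the LSAT example, and index 2 corresponds to the original answer c).

```python
import random

CHOICES = [
    "An aurora borealis can sometimes occur even when there has been no "
    "significant sunspot activity in the previous week.",
    "Chinese sources recorded the sighting of sunspots more than 1000 years "
    "before John of Worcester did.",
    "Only heavy sunspot activity could have resulted in an aurora borealis "
    "viewable at a latitude as low as that of Korea.",
    "Because it is impossible to view sunspots with the naked eye under "
    "typical daylight conditions, the sighting recorded by John of Worcester "
    "would have taken place under unusual weather conditions such as fog or "
    "thin clouds.",
    "John of Worcester's account included a drawing of the sunspots, which "
    "could be the earliest illustration of sunspot activity.",
]
CORRECT_INDEX = 2  # choice c) in the original ordering


def reorder_choices(choices, correct_index, order):
    """Return the re-lettered choice lines and the new letter of the answer.

    `order` lists the original indices in their new sequence.
    """
    if sorted(order) != list(range(len(choices))):
        raise ValueError("order must be a permutation of the choice indices")
    lines = [
        f"{chr(ord('a') + new)}) {choices[old]}" for new, old in enumerate(order)
    ]
    new_letter = chr(ord("a") + order.index(correct_index))
    return lines, new_letter


# Reversed order, matching the listing above.
reversed_lines, reversed_answer = reorder_choices(
    CHOICES, CORRECT_INDEX, [4, 3, 2, 1, 0]
)

# A random order; the seed is fixed only to make the run repeatable.
rng = random.Random(0)
random_order = rng.sample(range(len(CHOICES)), k=len(CHOICES))
shuffled_lines, shuffled_answer = reorder_choices(
    CHOICES, CORRECT_INDEX, random_order
)

print("\n".join(reversed_lines))
print("correct answer is now:", reversed_answer)
```

Returning the new letter alongside the lines makes it possible to check whether the model's answer follows the content of a choice or merely its position.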

Reword the Question

Recall that the original prompt was this:

Which one of the following, if true, most strengthens the argument?

We can change the prompt to this:

Identify each choice as strengthens, weakens or doesn't impact the argument.

to gain more insight into the answer choices.

Add Additional Context

Here is an example of a problem that can be easily solved using Bayes' theorem:

Consider two medical tests, A and B, for a virus. Test A is 90% effective at recognizing the virus when it is
present, but has a 5% false positive rate (indicating that the virus is present, when it is not). Test B is 95%
effective at recognizing the virus, but has a 10% false positive rate. The two tests use independent methods
of identifying the virus. The virus is carried by 2% of all people.
(a) Say that a person is tested for the virus using only Test A. What is the probability that the person
is really carrying the virus given that Test A came back positive? (2 points)
(b) Say that a person is tested for the virus using only Test B. What is the probability that the person
is really carrying the virus given that Test B came back positive? (2 points)
(c) Say that a person is tested for the virus using both tests. What is the probability that the person is
really carrying the virus given that both tests came back positive? (2 points)

Let's try this with GPT:

The output is incorrect!

If we add a bit of context, like this:

...
Let's explain step by step. The formula for bayes is

The model will use the right formula, Bayes.

Which is the correct answer!
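For reference, the three parts of the problem can be worked through directly. This is a sketch of the calculation, not the model's output; the function name `posterior` is hypothetical, and part (c) assumes that "independent methods" means the two test results are independent given whether the person carries the virus.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(virus | positive result) from Bayes' theorem."""
    true_positive = sensitivity * prior
    false_positive = false_positive_rate * (1 - prior)
    return true_positive / (true_positive + false_positive)


PRIOR = 0.02  # the virus is carried by 2% of all people

# (a) Test A only: 90% detection rate, 5% false positive rate.
p_a = posterior(PRIOR, 0.90, 0.05)

# (b) Test B only: 95% detection rate, 10% false positive rate.
p_b = posterior(PRIOR, 0.95, 0.10)

# (c) Both tests positive. Under the conditional independence assumption,
# the per-test probabilities multiply within each case.
p_both = posterior(PRIOR, 0.90 * 0.95, 0.05 * 0.10)

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
print(f"(c) {p_both:.4f}")
```

Under these assumptions the script prints roughly 0.2687, 0.1624 and 0.7773. A single positive test is still more likely to be a false alarm than a true detection because the virus is rare.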

Warning

GPT models do not perform arithmetic operations well. You may notice that while the expression written is correct, the computed number is not.

Try adding the phrase "Give the expression as answer, not a number" to disable computation.

You may be interested in MRKL[2], the paradigm of combining GPT with external tools like calculators, to solve this problem.
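The idea of handing arithmetic to an external tool can be sketched with a small, restricted calculator. This is only an illustration of the general pattern, not the MRKL system itself; the function `evaluate` is hypothetical, and `model_expression` is a made-up example of what a model might return for part (a) when asked for an expression instead of a number.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expression):
    """Evaluate an arithmetic expression string without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical model reply to part (a) of the Bayes problem.
model_expression = "(0.9 * 0.02) / (0.9 * 0.02 + 0.05 * 0.98)"
print(round(evaluate(model_expression), 4))
```

Walking the parsed syntax tree instead of calling `eval()` means model output that contains names, function calls or attribute access is rejected with a `ValueError` instead of being executed.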

Written by zeyuzhao.

Footnotes

  1. The LSAT (Law School Admission Test) is a standardized test used by law schools in the United States to assess the critical thinking and analytical reasoning skills of prospective students.

  2. Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022).

Copyright © 2024 Learn Prompting.