Multiple Choice Questions

🟢 This article is rated easy

Last updated on March 10, 2025

Sander Schulhoff

Takeaways

Learn techniques to enhance LLM performance in solving LSAT questions through advanced prompting.
Utilize step-by-step reasoning to improve answer accuracy and insight.
Experiment with reordering items and rephrasing prompts to gain better insights.
Provide relevant context, such as formulas, to guide the model’s responses effectively.

Let's use GPT to solve an LSAT question!

Below is an example LSAT question. Consider how you would answer it, as well as your reasoning.

Prompt

John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. Thus, the Korean sighting helps to confirm John of Worcester's sighting. Which one of the following, if true, most strengthens the argument?

a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week.
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did.
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea.
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds.
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

AI Output

c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea.

The model failed. Does that mean the model is incapable of answering this type of question? Not necessarily. We will dive into techniques that we can use to improve model results.

The Magic Phrase

The standard prompt we used above gives little insight into the “reasoning” behind GPT's output. We can try adding the phrase “Let's explain step by step” like so:

Prompt

...

Let’s explain step by step
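As a minimal sketch of how this could be automated, the helper below appends the phrase to any existing prompt. The function name `add_step_by_step` is a hypothetical choice for illustration, and sending the prompt to a model is deliberately left out.

```python
# Hypothetical helper (not from the original article): append the
# reasoning phrase to an existing prompt.
MAGIC_PHRASE = "Let's explain step by step"


def add_step_by_step(prompt: str, phrase: str = MAGIC_PHRASE) -> str:
    """Return the prompt followed by a blank line and the reasoning phrase."""
    # Strip trailing whitespace so exactly one blank line separates the two parts.
    return f"{prompt.rstrip()}\n\n{phrase}"


if __name__ == "__main__":
    question = "Which one of the following, if true, most strengthens the argument?"
    print(add_step_by_step(question))
```

Keeping the phrase in a single constant makes it easy to try other wordings and compare the outputs.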

This phrase will increase the verbosity of the model. You might get an output like this:

Info

Notice how the model reasons through the problem step by step. This behavior is called Chain-of-Thought Prompting: the model generates a sequence of statements that lead to an answer. This is similar to the concept of System 2 thinking (from [Thinking, Fast and Slow](https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow)): the model defaults to System 1 thinking, but can chain System 1 steps together to arrive at a more methodical answer.

Improvements

Here are some variations on our basic prompt for multiple-choice questions:

Reorder Question Items

We can reorder the items in the question:

Prompt

...
a) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.
b) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds.
...
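Reordering by hand gets tedious when you want to try several orderings. The sketch below shuffles the answer texts and reassigns the letter labels. The function name `reorder_choices` and the shortened answer texts are assumptions made for illustration.

```python
from __future__ import annotations

import random
import string


def reorder_choices(choices: list[str], seed: int | None = None) -> list[str]:
    """Shuffle the answer texts and relabel them a), b), c), ... in the new order."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    shuffled = choices[:]      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    return [f"{label}) {text}" for label, text in zip(string.ascii_lowercase, shuffled)]


if __name__ == "__main__":
    # Abbreviated stand-ins for the five answer choices.
    texts = [
        "An aurora can occur without recent sunspot activity.",
        "Chinese sources recorded sunspots much earlier.",
        "Only heavy sunspot activity could produce an aurora visible in Korea.",
        "Sunspots are only visible under unusual weather conditions.",
        "John of Worcester's account included a drawing.",
    ]
    for line in reorder_choices(texts, seed=0):
        print(line)
```

Because the labels change with the order, compare runs by the answer text the model selects rather than by the letter.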

Reword the Question

Recall the original prompt was this:

Prompt

Which one of the following, if true, most strengthens the argument?

We can change the prompt to gain further insight into each answer choice:

Prompt

Identify each choice as strengthens, weakens or doesn't impact the argument.
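One way to apply this rewording to a stored prompt is a simple text substitution. This is only a sketch: the function name `reword_question` is hypothetical, and it assumes the original question sentence appears verbatim in the prompt.

```python
ORIGINAL_STEM = "Which one of the following, if true, most strengthens the argument?"
REWORDED_STEM = "Identify each choice as strengthens, weakens or doesn't impact the argument."


def reword_question(prompt: str, old: str = ORIGINAL_STEM, new: str = REWORDED_STEM) -> str:
    """Swap the question sentence in a prompt, leaving passage and choices untouched."""
    if old not in prompt:
        # Fail loudly rather than silently sending the unchanged prompt.
        raise ValueError("original question stem not found in prompt")
    return prompt.replace(old, new, 1)


if __name__ == "__main__":
    prompt = "... " + ORIGINAL_STEM + "\n\na) ...\nb) ..."
    print(reword_question(prompt))
```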

Add Additional Context

Here is an example of a problem which can be easily solved by using Bayes' theorem:

Prompt

Consider two medical tests, A and B, for a virus. Test A is 90% effective at recognizing the virus when it is present, but has a 5% false positive rate (indicating that the virus is present, when it is not). Test B is 95% effective at recognizing the virus, but has a 10% false positive rate. The two tests use independent methods of identifying the virus. The virus is carried by 2% of all people.

(a) Say that a person is tested for the virus using only Test A. What is the probability that the person is really carrying the virus given that Test A came back positive? (2 points)
(b) Say that a person is tested for the virus using only Test B. What is the probability that the person is really carrying the virus given that Test B came back positive? (2 points)
(c) Say that a person is tested for the virus using both tests. What is the probability that the person is really carrying the virus given that both tests came back positive? (2 points)

Let's try this with GPT:

The output is incorrect!

If we add a bit of context, like so:

Prompt

... Let's explain step by step. The formula for bayes is

With this context, the model uses the right formula, Bayes' theorem, and the expression it writes is correct!
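For reference, Bayes' theorem applied to part (a) can be written as follows. The symbols are notation introduced here rather than taken from the prompt: V stands for "the person carries the virus" and A for "Test A came back positive".

```latex
P(V \mid A)
  = \frac{P(A \mid V)\,P(V)}{P(A \mid V)\,P(V) + P(A \mid \neg V)\,P(\neg V)}
  = \frac{0.90 \times 0.02}{0.90 \times 0.02 + 0.05 \times 0.98}
  = \frac{0.018}{0.067}
  \approx 0.269
```

Part (b) has the same form with Test B's rates (0.95 and 0.10) substituted for Test A's.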

Warning

The GPT model doesn't perform arithmetic operations well. You might notice that while the expression it writes is correct, the computed number is not.

Try adding the phrase “Give the expression as an answer, not a number” to disable computation.
You may be interested in MRKL, the paradigm of combining GPT with external tools like calculators, to solve this problem.
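If the model returns only an expression, the arithmetic can be done outside the model. The sketch below evaluates all three parts of the problem directly. The function name `posterior` is hypothetical, and the code reads "independent methods" as meaning the two tests are conditionally independent given whether the person carries the virus, which is an assumption.

```python
from __future__ import annotations


def posterior(prior: float, sensitivities: list[float], false_positive_rates: list[float]) -> float:
    """P(virus | every listed test is positive), via Bayes' theorem.

    Assumes the tests are conditionally independent given the true status.
    """
    p_pos_given_virus = 1.0
    p_pos_given_healthy = 1.0
    for sens, fpr in zip(sensitivities, false_positive_rates):
        p_pos_given_virus *= sens    # P(all positive | virus)
        p_pos_given_healthy *= fpr   # P(all positive | no virus)
    numerator = p_pos_given_virus * prior
    return numerator / (numerator + p_pos_given_healthy * (1.0 - prior))


if __name__ == "__main__":
    prior = 0.02  # 2% of people carry the virus
    print(f"(a) {posterior(prior, [0.90], [0.05]):.4f}")              # 0.2687
    print(f"(b) {posterior(prior, [0.95], [0.10]):.4f}")              # 0.1624
    print(f"(c) {posterior(prior, [0.90, 0.95], [0.05, 0.10]):.4f}")  # 0.7773
```

These values give a reference point for checking whatever number the model reports.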

Written by zeyuzhao.


Footnotes

The LSAT (Law School Admission Test) is a standardized test used by law schools in the United States to assess the critical thinking and analytical reasoning skills of prospective students.
Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022).
