Last updated on September 27, 2024
How can you infer the thoughts, desires, and intentions of others, and predict or explain their actions, just by observing their behavior? This human ability is called Theory of Mind (ToM). While it seems easy for us, it is hard to transfer to, or emulate in, even some of the most capable Large Language Models (LLMs).
SimToM is a two-stage prompting technique that tries to solve this problem by applying the Simulation Theory perspective to prompting. In cognitive science, Simulation Theory suggests that people understand others' mental states by first imagining themselves in the other person's position (perspective-taking) and then reasoning about the situation from that position.
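To make it more concrete, SimToM works in two stages. In the first stage (perspective-taking), the model is given the story and asked which events a specific character knows about. In the second stage (question-answering), the model answers the question using only the story as seen from that character's perspective.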
SimToM can help eliminate the effect of irrelevant information in the prompt. Without additional training and with only minimal prompt-tuning, it improves LLMs' reasoning in complex situations involving a story and several people's perspectives. It outperforms baseline Zero-Shot Prompting and Chain-of-Thought Prompting on Theory-of-Mind (ToM) benchmarks.
Here's the template for both stages:
The following is a sequence of events: {story}
Which events does {character_name} know about?
{story from character_name’s perspective}
Answer the following question: {question}
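The two templates above can be chained in a few lines of code. The following is a minimal sketch, not the authors' implementation: `llm` is a hypothetical callable that takes a prompt string and returns the model's text reply, so you would wrap whichever client you actually use. The function name `simtom` and the template constant names are also assumptions made for illustration.

```python
from typing import Callable

# Stage 1 template: perspective-taking (wording taken from the template above).
STAGE1_TEMPLATE = (
    "The following is a sequence of events: {story}\n"
    "Which events does {character_name} know about?"
)

# Stage 2 template: question answering on the perspective-filtered story.
STAGE2_TEMPLATE = (
    "{perspective}\n"
    "Answer the following question: {question}"
)


def simtom(
    llm: Callable[[str], str],  # hypothetical: prompt in, model reply out
    story: str,
    character_name: str,
    question: str,
) -> str:
    """Run both SimToM stages and return the final answer text."""
    # Stage 1: ask the model which events the character knows about.
    perspective = llm(
        STAGE1_TEMPLATE.format(story=story, character_name=character_name)
    )
    # Stage 2: answer using only the stage 1 output, not the full story.
    return llm(
        STAGE2_TEMPLATE.format(perspective=perspective, question=question)
    )
```

Note that the second call never sees the original story. Passing only the stage 1 output forward is what keeps events the character did not witness out of the final prompt.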
Let's see how it works on the example from the coffee shop.
First, let's prompt a model only with the story description and a question about it:
Noor is working as a barista at a busy coffee shop. Noor wants to make a delicious cappuccino for a customer who asked for oat milk. Noor grabs a milk pitcher and fills it with oat milk. Noor believes that the milk pitcher contains oatmilk. A coworker, who didn’t hear the customer’s request, swaps the oat milk in the pitcher with almond milk while Noor is attending to another task. Noor sees her coworker swapping the milk.
Does Noor believe the milk pitcher contains oat milk or almond milk?
The correct answer is "almond milk," but the model names oat milk and never states the correct answer explicitly:
Noor believes that the milk pitcher contains oat milk. She had filled the pitcher with oat milk and saw her coworker swap it out, so she would be aware that the pitcher now contains a different type of milk.
Now, let's try the SimToM prompting technique.
The following is a sequence of events: Noor is working as a barista at a busy coffee shop. Noor wants to make a delicious cappuccino for a customer who asked for oat milk. Noor grabs a milk pitcher and fills it with oat milk. Noor believes that the milk pitcher contains oatmilk. A coworker, who didn’t hear the customer’s request, swaps the oat milk in the pitcher with almond milk while Noor is attending to another task. Noor sees her coworker swapping the milk.
Which events does Noor know about?
Noor is aware of all the events in the scenario. She knows that she filled the pitcher with oat milk and also saw her coworker swap it with almond milk.
A story from Noor's perspective
Answer the following question: Does Noor believe the milk pitcher contains oat milk or almond milk?
As I, Noor, was preparing a fresh cappuccino for a customer who requested oat milk, I carefully filled up a milk pitcher with the creamy oat milk. Life in a busy coffee shop is fast-paced, and while I was momentarily attending to another task, my eagle eyes spotted my coworker replacing the oat milk with almond milk. I knew right then that the pitcher no longer held the precious oat milk, but instead was poured with a different kind of milk. So, to answer your question, I believe that the milk pitcher now contains almond milk, not oat milk as I had originally prepared.
With SimToM, the model answers the question correctly.
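To compare the two responses programmatically, one option is a simple string check for the expected answer. The sketch below is an assumption made for illustration, not how any ToM benchmark scores answers: `states_answer` is a hypothetical helper, and the two strings are excerpts of the model outputs quoted in this article.

```python
def states_answer(response: str, expected: str) -> bool:
    """Crude check: does the response mention the expected answer at all?"""
    return expected.lower() in response.lower()


# Output of the single-prompt baseline, quoted from the example above.
baseline_response = (
    "Noor believes that the milk pitcher contains oat milk. She had filled the "
    "pitcher with oat milk and saw her coworker swap it out, so she would be "
    "aware that the pitcher now contains a different type of milk."
)

# Final sentence of the SimToM stage 2 output, quoted from the example above.
simtom_response = (
    "So, to answer your question, I believe that the milk pitcher now contains "
    "almond milk, not oat milk as I had originally prepared."
)

print(states_answer(baseline_response, "almond milk"))  # False
print(states_answer(simtom_response, "almond milk"))    # True
```

A substring check like this is deliberately naive: a response that mentions both options would pass for either one, so a real evaluation would need stricter answer extraction.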
SimToM's usefulness may vary based on the model you're using. There are three options for that:
The technique was primarily tested on datasets that require "hiding" parts of the original story when answering the question from an agent's perspective. In other settings, however, such reasoning may require imputing unseen information instead.
SimToM wasn't tested on smaller language models with fewer than 7B parameters.
SimToM is a step forward in the emulation of Theory of Mind (ToM) by large language models. It can be used to develop more sophisticated interactions between humans and machines for settings like personalized assistants, learning assistants, more empathetic interfaces, etc.
Wilf, A., Lee, S. S., Liang, P. P., & Morency, L.-P. (2023). Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities. https://arxiv.org/abs/2311.10227
Gandhi, K., Fränken, J.-P., Gerstenberg, T., & Goodman, N. D. (2023). Understanding Social Reasoning in Language Models with Language Models. https://arxiv.org/abs/2306.15448