SimToM Prompting

Last updated on September 27, 2024

Takeaways

SimToM enhances LLMs' ability to understand and predict human thoughts and actions using a two-stage process: Perspective-Taking and Question-Answering.
In Stage 1, the model identifies which events a character is aware of, and in Stage 2, it answers questions from that character’s perspective.
SimToM outperforms traditional prompting techniques like Zero-Shot and Chain-of-Thought in tasks requiring Theory of Mind (ToM).
Its usefulness varies with the LLM's existing ToM abilities, and it has not been tested on models smaller than 7B parameters.

What is SimToM?

How do you infer other people's thoughts, desires, and intentions, and predict or explain their actions, just by observing their behavior? This human ability is called Theory of Mind (ToM). While it comes easily to us, it is hard for even some of the most capable Large Language Models (LLMs) to emulate.

SimToM is a two-stage prompting technique that tries to solve this problem by applying the Simulation Theory of mind to prompting. In cognitive science, Simulation Theory suggests that people understand others' mental states by imagining themselves in the other person's position, that is, by taking that person's perspective before reasoning about what they think or believe.

To make it more concrete, here's how SimToM works:

Stage 1, Perspective-Taking: SimToM filters context based on what the character in question knows
Stage 2, Question-Answering: SimToM answers a question about this character's mental state

SimToM can help eliminate the effect of irrelevant information in the prompt. Without any additional training and with only minimal prompt-tuning, it improves LLMs' reasoning in complex situations involving a story and several people's perspectives. It outperforms baseline Zero-Shot Prompting and Chain-of-Thought Prompting on Theory of Mind (ToM) benchmarks.
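To make the filtering idea behind Stage 1 concrete, here is a toy Python sketch. It is only an illustration under stated assumptions: in SimToM the LLM itself decides which events a character knows about, whereas here the `Event` structure, the `perspective_filter` helper, and the hand-annotated witness sets are hypothetical stand-ins for that step.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One story event plus the characters who witnessed it (hand-annotated, hypothetical)."""
    text: str
    witnesses: set[str] = field(default_factory=set)


def perspective_filter(events: list[Event], character: str) -> list[str]:
    """Keep only the events the given character witnessed, in story order."""
    return [event.text for event in events if character in event.witnesses]


# Events paraphrased from the coffee shop story; witness sets are assumptions.
story = [
    Event("A customer asks for oat milk.", {"Noor", "customer"}),
    Event("Noor fills the pitcher with oat milk.", {"Noor"}),
    Event("A coworker swaps the oat milk in the pitcher for almond milk.", {"Noor", "coworker"}),
]

# The coworker did not hear the request, so that event is hidden from their view.
print(perspective_filter(story, "coworker"))
```

The filtered list plays the role of the Stage 1 output: it is the only context the model should use when answering a question about that character.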

How to Use SimToM?

Here's the template for both stages:

Stage 1, Perspective-Taking

The following is a sequence of events: {story}

Which events does {character_name} know about?

Stage 2, Question-Answering

{story from character_name’s perspective}

Answer the following question: {question}
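The two templates can be chained in code. The following is a minimal Python sketch, not the authors' implementation: `simtom`, `STAGE1_TEMPLATE`, `STAGE2_TEMPLATE`, and the `llm` argument are hypothetical names, and `llm` stands for any function that takes a prompt string and returns the model's reply.

```python
from typing import Callable

# Templates copied from the two stages above.
STAGE1_TEMPLATE = (
    "The following is a sequence of events: {story}\n\n"
    "Which events does {character_name} know about?"
)
STAGE2_TEMPLATE = "{perspective}\n\nAnswer the following question: {question}"


def simtom(llm: Callable[[str], str], story: str, character_name: str, question: str) -> str:
    """Run both SimToM stages with any text-in, text-out model function."""
    # Stage 1, Perspective-Taking: ask which events the character knows about.
    stage1_prompt = STAGE1_TEMPLATE.format(story=story, character_name=character_name)
    perspective = llm(stage1_prompt)

    # Stage 2, Question-Answering: the Stage 1 reply replaces the full story as context.
    stage2_prompt = STAGE2_TEMPLATE.format(perspective=perspective, question=question)
    return llm(stage2_prompt)


def placeholder_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); swap in your own client here."""
    return f"[model reply to a {len(prompt)}-character prompt]"


print(simtom(placeholder_llm, "Noor fills a pitcher with oat milk.", "Noor", "What is in the pitcher?"))
```

Passing the model as a plain callable keeps the sketch independent of any particular provider. Note that Stage 2 never sees the original story, only the Stage 1 output.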

Example Situation

Let's see how it works on an example set in a coffee shop.

First, let's prompt a model only with the story description and a question about it:

Simple prompt

Noor is working as a barista at a busy coffee shop. Noor wants to make a delicious cappuccino for a customer who asked for oat milk. Noor grabs a milk pitcher and fills it with oat milk. Noor believes that the milk pitcher contains oatmilk. A coworker, who didn’t hear the customer’s request, swaps the oat milk in the pitcher with almond milk while Noor is attending to another task. Noor sees her coworker swapping the milk.

Does Noor believe the milk pitcher contains oat milk or almond milk?

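The correct answer is "almond milk," but the model contradicts itself: it says Noor believes the pitcher contains oat milk, then adds that she saw the swap and knows the milk has changed: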

AI Output

Noor believes that the milk pitcher contains oat milk. She had filled the pitcher with oat milk and saw her coworker swap it out, so she would be aware that the pitcher now contains a different type of milk.

Now, let's try the SimToM prompting technique.

Stage 1, Perspective-Taking

The following is a sequence of events: Noor is working as a barista at a busy coffee shop. Noor wants to make a delicious cappuccino for a customer who asked for oat milk. Noor grabs a milk pitcher and fills it with oat milk. Noor believes that the milk pitcher contains oatmilk. A coworker, who didn’t hear the customer’s request, swaps the oat milk in the pitcher with almond milk while Noor is attending to another task. Noor sees her coworker swapping the milk.

Which events does Noor know about?

AI Output

Noor is aware of all the events in the scenario. She knows that she filled the pitcher with oat milk and also saw her coworker swap it with almond milk.

Stage 2, Question-Answering

A story from Noor's perspective

Answer the following question: Does Noor believe the milk pitcher contains oat milk or almond milk?

AI Output

As I, Noor, was preparing a fresh cappuccino for a customer who requested oat milk, I carefully filled up a milk pitcher with the creamy oat milk. Life in a busy coffee shop is fast-paced, and while I was momentarily attending to another task, my eagle eyes spotted my coworker replacing the oat milk with almond milk. I knew right then that the pitcher no longer held the precious oat milk, but instead was poured with a different kind of milk. So, to answer your question, I believe that the milk pitcher now contains almond milk, not oat milk as I had originally prepared.

With SimToM, the model answers the question correctly.

Note

In the original SimToM paper, the authors offer some model-specific prompts. Feel free to check them out if you're interested.

Limitations of SimToM

SimToM's usefulness may vary with the model you're using. There are three possible cases:

The model is already good at Theory of Mind (ToM) questions, so SimToM is not needed
The model doesn't provide correct answers, and SimToM doesn't help either
The model doesn't provide correct answers, and SimToM helps
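One way to find out which of these cases applies to your model is to run the same questions with and without SimToM and tally the results. The sketch below assumes you already have per-question correctness flags. The function names and outcome labels are hypothetical, and the sample data is a placeholder, not a measurement from any model.

```python
from collections import Counter


def simtom_outcome(baseline_correct: bool, simtom_correct: bool) -> str:
    """Map one question's results onto the three cases listed above."""
    if baseline_correct:
        # The plain prompt already works, whatever SimToM returns.
        return "not needed"
    if simtom_correct:
        # SimToM fixes an answer the plain prompt got wrong.
        return "helps"
    # Both the plain prompt and SimToM fail.
    return "does not help"


def summarize(results: list[tuple[bool, bool]]) -> Counter:
    """Count outcomes over (baseline_correct, simtom_correct) pairs."""
    return Counter(simtom_outcome(baseline, simtom) for baseline, simtom in results)


# Placeholder flags for four questions, for illustration only.
placeholder_results = [(True, True), (False, True), (False, False), (False, True)]
print(summarize(placeholder_results))
```

If most questions fall under "not needed" or "does not help", the extra Stage 1 call is unlikely to pay off for that model.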

The technique was primarily tested on datasets that require “hiding” parts of the original story when answering the question from an agent’s perspective. Some Theory of Mind reasoning, however, may instead require imputing information that the story never states.

SimToM wasn't tested on smaller language models with fewer than 7B parameters.

Conclusion

SimToM is a step forward in the emulation of Theory of Mind (ToM) by large language models. It can be used to develop more sophisticated interactions between humans and machines in settings such as personalized assistants, learning assistants, and more empathetic interfaces.

Valeriia Kuka

Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.

Footnotes

Wilf, A., Lee, S. S., Liang, P. P., & Morency, L.-P. (2023). Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities. https://arxiv.org/abs/2311.10227
Gandhi, K., Fränken, J.-P., Gerstenberg, T., & Goodman, N. D. (2023). Understanding Social Reasoning in Language Models with Language Models. https://arxiv.org/abs/2306.15448