"By My Eyes" is a novel approach for integrating sensor data into Multimodal Large Language Models (MLLMs) by transforming long sequences of sensor data into visual inputs, such as graphs and plots. This method uses visual prompting to guide MLLMs in performing sensory tasks (e.g., human activity recognition, health monitoring) more efficiently and accurately than text-based methods.
Text-based methods that use raw sensor data in LLM prompts face challenges like:
"By My Eyes" introduces visual prompts to represent sensor data as images (e.g., waveforms, spectrograms), making it easier for MLLMs to interpret. The key innovation is a visualization generator that automatically converts sensor data into optimal visual representations. This reduces token costs and enhances performance across various sensory tasks.
Steps of the method:
"By My Eyes" was tested on nine sensory tasks across four modalities (accelerometer, ECG, EMG, and respiration sensors). The approach consistently outperformed text-based prompts, showing:
Dataset | Modality | Task | Text Prompt Accuracy | Visual Prompt Accuracy | Token Reduction |
---|---|---|---|---|---|
HHAR | Accelerometer | Human activity recognition | 66% | 67% | 26.2Γ |
PTB-XL | ECG | Arrhythmia detection | 73% | 80% | 3.4Γ |
WESAD | Respiration | Stress detection | 48% | 61% | 49.8Γ |
The visual prompts also led to more efficient use of tokens, allowing MLLMs to handle larger datasets and more complex tasks without sacrificing accuracy.
The "By My Eyes" method provides a cost-effective and performance-boosting solution for handling sensor data in MLLMs. By transforming raw sensor data into visual prompts, it addresses the limitations of text-based approaches, making it easier for LLMs to solve real-world sensory tasks in fields like healthcare, environmental monitoring, and human activity recognition.
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.