Apple Intelligence Models

🟦 This article is rated medium
Reading Time: 4 minutes
Last updated in November 2024

Bhuwan Bhatt

Modeling overview for the Apple foundation models

Introduction

Since its release at the 2024 Worldwide Developers Conference (WWDC), Apple Intelligence has made headlines across all major tech news platforms and social media.

Unlike general-purpose models such as Gemini and ChatGPT, Apple Intelligence consists of several highly capable generative models that are fast, efficient, and tailored to integrate seamlessly into Apple users' daily lives. These models, called Apple Foundation Models (AFMs), are optimized for tasks such as crafting and refining text, summarizing notifications, generating playful images, and automating actions across apps—delivering convenience and creativity at every turn.

In this article, we’ll explore the architecture, data practices, and optimization strategies behind Apple Intelligence. We'll also highlight how Apple balances performance with its core commitment to user privacy.

The Models Behind Apple Intelligence

Apple Intelligence operates with two core models:

  • AFM-on-device: A lightweight ~3-billion-parameter model optimized for edge devices.
  • AFM-server: A robust server-based model for more intensive tasks.

Beyond these, Apple Intelligence also includes a coding model for developers and a diffusion model for generating visual content.

AFM models follow four key responsible AI principles:

  • Empower users with intelligent tools: AI should meet user needs responsibly.
  • Represent our users: Represent users around the globe authentically, without perpetuating stereotypes or biases.
  • Design with care: Minimize potential misuse or harm.
  • Protect privacy: Treat user data with the utmost caution.

Let’s dive deeper into the architecture powering these models.

Architecture

AFM models use a modified transformer architecture with several innovative features:

  • Memory-efficient design: A shared input/output embedding matrix reduces the parameter count.
  • Training stability: Techniques like RMSNorm and query/key normalization improve model training.
  • Efficient attention: Grouped-query attention (GQA) minimizes memory use while maintaining quality.
  • Enhanced activation: SwiGLU activation improves computational efficiency.
  • Long-context support: RoPE embeddings extend capabilities for long inputs.
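Two of the components above, RMSNorm and the SwiGLU feed-forward block, are easy to show concretely. The sketch below is a minimal NumPy illustration of both (toy dimensions and random weights, not Apple's actual implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root-mean-square of the activations.
    Unlike LayerNorm, there is no mean subtraction, which is cheaper and
    tends to stabilize training."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: a SiLU-gated linear unit."""
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU (swish) activation
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16          # toy sizes for illustration
x = rng.normal(size=(2, d_model))

h = rms_norm(x, weight=np.ones(d_model))
out = swiglu(h,
             rng.normal(size=(d_model, d_ff)),
             rng.normal(size=(d_model, d_ff)),
             rng.normal(size=(d_ff, d_model)))
print(out.shape)  # (2, 8)
```

After normalization, each row of `h` has unit root-mean-square, and the feed-forward block maps back to the model dimension.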

The table below outlines the AFM-on-device dimensions:

AFM-on-device dimensions

Apple employs runtime-swappable adapters, enabling a single model to specialize in dozens of tasks without bloating its architecture. Here’s an overview of the adapter-based design:

Architecture of Apple Intelligence with adapters

Optimizations for Speed and Efficiency

Apple Intelligence is designed for everyday use on resource-constrained edge devices. To achieve high performance with minimal latency and power consumption, Apple employs:

  • Quantization: Reduces model size without compromising accuracy, including mixed-precision techniques that compress weights to an average of about 3.5 bits per weight.
  • Talaria tool: Optimizes bit rates by analyzing model latency and power use.
  • Adapters: Make smaller models as effective as larger ones while maintaining task-specific efficiency.
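To make the mixed-precision idea concrete, here is a toy symmetric uniform quantizer in NumPy. Quantizing some weight blocks at 4 bits and others at 2 bits yields an average bit rate between the two; the exact scheme Apple uses (e.g., block sizes and palette construction) differs, so treat this purely as an illustration:

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization to 2**(bits-1) - 1 integer levels per sign."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(64, 64)).astype(np.float32)

# Mixed precision: half the blocks at 4 bits, half at 2 bits,
# giving an average of 3 bits per weight in this toy setup.
q4, s4 = quantize(w[:32], bits=4)
q2, s2 = quantize(w[32:], bits=2)
err4 = np.abs(dequantize(q4, s4) - w[:32]).mean()
err2 = np.abs(dequantize(q2, s2) - w[32:]).mean()
print(err4 < err2)  # True: more bits, lower reconstruction error
```

The trade-off is exactly what a tool like Talaria helps navigate: lower bit rates shrink memory and latency but raise reconstruction error, so bits are spent where they matter most.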

Data: Quality Over Quantity

Generative models are data-hungry; however, the quality of the data is as important as its quantity. The data used to train AFM includes:

  • Web pages: Publicly available information crawled with Applebot, Apple's web crawler.
  • Licensed data from publishers: A limited amount of high-quality data licensed from publishers.
  • Open-source datasets: Publicly available datasets and code repositories on GitHub.

It is important to note that data from users was not used to train AFM. Explicit and inappropriate content, personally identifiable information, profanity, and unsafe material were removed from the data before training. In addition to human-generated data, synthetic data is also used to enhance data quality and diversity.
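A pre-training filter of this kind can be as simple as pattern matching plus a blocklist. The sketch below is a deliberately naive illustration (the patterns and blocklist are hypothetical; Apple's actual pipeline is far more sophisticated):

```python
import re

# Illustrative document filters: reject documents containing
# email addresses, phone-like numbers, or blocklisted terms.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),          # email addresses
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),  # US-style phone numbers
]
BLOCKLIST = {"badword"}  # placeholder for a profanity/unsafe-term list

def keep_document(text):
    """Return True if the document passes all filters."""
    if any(p.search(text) for p in PII_PATTERNS):
        return False
    return not (set(text.lower().split()) & BLOCKLIST)

docs = ["Public product review with no personal data.",
        "Contact me at jane@example.com for details."]
print([keep_document(d) for d in docs])  # [True, False]
```

Real pipelines layer many such filters (classifiers, deduplication, quality scoring), but the principle is the same: unsafe or personal content is dropped before the model ever sees it.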

Performance Highlights

AFM powers several applications within the Apple ecosystem that involve tasks such as writing, following instructions, solving math problems, using external tools, and more. Let's look at how AFM-powered applications perform in each of these areas:

  • User preference: Human evaluations consistently rate AFM higher than competing models like Mistral, Gemma, and GPT.
Comparison of AFM with other models
  • Instruction following: AFM outperforms competitors in understanding and executing user prompts.
Instruction following capabilities of AFM
  • Tool use: AFM-server achieves superior accuracy in selecting tools for specific tasks.
Tool Use via Function Calling Benchmarks results
  • Writing tasks: AFM models lead in summarization, composition, and more.
AFM performance on writing tasks
  • Math performance: AFM-on-device surpasses Mistral-7B and Gemma-7B on math benchmarks.
AFM performance on math tasks
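The "tool use" benchmark measures how reliably a model picks the right function and arguments for a request. The dispatch side of that loop looks roughly like the sketch below (the tool names and the call format are hypothetical; Apple has not published AFM-server's exact function-calling schema):

```python
# Minimal function-calling dispatch: the model emits a structured tool
# call, and the runtime routes it to the matching function.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "set_timer": lambda minutes: f"Timer set for {minutes} min",
}

def run_tool_call(call):
    """Dispatch a model-emitted call of the form {'name': ..., 'arguments': {...}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

# Suppose the model chose a tool for "What's the weather in Cupertino?"
call = {"name": "get_weather", "arguments": {"city": "Cupertino"}}
print(run_tool_call(call))  # Sunny in Cupertino
```

The benchmark's accuracy figure reflects the model's side of this exchange: emitting the correct tool name and well-formed arguments.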

Conclusion

Apple Intelligence represents a significant advancement in AI tailored specifically for the Apple ecosystem. By combining the efficiency of AFM-on-device with the processing power of AFM-server, Apple has integrated AI that not only enhances the user experience across its devices but also prioritizes privacy, security, and responsible AI practices. These models, built on a refined transformer architecture, employ innovative memory and processing optimizations to make high-quality, real-time AI interactions possible even on edge devices.

The rigorous data curation and reliance on synthetic data ensure that Apple Intelligence serves users without compromising their personal information. Through AFM's architecture, optimizations, and data handling practices, Apple has set a new standard for user-centric AI, paving the way for future advancements that are both powerful and secure.

Bhuwan Bhatt

Bhuwan Bhatt, a Machine Learning Engineer with over 5 years of industry experience, is passionate about solving complex challenges at the intersection of machine learning and Python programming. Bhuwan has contributed his expertise to leading companies, driving innovation in AI/ML projects. Beyond his professional endeavors, Bhuwan is deeply committed to sharing his knowledge and experiences with others in the field. He firmly believes in continuous improvement, striving to grow by 1% each day in both his technical skills and personal development.

Footnotes

  1. Apple. (2024). Apple Intelligence Foundation Language Models. https://arxiv.org/abs/2407.21075