The Multimodal AI Boom

December 11, 2024

4 minutes

🟢 Easy Reading Level

Hey there!

Welcome to the latest edition of the Learn Prompting newsletter.

Remember when ChatGPT stunned the world by making large language models (LLMs) accessible to almost anyone? Back in 2023, LLMs were the tech world's hottest topic, dominating headlines and conversations.

But we're almost in 2025 now, and the predictions that 2024 would be the year of multimodal AI have proven spot on, especially for models that combine vision and language capabilities.

Multimodal AI combines different data types, like text and visuals, into one cohesive model. As 2024 wraps up, the multimodal AI boom is in full swing, with major generative AI players rolling out their latest creations. Here's the breakdown:

Video Models: The Hot Topic This Week

Veo is impressive! Here's an example of a frame from a video generated by Google Veo. Prompt: Timelapse of a common sunflower opening, dark background

This week, both Google and OpenAI launched video generation models, almost as if they coordinated their releases. But hold your excitement—access is still highly restricted.

Key updates:

Google introduced Veo, its first AI model for video creation, available via Google's Vertex AI platform. Want to try it? You'll need to join the Trusted Tester Waitlist (must be 18+ and in the U.S.).
OpenAI's Sora video generation model has also launched. Well, sort of. It's currently available only in selected regions; users in the EU, for instance, can't access it yet. Some features, like generating videos with real human-like characters, are limited to a select group of users.
Luma AI, a rising star known for its Dream Machine app for generating visuals, unveiled Ray 2, a next-gen model for creative video generation, offered on AWS. The Dream Machine app is now live on the web and mobile.

Here's an example of an image generated using the Dream Machine app. Prompt: an astronaut flying in space. The Dream Machine app can also generate a video directly from this image!

Image Models

On the image generation front, huge updates are shaking up the scene:

Grok now has image generation capabilities, powered by a model code-named Aurora! This could be a real competitor to existing image-generation models. Grok is also now available to everyone: even free users can send 10 messages every 2 hours.

Here's how Grok sees Learn Prompting on X. Prompt: Draw me

Google unveiled Genie 2, its large-scale 3D foundation model. Arriving on the heels of similar work from Fei-Fei Li's World Labs, this model lets users create expansive 3D environments.
Amazon introduced a new family of multimodal models called Nova, featuring:
- Four text-generation models: Micro, Lite, Pro, and Premier (Premier debuts in early 2025)
- An image-generation model, Nova Canvas, and a video-generation model, Nova Reel, both launched on AWS this week

Other GenAI News

Some notable updates from other players in the AI race:

Microsoft's Copilot can now browse the web with you. Dubbed Copilot Vision, this feature lets users browse alongside the AI in Edge, and it is rolling out to a limited number of Pro subscribers in the U.S. It looks like a direct response to Claude's computer use, although Claude is meant to work with any app on your computer, not just the browser.
Reddit is testing AI-powered Reddit Answers, a feature designed to provide quick responses based on platform posts. It's accessible via a new button on Reddit's homepage, leading to a dedicated Q&A page. Initially, it's available to a limited U.S. audience.
OpenAI has rolled out Canvas to all users, including free accounts. Canvas is a new interface for ChatGPT that lets you collaborate on writing and coding projects. It opens in a separate window, offering a more immersive way to work with ChatGPT.

Explore Generative AI with Our Courses

Multimodal AI is evolving fast, with models blending vision, language, and so much more. But why just watch it happen when you can be part of the action?

We've put together 15 specialized courses under one subscription to help you get hands-on with the tools and techniques driving Generative AI. Whether you're a beginner or looking to level up, there's something here for you.

Start Learning for Free

10 Predictions from the State of AI Report 2024

The State of AI Report 2024 was recently released as a presentation of more than 200 slides. We've summarized its 10 predictions for 2025:

$10B+ Sovereign Investment triggers national security review
No-code App Success goes viral in the App Store
Data Collection Reforms after legal trials
Softer EU AI Act implementation due to overregulation concerns
Open-source Model Surpasses OpenAI o1 in reasoning benchmarks
NVIDIA's Dominance Continues with no significant market challenges
Humanoid Investment Declines due to product-market fit struggles
Apple's On-device AI drives personal AI assistant momentum
AI-generated Research Paper accepted at a major ML conference
AI-driven Video Game achieves mainstream success

We also collected the key takeaways in one article. Take a look.

Thanks for reading this week's newsletter!

If you enjoyed these insights about AI developments and would like to stay updated, you can subscribe below to get the latest news delivered straight to your inbox.

See you next week!

Valeriia Kuka

Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
