Hear a podcast discussion about Gemini’s multimodal capabilities.

The latest episode of the Google AI: Release Notes podcast focuses on how Gemini was built from the ground up as a multimodal model — meaning a model that works with text, images, video and documents.

Host Logan Kilpatrick chats with Anirudh Baddepudi, the product lead for Gemini's multimodal vision capabilities. They discuss how Gemini understands and reasons about images, video and documents; the future of product experiences when "everything is vision"; and how these capabilities are giving developers and users new ways to work with Gemini.

Watch the full conversation below, or listen to the Google AI: Release Notes podcast on Apple Podcasts or Spotify.
