Gemini 2.0 to Release Real-Time Audio, Video-Streaming Input Tools in New Update

December 18, 2024

Gemini 2.0 Flash

Gemini 2.0 Flash builds on the success of 1.5 Flash and outperforms 1.5 Pro on key benchmarks at twice the speed.

In addition to multimodal inputs such as images, video and audio, 2.0 Flash now supports multimodal output, including natively generated images mixed with text and steerable, multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search and code execution, as well as third-party user-defined functions.
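To illustrate what calling a user-defined function involves, here is a minimal sketch in plain Python. It does not use the Gemini API. The registry `TOOLS`, the `tool` decorator, the `dispatch` helper and the JSON shape of the call request are all assumptions made for illustration: the application registers functions, the model asks for one by name, and the application runs it and returns the result.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a user-defined function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def word_count(text: str) -> int:
    """Example user-defined function: count whitespace-separated words."""
    return len(text.split())


def dispatch(call_json: str) -> str:
    """Run the function named in a model-style call and return a JSON reply."""
    call = json.loads(call_json)
    name, args = call["name"], call.get("args", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps({"name": name, "result": TOOLS[name](**args)})


# A call request shaped the way a model might emit it (assumed format).
request = '{"name": "word_count", "args": {"text": "real-time audio and video"}}'
print(dispatch(request))  # {"name": "word_count", "result": 4}
```

In a real integration the request would come from the model's response and the reply would be sent back to it as the tool result. The exact message format depends on the API in use.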

Gemini 2.0 Flash is available now as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI. Multimodal input and text output are available to all developers, while text-to-speech and native image generation are available to early-access partners. General availability will follow in January, along with more model sizes.

Google is also releasing a new Multimodal Live API that has real-time audio, video-streaming input and the ability to use multiple, combined tools.

Gemini 2.0 available in the Gemini app, Google’s AI assistant

Gemini users globally can access a chat-optimized version of 2.0 Flash experimental by selecting it in the model drop-down on desktop and mobile web. It will be available in the Gemini mobile app soon. With this new model, users can experience an even more helpful Gemini assistant.

Unlocking agentic experiences with Gemini 2.0

Gemini 2.0 Flash’s native user interface action-capabilities, along with other improvements like multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences.

The practical application of AI agents is a research area full of exciting possibilities. Google is exploring it with a series of prototypes: Project Astra, Google’s research prototype exploring future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with the browser; and Jules, an AI-powered code agent that can help developers.

Project Astra: agents using multimodal understanding in the real world

Project Astra’s latest version can converse in multiple languages and in mixed languages, with a better understanding of accents and uncommon words. It can use Google Search, Lens and Maps, and it can remember things while keeping users in control. It now has up to 10 minutes of in-session memory. With new streaming capabilities and native audio understanding, the agent can understand language at about the latency of human conversation.
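One way to picture a 10-minute in-session memory is as a sliding time window. The sketch below is not a description of how Project Astra is built. The `SessionMemory` class and its methods are hypothetical, timestamps are plain seconds passed in by the caller, and the sketch assumes they never decrease.

```python
from collections import deque
from typing import Deque, List, Tuple


class SessionMemory:
    """Keeps observations only for a fixed time window (hypothetical design)."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        self.window_seconds = window_seconds  # 600 s = 10 minutes
        self._items: Deque[Tuple[float, str]] = deque()

    def remember(self, now: float, observation: str) -> None:
        """Store an observation with the time it was made."""
        self._items.append((now, observation))
        self._prune(now)

    def recall(self, now: float) -> List[str]:
        """Return every observation still inside the window, oldest first."""
        self._prune(now)
        return [text for _, text in self._items]

    def _prune(self, now: float) -> None:
        # Drop entries that are older than the window.
        while self._items and now - self._items[0][0] > self.window_seconds:
            self._items.popleft()


memory = SessionMemory()
memory.remember(0.0, "keys are on the desk")
memory.remember(300.0, "door code is on the note")
print(memory.recall(500.0))  # both observations are still inside the window
print(memory.recall(700.0))  # the first observation has aged out
```

Passing the current time in explicitly keeps the example deterministic. A live system would more likely read a monotonic clock.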

Project Mariner: agents that can help you accomplish complex tasks

Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with the browser. It can understand and reason across information on the browser screen, including pixels and web elements like text, code, images and forms, and then use that information via an experimental Chrome extension to complete tasks for the user.

Project Mariner shows that it is becoming technically possible for an agent to navigate within a browser, even though it is not always accurate and is slow to complete tasks today. Both are expected to improve rapidly over time.

Project Mariner can only type, scroll or click in the active browser tab, and it asks users for final confirmation before taking certain sensitive actions, like purchasing something.
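The safeguard described here, a small set of permitted actions plus a confirmation step for sensitive ones, can be sketched as a simple gate. This is an illustration under stated assumptions, not Project Mariner's implementation. `Action`, `ALLOWED_KINDS` and `run_action` are hypothetical names, and the `confirm` callback stands in for whatever prompt the user would actually see.

```python
from dataclasses import dataclass
from typing import Callable

# The only action kinds the agent may perform (taken from the article).
ALLOWED_KINDS = {"type", "scroll", "click"}


@dataclass
class Action:
    kind: str                # "type", "scroll" or "click"
    target: str              # description of the element acted on
    sensitive: bool = False  # e.g. a click that completes a purchase


def run_action(action: Action, confirm: Callable[[Action], bool]) -> str:
    """Decide whether an action runs; no browser is driven in this sketch."""
    if action.kind not in ALLOWED_KINDS:
        return "rejected"
    if action.sensitive and not confirm(action):
        return "cancelled"
    return "executed"


def decline(action: Action) -> bool:
    """Stand-in for a user who answers 'no' to every confirmation prompt."""
    return False


print(run_action(Action("scroll", "results list"), decline))                 # executed
print(run_action(Action("click", "Buy now button", sensitive=True), decline))  # cancelled
print(run_action(Action("download", "invoice file"), decline))               # rejected
```

Putting the allow-list check before the confirmation step means an unsupported action never reaches the user as a prompt.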

Trusted testers are starting to test Project Mariner using an experimental Chrome extension now.

Jules: agents for developers

Google is also exploring how AI agents can assist developers with Jules, an experimental AI-powered code agent that integrates directly into a GitHub workflow. It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision.

Agents in games and other domains

Google DeepMind has a long history of using games to help AI models become better at following rules, planning and logic. Just last week, Google introduced Genie 2, its AI model that can create an endless variety of playable 3D worlds, all from a single image. It has also built agents using Gemini 2.0 that can help users navigate the virtual world of video games. These agents can reason about the game based solely on the action on the screen and offer suggestions for what to do next in real-time conversation.
