Gemini Omni & 3.5 Flash 2026: Video AI Meets Agents

TL;DR

Gemini Omni generates and edits video from multimodal inputs (text, image, audio, video) through conversational prompts, maintaining scene consistency across edits
Gemini 3.5 Flash delivers frontier-level intelligence optimized for agentic workflows, now powering Search, the Gemini app, and enterprise automation via Antigravity
Both models are available now: Omni to Google AI Plus, Pro, and Ultra subscribers and in YouTube Shorts and YouTube Create, 3.5 Flash via APIs and consumer products globally
Gemini Spark, a personal AI agent running on 3.5 Flash, launches for Google AI Ultra subscribers in the U.S. with deep Workspace integration

What Happened

At Google I/O 2026, Google launched two distinct model families targeting different AI capabilities. Gemini Omni represents Google’s entry into conversational video creation: users can generate video from any combination of text, image, audio, or video inputs, then edit results through natural language instructions. Unlike traditional editing tools, Google says, Omni maintains character consistency, physical coherence, and scene memory across multiple editing rounds.

Gemini 3.5 Flash takes a different approach. It’s built for “long-horizon agentic tasks”, meaning complex, multi-step workflows that require sustained reasoning and action. Google positions it as rivaling large flagship models while maintaining the speed of the Flash series. The model powers several new consumer features: information agents in Search (launching for Pro/Ultra subscribers this summer), custom generative UI that builds interactive tools on the fly, and Gemini Spark, a 24/7 personal agent integrated with Workspace.

Both models are rolling out now. Omni reaches all Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow, and is available at no cost in YouTube Shorts and YouTube Create. 3.5 Flash is generally available via Antigravity, the Gemini API, AI Studio, and Android Studio, and it powers the Gemini app globally.

Why It Matters

Google is making two strategic bets with this dual release: that video will become a first-class interface for AI interaction, and that agentic capabilities will define the next frontier of model utility.

For developers, Omni opens video generation beyond prompt-to-clip workflows. The conversational editing model—where each instruction builds on previous context—suggests a new UX pattern for creative tools. If physics and character consistency hold up at scale (Google’s demos show promise), this could compress weeks of VFX work into iterative conversations. The immediate availability through APIs means third-party apps can integrate this capability without building their own video models.
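To make the "each instruction builds on previous context" pattern concrete, here is a minimal client-side sketch in Python. It is not the Omni API: the EditSession class, its field names, and the request payload shape are all hypothetical, chosen only to show how an app might carry the full edit history forward on every turn.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical client-side record of a multi-turn video editing session."""

    base_prompt: str
    # Reference inputs (image, audio, or video files); names are illustrative.
    inputs: list[str] = field(default_factory=list)
    # Every edit instruction issued so far, oldest first.
    edits: list[str] = field(default_factory=list)

    def add_edit(self, instruction: str) -> None:
        """Append an instruction; earlier instructions stay in the history."""
        self.edits.append(instruction)

    def build_request(self) -> dict:
        """Assemble a payload that carries the whole history, not just the latest edit.

        The keys below are assumptions, not a documented request schema.
        """
        return {
            "prompt": self.base_prompt,
            "inputs": list(self.inputs),
            "history": list(self.edits),
            "turn": len(self.edits),
        }


session = EditSession("A fox runs through snow at dawn", inputs=["fox.png"])
session.add_edit("Make the fox's scarf red")
session.add_edit("Slow the final two seconds")
request = session.build_request()
print(request["turn"], request["history"])
```

The design choice worth noting is that the history travels with every request. Whether a real service keeps that state server-side or expects the client to resend it is not something Google's announcement specifies.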

3.5 Flash matters more for enterprise automation and agent deployment. Google’s emphasis on “long-horizon tasks” and integration with Antigravity (their agent orchestration framework) signals a push beyond single-turn AI responses. The model’s ability to “execute multi-step workflows and coding tasks while sustaining frontier performance” directly challenges Anthropic’s Claude in the agentic coding space and positions Google for the emerging agent-to-agent coordination market.

Key Details

Gemini Omni Capabilities

Input modalities: Text, image, audio, video (any combination)
Output: High-quality video generation
Key feature: Multi-turn editing with scene/character consistency
Availability: Google AI Plus/Pro/Ultra subscribers, YouTube Shorts/Create (free), APIs for developers

Gemini 3.5 Flash Specifications

Performance: Positioned by Google as frontier-level, rivaling large flagship models
Speed: Flash-series latency, according to Google
Primary use case: Agentic workflows, complex coding tasks
Integration: Antigravity harness for collaborative sub-agents
Availability: Gemini API, AI Studio, Android Studio, Antigravity, Gemini app (global default model)

New Consumer Features (Powered by 3.5 Flash)

Information agents: Background reasoning for personalized updates (Pro/Ultra, summer 2026)
Generative UI in Search: Custom visual tools and simulations built on the fly (free, summer 2026)
Custom experiences: Persistent dashboards, trackers, mini apps (Pro/Ultra first, coming months)
Gemini Spark: 24/7 personal agent with Workspace integration (Ultra subscribers, U.S. only)

Implications

Google’s release strategy reveals their response to OpenAI’s Sora and Anthropic’s agent-focused positioning. By shipping Omni to consumers first through YouTube—a platform with 2.5 billion users—they’re betting on distribution over model superiority. If Omni can handle basic content creation at acceptable quality, YouTube creators gain a native video editing assistant without leaving the platform.

The 3.5 Flash deployment is more aggressive. Making it the default model in the Gemini app globally means hundreds of millions of users now interact with an agent-optimized model, even if they’re not explicitly using agentic features. This could create a feedback loop in which more agent interactions generate more training data for future agentic models.

The Antigravity integration is the real signal. Google is treating agent orchestration as infrastructure, not a feature. By offering 3.5 Flash through Antigravity with “collaborative sub-agents” for enterprise customers, they’re positioning for a future where companies deploy fleets of specialized AI agents. The demo showing automatic asset renaming and categorization at scale suggests Google sees agent orchestration as the enterprise moat, not the underlying model.
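For readers who want a picture of what "orchestration as infrastructure" means in practice, the sketch below routes one asset through two sub-agents, loosely mirroring the renaming and categorization demo. It does not use Antigravity or any Google API. The Orchestrator class and both agent functions are hypothetical, and the "agents" are plain Python functions standing in for model calls.

```python
from typing import Callable

# A sub-agent is modelled here as a function from a payload to a result.
SubAgent = Callable[[str], str]


def rename_agent(filename: str) -> str:
    """Stand-in renamer: trim, lowercase, and replace spaces with underscores."""
    return filename.strip().lower().replace(" ", "_")


def categorize_agent(filename: str) -> str:
    """Stand-in categorizer: map the file extension to a coarse asset type."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    kinds = {"png": "image", "jpg": "image", "mp4": "video", "wav": "audio"}
    return kinds.get(ext, "other")


class Orchestrator:
    """Hypothetical coordinator that dispatches one payload to named sub-agents."""

    def __init__(self) -> None:
        self.agents: dict[str, SubAgent] = {}

    def register(self, name: str, agent: SubAgent) -> None:
        self.agents[name] = agent

    def run(self, steps: list[str], payload: str) -> dict[str, str]:
        """Run each named step on the payload and collect results by step name."""
        results: dict[str, str] = {}
        for step in steps:
            results[step] = self.agents[step](payload)
        return results


orchestrator = Orchestrator()
orchestrator.register("rename", rename_agent)
orchestrator.register("categorize", categorize_agent)
result = orchestrator.run(["rename", "categorize"], "Hero Shot FINAL.PNG")
print(result)
```

The point of the sketch is the split of responsibilities: the coordinator owns routing and result collection, while each sub-agent stays small and replaceable. That split is what makes the orchestration layer, rather than any single model, the sticky part of the stack.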

Our Take

Google shipped two genuinely different products here, which is smarter than forcing both capabilities into a single model. Omni’s conversational video editing will live or die on consistency—if characters drift or physics break across edits, users will revert to traditional tools. The YouTube integration is shrewd; creators will tolerate quality issues if the workflow is 10x faster.

3.5 Flash is the more consequential release. The agent infrastructure play matters more than the model benchmarks. By making Antigravity a first-class deployment path and integrating it with Search and Workspace, Google is building the distribution advantage that OpenAI lacks. Enterprises won’t adopt agents because the model is 2% better—they’ll adopt because Google provides the orchestration layer, monitoring, and integration with tools they already use.

Watch for three signals:

Whether third-party developers build on Omni’s multi-turn video editing or treat it as a novelty
How quickly enterprise customers move from proof-of-concept to production agent deployments using 3.5 Flash + Antigravity
Whether Gemini Spark gains traction or becomes another Google product that users ignore despite technical capability

The wildcard is pricing. Google announced availability but not API pricing for Omni. If video generation costs make this impractical for anything beyond hobbyist use, distribution through YouTube won’t matter. For 3.5 Flash, the real test is whether “frontier performance at Flash speed” means lower costs than GPT-4 Turbo for equivalent quality. Alongside the orchestration layer, that is the number enterprise buyers will scrutinize most closely.