The Sky Is Not Falling
AI Video Generation After Sora
On December 11, 2024, I published an AI music video called “I’m so Sora.” The music track was generated with Udio, while the background visuals were produced with the initial version of OpenAI’s video model Sora. OpenAI had announced Sora in February of that year with demonstrations so cinematic they induced something close to existential panic in Hollywood. But by the time most of us could actually use it in early December, the gap between the marketing hype and the actual product was substantial. OpenAI had overpromised and underdelivered.
The “I’m so Sora” video I made was a fun experiment. But it was the only one I ever created with Sora. The visual quality fell short. Better options were already emerging, and by early 2025, I was making most of my music video experiments with Kling or Veo instead. So when OpenAI announced on March 24, 2026, that it was discontinuing the Sora app, the API, and the sora.com website, I was not particularly surprised.
What I find interesting about the public reaction to that announcement is how quickly commentators leapt from “Sora is dead” to “AI video is dead.” The word sora means “sky” in Japanese, and the metaphor writes itself: the sky has fallen, the dream is over, the bubble has burst. But the sky is not falling. What fell was a single, spectacularly unsustainable product. The technology it represented, and the competitive ecosystem that grew around it, are not merely surviving. They are accelerating in ways that everyone working in film, animation, or visual media needs to pay close attention to.
The trajectory of a cautionary tale
The arc of Sora’s life, from breathtaking demonstration to deprecated product in roughly 25 months, is worth examining in some detail, because it illustrates a pattern that recurs throughout the history of consumer technology: the company that captures the public imagination first is not necessarily the company that captures the market.
OpenAI unveiled Sora in February 2024, and for the rest of that year, the model existed primarily as a demonstration of what was theoretically possible. A limited version called “Sora Turbo” arrived in December 2024, accessible only through ChatGPT’s premium tiers. The full consumer push came nine months later, on September 30, 2025, when OpenAI launched the Sora 2 standalone application for iOS. The app was designed as a TikTok-style social network for synthetic media: users could scan their faces, insert themselves into generated scenes, and scroll through an endless feed of algorithmically produced content.
The initial numbers looked impressive. Sora topped the App Store charts within days of its launch and reached 3.3 million monthly downloads by November 2025. In December, Disney announced a three-year, $1 billion licensing deal that would have allowed Sora users to generate videos featuring over 200 iconic characters, from Mickey Mouse to Darth Vader, with plans to curate the results for Disney+.
Then the whole edifice collapsed, not from a single point of failure but from several converging at once. By February 2026, monthly downloads had fallen roughly 67 percent, to about 1.1 million. The app was entertaining for brief experiments but lacked the sustained utility required for daily retention.
More damaging still, the platform became a showcase for exactly the content no company preparing for an IPO wants associated with its brand. Users quickly circumvented the weak moderation guardrails to produce deepfakes, including of deceased public figures such as Martin Luther King Jr. and Robin Williams, and of copyrighted characters in various states of absurdity. TechCrunch memorably called it “the creepiest app on your phone.” The families of deceased celebrities protested publicly, and entertainment guilds demanded transparency about training data provenance that OpenAI could not or would not provide.
On March 23, 2026, OpenAI published a comprehensive safety framework. The following day, it shut everything down.
Fifteen million dollars a day
The reputational damage was severe, but the decisive factor was simpler: money. Video generation is orders of magnitude more resource-intensive than text generation. Every user request required the model to render photorealistic frames, simulate physics, and maintain temporal coherence across dynamic spatial environments. By multiple accounts, OpenAI was spending approximately $15 million per day to keep Sora operational. The platform’s total lifetime revenue from in-app purchases amounted to $2.1 million. Bill Peebles, OpenAI’s own head of Sora, acknowledged publicly that the economics were “completely unsustainable.”
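To make the scale of that mismatch concrete, here is a back-of-envelope calculation in Python, using only the figures quoted above and the app’s public launch and shutdown dates. The assumption that the reported daily burn held roughly steady over the app’s consumer lifetime is mine, not OpenAI’s.

```python
from datetime import date

# Publicly reported figures (see above); assuming the daily burn rate
# stayed roughly constant over the consumer app's lifetime.
DAILY_COMPUTE_COST = 15_000_000      # ~$15 million per day
LIFETIME_IAP_REVENUE = 2_100_000     # ~$2.1 million in in-app purchases

launch = date(2025, 9, 30)           # Sora 2 app launch
shutdown = date(2026, 3, 24)         # discontinuation announced
days_live = (shutdown - launch).days # about 175 days

estimated_cost = DAILY_COMPUTE_COST * days_live
print(f"Days live:               {days_live}")
print(f"Estimated compute cost:  ${estimated_cost:,}")
print(f"Lifetime revenue:        ${LIFETIME_IAP_REVENUE:,}")
print(f"Cost per revenue dollar: ~${estimated_cost // LIFETIME_IAP_REVENUE:,}")
```

On those assumptions, the compute bill comes to roughly $2.6 billion against $2.1 million in revenue: more than a thousand dollars spent for every dollar earned.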
The Disney partnership, which had been announced with language comparing it to the end of the silent film era, dissolved in a single morning. On March 24, teams from both companies met to discuss the integration’s future. Thirty minutes after that meeting concluded, OpenAI informed Disney it was pulling the plug on the video model entirely. No capital had changed hands. The $1 billion agreement evaporated before it had begun. For anyone who has been following the broader economics of generative AI, none of this should be surprising. The model could generate entertaining video, but it could not sustain a business.
The field Sora left behind
But here is where the “sky is falling” narrative collapses. The discontinuation of a single product, however prominent, tells us nothing about the health of the underlying technology. In the months and years during which Sora was accumulating headlines, a diverse ecosystem of competitors was quietly solving the technical problems that Sora never overcame, including temporal coherence, physics simulation, audio synchronization, and, critically, cost efficiency. As of March 2026, the competitive landscape bears almost no resemblance to the one Sora entered. Platforms like Arena, where users evaluate models by comparing unlabeled outputs stripped of brand identification, offer the clearest picture of how dramatically the field has advanced.
Google’s Veo 3.1, released in its current form in January 2026, is widely recognized as the leading text-to-video model for prompt adherence and structural reliability. It supports native 1080p output with 4K upscaling, generates up to 8 seconds of continuous video at 24 frames per second, and produces synchronized dialogue, sound effects, and environmental ambiance directly from a single text prompt. Its “Ingredients to Video” feature allows users to upload reference images for character consistency across scenes. For enterprise marketing teams and narrative creators who need precise adherence to complex cinematic directions, Veo 3.1 has become the preferred engine. I myself have used an earlier version of Veo in my music video “Endless Ascent.”
Kuaishou’s Kling 3.0, emerging from China’s intensely competitive tech sector, operates at native 4K resolution at 60 frames per second and is built on a Multi-modal Visual Language (MVL) architecture that allows it to simulate fluid dynamics, gravity, and material interactions with unprecedented accuracy. Where other models produce a “rubbery” or uncanny quality when rendering water, fabric, or dynamic collisions, Kling 3.0 frequently produces results that are difficult to distinguish from physical reality. It has become a standard tool for commercial visual effects pipelines and product advertising, where material texture is paramount. Most of the videos I produced last year were generated with Kling, including “Something Simple ‘25” and “Through the Mystic Green.”
ByteDance’s SeeDance 2.0, launched domestically on Jimeng AI in February 2026, prioritizes narrative cohesion and character consistency. Trained on ByteDance’s enormous datasets of short-form mobile video, the model allows creators to feed up to twelve reference files simultaneously, ensuring that a protagonist’s facial features, lighting, and clothing remain stable across diverse camera angles. Its lip-syncing and beat-linked motion capabilities align generated actions precisely with audio tracks. For episodic content and short-form social media narratives, SeeDance 2.0 provides a level of directorial predictability that its competitors have yet to match. The YouTube channel Theoretically Media recently released an outstanding video showcasing an AI video production workflow built around SeeDance 2.0.
Luma AI’s Ray 3.14, launched in late January 2026 via its Dream Machine platform, differentiates itself by integrating chain-of-thought reasoning directly into the generation pipeline. This allows the model to “think” through a scene description as a whole, evaluate its own outputs, and maintain strict physical and narrative logic across complex motions. Delivering native 1080p output and world-first 16-bit High Dynamic Range (HDR) color generation, Ray 3.14 substantially eases the traditional tradeoff between quality, speed, and cost. Its “Modify Video” capabilities enable natural-language scene editing, while a robust character reference system locks in actor likeness and costume continuity.
RunwayML’s Gen-4.5, which rolled out to professional tiers in late 2025, cements the platform’s position as a comprehensive post-production ecosystem rather than just a standalone clip generator. The model delivers up to 4K resolution, and its core technical differentiator is its unparalleled “world consistency.” Beyond raw output quality, Gen-4.5 separates itself through its deep integration with Runway’s advanced control suite. This includes “Aleph,” for granular, localized in-video editing without degrading surrounding pixels, and “Act-Two,” a next-generation motion capture engine that lets directors map precise head, face, body, and hand movements from a driving video directly onto generated characters.
Alibaba’s Wan 2.6 takes a different strategic approach by promoting open-source accessibility. Operating as what its developers call a “short-film engine” rather than a clip generator, Wan 2.6 introduces a “Starring System” that locks onto a character’s identity via a single reference image and maintains consistency across multiple independently generated shots. It can take a narrative prompt and automatically decompose it into individual shots with transitions, camera angles, and pacing.
Lightricks’ LTX-2, released fully open-source under an Apache 2.0 license in early 2026, may be the most consequential model for independent creators and educators. Its asymmetric 19-billion-parameter architecture generates up to 20 seconds of native 4K video at 50 frames per second with perfectly synchronized audio in a single unified pass. Because the audio and video latent spaces are processed simultaneously, the emotional and atmospheric cues are intrinsically linked. Most importantly, LTX-2 can run on consumer-grade GPU setups, drastically undercutting the API costs of closed-source competitors.
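To give a sense of what running such a model on consumer hardware looks like in practice, here is a minimal sketch using the Hugging Face diffusers pipeline published for Lightricks’ earlier open LTX-Video release. Whether LTX-2 ships the same pipeline class, checkpoint name, and parameters is an assumption on my part; treat this as a pattern, not a recipe.

```python
# Minimal local text-to-video sketch with the diffusers LTX-Video pipeline.
# LTX-2 specifics (checkpoint id, resolutions, audio output) are assumptions;
# this follows the documented pattern of the earlier open LTX-Video release.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",          # open checkpoint on the Hugging Face Hub
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")                      # a single consumer-grade GPU

prompt = "A paper boat drifting down a rain-soaked city street at dusk"
frames = pipe(
    prompt=prompt,
    width=768,                       # dimensions should be divisible by 32
    height=512,
    num_frames=121,                  # a few seconds of footage
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "ltx_clip.mp4", fps=24)
```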
I need to acknowledge that this landscape is not uniformly positive. Serious concerns about copyright, training data provenance, labor displacement, problematic adult content, and deepfake proliferation remain unresolved and, in many respects, are intensifying as the technology improves. These are not problems that better models automatically solve. But the technical trajectory is unmistakable: the models replacing Sora are not merely iterating on its approach. They have leapfrogged it.
From gadgets to infrastructure
The deeper significance of Sora’s collapse is not that it failed, but how it failed. It failed as a consumer entertainment product. The technology behind it, diffusion transformer architectures for video generation, did not fail at all. OpenAI itself has not abandoned the research. Instead, it has reportedly redirected the Sora team and its compute resources toward “world simulation” for robotics applications, feeding into the development of its next-generation multimodal model, internally codenamed “Spud.” The company concluded, correctly, that burning $15 million a day on social media videos was a less valuable use of those resources than training autonomous reasoning agents.
This pivot reflects a broader structural shift across the industry. The era in which AI video generation was a parlor trick you showed your friends is ending. What is replacing it is something more consequential and, for those of us in education, considerably more important to understand. Generative video is becoming embedded infrastructure, from pre-visualization in film production to storyboarding and rapid prototyping in animation and special effects pipelines.
For educators in film and animation programs, this transition is consequential. The students currently in our classrooms will not graduate into an industry where AI video generation is a curiosity. They will graduate into one where it is a standard production tool, integrated into editing suites like Adobe Premiere and DaVinci Resolve, and available through APIs that connect directly to professional workflows. The question will be how to teach with and about these tools in ways that develop genuine creative and critical capacity rather than passive dependence on prompt engineering.
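To make “available through APIs” concrete, here is a minimal sketch of the submit-and-poll pattern that most hosted video models expose to production pipelines. The endpoint, field names, and model identifier are placeholders of my own, not any particular vendor’s API; the point is the shape of the workflow, not a specific service.

```python
# Sketch of the asynchronous submit-and-poll pattern used by hosted
# text-to-video APIs. Endpoint, payload fields, and model name are
# placeholders; consult your provider's documentation for the real ones.
import time
import requests

API_BASE = "https://api.example-video-provider.com/v1"  # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder

def generate_clip(prompt: str, seconds: int = 8) -> str:
    """Submit a generation job, poll until it finishes, return the video URL."""
    job = requests.post(
        f"{API_BASE}/video/generations",
        headers=HEADERS,
        json={"model": "example-video-model", "prompt": prompt, "duration": seconds},
        timeout=30,
    ).json()

    while True:
        status = requests.get(
            f"{API_BASE}/video/generations/{job['id']}", headers=HEADERS, timeout=30
        ).json()
        if status["state"] == "succeeded":
            return status["video_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        time.sleep(5)  # rendering can take seconds to minutes; poll politely

if __name__ == "__main__":
    url = generate_clip("Previsualization: slow dolly-in on an empty classroom at dawn")
    print(url)
```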
The sky above the sky
The governing metaphor has one more turn. The Japanese word sora can refer not only to the sky but to the void, to emptiness. OpenAI named its model aspirationally, reaching for the boundless. What it discovered instead was the void in its own business model: the immense, empty space between what the technology could generate and what the market would sustain. That void swallowed a billion-dollar partnership in thirty minutes.
But the actual sky, the competitive ecosystem of video generation models, is wider and more populated than it has ever been. Google, Kuaishou, ByteDance, Alibaba, Lightricks, Runway, Luma AI: their models are more capable, more efficient, and more accessible than anything Sora ever achieved. The demise of the Sora consumer application does not signal the end of AI video generation. It signals the end of the experimental phase: the unsustainable, compute-heavy, moderation-light era of treating a profound technological capability as a social media novelty. What will follow is the professionalization of the field, with all the opportunities and responsibilities that this entails.
The sky did not fall. The scaffolding came down. The building is still going up.
The videos in this article are taken from my two YouTube channels.
P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.

