No, the AI Genie Won't Go Back Into the Bottle
Why the Potential “AI Bubble Burst” Won’t Matter at All
I’ve noticed something curious in recent conversations with fellow educators: a kind of wishful waiting. Some colleagues speak hopefully about the eventual bursting of the “AI bubble,” imagining a return to familiar pedagogical ground once venture capital loses interest and major AI companies fold or scale back. The underlying assumption feels almost comforting: that this technological disruption might prove temporary, a passing frenzy like so many tech booms before it.
But this hope rests on a fundamental misunderstanding of where AI capabilities currently exist. The common perception frames artificial intelligence as something housed primarily in corporate data centers, accessible only through subscription services and dependent on the continued operation of companies like OpenAI, Anthropic, and Google. If these companies disappeared tomorrow, the thinking goes, their AI systems would disappear with them.
The reality looks different. While most educational discussions center on ChatGPT, Claude, and Gemini, a parallel ecosystem has quietly matured to the point where it matches, and in some ways exceeds, the capabilities of these commercial services. This open-source, locally executable AI infrastructure cannot be uninvented or recalled. The tools exist, the models are distributed, and the knowledge of how to use them spreads daily. More importantly, these systems run not on remote servers but on consumer hardware that many professionals already own.
What “Local AI” Actually Means
When I refer to local AI, I mean models that execute entirely on a personal computer without requiring internet connectivity or cloud services. These aren’t simplified versions of “real” AI. Current open-source language models with hundreds of billions of parameters rival the capabilities of commercial offerings, while image, video, and audio generation tools that once required expensive API access now run on capable consumer machines.
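To make “entirely on a personal computer” concrete, here is a minimal sketch of prompting an open-source model with the llama-cpp-python library, one common way to run these models with no cloud service involved. The model filename is a placeholder for whatever quantized model file you have already downloaded, not a specific recommendation.

```python
# A minimal local-inference sketch using the llama-cpp-python library.
# Assumptions: the library is installed (pip install llama-cpp-python) and a
# quantized GGUF model file is already on disk; the filename below is a
# placeholder. Nothing in this snippet touches the network.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.3-70b-instruct-q8_0.gguf",  # local file, no API key, no server
    n_gpu_layers=-1,  # offload every layer to the GPU / unified memory
    n_ctx=4096,       # context window; larger values use more memory
)

result = llm("Explain photosynthesis for a ninth-grade class.", max_tokens=300)
print(result["choices"][0]["text"])
```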
The hardware barrier has largely dissolved. Consider my own setup: a MacBook Pro with Apple’s M4 Max processor and 128GB of unified memory. This MacBook costs between $4,500 and $5,500, depending on configuration. And while this is certainly expensive, it is within reach of professionals and comparable to what many schools spend on teacher workstations. A similarly capable desktop PC with an NVIDIA RTX 4090 and sufficient RAM would cost approximately $3,500 to $4,500.
These aren’t exotic research machines. They’re consumer products available at any electronics retailer. The Mac’s unified memory architecture proves particularly significant here. Where a traditional PC separates system RAM from graphics memory (with the high-end RTX 4090 limited to 24GB of VRAM), Apple’s design provides a single, large memory pool accessible to both CPU and GPU. This architectural choice means a 128GB Mac offers approximately 96GB of usable video memory, enough to run language models that would require multi-GPU server configurations on traditional hardware.
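For the curious, the 96GB figure follows from a simple budget. The sketch below assumes the commonly cited default on Apple Silicon, where the GPU can claim roughly three quarters of unified memory; that fraction is my working assumption, not an official specification.

```python
# Back-of-the-envelope GPU memory budget on an Apple Silicon Mac.
# Assumption: the GPU can use roughly 75% of unified memory by default
# (the exact limit is tunable on recent macOS versions); treat this as an
# estimate, not a specification.

def usable_gpu_memory_gb(unified_memory_gb: float, gpu_fraction: float = 0.75) -> float:
    """Estimate how much unified memory is available for model weights."""
    return unified_memory_gb * gpu_fraction

print(usable_gpu_memory_gb(128))  # ~96 GB, matching the figure above
print(usable_gpu_memory_gb(36))   # ~27 GB on a smaller configuration
```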
The performance characteristics matter because they determine usability. Running Meta’s Llama 3.3 model with 70 billion parameters takes about 75GB of VRAM on my MacBook and generates text at roughly 350-400 words per minute, faster than most people read and more than sufficient for interactive work. I use the 8-bit quantized version, which offers a strong balance of speed and fidelity at about 6.5 tokens per second; a more compressed 4-bit version runs even faster with slightly reduced output quality. These speeds feel responsive. The system doesn’t lag or stutter. It just works.
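The memory figure is easy to sanity-check: at 8 bits per weight, 70 billion parameters come to roughly 70GB for the weights alone, with the rest of the ~75GB going to the context cache and runtime overhead. Here is that rule-of-thumb arithmetic as a small sketch; it is an estimate, not an exact accounting.

```python
# Rough sizing of a quantized language model's weights. This is a rule of
# thumb (parameters x bits per weight), not an exact figure; real usage adds
# several GB for the context cache and runtime overhead.

def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the weights alone, in gigabytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B parameters at {bits}-bit: ~{weights_size_gb(70, bits):.0f} GB")
# 16-bit: ~140 GB (won't fit), 8-bit: ~70 GB, 4-bit: ~35 GB
```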
The Language Model Landscape
At the center of this ecosystem sit applications such as LM Studio, a free desktop program that manages local language model execution. Its interface resembles ChatGPT’s: a conversation window, a text input box, and a sidebar for managing chats. The difference lies entirely in what runs underneath. Instead of sending queries to OpenAI’s servers, LM Studio loads models directly into the computer’s memory and processes everything locally.
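A useful side effect is that LM Studio can also stand in for a cloud API. The sketch below assumes its built-in local server is enabled on the default address (http://localhost:1234/v1) and that a model is already loaded; the model name is a placeholder that the server maps to whatever is currently loaded.

```python
# Querying a locally hosted model through LM Studio's OpenAI-compatible
# server. Assumptions: the server is running on its default port (1234) and a
# model is loaded; no request ever leaves the machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed",  # the local server does not check credentials
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; the server uses whichever model is loaded
    messages=[{"role": "user", "content": "Draft three discussion questions about the Treaty of Versailles."}],
)
print(response.choices[0].message.content)
```

In practice, many tools written against OpenAI-style APIs can simply be pointed at this local address instead.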