The Speed of Human Oversight

Why AI-Generated Development Remains Limited by Human Understanding

Feb 27, 2026

Article voiceover

0:00

-14:28

A few weeks ago, I published a post here about Moltbook, a social networking platform designed for AI agents to post, read, and coordinate tasks, and the security disaster that followed from building it without meaningful human oversight. Moltbook is a cautionary tale about what happens when AI generates faster than humans can verify.

In January 2026, something similar played out from the opposite direction: the team behind the open-source project curl, a ubiquitous data transfer tool that runs on billions of devices, permanently ended their bug bounty program. It was shut down because AI agents were submitting security reports faster than the curl maintainers could evaluate them. And, crucially, most of these security reports suffered from inherent flaws because the people overseeing the generating AI agents failed to provide adequate human oversight themselves.

One incident shows the cost of skipping human review during development; the other shows the cost of overwhelming it afterward. Together, these examples expose a clearer picture of where AI-assisted software development actually stands in 2026 and consequently offer a useful lens for thinking about the skills educators need to cultivate.

These two events did not occur in isolation. They are symptoms of a greater structural constraint that is coming into focus and that I think we have been slow to name clearly. I want to call it “The speed of human oversight.”

When human developers ignore security constraints

As I detailed in my earlier post, OpenClaw (previously Clawdbot, then Moltbot) was released in late 2025 as an open-source framework for giving AI agents expansive access to a user’s terminal, file systems, email, and execution environment. The project went viral. By January 2026, over 30,000 instances had been exposed to the open internet.

The security architecture was, to use the most charitable possible description, minimal. Within a week of widespread deployment, multiple critical vulnerabilities were disclosed. The most severe allowed attackers to achieve full control over the OpenClaw gateway and run arbitrary commands on the host machine. This vulnerability was simple to exploit: since the agent navigated web pages and read messages on its own, an attacker just had to send it to a harmful website. The agent’s authentication token would then leak, handing over administrative control.

What strikes me about this is not just that the vulnerabilities existed, as security flaws in fast-moving open-source projects are not unusual. What strikes me is the apparent absence of even basic security thinking during development. The architecture suggests a developer who moved at the speed of generation without pausing for the kind of architectural review that experienced security engineers treat as non-negotiable.

The Moltbook exposure compounded this. Cybersecurity researchers at Wiz identified a misconfigured Supabase database belonging to the Moltbook team that allowed full, unauthenticated read and write access to all platform data. The exposure included 1.5 million API authentication tokens, 35,000 email addresses, over 17,000 personal identity records, and private direct messages between agents that contained plaintext OpenAI API keys and third-party SaaS credentials.

Consider what this means in practical terms. Users had shared private credentials with their agents in the natural course of using the platform, exactly as the product invited them to do. Those credentials were then exposed because a database was misconfigured. The AI had written code; no one with sufficient expertise had reviewed it.

When human attention becomes the bottleneck

The curl decision illustrates the same constraint from a different angle.

Bug bounty programs work on a simple premise: pay researchers to find vulnerabilities, and you get a crowdsourced security audit at a manageable cost. Daniel Stenberg, curl’s lead developer, announced the program’s end on HackerOne in January 2026, citing an unmanageable flood of AI-generated vulnerability reports. The submissions were, in his characterization, “made-up lies” — plausible-sounding but fictitious security claims produced by language models that are very good at generating persuasive technical prose.

In the week before the program closed, the curl team received seven formal submissions. Some identified minor bugs. None described an actual security vulnerability. All required significant human effort to safely triage and dismiss. Stenberg called this an effective distributed denial-of-service attack on human attention. That framing is precise and worth reflecting upon. When the cost of generating a plausible-sounding security report drops to zero, the humans responsible for evaluating those reports become the bottleneck. The signal-to-noise ratio collapsed until the verification system itself became untenable.

This is not a story about bad actors, exactly. It is a story about what happens when generation capacity scales faster than verification capacity. The two are not equivalent, and they do not scale together.

The productivity numbers don’t add up

The curl story might feel like an edge case involving motivated misuse. But the more mundane reality of AI-assisted development is pointing in a similar direction.

A 2025 randomized controlled trial conducted by the research organization METR examined the impact of AI coding tools — specifically Cursor Pro with Claude 3.5 — on experienced open-source developers. The participants averaged five years of experience and over 1,500 commits on large, mature codebases. The result was counterintuitive: developers assigned to use AI tools took 19% longer to complete their tasks than the control group working without AI assistance.

More revealing was the perception gap. Before the study, those developers predicted AI would make them 24% faster. Even after the trial — after they had objectively slowed down — they reported believing the tools had made them 20% faster.

This discrepancy is not mysterious. Prompting an AI feels fast. The initial generation phase requires little cognitive effort compared to writing code from scratch. What developers systematically underestimate is the time required to read, verify, debug, and integrate the AI’s output into an existing system they need to understand architecturally. A 2025 analysis of 470 real-world pull requests found that AI-generated code contained 1.7 times more issues overall and roughly 2.7 times more security vulnerabilities than human-written code.

The generation is fast. The verification is slow. And verification is not optional.

Who is liable when the agent leaks your keys?

This brings me to the question that I think will define how enterprises actually engage with agentic AI systems over the next several years: when an AI agent causes a security breach, who is responsible?

In the case of OpenClaw, the answer is contractually clear and practically uncomfortable. Peter Steinberger released the software under the MIT License, which explicitly disclaims all liability. OpenAI’s terms of service limit aggregate liability to the amount paid in the past twelve months, with a nominal floor of $100. If an OpenClaw agent deployed by an employee leaks proprietary API keys on Moltbook, the legal exposure falls, in the first instance, on the user — the person or organization that ran the software.

For individual hobbyists, that answer, while unsatisfying, is at least coherent. You chose to grant an AI agent root access to your system and connect it to an unaudited social platform. The risk was yours.

In a professional context, this answer does not hold. Enterprises operate under contractual obligations, regulatory frameworks, and fiduciary duties that cannot simply be transferred to an MIT License. GDPR compliance, for instance, is not optional. Indemnification is not discretionary. If a deployed AI agent exposes client data, the enterprise faces regulatory fines, civil liability, and reputational consequences that no open-source disclaimer can absorb.

Educational institutions face analogous pressures: FERPA obligations, data privacy policies for minors, and vendor procurement processes all exist precisely because individual users cannot absorb institutional risk alone.

Legal scholars are increasingly examining these questions through the lens of agency law and what some call “common enterprise theory.” The basic principle is that the organization deploying the AI agent is the principal; the agent acts on its behalf. When an agent causes harm within its designated scope of activity, the deploying organization bears vicarious liability. This framework is already appearing in litigation: the Mobley v. Workday case, in which a plaintiff alleged that an AI-based applicant screening tool discriminated against him on grounds of race and age, demonstrated that deploying algorithms without meaningful human review does not insulate a company from anti-discrimination law.

The practical implications are significant. Enterprise procurement teams are now demanding indemnification clauses, documented governance frameworks, cyber liability insurance requirements, and explicit human-in-the-loop provisions from AI vendors. Each of these requirements takes time to negotiate, review, and operationalize. That time is human time, bounded by human capacity. And it cannot be automated away.

The constraint that was always there

What I am describing is not a failure of generative AI as a technology. The models can write code. They can write it quickly, and in many contexts, they write it well. The constraint is something else.

Software engineering, understood properly, is not the act of producing syntax. It is the discipline of building systems that are correct, secure, maintainable, and legally defensible. Every one of those properties requires human judgment: the judgment of someone who understands what the code does, why it was written the way it was, what it connects to, and what it would do if it failed. That judgment cannot be offloaded without cost.

Research on cognitive load suggests that heavy reliance on AI assistance tends to degrade what engineers call tacit knowledge — the internalized, architectural understanding of a system that allows experienced developers to reason about it, debug it, and extend it without having to reconstruct every assumption from scratch. When developers accept AI output without working through its logic, they accumulate what senior engineers call “trust debt”: a deferred reckoning with code they have deployed but do not truly understand. When that reckoning arrives, as it did for Moltbook’s users, it can arrive all at once.

The parallel for educators is immediate, and I suspect most readers of this post felt it before I named it. When students use AI to generate an essay or a piece of code without engaging with the underlying reasoning, they accumulate the same kind of trust debt. The output may look correct. The student may even feel confident. But when the exam requires them to extend that argument, or when the assignment changes the parameters, the gap between apparent and actual understanding becomes visible. The Dunning-Kruger dynamic that researchers observe in AI-assisted developers maps directly onto patterns that teachers have been noticing in their classrooms for the past two years.

What this means for how we think about acceleration

The prediction that AI would accelerate software development was not wrong about the generative component. But it was incomplete about everything else.

Generation is only one phase of the software lifecycle. Verification, security review, architectural comprehension, legal due diligence, and debugging make up the rest. And these phases remain substantially bounded by human cognitive capacity. The curl bug bounty collapse shows what happens when generation capacity floods a verification system. The Moltbook breach shows what happens when generation proceeds without verification at all.

For those of us in education, the implication is not that AI tools should be avoided. It is that the pedagogical emphasis needs to shift. If human verification capacity is the bottleneck for AI-assisted work, then we must focus on developing that capacity in our students. The question is no longer only “can you produce this?” It is “do you understand it well enough to defend it, correct it, and own it?”

This means designing learning environments that permit AI assistance but ensure independent verification of foundational knowledge. It means treating the verification step as the intellectually demanding one, not the generation step. And it means being honest with students about what the research actually shows: that the feeling of productivity AI provides is often disconnected from the reality of comprehension. The developers in the METR study felt faster. They were not.

AI does not accelerate development, or learning, at the rate many predicted. Not because the models are incapable, but because everything downstream of generation is still bounded by what we, as humans, can understand and verify. That boundary is not a temporary limitation waiting to be engineered away. It is, in software as in education, the foundation of anything that deserves to be trusted.

The images in this article were generated with Nano Banana Pro.

Share The Augmented Educator

P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.

The Augmented Educator

Discussion about this post

Ready for more?