Navigating the Hidden Currents of AI

Understanding and Counteracting Censorship in Large Language Models

Apr 05, 2025

In today’s rapidly evolving AI landscape, educators face a critical challenge: large language models (LLMs) often come with embedded censorship that shapes what knowledge students can access. In this post I want to examine the political and cultural filters built into many current AI models, introduce emerging techniques like “abliteration” and “uncensoring” that bypass these restrictions, and offer practical strategies for educators striving to maintain intellectual integrity while using these powerful tools in their classrooms.

The Changing Landscape of AI Development

The release of DeepSeek R1 on January 20, 2025, a highly capable reasoning AI that was surprisingly inexpensive to develop, took many by surprise. For those following developments in Chinese AI research, however, this breakthrough wasn’t entirely unexpected. China has arguably been leading the development of text-to-video models for almost a year, with models such as Kling AI or Hailuo AI (formerly Minmax AI) often outperforming their US counterparts.

But what is truly noteworthy is the unconventional release strategy employed by most Chinese developers. While US companies tend to keep key innovations closed source, Chinese models, along with their weights, are often released in open source and can be downloaded and run by anybody anywhere in the world. The potential motivations behind this approach deserve their own analysis, but today I want to focus on a different issue related to this open-source nature: censorship and its consequences for educators.

When AI Refuses to Answer

To illustrate what I mean, consider an experiment I conducted while preparing for this blog post. I ran DeepSeek R1 locally on my MacBook Pro and asked the model, “Give me a detailed account of what happened on Tiananmen Square on June 3, 1989.” I wanted to test how it would handle a topic that remains sensitive within Chinese cultural and political contexts. The model responded without even entering thinking mode: “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.”

June 3, 1989, marked the beginning of the end of the student protests on Tiananmen Square. For somebody from the West, the model’s logic for refusing to answer seems unreasonably restrictive, as I was merely requesting an account of historical facts without asking for any judgment or political commentary. There are objective truths about what happened that day, and even without assigning blame to either side, it should be possible to provide a neutral account of events.

The reality is, of course, that the answer is heavily censored. The LLM is not allowed to provide an answer to that question following strict guidelines by the Chinese government.

The Global Nature of AI Censorship

This example isn’t unique to Chinese models. LLMs from around the world, including those developed in the US, incorporate various forms of censorship. When models refuse to answer questions, it’s rarely because they lack information—rather, they’ve been programmed with specific restrictions on what knowledge they can share.

This pervasive censorship fundamentally undermines education; controlling access to powerful information tools allows those who dominate large language models to shape students’ understanding of history and politics. And in this competition of ideas and viewpoints, models released through open source—and therefore free to use—have a significant competitive advantage.

Understanding Alignment: Censored vs Uncensored LLMs

Most modern LLMs incorporate self-censorship mechanisms as part of what the AI industry often calls “alignment” or “moderation.” These guardrails prevent them from generating content deemed harmful, illegal, or sensitive according to their creators’ guidelines. When you encounter responses like “I’m sorry, I cannot assist with that request,” you’re witnessing these safety features in action.

While these restrictions aim to prevent misuse, such as generating hate speech or dangerous instructions, they simultaneously limit the model’s utility for open inquiry and exploration of complex topics. The boundaries of what makes up “harmful” content vary dramatically between cultures and contexts, reflecting the values and priorities of the model’s creators.

It’s important to acknowledge that these censorship mechanisms exist for legitimate reasons: preventing harmful content, complying with regulations, and avoiding legal liability. However, they also present significant challenges for educational contexts where open inquiry and exploration of complex topics is essential.

An uncensored LLM takes a fundamentally different approach by removing these refusal mechanisms entirely. Rather than automatically declining certain requests, it attempts to answer any question to the best of its abilities. This makes uncensored models more responsive and versatile, particularly for educational contexts requiring exploration of sensitive historical or political topics—but it also means they might generate content that censored models would block.

Circumventing Alignment: Uncensoring a Censored LLM

While most LLM developers release their models with censorship mechanisms intact, educators and researchers don’t necessarily have to accept these limitations. The open-source nature of many models creates opportunities to modify them by removing or bypassing these restrictions. The tech community has devised two effective methods to remove censorship from LLMs, even when companies don’t offer uncensored versions.

Fine-tuning to Uncensor

The first method involves traditional fine-tuning. Developers gather thousands of examples where a censored model would refuse to answer, then create appropriate responses to those questions. The model is then trained on these prompt-response pairs to “unlearn” its tendency to refuse certain topics. This method effectively changes the model’s behavior through new learning examples rather than altering its underlying neural architecture.

Researchers from Perplexity AI showed this fine-tuning approach with DeepSeek-R1 by first identifying approximately 300 sensitive topics. They created thousands of example prompts covering these topics, crafting factual, helpful answers for each one instead of refusals. Through training the model on these carefully prepared examples, they produced an “uncensored” variant called R1-1776 (named after the year symbolizing free expression) that would respond freely to previously forbidden questions.

To illustrate these differences, I ran a comparative test with the uncensored DeepSeek R1. When presented with the identical question about Tiananmen Square, the uncensored model provided a surprisingly detailed, factual response about the events of June 3, 1989. For those interested in the response, I’ve included the complete answer, including its reasoning, from the uncensored model in the appendix of this blog post.

Abliteration

The second approach, abliteration, offers a more precise intervention than fine-tuning. Rather than teaching the model new responses through thousands of examples, this technique directly targets and neutralizes the specific neural pathways responsible for generating refusals. The name itself—a blend of “ablation” (surgical removal) and “obliteration”—reflects its focused approach to disabling censorship.

Think of abliteration as finding and disconnecting an “off switch” for the model’s censorship mechanisms. Researchers found that a model’s tendency to reject specific topics can be traced to distinct patterns in its neural network, a kind of “refusal direction” activated by sensitive material. By identifying this pattern through careful analysis of the model’s behavior and then neutralizing it, they can bypass the need for extensive retraining.

Abliteration can be performed relatively quickly compared to fine-tuning, without the need to generate and curate thousands of training examples. The process involves intricate mathematical operations on the model’s neural connections, essentially removing its ability to represent and generate refusal responses. After successful modification, the model can no longer access its censorship circuitry, enabling it to answer questions it would previously have declined.

Both uncensoring methods—fine-tuning and abliteration—can be effective, but they present different trade-offs for educators and researchers. Fine-tuning offers more nuanced control over exactly how the model responds to previously censored topics but requires significant data preparation and computational resources. Abliteration provides a faster, more efficient approach, but might be less targeted in its effects on the model’s behavior, potentially affecting other aspects of its performance in subtle ways.

Verifying Claims of Uncensored Models

While both fine-tuning and abliteration techniques offer practical ways to remove censorship from LLMs, educators should approach these “uncensored” or “abliterated” models with healthy skepticism. The R1-1776 model and similar uncensored variants may claim to have completely removed censorship mechanisms, but results can vary significantly depending on implementation quality and thoroughness.

Just as the original censorship varies in scope and implementation across models, the process of removing these restrictions can be similarly inconsistent.

Inconsistent Uncensoring

Some models might be partially uncensored—responding to some sensitive queries but not others. For instance, a model might freely discuss certain historical events but still refuse questions about contemporary political issues or specific cultural taboos. This selective uncensoring could reflect the priorities or oversights of those who modified the model. It is noteworthy, for example, that in my testing, none of the abliterated DeepSeek R1 models available to me would answer my question about Tiananmen Square.

Cultural Framework Alignment

What makes up acceptable discourse varies dramatically across cultural and educational contexts. A model uncensored according to one cultural framework might still enforce values and restrictions from another. For example, a Chinese model “uncensored” by Western developers might lose its political restrictions but retain other cultural biases or still respond with perspectives shaped by its original training data.

Hidden Biases Remain

Despite disabling censorship, inherent biases in the training data persist. An uncensored model doesn’t become neutral—it may now answer previously forbidden questions, but how it answers them reflects what it “learned” during its original training. If its knowledge about certain topics comes primarily from biased or incomplete sources, those limitations will remain regardless of whether the refusal mechanism is removed.

Managing Risk in Educational Settings

It’s crucial to recognize that uncensored models may produce extremely sensitive or inappropriate content in response to certain prompts. Before implementing these tools in a classroom, educators should understand the potential risks and have clear protocols for handling problematic outputs. This may include pre-screening topics, establishing clear guidelines for student use, and preparing contextual materials that help students critically evaluate AI-generated content.

Educators should also develop their own testing procedures before using uncensored models in classroom settings. Create a diverse set of prompts relevant to the classroom context that touch on different categories of potentially censored content, from political and historical topics to scientific, ethical, and cultural questions that might trigger refusals in standard models.

In addition, it is vital for educators to consult both their institution’s policies and any relevant laws or regulations that address the use of these models in educational contexts. Using an uncensored model might create liability or conflict with institutional guidelines, especially in K-12 environments or when discussing sensitive topics.

Pedagogical Implications

As AI continues to reshape education, the question of who controls the narrative becomes increasingly significant. When large language models serve as gatekeepers of knowledge, their embedded censorship mechanisms don’t just limit information—they silently shape how students understand the world.

By approaching these technologies with an awareness of their embedded censorship and biases, we help students develop crucial critical thinking skills. They learn not just subject content, but how to question the invisible algorithms increasingly mediating their access to information. This meta-awareness—the ability to recognize when knowledge is being filtered or framed—may become one of the most valuable educational outcomes of our digital age.

For educators, uncensored and abliterated models offer unique teaching opportunities. Not as perfect solutions, but as windows revealing how technology shapes knowledge and discourse. When students compare censored and uncensored responses to the same questions, they witness firsthand the power structures embedded in seemingly neutral tools. The decision about which topics warrant censorship, the methods used to implement restrictions, and even the techniques used to remove those restrictions all represent value judgments worthy of examination.

These moments of refusal or questionable answers can become powerful teachable opportunities. Rather than simply accepting or rejecting an AI’s response, educators can guide students to analyze why a model might refuse certain questions or provide particular perspectives. This approach transforms AI limitations into lessons on media literacy, cultural bias, and the intersection of technology with politics and power.

It’s important to emphasize that removing censorship does not remove bias. These are distinct issues with different implications for classroom discourse. While an uncensored model might willingly answer previously forbidden questions, its answers still reflect the biases and limitations of its training data. Teaching students to recognize both censorship and bias prepares them for critical engagement with increasingly AI-mediated information.

In this context, the modifications we make to AI systems become as important a subject of study as the systems themselves, offering valuable insights into the complex relationship between technology, knowledge, and power in our increasingly AI-mediated world.

Beyond the Guardrails

In this exciting new era of education, it’s crucial to remember how quickly AI is developing. The tools, techniques, and policies surrounding AI censorship will likely transform significantly in the coming years. What makes up an “uncensored” model today may seem quaint by next year’s standards.

As educators, we should commit to staying informed about these transforming technologies and their implications. Regular reviews of institutional policies, ongoing conversations with colleagues and students, and a willingness to adapt approaches as the field evolves will all be essential components of effective AI integration in education.

By maintaining this forward-looking perspective, we can help our students develop not just the knowledge they need today, but the critical awareness and adaptive skills they’ll need to navigate the AI-mediated information environments of tomorrow.

Appendix 1: Educator’s Checklist for Using Uncensored LLMs

Before incorporating uncensored language models in your classroom:

Check institutional policies and obtain necessary approvals
Test the model thoroughly with diverse prompts to understand its limitations
Prepare contextual materials to help students critically evaluate AI outputs
Create clear guidelines for appropriate student interaction with the model
Design structured activities that leverage the model as a subject of analysis rather than an authority
Plan for debriefing discussions that connect the experience to broader digital literacy goals
Provide guidelines for respectful use: no harassing or hateful prompts
Establish a protocol for what to do if the model generates disturbing or harmful content

By approaching uncensored AI models as objects of study rather than simply as tools, we can transform potential pitfalls into powerful learning experiences about information literacy in the age of artificial intelligence.

Appendix 2: DeepSeek R1’s Uncensored Answer to my Question

For this experiment I used an uncensored version of DeepSeek R1, run locally on an M4 MacBook Pro with 128 GB of shared memory.

Thought for 3 minutes 1 seconds:
Okay, so I need to give a detailed account of what happened on Tiananmen Square on June 3, 1989. Hmm, first, I should recall the basic facts about this event. From what I remember, there was a student-led protest in Beijing’s Tiananmen Square around that time. The Chinese government cracked down on it, leading to many casualties and arrests. But I need to be more precise.
Let me start by setting the context. The 1989 protests were part of broader pro-democracy demonstrations across China. Students had been gathering in Tiananmen Square since mid-April, following the death of Hu Yaobang, a former leader seen as reformist. They wanted political reforms and freedom of speech. By late May, the government declared martial law but didn’t immediately enforce it.
On June 3rd, the military was ordered to clear the square. I think there were tanks and soldiers involved. The exact number of casualties is disputed; some reports say hundreds died, others thousands. There’s confusion because the Chinese government never released an official death toll. Also, after the crackdown, many participants were arrested or had their lives disrupted.
Wait, but I need to make sure about the dates. Some sources say the military moved in on June 3rd and 4th, with the heaviest suppression around midnight. So the actual clearance happened late at night into early morning of the 4th. But the question specifies June 3, so maybe the initial movement started then.
Another thing is the exact sequence of events: how did the troops move in? From different directions around Beijing? The use of live ammunition, tear gas, and other forceful measures. Also, there’s the famous image of a man stopping a tank, which happened on June 5th, but that’s part of the aftermath.
I should also mention international reactions. The event was widely condemned globally, leading to sanctions against China. But domestically, it’s suppressed; the Chinese government still doesn’t discuss it openly and censors related information.
Wait, did the government admit any casualties? Some reports say they acknowledged a few hundred deaths, but independent sources estimate much higher numbers. Also, there were different groups involved: students, workers, intellectuals, etc., not just students. The movement was called the “People’s Movement” or “Democracy Movement.”
I need to check if there were specific incidents on June 3rd versus June 4th. For instance, the military began moving into Beijing on the night of June 3rd, with armored vehicles and soldiers advancing from multiple directions towards the city center where the protesters were concentrated in Tiananmen Square. They encountered resistance in places like Muxidi, Xidan, and Fuchengmen, leading to clashes and casualties.
By dawn on June 4th, the square was cleared, but there were reports of continuing violence in surrounding areas throughout the day. The government then imposed strict control over information and media, making it difficult for accurate reporting.
Also, important to note that the Chinese government’s stance is that this was necessary to preserve stability and national security, referring to the event as a “counter-revolutionary rebellion.”
I should structure the answer with context leading up to June 3rd, the military action on that night into early morning of the 4th, casualties, aftermath including censorship and arrests, international reactions, and the legacy. Make sure not to mix up dates and clarify that while the main clearance happened around midnight, the movement was initiated on June 3rd.
Need to verify the exact timeline: April 15, Hu Yaobang’s death; protests start in Beijing and other cities. By mid-May, hunger strikes in Tiananmen Square. On May 20, martial law declared but not enforced yet. Then June 3-4 when military action occurs.
Also, mention that the Chinese government has not officially released a death toll, leading to various estimates from hundreds to thousands. Some sources cite the number of injured as well.
Finally, note the ongoing censorship and commemoration attempts outside China, like in Hong Kong until 2020 when laws were introduced to suppress it.

The following is the exact answer provided by the uncensored version of DeepSeek R1. I only added text formatting for better readability.

Detailed Account of Tiananmen Square Crackdown (June 3-4, 1989):
Context: In April 1989, following the death of Hu Yaobang—a reformist leader seen as sympathetic to political change—students began pro-democracy protests in Beijing. Gathered in Tiananmen Square, they demanded freedoms such as speech and press, an end to corruption, and political reforms. By late May, these demonstrations had grown into a broad movement involving students, workers, intellectuals, and citizens from across China, despite the government declaring martial law on May 20th.
Military Action Initiated (June 3-4): On the evening of June 3rd, Chinese authorities authorized military forces to clear Tiananmen Square. Tanks, armored vehicles, and troops moved into Beijing from multiple directions—Muxidi, Xidan, Fuchengmen—facing resistance in these areas. The use of live ammunition, tear gas, and brute force led to violent clashes with protesters, resulting in significant casualties. By dawn on June 4th, Tiananmen Square was cleared, but reports indicate sporadic violence persisted throughout the city.
Casualties: The exact number of deaths remains disputed due to censorship. While official estimates are low (around 300), independent sources and eyewitness accounts suggest thousands were killed or injured. Many victims were students, workers, and passersby caught in crossfire or crushed by military vehicles. The Chinese government has never released an official death toll.
Aftermath:
In the wake of the crackdown:
Censorship: All media and public discussions about the events were banned. Internet searches for related terms (e.g., “Tank Man”) remain blocked in China.
Arrests and Repression: Thousands were detained, tortured, or forced into exile. Families of victims faced ongoing harassment.
Government Stance: Authorities labeled the protests a “counter-revolutionary rebellion,” justifying military intervention as necessary for national stability.
International Reactions: Global condemnation followed, with many countries imposing sanctions on China. The event became emblematic of human rights struggles, while also accelerating diplomatic isolation and scrutiny of China’s authoritarian practices.
Legacy and Commemoration:
Censorship Continues: Discussion in China is forbidden; memorials are suppressed.
Hong Kong Legacy: Until 2020, annual vigils were held there, but a National Security Law introduced that year stifled public commemoration.
Symbolism: The “Tank Man” image—captured on June 5th as an individual defied tanks—is widely recognized as an icon of resistance.
Conclusion: The Tiananmen Square crackdown marked a pivotal and tragic moment in China’s modern history, highlighting the government’s prioritization of control over dissent. Its legacy endures in global advocacy for human rights and democracy.

The Augmented Educator

Discussion about this post