Tutorial: Hallucination-Proof References
A practical workflow from LLM to reference manager
A brief housekeeping note before we begin. This post departs from my usual publishing practice on The Augmented Educator in two ways.
First, it is free from day one. I am planning to implement some adjustments to my regular posting schedule, including shortening the window between early access for paid subscribers and general availability; more on that in the coming week. Second, instead of my usual, more theoretical writings and opinion pieces, this post offers a hands-on tutorial.
In the following, I want to address a question I receive more often than almost any other.
The question goes something like this: how do you make sure that the references a Large Language Model (LLM) such as Claude or ChatGPT gives you are not hallucinated? And, perhaps more importantly, how do you build a seamless workflow that begins with an LLM conversation, produces accurate citations, and ends with those references sitting properly formatted in a reference manager such as Zotero, EndNote, or Mendeley, ready to be used for a traditionally written research paper or dissertation?
What usually surprises people is how straightforward the answer is. The remedy is, in fact, simple enough that whenever I hear complaints about hallucinated references, I cannot help but wonder whether we are looking at a failure of generative AI or simply a failure of AI literacy. In this tutorial, I therefore want to show how to ensure citation accuracy, so that you can integrate LLMs into your research practice safely and confidently.
Background: why standalone LLMs hallucinate references
By now, it is fairly well known that asking a standalone LLM for sources is not the same as consulting a library or even doing a web search. The LLM is generating text that has the statistical shape of a citation, assembled from patterns absorbed during training. Usually, those patterns correspond to an actual paper, but often enough they do not.
This is not a failure of the LLM, but a consequence of its design. The model does not retrieve information; it completes patterns. And while recent models have undoubtedly become more accurate, the fundamental problem remains, and will continue to do so for the foreseeable future.
The consequences are well documented. The most famous example remains Mata v. Avianca, the 2023 case in which two New York attorneys submitted a legal brief containing six judicial opinions that ChatGPT had simply invented, complete with plausible case names and docket numbers. Judge P. Kevin Castel of the Southern District of New York sanctioned the lawyers, and the case became a prime example of what happens when generated text is mistaken for retrieved fact.
Careful prompting can reduce the problem to a degree, but it cannot eliminate it. Instead, you need to change where the references come from. This means connecting the LLM to a real scholarly database, so that every citation it returns carries a verifiable link back to its source.
That capability is now mature, widely available, and, for most purposes, free.
What you need
For this tutorial, you need three components.
A Consensus account. A free account is perfectly sufficient. If you have not yet encountered Consensus, it is a generative AI tool with one decisive feature: it sits on top of a database of actual research literature. Consensus draws on a massive academic search index of over 220 million peer-reviewed research papers, medical journals, and scientific documents, built primarily on Semantic Scholar and OpenAlex and augmented by Consensus’s own web crawler. When Consensus returns a paper, that paper exists.
An LLM that can integrate Consensus as an external service. At the time of writing, this means either a Claude model through the MCP connector, or ChatGPT through a ChatGPT app. I will use Claude in my example.
Reference management software that can receive Consensus data. The easiest option, and the one I will demonstrate, is Zotero. This is because of the Zotero Connector browser extension, which converts Consensus web pages directly into properly formatted Zotero entries. If you use EndNote or Mendeley, everything in this post still works; it simply involves the additional step of exporting references from Consensus in RIS format and importing the RIS file into your reference manager.
The workflow, step by step
Step 1: Enable the Consensus connector in Claude
Go to the “Customize” section of Claude and add Consensus to your connectors. You will need to log in to Consensus to establish the connection, but a free account will work. If you are using ChatGPT, add the Consensus app in ChatGPT’s “Apps” section instead.
Step 2: Enable the Zotero Connector in your browser
Make sure Zotero is running on your machine and the Zotero Connector extension is enabled in your web browser. If you work with EndNote or Mendeley, skip this step. You can also skip this step if the Zotero Connector does not cooperate with your browser. The RIS route described in Step 5 will work with Zotero as well.
Step 3: Query Consensus from inside Claude
For this example, I decided to search for foundational research relevant to the phenomenon of “Corporate AI Psychosis,” a topic I recently explored on this Substack. I suspected the topic might hold genuine research potential beyond the blog post, and I wanted to find out.
Here is the exact prompt I used:
Attached is a blog post I published on my Substack, ‘The Augmented Educator.’ I am investigating whether I should turn the main idea of ‘Corporate AI Psychosis’ into an academic research paper. For this purpose, I need an understanding of the general research landscape on this topic. Using the Consensus app, identify the five most influential academic papers related to the concept of ‘Corporate AI Psychosis.’ List the papers along with their Consensus links.
The last sentence of that prompt is critical. It asks Claude specifically to include the Consensus links to any articles it finds. Those links are what will carry each reference, intact and verifiable, into your reference manager. Also note that Claude will ask whether it may use the Consensus connection, and you need to approve it in order to proceed.
After only a few seconds, Claude returned the following answer to my prompt:
I searched Consensus across the two literatures your concept bridges: the clinical “AI psychosis” research and the organizational research on executive judgment and AI adoption. One thing worth knowing up front: “Corporate AI Psychosis” itself does not yet exist as a term in the academic literature — which is actually good news for your paper, since it means the conceptual bridge you’re proposing is open territory. The five most influential papers anchoring that territory, ranked by a mix of citation impact and conceptual centrality:
Artificial intelligence and the future of work: Human-AI symbiosis in organizational decision making (Jarrahi, 2018, Business Horizons, ~1,742 citations). The most cited work on AI in organizational decision-making, arguing for augmentation over replacement. Your essay’s closing argument — that education must build human capability rather than outsource it — is essentially the Jarrahi position, so this is your organizational-theory anchor.
Will Generative Artificial Intelligence Chatbots Generate Delusions in Individuals Prone to Psychosis? (Østergaard, 2023, Schizophrenia Bulletin, ~53 citations). The foundational paper for the clinical concept you’re metaphorically scaling up — the first formal hypothesis that chatbots could fuel delusions. Any paper on “Corporate AI Psychosis” needs this as the origin point of the source metaphor.
Artificial intelligence-associated delusions and large language models: risks, mechanisms of delusion co-creation, and safeguarding strategies (Morrin et al., 2026, The Lancet Psychiatry). This appears to be the Lancet Psychiatry piece your essay already invokes for the “digital folie à deux” framing — it lays out the mechanisms of delusion co-creation (validation, amplification, blurred reality boundaries) that you’re transposing from individual to boardroom.
Delusional Experiences Emerging From AI Chatbot Interactions or “AI Psychosis” (Hudon et al., 2025, JMIR Mental Health, ~28 citations). The most comprehensive theoretical treatment of AI psychosis to date, explicitly developing the “digital folie à deux” concept and the role of algorithmic reinforcement of belief-confirming content — the exact mechanism your essay maps onto the executive echo chamber.
Managerial overreliance on AI-augmented decision-making processes (Keding & Meissner, 2021, Technological Forecasting and Social Change, ~134 citations). The closest empirical analog to your thesis on the corporate side: an experiment with 150 senior executives showing they trust AI-based advice more than human advice and overrate decisions made with it. This is “corporate AI psychosis” measured in a lab, minus the label.
A couple of runner-ups worth keeping in your file: Exploring the influence of CEO overconfidence on innovation in artificial intelligence technology (Zheng et al., 2025) directly links executive cognitive bias to AI strategy, and Technological folie à deux: feedback loops between AI chatbots and mental health (Dohnány et al., 2026, Nature Mental Health) is the most recent high-profile statement of the feedback-loop mechanism.
The shape of the landscape suggests your paper’s contribution would be the synthesis itself: psychiatry has the delusion-reinforcement mechanism, management science has executive overconfidence and AI overreliance, but nobody has formally connected them at the organizational level. If you’d like, I can run a follow-up search on organizational groupthink and echo chambers specifically, to see whether anyone has applied shared-delusion frameworks to firms.
Two observations about this result. First, I used Claude Fable 5 for this query, but earlier testing showed that the specific model makes relatively little difference. Claude 4.8 or even earlier versions produce essentially the same output. This is because the actual search happens on the Consensus side of the connection. The LLM is the interface and not the librarian.
Second, I limited the search to the five most relevant articles purely so the full answer would fit in this post. Consensus can return far more, particularly on a paid plan. When I repeated the identical query without the five-article restriction, Claude listed a total of 47 relevant research papers, neatly organized into five categories, and every one of them with a correct Consensus link.
Step 4: Follow the links and inspect the articles
Clicking any Consensus link opens that article’s page on the Consensus site, with all the details the database holds. One caveat: some publishers do not permit third parties to display abstracts, so the amount of visible information varies from paper to paper. What does not vary is accuracy, because everything shown is pulled directly from a proper research database.
Note the option to access the PDF where one is available. This respects licensing requirements properly. In my case, the article shown was available through my university’s library; clicking the link routed me directly to the PDF within our library system.
Step 5: Move the article into your reference manager
With Zotero, moving the article into its database is a matter of a single click on the Zotero Connector button in your browser. The Consensus article page is structured such that the Connector can harvest everything needed to populate the Zotero record.
If you use EndNote or Mendeley, add the article to your Consensus library and export the library in RIS format, then import the RIS file into your reference manager. This route works for Zotero too if you prefer it.
Step 6: Verify and proceed
The reference now sits in your reference manager, properly formatted, with every important data field populated, including the DOI. If you have access to the PDF, attach it, and you are ready to work with the reference exactly as you would with one found through any traditional search.
A final note on the mechanics: the prompt I used was merely an example. You can invoke Consensus in any prompt on any topic. If you intend to move the resulting references into a reference manager, just remember the one non-negotiable instruction: require the LLM to return the exact Consensus links.
What this workflow does, and what it does not do
The obvious benefit of this method is that it dramatically accelerates the opening phase of a research project. In my example, I ended up with 47 references, all highly relevant to my research question, in a couple of minutes. Assembling a comparable initial selection by traditional means would have taken days, possibly weeks.
But the workflow only guarantees that the references are real, not that they are a perfect fit for your purposes, and certainly not that you understand them.
For some of the 47 articles I got, Consensus had no access to the abstract and inferred relevance from the article’s metadata, particularly where and how it has been cited. In my experience, this is still remarkably accurate. But as with everything in generative AI, nothing is 100% certain. Do not hand the resulting pile of papers back to an LLM for analysis without reading them yourself. The reading is not an inconvenience in the research workflow; the reading is the research.
An experiment in format
Let me know in the comments whether you found this tutorial useful. I am also curious how generative AI currently figures in your own research practice: where it genuinely saves you time, and where your workflow still feels improvised. If there is enough interest, I will follow up with further tutorials on turning generative AI from an occasional shortcut into a genuine enhancement of your everyday work.
The hero image in this article was generated with Nano Banana 2.
P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.






