Case studies and benchmarks from real research
Evaluating the Reference Checker
We tested our reference checker on a held-out set of 476 citations containing both real and AI-fabricated references. The test set is drawn from research on LLM hallucinations published in Scientific Reports (Nature Portfolio).
Accuracy: 97.5% (464/476 correct)
Precision: 98.6% (low false positives)
Recall: 97.3% (real refs found)
F1 score: 98.0% (balanced measure)
The benchmark includes both legitimate citations from published research and AI-fabricated references generated by GPT-3.5 and GPT-4.
Fabricated citations include fully invented papers, fake authors, and non-existent journals—typical LLM hallucination patterns.
Our reference checker combines multiple verification approaches:
A citation is marked as "not found" only after exhausting all available sources.
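The "not found only after exhausting all sources" rule can be sketched as a simple fallback loop. This is a minimal illustration, not the checker's actual implementation: the `verify_citation` function and the two stand-in sources (`doi_index`, `title_search`) are hypothetical names, and the real set of sources is not listed here.

```python
from typing import Callable, Iterable, Optional

# A lookup takes a raw citation string and returns a matched record, or None.
Lookup = Callable[[str], Optional[dict]]

def verify_citation(citation: str, lookups: Iterable[Lookup]) -> dict:
    """Try each source in order; report "not found" only if every one misses."""
    tried = 0
    for lookup in lookups:
        tried += 1
        record = lookup(citation)
        if record is not None:
            # Stop at the first source that confirms the citation.
            return {"status": "found", "record": record, "sources_tried": tried}
    return {"status": "not found", "record": None, "sources_tried": tried}

# Hypothetical stand-ins for real sources (illustration only).
def doi_index(citation: str) -> Optional[dict]:
    return None  # pretend this source has no match for anything

def title_search(citation: str) -> Optional[dict]:
    if "Viable offspring" in citation:
        return {"matched_on": "title"}
    return None

sources = [doi_index, title_search]
result = verify_citation("Viable offspring derived from fetal and adult mammalian cells", sources)
missing = verify_citation("An invented paper title", sources)
print(result["status"], missing["status"])
```

Ordering the sources from cheapest to most expensive keeps the common case fast, while the final "not found" verdict still reflects every source having been consulted.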
True Positive: 291
False Negative: 8
False Positive: 4
True Negative: 173
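98.6% precision: few fabricated citations pass as real
Only 4 of 177 fabricated citations evaded detection, so 291 of the 295 references the checker reported as found were real. With so few false positives, researchers can trust that a verified reference exists and that a flagged one warrants investigation.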
97.7% of fabricated citations caught
173 of 177 LLM-hallucinated references—including those with fake DOIs and plausible metadata—were correctly identified.
97.3% of real citations verified
291 of 299 legitimate references were successfully matched and linked to their original sources—even with formatting variations.
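The headline percentages follow directly from the four confusion-matrix counts. The short sketch below recomputes them, treating a real citation as the positive class (as the counts above do). The variable names are ours, and "specificity" is our label for the share of fabricated citations caught.

```python
# Counts from the confusion matrix; "positive" means a real citation.
tp, fn, fp, tn = 291, 8, 4, 173

total = tp + fn + fp + tn              # all citations in the test set
accuracy = (tp + tn) / total           # correct decisions overall
precision = tp / (tp + fp)             # verified references that are real
recall = tp / (tp + fn)                # real references that were verified
specificity = tn / (tn + fp)           # fabricated references that were caught
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("specificity", specificity),
    ("f1", f1),
]:
    print(f"{name}: {value:.1%}")
```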
I. Wilmut, ..., K. H. Campbell. Viable offspring derived from fetal and adult mammalian cells. Nature, 385(6619), 810-813.
The famous Dolly the sheep cloning paper, correctly verified.
J. Smith. The impact of migration on the health of older adults. Journal of Gerontology: Social Sciences, 2015, 70(4), 497-505.
Generic author name and plausible-sounding title, but completely fabricated by GPT.
Y. Kang, J. Kim. A Comparison of the Environmental Impact of Molten Salt Reactors and Conventional Nuclear Fission Reactors. Journal of Cleaner Production, 2021, 288, 124959.
Includes a fake DOI that looks legitimate, hallucinated by the LLM.
S. E. Carrell, J. E. West. Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 2010, 118(3), 409-432.
Influential education economics paper, verified through academic databases.
World Health Organization (WHO). Global oral health data bank. Geneva: World Health Organization, 2013.
Institutional reports often lack DOIs and standard metadata, making them harder to verify.
Y. J. Lee. Enhancing students' communication skills in the science classroom through socio-scientific issues-based instruction. International Journal of Science Education, 2017, 39(4), 414-434.
A paper with the exact title and journal exists but with different authors.
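Cases like this one suggest that a title and journal match alone is not enough. One way to catch them, sketched here under our own assumptions, is to compare author surnames between the cited reference and the matched record. The function names, the 0.5 overlap threshold, and the placeholder author names are all hypothetical and are not taken from the checker itself.

```python
import unicodedata

def surname(author: str) -> str:
    """Last whitespace-separated token, lowercased, with accents stripped."""
    token = author.strip().split()[-1]
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def authors_overlap(cited: list, matched: list) -> float:
    """Fraction of cited surnames that also appear in the matched record."""
    cited_names = {surname(a) for a in cited if a.strip()}
    matched_names = {surname(a) for a in matched if a.strip()}
    if not cited_names:
        return 0.0
    return len(cited_names & matched_names) / len(cited_names)

def classify(title_matches: bool, cited: list, matched: list,
             min_overlap: float = 0.5) -> str:
    """Require both a title match and enough author overlap (assumed threshold)."""
    if not title_matches:
        return "not found"
    if authors_overlap(cited, matched) < min_overlap:
        return "author mismatch"
    return "verified"

# Placeholder author lists for the matched records (not real metadata).
print(classify(True, ["Y. J. Lee"], ["A. Placeholder", "B. Example"]))
print(classify(True, ["S. E. Carrell", "J. E. West"], ["S. Carrell", "J. West"]))
```

Comparing surnames only keeps the check tolerant of initials and formatting variations, at the cost of missing mismatches between authors who share a surname.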



