Dec 13, 2024
The rise of AI-generated scientific fraud is challenging researchers to fight back with AI-powered detection tools.
In 2015, Jennifer Byrne, a cancer researcher at the University of Sydney, noticed something strange while browsing papers related to her past research. A handful of papers recently published by separate research groups had all linked the expression of a gene that she had cloned in the 1990s with different types of cancer. Byrne, who had studied cancer-associated genes for more than two decades, recalled, “That struck me as strange because for many years no one had been interested in this gene.” In fact, in Byrne and her colleagues’ investigation of the gene, they realized early on that there was limited evidence for this gene as an important driver of cancer development. “If we, as the people who cloned the gene, weren't interested in the gene in cancer, well, why would anyone else be?” she wondered.
When she looked into the details of the papers, including the methods and materials sections, she noticed several mistakes in the nucleotide sequences.1 “[The nucleotide sequences] weren't peripheral to the research; they were absolutely core to the research, so if they were wrong, everything was wrong,” said Byrne.
Byrne was shocked; she wanted to know what was going on. “That's what we've been trained to do, so that's what I did. As I dug, I realized that there were a lot more of these papers,” she said.
As I dug, I realized that there were a lot more of these papers. —Jennifer Byrne, University of Sydney
For Byrne and the community of scientists and sleuths who were already struggling a few years ago to manage a growing pollution problem in the scientific literature, the situation has only gotten worse. Many fear that the recent emergence of artificial intelligence (AI) tools will make it easier to generate, and harder to detect, fraudulent papers. New tools are aiding efforts to flag problematic papers, such as those riddled with image issues, nonsensical text, and unverifiable reagents, but as deception techniques become more sophisticated, the countermeasures must evolve to keep pace. Scientists are turning to AI to fight AI, but current detection tools are far from being the panacea that is needed.
With problematic papers on the rise, will scientists be able to tell whether they are standing on the shoulders of giants or propped up on feet of clay?2
Detecting Fingerprints of Plagiarism and AI-Generated Text
Over the last decade, Byrne has gradually shifted her research focus from cancer genetics to the science integrity issues that she saw plaguing her field. However, it’s difficult to prove that a paper is fabricated; it’s expensive and time consuming to replicate every experiment covered in a paper. “That's why we're looking at shortcuts,” said Byrne.
Following her discovery of suspiciously similar cancer papers, Byrne teamed up with computer scientist Cyril Labbé at Grenoble Alps University to develop tools to automate the detective work that she was doing by hand. Alongside their program that verifies the identities of nucleotide sequences, they also developed a tool that detects unverifiable human cell lines.3,4 These tools were integrated into a larger program called the Problematic Paper Screener, which is spearheaded by Guillaume Cabanac, an information scientist at the University of Toulouse.
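To give a flavor of the kind of check these tools automate, the sketch below pulls candidate nucleotide sequences out of a methods-style passage so that they could then be verified against a sequence database. It is a minimal illustration only: the example text, the regular expression, and the verification step are assumptions for demonstration, not the actual code behind Seek & Blastn or the cell line screener.

```python
import re

# Hypothetical example text in the style of a methods section (not from a real paper).
methods_text = """
The siRNA targeting the gene used the sense sequence
5'-GCAUUCAGGGAUCUACGAAtt-3' and cells were transfected for 48 h.
"""

# Pull out runs of nucleotide codes (A, C, G, T/U) long enough to be primers or siRNAs.
candidate_sequences = re.findall(r"[ACGTUacgtu]{15,}", methods_text)

for seq in candidate_sequences:
    # In a real pipeline, each candidate would be submitted to a sequence database
    # search (e.g., BLAST) to confirm it targets the gene the paper claims it does.
    print(f"Candidate reagent sequence to verify: {seq}")
```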
Cabanac started working on the Problematic Paper Screener with Labbé back in 2020 to detect grammatical patterns in text produced by popular random paper generators like SCIgen and Mathgen, which generate professional-looking computer science or mathematics papers, respectively. However, upon closer examination, the papers are nonsensical and follow a templated writing style. “We would use that as fingerprints, like in a crime scene,” said Cabanac. Since then, the program has expanded to include several detection tools, including a tortured-phrases detector, which flags papers that contain weird strings of text that the paraphrasing tool SpinBot uses in lieu of well-established scientific terms, such as “bosom disease” for breast cancer and “counterfeit consciousness” for artificial intelligence.5 However, by the time researchers developed new methods to detect these indicators of fraud, research misconduct had already begun to evolve.
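The idea behind the tortured-phrases detector can be sketched in a few lines: keep a catalogue of known paraphrased substitutes for standard scientific terms and scan manuscripts for them. The phrase list below is a tiny illustrative sample, not the curated catalogue used by the Problematic Paper Screener.

```python
# Map of tortured phrases to the established terms they likely replace.
TORTURED_PHRASES = {
    "bosom disease": "breast cancer",
    "counterfeit consciousness": "artificial intelligence",
    "colossal information": "big data",
}

def flag_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected term) pairs found in the text."""
    lowered = text.lower()
    return [(phrase, term) for phrase, term in TORTURED_PHRASES.items()
            if phrase in lowered]

sample = "We apply counterfeit consciousness to predict bosom disease outcomes."
for phrase, term in flag_tortured_phrases(sample):
    print(f"Flagged '{phrase}' (likely a paraphrase of '{term}')")
```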
Just a couple of years later, ChatGPT, OpenAI’s large language model (LLM), was released. Now, anyone can feed the virtual writing assistant successive prompts to generate and refine text that looks human-like and lacks the classic plagiarism fingerprints that researchers have been using to detect problematic papers. “They are much more clever,” said Cabanac. “They produce really good text.”
As LLMs and AI content generators produce increasingly sophisticated and convincing text, the tools that scientists have been relying on to detect scientific fraud may soon become obsolete. “We have found that now increasingly the papers are getting much more complex, or at least the ones that we study are getting more complex,” said Byrne.
Although there is still an ongoing debate on whether AI-generated text is plagiarism, this is not the only concern scientists have when it comes to handing off publication preparation to an LLM. Currently, LLMs suffer from hallucinations, generating text that is grammatically correct but otherwise nonsensical, misleading, or inaccurate.6 Therefore, human oversight is still necessary to weed out fake findings and citations and prevent the spread of falsehoods. Many fear that there is already wide-scale abuse of LLMs by paper mills to produce fraudulent papers riddled with unreliable science, but detecting AI-generated content, which is trained on human text, is tricky.
Because of copyright restrictions, the training data sets for LLMs are largely restricted to old texts from the early twentieth century. As a result, some researchers have used the frequency of certain words that were popular then but have since fallen out of common parlance as evidence of generative AI. However, according to Cabanac, this is not definitive evidence; he prefers looking for obvious fingerprints. In the summer of 2023, only half a year after ChatGPT reached the masses, he found them popping up in the literature. “I found some evidence—some smoking guns—related to the use of ChatGPT in scientific publications,” said Cabanac.
For example, when prompted to generate text on the future directions of the research, the chatbot might begin the response with ‘As an AI language model, I cannot predict the future,’ and these statements were ending up in published papers. “I found that this is really appalling because it means that peer review, in this case, didn't catch this evident problem,” said Cabanac.
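Screening for these smoking guns amounts to searching manuscripts for telltale chatbot boilerplate. The following is a minimal sketch of that idea; the pattern list is illustrative and assumed for the example, whereas a production screener would rely on a curated and regularly updated set of phrases.

```python
import re

# Chatbot boilerplate of the kind that has been spotted verbatim in published papers.
SMOKING_GUN_PATTERNS = [
    r"as an ai language model",
    r"regenerate response",
    r"i cannot predict the future",
]

def find_smoking_guns(text: str) -> list[str]:
    """Return the boilerplate phrases that appear in the text, ignoring case."""
    return [p for p in SMOKING_GUN_PATTERNS if re.search(p, text, flags=re.IGNORECASE)]

paragraph = ("Future directions: As an AI language model, I cannot predict "
             "the future of this research area.")
print(find_smoking_guns(paragraph))
```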
Sharpening the Lens: AI Image Manipulation Comes into Focus
Images, an essential element of review and original research papers, are not immune to the wiles of tricksters looking to deceive. Those who frequent social media platforms may remember a much-discussed graphic depicting a rat with massive genitalia that made the rounds. It contained nonsensical labels, such as “sterrn cells,” “iollotte sserotgomar,” and “dissilced,” and appeared alongside other questionable figures in a paper published in the journal Frontiers in Cell and Developmental Biology. (The journal has since retracted the paper, noting that it did not meet the journal’s “standards of editorial and scientific rigor.”)
The botched image was a rude awakening for scientists that generative AI had entered the scientific literature. Many warn that this is just the tip of the iceberg. It is already becoming harder to distinguish, by human eye, a real image from a fake, AI-generated one.
“There were always people that try to deceive and use technology,” said Dror Kolodkin-Gal, a scientist-turned-entrepreneur and founder of Proofig AI (previously called Proofig), a company that provides image analysis tools. Kolodkin-Gal noted that while people have used software to manipulate figures previously, “It's really scary at the same time that the AI can generate something that looks so real.”
References:
Byrne JA, Labbé C. Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines. Scientometrics. 2017;110:1471-1493.
Cabanac G. Chain retraction: How to stop bad science propagating through the literature. Nature. 2024;632(8027):977-979.
Labbé C, et al. Semi-automated fact-checking of nucleotide sequence reagents in biomedical research publications: The Seek & Blastn tool. PLOS ONE. 2019;14(3):e0213266.
Oste DJ, et al. Misspellings or “miscellings”—Non-verifiable and unknown cell lines in cancer research publications. Int J Cancer. 2024;155(7):1278-1289.
Cabanac G, Labbé C. Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals. arXiv:2107.06751v1.
Weidinger L, et al. Ethical and social risks of harm from language models. arXiv:2112.04359v1.
Sadasivan VS, et al. Can AI-generated text be reliably detected? arXiv:2303.11156v3.
Sadasivan VS, et al. Robustness of AI-image detectors: Fundamental limits and practical attacks. arXiv:2310.00076v2.