
All Science journals will now do an AI-powered check for image fraud

A set of four microscopic histology images showing labeled tissue sections with areas marked as "Liver" and "Met" (likely metastasis).

Jan 5, 2024

Science journals will now use AI-powered tools like Proofig AI to detect image manipulation and uphold research integrity. This marks a major step in preventing fraud, though some challenges remain.

On Thursday, the research publisher Science announced that all of its journals will begin using commercial software that automates the process of detecting improperly manipulated images. The move comes many years into our awareness that the transition to digital data and publishing has made it comically easy to commit research fraud by altering images.


While the move is a significant first step, it's important to recognize the software's limitations. It will catch some of the most egregious cases of image manipulation, but enterprising fraudsters can easily avoid being caught if they know how the software operates. Unfortunately, we feel compelled to describe how it works (and, to be fair, the company that developed the software does the same on its website).


Fantastic fraud and how to catch it


Much of the image-based fraud we've seen arises from a dilemma faced by many scientists: It's not a problem to run experiments, but the data they generate often isn't the data you want. Maybe only the controls work, or maybe the experiments produce data that is indistinguishable from controls. For the unethical, this doesn't pose a problem, since nobody else knows which images came from which samples. It's relatively simple to present images of real data as something they're not.


To make this concrete, we can look at data from a procedure called a western blot, which uses antibodies to identify specific proteins from a complex mixture that has been separated according to protein size. Typical western blot data consists of a series of dark bands, with the darkness of each band representing a protein that is present at different levels in different conditions.


Note that the bands are relatively featureless and are cropped out of larger images of the raw data, divorcing them from their original context. It's possible to take bands from one experiment and splice them into an image of a different experiment entirely, fraudulently generating "evidence" where none exists. Similar things can be done with graphs, photographs of cells, and so on.


Since data is hard to come by and fraudsters are often lazy, the original and fraudulent images are, in many cases, both derived from data used for the same paper. To cover their tracks, unethical researchers will often rotate, magnify, crop, or change the brightness/contrast of images and use them more than once in the same paper.

Not everyone is quite that lazy. But this image recycling is remarkably common and perhaps the most frustrating form of research fraud. All the evidence is in the paper, and it is usually easy to see once it's pointed out. But it can be remarkably difficult to spot in the first place.


That "spot in the first place" challenge is why Science is turning to a service called Proofig to make it easier to spot problems.


Send in the AI


Both the editorial Science is using to announce its new policy and the Proofig website describe the service as AI-powered, although it's unclear to what extent that's true. One step that clearly uses AI is the identification of images within the PDF of a research manuscript. Once a user confirms that the objects the system has identified correspond to the paper's figures, the software scans them all for overlapping features, even in cases of cropping and rotation.
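Proofig doesn't document how its figure-identification step works beyond saying it uses AI. As a much simpler, non-AI illustration of what pulling image candidates out of a manuscript PDF can look like, here's a short Python sketch using the PyMuPDF library; the function name and output layout are our own hypothetical choices, not Proofig's.

```python
import fitz  # PyMuPDF: pip install PyMuPDF

def extract_embedded_images(pdf_path: str, out_dir: str) -> list[str]:
    """Save every embedded raster image in a manuscript PDF; return the saved paths."""
    doc = fitz.open(pdf_path)
    saved = []
    for page_index, page in enumerate(doc):
        for img_index, img in enumerate(page.get_images(full=True)):
            xref = img[0]                    # object number of the embedded image
            info = doc.extract_image(xref)   # raw bytes plus metadata such as the file extension
            path = f"{out_dir}/page{page_index:03d}_img{img_index:02d}.{info['ext']}"
            with open(path, "wb") as fh:
                fh.write(info["image"])
            saved.append(path)
    return saved
```

A real pipeline would then show these candidates to a user, who confirms which ones actually correspond to the paper's figures before any duplication checks are run.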


This latter process doesn't necessarily require AI, and neural networks trained for tasks like recognizing shared features in images often aren't especially good at highlighting the details they use to identify similarities. In contrast, Proofig's system tabulates the number of features shared by different images and provides a graphical view that draws lines connecting those features. Proofig isn't clear about what it uses to identify image features. (It uses AI to detect when multiple images are spliced into a single composite, but it may not be using it to recognize other details.)
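Since Proofig doesn't say what it uses, here's a rough sketch of how classical, non-AI computer vision can count features shared between two figure panels despite rotation, cropping, or rescaling, using ORB keypoints from OpenCV. The function name and the match-distance threshold are illustrative assumptions on our part, not Proofig's method.

```python
import cv2  # OpenCV: pip install opencv-python

def count_shared_features(path_a: str, path_b: str, max_distance: int = 40) -> int:
    """Count ORB keypoint matches between two figure panels.

    A high count suggests the panels may share content, even if one has been
    rotated, cropped, or rescaled relative to the other.
    """
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    if img_a is None or img_b is None:
        raise FileNotFoundError("Could not read one of the images")

    orb = cv2.ORB_create(nfeatures=2000)   # rotation- and scale-tolerant keypoints
    _, des_a = orb.detectAndCompute(img_a, None)
    _, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:     # a featureless panel has nothing to compare
        return 0

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    good = [m for m in matcher.match(des_a, des_b) if m.distance < max_distance]
    return len(good)
```

OpenCV's cv2.drawMatches function can render the matched keypoints with connecting lines, which is roughly the sort of graphical overlap view that Proofig's reports provide.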


Regardless of how it's done, the final result is a report highlighting potential similarities in different figures and showing the overlap's extent. What to do about any discoveries is left as an exercise for the journal's editors.


In Science's case, the editors will first check if there's a problem at all. In many cases, portions of images are magnified and cropped to provide a detailed view of key features. (You can see one of these cases in the images we used from a paper on fossil cyanobacteria.) So, some of the things identified by Proofig will be perfectly valid. If there's no obvious explanation for a duplication, Science's editors will ask the paper's authors to explain the issue. Although it doesn't provide numbers, Science indicated that, during a trial period, most authors had explanations for the problems and submitted corrections before publication—most, but not all. "Papers that should not be published were detected," wrote Holden Thorp, the editor-in-chief for Science's journals.


Obviously, if severe problems are found, they may indicate significant research misconduct that goes well beyond a single manuscript. "Science plans to follow guidelines from organizations like COPE and STM that recommend contacting the institutions [that the researchers are based at] in cases of severe image manipulation," a spokesperson for Science told Ars. They also indicated that the publications maintain a record of all correspondence related to manuscripts submitted to Science journals and so would be aware if authors involved in a problematic manuscript submitted additional papers in the future.


Recognizing limits


Catching problems before they are published is exactly what we'd want to see. But it's important to emphasize that this system can't possibly catch everything. It won't identify any problems that don't involve duplication of data. For example, one recent paper on high-temperature superconductivity was pulled because the researcher couldn't explain the mathematical transformations that were performed on the data before it was graphed—the graph itself only appeared once in the paper.


The same researcher had a separate paper retracted because one of its graphs appeared to have been copied from his thesis work, where it was the product of an unrelated experiment. Catching that would require building a database of all images from all scientific publications—possible, but also very possibly beyond the scope of a commercial company that's serving a niche market. That many images of similar data might also create an unmanageable level of false positives.
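To get a sense of what such a cross-publication check might involve, one common approach (not anything Proofig has announced) is to index every published image with a compact perceptual hash and flag new submissions whose hashes land too close to an existing entry. A minimal Python sketch using the ImageHash library, with an arbitrary distance threshold of our choosing, shows the idea—and also hints at why millions of visually similar blots and micrographs would generate so many false positives.

```python
from PIL import Image
import imagehash  # pip install ImageHash

def build_index(paths: list[str]) -> dict[str, imagehash.ImageHash]:
    """Compute a 64-bit perceptual hash for each previously published image."""
    return {p: imagehash.phash(Image.open(p)) for p in paths}

def near_duplicates(query_path: str, index: dict, max_bits: int = 6) -> list[str]:
    """Return indexed images whose hash differs from the query's by at most max_bits."""
    query = imagehash.phash(Image.open(query_path))
    return [p for p, h in index.items() if (query - h) <= max_bits]
```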


And even that wouldn't help with the simplest workaround: if you're going to falsify data, start with an unpublished image. No system will be able to recognize similarities with an image that only lives on a couple of hard drives in a research lab.


None of this is to dismiss the system that Proofig has built or Science's decision to use it. It will catch the most frustrating cases of research fraud: things we could have seen if only we'd looked at them in the right way. Stopping research fraud is a hard problem, but any step that eliminates some cases is significant.





Science, 2024. DOI: 10.1126/science.adn7530  (About DOIs).
