“Our aim here is to maximize amusement, rather than coherence.”
Joke papers have been known to sneak into otherwise serious publications. Notably, in the Sokal affair, Alan Sokal, a physicist, published a nonsense paper in Social Text, a leading journal in cultural studies. After it was published, Sokal revealed the paper to be a parody, kicking off a culture war between the editors of Social Text, who claimed they had accepted the paper on Sokal’s authority, and Sokal and others, who said that this was exactly the problem: papers should be subject to review, rather than being accepted on authority. The Sokal affair highlighted the cultural differences between certain sections of the social sciences and the natural sciences, specifically about how academic merit should be established.
Another well-publicized hoax publication occurred when a group of MIT students wrote SCIgen, a program that generates random computer-science papers. One such paper was accepted to the WMSCI conference in 2005, in a “non peer-reviewed” track. Once the organizers learned they had been had, they disinvited the “authors”. That did not stop the students from going to the conference venue anyway and holding their own session at the conference hotel.
Following the SCIgen incident, Predrag Radivojac and his team at Indiana University, Bloomington developed a method to distinguish between authentic and inauthentic scientific papers, which they published as a (hopefully) authentic paper in SIAM. The idea is to tell true papers apart from robo-papers such as those generated by SCIgen. Their method works as follows: first, they pulled a set of about 1,000 authentic papers. Then they generated 1,000 papers with SCIgen. They then subjected both types of papers to Lempel-Ziv compression, similar to the kind you use to zip your files. Why use compression? The ratio of sizes between compressed and uncompressed documents is a good way to measure the information that a document contains. Since compression algorithms rely on the frequency of character patterns in the document, one may assume that documents with different patterns can be characterized by different compression ratios. The team from IU exploited the differences between typical patterns in robo-papers and those in real papers, and created a method that can distinguish between the two types of papers based on their compression profiles. The method is available online. This can help reduce the number of robo-papers making their way into robo-conferences.
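To get a feel for the compression-ratio idea, here is a minimal sketch in Python. It is not the IU team's method: it only computes a single ratio per text, using the standard library's zlib (DEFLATE, which builds on a Lempel-Ziv scheme) as a stand-in compressor. The function name `compression_ratio` and the two toy texts are assumptions made up for illustration, not real or generated papers.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size (smaller = more redundant)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty text")
    # Level 9 asks zlib for its strongest compression setting.
    return len(zlib.compress(raw, 9)) / len(raw)


# Toy input 1: the same phrase repeated, so character patterns recur constantly.
repetitive = "the quick brown fox " * 200

# Toy input 2: same length, but characters drawn at random (seeded for repeatability),
# so there are few recurring patterns for the compressor to exploit.
rng = random.Random(0)
alphabet = "abcdefghijklmnopqrstuvwxyz "
varied = "".join(rng.choice(alphabet) for _ in range(len(repetitive)))

print("repetitive:", round(compression_ratio(repetitive), 3))
print("varied:    ", round(compression_ratio(varied), 3))
```

The repetitive text compresses to a much smaller fraction of its original size than the varied one. A classifier along the lines described above would presumably need more than one number per document, but the sketch shows why texts with different pattern statistics end up with different compression ratios.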