Should research code be released as part of the peer review process?

So there have been a few reactions to my latest post on accountable research software, including a Twitter kerfuffle (again). Ever notice how people come out really aggressive on Twitter? Must be the necessity to compress ideas into 140 characters. You can't just write "Interesting point you make there, sir. Don't you think that your laudable goal would be better served by adopting the following methodolo..." Oops, ran out of characters. OK, let's just call him an asshole: seven characters used. Move on. What I will try to do here is compile the various opinions expressed about research software, its manner of publication and accountability. I will also attempt to explain my own opinion on the matter. I do not think mine is the only acceptable one. As this particular subject is based on values, my take is shaped by my experiential baggage, as it were. Back to business. One interesting point was raised by Kevin Karplus (on his blog, not on Twitter):
I do worry a little about one of the justifications given for distributing research code—the need to replicate experiments. A proper replication for a computational method is not running the same code over again (and thus making the same mistakes), but re-implementing the method independently. Having access to the original code is then useful for tracking down discrepancies, as it is often the case that the good results of a method are due to something quite different from what the original researchers thought. I fear that the push to have highly polished distributable code for all publications will result in a lot less scientific validation of methods by reimplementation, and more “ritual magic” invocation of code that no one understands. I’ve seen this already with code like DSSP, which almost all protein structure people use for identifying protein secondary structure with almost no understanding of what DSSP really does nor exactly how it defines H-bonds. It does a good enough job of identifying secondary structure, so no one thinks about the problems.

Kevin presents what to some may seem a radical opinion: not how to make research software accountable, but whether we should make it available in the first place. This seemingly goes against everything that scientists should stand for: transparency and the sharing of resources. He points out two possible dangers: one to actual reproducibility, and the other to the role of bioinformaticians:
I fear that the push for polished code from researchers is an attempt to replace computational researchers with software publishing teams. The notion is that the product of the research is not the ideas and the papers, but just free code for others to use. It treats bioinformaticians as servants of “real” researchers, rather than as researchers in their own right. It’s like demanding that no papers on possible drug leads be published until Phase III trials have been completed (though not quite that expensive), and then that the drug be distributed for free.

Kevin's post got me thinking that perhaps not all research software should be released, at least not as part of the Methods section (and hence the peer-review phase of the paper), and also that perhaps research software, as we write it in the lab, is not all intended for release. My own concern is that there might be unintended consequences to mandating code release during peer review as a condition for publication. One such consequence might be that imperfect code (and research code is imperfect by its very nature of being highly prototypical) may frustrate referees to the point that they will not be able to properly run and assess it; and as they cannot ask for support, the publication will suffer. Also, installation is time-consuming -- burdening referees with installing & testing software might just cause them to turn down papers that are mandatorily accompanied by code. The nascent Bioinformatics Testing Consortium does offer a solution to this problem, by having the code go through a hardening cycle prior to submission. But even then, labs can only spend so much time and effort cleaning up, documenting and hardening their software. Labs that can afford to bring their research code up to hardening and documentation standards would be in a better position to publish than those which cannot. Is that bad? It may be.
Because it is only in some cases (I'll get to that) that robust, well-documented code is actually needed to review a paper. In many cases, code release during review is superfluous, and the effort of bringing it up to standards may unfairly impact labs whose manpower is already stretched. If the Methods section of the paper contains the descriptions and equations necessary for replication of the research, that should be enough in many cases, perhaps accompanied by code release post-acceptance. Exceptions do apply. One notable exception would be if the paper is mostly a methods paper, where the software -- not just the algorithm -- is key. Mostly, that is done already in journals like NAR, Bioinformatics and BMC Bioinformatics, which publish such papers and review the software along with the manuscript. Another exception would be the case Titus Brown and Jonathan Eisen wrote about: where the software is so central and novel that not peer-reviewing it along with the paper makes the assessment of the paper's findings impossible.