Can we make accountable research software?
Preamble: this post is inspired by a series of tweets that took place over the past couple of days. I am indebted to Luis Pedro Coelho (@LuisPedroCoelho) and to Robert Buels (@rbuels) for a stimulating, 140-char-at-a-time discussion. Finally, my thanks (and yours, hopefully) to Ben Temperton for initiating the Bioinformatics Testing Consortium.
Science is messing around with things you don’t know. Contrary to what most high school and college textbooks say, the reality of day-to-day science is not a methodical hypothesis -> experiment -> conclusions, rinse, repeat cycle; it’s a lot messier than that. If there is any kind of process in science (method in madness), it is something like this:
1. What don’t I know that is interesting? E.g., how many teeth does a Piranha have?
2. How can I get to know that? That’s where things become messy. First, you have to devise a method to catch a Piranha without losing a limb, so you need to build special equipment. Then you may want more than one fish, because the number of teeth may vary between individuals. It may be gender dependent, so there’s a whole subproject of identifying boy Piranha and girl Piranha. It may also be age dependent, so how do you know how old a fish is? Etc. etc.
3. Collect lots of data on gender, age, diet, location, and of course, number of teeth.
4. Try to make sense of it all. So you may find that boy Piranha have 70 teeth and girls have 80 teeth, but with juveniles this may be reversed, but not always, and it differs between the two rivers you visited. And in River “A” they mostly eat Possum that fall in, but in River “B” they eat fledgling bats who were too ambitious in their attempt to fly over the river, so there’s a whole slew of correlations you do not understand… Also, along the way you discover that there is a new species of pacifist, vegetarian Piranha that lives off algae and has a special attraction to a species of kelp whose effect is not unlike that of Cannabis on humans. Suddenly, investigating the Piranha stonerious becomes a much more interesting endeavor.
As you may have noticed, my knowledge of Piranha comes mostly from this source, so it may be slightly lacking:
I just used the Hollywood-stereotyped Piranha to illustrate a point. The point being that ~~I love trashy movies~~ science can be a messy undertaking, and once you start, you rarely know how things are going to turn out. Things that come up along the way may cause you to change tack. Sometimes you discover you are not equipped to do what you want to do. So you make your own equipment, or, if that is unfeasible, look for a different, more realistic goal. You try this, you try that, pushing against the boundaries of your ignorance. Until finally, with a lot of hard work and a bit of luck, you manage to move a chunk of matter out of the space of ignorance and into the space of “we probably understand this a bit better now”. This is not to say that science is just a lot of fiddling around until the pieces fall together. It is chipping away at ignorance in a methodical way; in a convincingly methodical way: you need to convince your peers and yourself that your discoveries were made using the most rigorous of methods. And that the vein of knowledge which you have unearthed after relentless excavation is, in fact, not fool’s gold but the real deal.
Which brings me to research programming.
Like many other labs, my lab looks to answer biological questions from large amounts of genomic data. We are interested in how gene clusters evolve, or how diet affects the interaction between bacteria and the gut in babies. When code is written in my lab, it is mostly hypothesis-testing code. Or mucking-about code. Or “let’s try this” code. We look for one thing in the data, then for another. We raise a hypothesis and write code to check it. We want to check it quickly so that, if the hypothesis is wrong, we can eliminate it quickly; if it appears to be right, we write more code to investigate the next stage, and the one after that. We slowly unearth the vein of metal, hoping it is gold rather than pyrite. But if it’s pyrite, we want to know as soon as possible, so we can dig somewhere else. Or maybe the vein is not gold, but silver. That would be an interesting side project which could become a main project.
These practices of code writing for day-to-day lab research are therefore completely unlike anything software engineers are taught. In fact, they are the opposite in many ways, and may horrify you if you come from a classic software-industry development environment. Research coding is not done with the purpose of being robust, or reusable, or long-lived in development and versioning repositories. Upgrades are not provided, and the product, such as it is, is definitely not user-friendly or fit for public consumption. It is usually the code’s writer who is the consumer, or in some cases a few others in the lab. The code is rarely applicable to a wide range of problems: it is suited for a specific question asked on a specific data set. Most of it ends up unused after a handful of runs. When we finish a project, we usually end up with a few files filled with Python code and functions with names like “gene_function_correlation_7”, because the first six did not work. (I still have 1 through 6 in the file; I rarely delete code, since something that was not useful yesterday might prove to be good tomorrow.) It’s mostly throwaway code. That is also why we write in Python: development time is fast, and there are plenty of libraries to support parsing and manipulating genomic data. More on slice-and-dice scientific coding, and why scripting languages are great for it, in How Perl Saved the Human Genome Project, penned by Lincoln Stein.
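To make this concrete, here is a sketch of what such throwaway, hypothesis-testing code tends to look like. Everything in it is hypothetical: the file names, the column layout, and the choice of a Pearson correlation are illustrative stand-ins, not actual code from my lab.

```python
# quick_check.py -- throwaway hypothesis-testing script (illustrative sketch).
# Hypothesis: two genes in the same cluster have correlated expression.
# All file names and column choices below are hypothetical.
import csv

from scipy.stats import pearsonr


def load_expression(path):
    """Read a gene -> expression-values table from a CSV file."""
    table = {}
    with open(path) as fh:
        for row in csv.reader(fh):
            table[row[0]] = [float(v) for v in row[1:]]
    return table


def gene_function_correlation_7(table, gene_a, gene_b):
    """Attempt #7: plain Pearson correlation between two genes.
    (Attempts 1 through 6 are still further up the file; they did not work.)"""
    return pearsonr(table[gene_a], table[gene_b])


if __name__ == "__main__":
    table = load_expression("expression_levels.csv")  # hypothetical input file
    r, p = gene_function_correlation_7(table, "geneA", "geneB")
    print(f"r={r:.3f}, p={p:.3g}")  # eyeball it; if it looks wrong, dig elsewhere
```

Nothing about this is robust or reusable, and that is the point: it answers one question about one dataset as quickly as possible, and is then set aside.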
But back to everyday research lab coding. LPC’s tweet that triggered this conversation:
Uncomfortably close to the truth. Not that I am ashamed of my code; it worked great for me! But it would not work for someone else, and I’m ashamed to force someone to waste time navigating my scripts’ vagaries.
LPC has a point. But again, code which works fine on my workstation can be uninstallable on someone else’s: all those module imports I use, my Linux tweaked just so in terms of libraries, and so on. Also, I would have to write installation & usage documentation, provide module dependencies, provide some form of test input….
And by “not supposed” I mean “I don’t have the resources”. These things take time, which neither my students nor I have.
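For what it’s worth, even a minimal “does it run here?” check would cover a fair amount of that ground. Here is a hedged sketch of such a smoke test; the `analysis` module, its `run()` entry point, and the bundled test file are all hypothetical names, not an actual pipeline:

```python
# smoke_test.py -- a minimal "will this run on your machine?" check (sketch).
# The `analysis` module, its run() function, and the bundled test input are
# hypothetical; adapt the names to your own pipeline.
import importlib
import sys

REQUIRED_MODULES = ["numpy", "scipy", "Bio"]  # Biopython imports as "Bio"


def check_imports():
    """Fail early, with a readable message, if a dependency is missing."""
    missing = []
    for name in REQUIRED_MODULES:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        sys.exit(f"Missing modules: {', '.join(missing)} (see README for versions)")


def check_test_input():
    """Run the pipeline on a tiny bundled input and compare to a known result."""
    from analysis import run  # hypothetical entry point
    result = run("test_input.fasta")  # hypothetical bundled test file
    expected = 42  # hypothetical expected value shipped alongside the code
    assert result == expected, f"expected {expected}, got {result}"


if __name__ == "__main__":
    check_imports()
    check_test_input()
    print("Smoke test passed: the code should behave here as it did for me.")
```

Shipping something like this would not make the code robust, but it would tell a frustrated user in seconds whether the problem is their environment or the code itself.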
Again, a good point. Can there be some code-verification standard? Can we distribute our code with the research based upon it without feeling “ashamed” on the one hand, and without spending an onerous amount of time making it fit for public consumption on the other?
At least for bioinformatic code, Ben Temperton of Oregon State University has come up with an idea: the Bioinformatics Testing Consortium (full disclosure: I am a member):
While the use of professional testing in bioinformatics is undoubtedly outside the budgetary constraints of most projects, there are significant parallels to be drawn with the review process of manuscripts. The ‘Bioinformatics Testing Consortium’ was established to perform the role of testers for bioinformatics software.
The main aims of the consortium would be:
- To verify that the codebase can be installed on a wide range of infrastructures, with identified issues dealt with either in the documentation or the codebase itself.
- To verify the codebase against a provided dataset, which could then act as a positive control in post-release analysis (a sketch of such a check follows this list).
- To ensure accurate documentation of the pipeline, ideally through a wiki system that allows issues to be captured for greater knowledge-sharing.
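The second aim, verification against a provided dataset, could be as lightweight as re-running the pipeline on the authors’ control data and comparing the output to a checksum they ship with the code. A minimal sketch, assuming a hypothetical pipeline command, dataset, and author-provided checksum:

```python
# positive_control.py -- verify a pipeline against an author-provided dataset.
# The pipeline command, control dataset, output path, and expected checksum
# are all hypothetical placeholders.
import hashlib
import subprocess


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(command, output_path, expected_checksum):
    """Re-run the pipeline on the control data and compare the output checksum."""
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    actual = sha256_of(output_path)
    if actual != expected_checksum:
        raise SystemExit(f"Positive control FAILED: {actual} != {expected_checksum}")
    print("Positive control passed: output matches the authors' result.")


if __name__ == "__main__":
    verify(
        command=["python", "pipeline.py", "control_dataset.fasta"],  # hypothetical
        output_path="results.txt",                  # hypothetical pipeline output
        expected_checksum="<sha256 shipped with the paper>",  # placeholder
    )
```

(This assumes the pipeline is deterministic; for stochastic methods, one would compare summary statistics within a tolerance rather than an exact checksum.)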
A great idea, and if it is taken up by journals, having BTC-approved code accompany your paper would go a long way toward validating the science presented in your research article. As usual, the problem comes down to time and funding: who will provide them? For now, the suggestion is that testing will be done by volunteers. This may work for a short while, but in the long run funding agencies together with journals should pay some attention to an important lacuna in scientific publishing: the software that was used to generate the actual science is usually missing. If we (publishers, funders, and scientists) are all on the same side, and our goal is to produce quality science, then effort should be made to properly publish software, just as effort is made to publish the results that software generates. If we pay that much attention to the figures in our papers, we should try to think of ways to make the software that made those figures transparent, at least to some extent.
Perhaps grant money plus a fraction of the publication fees could go towards having your software refined by the BTC and then reviewed along with your manuscript? The thing is, publishing is quite a laborious process as it is. Preparing acceptable code on top of everything else might push less-resourced labs away from journals that mandate such practices. Careful thought has to be given to how research software can be made transparent without taxing research labs beyond their already stretched resources.