open source software – Byte Size Biology

PLoS-1 published a “creationist” paper: some thoughts on what followed

Iddo — Fri, 04 Mar 2016 19:01:30 +0000

As everyone knows by now, PLoS-1 published what seemed to be a creationist paper. While references to the ‘Creator’ were few, the wording of the paper strongly supported intelligent design in human hand development. A later statement from the first author seemed to eschew actual creationism, but maintained a teleological (if not theological) view of evolution, and said that human limb evolution is unclear. The paper was published January 5, 2016. However, it seemed not to get any attention. The first comment on the PLoS-1 site was on March 2, when things blew up on Twitter, quickly adopting the #handofgod and #creatorgate hashtags. (As far as I could tell, the paper URL had not been on Twitter before March 2, except for a single mention the day it was published.) On March 3, PLoS-1 announced that a retraction was in process.

Open Access is not broken

Probably the strangest reaction I have seen to #handofgod was in this article in Wired that examined the old trope that open access articles are poorly reviewed. I thought we were already beyond that, and that at least science writers have educated themselves on the matter. Review quality has nothing to do with the licensing of the journal! Tarring all OA publication with the same brush, without even saying why open access is relevant to this problem, is simply poor journalism.

Oh, and please stop confusing Open Source (for software licensing) with Open Access (for licensing research works). The two terms stem from the same philosophy of sharing and reuse, but they are best not conflated.

PLoS-1 is not broken

Saying that publishing this paper shows the failure of PLoS-1’s publication model is like saying that because you read a news story about someone who got run over on the sidewalk, you will never walk on a street shared with cars ever again. PLoS-1 publishes 30,000 papers per year. It took PLoS-1 less than two months to retract from the publication time, and less than 48 hours from when this paper came on the social media radar. In contrast, it took The Lancet 12 years to retract Andrew Wakefield’s infamous paper on vaccines and autism; a paper that was not just erroneous, but ruled to be fraudulent, and has caused incredibly more damage than a silly ID paper. Also, I am still waiting for Science to retract the Arsenic Life paper from 2010, and for Nature to retract the Water Memory paper from 1988. At the same time, I bet that only a few of those who clamored that they would resign from editing for PLoS-1 will turn down an offer to guest edit for Science. Here’s an idea for all PLoS-1 editors who are “ashamed to be associated with PloS one”: instead of worrying or self-publicizing on Twitter or PLoS’s comment section, take up another paper to edit, and make sure it is up to snuff.

Or, you know what, go ahead and resign; if your statistical and observational skills are so poor as to not recognize your own confirmation bias, you should not be editing papers for a scientific journal.

We should not move to a system that is exclusively post publication peer review

One argument was made that peer-review failed because this paper got through. OK, people die in car crashes even if they wear seat belts. It doesn’t mean you should never wear your seat belt because you may die anyway. Pre-publication peer-review is a safety valve: it helps maintain a certain level of quality and interest appropriate for the journal at hand. In PLoS-1, that would mean anything that is scientifically sound. In other journals, topical interest as well as gauging a level of novelty or impact may come into play as well. Like seat belts, it is not 100% reliable (obviously), and it is hugely problematic (OK, this is where the seatbelt analogy breaks down, pun intended).

Exclusive post publication peer review might be mostly good for those who have already established themselves as prominent scientists, and whose papers will be read anyway. I have yet to hear of a postpub plan that helps filter and rank papers somehow. And no, the good science will not always “make it” somehow. Yes, prepublication peer review can be horribly slow and unfair. But doing away with it completely is not a viable solution to publication woes, especially when a viable alternative is not proposed. But see here for an alternative and interesting, if somewhat open-ended worldview. Also, see below about making pre-publication reviews public.

(Added later) There is also the worry about the mob mentality of postpub review, which may lead the editors of a journal to a harsher response than is actually warranted. This concern was expressed about the swift retraction by PLoS-1.

Alternative metrics measure interest, not quality

The issue of alt-metrics per se has nothing to do directly with the #handofgod paper, but the number of tweets and Facebook shares of the URL of this article shot through the roof (1446 as of the time of this writing). Alt-metrics advocates keep saying that counts of social media chatter, downloads, and web views are a more reliable metric of the interest in a paper than, say, traditional citations. (And of course, than the much-maligned and manipulated Journal Impact Factor, which even Thomson Reuters, who originated it, say is an inappropriate metric for assessing individual papers, authors or institutions.) Alt-metrics advocates are probably correct in saying that a high altmetric shows a high level of interest on social media, but an interest in a paper, by itself, is not necessarily a good thing. You need some additional metrics to complement it and say if the interest the paper attracted comes from a good place. Also, your paper may merit social media interest and downloads, but not receive them, for various reasons.

If your paper is really bad, you may get attention on social media, and many views and downloads. If your paper is really good, you may also get attention on social media. But you won’t get attention simply because your paper is really bad or really good. You will get attention because your paper is an attention getter. If you publish a population survey of fish in an obscure pond over 5 years, and completely mess up the diversity equations, no one will notice. If you publish an interesting variation on how to build phylogenetic trees, you may be heralded in your sub-field, but not much more. If your paper is picked up by a media outlet, or a large journal’s News section, you will get more attention. But that would mean that your paper is very relevant to current public interests, in a good or bad way.

Relevant note: one way to guarantee your paper gets high alt-metrics is to have it discussed on Retraction Watch. You probably don’t want that.

So if your paper is sexy in a good or bad way, and, hopefully, gets tweeted by someone with many followers, you will get a high count. Research idea: check the correlation between the corresponding authors’ number of Twitter followers and alt-metric count.
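The research idea above could start from something as small as the following sketch. It computes a Spearman rank correlation using only the standard library; rank correlation is my choice here (an assumption, not something the post specifies) because follower counts and alt-metric counts are both heavily skewed. The numbers in `papers` are invented for illustration and are not real measurements.

```python
# Hypothetical data: (Twitter followers of the corresponding author,
# alt-metric count of the paper). Invented numbers, for illustration only.
papers = [
    (120, 3),
    (950, 14),
    (15000, 210),
    (40, 1),
    (3100, 12),
    (700, 45),
]


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


followers = [f for f, _ in papers]
counts = [c for _, c in papers]
rho = spearman(followers, counts)
print(f"Spearman rho = {rho:.2f}")
```

A real analysis would of course need actual follower and alt-metric data, and a high correlation would still say nothing about the quality of the papers.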

We should make reviews public

This is probably the only good idea I have heard so far to help prevent a recurrence of the #handofgod mess. I am not sure the paper’s reviews and editorial decision will ever be made public, but I am confident that the reviews do not mention the ID issue as problematic, or, on the slight chance that any of them do, the editor did not acknowledge the ID rationale when approving the paper for publication. We have all received the occasional review that was lazily written, and completely uninformative. They may have been positive and uninformative (“I have no comments, good paper”), or negative and uninformative (“this paper should not be published in your journal”). If reviews were public, the editor would be forced to get decent and informative reviews, or look for other reviewers. And once the paper is out, we would be able to see how and why it made it past reviews.

Note on reviewer anonymity: personally, I would prefer public reviews where the reviewers have the choice to remain anonymous. For good or bad, many scientists’ careers, especially those of junior scientists, still depend on the good graces of their colleagues; and scientists can be just as petty and vindictive as the rest of the human race. Anonymity helps the little fish be honest about the big fish without fear of retribution. Yes, it may foster less-than-honest reviews from the little fish, but that is why several reviewers are used. PeerJ and eLife already have public anonymous (or signed by choice) peer reviews.

Choosing a software license

Iddo — Fri, 03 Oct 2014 14:57:46 +0000

(With apologies to the memory of Elizabeth Barrett Browning)

How shall I license thee? Let me count the ways

I license thee to be free to distribute and embed

My code can be buggy, when I wrote it late last night

“While” loops have been made without a stated end

I license thee to change and modify

Most urgent need, by emacs and vi

I license thee MIT, as men strive for Right,

I license thee QPL, as they turn from Praise.

I license thee with a Python, put to use

In my old griefs, and with my postdoc’s faith.

I license thee with a license I seemed to lose

With my crashed disk, — I license thee with the BSD

Mozilla, Apache, of all my web-stuff! – and, if Stallman choose,

I shall license thee better with GPLv3.

BOSC 2014 Guess the Keynote Competition

Iddo — Fri, 13 Dec 2013 17:44:55 +0000

(From Peter Cock, via the OBF News Blog)

We’re pleased to officially confirm that one of the two keynote speakers
for the 15th annual Bioinformatics Open Source Conference (BOSC 2014) will
be C. Titus Brown, as he announced on Twitter recently:

Titus Brown (@ctitusbrown):
Excited to be a keynote speaker at BOSC 2014! My title:
“A History of Bioinformatics (in the year 2039)”
– plenty of room for mischief
https://twitter.com/ctitusbrown/status/410934403565490176

In recognition of the growing use of Twitter and social media within science
as a way of connecting across geographical divides, we’re announcing a
Twitter competition to guess who is scheduled to give the second keynote
at BOSC 2014 in Boston.

To enter, please tweet using hashtag #bosc2014 and include us via @OBF_BOSC,
e.g.

I think @OBF_BOSC should invite “Professor X” to be a keynote speaker
at #BOSC2014 because…

The first correct entry (within one week) will be awarded one complimentary
BOSC 2014 registration fee for themselves, or a nominated group member. This
does not cover travel or accommodation, and there is no cash substitute if you
cannot attend BOSC 2014. Members of the OBF board, BOSC organizing
committee, and ISMB SIG committee are not eligible, nor are the keynote
speakers themselves.

We intend to announce the mystery keynote speaker and any Twitter competition
winner in one week’s time, but reserve the right to cut short, modify, or
cancel the competition.

Our ulterior motive is to crowd source ideas for future keynote speakers in
BOSC 2015, so some serious suggestions please

Further details about BOSC 2014 will be posted here:
http://www.open-bio.org/wiki/BOSC_2014

Thank you,

Peter Cock & Nomi Harris, BOSC 2014 co-chairs.

This was also posted to the OBF News Blog,
http://news.open-bio.org/news/2013/12/bosc-2014-keynote-competition/

BOSC and the OBF are on Twitter as:
https://twitter.com/OBF_BOSC
https://twitter.com/OBF_news

The Bio* projects: a history in graphs

Iddo — Sat, 07 Sep 2013 21:00:11 +0000

Yesterday I received an email from Kristjan Liiva, a student at RWTH Aachen University, Germany. Kristjan has developed a really cool dashboard to analyze and visualize the development of collaborative OSS projects by mining their mailing lists and software repositories. (If the link doesn’t work, try again later; the project is under heavy development.) The result is a very interesting picture of social trends in collaborative OSS projects.

Kristjan has mined the mailing lists and repositories of Biopython, Bioperl and Biojava. All three bio* (‘biostar’) projects have large developer and user communities, and have been around for over a decade.

One thing Kristjan did was create a graph for each year, where the nodes are people, and the edges are based on email communications. Here is a map of what Biopython looked like in the early days, 2000:

Note Jeffrey Chang (then a graduate student at Stanford), Andrew Dalke and Brad Chapman (then a graduate student at University of Georgia, Athens) with >5 edges each. They were quite busy at the time.

Biopython got bigger the next year (2001):

Note the same actors are the “hubs”: Brad, Jeffrey and Andrew, although they have more edges now, and there are new, local hub actors. Of note is Thomas Hamelryck, who wrote most of the structural biology part of Biopython. But he appears in two nodes (as thomas@cbs and thomas ‘at’ cbs), so his contribution has been diluted in this graph. Many, many people contributed, and some got cut off by my rendering, sorry.
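To make the graph idea concrete, here is a minimal sketch of how such a graph could be built: people are nodes, an edge connects two people who exchanged email, and addresses are normalized first so that one person does not end up split across two nodes, as happened to Thomas. This is not Kristjan’s code (I do not know how the dashboard is implemented). The `messages` list, the addresses in it, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (sender, recipient) pairs; all addresses are invented placeholders.
messages = [
    ("alice@example.org", "bob@example.org"),
    ("alice 'at' example.org", "carol@example.org"),
    ("bob@example.org", "carol@example.org"),
    ("Alice@Example.org", "dave@example.org"),
]


def normalize(address):
    """Map obfuscated spellings such as "name 'at' host" to "name@host"."""
    cleaned = address.strip().lower()
    for marker in (" 'at' ", " at "):
        cleaned = cleaned.replace(marker, "@")
    return cleaned


def build_graph(pairs):
    """Return an undirected graph as {person: set of neighbours}."""
    graph = defaultdict(set)
    for sender, recipient in pairs:
        a, b = normalize(sender), normalize(recipient)
        if a != b:  # ignore mail sent to oneself
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hubs(graph, min_edges):
    """People with at least `min_edges` distinct correspondents."""
    return sorted(p for p, neighbours in graph.items() if len(neighbours) >= min_edges)


graph = build_graph(messages)
for person in sorted(graph, key=lambda p: len(graph[p]), reverse=True):
    print(person, len(graph[person]))
```

Without the `normalize` step, the three spellings of the first address would each become their own node with a single edge, which is exactly the dilution effect described above.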

Here’s 2004:

I was helping to roll releases, so I got kinda “hubbish” myself, with many edges on my node. A couple of years later I was looking for a new job, so I mostly dropped out of this scene.

The last year on Kristjan’s dashboard is 2011:

Peter Cock is the main active character in the graph (and he still is, and you are doing an amazing job Peter, BTW, probably not hearing that enough!) along with João Rodrigues, Brad Chapman, and Eric Talevich, among many others. Again, sorry about the cropped screenshot.

EDIT: if you go to the dashboard, and select “view by release” instead of “view by year” you can highlight the core contributors for each release. I assume that was done by number of contributions to the source versioning system.

Of course, there have been many contributors over the years, and Biopython and the other bio* projects would not be so successful without all the contributing users that provide such a diverse amount of code. Thanks to Kristjan for his work, and for letting me write about it. I’m looking forward to seeing this project develop. The social aspects of OSS projects are no less intriguing than the technological ones!

Should research code be released as part of the peer review process?

Iddo — Tue, 04 Sep 2012 22:42:43 +0000

So there have been a few reactions to my latest post on accountable research software, including a Twitter kerfuffle (again). Ever notice how people come out really aggressive on Twitter? Must be the necessity to compress ideas into 140 chars. You can’t just write “Interesting point you make there, sir. Don’t you think that your laudable goal would be better served by adopting the following methodolo…” Oops, ran out of characters. OK, let’s just call him an asshole: seven characters used. Move on.

What I will try to do here is compile the various opinions expressed about research software, its manner of publication and accountability. I will also attempt to explain what my opinion is on the matter. I do not think mine is the only acceptable one. As this particular subject is based on values, my take is subject to my experiential baggage, as it were.

Back to business.

One interesting point was raised by Kevin Karplus (on his blog, not on Twitter):

I do worry a little about one of the justifications given for distributing research code—the need to replicate experiments. A proper replication for a computational method is not running the same code over again (and thus making the same mistakes), but re-implementing the method independently. Having access to the original code is then useful for tracking down discrepancies, as it is often the case that the good results of a method are due to something quite different from what the original researchers thought. I fear that the push to have highly polished distributable code for all publications will result in a lot less scientific validation of methods by reimplementation, and more “ritual magic” invocation of code that no one understands. I’ve seen this already with code like DSSP, which almost all protein structure people use for identifying protein secondary structure with almost no understanding of what DSSP really does nor exactly how it defines H-bonds. It does a good enough job of identifying secondary structure, so no one thinks about the problems.

Kevin presents what to some may seem a radical opinion: not how to make research software accountable, but whether we should make it available in the first place. This seemingly goes against everything that scientists should stand for: transparency and the sharing of resources. He points out two possible dangers: the one to actual reproducibility, and the other to the role of bioinformaticians:

I fear that the push for polished code from researchers is an attempt to replace computational researchers with software publishing teams. The notion is that the product of the research is not the ideas and the papers, but just free code for others to use. It treats bioinformaticians as servants of “real” researchers, rather than as researchers in their own right. It’s like demanding that no papers on possible drug leads be published until Phase III trials have been completed (though not quite that expensive), and then that the drug be distributed for free

Kevin’s post got me thinking that perhaps not all research software should be released, at least not as part of the Methods section (and hence the peer-review phase of the paper), and also that perhaps research software, as we write it in the lab, is not all intended for release. My own concern is that there might be unintended consequences in mandating code release during peer-review as a condition for publication. One such consequence might be that imperfect code (and research code is imperfect by its very nature of being highly prototypical) may frustrate referees to the point that they will not be able to properly run and assess it; and as they cannot ask for support, the publication will suffer. Also, installation is time-consuming: burdening referees with installing and testing software might just cause them to turn down papers that are mandatorily accompanied by code. The nascent Bioinformatics Testing Consortium does offer a solution to this problem, by having the code go through a hardening cycle prior to submission. But even then labs can only spend so much time and effort cleaning up, documenting and hardening their software. Labs that can afford to bring their research code up to hardening and documentation standards would be in a better position to publish than those which cannot.

Is that bad? It may be, because it is only in some cases (I’ll get to that) that robust, well-documented code is actually needed to review a paper. In many cases, code release during review is superfluous, and the effort of bringing it up to standards may unfairly impact labs whose manpower is already stretched. If the Methods section of the paper contains the description and equations necessary for replication of research, that should be enough in many cases, perhaps accompanied by code release post-acceptance.

Exceptions do apply. One notable exception would be if the paper is mostly a methods paper, where the software, not just the algorithm, is key. Mostly, that is done already in journals like NAR, Bioinformatics and BMC Bioinformatics where there are such papers, and software is reviewed along with the manuscript. Another exception would be the paper Titus Brown and Jonathan Eisen wrote about: where the software is so central and novel, that not peer-reviewing it along with the paper makes the assessment of the paper’s findings impossible.

Better unsupported code than no code?

Following my previous post I was asked several times whether releasing unsupported code is better than releasing no code at all. Isn’t something better than nothing? Intuitively the answer seems obvious: release the code and let others deal with it, as some information is better than no information. I don’t subscribe to that though. When it comes to code release, documentation and support are part of the package. A lab doing less than that will be negatively impacted, as anyone releasing seemingly shoddy work may be. Again, the lab notebook analogy: when writing up the methods section in the paper, you write up the relevant part from the pages that worked, not the 90% of false starts that your lab notebook contains.

So how about taking the scripts that work, putting them in the pipeline you used, and releasing that? Would that not be the equivalent of taking the relevant bits from your lab notebook and releasing them? Maybe. But as any programmer will tell you, the documentation, process, and even semi-hardening of the code to handle input contingencies take a lot of time and effort. Again, we see imperfect software all around us, even (especially?) that software which we pay for. That’s why in software development there are alpha & beta phases, release cycles, documentation, upgrades etc. If your code does not compile 3 months down the line (which can be even before paper publication) because it is incompatible with the current libc release, are you responsible for changing it? Or should anyone wanting to use your code be forced to keep a double set of libraries, which is a pain to manage? There are many cases of scientific software that works “just so” with old libraries and compilers simply because the labs that released it cannot afford to adjust compatibility.

There are several problems associated with releasing code as part of the peer-review process. I am not sure we have solutions quite yet. This post was supposed to be a response to some of the concerns raised, but I seem to gravitate back to the BTC (disclosure: I’m a member), which at this point seems to be the only practical approach offered for those cases when code is needed at the review stage. However, as I tried to point out in this ~~ramble~~ post, this may not necessarily always be a good thing, and should be carefully considered.

B Temperton – The Bioinformatics Testing Consortium from Jan Aerts

Can we make accountable research software?

Iddo — Fri, 24 Aug 2012 16:41:16 +0000

Preamble: this post is inspired by a series of tweets that took place over the past couple of days. I am indebted to Luis Pedro Coelho (@LuisPedroCoelho) and to Robert Buels (@rbuels) for a stimulating, 140-char-at-a-time discussion. Finally, my thanks (and yours, hopefully) to Ben Temperton for initiating the Bioinformatics Testing Consortium.

Science is messing around with things you don’t know. Contrary to what most high school and college textbooks say, the reality of day-to-day science is not a methodical hypothesis -> experiment -> conclusions, rinse, repeat. It’s a lot messier than that. If there is any kind of process in science (method in madness) it is something like this:

1. What don’t I know that is interesting? E.g. how many teeth does a Piranha fish have?

2. How can I get to know that? That’s where things become messy. First is devising a method to catch a Piranha without losing a limb. So you need to build special equipment. Then you may want more than one fish, because number of teeth may vary between individuals. It may be gender dependent, so there’s a whole subproject of identifying boy Piranha and girl Piranha. It may also be age dependent, so how do you know how old a fish is? Etc. etc.

3. Collect lots of data on gender, age, diet, location, and of course, number of teeth.

4. Try to make sense of it all. So you may find that boy Piranha have 70 teeth, and girls have 80 teeth, but with juveniles this may be reversed, but not always, and it differs between the two rivers you visited. And in River A they mostly eat Possum that fall in, but in River B they eat fledgling bats who were too ambitious in their attempt to fly over the river, so there’s a whole slew of correlations you do not understand… Also, along the way you discover that there is a new species of pacifist, vegetarian Piranha that live off algae and have a special attraction to a species of kelp whose effect is not unlike that of Cannabis on humans. Suddenly, investigating the Piranha stonerious becomes a much more interesting endeavor.

As you may have noticed, my knowledge of Piranha comes mostly from this source, so it may be slightly lacking:

I just used the Hollywood-stereotyped Piranha to illustrate a point. The point being that ~~I love trashy movies~~ science can be a messy undertaking, and once you start, you rarely know how things are going to turn out. Things that come up along the way may cause you to change tack. Sometimes you discover you are not equipped to do what you want to do. So you make your own equipment, or, if that is unfeasible, look for a different, more realistic goal. You try this, you try that, pushing against the boundaries of your ignorance. Until finally with a lot of hard work and a bit of luck you manage to move a chunk of matter out of the space of ignorance, and into the space of “we probably understand this a bit better now”. This is not to say that science is just a lot of fiddling around until the pieces fall together. It is chipping away at ignorance in a methodical way; in a convincing methodical way: you need to convince your peers and yourself that your discoveries were made using the most rigorous of methods. And that vein of knowledge which you have unearthed after relentless excavation is, in fact, not fool’s gold but the real deal.

Which brings me to research programming.

Like many other labs, my lab looks to answer biological questions that can be answered from large amounts of genomic data. We are interested in how gene clusters evolve. Or how diet affects the interaction between bacteria and the gut in babies. When code is written in my lab, it is mostly hypothesis-testing code. Or mucking-about code. Or “let’s try this” code. We look for one thing in the data, then for another. We raise a hypothesis and write code to check it. We want to check it quickly so that, if the hypothesis is wrong, we can quickly eliminate it, but if it appears to be right, we will write more code to investigate the next stage, and the one after that. We slowly unearth the vein of metal, hoping it is gold rather than pyrite. But if it’s pyrite, we want to know it as soon as possible, so we can dig somewhere else. Or maybe the vein is not gold, but silver. That would be an interesting side project which would become a main project.

The practices of code writing for day-to-day lab research are therefore completely unlike anything software engineers are taught. In fact, they are actually the opposite in many ways, and may horrify you if you come from a classic software-industry development environment. Research coding is not done with the purpose of being robust, or reusable, or long-lived in development and versioning repositories. Upgrades are not provided and the product, such as it is, is definitely not user-friendly for public consumption. It is usually the code’s writer who is the consumer, or in some cases a few others in the lab. The code is rarely applicable to a wide range of problems: it is suited for a specific question asked on a specific data set. Most of it ends up unused after a handful of runs. When we finish a project, we usually end up with a few files filled up with Python code and functions with names like “gene_function_correlation_7” because the first 6 did not work. (I still have 1 through 6 in the file; I rarely delete code, since something that was not useful yesterday might prove to be good tomorrow.) It’s mostly throwaway code. That is also why we write in Python, since development time is fast, and there are plenty of libraries to support parsing and manipulating genomic data. For more on slice-and-dice scientific coding, and why scripting languages are great for it, see How Perl Saved the Human Genome Project, penned by Lincoln Stein.

But back to everyday research lab coding. LPC’s tweet that triggered this conversation:

Uncomfortably close to the truth. Not that I am ashamed of my code, it worked great for me! But it would not work for someone else. I’m ashamed to force someone to waste time navigating my scripts’ vagaries.

LPC has a point. But again, code which works fine on my workstation can be uninstallable on someone else’s: all those module imports I use, and my Linux is tweaked just so in terms of libraries, etc. Also, I have to write installation & usage documentation, provide module dependencies, provide some form of test input….
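As a small illustration of the “provide module dependencies” chore, here is a sketch that lists the top-level packages a script imports, using Python’s standard `ast` module. The function name `imported_packages` and the inlined `example_script` are hypothetical; this only finds what is imported, not which versions are installed, so it is a starting point for a dependency list rather than a complete one.

```python
import ast


def imported_packages(source):
    """Return the sorted top-level package names imported anywhere in `source`."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import (from . import x); those are
            # the script's own files, not external dependencies, so skip them.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return sorted(packages)


# Hypothetical lab script, inlined so the example is self-contained.
# It is only parsed, never executed, so the packages need not be installed.
example_script = """
import os
import numpy as np
from Bio import SeqIO
from collections import Counter
from . import helpers
"""

print(imported_packages(example_script))
```

Telling standard-library modules apart from third-party ones, and pinning versions, would still have to be done by hand or with other tools.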

And “by not supposed” I mean “I don’t have the resources”. These things take time, and neither my students nor I have that.

Again, a good point. Can there be some code-verification standard? Can we distribute our code with the research based upon it without feeling “ashamed” on the one hand, and without spending an onerous amount of time making it fit for public consumption on the other?

At least for bioinformatic code, Ben Temperton of Oregon State University has come up with an idea: the Bioinformatics Testing Consortium (full disclosure: I am a member):

While the use of professional testing in bioinformatics is undoubtedly out of the budgetary constraints of most projects, there are significant parallels to be drawn with the review process of manuscripts. The ‘Bioinformatics Testing Consortium’, was established to perform the role of testers for bioinformatics software.

The main aims of the consortium would be to verify that:

The codebase could be installed on a wide range of infrastructures, with identified issues dealt with either in the documentation or the codebase itself.

Verification of the codebase using a provided dataset, which could then act as a positive control in post-release analysis.

Accurate documentation of the pipeline, ideally through a wiki system to allow issues to be captured for greater knowledge-sharing.

A great idea, and if taken up by journals, having your BTC-approved code accompanying your paper would go a long way to validating the science presented in your research article. As usual, the problem comes down to time and funding: who will be spending them? For now, the suggestion is that testing will be done by volunteers. This may work for a short while, but in the long run funding agencies together with journals should pay some attention to an important lacuna in scientific publishing: the software that was used to generate the actual science is usually missing. If we (publishers, funders, and scientists) are all on the same side, and our goal is to produce quality science, then an effort should be made to properly publish software, just as an effort is being made to publish the results that the software generates. If we pay that much attention to the figures in our papers, we should try to think of a way to make transparent, to some extent, the software that made these figures.
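The second BTC aim quoted above, a provided dataset that acts as a positive control, could look something like the sketch below. This is my own illustration, not the BTC’s procedure: `toy_pipeline` is a hypothetical stand-in for a lab pipeline, the control sequences are invented, and comparing a SHA-256 digest of the output is just one assumed way to check that a fresh installation reproduces the authors’ result.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest used as a fingerprint of the pipeline output."""
    return hashlib.sha256(data).hexdigest()


def toy_pipeline(sequences):
    """Hypothetical stand-in for a lab pipeline: GC fraction per sequence."""
    lines = []
    for name, seq in sequences:
        gc = sum(base in "GC" for base in seq.upper()) / len(seq)
        lines.append(f"{name}\t{gc:.3f}")
    return "\n".join(lines).encode()


# Control dataset shipped with the code (invented sequences), and the digest
# of the output the authors obtained when they released it.
control_input = [("seq1", "ATGC"), ("seq2", "GGCC"), ("seq3", "ATAT")]
reference_digest = sha256_of(b"seq1\t0.500\nseq2\t1.000\nseq3\t0.000")


def passes_positive_control(pipeline, dataset, expected_digest):
    """True if the pipeline reproduces the reference output byte for byte."""
    return sha256_of(pipeline(dataset)) == expected_digest


print(passes_positive_control(toy_pipeline, control_input, reference_digest))
```

A byte-for-byte comparison is strict; pipelines with floating-point or randomized steps would need a tolerance-based comparison instead.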

Perhaps grant money + a fraction of the publication fees can go towards having your software refined by the BTC and then reviewed along with your manuscript? The thing is, publishing now is quite a laborious process as it is. Preparing acceptable code on top of everything else might push less-resourced labs away from journals that would mandate such practices. Careful thought has to be given as to how research software is made transparent without taxing research labs beyond their already stretched resources.

From the BTC presentation at ISMB 2012. Source: Ben Temperton.

Open Access: the Revolution Will be Convenient

Iddo — Wed, 19 Jan 2011 10:56:23 +0000

Some time ago an article in Linux Journal discussed the adoption of free/open source software (FOSS) by the general public. The article (I can’t seem to find it now) talked about the people that do not care about the distinction between Free as in Free Beer vs. Free as in Freedom (libre). They want software that works, and they are even willing to pay for it, although free would be nice. Also, the lack of licensing hassles is a serious bonus. The Open Source advocates and developers are the ones who care deeply about the dissemination model: code should be available for modification and reuse. Not because of the price tag, but because not sharing code hinders development. The success stories of the open source model are obvious: Internet and WWW protocols are open source, most servers are Linux based, Mac OS X is partly based on FreeBSD, and I’m writing this post from a Linux machine on WordPress. Also, the programmers and FOSS advocates are not starving: they are selling books, documentation, maintenance services and penguin T-shirts. My university is switching to Sakai, a FOSS-based course management system, and they are hiring programmers to maintain it. The IT managers realize (I hope!) that the adoption of Sakai will not be “free” as in no $$$, as these programmers will cost money. The benefit of such a system over the closed system we have used so far would be to draw upon the general knowledge of the Sakai user community, and to be able to adopt and adapt modules for a learning system suited to my university’s needs.

Credit: mcwetboy on Flickr http://www.flickr.com/photos/mcwetboy/3394518027/

Android is a Linux-based operating system for smartphones which works great. One of the reasons Android gained such a large market share from Apple’s iPhone is Android’s FOSS-friendliness for app developers, as well as the operating system’s portability to many platforms.

The not-so-successful story is FOSS in desktops. Windows still rules, and frankly up until recently, Linux desktops were not that great. They failed the “grandmother test”, in which you get your grandmother, who is used to Windows, to try and adopt Linux. There was too much under-the-hood knowledge needed for granny to be able to even do her email and word processing on a Linux machine. I believe that now the main hindrance to adopting Linux as a desktop is not the granny test, but simply things like inertia and compatibility of certain software. The Linux desktop is quite usable now.

Which brings us to the point that the adoption of FOSS by most computer users is not one of ideology, but of convenience. If they can get the job done for free, fine. If they have to pay some money for it, fine too, as long as they are not milked into continuous upgrade and support (and sometimes even that works). But they want a convenient and familiar working platform. Linux is a choice for servers because it is much better than Windows Server. Android is cheaper and has more apps than iPhone (in large part due to the open development model), and you are not locked into hardware. Purchasers of Android phones take all of these into consideration, not the openness of the system, since most of them will never use Android in a way which directly exploits its openness. Yes, they do benefit indirectly from openness, but that is not what attracts them.

So what does Open Access (the title) have to do with Open Source?

I believe that the advocates of scientific Open Access publication are in the same situation that Open Source advocates were in a few years ago. Advocates of both OA and FOSS models had to fight interest groups to gain acceptance. The respective fights have been mostly won. Both OA and FOSS have gained enough traction to stay and even be adopted, to some extent, by some of their previous opponents from the respective industries of publishing and software.

However, OA adoption is not yet widespread. According to a recent poll published in Science, only 10% of published papers are in OA journals, although 90% of scientists support OA. So OA is seen as a good idea, but few adopt it. The reason: by analogy to the Linux desktop, OA does not quite yet fit “user” expectations. You might say OA fails the “old professor” test. It appears that most scientists care primarily about two things: the perceived prestige of the publication venue, and the associated price tag(*). Also, most of the scientists polled did not care about such things as retaining copyright and Creative Commons (CC) licensing. They are the equivalent of Android users who do not care (or even know) about Open Source licensing. From a non-representative polling of my colleagues, it seems to me that many are unaware of CC and see licensing issues as niceties rather than essentials. So, as in the world of Open Source, it is convenience, rather than ideology, that will determine the adoption of Open Access. How much does it cost? Is it in a “good” journal? Those are the equivalent questions to those that your grandmother may ask: “can I email my grandkids with it” and “can I do my taxes with it”?

So while the Open Access movement, like the FOSS movement, is fueled by an ideal, and people carrying this ideal, the ultimate adoption will be one of convenience and self-interest.

Finally, here is a slideshow of the Open Access poll highlights, from the website of the Study of Open Access Publishing project.

SoapFall2010


—————–

(*) One comment about the price tag: a lot has been said about how libraries have to pay to maintain subscriptions to toll-access journals, how that fee is rolled over to researchers in terms of overhead, and how open access can eliminate that. I doubt widespread adoption of the Open Access publication model would make much of a difference, but I confess I don’t understand very well how the economics of science publication work. Even with a wide adoption of open access, that would mean replacing one line item (overhead) with another (publication fees). Also in the past, University of California researchers have threatened boycotts against Cell Press and Nature Publishing Group when the subscription hikes were deemed too high. Yes, institutional fees are part of the price tag. But also, most researchers would go with a closed-access subscriber-pays model, as long as the price is not perceived as exorbitant.

My Hype Cycle

Iddo — Thu, 25 Nov 2010 21:08:11 +0000

The hype cycle characterizes the over-excitement and subsequent disappointment with new technologies. I expanded this a bit to include research and social trends in science which seem prevalent nowadays.

Any views represented in this hype cycle diagram are my own, and in no way represent the views of my employers, family, friends, neighbors, greengrocer, auto mechanic, my skin microbiome or my internet provider who just slapped me with a 30% fee increase.

Template (without writing) taken from Wikimedia Commons, under GFDL. Credit for template: Jeremy Kemp.

Bioinformatics Open Source Conference 2010 (and a poll)

Iddo — Mon, 14 Jun 2010 15:22:29 +0000

The 11th Annual Bioinformatics Open Source Conference (BOSC) 2010 is coming up in Boston, July 9-10 2010. The BOSC meetings are a great get-together of a community of programmers who are like-minded in their advocacy of open source code for science, and specifically for bioinformatics. The whole thing is run by volunteers who take a lot of time and effort to bring a top-notch meeting every year, so a big thanks to this year’s organizing committee!

If you are reading this, and you are in Boston on those dates, consider showing up: it is a great experience. There will also be a codefest on the two days before the meeting. This year’s topic is cloud computing for bioinformatics. If you like using AWS for bioinformatics or if you want to learn more, this is your chance. Amazon have provided a grant towards this codefest. (Thanks!) Biopython, Bioperl, Biojava and Bioruby developers will all be there, tailoring code to the cloud.

Which brings me to the latest poll: if you are a bioinformatics programmer, which of the Bio* packages are you using in your programming, if any? If more than one, check the one you use most frequently. Poll answers on the right. As with all Internet polls, you must be crazy if you take it at all seriously.

AMOS on Ubuntu

Iddo — Mon, 19 Apr 2010 14:44:27 +0000

AMOS is a suite of genome assembly and editing software. It includes assemblers, validation, visualization, and scaffolding tools. I have been having some issues installing AMOS on Ubuntu 9.10. Specifically, Ubuntu 9.10 has gcc 4.4, which breaks the compilation of the AMOS release version. However, the development version has been fixed to accommodate that.

If you don’t know which Ubuntu version you are running, type:

$ lsb_release -a

No more than fifteen minutes after I posted my Q to the amos-help mailing list, Florent Angly came through with a solution. I am posting his email here.

Hi,

This issue was fixed in the development version of AMOS. See below for instructions on how to install this version on Ubuntu:

Download either the regular or development version of AMOS. As of April 4, 2010,
Minimo is only available from the development version of AMOS.
i/ The regular AMOS version is available from http://sourceforge.net/projects/amos/files/, e.g.:
$ wget http://sourceforge.net/projects/amos/files/amos/2.0.8/amos-2.0.8.tar.gz/download
ii/ The development version of AMOS is in a CVS repository. To get it, run:
$ cvs -z3 -d:pserver:anonymous@amos.cvs.sourceforge.net:/cvsroot/amos co -P AMOS

In the directory where the AMOS files are located, run the following to install
the prerequisites:
$ sudo aptitude install ash coreutils gawk gcc automake mummer mummer-doc libboost-dev

For the Hawkeye component of AMOS, you need Qt3:
$ sudo aptitude install libqt3-headers

For the standard version of AMOS, skip to next step, but for the CVS development version, first, run:
$ ./bootstrap

Then regardless of the version:
$ ./configure --with-Qt-dir=/usr/share/qt3 --prefix=/usr/local/AMOS
$ make
$ make check
$ sudo make install
$ sudo ln -s /usr/local/AMOS/bin/* /usr/local/bin/

Now all the programs shipped in AMOS should be available from the command-line.
For example try:
$ Minimo -h

Regards,

Florent

You will need the AMOS development version for Ubuntu 9.10 (and above, presumably), but the regular version for 9.04 (and below). If you are getting the development version, you will also need to install cvs on your machine:

$ sudo aptitude install cvs
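
If you would rather not work out the version rule by hand, here is a rough sketch of it as a shell function. The name amos_flavor is made up for this post and is not part of AMOS. It also assumes that everything from 9.10 upward needs the development version, which, as I said above, I am only presuming.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMOS): suggest which AMOS version to fetch
# for a given Ubuntu release string such as "9.04" or "9.10".
# Assumption: 9.10 and anything newer needs the development (CVS) version.
amos_flavor() {
    release=$1
    # Only accept strings shaped like "<digits>.<digits>..."
    case $release in
        [0-9]*.[0-9]*) ;;
        *) echo unknown; return 1 ;;
    esac
    major=${release%%.*}     # "9.10" -> "9"
    rest=${release#*.}       # "9.10" -> "10"
    minor=${rest%%.*}        # tolerate a third component, e.g. "10.04.4"
    minor=${minor#0}         # "04" -> "4", so the comparison stays decimal
    minor=${minor:-0}        # guard against an empty string
    if [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -ge 10 ]; }; then
        echo development
    else
        echo regular
    fi
}

# Report a suggestion for this machine, if lsb_release is installed.
if command -v lsb_release >/dev/null 2>&1; then
    echo "Suggested AMOS version: $(amos_flavor "$(lsb_release -rs)")"
fi
```

On a non-Ubuntu system the release string may not look like "9.10" at all, in which case the function prints "unknown" and you are back to trying the regular version first.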

Hope this helps anyone struggling with installing AMOS on Ubuntu or other Linux platforms.