Not dead, overloaded
When the Moon is in the Seventh House, and Jupiter aligns with Mars, a bunch of people gather for their “Bioinformaticians Anonymous” group therapy. There they metaphorically huddle, commiserating about how bioinformatics is dead (or was it bioinformaticians?), just smells funny, or suffers from identity theft, probably because it got drunk at the last ISMB, passed out, and left its driver’s license, along with most of its cash, on the dresser in some room.
Dear friends and colleagues: I am going to settle this matter once and for all, here and now! Y’all need to graduate, find a job, get tenure, publish papers, write grants, or sell software… not sink into a funk about these non-problems every time you neglect to take your meds. So once I do that, there will be no more of this navel-gazing nonsense. You will all pull yourselves together and get back to annotating genomes, squeezing structures, tweaking SVMs, revamping web services, normalizing schemas, and of course, debugging code. If it’s existential woes you crave, read Camus, play Morrissey on your iPods, but please stop wasting my bandwidth and your (or your boss’s) grant money.
Because bioinformatics is all of the above, and much more. Which is exactly the problem (or rather the non-problem): bioinformatics is an overloaded term. Over its short history, bioinformatics has come to mean different things to different people. What’s worse, it seems to mean different things to the same person as time goes by. (A sigh is just a sigh).
We all know what overloading means. It’s one of the first things you learn in programming. The + sign in my imaginary programming language is overloaded. Here it is a binary operator that delivers different results based on the types it handles:
123 + 456 = 579    addition
"123" + "345" = "123345"    concatenation
{1,2,3} + {3,4,5} = {1,2,3,4,5}    union
\Llama\ + \Llama\ = \Duck\    weird
Here ‘+’ means different things, somewhat related, depending on context; my point is, so does ‘bioinformatics’. All this unnecessary soul-searching comes from confusing one meaning of bioinformatics with another.
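For the programmers in the room, here is a rough sketch of how the imaginary listing might look in a real language, in this case Python. Integers and strings already overload + out of the box. Python sets do not support + at all, so the sketch assumes a made-up UnionSet class whose + means union. Llama and Duck are equally hypothetical classes, there only to show that an overloaded operator can return whatever its author decides.

```python
# A minimal sketch of operator overloading. UnionSet, Llama and Duck are
# hypothetical names invented for this illustration, not library classes.


class UnionSet(frozenset):
    """A frozenset whose + operator is overloaded to mean set union."""

    def __add__(self, other):
        # Built-in sets use | for union; here + is mapped onto it.
        return UnionSet(self | other)


class Duck:
    def __repr__(self):
        return "Duck"


class Llama:
    def __add__(self, other):
        # Nothing forces + to be sensible: the class author picks the meaning.
        return Duck()


# The same symbol, four different meanings, chosen by operand type.
results = [
    123 + 456,                                   # addition
    "123" + "345",                               # concatenation
    UnionSet({1, 2, 3}) + UnionSet({3, 4, 5}),   # union
    Llama() + Llama(),                           # weird
]

for value in results:
    print(type(value).__name__, value)
```

In each case the interpreter looks at the type of the left operand and dispatches to that type's __add__ method, which is the whole trick: one symbol, many meanings, resolved by context.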
So here are the various, co-existing and often conflated different meanings for bioinformatics. I will call them B1, B2, etc. Because it is not my job to make up names for all these different meanings.
B1: Writing algorithms to answer questions in molecular biology that can be dealt with computationally. Usually takes place in computer science departments, where thinking up clever ways to solve a problem or significantly improve upon someone else’s solutions is valued. An example of B1 would be increasing the speed of a sequence assembler or improving a microarray clustering algorithm.
B2: Writing programs to deal with molecular biology problems that can be dealt with computationally. The difference is that an algorithm solves a problem in the abstract, whereas a program is something you can actually use.
B3: Data repositories and tools. This can draw on B1 and B2, but basically it means building a web resource that is a data repository and that serves up its data in various ways. The service can also include programs written in the B2 category. Many people, especially experimentalists, think that B1-B3 is the be-all and end-all of bioinformatics, and that bioinformatics is therefore not really a “science”, since it deals exclusively with engineering issues. If you are doing B2 and/or B3 only, it is unlikely you will earn a position in any life science department at a research university, as those are looking for faculty who are engaged in B4 / B5. You may get a position in an engineering department, but you have to be very good at the B1 aspect too. Places like EBI and NCBI are all about B3 activities, but to sustain that, they also have a healthy dose of B1, B2, & B4 (sometimes B5, see below).
B4: Asking a biological question (as opposed to a computational question), and investigating it using B1 + B2 + B3 or any combination of the three. An example of B4 would be the amazing discovery that the kinky-haired shrew and the runny-nosed armadillo share a surprising number of close orthologs that do not exist in other related mammalian lineages. Upon examination, these were found to be retroviral inserts, indicating a horizontal gene transfer event, probably because these two animals tend to favor the same ecosystem: dumpsters of fast food joints near interstate highway exits. OK, ok, being facetious; but there are plenty of examples of great research that was done asking biological questions using computational tools only and published in highly cited papers. Some old-school experimentalists do not believe that B4 is valid science, and hold that hypothesis-driven research should always involve getting wet. There are plenty of examples (real ones) in Nature, Science, PNAS, and PLoS to the contrary.
As an aside, I will say that, yes, thousands of people did get wet to deposit the data that B4 researchers use. But my point stands: since there is so much data, one can do very successful hypothesis-driven science in this category. Generally, this involves doing B1, B2 and sometimes B3 activities. Actually, if you do not do B1 through B3 as a basis for B4, you will not get very far. You need to know how to develop your own methods.
If you want a real example of some recent good work that answered biological questions by data analysis, look here, here and here.
B5: Asking a biological question and answering it using a combination of experimental and computational means. This is actually an extension of B4, where you also do some of the experimental work yourself. Whether you do that depends on the type of biological questions you would like to ask, and whether your hypotheses need to be verified experimentally rather than, or in addition to, by statistical means. Doing B5 as opposed to B4 should be driven by the question at hand. Many B4 people find they need to do a B5 once in a while, and then they find a collaborator capable of doing B5, or invest in doing it themselves. This is no different than a lab well versed in genetics that suddenly finds itself in need of proteomic analysis, and sets up a collaboration with a proteomics lab to do that.
In academia, most life-science settings prefer B5 type researchers, as they believe that true hypothesis-driven science should be experimental to some extent. On the other hand, here is a great argument as to why bioinformaticians should stick to code. Most computer science & engineering settings prefer the B1 type researchers. In industry, bioinformatics was very much hyped in the early 2000s, and the confusion of B1-B5 was rife, resulting in an ultimate disappointment and bubble-bursting. The pendulum then swung the other way, and bioinformatics became a dirty word, usually in the mouths of the same people who did not understand the B1 through B5 spectrum of fields and the goals of those different disciplines.
OK, now I’m sure everything is clear: bioinformatics is not dead. Bioinformaticians have evolved and diverged, and are filling many more niches in the science ecosphere, resulting in some semantic confusion.
Let’s get back to work. Have a great weekend.
Peace.
Funny and true.
Andrew (B2-B3)
Very humorous. I have a similar kind of post on bioinformatics: http://www.abhishek-tiwari.com/2009/02/bioinformatics-elephant-and-blind-men.html
Very interesting! I like the explanations! Thank you