Science 2.0: things that work and things that don’t

Open Notebook

Credit: hippie on Flickr

What is it? Open Notebook means “no insider information”: your lab notebook is on a wiki, out there for everyone to see. Negative results & all. You share your research process with the world as you go along. There are many shades to this process: you may share some of your data, edit it, sanitize it… but the general idea holds: you share a major part of your data, methods and thoughts prior to the official publication.

Why doesn’t it work? Social and cultural reasons. A basic tenet of science culture is that competition breeds quality and innovation. Researchers need to pass a series of competitive thresholds to be able to continue and expand their research: secure a position to start independent research, compete for a grant to fund it (at a 10-15% funding rate in the US for biomedical research), compete for more grants to fund an expanding vision of that research, and pass a threshold to receive tenure (or rather, not get fired after 6 years). In places with no tenure, they must pass periodic reviews. Search committees, grant review panels and tenure or periodic review committees judge a scientist by the number of publications, their innovation, how attributable they are to the scientist’s own group as opposed to the collaborating groups, and how much impact they carry in the field. And, of course, by the $$$ brought in by grant overheads. To reach a truly innovative leap in research, there is a period when you have to play your cards close to the chest, sharing your findings only with your lab, your collaborators and trusted colleagues. Revealing findings too early will get you scooped by a better-equipped lab or, at best, dilute their innovative impact: your open lab notebook wiki can and will be construed as a prior publication.

Taking openness and collaboration to the extreme, if you put your notebook on a wiki and your field is “hot” enough, you can be sure someone will use those ideas to their own benefit, very likely at your expense. It need not be malicious: reading your notebook, they could simply make an intuitive leap of reasoning before you do. Even if they are honest and generous enough to credit you with co-authorship, how much of the innovation would be attributed to you? And if you receive less credit for research innovation than you could have, that lowers your evaluation score at whatever career stage you are in. By and large, this culture does not appear to be changing. The need to be identified with a certain type of research you can call “your own”, and the need to innovate, trump those collaborations that, in the eyes of your peers and evaluators, only serve to dilute your achievements.

Therefore, in the foreseeable future, I believe that the Open Science vision will be limited to non-competitive endeavors that don’t have potential for high-impact research papers down the line. Those usually have more to do with tool and technology development than with innovative research. That is actually a great thing: at least open-notebook science enables faster protocol, tool and software development. But anyone who has been involved with Free and Open Source Software has known that for three decades or more.

Different disciplines in science have different cultures. The biomedical field, which is the one I am referring to, is known to be especially competitive, and it is also going through very fast changes. I realize that things are different in physics, for example, where pre-publication of results is encouraged and credited. All the more proof that openness, or the lack of it, is a cultural issue rather than something inherent in academic research.

What does work? Collaborative technologies: wikis, blogs and discussion forums are great for publicizing oneself (HEY!) and for asking general questions about one’s methodologies, protocols, how-tos, software or equipment. OpenWetWare is an example of such a success story for the experimental biology community, serving as a central repository for protocols and general lab how-tos. But its lab notebooks section contains only a handful of notebooks, most of them out of date. Social bookmarking like Delicious, or specialized social bookmarking like CiteULike, is catching on, if a bit more slowly than expected. Wikis (not open ones) are great for internal lab management as well, as more labs are discovering.

The free and open source software culture, where one is free to modify and distribute software so licensed, has enabled new feats in scientific computing infrastructure by leveling the playing field: anyone can use, modify and redistribute the software. In a similar vein, grid technologies are leveling the field of computational power and hardware. Publications like PLoS ONE, which accept research based on scientific rigor rather than innovative leaps and “exceptional interest”, have filled a needed gap: communicating research that is of interest, yet will not be accepted by journals demanding an innovative edge. Freely available data, post-publication, makes it easier for third parties to validate research and build upon it. And, of course, there is Open Access, which makes publications available to all: not only to read, but to further publicize.

For another view that advocates a change in scientific culture that will make Open Science part of the academic incentive structure, just as publications are today, read here.

Community annotation

Credit: victoriapeckham Flickr

What is it? Genomics has become a data-rich science. The deluge of genomes and metagenomes is too much for any group of curators to handle. The idea some genomic database maintainers have come up with is borrowed from the success of Wikipedia: if enough users come in to annotate their favorite genes, we will eventually end up with a comprehensive collection of annotations for most, if not all, genes in a sequenced genome. If this system works for Wikipedia entries, why not for genes?

Why doesn’t it work?

Why would anyone expect—or even worse, depend on—a community annotation effort? Imagine investing millions of dollars into state-of-the-art sequencing facilities, and then expecting volunteers from the community to stop by and run the sequencing machines. One might argue that this analogy is not valid because running a sequencing facility requires well-trained personnel, standardized protocols, clear procedures, quality controls and, most of all, tight coordination. Yet, the same professional standards are required for data curation, and it is precisely these aspects that are rarely achieved through a community contribution approach. Community annotation should be encouraged and facilitated, but the curation of biological data cannot depend solely on volunteer work. High standards and quality implies professionalism, and this, in turn, requires investing in dedicated professionals. Until this is done, data curation—and consequently the whole field of microbial genomics—will not move beyond the amateur stage.

Nikos Kyrpides, Nature Biotechnology 27, 627-632 (2009)

What does work? The failure of community-based annotation has brought the often overlooked but crucial work of biocurators into the limelight. Recently, the International Society for Biocuration was formed. From its mission statement:

Strong support from the research community, the journal publishers, and the funding agencies is indispensable for databases to continue to provide the valuable tools on which a large fraction of research vitally depends. Structured ways for biocurators and associated developers to increase the sharing of tools and ideas through conferences and high quality peer-reviewed publications need to be developed. This will improve data capture, representation, and analysis. Secondly, biocurators, researchers and publishers need to collaborate to facilitate data integration into public resources. Researchers should be encouraged to directly participate in annotation. This will lead to improved productivity and better quality of published papers as well as stronger integrity of the data represented in databases. Thirdly, funding agencies need to recognize the importance of database for basic research by providing increased and stable funding. Finally, the recognition of biocuration as a professional career path will ensure the continued recruitment of highly qualified scientists to this field, which benefits the wider world of biomedical sciences.

So it’s back to expert handling of data, perhaps with some community assistance. This ties into the attribution problem discussed above: in the current culture, community annotation brings hardly any career-building credit. For true community involvement, this would need to change. At the same time, biocuration needs to be recognized as a valid and important career path.

Virtual Conferences


Credit: NASA

What is it? Why pay over $2000 for an international conference, suffer through delayed flights, lost baggage, forgotten poster tubes, jet lag, overpriced meals and hotels (“conference discount” my a$$), sweaty poster sessions, and tight-fisted finance admins when you finally get home and try to get reimbursed (phew!), when you can attend a conference via webcast from the comfort of your home, at a fraction of the price if not for free?

Why doesn’t it work? First: virtual conferencing technology sucks. It doesn’t matter if you use free Skype on a $150 netbook or state-of-the-art teleconferencing equipment with a 52″ screen and Dolby Surround, piped through at hundreds of Gigabits per second. You will get interruptions, cuts, lags, annoyances and embarrassing moments. Second: social reasons. The important parts of a conference take place in the hallways, poster sessions, meals, banquets and, of course, the pub across the street. Incipient collaborations, exchange of ideas, brainstorming: all of those take place around the dinner table and in the halls, with food, coffee and alcohol providing the social lubrication, and the talks and posters the intellectual one. A conference is much more than a series of talks.

To summarize: until we reach a level of virtuality akin to that of the Star Trek holodeck, or at least something that manages to sync picture and sound without one or the other dropping every 3 minutes, we have no choice but to keep taking off our shoes and belts in front of uniformed strangers.

What does work? Live and archived webcasts can be an acceptable substitute for the lecture part if you could not make it to the meatspace (real-life, non-virtual) meeting, although you probably will not spend the time at home watching the webcasts of all the keynote speakers you would have seen at the conference. Microblogging is emerging as a time-saving device for those who were not there: you don’t need to devote 45 minutes to reading the microblog of a talk you really wanted to attend. Done properly, perhaps with the speaker’s slides shared somewhere, it is less time-consuming than watching a day’s worth of webcasts. And you can filter by your interests using the microblogging notes taken by your colleagues and posted on FriendFeed or similar services. It is no substitute for the real deal, which is schmoozing in the hallways. But at least you’ll get an idea of the latest and greatest research in your field.

This is not to say that the Internet does not support socializing and work collaborations; quite the opposite, of course. Most of my collaborators are time zones away from me, and I use email, chat, wikis, Google Docs, and even (shudder) Skype conference calls to work with them. But the experience of a critical mass of people meeting for real and getting things done in a very short space of time has yet to be duplicated by technological means.

The “End of Theory” science


What is it? I am referring to the Wired article penned last year by Wired’s editor-in-chief, Chris Anderson. It generated a large response and a resounding echo of “me too” and “he’s so right” articles and blog posts. The message of the article was that, with such a deluge of data in the natural sciences, scientists can stop going through the “hypothesize, model, test” cycle. Rather, they can simply look for statistical correlations and draw conclusions from them.

Why doesn’t it work? Because it was wrong from the get-go. I don’t think any serious scientist ever went through the cycle Anderson superficially outlined. He neglected to prefix the “observe” phase to “hypothesize, model, test”. Observation, a.k.a. data collection, is the foundation for whatever comes after. Scientists first observe; then, if enough observations are made that seem to fit a certain trend, they formulate one or more hypotheses. Those are tested, and the hypotheses refined or discarded based on the test results. Finally, some model may or may not emerge. In any case, the empirical process of research is more like “(1) observe, (2) hypothesize, (3) test, (4) observe again, (5) retest, (6) correct hypothesis, (7) bumble through the previous six stages for quite a while and, if you’re lucky, (8) arrive at a model”. This is the way science is done regardless of whether you have 20 data points or 20 trillion. There are, of course, qualitative differences with large quantities of data: methods of observation and of sifting through data become rather different, and technology starts playing a major role: you really need that computer cluster power (see also above, on community annotation). But none of this removes the need to go through the previous stages, even more carefully than you would with 20 data points. In the end, science is about providing explanations for observed phenomena, and that is what a model is: an explanation, the best we can come up with at this time. If you don’t have hypotheses, models and theories, you don’t have science.
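To make the point about sifting through large quantities of data a little more concrete, here is a minimal Python sketch. It is an illustration added here, not taken from Anderson’s article or from any of the works cited below, and the function names are hypothetical. It simulates variables that are unrelated by construction (independent Gaussian noise, 20 samples each, an arbitrary choice) and reports the strongest correlation found among them. The more variables you trawl through, the stronger the best chance correlation tends to get, so a correlation mined from a big data set is an observation that calls for a hypothesis and a test, not a conclusion.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def strongest_chance_correlation(n_variables, n_samples, seed=0):
    """Simulate n_variables mutually independent 'measurements' (pure Gaussian
    noise, n_samples each) and return the largest absolute pairwise Pearson
    correlation among them. Any correlation found here is spurious."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_variables)]
    strongest = 0.0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            strongest = max(strongest, abs(pearson(data[i], data[j])))
    return strongest


if __name__ == "__main__":
    # Same number of samples per variable; only the number of variables grows.
    for n_vars in (5, 50, 200):
        r = strongest_chance_correlation(n_vars, n_samples=20)
        print(f"{n_vars:4d} unrelated variables -> strongest |r| = {r:.2f}")
```

The exact values printed depend on the seed, but with a fixed seed the 200-variable figure can never be lower than the 5-variable one, because the first five simulated variables are identical in both runs.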

What does work?

Waldrop, M. M. (2008). Science 2.0: Is Open Access Science the Future? Scientific American, 298 (5), 68-73. DOI: 18444327

Hoffmann, R. (2008). A wiki for the life sciences where authorship matters. Nature Genetics, 40 (9), 1047-1051. DOI: 10.1038/ng.f.217

Sagotsky, J., Zhang, L., Wang, Z., Martin, S., & Deisboeck, T. (2008). Life Sciences and the web: a new era for collaboration. Molecular Systems Biology, 4. DOI: 10.1038/msb.2008.39

14 Responses to “Science 2.0: things that work and things that don’t”

  1. LabGrab says:

    First, that is a well-written and very inclusive post. Judging by the sheer amount of conversation about open access in research, a cultural shift is coming. Doesn’t it seem like there are many more channels to get published in these days? And many republishing options that aren’t pursued. I agree completely on virtual conferences and what doesn’t work. You have to be there to be in the crowd, make connections, find opportunities.

  2. Certainly there are limitations to doing Open Notebook Science: probably the biggest is that it prevents any reasonable way to protect intellectual property. But it does not prevent publication in peer-reviewed journals, although it does constrain one to journals that accept work that has been made public. I think the fear of scientists “stealing” work that is well indexed by common search engines is not justified. If you are going to steal work, a better strategy is to find documents that are not indexed, like proposals or obscure conference proceedings.

  3. Shari says:

    Failures and inaccurate theories help us to learn and forge ahead. The empirical process has served us well for generations. Why change it? The documentary “Naturally Obsessed: the making of a scientist” highlights the beauty of the discovery process by following the lives of 3 PhD candidates. I agree: an overhaul of the scientific process is not only unnecessary, but may even undercut future findings!

  4. Alan Marnett says:

    Thanks for the insightful post! Science is an interesting mix of competition and benevolence. People want to improve society, but they also want to get credit (and subsequent funding) for what they’ve done. The reality in science is that there will almost always be someone who has more money, expertise, contacts or experience than you who could move on your results faster than you. And if given the chance, especially in a funding climate like we’re in now, they will.

    99% of what we do on a daily basis is not a trade secret. It’s the foundation that those trade secrets get built on. I just launched a site, called, that is an open-access platform for scientists to share videos of their own 99%. Tips, tricks, protocols, reaction mechanisms, experiments: the works. It’s the way we all learn and share in the lab; now it’s just on the internet…

    In the next decade, I think we’re going to see major changes in the way people view “publishing”. The old print model is dead. The American Chemical Society just discontinued the print version of all of their journals (with the exception of library versions for archiving). This is just the beginning.

  5. Great post. I have to disagree with the Kyrpides quote, though. He makes the exact arguments that people made against Wikipedia, and they’re wrong to the same extent (and for the same reasons). It’s an attitude that’s steeped in a print-based culture, where there’s no way to change things once they’ve been published. In that world, you have to get things right before you make anything public.

    But in a web world, you can make things public first, and then fix and improve them as you go along. That public access allows you to draw on a much, much larger community of people than just your little group of in-house annotators. That’s the lesson of Wikipedia: it’s very difficult for your group to beat the entire rest of the world, which turns out to be full of smart, interested, and capable people.

    Some people worry that someone will waste effort building on early, poor-quality data. There are several responses to that: one, even professionally vetted data contains errors, and there needs to be a feedback process for fixing those in any case, and enabling the person who found the error to fix it themselves turns out to be a decent process; two, getting rough data earlier is sometimes better than getting cleaner data later; three, there can and should be quality information associated with the data that’s richer than the current vetted/not-vetted distinction; and four, what if you could get an email whenever the data you’re working from is updated? Then you would know about mistakes as soon as they’re found.

    The real barriers to community annotation are not the ones Kyrpides talks about; they’re the cultural barriers that Iddo talks about in this post. And there are also tool improvements that can help by making individual contributions to community databases more visible and easier to evaluate. Then it’s up to the tenure review boards and granting agencies to make use of that information.

    It’ll happen, but it’s going to be a generational shift. There is an opportunity now for smart institutions to broaden their evaluations beyond journal articles, though.

    There’s also an opportunity to build tools to facilitate community-based processes. If anyone is interested in collaborating on those, please get in touch.

  6. I should add, professional vs. volunteer is an entirely separate question. A community annotation effort may employ lots of dedicated professionals. As I see it, community annotation is not primarily about getting volunteers to do the work; it’s about whether or not you limit yourself to the people on your direct payroll (or with whom you have formal agreements). That imposes transaction costs that drastically limit participation in the annotation process. And that limitation has much greater costs than benefits, in my view.

  7. Ben says:

    The “End of Theory” was the most ridiculous thing I’ve seen coming out of Wired Science. I have no idea how or why anyone would believe that!

  8. Ian Holmes says:

    Iddo, first of all thanks for a thoughtful and provocative post.

    Regarding community annotation, I agree with what Mitch said (perhaps unsurprisingly). The criticism of “community annotation” in your post actually seems like a bit of a straw man argument, apparently railing against the idea that community annotation aspires to do away with curators. This would indeed be a fairly bizarre idea.

    Rather, when I think of “community annotation”, I think of the community as *including* those curators, as well as others who wouldn’t have been able to contribute under a more bottlenecked model.

    You probably know that Nupedia, a precursor to Wikipedia, solicited input solely from experts; and that Wikipedia grew out of this, explicitly rejecting that model in favor of a mixture of expert and amateur input. It’s important to recognize that this was never meant to be EXCLUSIVELY amateur input: the experts who had contributed to Nupedia were encouraged to contribute to Wikipedia too. (Some of them walked off in a huff, but that’s life.)

    In biology, WikiProject RNA is one of the best examples of how this can work. After RFAM decided to migrate their annotations onto Wikipedia, many of the entries grew in size far beyond what their small team of curators could have achieved. Go and look at the Wikipedia entry for the hammerhead ribozyme and tell me again that community annotation does not work.

    Certainly, many questions remain about how best to organize the process of reviewing and approving biological annotations (especially tricky details like gene structures), and you are right that community annotation is never a panacea. Wikipedia had the advantage of a straightforward and easily-understandable model that everyone had already seen (encyclopaedias). Even with that advantage, Wikipedia’s evolved rules and procedures for reviewing entries have become quite complicated. But these are things that can presumably be worked out for biology too.

    I also think you are unnecessarily cynical at times about the possibility of scientific progress in the absence of officially-blessed career incentives. People understand that their own research can benefit from good publicly-available annotations, and this can itself be sufficient incentive to contribute. Wikipedia demonstrates this, and also illustrates another (related) phenomenon: once a community effort gets big enough, it can develop its own internal reputation economy, independent of external incentives.

  9. Ian Holmes says:

    PS, I assume you are going to provide me with some explicit incentive for commenting on your blog post. 😉

  10. Iddo says:

    Wow, so many replies. I am sorry, but I cannot provide the answer y’all deserve (especially you, Ian & Mitch). Reason: I wrote this post a week ago and timed it to be published today. Tomorrow we are starting a road trip to Oxford, Ohio: I took up an assistant professorship at MUOhio (depts. of microbiology & CS).

    So I am up to my eyeballs in boxes & bubblewrap. The movers are showing up in 10 hrs. I’ve got to get some sleep, because the first leg of the journey is Vegas: a 7 hr drive from San Diego.

    But I will try, since you all took the trouble to post comments. The main sticking point seems to be community annotation. The main argument seems to be: “Wikipedia works, so community annotation can too”.

    Well, Wikipedia is not monolithic. There are many partial, bad & wrong entries. Also, certain entries in WP appeal to a larger audience, hence a larger pool of potential entry writers, which enhances quality. On the other hand, certain entries are so obscure or specialized that they never get written, or get written up badly by one person and never corrected, because there is little or no interest. There was a well-known criticism of Wikipedia that it devoted a mass of words to a fictional war (the Ent wars in LOTR), and very little to the largest conflict of the 20th century after WWII (Congo). The same rules of interest and lack thereof apply in academia: for every Hammerhead and p53, there are ORFans with certain motifs and proposed activities that should be annotated somehow, but lack any community interest (or even knowledge). That is what curators are for, as well as for updating the p53s and Hammerheads of this world.

    I don’t think Nikos was advocating doing away with community annotation. He does say not to depend “solely” on community annotation. No strawman argument here, IMHO.

    Again, this is a rather superficial answer, which is the best I can do at this moment, and for a while now. Sorry. Thank you all for taking the time & trouble to comment.

    Regarding your incentive Ian: a crate of J&B is on its way…

    OK, back to the boxes. I’ll be Tweeting on the Family Friedberg’s adventures over the next two weeks or so.


  11. Ian Holmes says:

    Congrats on the new job! Microbiology and CS… fun combination.

    Regarding the community annotation debate, I still think it is a straw man, because I don’t really think anyone is realistically talking about depending “solely” on community annotation. I may be wrong, but that’s never how I’ve understood it. The Nupedia->Wikipedia example was meant to illustrate that even Wikipedia does not depend “solely” on community annotation, unless you define “community” to include the carefully-vetted community of experts that was already engaged in writing Nupedia.

    This is how I understand “community annotation” — as an effort that involves curators at every stage (as seeders, reviewers, expert contributors, etc.) — not just a random untargeted exercise in unsupervised crowdsourcing (which clearly is risky unless your desired content is very well-defined & easily-understood).

    At some points you seem dangerously close to an all-out assault on Wikipedia, which frankly, I think would be a doomed rearguard action. You referred to some well-known biases in Wikipedia that essentially reflect the composition of its community: computer-related content and geek-beloved fiction are both overrepresented. Yes, this is true, and there are several other valid criticisms of Wikipedia kicking around out there, but these criticisms are hugely outweighed by its advantages.

    Let me ask, do you use Wikipedia or do you use Encyclopaedia Britannica?? hmmm????

    Enjoy your road trip, I’ll be following on Twitter 🙂

  12. Iddo says:

    The movers are late. So I have some time to kill at Starbucks (my own home wifi is packed).

    @Ian: OK, I guess we agree that annotation can be a community + experts effort.

    However, I still maintain that the WP model of community annotation (or even monitored community annotation) is problematic when the community is small-ish, as is the case for many of the biomolecules and cell networks that need to be annotated.

    Wikipedia is, as its name suggests, encyclopaedic. Yes, of course I use Wikipedia for initial reference to most things. What’s this “Britannica”? But I generally find that the more specific the subject, the less well-written is the WP entry, if it is at all. (Yes, I do fix ’em when I can).

    It’s probably high time someone did a survey of community annotation sites to see what works and what doesn’t. I know of a couple of white elephants out there, but that does not mean they are empty because community annotation is bad. It might be due to mismanagement or lack of community interest and awareness. I think those two factors are the main hindrances.

    Finally, I think that incentives, and cultural change are the key. Unfortunately, the CV line “annotated 5 paths in cellnetworkpedia” is considered a null line. Worse, it might be interpreted as if you are wasting your time instead of doing “real” science. You might recognize and commend your grad students for this kind of work, as will I, but we are a minority.

    A reputation economy is still problematic, at least in Life Sciences. I don’t think I am being cynical about it, and I do hope that the culture changes somehow. CS departments will recognize contributions to Open Source projects as a valid part of one’s CV (depending on context), but it seems like LS departments are lagging behind in recognizing the analogous community effort of group annotations.

    If this comment is a bit messy, it’s because I didn’t sleep much last night…. 🙂

  13. Iddo says:

    @Jean-Claude Bradley

    C/N/S can give you serious grief over posters or public conference publications sometimes. Quite a few journals do. Open access journals are more liberal with this kind of information.

    Also, the problem is not stealing, although cases of outright stealing of ideas and getting away with it do happen. I was referring to the less malign grey zone of taking someone’s quarter-baked idea as expressed in their lab notebook, and running with it quicker than they can: formulating it into a proper hypothesis, testing & publishing. Or recognizing a seeming aberration in their data for what it is before they do. Things like that.

  14. widdowquinn says:

    That’s a great, and thought-provoking post. I had two thoughts reading it, both to do with annotation.

    First, I agree with Kyrpides – community annotation doesn’t work (very well), and I don’t think it ever will, so long as the goal is to generate a canonical annotation for all organisms. Realistically, we don’t have time or understanding enough (yet) in the community to do that job. I’ve been involved in a number of sequencing and annotation projects – mostly bacterial, but some eukaryotes. These projects have sometimes involved intensive manual annotation (the early ones), and later ones have been almost entirely community-based. Manual annotation is, everyone seems to agree, labour-intensive. For one bacterial genome, we had four or five annotators at a time, working for four months to get about 4500 genes annotated (with the usual proportion of hypothetical calls) – that’s approximately 18 person-months per genome, being optimistic. This is a long time, given the rate at which sequencing is progressing. Those projects that used community annotation have fallen between two extreme forms: (i) the active involvement of the organism’s research community, who are also invested in the sequencing project; (ii) passive involvement of anyone interested in an organism or gene family who might happen along to the community annotation database and make a contribution. The projects that fell closer to the first kind worked better than those that fell towards the second kind. Sadly, the second kind is nearer to the Wikipedia model that is often held up to ‘work’. In my opinion, the reasons that the Wikipedia model works for Wikipedia and not (so far) for community annotation are complex, but touch on: visibility of the annotation resource; time available to potential annotators; motivation/reward for annotators; perceptions of what an annotation *is*; and the absolute amount of expertise available (there are not, for example, as many people interested in polyketide synthases as there are pedantic Rush fans). 

    Community annotation also doesn’t solve the problem of the ‘static’ GenBank annotations, and of the model by which changes can only be made by submitters. That ever-reliable repository remains relatively immutable, and the community impact on annotation is minor.

    Secondly, automated annotation remains a problem. The problem is not just one of faulty gene calls, incorrect annotation transfer, inconsistency and propagation of errors, but of capacity and server load. Public annotation servers like BaSys and RAST rapidly become swamped and less responsive, regardless of their quality or mutual inconsistency. Just as there is no ‘glory’ to be had in community annotation, there doesn’t seem to be enough ‘glory’ in funding a service that can accept the total increasing global demand for annotation capability. I suspect that the answer lies in a distributed service – everyone with their own annotation server, or a grid solution (maybe with a central clearing house), for example – rather than the current bottlenecks.

    On the whole I’m currently of the opinion that we have to treat annotations as rolling hypotheses, and we should be ready to correct whatever is ‘canon’ as evidence comes in. I think that *this* is the role of community annotation, and that the central repositories should recognise and adapt to this.