Posts Tagged ‘Open Science’

Paweł Szczęsny at TEDx Warsaw

March 30th, 2010

Pawel on Open Science. Full disclosure: I consider sharing an office with this guy for over a year to be one of the best experiences of my postdoc.

Thankful for…

November 26th, 2009

In no particular order or context. No personal stuff and by no means a complete list:

WordPress (like, duh).

Wikipedia (default for looking up new stuff)

Wikis in general (great lab management tool. Don’t need LIMS)

Open Access Publishing and Creative Commons licensing.

FLOSS licensing (90% of the software I use, and 100% of what I write)

Science Bloggers (too numerous to link)

Science tweeters and FriendFeeders (too numerous to link. That’s how I keep up with things)

BLAST (Sometimes it feels like bioinformatics should be renamed to blastology)

LaTeX (Wrote my dissertation in LaTeX, and never looked back)

OpenOffice.org (because not everyone uses LaTeX).

CiteULike (Keeping my reference library up to date and in good order)

Delicious (Keeping my bookmarks up to date and in good order)

Gmail (because finding that document you sent me a month ago would be impossible otherwise)

Google Scholar (For standing on the toes of Hobbits. Or something like that)

GIS (for blogging and making class slides)

Vim (because emacs blows)

Python (ease & power)

Biopython (OK, conflict of interest here, since I contributed a bit)

Friendly colleagues (They certainly are!)

Good students (gotta make my lab page).

Goulash for dinner. Can’t stand oven turkey.

Music. Especially the latest song that is going around in my head:

Science 2.0: things that work and things that don’t

July 30th, 2009

Open Notebook

Credit: hippie on Flickr

What is it? Open Notebook means “no insider information”. Your lab notebook is on a wiki, out there for everyone to see, negative results and all. You share your research process with the world as you go along. There are many shades to this process: you may share only some of your data, edit it, sanitize it… but the general idea holds: you share a major part of your data, methods and thoughts prior to the official publication.

Why doesn’t it work? Social and cultural reasons. A basic tenet of science culture is that competition breeds quality and innovation. Researchers need to pass a series of competitive thresholds to be able to continue and expand their research: secure a position so you can start your independent research, compete for a grant to fund it (at a 10-15% funding rate in the US for biomedical research), compete for more grants to fund an expanding vision of your research, and pass a threshold to receive tenure (or rather, not get fired after 6 years). In places with no tenure, you pass periodic reviews. Search committees, grant review panels and tenure or periodic review committees judge a scientist by the number of publications, their innovation, how attributable they are to his group as opposed to collaborating groups, and how much impact they carry in the field. And, of course, by the $$$ brought in through grant overheads. To reach a truly innovative leap in research, there is a period when you have to play your cards close to the chest, sharing your findings only with your lab, your collaborators and trusted colleagues. Revealing findings too early will get you scooped by a better-equipped lab or, at best, dilute the innovative impact: your open lab notebook wiki can and will be construed as a prior publication.

Taking openness and collaboration to the extreme, if you put your notebook on a wiki, and your field is “hot” enough, you can be sure someone will use those ideas to their own benefit, very likely at your expense. It need not be malign: reading your notebook, they could make an intuitive leap of reasoning before you do. Even if they are honest and generous enough to credit you by co-authorship, how much of the innovation would be attributed to you? And if you receive less credit for research innovation than you could, that would lower your evaluation score at whatever career stage you are in. By and large, this culture does not appear to be changing. The need to be identified with a certain type of research you can call “your own” and the need to innovate trump those collaborations that, in the eyes of your peers and evaluators, only serve to dilute your achievements.

Therefore, in the foreseeable future, I believe that the Open Science vision will be limited to non-competitive endeavors that don’t have potential for high-impact research papers down the line. Those usually have more to do with tool and technology development than with innovative research. That is actually a great thing: at the very least, open-notebook science speeds up the development of protocols, tools and software. But anyone who has been involved with Free and Open Source Software has known that for three decades or more.

Different disciplines in science have different cultures. I am referring here to the biomedical field, which is known to be especially competitive and is also going through very fast changes. I realize that things are different in physics, for example, where pre-publication of results is encouraged and credited. All the more proof that openness, or the lack of it, is a cultural issue rather than something inherent in academic research.

What does work? Collaborative technologies: wikis, blogs and discussion forums are great for publicizing oneself (HEY!) and for asking general questions about one’s methodologies, protocols, howtos, software or equipment. OpenWetWare is an example of such a success story for the experimental biology community, serving as a central repository for protocols and general lab how-tos. But its lab notebooks section only contains a handful of notebooks, most of them out of date. Social bookmarking like Delicious, or specialized social bookmarking like CiteULike, is catching on, maybe a bit more slowly than expected. Wikis (not open ones) are great for internal lab management as well, as more labs are discovering.

The free and open source software culture, where one is free to modify and distribute software so licensed, has enabled new feats in scientific computation infrastructure by leveling the playing field so that anyone can use, modify and redistribute software. In a similar vein, grid technologies are leveling the field of computational power and hardware. Publications like PLoS ONE, which accept research based on scientific rigor rather than innovative leaps and “exceptional interest”, have filled the gap for communicating research that is of interest yet will not be accepted by journals demanding an innovative edge. Freely available data, post-publication, makes it easier for third parties to validate research and build upon it. And, of course, there is Open Access, which makes publications available to all: not only to read, but to further publicize.

For another view that advocates a change in scientific culture that will make Open Science part of the academic incentive structure, just as publications are today, read here.


Community annotation

Credit: victoriapeckham Flickr


What is it? Genomics has become a data-rich science. The deluge of genomes and metagenomes is too much for any group of curators to handle. The idea some genomic database maintainers have come up with is borrowed from the success of Wikipedia: if enough users come in to annotate their favorite genes, we will eventually end up with a comprehensive collection of annotations for most, if not all, genes in a sequenced genome. If this system works for Wikipedia entries, why not for genes?

Why doesn’t it work?

Why would anyone expect—or even worse, depend on—a community annotation effort? Imagine investing millions of dollars into state-of-the-art sequencing facilities, and then expecting volunteers from the community to stop by and run the sequencing machines. One might argue that this analogy is not valid because running a sequencing facility requires well-trained personnel, standardized protocols, clear procedures, quality controls and, most of all, tight coordination. Yet, the same professional standards are required for data curation, and it is precisely these aspects that are rarely achieved through a community contribution approach. Community annotation should be encouraged and facilitated, but the curation of biological data cannot depend solely on volunteer work. High standards and quality implies professionalism, and this, in turn, requires investing in dedicated professionals. Until this is done, data curation—and consequently the whole field of microbial genomics—will not move beyond the amateur stage.

Nikos Kyrpides, Nature Biotechnology 27, 627-632 (2009)

What does work? The failure of community-based annotation has brought the often overlooked but crucial work of biocurators into the limelight. Recently, the International Society for Biocuration was formed. From its mission statement:

Strong support from the research community, the journal publishers, and the funding agencies is indispensable for databases to continue to provide the valuable tools on which a large fraction of research vitally depends. Structured ways for biocurators and associated developers to increase the sharing of tools and ideas through conferences and high quality peer-reviewed publications need to be developed. This will improve data capture, representation, and analysis. Secondly, biocurators, researchers and publishers need to collaborate to facilitate data integration into public resources. Researchers should be encouraged to directly participate in annotation. This will lead to improved productivity and better quality of published papers as well as stronger integrity of the data represented in databases. Thirdly, funding agencies need to recognize the importance of database for basic research by providing increased and stable funding. Finally, the recognition of biocuration as a professional career path will ensure the continued recruitment of highly qualified scientists to this field, which benefits the wider world of biomedical sciences.

http://www.biocurator.org/mission.shtml

So it’s back to expert handling of data, perhaps with some community assistance. This goes back to the attribution problem discussed above: in the current culture, there is hardly any career-building attribution to community annotations. For true community involvement, this would need to change. At the same time, biocuration needs to be recognized as a valid and important career path.


Virtual Conferences

Credit: NASA

What is it? Why pay over $2000 for an international conference, suffer through delayed flights, lost baggage, forgotten poster tubes, jet lag, overpriced meals and hotels (“conference discount” my a$$), sweaty poster sessions and tight-fisted finance admins when you finally get home and try to get reimbursed (phew!) — when you can attend a conference using webcasting in the comfort of your home for a fraction of the price if not for free?

Why doesn’t it work? First: virtual conferencing technology sucks. It doesn’t matter if you use free Skype on a $150 netbook, or state-of-the-art teleconferencing equipment with a 52″ screen and Dolby Surround, piped through at hundreds of Gigabits per second. You will get interruptions, cuts, lags, annoyances and embarrassing moments. Second: social reasons. The important parts of a conference take place in the hallways, poster sessions, meals, banquets and, of course, the pub across the street. Incipient collaborations, exchange of ideas, brainstorming: all of those take place around the dinner table and in the halls, with food, coffee and alcohol providing the social lubrication, and the talks and posters the intellectual one. A conference is much more than a series of talks.

To summarize: until we reach a level of virtuality akin to that of the Star Trek holodeck, or at least something that manages to sync picture & sound without one or the other dropping every 3 minutes, we have no choice but to continue taking off our shoes and belts in front of uniformed strangers.

What does work? Live and archived webcasts can be an acceptable substitute for the lecture part if you could not make it to the meatspace meeting, although you probably will not spend the time at home watching the webcasts of all the keynote speakers you would have gone to at the conference. Microblogging is emerging as a time-saving device for those who were not there: reading the microblogged notes from that talk you really wanted to attend takes far less than the 45 minutes of the talk itself. Done properly, perhaps with the speaker’s slides shared somewhere, it is less time consuming than watching a day’s worth of webcasts. And you can filter by your interests using the microblogging notes taken by your colleagues, posted on FriendFeed or similar services. It is no substitute for the real deal, which is schmoozing in the hallways. But at least you’ll get an idea of the latest and greatest research in your field.

This is not to say that the Internet does not support socializing and work collaborations; quite the opposite, of course. Most of my collaborators are time zones away from me, and I use email, chat, wikis, Google Docs, and even (shudder) Skype conference calls to work with them. But the experience of a critical mass of people meeting for real and getting things done in a very short space of time has yet to be duplicated by technological means.


The “End of Theory” science


What is it? I am referring to the Wired article penned last year by Wired’s editor-in-chief, Chris Anderson. It generated a large response, and a resounding echo of “me too” and “he’s so right” articles and blog posts. The message of this article was that, with such a deluge of data in the natural sciences, scientists can stop going through the “hypothesize, model, test” cycle. Rather, they can simply look for statistical correlations and draw conclusions from them.

Why doesn’t it work? Because it was wrong from the get-go. I don’t think any serious scientist ever went through the cycle Anderson superficially outlined. He neglected to prefix the “observe” phase to “hypothesize, model, test”. Observation, a.k.a. data collection, is the foundation of whatever comes after. Scientists first observe; then, if enough observations are made that seem to fit a certain trend, they formulate one or more hypotheses. Those are tested, and the hypotheses refined or discarded based on the test results. Finally, some model may or may not emerge. In any case, the empirical process of research is more of an “(1) observe, (2) hypothesize, (3) test, (4) observe again, (5) retest, (6) correct hypothesis, (7) bumble through the previous six stages for quite a while, and (8) if you’re lucky, you may have a model”. This is the way science is done regardless of whether you have 20 data points or 20 trillion. There are, of course, qualitative differences with large quantities of data: methods of observation and sifting through data become rather different, and technology starts playing a major role (you really need that computer cluster power; see also above, on community annotation). None of this removes the need to go through the previous stages, even more carefully than you would with 20 data points. In the end, science is about providing explanations for observed phenomena, and that is what a model is: an explanation, the best we can come up with at this time. If you don’t have hypotheses, models and theories, you don’t have science.

What does work?


Waldrop, M. M. (2008). Science 2.0: Is Open Access Science the Future? Scientific American, 298 (5), 68-73. DOI: 18444327

Hoffmann, R. (2008). A wiki for the life sciences where authorship matters. Nature Genetics, 40 (9), 1047-1051. DOI: 10.1038/ng.f.217

Sagotsky, J., Zhang, L., Wang, Z., Martin, S., & Deisboeck, T. (2008). Life Sciences and the web: a new era for collaboration. Molecular Systems Biology, 4. DOI: 10.1038/msb.2008.39

The real life, non virtual