

Short bioinformatics hacks: merging fastq files

So you received your mate-paired reads in two different files, and you need to merge them for your assembler. Here is a quick Python script to do that. You will need Biopython installed.

#!/usr/bin/env python
from Bio import SeqIO
import itertools
import sys
import os
# Copyright(C) 2011 Iddo Friedberg
# Released under Biopython […]
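The script above is cut off in this excerpt, so here is a separate minimal sketch of the same idea using only the standard library. It is not the author's script, and it makes some assumptions: "merge" is taken to mean interleaving (mate 1, then mate 2, for each pair); both files use the plain four-line FASTQ layout with no wrapped lines; and the mates appear in the same order in both files. The function names `fastq_records` and `interleave_fastq` are hypothetical.

```python
import itertools
from typing import Iterator, TextIO


def fastq_records(handle: TextIO) -> Iterator[list[str]]:
    """Yield FASTQ records as four lines: header, sequence, '+' line, qualities.

    Assumes the plain four-line layout (no wrapped sequence or quality lines).
    """
    while True:
        # Take the next four lines; an empty chunk means end of file.
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # Make sure every line ends with a newline, even the last one in the file.
        yield [line if line.endswith("\n") else line + "\n" for line in record]


def interleave_fastq(left: TextIO, right: TextIO, out: TextIO) -> int:
    """Write mate 1 followed by mate 2 for each pair; return the number of pairs."""
    pairs = 0
    for rec1, rec2 in itertools.zip_longest(fastq_records(left), fastq_records(right)):
        # zip_longest pads the shorter file with None, so uneven files are caught here.
        if rec1 is None or rec2 is None:
            raise ValueError("mate files contain different numbers of reads")
        out.writelines(rec1)
        out.writelines(rec2)
        pairs += 1
    return pairs
```

To use it, open both mate files and pass `sys.stdout` (or another open file) as `out`. Records are streamed rather than loaded into memory, so large runs are fine. Note that the sketch does not check whether the read names of each pair actually match, because naming conventions for mates differ between sequencing platforms.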

Tweets from AFP/CAFA 2011

The AFP/CAFA 2011 meeting was held on July 15 and July 16. Yes, it was a huge success, and I’m not just saying that because I am one of the organizers. I will write up something more comprehensive soon; in the meantime, here are my tweets from the meeting. I am learning a lot about […]

ISMB 2011 tweets

ISMB this year had quite a few tweeters. Hashtag: #ISMB. I tried to collect all the #ISMB tweets, so I wrote my own Twitter scavenger script, but it seems to go back only 3 days. I am not sure if this is a Twitter feature, or something with the library I am using (tweepy) or […]

CAFA Update

Nearly a year ago, I posted about the Critical Assessment of Function prediction, an effort I am involved with. The original post from July 22, 2010 is in the block quote. It is followed by an update about the meeting, which will be held in exactly two weeks. The trouble with genomic sequencing is that it is too […]

Bio-Linux. Now available in the Cloud

For some time now, NERC has been providing us with Bio-Linux. If you don’t want to be bothered with installing all the essential bioinformatics software on your Ubuntu box, you can install Bio-Linux, either as a Linux distro installed from scratch, or as a set of packages for an already existing Debian or Ubuntu […]

Function predictor? Submit your work to the CAFA meeting

Last July I introduced CAFA: Critical Assessment of (Gene and Protein) Function Annotations. Recap: the number of genomic and metagenomic sequences is growing at a horrendous rate. We are inundated with sequence data, yet the fraction of useful information we can glean from these sequences is steadily decreasing. There are simply too many sequences, and they are […]

You know your graduate student is frustrated when…

…you find this on top of the paper pile on his desk:

The Oxygen Rush: late January, all of February and a Day in November

I have just returned from British Columbia in Canada. I have to admit that their license plate motto is quite accurate: BC is incredibly beautiful. Another thing that struck me is the provincial flag of BC: the Union Jack at the top (OK, it is British Columbia), white and blue horizontal stripes, and […]

Lake Arrowhead Microbial Genomics Conference

Quick post from the Lake Arrowhead Microbial Genomics Conference. I’m a bad microblogger, but thankfully Jonathan Eisen and Ruchira Datta are doing a great job of covering this conference live. There is a FriendFeed room, and the Twitter hashtag is #LAMG10. The science, people, food and location are all great. My student, David Ream, is presenting […]

Protein Function: how do we know that we know what we know?

The trouble with genomic sequencing is that it is too cheap. Anyone with a bit of extra cash lying around can scrape the bugs off their windshield, sequence them, and write a paper. Seriously? Yes, seriously now: as we sequence more and more genomes, our annotation tools cannot keep up with them. It’s […]

Bioinformatics Open Source Conference 2010 (and a poll)

The 11th Annual Bioinformatics Open Source Conference (BOSC 2010) is coming up in Boston, July 9-10, 2010. The BOSC meetings are a great get-together of a community of programmers who are like-minded in their advocacy of open source code for science, and specifically for bioinformatics. The whole thing is run by volunteers who take a […]

Computational Bridge to Experiments

A bit of background: this is a meeting I am really happy to be part of, and I am even more honored to be one of its co-organizers. One of my main scientific interests is predicting the function of genes and proteins of unknown function. Some context: we have sequenced more than 1000 genomes […]

Closing gaps

Geek alert: this post is for coders. So you sequenced your genome, reached an optimally small number of contigs, they look sane, and now you would like to see what you need for the finishing stage: namely, how many gaps you have and what their sizes are. UPDATE: “might just be worth clarifying this is for […]
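The post's own code is not part of this excerpt, so here is an independent minimal sketch of the gap count. It assumes that gaps are written as runs of N (either case) in the FASTA sequences, which is a common convention but not something the excerpt confirms. Because a gap can straddle a line break, wrapped sequence lines are joined before searching. The names `read_fasta`, `find_gaps` and `gap_report`, and the `min_len` parameter, are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: gaps are stored as runs of N (either case) in the sequence.
GAP_RE = re.compile(r"[Nn]+")


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs, joining wrapped sequence lines first."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            # Keep only the first word of the header as the identifier.
            name = header.split()[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(sequence: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return (0-based start, length) for every N run of at least min_len bases."""
    gaps = []
    for match in GAP_RE.finditer(sequence):
        length = match.end() - match.start()
        if length >= min_len:
            gaps.append((match.start(), length))
    return gaps


def gap_report(lines: Iterable[str], min_len: int = 1) -> dict[str, list[tuple[int, int]]]:
    """Map each sequence identifier to the list of its gaps."""
    return {name: find_gaps(seq, min_len) for name, seq in read_fasta(lines)}
```

Passing an open FASTA file handle to `gap_report` gives, for each sequence, the start and size of every gap; the number of gaps is then just the length of each list. Raising `min_len` is one way to ignore isolated N calls that are ambiguous bases rather than real gaps.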

Comparative Functional Genomics: Penguin vs. Bacterium

No, not the flesh-blood-and-feathers penguin, but rather Tux, the beloved mascot of the Linux operating system, compared with Escherichia coli, the model organism of choice for microbiologists. We refer to DNA as “the book of life”; some geeks refer to it as the “operating system of life”. Just like in a computer’s operating system, DNA […]

Combrex: Computational Bridge to Experiments

Combrex is an exciting new project at Boston University to bridge computational and experimental techniques to functionally annotate proteins. They are hiring; see the job post below.

JOB POST

We are seeking to hire a creative computational scientist for a transformative project, COMBREX: A Computational Bridge to Experiments. The work will involve building a novel resource that combines […]