So what’s new with humans?

Man is the only animal that laughs and weeps, for he is the only animal that is struck with the difference between what things are and what they ought to be.
— William Hazlitt

We like to think that we are the only species capable of emotional self-awareness and therefore the only “animal that laughs and weeps”, but that is quite probably untrue, as other animals have been shown to laugh and perhaps weep.

Credit: Shiny Things, Flickr

 

Whatever that elusive quality is that distinguishes us from our closest cousins, the chimps and the bonobos, it is to be found in our genome. Now that the genomes of humans, some great apes and other primates have been sequenced, the basis for comparing these blueprints exists. Many studies have compared the conservation of genes, copy numbers of genes, intergenic regions, control regions, synteny, splicing and other mechanisms that may explain the differences between us and our 96% cousins. As expected, no one factor can explain why bonobos are peaceful and sexual, chimps are aggressive and patriarchal, and humans worry about taxes and blog.

Are there any new genes in humans that can help explain these differences? New genes can arise in various ways: gene duplication, exon shuffling, horizontal transfer, and the splitting (fission) or merging (fusion) of existing genes.

But how about genes that are completely new in humans? Do we have genes that we can claim as our own and are neither homologous to those in other apes nor have arisen from a mix & match manipulation in the common lineage of all apes? Are there actually human genes that are just that: exclusively human?

A group from China and Canada has decided to tackle that question. They looked specifically for genes that are new in the human lineage, but not in chimp or orangutan. (I’m not exactly sure why they did not look in Gorilla too, which is the other great ape with a mostly sequenced genome, perhaps because the assembly is still very much in progress.)

So how does one go about looking for genes that are human-only? The pipeline Wu and colleagues have set up looks like this:

 

Clockwise, from top left:

1. They scanned the human genome for genes that have a high-similarity match in the genomes of chimp, orangutan and rhesus macaque, and set those aside. That left them with 584 genes (out of roughly 25,000) which did not have an ortholog in other primates.

2. A simple sanity check: those human genes with no start or stop codons were probably mis-identified. We are now down to 352 genes.

3. Of the 352, they looked for those that have disrupted homologous regions in chimp and/or orangutan. That means that while the gene is functional in humans, it is not functional in the other primates. Disrupted homologous regions can mean that in non-humans the gene does not have a start codon, or has a premature stop codon, or has some frameshift mutation that renders it non-translatable. From 352 we are now down to 66 new human gene candidates.

4. But a human gene, even if not functional in other primates, may have been functional in a common ancestor of all primates, lost in the orangutan and chimp lineages, but maintained in humans. Such a history would not make the gene a brand-new, human-only one. So in the 66 remaining genes they looked for sequences where the mutation that rendered them functional (like an ATG start codon, or the removal of a missense mutation) was found only in humans. Now we are left with 46 genes.

5. Great, so we have 46 open reading frames in humans that look like original, human-lineage only genes. But are they functional? Do they actually transcribe into RNA and translate into protein? (RNA-only genes were excluded from this rather conservative pipeline; they are hard enough to identify as it is.) To find that out, they looked for transcribed regions in EST databases (for RNA), and in the PRIDE peptide database (for protein). Now we are left with 27 genes that are novel in humans and, because they are translated, are probably active.
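
The reading-frame checks in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration and not the authors' code: the function name is_complete_orf and the two toy sequences are made up, and the stop codons of the standard genetic code are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def is_complete_orf(seq):
    """Return True if seq starts with ATG, ends with a stop codon,
    has a length divisible by 3 and no premature in-frame stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False  # too short, or out of frame relative to the start
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False  # missing start or stop codon
    # Any stop codon before the last one truncates the protein
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


human_like = "ATGGCTGAATAA"  # toy sequence: intact reading frame
chimp_like = "ATGGCTTGATAA"  # toy sequence: premature TGA stop
print(is_complete_orf(human_like), is_complete_orf(chimp_like))
```

A real pipeline would of course work on aligned genomic regions rather than on isolated strings, which is where most of the difficulty lies.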

Trouble is, some of these genes are listed only in certain versions of Ensembl, the genome database from which the researchers took their data (they used version 56). This highlights a problem with the annotation of genes with no homologs: their annotation is volatile, and may change between different versions of the same database of the exact same genome. To overcome this problem, the researchers subjected different versions of Ensembl (40 through 55) to the same pipeline described above. They discovered an additional 33 genes that are candidates for de novo human-lineage only active genes, bringing the total up to 60.

What are those genes like? Why are they found only in humans? Can they help explain the differences between humans and other primates? Well, for one, they’re short: only one or, at most, two exons. This makes sense, as these relatively new genes have not had the time to accumulate splice sites.

The researchers moved on to look where the genes were expressed. They used RNA-Seq data from 11 different human tissues: adipose, whole brain, cerebral cortex, breast, colon, heart, liver, lymph node, skeletal muscle, lung and testes.

Here is what they found:

Levels of expression of de novo genes in 11 tissues. (A) Mean normalized expression levels of de novo originated genes in 11 tissues are defined by the mean level of expression as the numbers of unique reads mapping to coding regions divided by the total length of all the coding regions, divided by the total number of valid reads in the samples (×10−8). The vertical axis represents value of mean the normalized expression levels and abscissa axis represents the 11 tissues. (B) The proportion of the de novo originated genes that have expressed reads in the 11 tissues. The vertical axis represents the values of proportion, and abscissa axis represents the 11 tissues. (C) The proportion of the de novo originated genes having their highest normalized expression levels in each of the 11 tissues. The vertical axis represents the values of proportion, and abscissa axis represents the 11 tissues. doi:10.1371/journal.pgen.1002379.g002
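
To make the normalization in the caption concrete, here is a minimal sketch of the arithmetic as I read it: reads mapped to coding regions, divided by total coding length, divided by the number of valid reads in the sample. The function name and the numbers are made up for illustration and are not taken from the paper.

```python
def normalized_expression(unique_reads, coding_length, total_valid_reads):
    """Reads mapped to coding regions, per coding base, per valid read."""
    return unique_reads / coding_length / total_valid_reads


# Made-up numbers: 50 reads on 1,000 coding bases, 5 million valid reads
value = normalized_expression(50, 1000, 5000000)
# The figure reports values in units of 10^-8
print("%.2f x 10^-8" % (value / 1e-8))
```

Dividing by both length and sequencing depth is what lets short genes in shallowly sequenced tissues be compared with long genes in deeply sequenced ones.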

 

Panel C is the business bit: the expression of the 60 de novo human genes normalized by the general expression levels of genes in those tissues. (Pray, where are the error bars?) It seems that these genes have their highest expression in Woody Allen’s two favorite organs, the testes and the cerebral cortex. This actually makes some sort of sense: the testes are hypothesized to be a hotbed (sorry…) of evolutionary novelty, with all the meiosis going on there. The high expression of the de novo human genes in the cerebral cortex also seems to confirm our anthropocentric prejudice: we are smarter. Yay. EDIT: Following MRR’s comment: yes, we should check de novo genes and their expression in chimps. Perhaps the high expression of de novo genes exclusive to the chimp lineage is in the cerebral cortex and testes too.

 

The authors do point out that there may be many other de-novo human lineage genes:

Our estimated rate, though, for de novo origin may be underestimated due to the conservativeness of our pipeline. First, as described above, in our pipeline, translatable open reading frames must have been complete in the human genome and disrupted in both the chimpanzee and orangutan genomes to be candidates as a de novo gene. Genes that did not have a clear ortholog (i.e., a sequence with very high similarity) in either the chimpanzee or the orangutan genomes (both of which are less complete than the human genome, and thus could be a missing genes) were not used. It is also often difficult to determine whether a protein-coding gene originated specifically on the human lineage or if it originated in a primate ancestor but was then lost on both the chimpanzee and orangutan lineages. The conservativeness of our pipeline thus only allowed us to accept genes where we could clearly show human specific mutations generated complete protein-coding reading frames, and that these were conserved for disrupting state in both the chimpanzee and orangutan genomes. As both the chimpanzee and orangutan sequences should be non-functional sequences, and thus not under selection, there is a reasonable likelihood that a second mutation, in addition to the human open reading frame completing mutation, could have occurred in the chimpanzee or orangutan that would prevent us for identifying these genes as having a de novo origin on the human lineage.

Also, PRIDE and PeptideAtlas, the protein databases they used, may be underpopulated and may not include many other proteins.


To conclude, yes, humans do have their own brand-new genes which, together with many other genomic features, may help explain the differences between humans and other primates. And there are probably more of these genes than we have found so far.

 

 

As for what it means to be human:

Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy lies a small unregarded yellow sun. Orbiting this at a distance of roughly ninety-eight million miles is an utterly insignificant little blue-green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.

Perhaps it was the late, great Douglas Adams who nailed it.


Wu, D., Irwin, D., & Zhang, Y. (2011). De Novo Origin of Human Protein-Coding Genes PLoS Genetics, 7 (11) DOI: 10.1371/journal.pgen.1002379


Oh, but to receive such a rejection letter!

It is with no inconsiderable degree of reluctance that I decline the offer of any Paper from you. I think, however, you will upon reconsideration of the subject be of opinion that I have no other alternative. The subjects you propose for a series of Mathematical and Metaphysical Essays are so very profound, that there is perhaps not a single subscriber to our Journal who could follow them.

David Brewster (physicist, mathematician and inventor), acting as editor of The Edinburgh Journal of Science, to Charles Babbage (mathematician, philosopher, inventor, mechanical engineer and father of the computer), circa 1821.


The genomics programming language

Genomics is a new and exciting programming language based on Brainfsck. Here are the commands:

g    Move pointer to the right.
e    Move pointer to the left.
n    Increment the cell at the pointer.
o    Decrement the cell at the pointer.
m    Jump forward past the matching i if the cell at the current pointer is zero.
i    Jump backward to the matching m unless the cell at the current pointer is zero.
c    Output the value of the cell at the pointer.
s    Input a byte and store it in the cell at the pointer.
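
Since each command maps one-to-one onto a Brainfuck command, a translator is nearly a one-liner. The sketch below is an illustration only: the name genomics_to_bf is made up, and it assumes that any character which is not a command is a comment and can be dropped, as in Brainfuck.

```python
# The commands, in the order of the table above, spell "genomics";
# they map onto the Brainfuck commands > < + - [ ] . , in that order
GENOMICS_TO_BF = str.maketrans("genomics", "><+-[].,")


def genomics_to_bf(source):
    """Translate genomics source into Brainfuck source.
    Characters that are not commands are dropped (an assumption)."""
    commands = "".join(ch for ch in source if ch in "genomics")
    return commands.translate(GENOMICS_TO_BF)


print(genomics_to_bf("nnnmgnnneoic"))
```

The output can then be fed to any Brainfuck interpreter.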

As you can probably tell, I spent a lot of time working on genomics, but out of pure generosity I am placing this incredibly useful language in the public domain. I’m sure we will see a BioGenomics group on Open Bioinformatics Forum any day now, and that genomics will prove to be a game-changer in the field of, um, genomics.

Allow me to end this post with the following inspirational statement:

nnnnnnnnnnmgnnngnnnnnngnnnnnnnngnnnnnnnnnngnnnnnnnnnnneeeeeoiggnnnnnnnn
nnnnncenncgggnnnnnnnncgncnnnnnnnceoooooooceeecgggnncoocgoooooooocncennn
nnnnncoooocennnnnnnnnnnnnnnnnnncggnnnnceeeencoooooooooooooooooooooooc

Thank you.


Short bioinformatics hacks: reading mate-pairs from a fastq file

If you have a merged file of paired-end reads, here is a quick way to read them using Biopython:

from itertools import zip_longest

from Bio import SeqIO

inpath = "reads.merged.fastq"  # placeholder: path to your merged fastq file
# Loop over pairs of reads
with open(inpath) as handle:
    readiter = SeqIO.parse(handle, "fastq")
    for rec1, rec2 in zip_longest(readiter, readiter):
        print(rec1.id)  # do something with rec1
        print(rec2.id)  # do something with rec2
        # ...

zip_longest (izip_longest in Python 2) is fed the same iterator, readiter, twice. On every pass through the loop, next(), which advances the iterator, is called on the first argument and then on the second. Since both arguments are the same iterator, rec1 and rec2 are successive records.
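
The same-iterator trick is easier to see on a plain list than on a fastq file. A minimal sketch, with made-up read names (zip_longest is the Python 3 name of izip_longest):

```python
from itertools import zip_longest  # izip_longest in Python 2

# Five toy read names; the last one has lost its mate
reads = iter(["r1/1", "r1/2", "r2/1", "r2/2", "r3/1"])

# Both arguments are the same iterator, so each tuple consumes two items
pairs = list(zip_longest(reads, reads))
print(pairs)
```

Note the last tuple: when the number of records is odd, zip_longest pads the missing mate with None, so real code should check for that before touching rec2.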

By “merged file” I mean a fastq file where the mate-pairs are one after the other, as in:

@HWUSI-EAS687_112864999:8:1:1980:1055#CGAGAA/1
GTTTGTTTTAATTTCAGTGATTCATCAATTTTAAAAAAAGATGAGAATAATAACTATTATAAAAAGATAAATAAATGTGAAATTTATATTTCAAATTCAA
+
@:DGBGDDD@GGGDGDGDDGD@GGGGE@GGG?EBGGGADDDDGEG4?3BA*::7:GEGGGG>EDDDDAG@G><ADDGBGGGGEGGGGDGGGFEGGGEFDE
@HWUSI-EAS687_112864999:8:1:1980:1055#CGAGAA/2
AATGAATTGAATAAATATAAGAAGGATGATTAATAATAATTCTTGAATTTGAAATATAAATTTCACATTTATTTATCTTTTTATAATAGTTATTATTCTC
+
D?DB:@8EBDB>GG:=<DED79>>A8CEC8DGDGG8CEC<BGGG+BAAEA@D<2D71;:8AG<ABBEEEEBEDC?C>AACDDDCD>AD<@EFFDDDECBB
@HWUSI-EAS687_112864999:8:1:2274:1058#CGAGAA/1
CCTCAGTTAGCTTCTATTGGTATTAACATGGGTGAATTTACTAAACAATTTAATGACCAAACTAAAGATAAAAATGGTGAAGTTATACCTTGTATAATTA
+
GFGGGHHGHHHHHHGHHHHHGHHHHHHHFBGDBGEHHHHFHHEHHHHDFHCGFFFHHHHHHHGHHGGEBHEEFFCEE@E>A>>8A@EBE@BBB>BGEEDB
@HWUSI-EAS687_112864999:8:1:2274:1058#CGAGAA/2
AACTGGAGTTGTTTTAATTTCAAAAGTAAAAGATTTATCTTTAAATGCTGTAATTATACAAGGTATAACTTCACCATTTTTATCTTTAGTTTGGTCATTA
+
IIIIIIIIIIGIIIDHHIIIIDIHD8CGGGGDADEIIIIIIIHIIGBGD>DGDGGDGIGIIIIBGDG@GFHIIII<C<CCGHHHIHIBGDEEB3BEDEE@

The solution is derived from this Stackoverflow entry.
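
If you are not sure that a merged file really alternates between mates, a quick sanity check on the read IDs can help. The sketch below assumes the naming convention of the example above, where mates share an ID and differ only in a trailing /1 or /2; the function name is_mate_pair is made up.

```python
def is_mate_pair(id1, id2):
    """True if two read IDs differ only in their /1 and /2 suffixes."""
    return (id1.endswith("/1") and id2.endswith("/2")
            and id1[:-2] == id2[:-2])


first = "HWUSI-EAS687_112864999:8:1:1980:1055#CGAGAA/1"
second = "HWUSI-EAS687_112864999:8:1:1980:1055#CGAGAA/2"
third = "HWUSI-EAS687_112864999:8:1:2274:1058#CGAGAA/1"
print(is_mate_pair(first, second), is_mate_pair(second, third))
```

Inside the loop above this would be called as is_mate_pair(rec1.id, rec2.id).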

Of course, if the mate-pair files are not merged, you can use this script to merge them. It also illustrates using iterators from two different files in one for loop:

#!/usr/bin/env python
import os
import sys

from Bio import SeqIO


def merge_fastq(fastq_path1, fastq_path2, outpath):
    # Interleave the records of two fastq files into one merged file
    with open(fastq_path1) as in1, open(fastq_path2) as in2, \
            open(outpath, "w") as outfile:
        fastq_iter1 = SeqIO.parse(in1, "fastq")
        fastq_iter2 = SeqIO.parse(in2, "fastq")
        # zip (itertools.izip in Python 2) stops at the end of the shorter file
        for rec1, rec2 in zip(fastq_iter1, fastq_iter2):
            SeqIO.write([rec1, rec2], outfile, "fastq")


if __name__ == '__main__':
    outpath = "%s.merged.fastq" % os.path.splitext(sys.argv[1])[0]
    merge_fastq(sys.argv[1], sys.argv[2], outpath)

Brainf**k while waiting for a flight

Warning: NSFW language.

Brainfuck is a Turing-complete programming language consisting of eight commands, each of which is represented as a single character.

>    Increment the pointer.
<    Decrement the pointer.
+    Increment the cell at the pointer.
-    Decrement the cell at the pointer.
.    Output the ASCII value of the cell at the pointer.
,    Input a byte and store it in the cell at the pointer.
[    Jump forward past the matching ] if the cell at the current pointer is zero.
]    Jump backward to the matching [ unless the cell at the current pointer is zero.

Having arrived almost 3 hours early at JFK, flying back to Cincinnati, I spent the time coding up a Python script which takes a string and outputs Brainfuck source code which, when run with a Brainfuck interpreter, prints said string. So for example:

to_brainfuck "Hello, world!"

Will output:

++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>++++++++
++++.>>+.+++++++..>+.<<<<++++++++++++++.------------.>>>>++++++++.----
----.+++.<.--------.<<<+.-----------------------.

 

The horror above is what Brainfuck source code looks like. When you run the above code with a Brainfuck interpreter, it will print "Hello, world!".

Brainfuck interpreters and compilers can be found here. Ubuntu has a Brainfuck interpreter called bf.

Probably not the best code I wrote, could use some honing. Still, it served the purpose of killing a couple of hours.

#!/usr/bin/env python
import sys

class bf:

    def __init__(self,format_bf=True):
        """
        Initiate brainfuck code string. Pointers are initiated to the
        following values, with their ascii equivalents shown
        ptr0 = 10  # used as a loop counter, so it ends up at 0
        ptr1 = 30
        ptr2 = 60  # <
        ptr3 = 80  # P
        ptr4 = 100 # d
        ptr5 = 110 # n
        """
        self.bf_code = ''
        self.bf_code += "++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]"        

        # Index: cell number. Value: cell value.
        self.ptrs = {1: 30, 2: 60, 3: 80, 4: 100, 5:110}
        self.ptr_idx = 0 # which pointer is being used
        self.format_bf = format_bf # Format the bf code. Default True.

    def string_bf(self,instring):
        # Accepts a string. Outputs bf code which prints that string
        # when run with a bf interpreter
        for c in instring:
            self.bf_code += self.to_bf(c)
        # add a newline 
        self.bf_code += self.to_bf(chr(10))
        if self.format_bf:
            self.bf_code = self._format_bf_code(self.bf_code) 
        return self.bf_code

    def _format_bf_code(self,bf_code):
        # Format the bf source code to 70 chars / line
        outstr = ''
        for i,c in enumerate(bf_code):
            if i % 70 == 0 and i > 0:
                outstr += '%s\n' % c
            else:
                outstr += c
        return outstr

        
    def to_bf(self,c):
        # accept a character c, generate the bf code to print that
        # character.

        # increment / decrement the data pointer
        if c < '@':
            ptr_target = 1
        elif c >= '@' and c < 'P':
            ptr_target = 2
        elif c >= 'P' and c < 'd':
            ptr_target = 3
        elif c >= 'd' and c <'n':
            ptr_target = 4
        else:
            ptr_target = 5
        ptr_inc_str = self.increment_ptr(ptr_target)
        # Now increment / decrement the value which the pointer points
        ascii_target = ord(c)
        ascii_val = self.ptrs[self.ptr_idx]
        inc_val, inc_val_str = self.increment_val(ascii_val, ascii_target)
        self.ptrs[self.ptr_idx] += inc_val
        return ptr_inc_str+inc_val_str+'.'

    def increment_val(self,ascii_val, ascii_target):
        inc_val = ascii_target - ascii_val
        if inc_val < 0:
            inc_val_str = '-'*abs(inc_val)
        elif inc_val > 0:
            inc_val_str = '+'*abs(inc_val)
        else:
            inc_val_str = ''
        return inc_val, inc_val_str
    

    def increment_ptr(self,ptr_target):
        ptr_inc = ptr_target - self.ptr_idx
        if ptr_inc < 0:
            ptr_str = '<'*abs(ptr_inc)
        elif ptr_inc > 0:
            ptr_str = '>'*ptr_inc
        else:
            ptr_str = ''
        self.ptr_idx += ptr_inc
        return ptr_str

if __name__ == '__main__':
    my_bf = bf()
    if sys.argv[1] == '-f':
        intext = open(sys.argv[2]).read()
    else: 
        intext = sys.argv[1]
    o = my_bf.string_bf(intext)
    sys.stdout.write("%s\n" % o)
    

To run:

chmod +x bf_string.py
./bf_string.py "Brainfork is awesome!" > mycode.bf # generate Brainfuck code into mycode.bf
bf mycode.bf # The brainfuck interpreter bf
Brainfork is awesome!

And mycode.bf will contain:

++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>++++++.>
>>++++.<<+++++++++++++++++.>+++++.>----.<---.>+.+++.<+++++.<<<++.>>>--
.>+.<<<<.>>.>>++++.<----.>----.----.<++++++++.--------.<<<+.----------
-------------.

You can also run it with the -f option, where the input string will be read from a file.
 

UPDATE: and here is a brainfuck interpreter, written in Python.
UPDATE II: Following Vincent's comment, here is a fixed version of the interpreter. This time it should work with nested loops. Thanks Vincent.

#!/usr/bin/env python
import sys


class BfInterpreter:
    def __init__(self, inpath):
        self.iptr = 0       # data pointer
        self.cells = [0]    # the tape, grown on demand
        self.cmdptr = 0     # position in the bf code
        with open(inpath) as infile:
            self.bfcode = infile.read()
        self.jumps = self.match_loops()

    def match_loops(self):
        # Map the position of every [ to its matching ] and vice versa,
        # so that nested loops are resolved before the code is run
        jumps = {}
        stack = []
        for pos, cmd in enumerate(self.bfcode):
            if cmd == '[':
                stack.append(pos)
            elif cmd == ']':
                if not stack:
                    raise ValueError("no startloop character found")
                start = stack.pop()
                jumps[start] = pos
                jumps[pos] = start
        if stack:
            raise ValueError("no endloop character found")
        return jumps

    def inc_ptr(self):
        self.iptr += 1
        if self.iptr > len(self.cells) - 1:
            self.cells.append(0)

    def dec_ptr(self):
        self.iptr -= 1
        if self.iptr <= -1:
            raise ValueError("negative pointer")

    def inc_cell(self):
        self.cells[self.iptr] += 1

    def dec_cell(self):
        self.cells[self.iptr] -= 1

    def start_loop(self):
        # Jump forward past the matching ] if the current cell is zero
        if self.cells[self.iptr] == 0:
            self.cmdptr = self.jumps[self.cmdptr]

    def end_loop(self):
        # Jump backward to the matching [ unless the current cell is zero
        if self.cells[self.iptr] != 0:
            self.cmdptr = self.jumps[self.cmdptr]

    def putc(self):
        sys.stdout.write(chr(self.cells[self.iptr]))

    def getc(self):
        # Store 0 at end of input instead of crashing
        char = sys.stdin.read(1)
        self.cells[self.iptr] = ord(char) if char else 0

    def run_bf(self):
        self.cmdptr = -1
        while True:
            self.cmdptr += 1
            if self.cmdptr >= len(self.bfcode):
                break
            cmd = self.bfcode[self.cmdptr]
            if cmd == '>':
                self.inc_ptr()
            elif cmd == '<':
                self.dec_ptr()
            elif cmd == '+':
                self.inc_cell()
            elif cmd == '-':
                self.dec_cell()
            elif cmd == '[':
                self.start_loop()
            elif cmd == ']':
                self.end_loop()
            elif cmd == '.':
                self.putc()
            elif cmd == ',':
                self.getc()


if __name__ == '__main__':
    bf = BfInterpreter(sys.argv[1])
    bf.run_bf()

To run, download the file, name it (say, pybf) and then:

chmod +x pybf
./pybf bf_source_code_file.bf

Music Monday: Whole Lotta Love

This excellent cover of “Whole Lotta Love” went viral last week. Michael Winslow of Police Academy fame gives his interpretation of the Led Zeppelin classic:

And if that gave you a taste for the original, go here.


Rumors of The Scientist’s Demise Have Been Greatly Exaggerated

The Scientist is one of my favorite go-to destinations for keeping up with current biomedical research. That’s why I was rather sad when it was recently announced that The Scientist would be closing down. However, it seems that The Scientist will continue to be published after all:

 

NEW YORK, NY–(Marketwire – Oct 14, 2011) – Sciencenow Inc., a member of The Science Navigation Group, and LabX Media Group are pleased to announce that they have signed a nonbinding Letter of Intent specifying terms for the acquisition of The Scientist by LabX Media Group. The parties hope to close a transaction by the end of October.

Sande Giaccone, Sales and Marketing Director of The Scientist, said, “We are delighted that, following the decision to cease publication of The Scientist, LabX Media Group has stepped in to save The Scientist and keep the majority of its existing team together. We hope to return to our normal high service level for all our readers, contributors, and advertisers in the next few weeks. We sincerely appreciate the support of our advertisers in the past and hope to regain their confidence going forward. The Scientist is complementary to LabX Media’s existing stable of products and, subject to closing of a transaction, we look forward to working with them in the future.”

Mary Beth Aberlin, Editor in Chief of The Scientist, said, “Naturally, we were all saddened by the decision to cease publication of The Scientist, and grateful to our readers and contributors for all their kind words concerning the magazine. The editorial team and I are delighted that LabX Media Group has been able to agree on terms with Sciencenow, Inc. with such dispatch. Our dedicated editorial team will remain intact and continue to produce a magazine that maintains our editorial standards.”

Bob Kafato, President of LabX Media Group, said, “The quality life science content that The Scientist produces is second to none and we are happy to be adding this to our portfolio of media products for lab professionals.”

Kudos to LabX. You have rescued a fine journal. Thanks to Linda Kosta for calling my attention to this fortunate turn of events.


Not dead yet

 

The Dead Body That Claims It Isn’t: I’m not dead.
The Dead Collector: What?
Large Man with Dead Body: Nothing. There’s your ninepence.
The Dead Body That Claims It Isn’t: I’m not dead.
The Dead Collector: ‘Ere, he says he’s not dead.
Large Man with Dead Body: Yes he is.
The Dead Body That Claims It Isn’t: I’m not.
The Dead Collector: He isn’t.
Large Man with Dead Body: Well, he will be soon, he’s very ill.
The Dead Body That Claims It Isn’t: I’m getting better.
Large Man with Dead Body: No you’re not, you’ll be stone dead in a moment.

Monty Python and the Holy Grail

So it goes between John Cleese, Eric Idle and John Young.

But there is a sea, or rather lake, that is not dead yet either, feels happy, and is going for a walk on Thursday. The Dead Sea, located in the Judean Desert, roughly divided between Jordan and Israel, has a salinity of 33%, more than nine times that of seawater (3.5%). No animals or plants can survive these conditions, and the lake does look dead to the casual observer. But in the late 1930s Benjamin Elazari Volcani discovered that the Dead Sea does, in fact, support several types of microorganisms. A bit of history: Volcani’s research into microbial life in the Dead Sea led to him being awarded the first doctoral degree in microbiology by the Hebrew University of Jerusalem, in 1943. In 1975, Mullakhanbhai and Larsen named Halobacterium volcanii, a halophilic isolate, after B.E. Volcani. It is now known as the archaeon, not bacterium, Haloferax volcanii.

At 33% salinity we all float. When you're down here with us, you'll float too.

Several other archaeal isolates were found, showing that the Dead Sea is not quite dead yet. In fact, the lake sports what might be an underreported biodiversity. In addition to archaea it also has fungi, bacteria, protozoa and mermaids. OK, maybe not mermaids.

Now the Dead Sea has been found to be more alive than ever. A group of Israeli and German divers has found freshwater springs deep in the Dead Sea. The springs are about 30 m deep, and lie in large craters 30 m in diameter. Look at the video below, taken by the divers. Between 1:54 and 2:10 you can see the freshwater mixing with the saltwater. The stark differences in salinity make for a surreal underwater smoke effect. And, the real kicker: at 2:26 you can see a thick microbial mat, like gunk all over the rocks near the spring.

It would be very interesting to find out who, exactly, comprises this mat. As far as I know, this analysis has not been published yet. But the initial results are reported in National Geographic:

Preliminary analyses of samples collected in the craters suggest that the springs’ bacterial communities are very diverse—akin to what you’d find living on rocks in a regular saltwater sea, he added.

The top of the springs’ rocks are covered with green biofilms, which use both sunlight and sulfide—naturally occurring chemicals from the springs—to survive. Exclusively sulfide-eating bacteria coat the bottoms of the rocks in a white biofilm.

Not only have the organisms evolved in such a harsh environment, Ionescu speculates that the bacteria can somehow cope with sudden fluxes in fresh water and saltwater that naturally occur as water currents shift around the springs.

 

All I can say is: wow. Microbial mats in the Dead Sea, which we are only finding out about now. The Dead Sea thriving with whole carpets of life. Who’d’ve thunk?


ELAZARI-VOLCANI, B. (1943). Bacteria in the Bottom Sediments of the Dead Sea Nature, 152 (3853), 274-275 DOI: 10.1038/152274c0

Mullakhanbhai, M., & Larsen, H. (1975). Halobacterium volcanii spec. nov., a Dead Sea halobacterium with a moderate salt requirement Archives of Microbiology, 104 (1), 207-214 DOI: 10.1007/BF00447326

Buchalo, A., Nevo, E., Wasser, S., Oren, A., & Molitoris, H. (1998). Fungal life in the extremely hypersaline water of the Dead Sea: first records Proceedings of the Royal Society B: Biological Sciences, 265 (1404), 1461-1465 DOI: 10.1098/rspb.1998.0458


The Friedberg Lab is Recruiting Graduate Students

 

The Friedberg Lab is recruiting graduate students, for both Master’s and Ph.D.

WE ARE: A dynamic young lab interested in gene, gene cluster and genome evolution, understanding microbial communities and microbe-host interactions by metagenomic analyses, developing algorithms for understanding gene cluster evolution, and prediction of protein function from protein sequence and structure.

YOU ARE: an independent, hard-working, problem-solving, energetic and motivated scientist-to-be. You have graduated or are about to graduate in computer science and/or biology or related fields. The Friedberg Lab is a “dry” lab, so some programming skills are required (Python preferred).

Existing and planned projects include:

1. Computational protein function prediction and assessment of function prediction algorithms. The Friedberg Lab is among the leaders of the Critical Assessment of Function Annotations (CAFA), an international effort of dozens of research groups to assess and improve function prediction algorithms. We are looking for students that are excited about prediction of protein function from sequence and structure. Also, how well can we assess how well our algorithms are doing? The next CAFA meeting will take place in Berlin, July 2013 and the Friedberg Lab will play a central role in answering these questions.

2. Metagenomics: we are studying the interaction between the microbiome and the host using metagenomic and metatranscriptomic data. We are looking at how the human microbiome affects gene expression in the host. Together with Robb Chapkin’s lab at Texas A&M we are analyzing microbial genomes and their effect on transcription in the human gut. We are also developing algorithms for context-based function prediction in metagenomic data. Simply put: how well can we predict the function of a gene from its neighbors? Since many of the genes in metagenomic data have no known homologs, we are developing creative ways to computationally discover their function.

3. Microbial Evolution: we are researching the evolution of Mycoplasma, a bacterial genus which serves us as a model clade for understanding genome evolution. Mycoplasma have the smallest genomes of any organism, and being parasitic evolve quickly. Together with the Balish Lab we expect to sequence several new species and strains in the next year, and we are developing computational methods and a central community database for analyzing the Mycoplasma tree of life. Besides the biological aspect, this project is also a great opportunity to get into web programming and database design, and to learn how to design and code community-based scientific software.

4. Biopython: Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics. If you would like to become a Biopython developer, part of an international community of open-source scientific software developers, the Friedberg Lab is the place for you. This option is especially attractive for Master’s students seeking to enter bioinformatics in Industry.

5. Insert your brilliant idea here! I love new projects!

The lab is equipped with its own 10-node cluster computer and several workstations, and has access to Miami University’s Supercomputing Center and the Ohio Supercomputer Center at Ohio State University. Students have an excellent research environment, and many opportunities to collaborate with labs on and off campus.

Students can apply to the Friedberg Lab via the following graduate programs at Miami University:

1. Microbiology (Master’s and PhD).

2. Cell, Molecular and Structural Biology (PhD only).

3. Computer Science (Master’s only).

You are welcome and encouraged to inquire further. I love talking with prospective students. If you would like to set up a phone/Skype chat, please send your CV to:

friedberg.lab.jobs ‘at’ gmail ‘dot’ com

Looking forward to hearing from you.

 

Iddo Friedberg, PhD

Assistant Professor, Microbiology and Computer Science (affiliate)

Miami University

Oxford, OH, USA


Funny Science Friday: The IgNobels, Wall Street Journal

The IgNobel prizes were awarded this week. Yes, the Nobel prizes too, but the IgNobels are the really interesting ones. (For a thoughtful piece about why the Nobel Prizes in the sciences do not enhance or may even hurt scientific recognition, read Carl Zimmer’s piece at The Loom.)

The IgNobel prizes are awarded annually for research that “makes you laugh, and then makes you think”. Actually, I liked their previous motto better: “research that cannot or should not be reproduced”. But like the Nobel prizes, IgNobels are not awarded only for science. For example, the mayor of Vilnius received the IgNobel Peace Prize for fulfilling every urban driver’s dream and running over an illegally parked car with a BTR-60 (an armored personnel carrier mistakenly identified as a “tank” by the IgNobel prize awarders, but what do these Harvard peaceniks know about military stuff?).


The Physiology Prize was given to Anna Wilkinson (of the UK), Natalie Sebanz (of THE NETHERLANDS, HUNGARY, and AUSTRIA), Isabella Mandl (of AUSTRIA) and Ludwig Huber (of AUSTRIA) for their study “No Evidence of Contagious Yawning in the Red-Footed Tortoise”.
REFERENCE: “No Evidence of Contagious Yawning in the Red-Footed Tortoise Geochelone carbonaria,” Anna Wilkinson, Natalie Sebanz, Isabella Mandl, Ludwig Huber, Current Zoology, vol. 57, no. 4, 2011, pp. 477-84.

The prize I like best was the Medicine Prize, awarded to Mirjam Tuk (of THE NETHERLANDS and the UK), Debra Trampe (of THE NETHERLANDS) and Luk Warlop (of BELGIUM), and jointly to Matthew Lewis, Peter Snyder and Robert Feldman (of the USA), Robert Pietrzak, David Darby, and Paul Maruff (of AUSTRALIA) for demonstrating that people make better decisions about some kinds of things, but worse decisions about other kinds of things, when they have a strong urge to urinate.

In other entertaining news, for want of a better term, the Wall Street Journal came out with an op-ed which made quite a few heads explode. Basically, using a rather heavy-handed non sequitur, the author tried to unravel climate science:

The science [global warming] is not settled, not by a long shot. Last month, scientists at CERN, the prestigious high-energy physics lab in Switzerland, reported that neutrinos might—repeat, might—travel faster than the speed of light. If serious scientists can question Einstein’s theory of relativity, then there must be room for debate about the workings and complexities of the Earth’s atmosphere

For a full dissection of this weirdness, please go to Phil Plait’s response in Bad Astronomy.

 

For one, there is always room for questioning science. But that questioning must be done by science, using a scientific basis, and above all else be done above board and honestly. But that’s not how much of the climate science denial has been done. From witch hunts to the climategate manufactrovery, much of the attack on climate science has not been on the science itself, but on the people trying to study it. And when many of those attacks have at least a veneer of science, it’s found they are not showing us all the data, or are inconclusive but still getting spun as conclusive by climate change deniers. And if you point that out, the political attacks begin again (read the comments in that last link).

Second, the neutrino story has nothing to do with climate change at all. It’s a total 100% non sequitur, a don’t-look-behind-the-curtain tactic. Just because one aspect of science can be questioned — and I’m not even saying that, which I’ll get to in a sec — doesn’t mean anything about another field of science. Bryce might as well question the idea that gravity is holding us to the Earth’s surface.

 

 

The whole thing generated the delightful hashtag #WSJScience. Read the tweets before they expire.


Probably a good time to talk about pancreatic cancer

 

Percent of pancreatic cancer patients diagnosed when the disease is still localized: 8

Their 5-year relative survival: 21.5%

Percent of pancreatic cancer patients diagnosed when the disease has metastasized: 53

Their 5-year relative survival: 1.8%

Mean survival time after diagnosis: < 1 year

Ranking in cause of cancer-related deaths worldwide: 4

Number of effective screening procedures for early detection of pancreatic cancer: 0

Number of people who will be diagnosed with pancreatic cancer during their lifetime: 1 in 71.

Estimated number of people in the US who were diagnosed with pancreatic cancer in 2010: 43,140

Estimated number of people in the US who died from pancreatic cancer in 2010: 36,800

Median age of death from pancreatic cancer in the US, 2003-2007: 73

Number of National Institutes of Health funded grants having the words “pancreatic AND (adenocarcinoma OR carcinoma OR cancer)” in their abstracts: 477

Estimated annual funding for these grants: $34,897,600

Estimated cost of Kim Kardashian’s wedding: >$18,000,000

Estimated total annual health care costs in the US for treating pancreatic cancer: $2,600,000,000

Average cost of treating a pancreatic cancer patient in the US: $87,784

Average cost of treating a pancreatic cancer patient in Sweden: $21,899

Total annual costs of pancreatic cancer to the US economy, including lost work days, lost wages: $4,900,000,000

Percent of pancreatic cancer patients diagnosed with major depression: 76

 

 

 

People near me who I loved and died of pancreatic cancer: 1*

 

 


Sources:

http://seer.cancer.gov/statfacts/html/pancreas.html

http://www.cancer.org/Cancer/PancreaticCancer/DetailedGuide/pancreatic-cancer-detection

http://projectreporter.nih.gov/reporter.cfm

http://www.wikeez.com/en/people/pictures-kim-kardashian%E2%80%99s-extravagant-wedding-12871

http://www.medscape.com/viewarticle/409001_2

http://www.asco.org/ascov2/Meetings/Abstracts?&vmview=abst_detail_view&confID=40&abstractID=33928

http://www.ncbi.nlm.nih.gov/pubmed/21850604

http://www.medscape.com/viewarticle/468138_6

 


 
* Not S.J.
 

 


Using phylogenetics to reconstruct a 59 million year old drug

Good news:

Press Release
2011-10-03
The Nobel Assembly at Karolinska Institutet has today decided that
The Nobel Prize in Physiology or Medicine 2011
shall be divided, with one half jointly to
Bruce A. Beutler and Jules A. Hoffmann
for their discoveries concerning the activation of innate immunity
and the other half to
Ralph M. Steinman
for his discovery of the dendritic cell and its role in adaptive immunity

 

(Unfortunately, Steinman died between the committee’s decision and the announcement. He still received the Prize, though.)

However, it is not news (and not good) that we are losing the arms race against bacteria. We are overusing antibiotics in medicine and in agriculture, virtually nurturing today’s and tomorrow’s killers. A report in Wired earlier this year paints a bleak picture:

Truly new antibiotics are critically needed because bacteria, having no experience of them, cannot immediately mount resistance to them — something that does happen with me-too compounds featuring some slight molecular change. But they’re rare. As this chart from the research group Extending the Cure shows, antibiotic development has slowed dramatically over the past 30 years, and among the few drugs being brought forth, most share the mechanisms of already-existing classes.

To understand the extent of the crisis, we have to remember that antibiotics are the foundation of a huge number of medical procedures, from cancer treatment to dentistry. Taking away this foundation would cripple modern medicine. Together with vaccines and public hygiene, antibiotics are the reason that many of us live longer, and better, than our grandparents’ generation and those before it.

So lacking drug company motivation (more on that in McKenna’s report) and facing a dwindling supply of effective antibiotics, what are we to do?

A good start is to take a second look at nature. For example, here is a kangaroo being born:

Awww…

The interesting thing about kangaroos and other marsupials is that their young are exposed to the outside world at a very precarious stage. Roo is born in a fetal state: underdeveloped, blind, barely moving and with little to no body temperature regulation. All of these problems are taken care of at the end of Roo’s six-minute trip to Kanga’s pouch, where things are warm and cozy, and maternal milk flows in abundance. All except one: pathogens. During his trip, Roo can acquire a whole bunch of nasty bugs from Kanga’s fur and the air. Also, the pouch is not exactly a sterile environment and there is a possibility of infection there. So how is little Roo to survive the trip to the pouch and subsequent stay?

Joey in pouch. Photograph by Geoff Shaw (Zoology, University of Melbourne, Australia). From Wikimedia Commons.

The answer is innate immunity, that collection of mechanisms which protect the host in a non-specific manner. While his adaptive immunity is quite undeveloped, Roo does produce some killer all-purpose peptides he can use against microbes. The same class of peptides is found in Kanga’s milk. Collectively they are known as cathelicidins. Only about 30 amino acids long, these highly charged molecules kill both gram-positive and gram-negative bacteria.

 

Since the tammar wallaby genome has recently been sequenced, a group of Australian scientists decided to hunt for cathelicidins in the tammar’s genome. They also looked for cathelicidins in the genomes of the duck-billed platypus (a monotreme), the opossum (an American marsupial), and human, mouse, cow & sheep (all placentals). Here is what they found. Each leaf on the tree represents a cathelicidin gene. The leaf colors and shapes are for different species of origin.

From Wang J, et al. 2011 PLoS ONE 6(8): e24030. doi:10.1371/journal.pone.0024030 Reproduced under CC license.

The marsupials and monotremes have a much higher diversity of cathelicidin genes than the placental mammals. Note that there are many duplicates of the cathelicidin gene in pig and cow, but these duplications are very recent, as we can tell from the highly similar sequences of the genes. These duplications probably arose because herd animals are more susceptible to pathogens. In any case, pig and cow have a higher copy number of cathelicidin genes, and thus perhaps a larger number of proteins, but not as many different ones as marsupials and monotremes. Diverse cathelicidin genes in the genome can offer a wider protective umbrella against many different types of pathogens.
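To make the "recent duplicates look alike" argument a bit more concrete, here is a minimal Python sketch that compares the mean pairwise identity within two gene families. Everything in it is an assumption for illustration: the sequences are made-up toy strings (not real cathelicidin genes), the names `RECENT_COPIES` and `OLD_COPIES` are hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
from itertools import combinations

# Hypothetical, already-aligned toy sequences. These are NOT real cathelicidin genes.
RECENT_COPIES = {"copy1": "ATGGCTAAGC", "copy2": "ATGGCTAAGC", "copy3": "ATGGCTAAGT"}
OLD_COPIES = {"geneX": "ATGGCTAAGC", "geneY": "ATGACTCAGT", "geneZ": "TTGGATAACC"}


def percent_identity(a, b):
    """Percentage of aligned positions at which two sequences are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_pairwise_identity(family):
    """Average percent identity over all pairs of sequences in a family."""
    pairs = list(combinations(family.values(), 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print("recent duplicates:", round(mean_pairwise_identity(RECENT_COPIES), 1))
    print("older, diverged copies:", round(mean_pairwise_identity(OLD_COPIES), 1))
```

A family of near-identical copies scores close to 100%, while a family that has had time to diverge scores much lower. Real analyses would use a proper alignment and an evolutionary model rather than raw identity.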

Back to the antibiotic crisis: can we use marsupial cathelicidins as an antibiotic in humans? Do cathelicidins kill human pathogens? Are they toxic to humans? The authors checked the first two. They used cathelicidins from wallaby and platypus to kill human pathogens: P. aeruginosa, K. pneumoniae and A. baumannii, including antibiotic-resistant strains. Cathelicidins were much more effective than, well, antibiotics against those bacteria. Also, cathelicidins did not kill human red blood cells, which makes them a potential drug. Of course, immune reaction against cathelicidins as a foreign molecule still needs to be checked, among many, many other things, but the whole idea of looking at marsupials is that, as mammals, they may be able to supply us with clues on how to synthesize a cathelicidin to be used as a drug in humans.
The really cool thing about this study is that the authors used phylogenetics to design an ancestral wallaby cathelicidin. They aligned all the wallaby cathelicidin protein sequences, which were conserved in 40 of the 46 amino-acid positions. They used PAML, GASP and Ancescon to reconstruct an ancestral wallaby sequence, estimated to be 59 million years old, marked by an asterisk (*) in the above gene tree. They then synthesized the ancestral WAM (Wallaby Anti Microbial) peptide and used it to kill bacteria. Guess what: the ancestral WAM was even better than the existing WAMs, as an even lower concentration was needed to kill bacteria. They are still working on red blood cell toxicity (yay for PLoS-enabled comments!).
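For readers who want a feel for what "conserved positions" and "ancestral reconstruction" mean in practice, here is a toy Python sketch. It is not the authors' method: the paper used PAML, GASP and Ancescon, whereas this sketch swaps in the much simpler Fitch parsimony pass. The four peptides, the tree shape, and all names (`ALIGNMENT`, `TREE`, `conserved_columns`, `fitch_root_set`, `reconstruct_root`) are hypothetical and only there for illustration.

```python
# Hypothetical toy alignment: four made-up peptides of equal length.
# These are NOT the wallaby cathelicidin sequences from the paper.
ALIGNMENT = {
    "seqA": "KRGLWESLKR",
    "seqB": "KRGFWESLKR",
    "seqC": "KRGLWESLKK",
    "seqD": "KRGLWKSLKK",
}

# Hypothetical rooted binary tree written as nested tuples: ((A, B), (C, D)).
TREE = (("seqA", "seqB"), ("seqC", "seqD"))


def conserved_columns(alignment):
    """Indices of alignment columns where every sequence has the same residue."""
    seqs = list(alignment.values())
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) == 1]


def fitch_root_set(tree, alignment, column):
    """Bottom-up Fitch pass: candidate ancestral residues at the root of `tree`."""
    if isinstance(tree, str):  # a leaf contributes its observed residue
        return {alignment[tree][column]}
    left, right = (fitch_root_set(child, alignment, column) for child in tree)
    shared = left & right
    # Use the intersection when the children agree on something, else the union.
    return shared if shared else left | right


def reconstruct_root(tree, alignment):
    """One parsimonious root sequence; ties are broken alphabetically."""
    length = len(next(iter(alignment.values())))
    return "".join(min(fitch_root_set(tree, alignment, i)) for i in range(length))


if __name__ == "__main__":
    conserved = conserved_columns(ALIGNMENT)
    print(f"{len(conserved)} of 10 positions conserved")
    print("inferred root sequence:", reconstruct_root(TREE, ALIGNMENT))
```

The tools named in the post use likelihood-based or gap-aware models and estimated branch lengths, so their answers can differ from a parsimony guess like this one. The sketch only shows the general idea: conserved columns are easy, and the variable ones are resolved by walking up the tree.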

So maybe the answer to the increase in bacterial resistance to antibiotics lies in Kanga’s pouch. After all, she was very protective of Roo.

EDIT: clarified the part about synthesizing and testing ancestral WAM to kill bacteria.

 


Wang, J., Wong, E., Whitley, J., Li, J., Stringer, J., Short, K., Renfree, M., Belov, K., & Cocks, B. (2011). Ancient Antimicrobial Peptides Kill Antibiotic-Resistant Pathogens: Australian Mammals Provide New Options PLoS ONE, 6 (8) DOI: 10.1371/journal.pone.0024030


The power of science blogging

 

Hats off to Jonathan Eisen for hosting this activity on his blog. (I’ll keep mine on, thank you. It’s raining cats and dogs here right now).

A couple of weeks ago I posted a discussion about two papers that challenged the ortholog conjecture. Briefly, both papers stated that orthologs may not be such great predictors for molecular function. One study from Indiana University has shown that paralogs may be better predictors than orthologs for molecular function. Or, at the very least, paralogs should not be excluded as predictors. This paper has generated quite a bit of interest and controversy. Consequently, Eisen has invited Matthew Hahn, the lead author, to write about “the story behind the story” in Eisen’s well-read blog. The post is a great read, and has generated an animated discussion in the comments area. You do need to clear quite a bit of time to go through both Hahn’s guest post and the comment thread: the topic is a rather complex one, and as explained in the comments thread, one problem is that the ‘ortholog conjecture’ itself seems to be not well-defined.

I kept checking in to Eisen’s blog to read the growing comment thread. It seems that, following this discussion, a special session on the topic may now be in the works for the 2012 annual meeting of the Society for Molecular Biology and Evolution. So great to see such an involved community getting together.

 

 

 


Music Monday: Koop

I found this rather addictive online game, Red. Of course, I only play it when I am, er, compiling. Um, yeah. That’s the ticket.

Compiling

Source: xkcd.com

The thing is, the game also has a great soundtrack, which introduced me to a band called Koop. Their music alternates between eerie ambient and big-band-ish. They may have a big-band sound, but Koop is just two guys, Magnus Zingmark and Oscar Simonsson. Their sound is completely sampled, from thousands of different sources. The result is, well, judge for yourself:


Microbial marketing

An original viral (or rather, fungal and bacterial) marketing campaign for the movie Contagion. Although the film tells the story of a fictional viral outbreak, the marketers of Warner Brothers Canada kept it in the realm of microbiology by teaming up with 25 microbiologists and creating what is probably the first agar-plated billboard, which they placed in a storefront in Toronto. The bacteria and mold were plated and grew in the letters, but it looks like contaminants formed their own random colonies, purposefully creating a rather eerie effect. Thanks to Zack Moss, a student in my lab, for pointing me to this. More details in the Vancouver Sun.
