2009 Nobel Prize in Physiology or Medicine

And the winners are…

Title: 2009 Nobel Prize in Physiology or Medicine
Description: Elizabeth H. Blackburn, Carol W. Greider and Jack W. Szostak for the discovery of 'how chromosomes are protected by telomeres and the enzyme telomerase'

Weekly poll: which category would you add to the Nobel prize?

Yup, it’s those two weeks again, when that prize is being announced. Sadly, BsB probably will not get it this year. Might have something to do with there being no category for blogging.

Credit: wikimedia commons

Prestige and controversy go hand in hand; mix in science and you have a concoction more explosive than the one Mr. Nobel himself invented. Who won, who didn’t win, and which achievement was never recognized. This week’s BsB poll asks: which category would you add to the Nobel prize? Feel free to mark “other” and add your own in the comments.

Credit: wikimedia commons

By the way, Alfred Nobel’s wife did not sleep with a mathematician. That is not the reason why there is no Nobel prize in math. Get real.

Also, it might be a good idea to remember the spirit in which Nobel wanted the prize to be awarded:

The capital shall be invested by my executors in safe securities and shall constitute a fund, the interest on which shall be annually distributed in the form of prizes to those who, during the preceding year, shall have conferred the greatest benefit on mankind. The said interest shall be divided into five equal parts, which shall be apportioned as follows: one part to the person who shall have made the most important discovery or invention within the field of physics; one part to the person who shall have made the most important chemical discovery or improvement; one part to the person who shall have made the most important discovery within the domain of physiology or medicine; one part to the person who shall have produced in the field of literature the most outstanding work of an idealistic tendency; and one part to the person who shall have done the most or the best work for fraternity among nations, for the abolition or reduction of standing armies and for the holding and promotion of peace congresses.


2009 Ig Nobel Prizes

A bit late in the day, but here are the Ig Nobel prize winners for 2009. Cut and pasted from Wikipedia. The prizes were awarded Thursday, Oct. 1, 2009 at Sanders Theater, Harvard University, Cambridge, Massachusetts.

At the 2009 ceremony, Public Health Prize winner Dr. Elena Bodnar demonstrates her invention — a brassiere that, in an emergency, can be quickly converted into a pair of face masks, one for the brassiere wearer and one to be given to some needy bystander. She is assisted by Nobel laureates Wolfgang Ketterle (left), Orhan Pamuk, and Paul Krugman (right). PHOTO: Alexey Eliseev. Source: improbable.com

2009

  • Veterinary medicine: Catherine Douglas and Peter Rowlinson of Newcastle University, UK, for showing that cows with names give more milk than cows that are nameless.
  • Peace: Stephan Bolliger, Steffen Ross, Lars Oesterhelweg, Michael Thali and Beat Kneubuehl of the University of Bern, Switzerland, for determining whether it is better to be smashed over the head with a full bottle of beer or with an empty bottle.
  • Biology: Fumiaki Taguchi, Song Guofu and Zhang Guanglei of Kitasato University Graduate School of Medical Sciences in Sagamihara, Japan, for demonstrating that kitchen refuse can be reduced more than 90% in mass by using bacteria extracted from the feces of giant pandas.
  • Medicine: Donald L. Unger of Thousand Oaks, California, US, for investigating a possible cause of arthritis of the fingers, by diligently cracking the knuckles of his left hand but not his right hand every day for more than 60 years.
  • Economics: The directors, executives, and auditors of four Icelandic banks — Kaupthing Bank, Landsbanki, Glitnir Bank, and Central Bank of Iceland — for demonstrating that tiny banks can be rapidly transformed into huge banks, and vice versa (and for demonstrating that similar things can be done to an entire national economy).
  • Physics: Katherine K. Whitcome of the University of Cincinnati, Daniel E Lieberman of Harvard University and Liza J. Shapiro of the University of Texas, all in the US, for analytically determining why pregnant women do not tip over.
  • Chemistry: Javier Morales, Miguel Apatiga and Victor M. Castano of Universidad Nacional Autonoma in Mexico, for creating diamond film from tequila.
  • Literature: Ireland’s police service for writing and presenting more than 50 traffic tickets to the most frequent driving offender in the country – Prawo Jazdy – whose name in Polish means “Driving Licence”.
  • Public Health: Elena N. Bodnar, Raphael C. Lee, and Sandra Marijan of Chicago, US, for inventing a bra that can be quickly converted into a pair of gas masks – one for the wearer and one to be given to a needy bystander.
  • Mathematics: Gideon Gono, governor of Zimbabwe’s Reserve Bank, for giving people a simple, everyday way to cope with a wide range of numbers by having his bank print notes with denominations ranging from one cent to one hundred trillion dollars.

Richard Dawkins and Francis Collins on Colbert Nation

Stephen Colbert had an interesting lineup for the past two nights: Richard Dawkins on Sep 30, and Francis Collins last night. Enjoy the vids:

Richard Dawkins on The Colbert Report (www.colbertnation.com)
Francis Collins on The Colbert Report (www.colbertnation.com)

It ain’t necessarily so

First, a short glossary.

Homologous genes are descended from a common ancestral gene.

There are two types of homology:

  • Orthology is homology due to a speciation event. So if there is a gene A’ in humans and A” in mice, and they are obviously similar in sequence, we infer that they are homologous. We usually also infer that they are orthologous, as the common gene ancestor A existed in the common ancestor of humans and mice, roughly 75 to 90 million years ago. Once the ancestral lines diverged, the genes carried over into the respective progeny.
  • Paralogy is homology due to a duplication event. A gene has been duplicated in a species genome, and the genome now has two copies of this gene in place of one.

Orthology, Paralogy and Function

It has been proposed that paralogous genes would generally have different functions. The rationale is that after an in-species duplication, two copies of the same gene are redundant. One copy maintains its function, while the other is “free to explore” other functions. The flipside of this hypothesis is that orthologs maintain functional similarity, because the progeny species inheriting the orthologous genes need to maintain their function.

Formation of orthologs and paralogs. The evolutionary tree shows six homologous genes from three species designated A, B and C. Genes are represented by circles and each color represents a different species; genes with paralogs are circled by a thicker line (only the gene in the A lineage does not have a paralog). Boxes at nodes represent duplication events. Duplication 1 produced paralogs α and β in the ancestor of B and C, whereas duplication 2 produced paralogs β1 and β2 in the C lineage. All genes from B and C are co-orthologs to the gene from A. Genes α and β are in-paralogs relative to speciation 1, but are out-paralogs relative to speciation 2. Genes β1 and β2 are in-paralogs relative to both speciations in the tree. Genes Bα and Cα are one-to-one orthologs. From doi:10.1016/j.tig.2009.03.004
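
For readers who think better in code, here is a minimal Python sketch of the rule the glossary and the figure caption describe: two genes are orthologs if their last common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node. The tree is rebuilt by hand from the caption, and the class, function and gene names (Node, relationship, B_alpha and so on) are made up for this illustration; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """A gene-tree node: a leaf gene, or an internal evolutionary event."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaf genes
    children: Tuple["Node", ...] = ()


def leaf_names(node):
    """Collect the names of all genes (leaves) below a node."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def last_common_ancestor(node, a, b):
    """Return the deepest node that has both genes a and b below it."""
    for child in node.children:
        if {a, b} <= leaf_names(child):
            return last_common_ancestor(child, a, b)
    return node


def relationship(tree, a, b):
    """Classify two genes by the event at their last common ancestor."""
    if a == b or not {a, b} <= leaf_names(tree):
        raise ValueError("need two different genes from the tree")
    lca = last_common_ancestor(tree, a, b)
    return "orthologs" if lca.event == "speciation" else "paralogs"


# The tree from the caption, as I read it (names are illustrative only).
gene_tree = Node("speciation 1", "speciation", (
    Node("A"),
    Node("duplication 1", "duplication", (
        Node("speciation 2 (alpha)", "speciation",
             (Node("B_alpha"), Node("C_alpha"))),
        Node("speciation 2 (beta)", "speciation", (
            Node("B_beta"),
            Node("duplication 2", "duplication",
                 (Node("C_beta1"), Node("C_beta2"))),
        )),
    )),
))

print(relationship(gene_tree, "B_alpha", "C_alpha"))  # one-to-one orthologs
print(relationship(gene_tree, "B_alpha", "C_beta1"))  # split by duplication 1
```

Real orthology inference has to estimate the tree and its events from sequence data first, which is where most of the difficulty lies; this sketch assumes the labeled tree is already in hand.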

Functional innovation through duplication has been hailed as a major driving force in evolution. After all, it is hard to accept the Darwinian tenet that random changes, even if directionally selected, can constantly produce innovative complexities. A duplicate gene provides an already existing complexity. Imagine many such duplications, and you can see how duplicate genes provide a genomic “functional opportunity bank” for the biosphere.

Only, maybe not. Romain A. Studer and Marc Robinson-Rechavi challenge common wisdom by publishing a study that says: “it ain’t necessarily so”. They look at three alternative models of molecular function evolution: (i) subfunctionalization after duplication; (ii) neofunctionalization after duplication; and (iii) the ‘alternative model’ of equal change after duplication or speciation. Subfunctionalization holds that after duplication, each of the two copies of the gene performs only a subset of the functions of the ancestral single copy. Neofunctionalization holds that one of the two genes possesses a new, selectively beneficial function that was absent in the population before the duplication. The ‘alternative model’ states that the gain of new function is not preferential to paralogs and that orthologs may gain new functions at the same rate that paralogs do.

Studer and Robinson-Rechavi claim that few studies have tested the scope of any of these proposed models. They then lay out study designs for doing so, challenging other evolutionary biologists (and themselves?) to conduct these studies and examine whether the common wisdom, that orthologs maintain function while paralogs gain function, actually holds. What I like about this paper is that it not only makes a strong case for challenging conventional wisdom, it also lays out a series of possible routes of study to be taken up by others.

Update: MK pointed out an obvious lacuna in this post:


Bronski Beat – It Ain't Necessarily So

Studer, R., & Robinson-Rechavi, M. (2009). How confident can we be that orthologs are similar, but paralogs differ? Trends in Genetics, 25 (5), 210-216 DOI: 10.1016/j.tig.2009.03.004


Weekly Poll: will you have your own genome sequenced?

CLARIFICATION: the events described here have not happened. Yet.

We are a few years into the future. Whole human genomes can be sequenced relatively cheaply and accurately. Direct to Consumer Genomics companies offer true genomic analyses now, not just marker analyses. They BLAST* your sequence against known genotype & disease databases, looking for known genotypic associations. Furthermore, individuals who are “bioinformatics savvy” can analyze their own genome. We hear of the first life-saving BLAST: a person found an association between one of his SNPs and pancreatic cancer, and managed to undergo a life-saving operation in time. We also hear, tragically, of the first BLAST-related murder: a molecular biologist killed her infant child and herself after she discovered on her own that she and her son are both destined to have Huntington’s chorea. Another, similar suicide took place, but in that second case the person misdiagnosed himself. In a few US states as well as in Italy, the police have successfully subpoenaed DNA sequences from DTC genomics companies. In Singapore, a mandatory database of the genomes of all citizens has been announced.

Credit: Adrian Cousins, Wellcome Images

Worldwide, calls for legislation abound that would limit individuals’ access to their own genomic data. At the same time, a loose coalition of political activists, scientists and journalists advocate a “Genomic Freedom Movement” to legislate a governmental and insurance company “hands off” policy. Finally, insurance companies (not just health), financial companies and employers are all interested in the new field of “genomic personality studies”, or “Tarot card genomics” as those studies are called by their opponents. With the advent of many complete human genomes, there has been an explosion of studies that tie personality traits, life expectancy, lifestyle, earning power, accident-proneness and even sexual prowess to genomic data. These studies, some of questionable quality, are gaining strong public attention. Cosmopolitan has just published “Is He Right for You?: how to Get his Genome and What you can Learn From It”. A whole industry of “compatibility genomics” for couples to be married is flourishing. The Lubavitcher Hassidim are maintaining a “shidduch” genomic database for eligible singles.

The future of genomic data, who can access it and for what reasons seems murky at best. Under those conditions will you have your own genome sequenced? Note that there is no company that will give up that data (you can have your DNA sequence file, but they wish to keep it too, although they promise complete anonymity and privacy).

So will you have your genome sequenced?

——

(*) BLAST is used here as a generic name for any sequence-based database search software. We may have something else that rules the roost 5 years from now.


An Ontology for Biological Similarities

I griped here twice about the abuse of the term homology in biology. And to quote the Bellman in The Hunting of the Snark: “What I tell you three times is true”.

But while I gripe, someone is actually doing something about the whole terminology muddle. Specifically, Marc Robinson-Rechavi and his group at the University of Lausanne have created an ontology for describing the “relation between biological objects which resemble or are related to each other sufficiently to warrant a comparison”.

An ontology is a formal representation of concepts and the relationships between them. It is usually hierarchical, with the terms going from the general to the specific. You may be familiar with the Gene Ontology as a standard representation of the different functions of genes.

Example of the Biological Process ontology in the Gene Ontology

Marc’s group is creating an ontology for describing biological similarities in a hierarchical fashion, going from the general to the specific. At the top they have “similarity”. The four terms under that are “homology”, “homoplasy”, “functional equivalence” and “homocracy”.
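
To make "hierarchical, going from the general to the specific" concrete, here is a small Python sketch of the top of such a hierarchy as a child-to-parent ("is a") table. The four terms under "similarity" are the ones named above. Placing orthology and paralogy under homology follows the glossary from my earlier post and is my assumption, not something checked against the actual HOM ontology file. The function names are made up for this illustration.

```python
# Child term -> parent term ("is a"). Only "similarity" has no parent.
IS_A = {
    "homology": "similarity",
    "homoplasy": "similarity",
    "functional equivalence": "similarity",
    "homocracy": "similarity",
    # Assumption: taken from the blog glossary, not from the HOM ontology itself.
    "orthology": "homology",
    "paralogy": "homology",
}


def ancestors(term, is_a=IS_A):
    """Walk from a term up to the root, returning every more general term."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain


def children(term, is_a=IS_A):
    """Return the terms that sit directly below the given term."""
    return sorted(child for child, parent in is_a.items() if parent == term)


print(ancestors("orthology"))   # from the specific to the general
print(children("similarity"))   # the four top-level terms
```

Real ontologies such as the Gene Ontology allow a term to have several parents, so they are graphs rather than simple trees; a single-parent table keeps the sketch short.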

Homocracy is a term suggested in 2003 by Claus Nielsen and Pedro Martinez for describing organs/structures which are organised through the expression of identical patterning genes. The rationale being that many homologous organs may be homocratic, but some homocratic organs may not be homologous.  Homoplasy means similarity due to convergent evolution, but not due to common ancestry. Fins on a tuna and a dolphin are homoplasic, but not homologous. However, the  fore fins on a dolphin are homologous to our arms, being descended from the forelimbs of the common ancestor of humans and dolphins.

The deepest annotated branch is homology, and going into the whole thing here would be long and arduous. But it is a very well-crafted ontology. You can play around with the HOM ontology to see more of the terms, and also see their annotations at the OBO foundry.

Top terms of the HOM ontology. You can explore more on http://keg.cs.uvic.ca/ncbo/flexviz/FlexoViz.html#

Now, if someone could sort the terminology muddle between the different dialects of the English language…

Peter (watching Cricket on British TV): What the hell is he talking about?
Englishman: Oh, it’s Cricket. Marvelous game, really. You see, the bowler hurls the ball toward the batter who tries to play away a fine leg. He endeavors to score by dashing between the creases, provided the wicket keeper hasn’t whipped his bails off, of course.
Peter: Anybody get that?
Cleveland: The only British idiom I know is that “fag” means “cigarette.”
Peter: Well, someone tell this “cigarette” to shut up.

Source TV Guide courtesy Fox


New: weekly poll

I will try to maintain a weekly poll on BsB, for matters biologick, bioinformatick, generally scientifick or otherick. As in any poll, if you read too much into its questions or answers, you should seriously chill. That being said, comments are most welcome.

The poll is on the sidebar that’a’way.—> (Scroll a bit down if you cannot see). Inspired mostly by Anna Tramontano’s book: The Ten Most Wanted Solutions in Protein Bioinformatics.

Happy polling. And Happy (Hebrew) New Year.


What they really found in Niger

A big buzz over the discovery of a skeleton of an early Sauropod dinosaur in Niger. The finding looks amazing even to my paleontologically-ignorant eyes. It is beautifully intact and well-ordered, as opposed to the mixed jumble of bone fragments that are usually found. It has that lovely aesthetic quality that would cause anyone to go “wow” at such a sight.

Spinophorosaurus nigerensis, holotype skeleton GCP-CV-4229 in situ during excavation in the region of Aderbissinat, Thirozerine Dept., Agadez Region, Republic of Niger. doi:10.1371/journal.pone.0006924.g001

Like I said, I am no paleontologist, but I understand that there is some controversy as to what was really unearthed in Aderbissinat.  The authors’ reconstruction is below. Click on it to see the alternative one:

Skeletal reconstruction of Spinophorosaurus nigerensis. Dimensions are based on GCP-CV-4229/NMB-1699-R, elements that are not represented are shaded. Scale bar = 1 m. doi:10.1371/journal.pone.0006924.g005


Remes, K., Ortega, F., Fierro, I., Joger, U., Kosma, R., Marín Ferrer, J., , ., , ., Ide, O., & Maga, A. (2009). A New Basal Sauropod Dinosaur from the Middle Jurassic of Niger and the Early Evolution of Sauropoda PLoS ONE, 4 (9) DOI: 10.1371/journal.pone.0006924


“Micro homology”. Wut?

I ranted in a previous post about the use of homology as a quantitative term, rather than a qualitative term. Ben Blackburne commented on that post, introducing me to “micro homology”, a term I did not know existed. I ignored its existence until I heard it spoken yesterday at a talk, which sort of rubbed me the wrong way. Going back to my office to chill, I discovered there are 152 papers indexed in PubMed that use that term in their abstract or title. Not a good way to chill… here we go again: misusing “homology” by overselling it.

Apparently microhomology is used to indicate identity between short nucleotide sequences in two non-complementary DNA strands. This identity may facilitate strand annealing in the formation of chromosomal breakpoints, as in the proposed Microhomology-Mediated Break-Induced Replication, or in microhomology-mediated end joining for DNA repair. There should be a term for this phenomenon, but why use “microhomology”? The use of “homology” implies that the short identical sequences originated from a common ancestor. “Micro” would mean a short region from otherwise homologous sequences. This is possibly derived from “homologous recombination”, where, indeed, homologous sequences are involved. But in the microhomology case, it may not be so. Also, even if the identity is between short subsequences of otherwise homologous sequences, “microhomology” is a somewhat confusing term, as it implies a quantitative relationship.

Why not simply use “microidentity” as a drop-in replacement? (Heh: non-homologous replacement).
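
To show how little ancestry has to do with it, here is a minimal Python sketch that finds “microidentities”: short stretches that are identical in two sequences. It is plain string matching and knows nothing about where the sequences came from. The function name and the two example sequences are invented for this illustration.

```python
def shared_kmers(seq_a, seq_b, k):
    """Map each length-k substring found in both sequences to its
    start positions: {kmer: (positions in seq_a, positions in seq_b)}."""
    if k < 1:
        raise ValueError("k must be at least 1")

    def index(seq):
        # Record every start position of every length-k substring.
        positions = {}
        for i in range(len(seq) - k + 1):
            positions.setdefault(seq[i:i + k], []).append(i)
        return positions

    idx_a, idx_b = index(seq_a.upper()), index(seq_b.upper())
    return {kmer: (idx_a[kmer], idx_b[kmer])
            for kmer in idx_a.keys() & idx_b.keys()}


# Two made-up sequences that happen to share the stretch ACCGTA.
hits = shared_kmers("TTGACCGTAGGA", "CCATACCGTATT", k=5)
for kmer in sorted(hits):
    print(kmer, hits[kmer])
```

Whether such a shared stretch reflects common descent or plain chance is a separate question that string matching cannot answer, which is rather the point.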

Of course nothing will change, since I am too late in the game, no one listens to me anyway and I do not see the six readers of this blog rallying to eradicate microhomology.

No I am not bitter. Mild and bitter perhaps, but only after 5 o’clock.


The new natural history

Before the 20th century biology was, to a large extent, “Natural History”. It was an observational science rather than the experimental science it is considered to be today. At that time, the typical biologist, a natural historian, was going about the (European colonized) world, collecting specimens of new and fossilized species, classifying and recording them for posterity. He was armed with small pick-axes and hammers for extracting fossils, ether-laced jars for insect specimens, formaldehyde for preserving tissues, butterfly nets, drawing pads in lieu of cameras, guns and various other paraphernalia. In some cases he (no “she” in science at the time) went into the field carrying his own hypothesis about nature and her laws and looking to prove it. But mostly it was the gathering of knowledge for knowledge’s sake, reported in natural history societies’ meetings and compiled in tomes of encyclopedias. Every marine exploration voyage carried its own naturalists, as did many merchant and naval ships. From the plethora of data, the patterns and laws describing life emerged. The most famous of the theories developed by a naturalist was evolution by natural selection. Darwin went through years of observation and classification while developing his landmark theory.

Tables of natural history, from the 1728 Cyclopaedia. Credit: wikimedia commons.

Towards the end of the 19th century, biology began to be studied on a molecular level. Chemists became interested in the underlying molecular machinery of life, and the field of biochemistry was born. Armed with a keen experimental philosophy of science, and the tools to execute it, they took biology into the lab: decomposing the basic processes of life and the molecules that facilitate them and that they act upon; formulating hypotheses, testing them with controls, and executing rigorous protocols to discover the natural processes of life. Biology also became intertwined with medicine through microbiology and physiology. Again, experimentation came to the forefront, as medicine adopted scientific means to study diseases and develop treatments. Within the realm of larger scales of life science, ecology has in many ways inherited natural history, transformed into a science of understanding the interactions among living things.

And then came molecular biology. With the discovery of the structure of DNA and the genetic code, many of the mechanisms by which life encodes and perpetuates its information have been deciphered. We consequently learned how to manipulate genetic material at the level of a single nucleotide or amino acid, studying the function of genes by studying artificial mutants and expressing genes of one organism in another. Molecular biology has enabled us to study life at a basic level, creating carefully controlled environments not dreamed of by the natural historians whose scientific mandate was to observe and record. Biology has reached an experimental apex. We also learned how to easily and cheaply read, or sequence, genes.

Today we are sequencing whole genomes on a regular basis. The first genome of a free-living organism, H. influenzae, was sequenced in 1995. The human genome followed in 2001. Rat, mouse, nematode, fruit fly, thale cress and other model organisms had their genomic sequences read, with the hope of understanding what they do. The plethora of information brought about the field of bioinformatics: computational biology applied to analyzing informational biomolecules, such as nucleic acids and proteins. Sequencing technologies became exponentially cheaper over the past 50 years. Genomes and metagenomes are now sequenced as a routine matter. A typical prokaryotic microbe can be sequenced for under $1,000. Whole microbial communities can now be “fed into” sequencers, to extract the genomic information not of one, but of thousands of microbes in one sequencing run.

A century after the dusk of natural history, biologists are again going outside to collect knowledge for knowledge’s sake. This time the natural historian is armed with microbial and tissue collection kits and freezers in the field, and sequencing machines and cluster computers in the lab. She (yay progress!) is going to alkaline lakes, mine shaft drains, termite guts and cow rumen to sample the microbial life. Indeed, the soil and water of the Galapagos, whose flora and fauna inspired Darwin, have recently been revisited by another ship. This time, the microbes, not the finches, were studied. Back in the lab, the specimens are not mounted. Rather, their DNA, RNA and proteins are sequenced. Terabytes of information are stored, while methods for better storage, retrieval and analysis of the growing volume of data are developed. Universities and research institutes have sequencing centers, and a whole industry of sequencing service centers is flourishing and growing.

Illumina sequencing machine. Credit: joncallas on Flickr

Biology is descriptive yet again. Biologists are recording data for posterity. They are cataloging. Natural history is back. The fraction of papers reporting new genomes is constantly growing. Indeed, many new genomes are not even reported in papers, but rather deposited directly into genomic databases. Encyclopedias are back in fashion. Only this time around, they are not large volumes of books taking up shelves in the library of the scholar’s home or university office. Rather, they are web-sites, holding more raw data than whole university libraries, unconstrained by the cost of paper and printing labor, and usually free for anyone to explore: the genome databases. Genomic encyclopedias exist for all large genomes, and for the smaller ones that are found interesting. The other genomes that are generated on a daily basis are incorporated into the larger, all-encompassing databases such as GenBank, which contains virtually all the genetic and genomic data that has been made public.

There are differences between the natural historians of yore (is 100 years “yore”?) and the sequencing natural historians of today. Chiefly, the nature of the data has changed: from a purely informatic point of view, we are not dealing with hundreds or even thousands of new animal and plant species. We are dealing with billions of bytes of character data associated with thousands of animal and plant species, and an uncounted number of microbial species. The information is extracted from this data by complex and constantly evolving computational means. The following questions are typically asked: what is the actual genomic sequence? (Sequence assembly). Where are the genes? (Gene finding). What is their function? (Functional annotation). How are they activated? (Computational systems biology on the one hand, and structural biology on the other). What is the genotypic variation in the community and how does this variation contribute to evolution and adaptation of a population? (Metagenomic / metadata analysis). Each one of these questions has spawned a discipline that is concerned with optimizing its ability to extract the relevant information (e.g. gene location, gene function) from the raw genomic data.
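
To give a taste of the “where are the genes?” question, here is a deliberately naive Python sketch of gene finding: it scans the three forward reading frames of a DNA string for stretches that start with ATG and run to the first in-frame stop codon. Real gene finders also scan the reverse strand, use statistical models of coding sequence, and handle introns; none of that is attempted here. The function name, the min_codons parameter and the toy sequence are invented for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end, sequence) tuples for open reading frames
    on the forward strand. end is exclusive; the codon count used for the
    min_codons filter includes the start and stop codons."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs)


# A toy sequence with one short reading frame: ATG AAA TTT TAG.
for start, end, seq in find_orfs("CCATGAAATTTTAGGG"):
    print(start, end, seq)
```

Finding candidate reading frames is the easy part. Deciding which of them are real genes, and what they do, is what the disciplines listed above are for.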

The TCA cycle from the Kyoto Encyclopedia of Genes and Genomes. Click for the full encyclopedia entry.

Welcome back, natural history! Observing and describing the richness and diversity of life has always been part of biology. With genomics and metagenomics, descriptive biology, after being somewhat back stage for so long, is back in the limelight.

Henry Walter Bates from Naturalist on the River Amazon. Credit: wikimedia commons


The Craigslist of Antibiotic Resistance

(Before we get going: this is the 100th post on Byte Size Biology. Happy Birthday to me!)

Resistance to antibiotics is a huge clinical problem. In the US, more people die of methicillin-resistant Staphylococcus aureus (MRSA) infections (nearly 19,000 in 2006) than of AIDS (14,627). We know that antibiotic resistance is carried on mobile genetic elements between bacterial species in lateral gene transfer events. In fact, most of MRSA’s resistance genes are traceable to other species. It is as if MRSA made some great choices when purchasing antibiotic resistance in the bacterial community market, and now it is one tough bug that is heavily armored and very hard to mess with.

How common then are antibiotic resistance genes? Very. But a recent article in Science from George Church’s group at Harvard Medical School shows us just how common and diverse those genes are. They analyzed 572 bacterial strains from the stool and saliva of two healthy persons who had not taken antibiotics for at least one year. They also analyzed metagenomic sequences, obtained directly from the samples without culturing them first. Many of the benign bacteria had antibiotic resistance genes similar to those found in pathogenic (disease causing) bacteria. But the real kicker is that most of the antibiotic resistance genes isolated from the metagenomic samples were evolutionarily distant from the currently known antibiotic resistance genes. The researchers checked the products of those distantly related genes for functionality, and found that despite a rather low similarity to known resistance gene products (50-60% identity in the amino acid sequence, and sometimes as low as 35%) the genes conferred antibiotic resistance when expressed in E. coli, a normally non-resistant bacterial strain. So the diversity of evolutionarily related resistance genes is much larger than we thought. It is like being a child who grew up in a cookie-cutter suburb, who upon his first visit to the city finds out that there are many other types of houses and buildings people live and work in.
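
A quick note on what a figure like “50-60% identity” means in practice. Here is a minimal Python sketch that computes percent identity from two protein sequences that have already been aligned, with "-" marking gaps. It counts only columns where both sequences have a residue; other definitions divide by the full alignment length or by the shorter sequence, and I do not know which one the paper used. The function name and the two short sequences are invented for this illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Two made-up aligned fragments: 6 matches out of 9 gap-free columns.
print(round(percent_identity("MKT-LLVAGA", "MRTALIVSGA"), 1))
```

The alignment itself is the hard part and is assumed to be given here; tools such as BLAST produce it along with the identity figure.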

Church & co. have discovered three interesting things. First, that there is a broad evolutionary spectrum of antibiotic resistance genes. Until now, we have only known of a very biased sample of those genes, namely those that were sequenced in known human pathogens. But there is a huge evolutionary reservoir out there. This reservoir is sitting in normally benign gut bacteria,  and even this work has only scratched the surface of how extensive this reservoir is.

Second, these new resistance genes may not transfer well to human pathogens. The authors conclude this because those genes have not been found in human pathogens. However, absence of proof is not proof of absence: the new, low-similarity genes may still exist in serovars (specific bacterial strains) that were never sequenced, so we do not know for sure that they are absent from pathogens. But having not found them before in pathogens, and having found them now in non-pathogens, strongly suggests there is some sort of barrier that limits transfer between certain species, and indeed from non-pathogens to pathogens. If there is such a barrier, we still don’t know what it is.

Third, among the new resistance genes, some had very low sequence similarity (as low as 35% protein sequence identity). Usually at such low identity percentages, the functions of the proteins are different. But in this case, they found a high conservation of function. This is interesting, since it shows that the resistance proteins evolve widely, yet their functionality is robustly conserved.

Methicillin Resistant Staph aureus. Credit: Janice Haney Carr, Centers for Disease Control and Prevention. http://commons.wikimedia.org/wiki/File:CDC-10046-MRSA.jpg

This is not the first time this group has made a surprising discovery about antibiotic resistance. Last year they showed that many soil bacteria are not only resistant to antibiotics, but actually eat them for lunch: antibiotics as a food source.

In their most recent article, Sommer, Dantas and Church have hit upon the craigslist for antibiotic resistance used by bacteria living in the human body. The basic genomic material needed for antibiotic resistance is readily available in your local bacterial community. Resistance genes are everywhere, and it is clear that, despite transmission barriers between species, they are transmitted. Bacteria have a large pool from which to draw new resistance genes. The arms race between drug developers and bacteria just got that much tougher.

One small gripe: this article uses “low homology” and “high homology”. Arrrgghh….

Finally, speaking of craigslist


Sommer, M., Dantas, G., & Church, G. (2009). Functional Characterization of the Antibiotic Resistance Reservoir in the Human Microflora. Science, 325 (5944), 1128-1131. DOI: 10.1126/science.1176950

Dantas, G., Sommer, M., Oluwasegun, R., & Church, G. (2008). Bacteria Subsisting on Antibiotics. Science, 320 (5872), 100-103. DOI: 10.1126/science.1155157

A FLORA of Protein Structure to Protein Function

Proteins are the machinery of life, and they facilitate most of life’s functions. Traffic into and out of the cell? Protein pumps, pores and channels. Respiration? Proteins. Anabolism and catabolism? Proteins. Immune system, signaling, development… all complex networks of interacting proteins. Understanding a protein’s structure can tell us a lot about how it performs its function. If we know what a protein does, we can look at its molecular workings, and generally figure out how it does it. Hemoglobin carries oxygen in most animals, something that has been known since 1840. However, it was only when Max Perutz solved its structure that the actual mechanism of oxygen binding and release was elucidated. Since Perutz and John Kendrew solved the first protein structures in the late 1950s, the structures of some 35,000 proteins have been solved.

Animation showing binding and release of oxygen molecules to hemoglobin

When we know both what a protein does and what its structure is, we can learn a lot about how it performs its function. That would be the equivalent of looking at a diagram of a car engine, and then exclaiming: “oh, so that’s how it works!” But the converse does not hold true. If we only have the structure, we may not be able to infer the protein’s function. Imagine having the diagram of a new engine which you have never seen before. It might be a car engine, but which make and model? Or it might not be a car engine at all, but that of a lawnmower, a boat, or an electric generator. The point is, without knowing what the diagram represents, we would only have a general idea that we have a machine that burns some sort of fuel to power something.

We face the same problem with protein structures. It does happen that we solve the structure of a protein whose function is unknown. Oh. Kay. What now? We are stuck with a diagram of a machine whose purpose we do not know. Therefore, any kind of method we can devise to predict a protein’s function from its structure would be very helpful. Christine Orengo’s group at University College London, UK has been tackling this problem for quite a while. Her group has recently published a paper in PLoS Computational Biology where they describe an algorithm that can classify enzymes: a subgroup of proteins that catalyze chemical reactions. The classification algorithm works as follows:

1) They partitioned all enzymes of known function into functional subgroups, or FSGs. Within an FSG, all proteins have the same function. Two proteins from different FSGs will have different functions.

2) Next, they selected a set of conserved vectors from a given domain in a given FSG which, when compared against relatives with different functions (that is, from other FSGs), would produce a low score. Conversely, when proteins from the same FSG are compared, they should have a significantly higher score. The vectors are measurements of distance and direction along the side chains of conserved amino acid residues. They found that this differentiating set of vectors is best obtained when the proteins are aligned within and between FSGs, and the vectors are taken from the conserved residues in the FSG alignments.

Graphical outline of FLORAMake algorithm. doi:10.1371/journal.pcbi.1000485.g002

3) Once they determined which vectors are more conserved within a given functional sub-group (FSG), they created a library of conserved vectors for each FSG, a sort of FSG bar-code. Although the construction is technically unsupervised, limiting the vectors to conserved residues within an FSG naturally lands them with lots of active site residues.

Having created the template library, they can now find vectors on test proteins, and scan those against the library of conserved vectors, using a simple similarity function. Although (or because) their method is quite simple, they achieve very high sensitivity and precision. The methods they compare against are all global structure aligners (such as CE and CATHEDRAL), and by virtue of simply adding spatial information about the conserved / functional residues they greatly improve the function annotation. The great thing about this work is the jump in improvement gained by adding this very simple, yet so far mostly neglected, attribute.

Unfortunately, no software yet. Too bad because…..

funny-pictures-relevant-to-my-interests


Redfern, O., Dessailly, B., Dallman, T., Sillitoe, I., & Orengo, C. (2009). FLORA: A Novel Method to Predict Protein Function from Structure in Diverse Superfamilies. PLoS Computational Biology, 5 (8). DOI: 10.1371/journal.pcbi.1000485

Public service announcement: how to make hummus

Soulico Crew teach us the magical art of making a perfect hummus. Apparently it involves a lot of dancing.

Short bioinformatics hacks pt. 3: more FASTA counting

A few one-liners to kick off the workweek:

This orders a set of FASTA files by the number of sequences each one contains. If anyone knows how to put a tab as the output delimiter, please let us know:

grep -c ">" fasta-files/*.fna | cut --fields=1,2 -d ":" --output-delimiter="  " | sort -k 2 -nr | less
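On the tab question, one possible answer (a sketch, not the definitive way to do it) is to let awk rewrite the file:count pairs with a tab as the output field separator. The example below builds three tiny made-up FASTA files in a temporary directory, so the file names and sequences are invented purely for illustration. It also anchors the pattern to the start of the line (^>), uses plain awk rather than gawk, and assumes that file paths contain no colons.

```shell
# Build three tiny, made-up FASTA files so the pipeline can be run as-is.
tmpdir=$(mktemp -d)
printf '>s1\nACGT\n>s2\nAC\n' > "$tmpdir/a.fna"
printf '>s1\nACGTAC\n' > "$tmpdir/b.fna"
printf '>s1\nA\n>s2\nC\n>s3\nG\n' > "$tmpdir/c.fna"

# A literal tab character, kept in a variable for readability.
tab=$(printf '\t')

# grep prints "file:count"; awk re-joins the two fields with a tab;
# sort then orders by the second (tab-separated) field, largest first.
counts=$(grep -c '^>' "$tmpdir"/*.fna |
  awk -F: -v OFS='\t' '{print $1, $2}' |
  sort -t "$tab" -k 2,2nr)

printf '%s\n' "$counts"

# Remove the temporary example files.
rm -rf "$tmpdir"
```

In bash with GNU cut, passing --output-delimiter=$'\t' to the original command should also work, since $'\t' expands to a literal tab before cut sees it.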

The same, but for the total length of sequences. My gawk-fu is somewhat stronger than my shell-fu, so this is mostly in gawk. But you can cut and paste this line directly into your command line, or into a shell script file, and it should work. If anybody knows how to do this as a pure shell script, please comment.

gawk '/^>/ {next} {a[FILENAME]+=length($0)} END {for ( i in a ) print i "\t" a[i]}' fasta-files/*.fna | sort -k 2 -nr | less
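
As for a pure shell version, here is a minimal sketch that does the counting with POSIX shell built-ins only (sort is still an external program). It reads each file line by line, skips header lines, and adds up the lengths of the remaining lines. The temporary files are made up for illustration, and on real data this will likely be much slower than gawk.

```shell
# Two tiny, made-up FASTA files; a.fna has a sequence wrapped over two lines.
tmpdir=$(mktemp -d)
printf '>s1\nACGT\nAC\n>s2\nGG\n' > "$tmpdir/a.fna"
printf '>s1\nACGTACGTAC\n' > "$tmpdir/b.fna"

# A literal tab character, used as the field separator for sort.
tab=$(printf '\t')

totals=$(
  for f in "$tmpdir"/*.fna; do
    total=0
    # The extra [ -n "$line" ] test also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      case $line in
        ('>'*) continue ;;   # header line: not part of the sequence
      esac
      total=$((total + ${#line}))
    done < "$f"
    printf '%s\t%s\n' "$f" "$total"
  done | sort -t "$tab" -k 2,2nr
)

printf '%s\n' "$totals"

# Remove the temporary example files.
rm -rf "$tmpdir"
```

Note that ${#line} counts every character on the line, so files with Windows line endings would need the carriage returns stripped first.
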
This is a bit weird, but I needed it, and some of you doing multiple metagenomic analyses might too. This gives the total sequence length, and the average sequence length for each file, as a gawk one-liner. Sequences are counted by their header lines, so records wrapped over several lines are averaged correctly:

gawk '/^>/ {b[FILENAME]++; next} {a[FILENAME]+=length($0)} END {for ( i in a ) print i "\t" a[i] "\t" (b[i] ? a[i]/b[i] : 0)}' fasta-files/*.fna

Comments, gripes and improvements are welcome.

Broken shell with a bug. Credit: geograph.co.uk
