Archive for the ‘Biochemistry’ Category

“Codon” is now a four letter word

February 17th, 2010 5 comments

ResearchBlogging.org

As part of the process of manufacturing  a new car,  the designers will take the blueprints to the factory floor. There they will set up an experimental assembly line, tinkering with the manufacturing process of the prototype until it is ready for mass-production. Can we do the same with the machinery of life – the assembly of proteins? Can we set up an alternative assembly line for a new protein prototype — and then actually set up a working assembly line for the whole new protein?  A proof-of-concept has been published this week in Nature by Jason Chin’s group at the Medical Research Council Laboratory of Molecular Biology, Cambridge UK.

If there is a single common denominator to all life, it is the genetic code. All life is built around DNA encoding information for proteins in nucleotide triplets, or codons. Since there are four types of nucleotides (A, T, G, C) that are read in words of three, there are 4³ = 64 possible codons: more than enough to encode the 22 amino acids that make up proteins. There is nothing more basic and fundamental to life on Earth than the three-letter genetic code.
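The counting argument above can be checked in a few lines. This is only an illustrative sketch: the function name `all_codons` and its `word_length` parameter are my own, not anything from the paper.

```python
from itertools import product

NUCLEOTIDES = "ATGC"  # the four DNA letters mentioned in the post


def all_codons(word_length: int = 3) -> list[str]:
    """Return every possible nucleotide word of the given length."""
    return ["".join(letters) for letters in product(NUCLEOTIDES, repeat=word_length)]


triplets = all_codons(3)
print(len(triplets))  # 4 ** 3 = 64
print(triplets[:4])   # ['AAA', 'AAT', 'AAG', 'AAC']
```

Changing `word_length` shows how quickly the vocabulary grows with longer words.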

Until now.

Chin’s group has created a four-nucleotide codon system. It is not that the DNA is different: it is the way the cellular machinery decoding RNA transcripts interprets the nucleotide sequence. Ribosomes, the large RNA and protein complexes that serve as the platform upon which messenger RNA is read and decoded, are set to serve up messenger RNA three nucleotides at a time. (Messenger RNA or mRNA is a transcript of the DNA which is carried to the ribosome.) Transfer RNA or tRNA is a short RNA molecule that shuttles the proper amino acid to the ribosome, but will only attach if the proper codon is served up by the ribosome. The whole protein synthesis “assembly line” looks something like this:

Protein synthesis. Credit: Wikimedia Commons.

To change the interpretation of the genetic code from three-letter words to four, Chin and his colleagues had to make new ribosomes and new tRNAs. To create these new ribosomes, they designed orthogonal ribosomes, or o-ribosomes. O-ribosomes are extra ribosomes, produced from inserted genes, that operate in the cell alongside the regular ribosomes. The cell remains viable because the regular ribosomes keep doing their job. The ribosomal RNA in the o-ribosomes is therefore free to be mutated to create new unnatural traits: in this case, the ability to serve as a platform to read four-letter codons. They selected for Escherichia coli bacterial cells that expressed o-ribosomes which translated a four-letter codon in a gene, a codon that would otherwise go untranslated by the regular ribosome. The gene gives the bacterial cells resistance to the antibiotic chloramphenicol. So cells that survive a dose of chloramphenicol are those which have functioning o-ribosomes, since only the o-ribosomes can translate the chloramphenicol resistance gene.

They also needed to create new tRNAs that have a four-nucleotide anticodon (the part that binds by complementarity to the messenger RNA; see figure above). So the surviving E. coli cells have a population of working o-ribosomes, regular ribosomes, modified tRNA (with a four-letter anticodon) and regular tRNA.

Then they took their work a step further. Each three-letter tRNA carries a specific amino acid, depending on its anticodon. Thus tRNA(AAG) will always have a phenylalanine attached, because UUC (the codon on the messenger RNA that pairs with the anticodon, written 3′-AAG-5′) codes for phenylalanine. If you start messing with that, the translation machinery will produce non-functional proteins, which will probably kill the cells pretty quickly. But with the orthogonal four-letter code machinery, that is not really a problem: the orthogonal machinery operates alongside the normal one. Also, there are no amino acids naturally assigned to any four-letter code, because this code does not appear in nature in the first place! So Chin’s lab assigned an unnatural amino acid to a four-letter code. The non-naturally occurring amino acid p-azido-L-phenylalanine was assigned to tRNA(UCCU). They then showed that the whole alternative translational machinery worked by synthesizing a mutant of the protein calmodulin which used this amino acid in its structure.
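To make the anticodon and codon pairing concrete, here is a minimal sketch. It assumes the anticodon is written 3′→5′, that the messenger RNA uses U rather than T, and strict Watson-Crick pairing (no wobble); the helper name `codon_for` is hypothetical.

```python
# Watson-Crick base pairs in RNA (wobble pairing is deliberately ignored).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for(anticodon_3to5: str) -> str:
    """Codon (read 5'->3') that pairs with an anticodon written 3'->5'."""
    return "".join(RNA_PAIR[base] for base in anticodon_3to5)


print(codon_for("AAG"))   # UUC, a phenylalanine codon
print(codon_for("UCCU"))  # AGGA, a four-letter codon
```

The same rule works for three-letter and four-letter anticodons, which is the whole point: only the word length changes, not the chemistry of base pairing.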

Why do it? Well, personally I don’t see the need for justification: just being able to do it is so cool! But seriously: think of the ability to design proteins from up to 4⁴ = 256 different amino acids, rather than the 22 we have now. The possibilities of tinkering with existing proteins using this orthogonal, four-letter machinery are huge. The other benefit of this orthogonal synthesis setup is the ability to control it: because it does not use the three-letter vocabulary, this orthogonal machinery would be much easier to manipulate, tinker with and switch on and off without getting in the way of the regular cellular translational machinery. The analogy to a car assembly line breaks here. It is as if two different models are being assembled on the same line just by using different robots. The better analogy is for a program source code to be read by two different compilers, each producing a different program. Awesome.


Neumann, H., Wang, K., Davis, L., Garcia-Alai, M., & Chin, J. (2010). Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome Nature DOI: 10.1038/nature08817

The polypharmacome

January 23rd, 2010 1 comment

This post was chosen as an Editor's Selection for ResearchBlogging.org
Pharmaceutical companies are always on the lookout for secondary drug targets. After all, if you invest billions developing a single drug, you would be more than happy to sell it as a treatment for two, three, or more different ailments. Sildenafil citrate was developed to treat angina and hypertension, but during phase I clinical trials, it was found that Sildenafil induces penile erections. The drug was branded Viagra, and the rest is history. Eflornithine, an anti-cancer drug, is also effective against the agent of African sleeping sickness, Trypanosoma brucei. African Sleeping Sickness is known as a “neglected disease”, for which drug development is not profitable and therefore not a priority. However, having a drug already on hand makes it easier to distribute in affected areas, since the R&D costs are recovered elsewhere.

Another example of polypharmacology is a drug that binds to multiple targets in the human body. This could be used for overcoming drug resistance, a known problem with cancer. Cancer tumors often develop a resistance to anti-cancer drugs by simple natural selection: the protein that the drug binds to mutates, and no longer binds the drug. However, if the drug acts by binding redundantly to several proteins, it would be more effective, since several mutations would be required to effect drug resistance.

Another important polypharmacological consideration is toxicity. A drug that binds to one protein, its intended target, may also bind to another protein that it should not bind to, disrupting normal function and harming the patient. If the side effects outweigh the cure, the drug is no good.

How one drug (cyan) can bind to two different proteins with different overall shapes (pink and green), but with similar binding sites

Because it can either increase profits or, conversely, derail a whole process of drug development, predicting polypharmacological effects is very much something drug developers want. A study published yesterday in PLoS Computational Biology by Jacob Durrant and his colleagues suggests a way bioinformatics and theoretical biophysics can help in identifying multiple drug targets. Durrant’s goal was simple: given the molecular structure of a candidate drug, which proteins are expected to bind it? The strategy this group took is a combination of bioinformatic and experimental screening.

Finding additional drug targets in four  steps

Reproduced under CC license from doi:10.1371/journal.pcbi.1000648.g001.

Step 1 (A-C in the figure above): identify the known target protein. Now pick all the protein structures that are not similar to it. Why those that are not similar? Similar proteins could be potential drug targets, since they have a similar shape to the known target protein. But here they are interested in finding targets from proteins that are of a different shape, and have no homology to the known target protein: secondary targets.

Step 2 (D): take this set of dissimilar proteins, and look for binding site similarities. Binding sites are clefts in the protein that may bind drugs. If those clefts are similar in shape to the cleft in the known target protein, they may bind the drug. Keep only those non-homologous proteins that have similar binding sites (D in the figure).

Step 3 (E): Now add all the proteins that are homologous to the set generated in step 2. This increases the number of possible targets to homologs of the proteins that were initially selected only for binding site similarity.

Step 4 (F): take the drug molecule, and try to dock it to the various protein structures. Rank the druggability of each protein according to the score provided by the drug docking software (AutoDock).
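The four steps above can be sketched as a small filtering pipeline. This is a toy illustration only, not the authors' code: the `Protein` class, the `family` and `site` labels (stand-ins for homology and binding-site shape), and the precomputed `docking_score` numbers are all hypothetical, and I am assuming that a lower docking score means a better predicted binder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    family: str           # stand-in for sequence/structure homology
    site: str             # stand-in for a binding-site shape descriptor
    docking_score: float  # stand-in for a docking energy; lower = better


def find_secondary_targets(known: Protein, database: list[Protein]) -> list[Protein]:
    # Step 1: keep only proteins that are NOT homologous to the known target.
    dissimilar = [p for p in database if p.family != known.family]
    # Step 2: of those, keep proteins whose binding site resembles the known one.
    site_hits = [p for p in dissimilar if p.site == known.site]
    # Step 3: add every homolog of the step-2 hits.
    hit_families = {p.family for p in site_hits}
    expanded = [p for p in database if p.family in hit_families]
    # Step 4: rank by docking score (toy numbers here, no real docking is run).
    return sorted(expanded, key=lambda p: p.docking_score)


known = Protein("known_target", "famA", "siteX", -9.0)
database = [
    Protein("A2", "famA", "siteX", -8.5),  # homolog of the known target: dropped in step 1
    Protein("B1", "famB", "siteX", -7.0),  # unrelated fold, similar site: kept in step 2
    Protein("B2", "famB", "siteY", -8.0),  # homolog of B1: added in step 3
    Protein("C1", "famC", "siteZ", -9.5),  # unrelated fold, unrelated site: dropped
]
print([p.name for p in find_secondary_targets(known, database)])  # ['B2', 'B1']
```

Note how B2 makes the final list only because of step 3: it has no binding-site similarity on record, but it is a homolog of a protein that does.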

Experimental verification

Now for the cool part. Durrant and colleagues tested  the computational prediction in the lab, using the compound NSC-45208. NSC-45208 inhibits a protein responsible for RNA processing in Trypanosoma brucei. So we know it is a potential drug against African sleeping sickness. What else is it good for?

(NSC-45208), 4,5-dihydroxy-3-(1-naphthyldiazenyl)-2,7-naphthalenedisulfonic acid, a recently discovered inhibitor of T. brucei RNA editing ligase 1 (TbREL1)

“The predicted secondary targets that gave the best docking scores, H. sapiens mitochondrial 2-enoyl thioester reductase (HsETR1), T. brucei UDP-galactose 4′ epimerase (TbGalE), H. sapiens phosphodiesterase 9A (HsPDE9A2), and Streptococcus pneumoniae teichoic acid phosphorylcholine esterase (SpPce), were subsequently tested experimentally.”

Durrant and his colleagues tested their predictions that NSC-45208 also binds to two human proteins (HsETR1 and HsPDE9A2), one additional trypanosome protein (TbGalE), and a bacterial (Streptococcus pneumoniae) protein (SpPce). Not only binds, but also inhibits their enzymatic activity. Their predictions worked well on the top two predicted targets: NSC-45208 inhibited the enzymatic activity of HsETR1 and TbGalE, but HsPDE9A2 and SpPce were not affected by the drug.

At first blush, this does not seem to be much of a batting average: two positives and two false positives. But we have to remember that the experimental verification of these predictions, even if you have a good enzymatic assay to check them, can be very time- and resource-consuming. An exhaustive test of the predictions for several compounds and many target enzymes is still not possible. As an initial proof-of-principle, this work goes much farther than most other joint experimental / computational works I have read. The authors also go into lengthy and detailed discussions of limitations and improvements, as well as of another form of non-specific inhibition of the secondary targets, which makes for a really interesting read on polypharmacological considerations in drug screening.
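For the record, the batting average works out as follows. The outcomes are the four reported in the post; the variable names are mine.

```python
# Outcomes as reported in the post: True = inhibition observed in the lab.
outcomes = {"HsETR1": True, "TbGalE": True, "HsPDE9A2": False, "SpPce": False}

true_positives = sum(outcomes.values())
false_positives = len(outcomes) - true_positives
precision = true_positives / (true_positives + false_positives)

print(f"{true_positives} of {len(outcomes)} confirmed, precision = {precision:.2f}")
# 2 of 4 confirmed, precision = 0.50
```

Only the top-ranked predictions were tested, so this says nothing about how many real targets the method missed.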


Durrant, J., Amaro, R., Xie, L., Urbaniak, M., Ferguson, M., Haapalainen, A., Chen, Z., Di Guilmi, A., Wunder, F., Bourne, P., & McCammon, J. (2010). A Multidimensional Strategy to Detect Polypharmacological Targets in the Absence of Structural and Sequence Homology PLoS Computational Biology, 6 (1) DOI: 10.1371/journal.pcbi.1000648

Fold.it: wasting time in a good cause

January 16th, 2010 3 comments

I just spent the better part of a Saturday playing with Foldit. Foldit is an ongoing experiment in finding protein structures by harnessing the power of the mob, or gamers, as is the case here. The player is presented with a backbone and sidechain configuration, with the secondary structures mostly pre-determined. The problem is to get the protein to fold into the correct conformation. You can tweak the secondary structures, rubberband the beta strands together into sheets and rotate the sidechains. The residues are colored by hydrophobicity, so you know who should be facing where. The 23 tutorial cases walk you through the simple yet powerful interface for folding the structures. You can rotate helices, rubberband strands, flip sheets, etc. The interface gives you feedback on sidechain clashes and voids in the structure, among other things. The examples also teach you the basics of protein structural considerations: maximize hydrogen bonds, hydrophobic side chains should be buried in the structure, strands should combine into sheets, and so on. When you feel you are ready, you can start solving the various folding puzzles presented online. You can work solo, or as part of a team. The “correct solution” is, of course, unknown. The best you can do is accumulate as many points as you can, which represent how stable the conformation you are building is. Foldit, like many other cool things in structural biology, is the product of David Baker’s lab at the University of Washington. Here is a video from the YouTube UWFoldit channel showing the coolness of it all. If you decide to get your fold on, make sure you can make the time. It’s flippin’ addictive.

Structuregate?

December 10th, 2009 5 comments

The University of Alabama at Birmingham issued a statement last week asking that 11 structures be removed from the Protein Data Bank, as they are quite possibly fabricated. Wow. Very little detail was given by UAB’s statement (below), or by the media. Apparently all the structures are tied to one person, HMK Murthy, who could not be reached or traced, as reported by the Birmingham News.

The structures’ PDB codes are:

1BEF, 1CMW, 1DF9/2QID, 1G40, 1G44, 1L6L, 2OU1, 1RID, 1Y8E, 2A01, and 2HR0. Some of them are still in the databank.

The University of Alabama at Birmingham has requested that the Research Collaboratory for Structural Bioinformatics Protein Data Bank remove certain protein structure files deposited by a former UAB employee. UAB also has identified nine publications related to the same protein structures that should be retracted from various scientific journals, and is making those journals aware of this matter.

Allegations of data fabrication and/or falsification were made concerning certain protein structures published by the former UAB employee. In accordance with UAB’s scientific integrity policy, and that of the Office of Research Integrity of the U.S. Department of Health & Human Services, UAB empanelled a committee of experts with no conflicting interests to investigate these allegations. After a thorough examination of the available data, which included a re-analysis of each structure alleged to have been fabricated, the committee found a preponderance of evidence that structures 1BEF, 1CMW, 1DF9/2QID, 1G40, 1G44, 1L6L, 2OU1, 1RID, 1Y8E, 2A01, and 2HR0 were more likely than not falsified and/or fabricated and recommended that they be removed from the public record.

“Scientific misconduct is absolutely unacceptable,” said UAB Scientific Integrity Officer Richard B. Marchase, Ph.D., vice president for Research and Economic Development. “It was important that the files be removed from the database and the articles be retracted to ensure that future research in the areas of macromolecular structure analysis and the function of proteins could continue uncompromised by faulty data.”

Some of these structures date back to 2002; this has been going on for quite a while then. Apparently the investigation ended in May 2009, but UAB only issued a statement today. The associated papers are also being retracted. If anyone has more information on this strange affair, please share here.

The Warren L. DeLano Memorial Award for Computational Biosciences

November 15th, 2009 No comments

Warren DeLano passed away suddenly and at a young age at his home on Nov 3, 2009. He was the author of PyMOL, a very popular molecular visualization program, and a strong advocate of open source software. The family of Warren Lyford DeLano has created an “In Memoriam” page and blog. Also, a memorial award is being set up in his name, as per this email circulated on various mailing lists.

Dear friends and colleagues:

It’s now been over a week since Warren has passed away.  We are trying to
move toward a permanent way to honor Warren’s memory and what
he stood for: Open Source Computational Biosciences and molecular
visualization. To do this, Jim Wells and I put together a mission statement
with the approval of Warren’s family:
The Warren L. DeLano Memorial Award for Computational Biosciences

This award shall be given to a top computational bioscientist in
recognition of the contributions made by Warren L. DeLano to creating powerful
visualization tools for three dimensional structures and making them freely accessible.
The award, accompanying lecture, and honorium will be given annually in the context of a
national bioscience meeting or a Bay Area gathering of
computational bioscientists at Stanford, UCSF or UC Berkeley. For the award special emphasis
will be given for Open Source developments and service to the bioscience community.
The award selection committee, consisting of experts in the computational and
biological sciences, will accept nominations from anyone.
To make something like this happen in perpetuity would take about ~100K for
the endowment.

For donations, Warren’s family has set up a tax deductible fund:

Silicon Valley Community Foundation
memo:  Warren L. DeLano Memorial Fund
2440 West El Camino Real, Suite 300
Mountain View, CA 94040
tel: 650.450.5400

We hope that you’ll consider making a contribution (not matter
how small) in Warren’s honor.  Also, please forward this message
to anybody who might be able be willing to contribute.

Best regards,
Axel

Axel T. Brunger
Investigator,  Howard Hughes Medical Institute
Professor of Molecular and Cellular Physiology
Stanford University

Warren DeLano

November 5th, 2009 No comments

For those who are not in the structural biology community: Warren DeLano wrote and maintained PyMOL, the software of choice for molecular visualization. Practically anyone who published anything requiring a biomolecular image used PyMOL. It is a great piece of software, powerful and extensible. Warren was strongly committed to writing a quality product that served the community well. He was also strongly committed to maintaining an open source licence for PyMOL. This must be one of the saddest emails I have ever received:

Dear CCP4 Community:
I write today with very sad news about Dr. Warren Lyford DeLano.
I was informed by his family today that Warren suddenly passed
away at home on Tuesday morning, November 3rd.
While at Yale, Warren made countless contributions to the computational tools
and methods developed in my laboratory (the X-PLOR and CNS programs),
including the direct rotation function, the first prediction of helical
coiled coil structures, the scripting and parsing tools that made CNS a
universal computational crystallography program.
He then joined Dr. Jim Wells laboratory at USCF and Genentech where he pursued
a Ph.D. in biophysics, discovering some of the principles that govern
protein-protein interactions.
Warren then made a fundamental contribution to biological sciences by
creating the Open Source molecular graphics program PyMOL that is widely
used throughout the world. Nearly all publications that display
macromolecular structures use PyMOL.
Warren was a strong advocate of freely available software and the Open Source
movement.
Warren’s family is planning to announce a memorial service, but arrangements
have not yet been made. I will send more information as I receive it.
Please join me in extending our condolences to Warren’s family.
Sincerely yours,
Axel Brunger

Axel T. Brunger
Investigator,  Howard Hughes Medical Institute
Professor of Molecular and Cellular Physiology
Stanford University

Weekly poll: Replicators First vs. Metabolism First

October 11th, 2009 2 comments

I am preparing a class on the origins of life for next week. The textbook I am using does not go into the Replicators First vs. Metabolism First argument, but I probably will, if I have time. Below, a quick refresher for those who know of the competing theories, and an unsatisfying introduction for those who don’t. In the end, you will be asked to weigh the evidence and vote. Remember: your vote is important. I had a lousy week and seeing some numbers on the sidebar would be a nice ego-boost. Yes, that lousy.

From James W. Brown's lab, NC State University http://www.mbio.ncsu.edu/JWB/soup.html

Replicators First

Aka RNA World: RNA emerges as the first molecule that can replicate and perform enzymatic processes. It stores information and it is biochemically active. Thus it can both replicate and control a primitive metabolism. Later came the transition to DNA as the information store, and the enzymatic role was mostly relegated to proteins. The first replicators might not even have been RNA molecules, but some pre-RNA nucleic acid such as PNA or TNA.

This theory is supported by the present-day existence of ribozymes (RNA enzymes), and especially by the ribozymic activity in the ribosome, the platform of protein translation. RNA can also catalyze its own replication, up to a certain length (189 bases was the longest self-replicating RNA synthesized in a lab). Finally, RNA can also catalyze the formation of peptide bonds between amino acids, setting the stage for the transition to an RNA+protein world. At some point, these reactions were cellularized by liposomes or other protobionts (pre-cellular structures with a protein, fatty or water boundary).

The main argument against the RNA World / Replicators First hypothesis is that RNA is labile, especially in water. Hence, an RNA world may not have lasted long enough to become complex enough to recruit proteins and bootstrap itself to the next level. Also, RNA is too complex to have been any kind of first player, and there were probably many chemical selective events prior to the appearance of RNA, as argued by the Metabolism First proponents.

Metabolism First

Metabolism First holds that metabolic processes assembled prior to the existence of replicators. Günter Wächtershäuser proposed that the pioneer organism originated at high temperatures (>100 °C) in hydrothermal vents. This organism resembled the catalytic converter in a car more than a primitive cell: it had a composite structure of a mineral base with catalytic transition metal centers, such as iron sulfide and nickel sulfide. Dissolved volcanic gases would flow over this natural catalytic converter, yielding more complex compounds. Some of those more complex compounds would stick around, and incrementally form more complex molecules, eventually capable of catalysis. One strong piece of experimental evidence in favor of Metabolism First is the ability to recreate most of the citric acid cycle, which is both universal and essential in all life, without enzymes and under high temperature and pressure conditions such as those existing in underwater volcanic vents, favored for being the crucible of life.

Information-bearing molecules like nucleic acids came last, rather than first. Metabolism First explains the chemical evolution of catalytic versatility before the appearance of complex polymers. Also, the argument made by Metabolism First proponents is that RNA itself is a precondition, but is a molecule too complex to have arisen by initial chemical selection. Metabolism First offers the necessary chemical scaffolding enabling replicators to appear on the stage.

Replicators (genetics) First vs. Metabolism First. Barbara Aulicino and Morgan Ryan

There is a lot more to the two hypotheses, of course, including experimental evidence supporting both. Here are two reviews. Read them, and don’t forget to cast your vote here → →


In support of Metabolism First:

Trefil, J., Morowitz, H., & Smith, E. (2009). The Origin of Life American Scientist, 97 (3) DOI: 10.1511/2009.78.206

In support of the RNA World (Replicators first):

Müller, U. (2006). Re-creating an RNA world Cellular and Molecular Life Sciences, 63 (11), 1278-1293 DOI: 10.1007/s00018-006-6047-1

Finally: a Nobel prize for the ribosome structure

October 7th, 2009 1 comment

This has been a topic of discussion since I was in grad school: when will the Nobel prize for the structure of the ribosome be finally awarded? Well, it finally has. Ada Yonath, Thomas Steitz and Venkatraman Ramakrishnan received the Nobel for work that has spanned three decades and an equal number of continents.

 

First, a victory dance:

Next, the scientific background:

And part of Ada Yonath’s model in this clip:

 



A FLORA of Protein Structure to Protein Function

September 3rd, 2009 2 comments

Proteins are the machinery of life, and they facilitate most of life’s functions. Traffic into and out of the cell? Protein pumps, pores and channels. Respiration? Proteins. Metabolism and catabolism? Proteins. Immune system, signaling, development… all complex networks of interacting proteins. Understanding a protein’s structure can tell us a lot about how it performs its function. If we know what a protein does, we can look at its molecular workings, and generally figure out how it does it. Hemoglobin carries oxygen in most animals, something that has been known since 1840. However, it was only when Max Perutz and John Kendrew solved the structure that the actual mechanism of oxygen binding and release was elucidated. Since Perutz’s and Kendrew’s discoveries in the late 1950s, the structures of some 35,000 proteins have been solved.

Animation showing binding and release of oxygen molecules to hemoglobin

When we know the protein’s structure we know a lot about how it performs its function. That would be the equivalent of looking at a diagram of a car engine, and then exclaiming: “oh, so that’s how it works!” But the converse does not hold true. If we have the structure, we may not be able to infer the protein’s function. Imagine having the diagram of a new engine which you have never seen before. It might be a car engine, but which make and model? Or it might not be a car engine at all, but that of a lawnmower, a boat, or an electric generator. The point is, without knowing what the diagram represents, we would only have a general idea that we have a machine that burns some sort of fuel to power something.

We face the same problem with protein structures. It does happen that we solve the structure of a protein whose function is unknown. Oh. Kay. What now? We are stuck with a diagram of a machine whose purpose we do not know. Therefore, any kind of method we can devise to predict a protein’s function from its structure would be very helpful. Christine Orengo’s group at University College London, UK has been tackling this problem for quite a while. Her group has recently published a paper in PLoS Computational Biology where they describe an algorithm that can classify enzymes: a subgroup of proteins that catalyze chemical reactions. The classification algorithm works as follows:

1) They partitioned all enzymes of known function into functional subgroups, or FSGs. Within an FSG, all proteins have the same function. Two proteins from different FSGs will have different functions.

2) Next, they selected a set of conserved vectors from a given domain in a given FSG which, when compared against relatives of different functions/FSGs, would produce a low score. Conversely, when proteins from the same FSG are compared, they should have a significantly higher score. The vectors are measurements of distance and direction along the side chains of conserved amino acid residues. They found that this differentiating set of vectors is best obtained when the proteins are aligned within and between FSGs, and the vectors are taken from the conserved residues in the FSG alignments.

Graphical outline of FLORAMake algorithm. doi:10.1371/journal.pcbi.1000485.g002

3) Once they determined which vectors are more conserved within a given functional sub-group (FSG), they created a library of conserved vectors within each FSG, a sort of FSG bar-code. Although the construction is technically unsupervised, limiting the vectors to conserved residues within an FSG naturally lands them with lots of active site residues.

Having created the template library, they can now find vectors on test proteins, and scan those against the library of conserved vectors, using a simple similarity function. Although (or because) their method is quite simple, they achieve very high sensitivity and precision. The methods they compare against are all global structure aligners (such as CE and CATHEDRAL), and by virtue of simply adding spatial information on the conserved / functional residues they greatly improve the function annotation. The great thing about this work is the jump in improvement gained by adding this very simple, yet so far mostly neglected, attribute.
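To give a feel for what scanning a query against a template library might look like, here is a toy sketch. It is emphatically not the FLORA scoring scheme: the use of cosine similarity, the best-match averaging, the FSG names and the vectors themselves are all assumptions made up for illustration.

```python
import math

Vector = tuple[float, float, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def template_score(query: list[Vector], template: list[Vector]) -> float:
    """Mean, over template vectors, of the best cosine match found in the query."""
    return sum(max(cosine(t, q) for q in query) for t in template) / len(template)


def predict_fsg(query: list[Vector], library: dict[str, list[Vector]]) -> str:
    """Return the name of the FSG whose template scores highest against the query."""
    return max(library, key=lambda fsg: template_score(query, library[fsg]))


# Hypothetical library: one small set of conserved vectors per functional subgroup.
library = {
    "FSG_A": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "FSG_B": [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)],
}
query = [(0.9, 0.1, 0.0), (0.1, 0.95, 0.0)]
print(predict_fsg(query, library))  # FSG_A
```

The design idea carried over from the post is that only a handful of vectors from conserved residues are compared, rather than the whole structure.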

Unfortunately, no software yet. Too bad because…..

funny-pictures-relevant-to-my-interests


Redfern, O., Dessailly, B., Dallman, T., Sillitoe, I., & Orengo, C. (2009). FLORA: A Novel Method to Predict Protein Function from Structure in Diverse Superfamilies PLoS Computational Biology, 5 (8) DOI: 10.1371/journal.pcbi.1000485

Distant homology and being a little pregnant

July 15th, 2009 13 comments

(Thanks to F.B.  for the inspiration).

Sigh… people don’t seem to learn. It’s been almost 22 years (yikes!) since a distinguished group of scientists published a letter in Cell calling for a responsible use of the word “homology”. If you were born when that letter was published, then in the US you can already drink legally. And you may very well want to, by the time you finish reading this post.

As of today there are one hundred and sixty seven articles listed  in PubMed with the phrases “distant homology” or “remote homology” in either the title or the abstract.

Please: make it stop.

Humpty1

Homology is a qualitative term.  It means having a common evolutionary origin. Two genes / proteins / organs are either homologous, or they are not. They cannot be “somewhat homologous” or “partially homologous” or (a favorite among molecular and structural biologists) “distantly / remotely homologous”.

Homology is inferred from similarity. Similarity is quantitative. If organs are sufficiently similar, like mammalian forelimbs, then they are considered to be homologous. They may be more similar (like the hands of humans and chimpanzees), or less similar (like a human hand and a bat wing). Nevertheless, once they pass a certain similarity threshold, homology is inferred. The same applies to sequences of proteins and nucleic acids. Similarity can be measured. Different degrees of similarity can be compared and scaled.

homology-limbs

If two protein sequences are aligned, and 40% of the amino acids in the alignment are identical, then the two sequences have a 40% identity. They do not have a 40% homology. They are homologous, and the homology is inferred from the similarity. We observe that the two sequences are similar, and then we conclude that they are homologous. We use the sequence similarity, as measured by percent identity, to trace a line of common descent for those proteins we deem homologous.
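Percent identity is an observation you can compute; homology is the conclusion you draw afterwards. A minimal sketch of the computation follows, assuming two already-aligned sequences of equal length, with `-` marking gaps, and counting only columns where neither sequence has a gap (other denominators are in use, so treat this as one convention among several). The sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Two made-up aligned sequences that agree in 4 of 10 columns.
print(percent_identity("MKTAYIAKQR", "MRSAWLGKHR"))  # 40.0
```

The function returns a number, not a verdict: whether 40% identity is enough to infer common descent is a separate, statistical question.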

(As an aside I should say that the percentage of sequence identity, or %ID is not a very good measure for inferring homology, nor is it for measuring similarity. It is an easy one to use: but it is very coarse and prone to errors. There are many better measures out there, including statistical ones like e-values, p-values or information theoretic ones like bit scores. But I digress, and this is a matter for another post.)

But once we confuse observations with conclusions, things quickly become an impossible muddle.

Am I not just picking nits here? I mean, surely when the term “distant homology” comes up in a paper or in conversation, we all know the meaning. Distant homology means having a common evolutionary origin, but with a common ancestor that was around a long time ago. “Distant homology” is intuitive, brief yet understandable. It is less cumbersome than: “homologous, with a distant common ancestor, as concluded from a low yet statistically significant similarity” which is what we really should say if we properly separate observations from conclusions, as captain nitpick would have us do.

Allow me to answer with two examples. First, I have read several papers discussing “structural homology” in the context of protein structure. Those papers that discuss structural homology were actually using a verbal shortcut for a homology inferred from structural similarity. That is, they inferred common descent from protein structural similarity. This kind of inference is highly contentious, and while not necessarily wrong, must be done with great care and proper caveats. However, once the researchers rolled up observations with conclusions by using the “structural homology” verbal shortcut, they absolved themselves from convincing the reader that structural similarity is indeed a good measure of homology, and jumped directly to the conclusion that there is indeed a homology here. The framework for inferring homology from sequence similarity is well worked out, but not so for structure, yet. Therefore, even if we do use the verbal shortcut “distant homology”, we can only use it by virtue of having a certain measure of similarity well-established already, as in sequence-based similarity. If the measure is not well established, as is the case with structural similarity, then by using the shortcut we fail to go through the proper scientific channels, which consist of providing convincing observations before providing conclusions.

Second: even worse is the use of the term “functional homology”. This is a clear case of the word homology used as a drop-in synonym for similarity. The misnomer “functional homology”  is typically used in studies where proteins that are clearly not homologous perform similar functions. Why infer evolutionary descent when clearly that was not intended in the first place? Well, once you start confusing similarity with homology, observations with conclusions, and make them synonymous, this is what happens.

So don’t even start this confusion.  Separate observations from conclusions, and make the former support the latter. Homology is qualitative, similarity is quantitative.  Genes cannot be distantly homologous any more than a woman can be a little pregnant.

Now you can have that drink. Unless you are a little pregnant.


Gerald R. Reeck, Christoph de Haën, David C. Teller, Russell F. Doolittle, Walter M. Fitch, Richard E. Dickerson, Pierre Chambon, Andrew D. McLachlan, Emanuel Margoliash, Thomas H. Jukes and Emile Zuckerkandl (1987). “Homology” in proteins and nucleic acids: A terminology muddle and a way out of it Cell, 50 (5) DOI: 10.1016/0092-8674(87)90322-9

The workings of a cellular water pore, and something about obesity

July 13th, 2009 No comments

ResearchBlogging.org

Maintaining a water balance is essential to life.  Cells must regulate their water content carefully and within a very narrow margin. Too much water intake, and the cell bursts like a water balloon; too much water outflow, and it shrivels like a raisin.

The cell itself is contained in a waterproof membrane. But there are gateways in that membrane, to import solutes and food, remove waste and also maintain a water and electrolyte balance. One way to maintain a water balance is the aquaporin: a protein complex running through the membrane that lets water flow into the cell in a controlled fashion. Aquaporins are essentially very narrow tubes: the width of the tube is the width of a single water molecule.

How is this water pore controlled?

Two groups at the University of Gothenburg in Sweden and at the Max Planck Institute in Goettingen, Germany, have solved the structure of yeast aquaporin 1 (Aqy1) at a resolution of 1.15 Å (1.15×10⁻¹⁰ m), enabling them to resolve water molecules and the positions of atoms in the protein with very rare clarity. They reported their findings in June’s PLoS Biology.

Gerhard Fischer, Urszula Kosinska-Eriksson and their colleagues discovered that on the side of the aquaporin that is in the cell there is a rather elaborate gating mechanism to control water influx. Just how elaborate is not exactly clear. The end of the protein chain forms a complex termed a “helical bundle” with a tyrosine residue in the channel, which blocks the water flow.

A: General view of the yeast aquaporin, with the single-molecule-wide water channel in the middle; B: a close-up showing the constriction near the end of the channel, and the gating mechanism. doi:10.1371/journal.pbio.1000130.g002

Aqy1 is actually a tetramer — four protein pore molecules arranged together in the membrane, all facing the same direction. So it’s more like a sheaf of four tubes.

A brief animation from the Protein Data Bank showing the four subunits together from different angles. (You need Java to see this, takes a bit of time to download and start).

Fischer and Kosinska-Eriksson also suggest the possibility of other control mechanisms. First, by phosphorylation: attaching phosphate groups to proteins is a common control mechanism in the cell, and a way to pass on signals. They found that when a certain well-located amino acid that can be phosphorylated is replaced by one that cannot be phosphorylated, the ability to regulate water flow is hampered. Another suggestion they have is mechanosensitivity, or membrane movement, such as occurs during environmental stress. It is known that Aqy1 plays a role in cold shock: when the yeast is exposed to freezing temperatures. There is definitely much more to learn about how aquaporins are controlled, triggered, and blocked.

Here is an animation of another aquaporin, Escherichia coli’s aquaglyceroporin GlpF, showing water molecules (red & white spheres) moving through the channel. It was created by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. Red is negative charge, blue is positive. Look for the yellow sphere about 1/3 from the top (outside the cell): it eventually reaches the bottom (inside the cell), but that takes a while, due to Brownian motion. Electrostatic charges along the pore eventually force an overall one-way traffic into the cell.

The animation was created using VMD (they say so in the end credits, but that is a very short frame, and hard to catch).

You can read more about aquaporins on the aquaporin page of the Theoretical and Computational Biophysics Group. The page has great graphics, animations, and a high resolution version of the movie, and much more information about different types of aquaporins, and how they control the cell’s water content.

Aquaporins have also been implicated in obesity. Mice that had the gene for aquaporin-7 knocked out started to accumulate fat at the age of 12 weeks, and became heavier than their counterparts: a mouse form of adult-onset obesity. The reason is that aquaporin-7 is expressed in fat cells, and this particular aquaporin also transfers glycerol. Glycerol is a small three-carbon alcohol synthesized in fat cells from glucose as a building block for fat molecules. Aquaporin-7 acts as a glycerol pressure valve, letting some of it back into the bloodstream. If aquaporin-7 is missing, glycerol accumulates in the fat cells, forcing the cell to synthesize more fat from this basic fat building block. The fat cells become larger, and the animal bearing them, fatter. This has been known for four years now, but I haven’t seen any recent progress on this aspect of obesity, at least not through an (admittedly cursory) scan of PubMed.

Fat cell metabolism. Glucose may end up as glycerol. If aquaporin-7 is not expressed, excess glycerol gets stored as fat. FFA: free fatty acids. After Gema Frühbeck, Nature 438, 436-437 (24 November 2005)


Fischer, G., Kosinska-Eriksson, U., Aponte-Santamaría, C., Palmgren, M., Geijer, C., Hedfalk, K., Hohmann, S., de Groot, B., Neutze, R., & Lindkvist-Petersson, K. (2009). Crystal Structure of a Yeast Aquaporin at 1.15 Å Reveals a Novel Gating Mechanism PLoS Biology, 7 (6) DOI: 10.1371/journal.pbio.1000130

Tajkhorshid, E. (2002). Control of the Selectivity of the Aquaporin Water Channel Family by Global Orientational Tuning Science, 296 (5567), 525-530 DOI: 10.1126/science.1067778

Frühbeck, G. (2005). Obesity: Aquaporin enters the picture Nature, 438 (7067), 436-437 DOI: 10.1038/438436b

Da Vinci, F0-F1 ATPase: a copyright-driven Update

June 7th, 2009 1 comment

Harvard University has removed from YouTube the video I embedded in my Leonardo Da Vinci and the F0-F1 ATPase post, due to copyright concerns. It is a pity. I believe the main sufferer from this step is the lab that actually created this video, and now has one outlet less to publicize its work. One would think that after a projected loss of 30% of their endowment, Harvard would come up with more creative ideas for freely publicizing their researchers’ fine work, not fewer. (Yeah, I know no one reads my blog, but everyone goes to YouTube, including people who don’t normally read Nature).

Whatever. I hope that the IP admins at the MRC in Cambridge (UK) have a more advanced view on these matters than their counterparts in Cambridge (US), and will keep the following videos up. Here are two F0-F1 ATPase videos from Dr. John E. Walker’s lab. Incidentally, John E. Walker shared the 1997 Nobel Prize in Chemistry for his work on the ATPase enzymatic mechanism. You may find some of these movies on his web page.

The first is a general overview of the F0-F1 in action:

The second shows views from above and then below the F1 domain around the rotating gamma subunit (that’s the blue central rotor shaft in the middle):


The third is a group of what appear to be Japanese grad students / postdocs demonstrating the ATPase dance. I have no idea where this came from. I give them a “C-” in dancing, but an “A” in structural biology (to get an A+ they should have tossed tennis balls to represent synthesized ATP):

Leonardo Da Vinci and the F0-F1 ATPase

May 27th, 2009 1 comment

ResearchBlogging.org

Offspring #2 (O2) and I spent last weekend visiting the Da Vinci Experience exhibit at San Diego’s Air & Space Museum. The exhibit is an engineer’s heaven: large wood models based on and inspired by LDV’s drawings. Gears, crankshafts, pulleys. O2 was interested in the military stuff: catapults, the tank, a mobile bridge. I did not know LDV designed a mobile bridge:

Ohtoo and the mobile bridge, built according to LDV specs

LDV Ball Bearings.

There were a lot of gear exhibits, which unfortunately I did not photograph. Efficient and accurate transfer of motion was a big thing with LDV. Another thing I did not know: he probably invented the ball bearing.

Gears, by LDV

So when I saw Nature’s special section on membrane protein biophysics, something clicked. We normally associate the rotary mechanism and gear transfer with human engineering, rather than with nature. Maybe it is human arrogance at having invented the wheel, and later gear transfer, as a means of locomotion. No other creature uses wheels for locomotion (but see below), hence we are smarter than nature since we came up with an engineering solution She did not.

Only that pride is misplaced. Two of the most common protein complexes in nature rely on a rotary motor, gears, torque, kinetically efficient transfer of motion, and all that jazz that powers our vehicles. One is the bacterial flagellum: technically bacteria do use wheels for locomotion then (although the transfer of motion they use is more like a ship’s screw, another LDV first). The other is the F0-F1 ATPase. It is a mushroom-shaped complex motor embedded in our mitochondrial membranes and is powered by the electric potential across the membrane to generate adenosine triphosphate (ATP). ATP is the universal coin of energy in the living cell, and the F0-F1 ATPase generates 32 out of every 36 ATP molecules which cells need to exist. It is a heavily researched, complex and intricate piece of machinery, central to almost all life. The F0-F1 ATPase’s gear-transfer mechanism is described in the article as “… composed of two rotary motors/generators that are mechanically coupled by a central rotor and an eccentric stator” (Junge et al., Nature 459, 364-370). And here is a movie explaining F0-F1 mechanics in detail. LDV would have liked this, I’m sure. He might have even broken into a Mona Lisa smile, or gone into a Mona Lisa Overdrive.
Update: Harvard University removed this video from YouTube. As to the whys and wherefores, see here.

Junge, W., Sielaff, H., & Engelbrecht, S. (2009). Torque generation and elastic power transmission in the rotary FOF1-ATPase Nature, 459 (7245), 364-370 DOI: 10.1038/nature08145

Light for Cellular Communication?

May 22nd, 2009 4 comments

ResearchBlogging.org

Don't you know
We're all light
Yeah, I read that someplace
 --XTC

This is interesting: an article in PLoS ONE that claims that Paramecia can communicate using light. The author, Daniel Fels from the Swiss Tropical Institute in Basel, separated two Paramecia populations using quartz or glass vials, grew them in the dark, and checked whether the separated but adjacent populations affected each other’s growth and feeding rates. The correlation he found was strong. Large populations in one vial affected the growth rate of small populations in the adjacent vial.

Uses light to communicate?

The main problem I have with this study (and similar ones he cites) is that the evidence is mostly negative: i.e. by eliminating other possible causes, he arrives at the conclusion that the only probable signalling mechanism is self-emitted light. Fels controlled for other ambient effects such as heat and diffusion by evaporation. Also, he used glass and quartz vials in different experiments, and showed that the results differ depending on the vial material. Since glass and quartz filter different wavelengths, this was interpreted as meaning that at least two different wavelength ranges convey signals (one above 340 nm, and the other below). However, Fels did not directly detect the proposed photons. The claim is that the electromagnetic radiation is too weak to be picked up by external sensors. But I believe there are microsensors that are sensitive to single-photon emissions that could be used in the medium. Nor did he use an independent source of photons to simulate the population radiation. Again, something I believe can be tried.

Apparently this is not the first study of the mysterious and elusive (or non-existent?) biophotonic activity: Fels cites a whole slew of previous studies, in the same vein, conducted with yeast, onion roots and some animal tissue cells. The emerging picture is that there may be such a phenomenon, but it hasn’t been shown directly, yet.

Anyhow, here is a FriendFeed discussion I started. (Update: I removed the framing of the FriendFeed discussion from this post since it went from discussing science to discussing discussions.) You are welcome to comment here, or better yet, in the comments section of the PLoS ONE article itself (login required).

This article has been slashdotted. Exercise extreme caution.


Fels, D. (2009). Cellular Communication through Light PLoS ONE, 4 (4) DOI: 10.1371/journal.pone.0005086

Ribosomal paleontology

April 11th, 2009 1 comment

ResearchBlogging.org

In the latter epoch of those 2 billion-odd years between non-life and life on early Earth, our ancestral molecular replicators were quite probably RNA, not DNA. There are many arguments for this RNA world hypothesis: RNA can store information in its sequence, and self-duplicate; it can also catalyze reactions as a ribozyme. So technically, RNA has all the facilities necessary to be a replicator. Not only that, RNA can catalyze the formation of peptide bonds, providing a plausible link between the RNA world and the protein world we live in today. One strong piece of supporting evidence for the RNA world is the ribosome: the ubiquitous ribozyme / protein complex that translates messenger RNA to proteins. The ribosome is mostly composed of RNA, it exists in all three super-kingdoms of life, and is conserved within the super-kingdoms. But the ribosome is a very complex machine: can we find one of the ribosome’s less complex precursors? One that post-dates the RNA-only world, but is not quite as complex and specific as today’s ribosome?

Isabella Moll’s group at the Max F. Perutz Laboratories (MFPL) in Vienna may have just discovered a candidate for an ancestral ribosome. In bacteria, messenger RNA or mRNA has a 5′ untranslated region that commonly includes the Shine-Dalgarno sequence: a six-nucleotide consensus sequence AGGAGG located just before the AUG codon, which marks the translation start site. However, some mRNA molecules lack a Shine-Dalgarno sequence, yet still get translated. Moll’s group were looking at such leaderless messenger RNA, or lmRNA. lmRNA might be a remnant dating back to a more primitive era of translation, when the Shine-Dalgarno motif had not yet evolved.
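For readers who like to see the idea as code, here is a rough sketch of how one might flag whether an mRNA carries the Shine-Dalgarno consensus upstream of its start codon. This is a deliberate simplification and not the method used in the paper: the function name, the 20-nucleotide search window and the requirement for an exact AGGAGG match are my assumptions, and the two sequences are made up. Real Shine-Dalgarno sites often deviate from the consensus and sit at a characteristic spacing from the AUG.

```python
SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus sequence


def has_shine_dalgarno(mrna: str, start: int, window: int = 20) -> bool:
    """Return True if the exact consensus occurs within `window` nucleotides
    upstream of the start codon located at index `start`.

    Assumptions (illustrative only): exact match to the consensus, and a
    hypothetical default window of 20 nt.
    """
    rna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if rna[start:start + 3] != "AUG":
        raise ValueError("no AUG start codon at the given position")
    upstream = rna[max(0, start - window):start]
    return SD_CONSENSUS in upstream


# Made-up examples: a leadered transcript and a leaderless one that
# begins directly with the start codon.
leadered = "GGCUAGGAGGUUAACAUGGCUAAA"
leaderless = "AUGGCUAAA"

print(has_shine_dalgarno(leadered, start=15))   # True
print(has_shine_dalgarno(leaderless, start=0))  # False
```

A leaderless mRNA has essentially nothing upstream of its AUG, so a scan like this comes back empty no matter how permissive the match is.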

Credit: Byte Size Biology

Moll’s group were using kasugamycin, an antibiotic that blocks translation initiation, to inhibit the translation of mRNA. While the translation of mRNA containing Shine-Dalgarno sequences was inhibited by kasugamycin, translation of lmRNA was not inhibited. When they examined the ribosomal particles that were translating lmRNA despite the presence of kasugamycin, they found them to be lighter: they sediment at 61S instead of the usual 70S in bacteria. The ribosome is a complex piece of molecular machinery, consisting of many RNA and protein components. Nevertheless, Moll’s group has shown that the smaller ribosome, lacking many parts, can still translate lmRNA, but not leadered (Shine-Dalgarno containing) mRNA. This “bare-bones” ribosome may very well be a version of the proto-ribosome, dating back to the dawn of life, before the separation of life’s superkingdoms.


Kaberdina, A., Szaflarski, W., Nierhaus, K., & Moll, I. (2009). An Unexpected Type of Ribosomes Induced by Kasugamycin: A Look into Ancestral Times of Protein Synthesis? Molecular Cell, 33 (2), 227-236 DOI: 10.1016/j.molcel.2008.12.014