Archive

Posts Tagged ‘protein-function’

Protein Function: how do we know that we know what we know?

July 22nd, 2010 6 comments

ResearchBlogging.org

The trouble with genomic sequencing is that it is too cheap. Anyone with a bit of extra cash lying around can scrape the bugs off their windshield, sequence them, and write a paper. Seriously?

Yes, seriously now: as we sequence more and more genomes, our annotation tools cannot keep up with them. It’s like unearthing thousands of books at some vast archaeological dig of an ancient library, but being able to read only a few pages here and there. Simply put: what do all these genes do? The gap between what we do know and what we do not know is constantly growing. We are unearthing more and more books (genomes) at an ever-increasing pace, but we cannot keep up with the influx of new and strange words (genes) of this ancient language. Many genes are being tested for their function experimentally in laboratories. But the number of genes whose function we are determining using experiments is but a drop in the ocean compared to the number of genes we have sequenced and whose function is not known. We may be sitting on the next drug target for cancer or Alzheimer’s disease, but those proteins are labeled as “unknown function” in the databases.

The red line is the growth of protein sequences deposited in TrEMBL, a comprehensive protein sequence database. The blue line illustrates the growth of proteins in TrEMBL whose function is known, or at least can be predicted with some reasonable accuracy. The green line is the growth in the proteins whose 3D structure has been solved. Note the logarithmically increasing gap between what we know (blue) and what we do not know (red). Image courtesy of Predrag Radivojac.

Enter bioinformatics. CPU hours are cheaper than high-throughput screening assays. And if the algorithms are good, software can do the work of determining function much more cheaply than experiments. But therein lies the rub: how do we know how well function prediction algorithms perform? How do we compare their accuracy? Which method performs best, and are different methods better for different types of function predictions? This is important because most of the functional annotations in the databases come from bioinformatic prediction tools, not from experimental evidence. We need to know how accurate these tools are. Think about it this way: even an increase of 1% in accuracy would mean that hundreds of thousands of sequence database entries are better annotated, which in turn means a lot less time in the lab or in high-throughput screening labs going after false drug leads.

So a few of us got together and decided to run an experiment to compare the performance of different function prediction software tools. We call our initiative the CAFA challenge: Critical Assessment of Function Annotation. There are many research groups that are developing algorithms for gene and protein function prediction, but those have not yet been compared on a large scale. OK then: let’s have some fun. We, the CAFA challenge organizers, will release the sequences of some 50,000 proteins whose functions are unknown. The various research groups will predict their functions using their own software. By January 2011 all the predictions should be submitted to the CAFA experiment website. Over the next few months, some of these proteins will get annotated experimentally. Not many, probably no more than a few hundred judging by the slow growth of the experimental annotations in the databases. But we don’t need that many to score the predictions. A few dozen will do.
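To make "scoring the predictions" a bit more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a prediction is a set of function labels and that it is scored by precision and recall against the labels later found experimentally. The scoring scheme, the function name `precision_recall` and the labels are assumptions made for this example, not the assessment the CAFA organizers have agreed on.

```python
def precision_recall(predicted, experimental):
    """Score one protein: predicted function labels vs. experimentally found ones.

    Both arguments are iterables of label strings (hypothetical labels here).
    Returns (precision, recall).
    """
    predicted, experimental = set(predicted), set(experimental)
    true_positives = len(predicted & experimental)
    # Guard against empty sets so that we never divide by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(experimental) if experimental else 0.0
    return precision, recall


# Made-up labels for one made-up protein.
predicted = {"phosphatase activity", "ATP binding", "DNA binding"}
experimental = {"phosphatase activity", "ATP binding"}

precision, recall = precision_recall(predicted, experimental)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the tool finds both experimentally confirmed functions but also predicts one that was not confirmed, so recall is perfect while precision is not. Averaging such scores over the few dozen newly annotated proteins would give one possible way to rank methods.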

On July 15, 2011 we will all meet in Vienna, and hold the first-ever CAFA meeting as a satellite meeting of ISMB 2011. This will be the fifth Automated Function Prediction meeting we have been holding since 2005. Only this time, there won’t just be the usual talks and posters, there will be the results of a very interesting experiment. The International Society for Computational Biology is generously hosting our meeting, and judging by the response we are getting so far, we will need one of the larger halls.

Learn more at http://biofunctionprediction.org. If computational protein function prediction is your thing, join the CAFA challenge. If you are just an interested observer, keep an eye on the site. In any case, please spread the word. Finally, if your company wants some publicity, get in touch! We could use the sponsorship ^_^

Acknowledgements: I would like to thank the CAFA co-organizers, Michal Linial and Predrag Radivojac; the CAFA steering committee, Burkhard Rost, Steven Brenner, Patsy Babbitt and Christine Orengo, for supporting us, keeping us on the straight and narrow and for incredibly useful and insightful suggestions; Sean Mooney and Amos Bairoch for hashing out the assessment; Tal Ronnen-Oron and the rest of Sean Mooney’s group for setting up the CAFA website; the International Society for Computational Biology for sponsoring us; the community of computational function predictors that have participated in and supported past meetings on computational function prediction; the research groups that have registered for CAFA so far, and those that will register soon :) Finally, Inbal Halperin-Landsberg for coining the name CAFA. I apologize in advance if I left someone out.

GO CAFA!


Godzik, A., Jambon, M., & Friedberg, I. (2007). Computational protein function prediction: Are we making progress? Cellular and Molecular Life Sciences, 64 (19-20), 2505-2511 DOI: 10.1007/s00018-007-7211-y

Protein function, promiscuity, moonlighting and philosophy

June 12th, 2010 5 comments

I recently received an email from a graduate student in Philosophy regarding protein function. Not sure if that person wants his name advertised, so I will keep it to myself.

“I am a fan of your blog, and interested in the philosophy of biology. One particularly interesting question is what makes something have a function; when it comes to artifacts, we just check with whoever designed the thing. It gets more complicated when functions change, and things are used for purposes other than what they were originally designed for, but it’s still pretty straightforward. However, biological functions can’t go that route (unless maybe one is a fan of intelligent design). I’m curious what you think about this, after seeing you mention your interest in predicting the function of genes and proteins. Is the function of something just the causal role that it plays in some larger mechanism? Do you have to include evolutionary considerations? If you ever have the time, I’d love to hear your thoughts about this.”

Thanks very much


My rather rambling answer follows:
“Ouff, you’ve opened a pretty big can of worms, which many of us are having a problem with.

Function in biology is context dependent. An enzyme catalyzes a biochemical reaction, say, removing a phosphate molecule from a protein. However, by removing that phosphate from the protein, the enzyme changes something in the function of the cell, as phosphate molecules are the ‘signaling currency’ of the cell. So the enzyme fulfills a cellular function as well. Finally, suppose this cell is in a developing embryo, and the phosphate removal in this type of protein in many cells catalyzes the creation of a limb, or a particular organ or tissue: now we have a whole organismal functional context. Which one of those, the biochemical, cellular or organismal, is the ‘real’ function of the enzyme? Well, obviously all three are ‘real’.

To add a twist, suppose that this enzyme is also active in removing phosphates from proteins in the adult animal. Now the animal has reached maturity, and because of a mutation in one of the cells that enzyme does not work anymore. The intra-cellular signaling becomes defective and the protein accumulates in its ‘phosphorylated’ form. This signals a division of the cell, and suddenly you have a pre-cancerous situation. So from a health point of view, this mutant plays a role in the survival and proliferation of cancer cells. Interestingly, a protein that causes our spittle to froth (don’t try doing this around other people, gross) was first discovered as a nasopharyngeal cancer associated protein, and it is named as such. Many genes and proteins are named after they are found to do one thing, even though we generally associate them with something else, simply because of the context in which they were discovered.

Also, there are moonlighting proteins, which may simply perform different functions. A protein called APIS is part of the proteasome: a cellular protein shredder which is itself a rather large protein complex. APIS also plays a role in transcribing DNA to RNA: thus, it is part of a protein creation complex, and of a destruction complex. See this short paper on Moonlighting proteins.

Yes, evolutionary considerations always come into play: evolution is the lens through which we examine all biological phenomena. Evolution does cause certain proteins to be ‘multi-purposed’; also, some types of protein structures are more amenable to a certain set of functions than others. Furthermore, certain proteins are ‘promiscuous’: certain enzymes may work on more than a single substrate. (“Promiscuous” is different from “moonlighting”, where enzymes do completely different jobs; being “promiscuous” means a single enzyme does the same thing, but with different partners, e.g. catalyzing the destruction of a sugar, but with different types of sugar molecules.) Promiscuous enzymes can clearly show a ‘trajectory of evolution’, i.e. going from being very specific for one substrate, to non-specific for several substrates (or vice-versa). Promiscuity is a good example of molecular adaptation and tradeoff: a promiscuous enzyme means you have a jack-of-all-trades in your genomic complement, and you have to spend less energy on controlling the production of several different enzymes for several different tasks. However, the flipside of having a jack of all trades is that he is the master of none: the catalysis reactions are generally less efficient, which may cause problems for the cell/organism.

Phew, I hope I managed to convey some of the complexities of this issue, and how we try to deal with them in a systematic fashion.
[... edited out]

Cheers,

me”

The difference between moonlighting...

...and promiscuous


Khersonsky, O., Roodveldt, C., & Tawfik, D. S. (2006). Enzyme promiscuity: evolutionary and mechanistic aspects. Current Opinion in Chemical Biology, 10 (5), 498-508 PMID: 16939713

Jeffery, C. (2003). Moonlighting proteins: old proteins learning new tricks. Trends in Genetics, 19 (8), 415-417 DOI: 10.1016/S0168-9525(03)00167-7

Combrex: Computational Bridge to Experiments

May 4th, 2010 Comments off

Combrex is an exciting new project at Boston University to bridge computational and experimental techniques to functionally annotate proteins. They are hiring, see below:

JOB POST

We are seeking to hire a creative computational scientist for a
transformative project: COMBREX: A Computational Bridge to Experiments.

The work will involve building a novel resource that combines databases,
science, social networking and machine learning.

The position is available immediately.

For some preliminary information pls. see

www.combrex.org

BS or MS in Computer Science, Informatics, Engineering or related field is required.

Applicants with PhD’s would be considered for a separate Research Associate position.

Pls send CV and names (emails) of two references to:

Prof. Simon Kasif

kasif@bu.edu

Subject Line: COMBREX POSITION

Gene and protein annotation: it’s worse than you thought

December 14th, 2009 2 comments

Sequencing centers keep pumping large amounts of sequence data into the omics-sphere (will I get a New Worst omics Word Award for this?). There is no way we can annotate even a small fraction of those experimentally, and indeed most annotations are automatic, done bioinformatically. Typically function is inferred by homology: if the protein sequence is similar enough to that of a protein whose function has been determined, then homology is inferred: that is, the unknown and the known protein are descended from a common ancestor. Furthermore, functional identity to the known protein is inferred, the assumption being that the function did not change if the common ancestral protein is recent enough: that is, if the sequence identity is high enough. But there are problems: what is the threshold for determining not only homology, but functional identity? Even if two proteins are 95% identical in their amino-acid sequence, if the remaining 5% happen to include active site residues, these proteins may do completely different things. However, most new sequences are annotated just this way, with some variations.
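The 95% caveat is easy to see in a toy example. The sketch below is only an illustration: the two sequences are made up, they are assumed to be already aligned with no gaps, the 90% transfer threshold is an arbitrary choice, and the "active site" position is hypothetical. Real annotation pipelines use alignment tools such as BLAST rather than a position-by-position comparison.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def differing_positions(seq_a, seq_b):
    """0-based positions at which the two aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Made-up 20-residue sequences; they differ only at position 10 (H -> A).
known = "MKTAYIAKQRHISFVKSHFS"
query = "MKTAYIAKQRAISFVKSHFS"
ACTIVE_SITE = {10}        # hypothetical catalytic residue position in `known`
IDENTITY_THRESHOLD = 90.0  # arbitrary cutoff, assumed for illustration

identity = percent_identity(known, query)
active_site_changes = [p for p in differing_positions(known, query) if p in ACTIVE_SITE]

naive_transfer = identity >= IDENTITY_THRESHOLD
careful_transfer = naive_transfer and not active_site_changes
print(identity, naive_transfer, careful_transfer)
```

A purely identity-based rule would happily copy the annotation here, while a rule that also checks the (assumed) active site residue would not. That is the gap between inferring homology and inferring functional identity.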

Because of its volume, the veracity of the electronic annotation is rarely checked by experts. Also, the electronic annotations come from far and wide, with different annotation software using different databases to infer gene and protein function. This sets the stage for a huge game of Broken Telephone, where wrong annotations can propagate through many databases, accumulating errors. Imagine that we have an annotation program with a 90% accuracy rate. This means that given a query protein sequence and a “gold standard” 100% correct reference database, this program infers the query sequence’s correct function 90 out of every 100 times. For a typical bacterial genome of 5000 genes, this would mean that 500 genes are wrongly annotated. Let’s call our bacterium Bug1. Now we place those 500 wrong annotations (along with the 4500 correct ones) in the “definitive database” for this bacterium, called Bug1DB. Now this Bug1DB is used as a “gold standard”, and another genome is annotated, this time of Bug2. Let’s suppose, for argument’s sake, that the two genomes contain roughly the same homologous genes. Since every gene in Bug1 has a 10% probability of being wrongly re-annotated when transferred to Bug2, about 500 genes will be annotated wrongly in this second pass. Of those, 0.10 * 500 = 50 come from the original wrong 500 genes, which were wrong anyway (we assume that “two wrongs do not make a right”: mis-annotating an already incorrectly annotated gene does not revert it to a correct annotation). That leaves, on average, 500 - 50 = 450 genes from Bug1 that were correctly annotated the first time but are incorrectly annotated the second time. This means that Bug2 now has 500 + 450 = 950 mis-annotated genes. And this is through two filters of a Broken Telephone game using a highly accurate annotation program.
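The same back-of-the-envelope calculation can be written as a short Python sketch, which also makes it easy to see what a third or fourth round of Broken Telephone would do. It uses only the simplifying assumptions of the example above: every genome has the same genes, the accuracy is the same at every transfer, and a wrong annotation never becomes right again. The function name is made up for this illustration.

```python
def misannotated_after(n_genes, accuracy, rounds):
    """Expected number of wrong annotations after `rounds` annotation transfers.

    Assumes wrong annotations stay wrong and that each transfer mis-annotates
    a fraction (1 - accuracy) of the annotations that are still correct.
    """
    wrong = 0.0
    for _ in range(rounds):
        still_correct = n_genes - wrong
        wrong += still_correct * (1.0 - accuracy)
    return wrong


for rounds in (1, 2, 3):
    print(rounds, round(misannotated_after(5000, 0.90, rounds)))
```

One round gives the 500 wrong genes of Bug1 and two rounds give the 950 of Bug2. Equivalently, the expected fraction of correct annotations after k transfers is accuracy to the power k, so even a 90% accurate program leaves only about 73% of the annotations correct after three hops.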

The trouble is that a 90% accuracy rate is unrealistically optimistic. Also, having all 5000 genes in a genome annotated with some function (as opposed to simply “unknown”) is rather fanciful. So the mis-annotation problem is worse, even if transfer and re-annotation do not take place exactly as described. But just how much worse?

The question is answered in a rather disturbing study published in PLoS Computational Biology by Alexandra Schnoes and her colleagues in Patricia Babbitt’s group at the University of California, San Francisco. They used 37 experimentally characterized enzyme families to test different databases. They found a high level of misannotation, but also a highly variable one. For example, the manually curated SwissProt database had a very low level of errors. On the other hand, TrEMBL, which uses simple sequence similarity for annotations, had a high level. So did NR, the combined GenBank coding sequence translations+RefSeq Proteins+PDB+SwissProt+PIR+PRF; pretty much the default reference database against which biologists BLAST their sequences. They found that 40% of the genes they examined were mis-annotated in NR. They also went back in time, examining the misannotation fraction of their gold standard 37 families, and found that the fraction of misannotated genes has increased, from 15% in 1995 to 40% in 2005.

The change in misannotation over time in the NR database for the 37 families investigated. Sequences are plotted by the year when they were originally deposited in the database (x-axis). The number of sequences (left y-axis, bar graph) found to be correctly annotated is shown in green. The number of sequences found to be misannotated is shown in red. The bars for each year represent only the sequences deposited into the database in that year. The fraction (right y-axis, line plot) of sequences deposited each year into the NR database that were misannotated is given by the open nodes, connected by the black line to aid in visualizing the overall trend. This fraction represents the number of sequences in the 37 test families predicted to be misannotated divided by the total number of sequences deposited each year from the test set, i.e. the sum of the sequences depicted in the red and green bars for each year. (From: Schnoes AM, Brown SD, Dodevski I, Babbitt PC, 2009 Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. PLoS Comput Biol 5(12): e1000605. doi:10.1371/journal.pcbi.1000605)

There are also many ways to be wrong, as Schnoes and her colleagues have discovered. Overprediction is one, where proteins are annotated with functions that are more specific than the available evidence supports. 85% of misannotations were found to be overpredictions. Of the remaining 15%, about half were found to be missing important amino acid residues, which means that they could not carry out the functions with which they were annotated. The other half were simply not within the similarity threshold necessary to include them in one of the superfamilies examined.

By now you are wondering, who is validating the validators? That is, if Schnoes and her colleagues determine a single cutoff for inclusion in a protein family, they might also count falsely annotated proteins as correctly annotated (false positives), or reject correctly annotated proteins as mis-annotated (false negatives). To avoid that, they applied three different similarity thresholds to their 37 superfamilies, and examined which proteins the similarity searches attract. For the lowest of these thresholds, they deliberately allowed up to 5% false positives. This they called the “lenient threshold”, and they checked their results using all three thresholds. They found there was a slight increase, but no overall substantial change, in the discovered level of misannotation in the databases, even when lowering the bar to the lenient threshold.

So how bad is the level of misannotation in the databases? It depends on the protein superfamily they checked against, and on the database. Here is an excerpt from another figure, showing the misannotation of protein families in the haloacid dehalogenase (HAD) and amidohydrolase (AH) superfamilies of enzymes. Each rectangle represents a different database. The bar is the mean error in that database for that particular superfamily, and each colored circle is a protein family, placed at the level of average misannotation for that family. The circle size indicates the family size.

Percent misannotation in the families and superfamilies tested

Note that SwissProt fares very well, although it lacks some families (those with an “X” through the blank circle). For the HAD superfamily, we see an error of 60% in the three other heavily used databases, and for AH we see a 40% error. That is brutally high, and quite worrying. Other families fared little better when checked against those databases. Some went up to 80% and 100%(!).

So what can be done? Schnoes and her colleagues suggest several remedies. First, include “evidence codes” with the annotations. Those will let us know how each annotation is inferred, and thus how trustworthy it is. Additionally, avoid overprediction, which accounts for 85% of wrong annotations. Many protein functions are described too specifically, without enough evidence to support the annotation claim. Taking a step back and giving a more general description of the function would go a long way towards cleaning up the databases. The manually curated databases such as SwissProt did fare very well in their examination, but manual curation is not possible anymore with the post-genomic and metagenomic data deluge. Large databases have to clean up the mess pretty much the same way it was created: by automated means. Let’s hope it will happen soon enough. A 40% error rate in the database you are looking at can really put a damper on your analysis.


Schnoes, A., Brown, S., Dodevski, I., & Babbitt, P. (2009). Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies PLoS Computational Biology, 5 (12) DOI: 10.1371/journal.pcbi.1000605

A FLORA of Protein Structure to Protein Function

September 3rd, 2009 2 comments

Proteins are the machinery of life, and they facilitate most of life’s functions. Traffic into and out of the cell? Protein pumps, pores and channels. Respiration? Proteins. Metabolism and catabolism? Proteins. Immune system, signaling, development… all complex networks of interacting proteins. Understanding a protein’s structure can tell us a lot about how it performs its function. If we know what a protein does, we can look at its molecular workings, and generally figure out how it does it. Hemoglobin carries oxygen in most animals, something that has been known since 1840. However, it was only when Max Perutz solved its structure that the actual mechanism of oxygen binding and release was elucidated. Since Perutz and John Kendrew solved the first protein structures in the late 1950s, the structures of some 35,000 proteins have been solved.

Animation showing binding and release of oxygen molecules to hemoglobin

When we know the protein’s structure we know a lot about how it performs its function. That would be the equivalent of looking at a diagram of a car engine, and then exclaiming: “oh, so that’s how it works!” But the converse does not hold true. If we have the structure, we may not be able to infer the protein’s function. Imagine having the diagram of a new engine which you have never seen before. It might be a car engine, but which make and model? Or it might not be a car engine at all, but that of a lawnmower, a boat, or an electric generator. The point is, without knowing what the diagram represents, we would only have a general idea that we have a machine that burns some sort of fuel to power something.

We face the same problem with protein structures. It does happen that we solve the structure of a protein whose function is unknown. Oh. Kay. What now? We are stuck with a diagram for a machine whose purpose we do not know. Therefore, any kind of method we can devise to predict a protein’s function from its structure would be very helpful. Christine Orengo’s group at University College London, UK has been tackling this problem for quite a while. Her group has recently published a paper in PLoS Computational Biology where they describe an algorithm that can classify enzymes: a subgroup of proteins that catalyze chemical reactions. The classification algorithm works as follows:

1) They partitioned all enzymes of known function into functional subgroups, or FSGs. Within an FSG, all proteins have the same function. Two proteins from different FSGs will have different functions.

2) Next, they selected a set of conserved vectors from a given domain in a given FSG which, when compared against relatives of different functions/FSGs, would produce a low score. Conversely, when proteins from the same FSG are compared, they should have a significantly higher score. The vectors are measurements of distance and direction along the side chains of conserved amino acid residues. They found that this differentiating set of vectors is best obtained when the proteins are aligned within and between FSGs, and the vectors are taken from the conserved residues in the FSG alignments.

Graphical outline of FLORAMake algorithm. doi:10.1371/journal.pcbi.1000485.g002

3) Once they determined which vectors are more conserved within a given functional sub-group (FSG), they created a library of conserved vectors within each FSG, a sort of FSG bar-code. Although the construction is technically unsupervised, limiting the vectors to conserved residues within an FSG naturally lands them with lots of active site residues.

Having created the template library, they can now find vectors on test proteins, and scan those against the library of conserved vectors, using a simple similarity function. Although (or because) their method is quite simple, they achieve very high sensitivity and precision. The methods they compare against are all global structure aligners (such as CE and CATHEDRAL), and by virtue of simply adding spatial information of the conserved / functional residues they greatly improve the function annotation. The great thing about this work is the jump in improvement by adding this very simple, yet so far mostly neglected, attribute.
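Since there is no software to play with, here is a toy Python sketch of the general idea of scanning a query against a library of per-FSG vector templates. It is emphatically not the FLORA implementation: the 3D vectors are made up, the query vectors are assumed to be already matched one-to-one with the template vectors, and mean cosine similarity is swapped in as a stand-in for the paper’s “simple similarity function”, whose details are not given here. The names `FSG_A`, `FSG_B` and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two 3D vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def template_score(query_vectors, template_vectors):
    """Mean cosine similarity over corresponding query/template vector pairs."""
    pairs = list(zip(query_vectors, template_vectors))
    if not pairs:
        return 0.0
    return sum(cosine(q, t) for q, t in pairs) / len(pairs)


def best_fsg(query_vectors, library):
    """Score the query against every FSG template and return the best match."""
    scores = {fsg: template_score(query_vectors, template)
              for fsg, template in library.items()}
    return max(scores, key=scores.get), scores


# Hypothetical template library: two conserved vectors per functional subgroup.
library = {
    "FSG_A": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "FSG_B": [(0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)],
}
# Hypothetical query protein whose conserved-residue vectors resemble FSG_A.
query = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.0)]

label, scores = best_fsg(query, library)
print(label, {fsg: round(score, 3) for fsg, score in scores.items()})
```

The point of the sketch is only the shape of the computation: a small set of vectors per functional subgroup acts as the bar-code, and a new structure is assigned to whichever bar-code its own vectors match best. How to pick the vectors and how to match them up between proteins is where the real work in the paper lies.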

Unfortunately, no software yet. Too bad because…

[Image: “relevant to my interests”]


Redfern, O., Dessailly, B., Dallman, T., Sillitoe, I., & Orengo, C. (2009). FLORA: A Novel Method to Predict Protein Function from Structure in Diverse Superfamilies PLoS Computational Biology, 5 (8) DOI: 10.1371/journal.pcbi.1000485

Blood, sweat and spit

June 5th, 2009 2 comments

A short follow up to the previous post on latherin. A quick reminder: latherin is a protein that exists in the horse’s sweat and saliva. In the sweat, latherin acts like a detergent, wetting the horse’s coat to allow for better water evaporation and hence better cooling. In the saliva, it helps wet the horse’s dry feed, aiding digestion. It’s an interesting example of a protein performing different physiological functions in different contexts.

Widdowquinn made an interesting observation about our tendency to color a protein with a function of our liking. Quoting the comment: “Is the ‘function’ of latherin to aid heat transfer, or digestion, or both? Or does it make no sense to imbue the protein with any such ‘function’ – only to note that it is able to act in both these ways (potentially among others)”.

No sweat. Credit: ishkamina from Flickr

This observation is very true and is especially interesting with latherin. Latherin is part of a large group of evolutionarily related proteins containing a domain known as Bactericidal permeability-increasing protein (BPI) / Lipopolysaccharide-binding protein (LBP) / Cholesteryl ester transfer protein (CETP). The observed common function of all proteins that have this domain is that they bind fatty molecules (lipids) that constitute the cell membrane. The differences lie in the context of which lipids they bind and what happens as a consequence. For example, BPI proteins serve as potent antibacterial agents, binding lipopolysaccharide (LPS), a bacterial toxin expressed on the outer layer of the bacterial membrane. LPS causes a severe inflammatory response when in the blood stream, but BPI, secreted by our immune system, dampens down the response, and also kills the bacteria it binds. LBP also binds LPS, but acts as an alarm system, increasing the inflammatory response. Another family of similar proteins is found on the lung surface, and is also a line of defense against bacteria, by a similar mechanism. This is the Palate, Lung, and Nasal epithelium Carcinoma associated protein (PLUNC) family. The name is scary, but it was given in the context of their discovery, cancer research. PLUNC family proteins are used as cancer markers, but cancer has nothing to do with their primary function. BASE is another interesting and puzzling homolog, which may be a “dying gene”: expressed in a small quantity, but does not seem to be producing a viable well-folded protein product. It is expressed in mammary glands, and in saliva. Sounds familiar? Remember that mammary glands are evolutionarily modified sweat glands: BASE is also used clinically as a marker for breast cancer.

CETP has nothing to do with bacterial membranes; rather, it transports a precursor of cholesterol, which is a building block for animal membranes.

So we see here an interesting case of functional radiation: while the different proteins in the family maintain very similar sequences, their functions differ in physiological  context, and also in  organisms which express them (even bacteria use an LPS against other bacteria!), and the tissues that express them.

BPI and relatives on human chromosome 20

Moreover, with the exception of CETP, the genes coding for those proteins are clustered together in the same region in chromosome 20. This indicates that they are not only homologs (i.e. arose from a common ancestor) but paralogs: homologs that have arisen due to gene duplication. Gene duplication is a potent evolutionary mechanism for acquiring new functions: while the ancestral gene continues to do its work, the duplicate, which is redundant, has less selective pressure on it, and may adopt other functions. We also see a duplication in the same chromosomal neighborhood of the latherin gene in the horse, but only two, or possibly three neighboring paralogs.

Latherin and a neighboring homolog in horse chromosome 21. If you like playing around with the UCSC genome browser, try finding the third.

So we went from horse sweat, to bacterial defense, to a dying gene in human which is very much alive and lathering in horses. Ah, the zigzagging wonders of protein evolution! Was latherin initially a bacteria-killing  protein that just happened to work well as a sweat enhancer? Possibly, seeing that even bacteria have a BPI domain.  It might even still serve in a capacity as an immune defense protein, both in the sweat glands and in the salivary glands. Yes, we do color genes with the brush we happen to hold in our hand at the moment, but it seems like nature uses many different brushes, and it’s fun to try and find them all. (Does this metaphor even make sense?)

Disclaimer: genomic map pictures were generated using the fantastic UCSC Genome Browser. No horses or unicorns were harmed during the making of this post.


Bingle, C., & Craven, C. (2004). Meet the relatives: a family of BPI- and LBP-related proteins Trends in Immunology, 25 (2), 53-55 DOI: 10.1016/j.it.2003.11.007

Bingle, C. (2004). Phylogenetic and evolutionary analysis of the PLUNC gene family Protein Science, 13 (2), 422-430 DOI: 10.1110/ps.03332704

Glowing like a horse

June 3rd, 2009 4 comments

Dennis Mitchell: "Margaret, you are all sweaty"
Margaret Wade: "Dennis, girls don't sweat. Horses sweat,
boys perspire and girls glow"
Dennis Mitchell: "Margaret, you are glowing like a horse".
                              -- Dennis the Menace / Hank Ketcham

Horse sculpture, Louisville Kentucky. Credit: Atelier teee, Flickr.

Horses and humans sweat but most other mammals do not. Sweating lowers the body’s surface temperature by evaporating off the surface of the skin. The heat drawn by evaporation is removed from the surface, thereby cooling it. But as anyone who has been skiing in a poorly-ventilated jacket can tell you, this does not work well if the sweat is not allowed to evaporate. Indeed, most mammals have fur, which would trap the sweat, not allowing it to evaporate quickly. They use alternative cooling mechanisms, like evaporative cooling from the respiratory tract, or panting. The horse’s solution is to mix in its sweat a protein called latherin which acts as a surfactant. This means it lowers the surface tension of the water in the sweat, allowing it to wet the horse’s coat hairs better and evaporate faster. Latherin acts like its name suggests: it is basically a kind of naturally produced soap, and racehorses are known to lather up during a race.

Wild Horse Monument, Washington. Credit: ankeyd, Flickr

Horses are also known to foam at the bit. In an article published today in PLoS ONE, Rhona McDonald and her colleagues at the Universities of Glasgow and Manchester show that lathering up and foaming at the bit are two facets of the same phenomenon as latherin is also produced by the horse’s salivary glands. Horses’ food is unusually dry, and latherin in the salivary glands serves to make it nice and mushy, turning oats into oatmeal. Here is an interesting case of adaptation of the same protein to different functions: helping initial digestion, and helping the cooling mechanism, through the same biophysical principle of a surfactant agent.

Horse sculpture, Fountain Hills, Arizona. Credit: Dan Shouse, Flickr

We use artificial surfactants, such as soap, to clean ourselves. That includes toothpaste, which is mostly detergent, explaining the foam that we generate while brushing our teeth. It may also be that the latherin acts as a tooth cleaning agent for the horse: a third use. But that possibility is not mentioned in the paper. Maybe the authors did not want to look a gift horse in the mouth.

Update: this post has been selected as Blog Pick of the Month for June 2009 by EveryOne, PLoS-ONE’s community blog.



McDonald, R., Fleming, R., Beeley, J., Bovell, D., Lu, J., Zhao, X., Cooper, A., & Kennedy, M. (2009). Latherin: A Surfactant Protein of Horse Sweat and Saliva PLoS ONE, 4 (5) DOI: 10.1371/journal.pone.0005726