Posts Tagged ‘homology’

An Ontology for Biological Similarities

September 23rd, 2009

I have griped here twice about the abuse of the term homology in biology. And to quote the Bellman in The Hunting of the Snark: “What I tell you three times is true.”

But while I gripe, someone is actually doing something about the whole terminology muddle. Specifically, Marc Robinson-Rechavi and his group at the University of Lausanne have created an ontology for describing the “relation between biological objects which resemble or are related to each other sufficiently to warrant a comparison”.

An ontology is a formal representation of concepts and the relationships between them. It is usually hierarchical, with the terms going from the general to the specific. You may be familiar with the Gene Ontology as a standard representation of the different functions of genes.

Example of the Biological Process ontology in the Gene Ontology

Marc’s group is creating an ontology for describing biological similarities in a hierarchical fashion, going from the general to the specific. At the top they have “similarity”. The four terms under that are “homology”, “homoplasy”, “functional equivalence” and “homocracy”.

Homocracy is a term suggested in 2003 by Claus Nielsen and Pedro Martinez for describing organs/structures which are organised through the expression of identical patterning genes. The rationale is that many homologous organs may be homocratic, but some homocratic organs may not be homologous. Homoplasy means similarity due to convergent evolution rather than common ancestry. The dorsal fins of a tuna and a dolphin are homoplasic, but not homologous. However, a dolphin’s front flippers are homologous to our arms, both being descended from the forelimbs of the common ancestor of humans and dolphins.
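For readers who think in code, here is a minimal sketch of how the top of such a hierarchy might be represented: each term points to its parent term(s), and a term’s ancestors are found by walking up the graph. The names `HOM_PARENTS`, `ancestors` and `is_a` are hypothetical, and only the top-level terms mentioned in this post are included; the real HOM ontology uses term identifiers and many more terms and relations.

```python
# Hypothetical, simplified encoding of the top of the HOM ontology as described
# in this post. Each term maps to a list of parents, because ontologies such as
# GO and HOM are directed acyclic graphs in which a term may have several parents.
HOM_PARENTS: dict[str, list[str]] = {
    "similarity": [],
    "homology": ["similarity"],
    "homoplasy": ["similarity"],
    "functional equivalence": ["similarity"],
    "homocracy": ["similarity"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable by walking up from `term` (excluding itself)."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_a(term: str, general: str, parents: dict[str, list[str]]) -> bool:
    """True if `term` is `general` itself or one of its more specific descendants."""
    return term == general or general in ancestors(term, parents)


print(ancestors("homocracy", HOM_PARENTS))          # {'similarity'}
print(is_a("homoplasy", "homology", HOM_PARENTS))   # False
```

The toy `is_a` check makes the point of the ontology explicit: homoplasy and homology are siblings under similarity, so neither is a kind of the other.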

The deepest annotated branch is homology, and going into the whole thing here would be long and arduous. But it is a very well-crafted ontology. You can play around with the HOM ontology to see more of the terms, and also see their annotations at the OBO Foundry.

Top terms of the HOM ontology. You can explore more on http://keg.cs.uvic.ca/ncbo/flexviz/FlexoViz.html#

Now, if someone could sort the terminology muddle between the different dialects of the English language…

Peter (watching Cricket on British TV): What the hell is he talking about?
Englishman: Oh, it’s Cricket. Marvelous game, really. You see, the bowler hurls the ball toward the batter who tries to play away a fine leg. He endeavors to score by dashing between the creases, provided the wicket keeper hasn’t whipped his bails off, of course.
Peter: Anybody get that?
Cleveland: The only British idiom I know is that “fag” means “cigarette.”
Peter: Well, someone tell this “cigarette” to shut up.

Source: TV Guide, courtesy of Fox

“Micro homology”. Wut?

September 16th, 2009

I ranted in a previous post about the use of homology as a quantitative term, rather than a qualitative one. Ben Blackburne commented on that post, introducing me to “micro homology”, a term I did not know existed. I ignored its existence until yesterday, when I heard it used in a talk, which sort of rubbed me the wrong way. Going back to my office to chill, I discovered there are 152 papers indexed in PubMed that use that term in their abstract or title. Not a good way to chill… here we go again: misusing “homology” by overselling it.

Apparently microhomology is used to indicate identity between short nucleotide sequences in two non-complementary DNA strands. This identity may facilitate strand annealing at chromosomal breakpoints, as in the proposed Microhomology-Mediated Break-Induced Replication mechanism, or in microhomology-mediated end joining during DNA repair. There should be a term for this phenomenon, but why use “microhomology”? The use of “homology” implies that the short identical sequences originated from a common ancestor. “Micro” would then mean a short region of otherwise homologous sequences.

The term is possibly derived from “homologous recombination”, where, indeed, homologous sequences are involved. But in the microhomology case, it may not be so. Also, even if the identity is between short subsequences of otherwise homologous sequences, “microhomology” is a confusing term, as it implies a quantitative relationship. Why not simply use “microidentity” as a drop-in replacement? (Heh: non-homologous replacement.)
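Why insist that short identical stretches do not, by themselves, imply common ancestry? A back-of-the-envelope sketch shows that they turn up by chance alone. Assuming two unrelated random DNA sequences with independent, equally frequent bases (a simplification; real genomes are not like that), the expected number of position pairs sharing an identical stretch of length k is (m − k + 1)(n − k + 1)/4^k. The helper names `shared_kmers` and `expected_chance_matches` are hypothetical, for illustration only.

```python
def shared_kmers(a: str, b: str, k: int) -> set[str]:
    """All length-k substrings that occur in both sequences (exact identity only)."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return kmers_a & kmers_b


def expected_chance_matches(m: int, n: int, k: int) -> float:
    """Expected number of (position in a, position in b) pairs whose length-k
    stretches are identical, for two random sequences of lengths m and n with
    independent, uniformly distributed bases (probability 1/4 each)."""
    return max(m - k + 1, 0) * max(n - k + 1, 0) * 0.25 ** k


print(sorted(shared_kmers("ACGTAC", "TTACGA", 3)))  # ['ACG', 'TAC']
print(expected_chance_matches(100, 100, 5))         # 9.0
```

Under these assumptions, two unrelated 100-base sequences are expected to have about nine matching 5-base position pairs purely by chance. Identity is an observation; on its own it says nothing about common ancestry.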

Of course nothing will change, since I am too late in the game, no one listens to me anyway and I do not see the six readers of this blog rallying to eradicate microhomology.

No I am not bitter. Mild and bitter perhaps, but only after 5 o’clock.

Categories: Biology, Evolution, genetics

Distant homology and being a little pregnant

July 15th, 2009

(Thanks to F.B. for the inspiration.)

Sigh… people don’t seem to learn. It’s been almost 22 years (yikes!) since a distinguished group of scientists published a letter in Cell calling for a responsible use of the word “homology”. If you were born when that letter was published, then in the US you can already drink legally. And you may very well want to, by the time you finish reading this post.

As of today there are one hundred and sixty-seven articles listed in PubMed with the phrases “distant homology” or “remote homology” in either the title or the abstract.

Please: make it stop.

Homology is a qualitative term. It means having a common evolutionary origin. Two genes / proteins / organs are either homologous, or they are not. They cannot be “somewhat homologous” or “partially homologous” or (a favorite among molecular and structural biologists) “distantly / remotely homologous”.

Homology is inferred from similarity. Similarity is quantitative. If organs are sufficiently similar, like mammalian forelimbs, then they are considered to be homologous. They may be more similar (like the hands of humans and chimpanzees), or less similar (like a human hand and a bat wing). Nevertheless, once they pass a certain similarity threshold, homology is inferred. The same applies to sequences of proteins and nucleic acids. Similarity can be measured, and different degrees of similarity can be compared and scaled.

If two protein sequences are aligned, and 40% of the amino acids in the alignment are identical, then the two sequences have 40% identity. They do not have 40% homology. They are homologous, and the homology is inferred from the similarity. We observe that the two sequences are similar, and then we conclude that they are homologous. We use the sequence similarity, as measured by percent identity, to trace a line of common descent for those proteins we deem homologous.
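To make the split between observation and conclusion concrete, here is a minimal sketch: percent identity is computed from an alignment and comes out as a number, while the homology call is a separate yes/no decision. Two assumptions to flag: the sequences are already aligned (with '-' marking gaps), and identities are divided by the full alignment length, gap columns included; other tools use other denominators. The function names and the `threshold` value are hypothetical placeholders, not a recommended cut-off.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Observation: percentage of alignment columns holding identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def inferred_homologous(similarity: float, threshold: float) -> bool:
    """Conclusion: a yes/no call. There is no 'partially homologous' return value."""
    return similarity >= threshold


pid = percent_identity("MKTAYIAKQR", "MKSAYLAKQR")
print(pid)                                        # 80.0 (80% identity, not "80% homology")
print(inferred_homologous(pid, threshold=30.0))   # True
```

The return types carry the argument: similarity is a float, homology is a bool.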

(As an aside, I should say that the percentage of sequence identity, or %ID, is not a very good measure for inferring homology, nor for measuring similarity. It is easy to use, but it is very coarse and prone to errors. There are many better measures out there, including statistical ones like e-values and p-values, or information-theoretic ones like bit scores. But I digress, and this is a matter for another post.)
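For the curious, here is a minimal sketch of how two of those measures relate in the Karlin-Altschul statistics used by BLAST-style searches: a raw alignment score S is normalised into a bit score S' = (λS − ln K)/ln 2, and the e-value, the number of hits at least that good expected by chance, is E = m·n·2^(−S'). The parameters λ and K depend on the scoring matrix and gap penalties, so they are passed in rather than hard-coded; real tools also correct the lengths m and n for edge effects, which this sketch ignores.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalise a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(bits: float, m: int, n: int) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S').

    m and n are the query and database lengths (raw, not edge-corrected, here).
    """
    return m * n * 2.0 ** (-bits)


print(evalue(10.0, 32, 32))   # 1.0: one such hit expected by chance alone
```

Note that the e-value grows with the size of the search space (m·n), so the same score means less when searched against a larger database; %ID alone cannot capture that.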

But once we confuse observations with conclusions, things quickly become an impossible muddle.

Am I not just picking nits here? I mean, surely when the term “distant homology” comes up in a paper or in conversation, we all know the meaning. Distant homology means having a common evolutionary origin, but with a common ancestor that was around a long time ago. “Distant homology” is intuitive, brief, yet understandable. It is less cumbersome than “homologous, with a distant common ancestor, as concluded from a low yet statistically significant similarity”, which is what we really should say if we properly separate observations from conclusions, as Captain Nitpick would have us do.

Allow me to answer with two examples. First, I have read several papers discussing “structural homology” in the context of protein structure. Those papers were actually using “structural homology” as a verbal shortcut for homology inferred from structural similarity. That is, they inferred common descent from protein structural similarity. This kind of inference is highly contentious and, while not necessarily wrong, must be done with great care and proper caveats. However, once the researchers rolled observations and conclusions together in the “structural homology” shortcut, they absolved themselves of the need to convince the reader that structural similarity is indeed a good measure of homology, and jumped directly to the conclusion that there is a homology here. The framework for inferring homology from sequence similarity is well worked out, but not so for structure, yet. Therefore, even if we do use the verbal shortcut “distant homology”, we can only justify it because a well-established measure of similarity, such as sequence similarity, already backs the inference. Where no such measure is established, as with structural similarity, the shortcut lets us skip the proper scientific process of presenting convincing observations before drawing conclusions.

Second, and even worse, is the use of the term “functional homology”. This is a clear case of the word homology used as a drop-in synonym for similarity. The misnomer “functional homology” is typically used in studies where proteins that are clearly not homologous perform similar functions. Why infer evolutionary descent when clearly that was not intended in the first place? Well, once you start confusing similarity with homology, and observations with conclusions, and treat them as synonyms, this is what happens.

So don’t even start this confusion. Separate observations from conclusions, and make the former support the latter. Homology is qualitative, similarity is quantitative. Genes cannot be distantly homologous any more than a woman can be a little pregnant.

Now you can have that drink. Unless you are a little pregnant.


Gerald R. Reeck, Christoph de Haën, David C. Teller, Russell F. Doolittle, Walter M. Fitch, Richard E. Dickerson, Pierre Chambon, Andrew D. McLachlan, Emanuel Margoliash, Thomas H. Jukes and Emile Zuckerkandl (1987). “Homology” in proteins and nucleic acids: a terminology muddle and a way out of it. Cell, 50(5). DOI: 10.1016/0092-8674(87)90322-9