2010 Homology High-Low Count
Previously on our show: ‘Homology is Not a Quantitative Term’. Homology is a drop-in replacement for “common ancestry”. It does not make any sense to say “low common ancestry”, “high common ancestry”, “micro common ancestry”, or (egads!) “70% common ancestry”. You cannot be 70% homologous any more than you can be 70% pregnant.
Why am I harping on this again? Because the term “low homology” managed to sneak itself, of all places, into the title of a paper published in Bioinformatics. Ouch. Bioinformaticians should know better.
Just for kicks, I decided to look at how many papers published this year (January 1 through today) misuse these terms in their title or abstract. Here are the results:
- “high homology”: 134
- “low homology”: 13 (well, that’s low)
- “highly homologous”: 140
- “distant homologs”: 7
- “close homologs”: 7
- “percent homology”: 1
I could not find others such as “weak homologs” or “strong homologs”. Small mercies. Still, there is some work to do in removing bad habits.
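For anyone who wants to repeat the count, here is a minimal sketch of how the queries could be built. The post does not say which tool produced the numbers above, so this assumes PubMed searched through NCBI's E-utilities ESearch endpoint. The `count_url` helper and the `maxdate` default are hypothetical ("today" is not stated in the post), so substitute your own end date.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed data source; the post names no tool).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The phrases tallied in the list above.
PHRASES = [
    "high homology",
    "low homology",
    "highly homologous",
    "distant homologs",
    "close homologs",
    "percent homology",
]


def count_url(phrase, mindate="2010/01/01", maxdate="2010/12/31"):
    """Build an ESearch URL that asks only for the number of matching records.

    maxdate is a placeholder: the post says "through today" without a date.
    """
    params = {
        "db": "pubmed",
        # Quoted phrase, restricted to the title and abstract fields.
        "term": f'"{phrase}"[Title/Abstract]',
        # Filter on publication date.
        "datetype": "pdat",
        "mindate": mindate,
        "maxdate": maxdate,
        # Return only the hit count rather than a list of record IDs.
        "rettype": "count",
    }
    return f"{ESEARCH}?{urlencode(params)}"


for phrase in PHRASES:
    print(count_url(phrase))
```

Fetching each URL should return a small XML document with the hit count in a `Count` element. Counts drift as PubMed indexes more records, so do not expect to reproduce the figures above exactly.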
“Distant”, “Remote”, and “Close” are perfectly good adjectives to modify “homolog”. They express in qualitative terms how many generations back the common ancestor is. “Percent” is not, as one cannot be fractionally homologous in a single sequence, and there are other ways to talk about having multiple alleles.
I agree that terms like high, low, and (worst of all) percent homology make no sense, since they suggest that some homologs are more homologous than others. But like the first commenter, I cannot see anything wrong in talking about close or distant homologs, since these terms merely describe (in rough terms) the evolutionary distance that separates the two homologs without making any claims about close homologs being “more homologous” than distant homologs.
I have to agree with the first two commenters on the legitimacy of “close” and “distant” as modifiers for the word homolog. They are both perfectly well accepted terms within molecular evolution. “Homolog” refers to the fact that the sequences in question are indeed homologs, and “close”/“distant” refers to the evolutionary distance between them, or more generally between the taxa in question (unless we are in a paralog situation, of course).
Also, if I am not mistaken, any time I have heard someone use the term microhomology it wasn’t in the sense of “more or less”, which is, of course, wrong, as you point out. Rather, it was used in cases of short homologous regions due to recombination, domain swapping, etc. So in that case only a short region of the sequences in question was actually homologous, but the sequences as a whole were not.
Do you disagree with the use case of “high homology” where it refers to a syntenic region and not a single gene? E.g., “this region with 34 out of 55 genes retained as orthologs has a high (level of) homology”. I suspect that’s where at least a few of the “highly homologous” came from.
I was about to agree with the previous posters (After all, are there any 2 species on Earth that *don’t* have a common ancestor?) But then I read your previous article, which explained it much better. This article says “Homology is a drop-in for common ancestry”, which is the cause of the confusion.
A better way to say it would be “Homology means a particular feature _originated_ in a common ancestor”. So if frogs and mammals developed from a common ancestor with 4 legs, the _legs_ are probably homologous. But if the common ancestor had 2 legs, then that feature can’t be homologous.
The problem is one of mixing observations (% sequence similarity, or p-value) with conclusions (“I therefore conclude these sequences are homologous”). By using a verbal shortcut such as “distantly homologous”, you are mixing an observation with a conclusion. And that is dangerous. For one, you do get people saying “more homologous” and “less homologous” as a result. See http://www.ncbi.nlm.nih.gov/pubmed/19819561 and the 115 other abstracts saying “less homologous” in PubMed (although only one is from 2010). If you would like to describe phylogenetic distance in a semi-quantitative manner, just say “distantly related” or “closely related”. For comparative statements, “more similar” or “less similar” are good. When you mix a conclusion with an observation for the sake of a shortcut, you invite all sorts of weirdness.
See also: http://bytesizebio.net/index.php/2009/07/15/distant-homology-and-being-a-little-pregnant/
@Anonymous
“(After all, are there any 2 species on Earth that *don’t* have a common ancestor?)”
I was referring to an ancestral trait (body part, protein sequence, DNA sequence), not to an ancestral species. Sorry for the confusion.
@Iddo:
I still disagree. I agree that saying sequences are “distantly homologous” fits what you are describing, but at least in my area I don’t see that. What I do see is the term “distant homologs”. While the “distant” portion does, as you say, refer in some ways to the low percent identity of the homologs, it is meant to refer to the evolutionary distance of the taxa.
And those I see using it commonly, myself included, harp as frequently as you do on the horrible use of phrases like more homologous, less homologous, percent homology, etc. I just don’t think it is the same thing.
@Dan Gaston
“Distant homologs” and “distantly homologous”: I fail to see the difference, aside from semantics.
What’s wrong with “(these chromosomal regions…) have a high fraction of homologous genes.”?
@Iddo I should have been clearer; I realize I phrased it badly. They are only different semantically. What I meant is that I can see how people using the phrase “distantly homologous” could create confusion and lead others to think of degrees of homology.
My overall point, though, is that the phrase “distant homolog” is widely used in molecular evolution circles and is, I feel, quite legitimate. Homolog and homology are still being used as categorical statements. Either sequences are homologs or they are not. It would of course be more correct to say something like “evolutionarily distant homologs”, but given the field the phrase is being used in, the “evolutionarily” part is understood and implied.
Just came across this: “The misuse of terms in scientific literature.” published in Bioinformatics http://www.ncbi.nlm.nih.gov/pubmed/20696735