Metagenomes as a diagnostic tool?

By Iddo on February 15th, 2009

Sorcerer II, the research yacht used to collect the Global Ocean Survey data

Can we learn about an environment by looking at the bacteria living in it? Can we sequence a metagenome, and then say: “according to the active genes in this water sample it appears to be too rich in metal ions / sewage products / other pollutants”? In the foreseeable future, could we sequence a sample of our bodies’ bacteria to reveal an imminent yet currently hidden disease? To answer these questions we first need to understand why using microbial sequences as a clinical or environmental diagnostic tool would be a good idea.

It seems that every time we check, there is a faster, cheaper sequencer out there. While I am writing this paragraph, some 1,000,000 base pairs could have been sequenced using next generation sequencing techniques. That’s enough information for 1,000 genes (if we sequenced a coding region, that is, and your mileage may vary by a lot). With this kind of technology we can now obtain good coverage of the DNA sequences in a clinical or environmental sample. Now suppose that we find a typical set of microbial genomes in human feces that is correlated with colon cancer or irritable bowel syndrome: given the low cost of sequencing, sequencing such samples could be a viable alternative to other, more expensive or invasive diagnostic procedures. The same can be true for environmental conditions: sewage rich in heavy metals could ostensibly be enriched for a certain type of bacteria, or for certain metabolic pathways: say, ABC transporters for metal removal and/or metal-sequestering protein complexes.

The idea of using bacteria as indicators is not new: the presence of fecal coliforms has been used for decades to check for water and food contamination. Now, however, a more complex genomic picture can be obtained. The question is, does a complex genomic picture correlate with the environmental attributes well enough for us to use it as an indicator? And if so, how can we do that?

Also, the study of the effects of the environment on microbes is as old as microbiology itself. Anton van Leeuwenhoek noted that the “animalcules” he scraped from his mouth and viewed under his microscope were gone, or immobile, after he drank hot coffee. Leeuwenhoek also realized that the “animalcules” were attracted to “corrupted flesh and bones”, thus being the first to describe habitat selection and colonization by microbes. He did not, however, make the larger leap of imagination to realize that the animalcules were the ones responsible for that corruption.

A collaborative study between several groups in the UK, US, Germany and Denmark that was published last week in the Proceedings of the National Academy of Sciences may very well pave the way for the study of the relationship between a metagenome and environmental attributes, and thus for using environmentally sampled genomes to characterize the environment from which they were taken. The lead authors are Tara A. Gianoulis from Mark Gerstein’s lab at Yale and Jeroen Raes from Peer Bork’s lab at EMBL. They performed correlation analyses of different degrees of complexity to define the relationships between the metagenome of marine bacteria and the environment they were in. Previous studies have addressed this question, using both metagenomic and genomic data. However, most of those studies were concerned with correlations between pairs of variables. Typical questions concerned the correlation between photosynthetic genes and water depth, between water salinity and kinds of enzymes and, as we have seen, between obesity and metabolic enzymes. Moreover, most of these studies were driven by a very specific biological question or set of questions.

Gianoulis, Raes and their colleagues took a more comprehensive approach. They decided to look for correlations between several environmental attributes (temperature, sample depth, water depth, salinity and monthly average chlorophyll level) and the presence of genes for certain metabolic pathways. They then looked for “many to many” correlations instead of “one to one” correlations, expecting to find a meaningful metagenomic characterization of the environmental sample. The metagenomic data set that they chose is one of the largest and best documented sets available. The Global Ocean Survey metagenomic data set was taken from 58 sites around the globe, mostly marine, boasting a total of over six million predicted protein sequences. The many to many correlation technique they used was Canonical Correlation Analysis (CCA), a multivariate statistical tool that takes two sets of variables (here, environmental attributes and pathway abundances) and finds the weighted combinations of each set that are most strongly correlated with each other.
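For readers who want to see what a “many to many” correlation looks like in practice, here is a minimal sketch of CCA in Python with NumPy. It is not the authors’ pipeline: the data are synthetic, the function name `cca` and the variables `env` and `pathways` are made up for illustration, and the QR-plus-SVD formulation is just one common way to compute canonical correlations. It assumes more samples than variables and no perfectly redundant columns.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlations and weight vectors for two data blocks.

    X has shape (n_samples, p) and Y has shape (n_samples, q). Both are
    assumed to have full column rank and more rows than columns.
    """
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two blocks.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Map the singular vectors back to weights on the original variables.
    A = np.linalg.solve(Rx, U)
    B = np.linalg.solve(Ry, Vt.T)
    # Clip to [0, 1] to guard against floating-point overshoot.
    return np.clip(s, 0.0, 1.0), A, B


# Synthetic stand-in data (NOT the Global Ocean Survey measurements).
rng = np.random.default_rng(0)
n_sites = 58
gradient = rng.normal(size=n_sites)  # hidden factor shared by both blocks

# Three made-up "environmental attributes"; only the first tracks the gradient.
env = np.column_stack([
    gradient + 0.1 * rng.normal(size=n_sites),
    rng.normal(size=n_sites),
    rng.normal(size=n_sites),
])

# Four made-up "pathway abundances"; only the first tracks the gradient.
pathways = np.column_stack([
    -gradient + 0.1 * rng.normal(size=n_sites),
    rng.normal(size=n_sites),
    rng.normal(size=n_sites),
    rng.normal(size=n_sites),
])

rho, A, B = cca(env, pathways)
print("canonical correlations:", np.round(rho, 3))
print("weights on env for the first pair:", np.round(A[:, 0], 3))
```

The first entry of `rho` is the strongest correlation achievable between any weighted mix of the environmental columns and any weighted mix of the pathway columns. The columns of `A` and `B` say which variables drive each pair, and that is the part one would try to interpret biologically.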

Sample sites during the GOS expedition (route of the Sorcerer II runs from east to west).

Gianoulis and Raes found correlations between the genomic potential of the samples and the environmental conditions. In water samples that were nutrient-poor (as indicated by chlorophyll content) there were more pathways that had to do with amino acid synthesis than in samples from nutrient-rich water. This is because bacteria in nutrient-poor water are selected for the ability to synthesize amino acids that are otherwise unavailable. Interestingly, the amino acid pathways that vary with the environment do not vary with the energy necessary to synthesize a given amino acid. Rather, it seems that those pathways require more metal, in cofactors, for the synthesis process. The authors speculate that the energetic cost of importing trace metals is the rate-limiting step, rather than the production of amino acids.

Back to where we started: can the multiple covariations found between the metabolic pathways in the sequence data and the environmental data be used to learn from the metagenome about the environment? In my opinion, not quite yet. The absence of pathways may be due to undersampling, or the correlations observed might be with other environmental factors not listed in this study. But it is a great start, and I believe that many other studies will adopt CCA or similar approaches for analyzing the relationships between environmental factors and the genomes adapted to those environments. It would be interesting to see CCA applied to host-symbiont (pathogen or commensal) studies, where environmental factors can be better controlled, especially in model animals.

T. A. Gianoulis, J. Raes, P. V. Patel, R. Bjornson, J. O. Korbel, I. Letunic, T. Yamada, A. Paccanaro, L. J. Jensen, M. Snyder, P. Bork, M. B. Gerstein (2009). Quantifying environmental adaptation of metabolic pathways in metagenomics. Proceedings of the National Academy of Sciences, 106 (5), 1374-1379. DOI: 10.1073/pnas.0808022106
