
Byte Size Biology

The musings and ravings of a computational biologist about science, computers, music and, you know, stuff



Absolut standards: report from the Metagenomics Metadata and Metaanalysis 2009 meeting. Part 1

By Iddo on July 20th, 2009


The first metagenomics, metadata and metaanalysis meeting, held in Stockholm on June 27, 2009, was a raging success. People were standing all the way back to the hall, jostling for elbow room, while all the other concurrent meetings were pitifully empty after word had gotten out about how awesome we were.

OK, I may be exaggerating slightly, since I was the meeting’s co-organizer, co-chair, program committee co-chair, and bartender. (If you were there and you don’t remember me tending bar then I must have done a good job). Well, maybe I wasn’t a bartender. Fine.

ISMB2009-M3_SIG

So what was the meta(genomics, data, analysis) meeting about?

I’ve talked about metagenomics in several earlier posts. Just in case you are new here: metagenomics is the study of genetic material that comes directly from the environment. It is a technique used to study genetic material from organisms (usually microbes) that cannot be cultured in a lab, and to get a picture of organisms in their natural environment, where they often differ from lab clones.

While in genomics we strive to obtain a full picture of an organism’s DNA, in metagenomics we sample the environment for whatever DNA we can get. We are actually merging population biology with genomics. While in population biology our basic unit of study is an organism, in metagenomics it is a DNA sequence. This presents many challenges: properly sampling the microbial habitat and extracting the DNA, understanding which organisms the DNA in the samples came from, gauging sample depth, assembling the sequences, identifying genes, assigning a biological function to those genes, to name a few.  There are many different experimental and computational procedures for doing so, and they should be meticulously documented, as Nikos Kyrpides from the Joint Genome Institute writes in this month’s Nature Biotechnology:

Like molecular biology, genomics has been fueled by the innovative energy
of many interdisciplinary activities. Unlike molecular biology, which has
thrived on the principle of standardized methods and protocols, genomics
has progressed without regard for the critical importance of shared
standards. Now, 14 years since the first complete genome was published
and with more than 900 genome sequences finished, it is astonishing to
observe the lack of standards for so many critical procedures in the
field, ranging from simple data exchange to gene finding, function
prediction and metabolic pathway description.

Now for the kick in the head:

As an example, we compared the genomes of two closely related organisms,
Burkholderia mallei ATCC 23344 (ref. 19) and Burkholderia pseudomallei
K96243
(each sequenced by a different sequencing center)
[...]
we identified 548 genes in B. mallei that are absent from B. pseudomallei
and are potentially related to their different lifestyles. Manual curation
of those 548 genes revealed that, in fact, 497 of them are also in
the B. pseudomallei genome, but there they had not been identified as
'real' genes. The reason for this discrepancy?

The two sequencing centers used different gene finding methods.
The consequence was an almost 90% error rate in the results of our
comparison.

Ouch. Ouch, ouch, ouch. And that is not an anecdotal example. Furthermore, it also applies to metagenomics, even more so, since many of the standard operating procedures (SOPs) in metagenomics are still in the process of inventing themselves.
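To see where the "almost 90%" comes from, here is a back-of-the-envelope check using only the two numbers in the quote. The variable and function names are mine, for illustration; this is not code from the paper.

```python
# Numbers taken from the Kyrpides quote above; the names are illustrative.
reported_unique = 548       # genes called in B. mallei but "absent" from B. pseudomallei
found_after_curation = 497  # of those, actually present in B. pseudomallei


def spurious_fraction(reported, recovered):
    """Fraction of 'unique' genes that were really present in the other genome."""
    return recovered / reported


truly_unique = reported_unique - found_after_curation
error_rate = spurious_fraction(reported_unique, found_after_curation)

print(f"genes that are really unique: {truly_unique}")
print(f"spurious 'unique' calls: {error_rate:.1%}")
```

So of 548 apparent differences, only 51 survive manual curation. The rest are artifacts of running two different gene finders.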

Metadata is the “data about the data”: all the habitat data, SOPs and abiotic data that is in dire need of the standardization Kyrpides writes about.

Last, metaanalysis would be the analysis of genomes and metagenomes. Since the M3 meeting was held under the auspices of the International Society for Computational Biology, it attracted mainly computational biologists — the type to analyze, rather than sample and sequence (but the differences are rapidly blurring, as we saw in many talks).

But things are actually looking better for standards. In 2005 the Genomic Standards Consortium (GSC) was formed to address this problem. Renzo Kottmann from the Max Planck Institute for Marine Microbiology in Bremen, Germany, talked about software development within the GSC, and specifically about his own project: the Genomic Contextual Data Markup Language, or GCDML. GCDML is an XML-based standard for describing everything associated with a genomic or metagenomic sample: where it was taken from, under what conditions, and which protocols were used to extract, sequence, assemble, finish and analyze the metagenome. Again, my own personal bias here: I am a heavy user of GCDML, as I am writing my own data-insertion software, and have headed such an effort for a while at the University of California San Diego. Here are Kottmann’s slides, and you can also read more about GCDML.

Software Development by the Genomics Standards Consortium

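To give a feel for what "an XML-based standard for describing a sample" means in practice, here is a minimal sketch that writes sample metadata as XML with Python's standard library. Big caveat: the element names (`sampleReport`, `origin`, `protocols` and so on) and all the values are made up for illustration. They are not the real GCDML schema, which you should take from the GCDML documentation itself.

```python
import xml.etree.ElementTree as ET


def build_sample_record(sample_id, latitude, longitude, collected_on, protocols):
    """Build a toy XML record. Element names are hypothetical, NOT GCDML."""
    root = ET.Element("sampleReport", attrib={"id": sample_id})

    # Where and when the sample was taken.
    origin = ET.SubElement(root, "origin")
    ET.SubElement(origin, "latitude").text = str(latitude)
    ET.SubElement(origin, "longitude").text = str(longitude)
    ET.SubElement(origin, "collectionDate").text = collected_on

    # Which protocol was used at each processing step.
    steps = ET.SubElement(root, "protocols")
    for step, description in protocols.items():
        ET.SubElement(steps, "protocol", attrib={"step": step}).text = description
    return root


# All values below are placeholders, not real sample data.
record = build_sample_record(
    sample_id="example-001",
    latitude=50.25,
    longitude=-4.22,
    collected_on="2009-01-15",
    protocols={
        "extraction": "hypothetical kit A",
        "sequencing": "hypothetical platform B",
        "assembly": "hypothetical assembler C",
    },
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The point of a shared schema is that the same record can be validated and parsed by anyone's software, so the "which gene finder did you use?" question from the Kyrpides example is answered in the file itself.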

Daniel Richter talked about the functional annotation of metagenomes (that is, assigning biological functions to genes) using the Gene Ontology, a technique he developed with Daniel Huson at the University of Tuebingen, Germany. The Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. It is composed of a vocabulary of some 27,000 terms, with hierarchical relationships defined between them, from the general (“catalytic activity”) to the specific (“phosphatase activity”) to the more specific (“tyrosine phosphatase activity”). (Graph theory prudes: GO is a DAG, not really a hierarchy, I know, I know). Richter assigns functions to sequences hypothesized to be genes using the Lowest Common Ancestor (LCA) approach. LCA works as follows: once a high enough similarity is found between a sequence from a metagenome and a sequence from a reference database, LCA looks for similarities to other, related sequences whose similarity score is above a certain threshold. It then uses GO to assign the most specific general function that fits all of them.
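Here is a toy sketch of the "general function that fits all" idea on a GO-like DAG. This is my own illustration, not MEGAN's implementation: the three activity terms come from the examples above, while the root term, the extra "kinase activity" term and all the edges are assumptions made up for the demo.

```python
# Toy is_a graph: term -> set of parent terms. Edges are illustrative only,
# not copied from the real Gene Ontology.
PARENTS = {
    "molecular function": set(),
    "catalytic activity": {"molecular function"},
    "phosphatase activity": {"catalytic activity"},
    "tyrosine phosphatase activity": {"phosphatase activity"},
    "kinase activity": {"catalytic activity"},
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def depth(term):
    """Length of the longest path from the term up to a root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def most_specific_common_terms(terms):
    """Deepest term(s) that every input term is, or descends from.

    Assumes a non-empty list. In a DAG there can be more than one answer,
    so a sorted list is returned.
    """
    common = set.intersection(*(ancestors(t) for t in terms))
    deepest = max(depth(t) for t in common)
    return sorted(t for t in common if depth(t) == deepest)


# Terms attached to the database hits that passed the similarity threshold.
hits = ["tyrosine phosphatase activity", "kinase activity"]
print(most_specific_common_terms(hits))
```

When the hits agree closely, the assigned term stays specific. When they disagree, the assignment backs off to something more general, which is the conservative behavior you want for short, noisy metagenomic reads.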

Functional Metagenome Analysis using Gene Ontology (MEGAN 4)


Jack Gilbert from Plymouth Marine Laboratory, Plymouth, UK, talked about a year of sampling the marine microbiome in the Western English Channel. He went through many different sampling and normalization problems.

A Year In the Western English Channel


Tom Matthews from the National Microbiology Laboratory in Canada talked about a profiling pipeline for pathogens, aimed at fast typing of pathogens in case of an outbreak.

Pathogen Profiling Pipeline


There were more presentations, but I think I’ll give it a rest and get back to them in part 2. I am also waiting for some people to upload their slides…. you know who you are!


Kyrpides, N. (2009). Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream. Nature Biotechnology, 27(7), 627-632. DOI: 10.1038/nbt.1552

Kottmann, R., Gray, T., Murphy, S., Kagan, L., Kravitz, S., Lombardot, T., Field, D., & Glöckner, F. (2008). A Standard MIGS/MIMS Compliant XML Schema: Toward the Development of the Genomic Contextual Data Markup Language (GCDML). OMICS: A Journal of Integrative Biology, 12(2), 115-121. DOI: 10.1089/omi.2008.0A10

Categorized under: Bioinformatics, Evolution, Genomics, Microbiology.
Tagged with: genomics, metadata, metagenomics, microbiology, standards.


Creative Commons License
Byte Size Biology is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

 

Copyright © 2025 Byte Size Biology