Posts Tagged ‘microbiology’

When is it a good idea to cheat?

August 27th, 2010 1 comment

I have written before about bacterial cooperation, and how cheating works, up to a point, in an environment of bacterial cooperation. That post talked about bacterial quorum sensing, the collective signaling mechanism by which bacteria construct supra-cellular structures called biofilms. Biofilms are tough multicellular enclosures that allow bacteria to survive and thrive in hostile environments, and to invade host species. Both studies discussed there showed that freeloading does not pay off. Bacteria that do not chip in to build the biofilm, yet benefit from it, are ultimately doomed, and sometimes doom the collective of which they are constituents. That post dealt with the “here and now” aspect of cooperation and cheating.

Life cycle of M. xanthus. Credit: Carla canales / citizendium.org

This post deals with another aspect of bacterial cooperation: how does it evolve? Why cooperate in the first place? Every time an individual cooperates, short-term gains are sacrificed for long-term ones, but those long-term gains are contingent upon all or most cooperating individuals doing their bit. Think about standing in line for the bus. If everyone cooperates, we get on the bus faster, but some of us may be forced to stand. On the other hand, shoving your way to the front of the line will assure you a good seat, albeit at the expense of glares from your fellow passengers, and maybe a few altercations along the way. In evolutionary terms, selfishness seems like a sounder strategy than cooperating. After all, if you manage to gain a better position for yourself in life’s pecking order, you pass the genes that enable that to your progeny, and further down the line. Why cooperate or act selflessly in the first place? Why let someone else share the gene pool with you when you can have it all to yourself?
Unless that “someone else” shared genes with you: that is, they were related in some way. Suddenly, cooperation seems to have evolutionary benefits: you are preserving and passing on some of the same genes. Protecting kin is the most often-used explanation for how cooperation evolved in the first place: kin selection, meaning favoring cooperation with those individuals with whom you share a larger number of genes over those with whom you share fewer. Evolutionary biologists use Hamilton’s rule as a guideline: the higher the benefit of the cooperation, the lower the cost, and the closer the relatedness of the individuals cooperating, the more likely it is that there will be cooperation. Putting it into an equation, cooperation will evolve if the following condition is met:

rb - c > 0

Where r is the relatedness (on a scale of 0 to 1, where 0 means no genetic relation and 1 means self), b is the benefit of cooperation and c is the cost. This rule, formulated by William Donald Hamilton, is a centerpiece of evolutionary biology. Imagine going on a day’s hunt where the quarry is a large animal that can feed one hunter for 35 days, but requires at least five hunters to take it down. Now there are also smaller animals around that can be hunted by one individual, and they supply enough food for one hunter for two days. Is it beneficial to hunt alone or together? Let’s figure the benefits and costs. For hunting the large animal, the one that requires at least five hunters, the benefit is a week of food each (b = 35/5 = 7) while the cost is one day of hunting (c = 1). If the individuals are cousins sharing an average of 20% of their genetic material (a round number for illustration; first cousins actually share about 12.5%), then:

0.2×7 – 1 = 0.4 is the benefit score

If they are siblings, sharing 50% of the genetic material, then:

0.5×7 – 1 = 2.5 the benefit score is even higher

But what about individual hunting? The benefit of the smaller quarry is 2 days’ worth of food, the cost is one day of hunting, and you do it alone. So r=1 (yourself), b=2 and c=1, giving us:

1×2-1 = 1

In this hypothetical model, a group of siblings will cooperate to hunt big game, while cousins would probably hunt smaller game alone. If you want to dig deeper into how Hamilton’s rule was derived, and further implications of the rule, I recommend this post.
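The arithmetic of the hunting example can be reproduced in a few lines of Python. This is only a sketch of the post's toy model: the function name `hamilton_score` and the dictionary labels are made up here for illustration, and comparing the scores against each other follows the post's informal reading rather than any formal treatment of Hamilton's rule.

```python
def hamilton_score(r: float, b: float, c: float) -> float:
    """Left-hand side of Hamilton's rule, r*b - c.

    Cooperation is favored when the returned value is positive.
    """
    return r * b - c


# Numbers taken from the hunting example in the text.
BIG_GAME_DAYS = 35                         # days of food from the large animal
HUNTERS_NEEDED = 5                         # hunters required to take it down
b_group = BIG_GAME_DAYS / HUNTERS_NEEDED   # 7 days of food per hunter
c_hunt = 1                                 # one day spent hunting

# Labels are illustrative; r values are the ones used in the text.
scores = {
    "cousins, big game (r=0.2)": hamilton_score(0.2, b_group, c_hunt),
    "siblings, big game (r=0.5)": hamilton_score(0.5, b_group, c_hunt),
    "alone, small game (r=1)": hamilton_score(1.0, 2, c_hunt),
}

for label, score in scores.items():
    print(f"{label}: {score:.1f}")
```

Changing `r`, `BIG_GAME_DAYS` or `HUNTERS_NEEDED` shows how quickly the balance between group and solo hunting shifts in this toy model.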

Any mechanism in evolution is examined through the lens of fitness. Fitness is the relative ability to produce and support viable progeny. So if cooperation increases fitness, we can use the following simple graph to explain the difference between cooperating and non-cooperating individuals in a cooperating population using Hamilton’s rule:

Figure 1: Hamilton's rule prediction: the fitness of cooperators (blue) and non-cooperators (red) increases as the number of cooperators among social neighbors (x-axis) increases. The slope of both lines is the benefit (b).

The benefit, b, is the slope of these two lines. The vertical distance between them is the cost, c. Note that for any given frequency of cooperation in the population, the non-cooperating individuals (red line) have a higher fitness than the cooperating ones (blue line). It seems that it “pays off” to be a self-server no matter the social environment you are in, even though you still benefit from being in a cooperating community. Yeah, we all know the type.

But what happens when the difference between cooperating and not cooperating depends on the percentage of cooperators in the population? Not too hard to imagine: if most of the population is playing nicely together and benefiting from it, then this might change the attitude of the selfish individuals more readily than if only a small fraction of the population is cooperating. But as it stands, Hamilton’s rule does not provide for this type of model. However, the following modification of Hamilton’s rule does:

r ⋅ b - c + m ⋅ d > 0

Relatedness, r, is now not a scalar (a single number), but a vector (an ordered set of values), r = {r1, r2, …}, describing relatedness under different conditions. Ditto the benefit vector, b, which holds the coefficients of the equation describing the fitness of non-cooperators as a function of how many neighboring cooperators there are in the population (red lines). In a linear setting (Figure 1), r = {r1}, b = {b1} and m⋅d = 0, collapsing the expanded equation into the classical Hamilton’s rule. We won’t get into m and d in this post; they are important though, and you should read the paper to understand how they play a role.

Expanding r and b from scalars to vectors enables a richer and more flexible description of Hamilton’s rule, allowing it to describe non-linear relationships like this:

Figure 2: Note two things. First, the relationship between fitness and the fraction of cooperators in the population is not linear. Second, the difference in fitness between cooperators and non-cooperators decreases as the fraction of cooperators in the population goes up. These two phenomena cannot be described by the classic Hamilton's rule equation. They can be described using the modified rule.
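To make the vector form of the rule concrete, here is a minimal Python sketch of the inequality r ⋅ b - c + m ⋅ d > 0. It assumes the dots are ordinary dot products and treats m and d as opaque vectors, since this post does not define them; the function names and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def generalized_hamilton(
    r: Sequence[float],
    b: Sequence[float],
    c: float,
    m: Sequence[float] = (),
    d: Sequence[float] = (),
) -> float:
    """Left-hand side of the generalized rule: r.b - c + m.d.

    m and d default to empty vectors, so m.d = 0 unless they are supplied.
    """
    return dot(r, b) - c + dot(m, d)


# Linear case: one-element vectors and m.d = 0 give back the classic r*b - c.
classic = generalized_hamilton([0.5], [7.0], 1.0)

# Non-linear case with made-up higher-order terms (hypothetical numbers).
nonlinear = generalized_hamilton([0.5, 0.3], [2.0, 4.0], 1.0)

print("classic:", classic, "favored:", classic > 0)
print("non-linear:", nonlinear, "favored:", nonlinear > 0)
```

With one-element vectors the function reduces to the scalar rule, which is the collapse described in the text.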

This modification of Hamilton’s rule was developed by Jeff Smith and colleagues at the Department of Biology at Indiana University. Armed with the new equation, Smith and his colleagues decided to see how well it can be applied. They decided to look at Myxococcus xanthus. M. xanthus bacteria behave normally as long as food is abundant: they glide around and proliferate by cell division as bacteria do. But when starved, they aggregate, and some cells form resistant spores, while the others die. Some cheating strains sporulate well when in cooperating populations, but do poorly on their own. The scientists mixed a cooperator strain with a cheating strain at different frequencies, starved them, and measured the fraction of each strain in the population of surviving spores. They found the following: first, the fitness effect was non-linear; in fact, it was almost exponential. Second, cooperators were more fit than cheaters at low cooperator frequencies, but cheaters fared better at high cooperator frequencies. So it pays to freeload when most people around you behave nicely. In the case of M. xanthus, the added value to the population is quite high. In fact, the scientists found that cooperation in M. xanthus is very robust and resistant to cheating: cheaters were viable (i.e. had a positive fitness) only in groups that had more than 70% cooperators. So it is only when cheaters have a large cooperating population to buffer their nasty habits that they can thrive.

Figure 3: Relative fitness of cooperators (blue) and cheaters (red) in populations with different relative frequencies of cooperators. Note that the fitness scale is logarithmic: the fitness increase is very much non-linear, as in Figure 2.

Moral of this story: if you’ve got to cheat, make sure there are a lot of nice people around. Otherwise it won’t work out very well. In evolutionary terms, the trait for cooperation and kin selection has evolved to become strongly entrenched, so much so that cheaters can only survive if cushioned by a high frequency of cooperators. Favoring your own and acting selflessly towards them seems to be the way to go, in the case of M. xanthus.


Smith, J., Van Dyken, J., & Zee, P. (2010). A Generalization of Hamilton’s Rule for the Evolution of Microbial Cooperation. Science, 328 (5986), 1700-1703. DOI: 10.1126/science.1189675

I can’t hear you, the bacteria are too noisy

August 4th, 2010 6 comments

Much too noisy. When looking at a population of genetically identical bacteria, the number of protein molecules they produce varies. The picture below shows the levels of one type of protein that was fused to a green fluorescent protein (so we can see it): clearly there is a variation in how much of the protein each cell produces (“protein expression” in molbio-speak), even though the bacteria are genetically identical. Why is that? In 2006, a group of researchers at the University of California San Diego and Boston University looked at the variation in protein expression in genetically identical bacteria, and what it could mean. They constructed a simple and well-defined computational model first. The researchers were surprised when their model showed that the variation actually increased when cell growth and division were slowed or stopped. This prediction seemed paradoxical: if the cells are less active, how come the variation in protein expression increases? Shouldn’t they all be going into some “baseline production mode”? To answer these questions, they took them to the lab. Nicholas Guido and his colleagues engineered bacteria with simple gene networks, where the expression of the gene could be induced, repressed, or both induced and repressed simultaneously. The gene itself was fused to green fluorescent protein, so that the more protein is produced, the brighter the cells shine under light. Lo and behold, the computational predictions were correct! (1) The expression of the protein was not uniform (even though the cells were genetically identical) and (2) variation in protein expression increased when the proliferation of the bacterial cells slowed down or stopped.

Source: University of California, San Diego

Their explanation for this random noise in protein production: the need for variation to survive. Bacteria often deal with quickly changing conditions: temperature, oxygen concentration, water, chemicals, antibiotics… all these can kill. If the population is identical, what kills one kills all. But if even within a genetically identical population there are variations in protein expression levels, then the population is not phenotypically identical even though it is genetically so. Some bacteria may survive the dry spell, the heat or, what concerns us quite a bit, the onslaught by antibiotics. The random population variation or “noise” in protein expression levels is an evolutionary survival mechanism.

Fast forward from 2006 to last week. In a brilliant work published in Science, Yuichi Taniguchi and colleagues from Harvard University and the University of Toronto looked at individual E. coli cells for protein expression. They examined different strains, 96 at a time, using a microfluidic chip. Each lane on the chip has room for a single cell, enabling them to quantify the levels of proteins in single cells from the same or different strains very quickly. Taniguchi and colleagues examined 1018 different genes in E. coli, which covers about 25% of the genome. Like Guido and colleagues, Taniguchi and colleagues also found a large variation in the expression of the same protein in different cells which were otherwise genetically identical, no matter what the protein was. They also found that different kinds of proteins were produced in different distributions in the cell. What they also measured was noise: how much randomness went into the production of proteins. They found two kinds of noise. One type was from proteins that were produced in small numbers (fewer than 10 molecules per cell): for those, the more protein produced, the lower the variation in protein production, or “noise”, between cells. A second type was from proteins that were produced in larger numbers. For those, there is a “noise floor”: the fluctuation in protein production does not decrease below a certain point, although there is still less fluctuation in proteins that are produced in high numbers than in those produced in low numbers. This means that the cellular mechanisms of protein degradation and/or production control may hit some sort of steady state once protein production reaches a certain level.
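The two noise regimes can be pictured with a toy formula: a term that shrinks as the average copy number grows, plus a constant floor. The Python sketch below is only an illustration of that shape. The functional form (1/mean plus a constant), the floor value of 0.1, and the function name are all assumptions made here, not numbers quoted in this post.

```python
def expected_noise(mean_copies: float, noise_floor: float = 0.1) -> float:
    """Toy noise level (squared coefficient of variation) for one protein.

    Assumed form: a Poisson-like term 1/mean that dominates at low copy
    numbers, plus a constant floor that dominates at high copy numbers.
    """
    if mean_copies <= 0:
        raise ValueError("mean_copies must be positive")
    return 1.0 / mean_copies + noise_floor


# Copy numbers spanning the low-abundance and high-abundance regimes.
for mean in (1, 5, 10, 100, 1000):
    print(f"mean copies per cell = {mean:>4}: noise = {expected_noise(mean):.3f}")
```

In this sketch the noise keeps falling as the mean rises but never drops below the floor, which is the qualitative picture the paragraph describes.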

A young grad student fascinated by bacterial protein and mRNA noise

They did not stop at looking at proteins, though. In each cell, they also looked at the level of mRNA coding for that protein. mRNA production numbers are also very noisy: actually, noisier than those of protein. But surprisingly, Taniguchi discovered that when looking at single cells, mRNA and protein levels do not correlate. Not even a weak correlation, and no matter what the protein. The high noise levels and lack of correlation in expression can be explained by the different lifetimes of protein vs. mRNA. mRNA is quickly degraded in the cell, while proteins may outlive cell division. mRNA is produced in short bursts and “lives fast and dies fast”, while the long lifetime of proteins buffers their levels from such rapid fluctuations.

Looking at these studies together, we can say that there is a lot of noise in the system, but it serves a purpose: not only on the selection level (as discovered in 2006), but also on the systems level (as shown last week). On the selection level, noise fosters differences between individuals, which gives at least someone from the bacterial population a chance to survive if conditions change drastically. It is less clear what is happening at the level of the intracellular system: for proteins expressed in large numbers, it seems like there is some external control mechanism at work that keeps noise above a certain level.

How does mRNA and protein production noise then propagate, say, across gene expression pathways, when one protein can cascade the production of many others? How much is noise a control mechanism on a cellular and cellular-ensemble level? Are there “noise clamping” and “noise amplification” mechanisms that have yet to be discovered? The Taniguchi study hints that there are, and the Guido study strongly suggests that they are affected by the environment. I think we are only beginning to hear the noise bacteria make.


Taniguchi, Y., Choi, P., Li, G., Chen, H., Babu, M., Hearn, J., Emili, A., & Xie, X. (2010). Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science, 329 (5991), 533-538. DOI: 10.1126/science.1188308

Guido, N., Wang, X., Adalsteinsson, D., McMillen, D., Hasty, J., Cantor, C., Elston, T., & Collins, J. (2006). A bottom-up approach to gene regulation. Nature, 439 (7078), 856-860. DOI: 10.1038/nature04473

Computational Bridge to Experiments

June 8th, 2010 Comments off

A bit of background information: this is a meeting I am really happy to be part of, and even more so honored to be a co-organizer. One of my main scientific interests is the prediction of the function of genes and proteins of unknown function.

Some background information: we have sequenced more than 1000 genomes of microbes, and hundreds of plants and animals. Additionally, we have millions of partial DNA sequences, RNA sequences, proteins, genomic fragments and millions of genes sequenced from metagenomic data. Problem: for most of these sequenced genes, we do not know what they are doing. That’s right: most of the sequence data that we have is just that: data. Not information. We are amassing an ever-growing collection of books that are written in a mostly incomprehensible language. We know (or “educatedly guess”) where the words in those books (the genes) are located, because we have sequence signals that indicate where the bits of the DNA that code for genes are. For some of the words, we know the meaning. But in many cases (and by some estimates in most cases) we fail to understand the meaning of the words (genes) in those books (genomes). Drawing further on the book<–>genome and gene<–>word metaphor, we sometimes know one meaning of a word, but we all know that words in human languages can hold different meanings, depending on context. “Whatever floats your boat” can be read literally, but more often this particular collection of words in this order is a figure of speech. The same thing goes for genes: a gene may code for a certain enzyme, catalyzing a simple chemical reaction. But in another context, it may perform a developmental function for the whole organism, which has different implications than just the biochemical level.

Where's one of those when we need them?

We can’t just rely on computational means to find out what’s doing what. Bioinformatics can help us annotate genes that are similar to those already discovered, and in some cases give us new insights into the function of unknown genes. But for truly novel functions, and to know whether our boat is real or a metaphor for “what works best”, we may need to run experiments. And we need a good collaboration between those who do the computational work and those who do the experimental work in identifying which are the most important books to look at, and what words in them we need to decipher first.

The COMBREX meeting aims to start this large-scale and long-term decoding, a collaboration between experimentalists and computational biologists.
Note that the COMBREX workshop is part of the larger Microbial Genomics meeting at Lake Arrowhead, California.

Here is the announcement. Feel free to cut & paste and forward:

Announcing the first COMBREX Workshop for Computational and Experimental Determination of Protein Function. September 15, 2010 Lake Arrowhead, California USA

COMBREX (Computational Bridge to Experiments) is a new NIH funded effort that aims to increase the pace of experimental determination of the function of large and high priority gene families in bacterial genomes. The Principal Investigators are Richard Roberts (New England Biolabs), Simon Kasif (Boston University) and Martin Steffen (Boston University). This effort will form a consortium of experimental and computational biologists that would collaborate directly to test the predicted functions or specificity of high-priority genes.
Central to this effort would be the creation of a community web-based database that will  allow computational and experimental scientists to communicate easily and assist experimentalists in identifying high-priority genes with high-quality computational predictions. Experimentalists will be able to submit bids (proposals) to validate individual predictions, and if successful, will receive modest funding from COMBREX to perform the validation.
The website can be found at http://combrex.bu.edu/ .
A workshop to discuss issues related to the formation and operation of COMBREX will take place on Wednesday, September 15, 2010, as part of the 18th Annual International Meeting on Microbial Genomics at Lake Arrowhead, CA, outside of Los Angeles. A preliminary program can be found at http://www.mimg.ucla.edu/arrowhead2010/program.html (COMBREX is formerly SciBay). Confirmed speakers include Richard Roberts, Simon Kasif, Manuel Ferrer (CSIC, Madrid), Patricia Babbitt (UCSF), John Gerlt (Illinois), Peter Karp (SRI), Alexander Yakunin (Toronto), Steven Brenner (UC Berkeley) and Bruno Sobral (Virginia Tech).
The morning session will provide an overview of COMBREX, including both the experimental and computational challenges, related talks, and a description of topics to be discussed by breakout groups. These groups will convene in the afternoon to discuss the topics and prepare a short summary, for presentation to the entire workshop after dinner.
Topics to be discussed by the breakout groups will roughly divide into the following areas: (1) whole genome annotation, (2) assessment of computational predictions, (3) use of structure to predict function, and (4) infrastructure for function annotation. General topics to be discussed include:
1. How to prioritize predictions?
2. How to evaluate experimental bids?
3. How to handle non-enzymatic proteins?
4. How best to handle predictions/phenotypes from high-throughput experimentation?
A key desired outcome of the workshop is the identification of opportunities and the catalysis of collaborations between computational and experimental biologists.
We hope you will be able to join us for this event. You can register at: http://www.mimg.ucla.edu/arrowhead2010/registration.html
For further information please contact the organizers:
Co-chairs: Martin Steffen, Boston University, steffen ‘at’ bu ‘dot’ edu
Iddo Friedberg, Miami University, i.friedberg ‘at’ muohio ‘dot’ edu

Steering Committee: Simon Kasif and Richard J. Roberts

A non-post about Craig Venter’s new bug

May 21st, 2010 Comments off

Unless you have been vacationing in a parallel universe for the past two days, you will have heard about the new synthetic bacterium created at the J Craig Venter Institute. In a nutshell, the scientific team synthesized an artificial chromosome of the bacterium Mycoplasma mycoides and transferred it to another bacterium, Mycoplasma capricolum. The capricolum cells with the mycoides genome proved viable, and were named Mycoplasma mycoides JCVI-syn1.0. Even more briefly: they synthesized Bug A’s DNA from scratch and put it in bug B, turning B into A.

I wanted to write a blog post about it. I really did. Something original, inspiring, funny, critical and deep. But so many others beat me to it that, no matter what angle I took, it had already been covered in the last 24 hours. Informative? Yes. Debatable achievement? Yes, yes, and yes. Thoughts from bigshots? Yes. Funny? Totally. Religiously suspect? Verily. Government weighing in? Naturally. Reddit? Yes, even Reddit! (Thanks Shirley!)

So here’s the interview Science journal conducted with Craig Venter:

Or, if you’d rather, the Scorpions’ comment:

And the paper in Science. I’m done.


Gibson, D., Glass, J., Lartigue, C., Noskov, V., Chuang, R., Algire, M., Benders, G., Montague, M., Ma, L., Moodie, M., Merryman, C., Vashee, S., Krishnakumar, R., Assad-Garcia, N., Andrews-Pfannkoch, C., Denisova, E., Young, L., Qi, Z., Segall-Shapiro, T., Calvey, C., Parmar, P., Hutchison, C., Smith, H., & Venter, J. (2010). Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science. DOI: 10.1126/science.1190719

Comparative Functional Genomics: Penguin vs. Bacterium

May 4th, 2010 2 comments

No, not the flesh-blood-and-feathers penguin, but rather Tux, the beloved mascot of the Linux operating system. Compared with Escherichia coli, the model organism of choice for microbiologists.

We refer to DNA as “the book of life”; some geeks refer to it as the “operating system of life”. Just like in a computer’s operating system, DNA contains all the instructions on how to “execute” life and to keep things humming. Many genes make proteins or RNA that act as switches to activate the synthesis of other proteins, sometimes in a two-, three- or higher-level hierarchy. These switches are conditional, based on environmental conditions, or whether it’s time to replicate the DNA and divide into two daughter cells, and so on. Some genes activate the transcription of other genes, but are not regulated themselves by other genes; those can be dubbed “master regulators”. Some genes are both activated by other genes, and activate other genes themselves: “middle management”. Finally, there are genes that are activated, but do not regulate other genes: the “workhorses”. This information, known as the transcriptional regulatory network, exists for 1,378 genes of the E. coli bacterium.

Paralleling this in Linux, there are programs that call other programs; again, in a hierarchical fashion.  According to the calling structure, they also can be dubbed Master Regulators (calling other programs but not being called themselves), Middle Management (calling other programs and being called), and  Workhorse (only being called).
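The three tiers described above depend only on whether a node has outgoing links, incoming links, or both, so they are easy to sketch in Python. This is an illustration of the classification as described in this post, not the authors' pipeline; the function name and the toy edge list are hypothetical (only `strlen` and `malloc` are real function names, borrowed as familiar examples).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_nodes(edges: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Assign each node a tier from (source, target) edges.

    An edge means "source calls target" (programs) or
    "source regulates target" (genes).
    """
    out_degree = defaultdict(int)   # how many nodes this one calls/regulates
    in_degree = defaultdict(int)    # how many nodes call/regulate this one
    nodes = set()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
        nodes.update((source, target))

    tiers = {}
    for node in nodes:
        if out_degree[node] and not in_degree[node]:
            tiers[node] = "master regulator"    # calls others, never called
        elif out_degree[node] and in_degree[node]:
            tiers[node] = "middle management"   # calls others and is called
        else:
            tiers[node] = "workhorse"           # only called
    return tiers


# Hypothetical toy call graph, for illustration only.
toy_edges = [
    ("main", "parse"),
    ("main", "report"),
    ("parse", "strlen"),
    ("parse", "malloc"),
    ("report", "strlen"),
]

for name, tier in sorted(classify_nodes(toy_edges).items()):
    print(f"{name}: {tier}")
```

One design choice to note: a node with an edge to itself has both an incoming and an outgoing link, so this sketch files it under middle management.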

Koon-Kiu Yan and his colleagues from Yale mapped the program call graph in Linux by setting each program as a node and drawing lines to the programs that call it, and to the programs it calls. They did the same thing for E. coli’s transcriptional regulatory network. Here are the graphs they got:

So it seems like Linux is middle-management heavy, whereas E. coli is workhorse heavy. 30% of Linux programs are top management, as opposed to only 5% in E. coli.

Looking at the actual functions for the genes/programs, it seems that Linux programs also have much more functional redundancy than E. coli genes: 3.5% of E. coli’s genes have “reusable” functions, as opposed to 8.4% of Linux programs. But if we look at entire working subgraphs of these two graphs, the subgraph overlap in Linux is 87%, whereas in E. coli the overlap is only 4.3%. This means that the division of labor in E. coli is much more distinct than in Linux. There are many ways of activating the same hierarchy in Linux, but in E. coli there is rarely more than one way to do it. Note that Linux is top-heavy, whereas E. coli has a pyramid-like structure. It is pretty obvious that the Workhorse modules in Linux go through heavy reuse while those in E. coli do not.

The scientists then decided to look into how these two networks developed. The oldest genes in E. coli are the Workhorses, whereas the regulatory genes in middle and top management arrived more recently. In contrast, the newest programs in Linux (the most heavily rewritten ones) are the Workhorses, whereas the ones in the management echelons are less changed than their predecessors. The oldest programs are those that are in Middle Management. They are also the most abundant type in Linux’s call graph.

Who are the Workhorses in E. coli? Those are mostly enzymes, the proteins that catalyze specific biochemical functions.  As a rule, enzymes are very specific: an enzyme would catalyze only one type of reaction, and only with a very specific chemical (substrate). Examples are enzymes that break up sugars: there is a specific enzyme for every type of sugar molecule. Who are the Workhorses in Linux? Those are the functions that get used all the time in thousands of different programs: strlen (measuring a character string’s length) or malloc (allocating memory for a data structure).  The Workhorses in Linux are non-specific while the Workhorses in E. coli are very specific.

So how to account for these differences? Nothing in biology makes sense except in light of evolution, and we have to look to the evolutionary history of both the bacterial and the computational systems for answers. The major constraint in E. coli’s evolution is fitness. If something breaks down in one of E. coli’s Workhorses, it won’t get passed on to the next generation: the cell with the lethal mutation would never reproduce and will get thrown into Darwin’s rubbish bin. This leads to single-function Workhorses, because a multi-functional Workhorse would be too prone to messing too many systems up when it mutates, and would never make it to the next generation. That is why the Workhorses in E. coli’s call graph have a lower connectivity than those in Linux’s call graph.

The authors conclude that E. coli’s call graph evolved bottom-up, with system robustness being the main selective trait. In contrast, Linux evolved top-down, with reusability of the Workhorses being the main selective trait. Reusability and robustness trade off against each other. In the case of a man-made system like Linux, bugs in reusable modules are not a problem, since Workhorse bugs are easily fixed in the next release. It is much less costly, in coding time, to tweak existing Workhorses than to build new ones. Mutations in reusable Workhorses in E. coli would weed out those kinds of proteins from the gene pool, and therefore E. coli’s Workhorses are not reusable.

I’m not exactly sure what insight we can get by comparing natural vs. man-made networks. But hey, sometimes science is not about insight; sometimes it is just about being totally cool, and The Coolness is strong with this work.


Yan, K., Fang, G., Bhardwaj, N., Alexander, R., & Gerstein, M. (2010). Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.0914771107

Combrex: Computational Bridge to Experiments

May 4th, 2010 Comments off

Combrex is an exciting new project at Boston University to bridge computational and experimental techniques to functionally annotate proteins. They are hiring, see below:

JOB POST

We are seeking to hire a creative computational scientist for a
transformative project: COMBREX: A Computational Bridge to Experiments.

The work will involve building a novel resource that combines databases,
science, social networking and machine learning.

The position is available immediately.

For some preliminary information pls. see

www.combrex.org

BS or MS in Computer Science, Informatics, Engineering or related field is required.

Applicants with PhD’s would be considered for a separate Research Associate position.

Pls send CV and names (emails) of two references to:

Prof. Simon Kasif

kasif@bu.edu

Subject Line: COMBREX POSITION

A sh*tload of data

March 4th, 2010 1 comment

This post was chosen as an Editor's Selection for ResearchBlogging.org

There are more microbial cells in our body than our own. Those microbes are not just passive hitchhikers or, conversely, malicious agents of disease. They affect our well-being and health in a much broader spectrum than simply “bad” or “passive”. Among other things, our gut microbes play an important role in digestion and have been linked to obesity and to conditions as severe as certain inflammatory bowel diseases or as relatively mild as traveler’s diarrhea. The microbes living on our skin also affect many things: from body odor, dandruff and acne to dermatitis and psoriasis. Also, being the most exposed microbial population means that they are themselves affected by our constant exposure to various agents, and their population varies with our own behavior, such as how often, and with what, we wash our hands: antibacterial soap has been named as a major culprit in the development of resistant bacterial strains. In all organs the native flora, relatively benign, protects us against colonization by more virulent bacteria.

Indeed, our gut, skin, mouth and genital microbiomes can be viewed as  additional organs in the way that they affect us. If you take a long course of powerful systemic antibiotics, the ensuing diarrhea and sometimes mouth thrush are the result of these “organs” — in your mouth and gut — being removed temporarily from your body.

Credit: David Gregory & Debbie Marshall. Wellcome Images images@wellcome.ac.uk

Today, another big step has been taken towards understanding the role our microbiomes play. Just how big, in what direction, or what the consequences of this step will be is unclear, and will remain unclear for quite some time. A group of Chinese and European researchers have published the largest sequencing effort yet of gut bacteria. Their current yield, 576 Gigabases of DNA from the feces of 124 European people, is considerably larger than the previous large effort in the US: 3 Gigabases sequenced from 33 US and Japanese adults. Also, Qin and colleagues looked at obese, lean, healthy and sick (inflammatory bowel syndrome) individuals. They identified 1,000 to 1,100 different bacterial species, with about 160 different species per individual. The bacterial populations of healthy individuals were markedly different from those of individuals with inflammatory bowel syndromes, and the populations of those with IBS differed depending on disease type.

What does this all mean? Well, as with the Human Genome Project, it will take a while before we understand not only what these data contain, but also what the limits of our ability to interpret them are, and how best they can serve us. What we have right now is the equivalent of the outline of a newly discovered continent. It is up to many individual explorers to discover and chart the myriad living things in terra excreta. Which genes are most associated with obesity or with ulcerative colitis? How prevalent is gene transfer between gut bacteria, and how much of a role does this play in antibiotic resistance? Are there microbial species more prone to changes in their genomes than others? Are there metabolic pathways that are shared between different microbial species? Are some genes evolving faster than others? What ecological roles do different species play in the gut? And how do different microbial populations ultimately affect their human hosts? Do different bacterial species have a preference as to whom they share our guts with? These broad questions can be broken down into individual questions relating to a lab’s pet genes, metabolic pathways, microbial species and metabolic conditions. There is a lot to sift through here and these data will keep us busy for years to come.

Pathogenic E. coli on the intestinal lining. Credit: S. Schuller.

In other blogs, Carl Zimmer has a great post on how he, for one, welcomes our bacterial overlords, while Ed Yong talks about the science of things to come.

Oh, and for those of you who, like me, can’t wait to plug the data into your favorite analysis pipeline, you can get the gene data and the assembled and annotated sequence data from Peer Bork’s lab at the European Molecular Biology Laboratory, or from the Beijing Genomics Institute. The raw sequence data are available from the European Bioinformatics Institute under the accession ERA000116. (I can’t find the latter myself right now; hopefully they will show up in a couple of days.) Update March 7, 2010: the sequences are now deposited at EBI. Also, on BioTorrents (hat tip to Morgan Langille).


Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf, K., Manichanh, C., Nielsen, T., Pons, N., Levenez, F., Yamada, T., Mende, D., Li, J., Xu, J., Li, S., Li, D., Cao, J., Wang, B., Liang, H., Zheng, H., Xie, Y., Tap, J., Lepage, P., Bertalan, M., Batto, J., Hansen, T., Le Paslier, D., Linneberg, A., Nielsen, H., Pelletier, E., Renault, P., Sicheritz-Ponten, T., Turner, K., Zhu, H., Yu, C., Li, S., Jian, M., Zhou, Y., Li, Y., Zhang, X., Li, S., Qin, N., Yang, H., Wang, J., Brunak, S., Doré, J., Guarner, F., Kristiansen, K., Pedersen, O., Parkhill, J., Weissenbach, J., Antolin, M., Artiguenave, F., Blottiere, H., Borruel, N., Bruls, T., Casellas, F., Chervaux, C., Cultrone, A., Delorme, C., Denariaz, G., Dervyn, R., Forte, M., Friss, C., van de Guchte, M., Guedon, E., Haimet, F., Jamet, A., Juste, C., Kaci, G., Kleerebezem, M., Knol, J., Kristensen, M., Layec, S., Le Roux, K., Leclerc, M., Maguin, E., Melo Minardi, R., Oozeer, R., Rescigno, M., Sanchez, N., Tims, S., Torrejon, T., Varela, E., de Vos, W., Winogradsky, Y., Zoetendal, E., Bork, P., Ehrlich, S., & Wang, J. (2010). A human gut microbial gene catalogue established by metagenomic sequencing Nature, 464 (7285), 59-65 DOI: 10.1038/nature08821

Filling in the evolutionary blanks, genome by genome

December 23rd, 2009 8 comments


After hearing Jonathan Eisen and Nikos Kyrpides talk about GEBA in various meetings, it is great to see the paper finally come out, and under a CC license too. Good move for everyone.

GEBA is the Genomic Encyclopedia of Bacteria and Archaea. The idea is simple: we have >1000 prokaryotic genomes in GenBank as of today. But those were sequenced for a myriad of reasons: clinical or functional interest, ease of sequencing, biotechnological or pharmaceutical potential, etc. In evolutionary terms, those 1000 genomes provide a very biased view of the tree of microbial life. That would be like sampling mammalian life in Europe and North America only: you would miss out on most big cats, elephants and rhinos, not to mention all the marsupials. To correct this situation, teams from the Joint Genome Institute, UC Davis and several others set out to perform a more uniform sampling across the tree of prokaryotic life. The first batch of 56 genomes from GEBA is published today in Nature: fifty-three bacterial and three archaeal.

Maximum-likelihood phylogenetic tree of the bacterial domain based on a concatenated alignment of 31 broadly conserved protein-coding genes. Phyla are distinguished by colour of the branch and GEBA genomes are indicated in red in the outer circle of species names. Click to open original in Nature.

It seems that they are on the right track to enrich our understanding of bacterial genes and genomes using this phylogenetically mindful sampling strategy. For example, they show that their sampling enables the discovery of an average of 1,060 protein families/genome. Sampling a single bacterial family would provide 121 new protein families, sampling within a bacterial phylum would give an average of 308 new protein families, and within a bacterial domain, 650. They have discovered a total of 1,798 families that seem to have no similarity to any existing family, hinting at new bacterial functionality (or maybe some new prophages?). They have discovered a few new cellulases, enzymes that break down cellulose, the polymer that makes up plant cell walls. Cellulases are the holy grail of the biofuel prospecting industry: specifically, a cellulase that can be exploited en masse to turn plant matter into fuel economically. They also discovered a homolog of actin, a cytoskeletal protein thought until now to exist only in eukaryotes.

One thing that is sorely missing is accessibility. Yes, the individual genome papers are all published in SIGS and in Nature under open access, which is great. But when you go to the GEBA site, you get a simple description of the candidate genomes. The annotations are somewhere behind a password-protected site, but I could not seem to get an account to view them. A proper genomic browser for the sequenced and annotated genomes, with some phylogenetic map showing who is located where on the tree, would go a long way towards helping the rest of us explore this new comprehensive picture of prokaryotic genome space.

Finally, if you want to hear more about how they did what, here’s Eisen talking about GEBA.


Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B., Hooper, S., Pati, A., Lykidis, A., Spring, S., Anderson, I., D’haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E., Kyrpides, N., Klenk, H., & Eisen, J. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea Nature, 462 (7276), 1056-1060 DOI: 10.1038/nature08656

Photosynthesis, phages and structures: there’s treasure everywhere!

November 24th, 2009 Comments off


Here’s a really cool work, published this September in Nature. Why did I choose this work? Well, it’s a major discovery, and it’s all done using bioinformatics, and fairly simple bioinformatics at that. The power of metagenomics and bioinformatics: in a mass of data you just have to know what you are looking for, and how to look for it.

Obviously not CC licensed, but I couldn't resist using this very appropriate strip


Viruses as a bacterial genetic mechanism

Viruses follow some interesting and sometimes convoluted evolutionary paths. One is “infect quick, reproduce fast, and make sure you can get to the next host before you kill this one”. That is pretty extreme: smallpox was doing that, when there was smallpox. Ebola is doing that, but not very well: killing the host too quickly means that the disease is contained, especially in rural areas. Another strategy is: “slow and easy wins the race”. The herpes virus does that. Not lethal, but lying dormant in the central nervous system, it is infectious, but rarely causes anything more than the occasional cold sore (which, admittedly, is painful and disturbing). Still, it manages to infect up to 90% of the human population, most of whom are completely unaware they harbor it, and will never develop any symptoms.

Most of the viruses on earth don’t infect humans, animals, or plants. They infect microbes, where the same spectrum of evolutionary strategies applies. Some attack quickly, killing the microbial population they infect. Others can remain dormant for a long time. It is becoming clear to us that bacterial viruses, or bacteriophages, are responsible for a large portion, if not the majority, of genetic variance in bacteria. In fact, viruses are a major component in bacterial genetics. The mechanism is called transduction, and it is illustrated below. Bacteriophages pick up DNA from bacteria they infect, and transfer it to other bacteria, creating genetic variance in the bacterial population.


Generalized transduction. Source: Indian River State College

Viral transduction also adapts

But viral transduction does not just carry random genes. Natural selection favors transduced genes that increase the bacterial host’s fitness. When a bacterium is infected by a virus, its protein-making machinery is used to express viral genes. But when the viral genes include genes that are beneficial to the host as well, then everybody wins: the phage-infected bacterial species gets genes which enable it to compete better for resources with other bacterial species, while the phage gets a larger number of hosts to infect. Of course, this has to go hand in hand with a relatively benign virus that remains dormant long enough to let the bacterial host species enjoy the benefits of the transduced genes.

Such is the case of cyanophages and cyanobacteria. Cyanobacteria are photosynthetic bacteria, and cyanophages are the viruses that infect them. Several studies have shown that cyanophages have acquired whole photosynthetic genes from bacteria. Viruses do not photosynthesize, but when they infect cyanobacteria, the viral photosynthetic system is added to the bacterial one, boosting bacterial photosynthetic activity and ultimately increasing bacterial energy production.

The photosynthetic mechanism is  divided into two components: photosystem I and photosystem II (PSI and PSII). For a few years now, PSII has been known to be transduced by cyanophages.

A more recent study by Itai Sharon and colleagues, published in Nature this September, shows that PSI proteins are also transduced by cyanophages. Also, it seems that the viral PSI has some interesting properties that may make it advantageous over the cyanobacterial PSI. Two proteins in the bacterial PSI are called PsaJ and PsaF. They found that the homologous protein in cyanophages is a fusion of the two, PsaJF. When they modeled an insert of PsaJF into the bacterial photosystem I, it seemed that the bacterial PSI with the viral insert could function more efficiently than the original bacterial PSI. As a rule, PSI is a system that accepts electrons from PSII via a protein called plastocyanin. The donated electrons are excited by light, and the energized electrons are used to synthesize ATP and NADPH, the energy coinage of the cell, which are used to synthesize sugar from CO2. However, when the bacterial PsaJ and PsaF are replaced by the viral fusion PsaJF, it seems that plastocyanin does not have to be the only electron donor to the newly minted, virally donated PSI. This means that the PSI may now accept electrons not only from plastocyanin, but from other electron-carrying proteins as well: for example, proteins involved in the respiratory system, which also donate electrons. The advantage of such a setup is that electrons whose reducing power would otherwise go to waste can go through PSI to form extra NADPH and ATP. Sharon and colleagues do not prove all this experimentally, but they make a pretty strong case, citing some analogous cases.


Electron transport from PSII to PSI via plastocyanin. Source: wikimedia commons.


a, The structure of T. elongatus PSI (subunits) was illustrated by PyMOL (http://pymol.sourceforge.net/) using a PSI monomer (adopted from Protein Data Bank (PDB) accession 1jb0). PsaF is in magenta, PsaJ is in blue, and all of the other subunits are in green. b, A model for the structure of the viral PsaJF fusion protein (red) substituting the original PsaF and PsaJ subunits. Reproduced under NPG Licensing terms for non-commercial / educational purposes. doi:10.1038/nature08284

Like I said, this work is purely bioinformatics. They basically mined the Global Ocean Sampling metagenomic data: over six million sequences from marine microbes collected by the J. Craig Venter Institute, which I mentioned in another post. They then identified sequences that contain PSI genes, and sifted through those to find sequences that also contain genes that are exclusively viral. Having both a PSI gene and a viral gene on the same DNA clone ensures they were taken from a virus. I’m not sure how they did the structural modeling and insertion of the PsaJF. This seems to be missing both from the Nature article and from the supplementary material. Yes, it’s one of those Nature works with 3 pages of article, and 28 of supplementary. Great read though, there’s treasure everywhere.
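
The filtering step described above can be sketched in a few lines of Python. This is only an illustration and not the authors' pipeline: the clone identifiers, the viral marker labels and the `viral_psi_clones` function are all made up, and it assumes that the genes on each clone have already been annotated.

```python
# Hypothetical sketch: keep only clones that carry both a PSI gene and a
# gene considered exclusively viral. All names and data are illustrative.

PSI_GENES = {"psaA", "psaB", "psaC", "psaD", "psaE", "psaJF"}
VIRAL_MARKERS = {"phage_capsid", "phage_tail_fiber"}  # assumed labels


def viral_psi_clones(annotations):
    """Return sorted IDs of clones with a PSI gene and a viral marker.

    annotations maps clone ID -> set of gene labels found on that clone.
    """
    hits = []
    for clone_id, genes in annotations.items():
        # Set intersection is non-empty when the clone has a gene of that kind.
        if genes & PSI_GENES and genes & VIRAL_MARKERS:
            hits.append(clone_id)
    return sorted(hits)


example = {
    "clone_1": {"psaA", "psaB", "phage_capsid"},  # PSI + viral: keep
    "clone_2": {"psaA", "psaB", "rpoB"},          # PSI only: drop
    "clone_3": {"phage_tail_fiber"},              # viral, no PSI: drop
}
print(viral_psi_clones(example))
```

The point of requiring both kinds of gene on one clone is the same as in the post: a PSI gene alone could just as well have come from a cyanobacterium.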


Sharon, I., Alperovitch, A., Rohwer, F., Haynes, M., Glaser, F., Atamna-Ismaeel, N., Pinter, R., Partensky, F., Koonin, E., Wolf, Y., Nelson, N., & Béjà, O. (2009). Photosystem I gene cassettes are present in marine virus genomes Nature, 461 (7261), 258-262 DOI: 10.1038/nature08284

Lindell, D., Jaffe, J., Johnson, Z., Church, G., & Chisholm, S. (2005). Photosynthesis genes in marine viruses yield proteins during host infection Nature, 438 (7064), 86-89 DOI: 10.1038/nature04111

Blog Action Day: the Methane Pulse

October 15th, 2009 2 comments


Blog Action Day focuses this year on climate change, which, like everything else on this planet, is also a microbial matter. Howzat? Methane (CH4) is a greenhouse gas with a heat retention capability 23 times that of CO2. Soil methanogens are the chief global producers of methane. There are an estimated 7.5 × 10^9 tons of methane trapped in a frozen peat bog in West Siberia, which constitute 25% of the estimated methane trapped in soil and ice-age permafrost worldwide. Due to global warming, this permafrost is melting, releasing methane, which in turn contributes to global warming in a vicious cycle. See the Nature paper, and an article in TerraNature.

It is not only there: methane trapped beneath the melting Arctic Ocean is also being released. The ocean floor permafrost is melting, and clouds of gas bubbles are welling up in “methane chimneys”:

These “methane chimneys” sometimes contained concentrations of the gas 100 times higher than background levels and were so large that clouds of gas bubbles were detected “rising up through the water column,” Orjan Gustafsson of the Department of Applied Environmental Science at Stockholm University and the co-leader of the expedition, said in an interview. There was no doubt, he said, that the methane was coming from sub-sea permafrost, indicating that the sea bottom might be melting and freeing up this potent greenhouse gas.

Susan Q. Stranahan, environment360

This may be the only permafrost we will have in a few years


The concern is that methane release might lead to a tipping point in global climate change: flipping a switch rather than turning a dial. At some point, global warming might turn into a runaway scenario when a critical concentration of atmospheric methane is reached. Martin Kennedy and colleagues at the University of California, Riverside claim that this is how Snowball Earth ended 635 million years ago: a rapid warming period following a runaway positive feedback prompted by a methane pulse.

The effects of permafrost thaw in Dawson City, Canada


How big a problem is this? Big. We have only recently begun to understand the magnitude of the role of methanogens in soil chemistry. It is very large. Even in arctic climes, cold-adapted methanogens are active at temperatures below 0°C, down to -20°C. However, a study conducted by Dirk Wagner and colleagues shows that a 3-degree rise in soil temperature, from -6°C to -3°C, would increase methane production dramatically. This means that not only will trapped methane be released due to soil thawing, but methane production itself will also increase due to more favorable growth conditions for soil methanogens. So permafrost thawing hits the atmosphere with a double whammy of methane, supporting the concern about a runaway positive-feedback cycle that will cause sudden climate change.

The Return of Dr. Permafrost

Dr. Permafrost may actually be the hero here, rather than the villain


Walter, K., Zimov, S., Chanton, J., Verbyla, D., & Chapin, F. (2006). Methane bubbling from Siberian thaw lakes as a positive feedback to climate warming Nature, 443 (7107), 71-75 DOI: 10.1038/nature05040

Kennedy, M., Mrofka, D., & von der Borch, C. (2008). Snowball Earth termination by destabilization of equatorial permafrost methane clathrate Nature, 453 (7195), 642-645 DOI: 10.1038/nature06961

WAGNER, D., GATTINGER, A., EMBACHER, A., PFEIFFER, E., SCHLOTER, M., & LIPSKI, A. (2007). Methanogenic activity and biomass in Holocene permafrost deposits of the Lena Delta, Siberian Arctic and its implication for the global methane budget Global Change Biology, 13 (5), 1089-1099 DOI: 10.1111/j.1365-2486.2007.01331.x

Freeloading pays off, but only up to a point.

August 25th, 2009 6 comments
This post was chosen as an Editor's Selection for ResearchBlogging.org
Quorum sensing

Social behavior is not exactly the first term that comes to mind in relation to microbes. After all, it presumes a certain amount of intelligence and an ability to implement a behavioral pattern in response to peer actions. Humans, yes. Apes, yes. Birds of a feather flock together… so birds, yes. Ants and bees and other social insects, sure. But bacteria?

Yes, bacteria are social creatures: they can cooperate as a community. For example, many bacteria live in a biofilm, a tangled matrix of polymeric substances that includes proteins, DNA and polysaccharides. Biofilms constitute tough physical barriers that are resistant to attacks by many antibiotics and other bactericidal agents. Indeed, many of the harder to treat infectious diseases are a result of the formation of biofilms in our bodies. A biofilm is analogous to a bunch of humans banding together, and deciding that instead of living in dispersed separate dwellings, they will all live together in a walled city that is easier to defend from attacks.

To achieve this cooperation, each bacterial cell starts by sending a signal: a molecule that says “Yoohoo, I am here and I can help build a biofilm. Let me know if others are interested”. As more bacteria send this molecular signal, its concentration in the environment grows. The sending bacterium also senses the environmental concentration of this signal. At some point, the concentration of the “yoohoo” signal reaches a certain threshold, and now each bacterium is convinced that rolling up its tiny sleeves and helping build a biofilm is actually a good use of its time and metabolic resources. The bacterial cell now begins to release biofilm-building components, under the assumption that everybody around it is doing the same: after all, there is a lot of yoohoo signaling going on. This method of signaling is called quorum sensing (QS). Quorum sensing is used not only for biofilm construction, but also for other group activities by bacteria, such as the secretion of virulence factors that damage the host, or of molecules for scavenging nutrients. All these activities are also community based.
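
The threshold logic of quorum sensing can be shown with a toy model. This is a deliberately crude sketch: the constants `SIGNAL_PER_CELL` and `THRESHOLD` are arbitrary numbers chosen for illustration, not measured values, and real signal concentrations also depend on diffusion and degradation, which are ignored here.

```python
# Toy quorum-sensing model; the numbers are arbitrary, not measured values.

SIGNAL_PER_CELL = 1.0   # signal each cell adds to the shared environment
THRESHOLD = 50.0        # concentration at which cells start cooperating


def cells_cooperate(n_cells):
    """Every cell compares the shared signal level against the threshold."""
    concentration = n_cells * SIGNAL_PER_CELL
    return concentration >= THRESHOLD


for n in (10, 49, 50, 200):
    print(n, cells_cooperate(n))
```

Below the threshold no cell commits resources to the biofilm; once enough cells are signaling, all of them switch at the same time.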


Biofilm. Credit: AJC1 on Flickr

Freeloaders

But wherever there is community work to be done, there is the danger of  freeloaders: those who benefit from the labor of the community, but provide little or no input themselves. Are bacterial communities an exception? This question has been asked by several research groups, experimental and theoretical.

In 2007, Stephen Diggle and his colleagues created two types of QS-related Pseudomonas aeruginosa mutants. The first type do not send the signal, hence they make no effort in propagating the information that a biofilm is being constructed (signal-negative). The second type produce the signal, but not the necessary products for constructing the biofilm (signal-blind). They then examined how well these mutants did alongside regular bacteria, in a stressful environment that facilitates the creation of biofilms. They started a culture with a small percentage of signal-negative and signal-blind mutants (1-3% of the total population). Both types of cheating bacteria proliferated rather well, rising up to 45% and 66% of the populations respectively. But once cheats grew more common, their ability to proliferate, that is, their fitness, declined. Diggle and his colleagues attributed that to the decline in the number of cooperators, which could no longer support the cheats.

Why would cheating increase fitness, even if transiently? The answer is that producing both the quorum sensing signal and the actual biofilm-building components is metabolically costly. QS is therefore very vulnerable to parasites: strains that don’t have to produce the signals or the actual components will benefit more than their hard-working neighbors. Up to a point, that is.


The game of life. For life.

A recent theoretical study in PLoS-ONE examines the evolutionary fitness of hypothetical QS mutants that freeload. Note that this is theoretical: no Pseudomonas were harmed in this study.

Czaran and Hoekstra looked at the problem from the opposite point of view to that of Diggle. They asked whether QS individuals can invade and proliferate in a non-QS population. To answer this question, they used a cellular automaton simulation. A cellular automaton is a grid whose cells can be in different states (e.g. “full” or “empty”), and in which each cell’s state depends on the states of its neighboring cells. Each time the grid is scanned, the neighboring cells of each cell determine that cell’s state in the next generation. Here is a simple cellular automaton called The Game of Life. In the Game of Life, each cell can be either “alive” or “dead”, depending on the number of neighboring live cells. A cellular automaton is therefore a good basis for simulating bacterial communities. In Czaran and Hoekstra’s simulation, each cell is a bacterium that can be fully QS capable, QS incapable, or partially QS capable in different manners.
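
To make the idea of a cellular automaton concrete, here is a minimal sketch of one generation of Conway's Game of Life in Python. It uses the standard rules (a dead cell with exactly 3 live neighbors comes alive, a live cell with 2 or 3 live neighbors survives). This is the classic toy automaton only, not Czaran and Hoekstra's model, whose update rules are more involved; the function name `life_step` is my own.

```python
from collections import Counter


def life_step(live):
    """Advance Conway's Game of Life by one generation.

    live is a set of (row, col) tuples for the cells that are alive.
    """
    # Count, for every cell on the grid, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


blinker = {(1, 0), (1, 1), (1, 2)}  # a horizontal bar of three live cells
print(sorted(life_step(blinker)))   # the bar flips to vertical
```

Each cell's fate depends only on its immediate neighborhood, which is what makes this kind of model a natural fit for bacteria that only sense and affect their local surroundings.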

Ignoramuses, Liars and Voyeurs

Creating strains in a computer is much easier than in real life. Czaran and Hoekstra used 3 loci for their simulated bacterial genomes: C for cooperation, meaning production of a public good molecule, such as a polysaccharide for the biofilm, and two for quorum sensing: locus S for producing the signal molecule (“yoohoo, I’m here”) and locus R for signal response, which includes the signal receptor and the signal transduction machinery that triggers the cooperative behaviour when the threshold signal concentration has been reached. They created 2^3 = 8 different strains based on the presence or absence of each active gene: Ignorant, Voyeur, Liar, Lame, Blunt, Shy, Vain, Honest. The Ignorant (csr) lives in complete solitude, and cannot participate in QS. The Honest (CSR) is a good QS citizen. The various others are freeloaders to some extent. For example, the Liar (cSr) produces the signal molecule, but not the actual response. Lame produces the signal and the response machinery, but cannot produce the actual public-good (C) molecule.
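
The 2^3 = 8 count is easy to check by enumerating every on/off combination of the three loci. The sketch below follows the notation used above (uppercase for an active gene, lowercase for an inactive one). It labels only the strains whose genotype this post spells out or describes; the genotype for Lame (cSR) is inferred from its description here, and the remaining four names are deliberately left unassigned.

```python
from itertools import product

# Uppercase = active gene, lowercase = inactive, in the order C, S, R.
LOCI = "CSR"

# Only the strains described in the post; "cSR" for Lame is inferred from
# its description (signal and response, but no public good).
NAMED = {"csr": "Ignorant", "cSr": "Liar", "cSR": "Lame", "CSR": "Honest"}

genotypes = [
    "".join(locus if active else locus.lower()
            for locus, active in zip(LOCI, alleles))
    for alleles in product((False, True), repeat=len(LOCI))
]

for g in genotypes:
    print(g, NAMED.get(g, "(not named above)"))
```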


Table 1. The 8 possible genotypes of the cooperation-quorum sensing system and the corresponding total metabolic costs m(e) of gene expression.

They then ran the simulation using cellular automata. They started with mixtures of different initial populations, and ran the automata, with each cell’s behavior being a function of how it can respond (a Liar cannot build a biofilm even though it asks everyone else to, while an Honest cannot help but sense the signals and contribute). Since, as Diggle and colleagues have shown, being a good citizen is metabolically costly, Czaran and Hoekstra factored the metabolic cost into their simulations.

In a nutshell, Czaran and Hoekstra have shown that “both cooperation and the associated communication system can evolve, spread and persist in the population”. So being a good citizen pays off, and cooperation actually increases the fitness of the cooperative strains as opposed to the non-cooperative ones. This is a very elegant and informative simulation work, and I recommend reading it, since there is quite a bit more there than I have written about. My only complaint is that they did not provide some online resource to play around with seeding initial populations and seeing what happens to them after a multi-generational run. So just to psych you out a bit, here is a cellular automaton from YouTube:

And a biofilm, er, film:


Diggle, S., Griffin, A., Campbell, G., & West, S. (2007). Cooperation and conflict in quorum-sensing bacterial populations Nature, 450 (7168), 411-414 DOI: 10.1038/nature06279

Czárán T, & Hoekstra RF (2009). Microbial communication, cooperation and cheating: quorum sensing drives the evolution of cooperation in bacteria. PloS one, 4 (8) PMID: 19684853


PLoS Currents: Influenza. Because knowledge should travel faster than epidemics

August 21st, 2009 Comments off


(Full disclosure before I start: I am an academic editor in PLoS ONE. I have no financial stake in PLoS, and as far as I know, they have none in me. They’d better not, if they know what’s good for them).

PLoS have come up with yet another cool mechanism for scientific communication: PLoS Currents. The emphasis in PLoS Currents is on rapid science communication, but without sacrificing scientific rigor. To wit:

The submissions are not peer reviewed in depth, but are screened by a group of leading researchers in the field (“moderators”). The moderators will make a rapid determination as to whether a contribution is intelligible, relevant, ethical and scientifically credible, but will otherwise not impose restrictions on the nature, format or content of the contributions. Those submissions deemed appropriate are posted immediately at PLoS Currents: Influenza and publicly archived at the National Center for Biotechnology Information (NCBI).

So here we have all the chief elements of scientific communication: credibility (by the moderators), timeliness (immediate online publishing) and attribution (by public archiving). PLoS Currents: Influenza, or PC:I, is heavily skewed towards timeliness, the rationale being that in influenza research and monitoring, time is of the essence. After all, a report going through the usual peer review mill can take months, which is exactly the time required for a full-blown pandemic.

Not that other scientific fields are not in need of timeliness. Physicists and mathematicians have known that for almost two decades now. Nature Precedings are also providing an outlet for rapid communication in life sciences. But the combination of speed, accessibility and credibility offered by PC:I is indeed something new and welcome.

As for content: one interesting hypothesis published in PC:I is that humidity and high temperatures block aerosol transmission of influenza, whereas colder, drier climes facilitate it. On the other hand, contact transmission is not affected by these conditions. This would help explain the predominant winter transmission in temperate zones, vs. the ongoing yet intermittent transmission in tropical zones. Anice Lowen and Peter Palese have communicated this hypothesis. Or rather, a hypothesis: life scientists are embedded in a culture where they are still used to having only “closed stories” communicated publicly in writing. So this is quite a change. PC:I will hopefully start a trend that will help accelerate science publishing.

One final word: the technology behind PC:I is Google Knol, about which I know very little, but it seems everybody else does.


Anice Lowen, & Peter Palese (2009). Transmission of influenza virus in temperate zones is predominantly by aerosol, in the tropics by contact PloS Currents: Influenza

Absolut standards: report from the M3-2009 meeting, part 2: signature genes and big science

July 27th, 2009 Comments off


Some more presentations from the metagenomics, metadata, and metaanalysis (M3) meeting, Stockholm June 27, 2009

Pathway Signature Genes
 Lucas A. Brouwers, Martijn A. Huynen and Bas E. Dutilh
CMBI / NCMLS, Radboud University Nijmegen Medical Centre, The Netherlands

If we take a sample of soil, how can we know whether it is adequate for growing a certain crop? For example, does it have the necessary bacteria to provide the nutrients for that crop from raw compounds in the soil? Or when examining a person with an apparent metabolic disorder, could it be that certain characteristics of  their gut bacteria are causing this? We have already seen this happen with obesity.

Questions like this apply to the functional capacity of the microbes in their habitat. Think about a microbial community as an industrial zone with many factories that can make a range of products, and share each other’s products as raw material. Some of the byproducts are also products, as are the intermediate assembly stages of what are deemed to be the final products. All these products can be consumed by other microbe species / factories, as well as by plants and animals sharing the habitat.

But when we sample a metagenome, we receive a partial picture of the genes necessary to complete a product or range of products. It’s like receiving a series of partial snapshots of an industrial zone:  suppose we identify a tire factory and a body frame factory. Does this mean they are actually making cars in that industrial zone? Or maybe just certain vehicle parts?

Lucas Brouwers presented a really cool idea: how to detect the existence of pathways in metagenomic data given this partial information. His reasoning was as follows: certain metabolic pathways (the factories that make compounds necessary for sustaining life) have signature genes. If these genes exist, then there is a high probability that the entire pathway exists. He determined the signature genes by examining many bacterial genomes, finding which genes indicate the presence of whole metabolic pathways, and estimating the probability of that. In the factory analogy, that would be the equivalent of carefully studying many industrial zones and determining, for example, that 90% of the industrial zones that have both a body frame shop and a tire shop also make whole cars. So when we fly quickly by an unknown industrial zone and see that these two factories exist, there is a 90% probability that this zone makes whole cars as well. After looking at many bacterial genomes and the homologous genes that constitute their pathways, Brouwers and his colleagues built a statistical model to determine the probability of pathways in the metagenomic sequence data, provided certain signature genes are detected.
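To make the estimation step concrete, here is a minimal Python sketch. Big caveats: the reference genomes, gene names and pathway labels are made up for illustration, and the plain conditional frequency below is my stand-in, not the statistical model Brouwers and his colleagues actually built.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data: for each reference genome, the genes it carries and
# whether the complete pathway is present. None of this is real annotation.
REFERENCE_GENOMES: Dict[str, Tuple[Set[str], bool]] = {
    "genome_1": ({"geneA", "geneB", "geneC"}, True),
    "genome_2": ({"geneA", "geneB"}, True),
    "genome_3": ({"geneA"}, False),
    "genome_4": ({"geneA", "geneB", "geneD"}, True),
    "genome_5": ({"geneA", "geneB"}, False),
    "genome_6": ({"geneD"}, False),
}


def pathway_probability(signature: Set[str],
                        genomes: Dict[str, Tuple[Set[str], bool]]) -> float:
    """Estimate P(pathway present | all signature genes present).

    This is a simple conditional frequency over the reference genomes:
    among genomes that carry every signature gene, the fraction that
    also have the whole pathway.
    """
    with_signature = [has_pathway
                      for genes, has_pathway in genomes.values()
                      if signature <= genes]
    if not with_signature:
        raise ValueError("no reference genome carries the full signature")
    return sum(with_signature) / len(with_signature)


if __name__ == "__main__":
    p = pathway_probability({"geneA", "geneB"}, REFERENCE_GENOMES)
    print(f"P(pathway | geneA and geneB) = {p:.2f}")
```

In the factory analogy, `signature` is the pair "body frame shop, tire shop", and the returned fraction plays the role of the 90% figure: the share of surveyed industrial zones with both shops that also make whole cars.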


Jeffrey Grethe: Standards in the Context of a Large Scale Microbial Ecology Cyberinfrastructure
Jeffrey Grethe
Center for Research in Biological Systems, University of California San Diego

(Full disclosure: Jeff is my boss, at least for another couple of weeks before I move on to other things).  :)

Jeffrey Grethe talked about using standards in the CAMERA project. CAMERA, like megx.net, MG-RAST and IMG/M, aims to serve the needs of the microbial ecology research community by creating a data repository and a bioinformatics tools resource for metagenomic analysis. Jeff discussed the use of standards in CAMERA, which is working with the Genomic Standards Consortium. Specifically, he showed some examples of the upcoming Geographic database, which will support queries for information on metagenomes, and of the data input system, which mandates the use of community standards when entering data. So, for example, when someone would like to compare metagenomes from environments that have high temperature and salinity, CAMERA can help retrieve those using simple queries.
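Here is a rough Python sketch of why mandated standards make such a query simple. Everything in it is an assumption for illustration: the record type, field names, units and sample values are hypothetical, and this is not CAMERA's actual interface or data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SampleMetadata:
    # Hypothetical fields; a real community standard defines its own
    # names and units. The point is that every record uses the same ones.
    sample_id: str
    temperature_c: float   # temperature in degrees Celsius
    salinity_psu: float    # salinity in practical salinity units


def select_samples(samples: List[SampleMetadata],
                   min_temperature_c: float,
                   min_salinity_psu: float) -> List[SampleMetadata]:
    """Return samples at or above both thresholds, in input order."""
    return [s for s in samples
            if s.temperature_c >= min_temperature_c
            and s.salinity_psu >= min_salinity_psu]


# Made-up records, not real metagenome samples.
SAMPLES = [
    SampleMetadata("hot_spring_1", 75.0, 5.0),
    SampleMetadata("salt_pond_1", 40.0, 120.0),
    SampleMetadata("open_ocean_1", 18.0, 35.0),
    SampleMetadata("hypersaline_vent_1", 60.0, 90.0),
]

if __name__ == "__main__":
    for sample in select_samples(SAMPLES, 35.0, 80.0):
        print(sample.sample_id)
```

The filter is a one-liner only because every record reports temperature and salinity under the same field names and in the same units. Without that, each comparison starts with a round of manual data cleaning.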


Dutilh, B., Snel, B., Ettema, T., & Huynen, M. (2008). Signature Genes as a Phylogenomic Tool Molecular Biology and Evolution, 25 (8), 1659-1667 DOI: 10.1093/molbev/msn115

Seshadri, R., Kravitz, S., Smarr, L., Gilna, P., & Frazier, M. (2007). CAMERA: A Community Resource for Metagenomics PLoS Biology, 5 (3) DOI: 10.1371/journal.pbio.0050075

A Flurry of Red and Green

July 23rd, 2009 2 comments

ResearchBlogging.org

UPDATE: I submitted this post to the National Evolutionary Synthesis Center’s sponsored contest for a travel award to ScienceOnline2010. Let’s see how it goes… #scio10

In a previous post about Hatena we saw what might very well be the beginning of a (beautiful?) endosymbiotic relationship: a unicellular predator swallows a microalga, resulting in physiological changes to both, and the resulting endosymbiont is now a phototroph, rather than a predator. “Endo” means inside, and “symbiosis” means life together. Endosymbionts live out their symbiosis inside the host’s cells.

In this post I would like to fast-forward to another part of the endosymbiotic movie. We will see how endosymbiosis contributes to evolution much more than we thought. But first, some background information.

Primary and secondary endosymbiosis

Primary endosymbiosis happens when one free-living organism engulfs another, resulting in a mutualistic relationship. Secondary endosymbiosis is the process of engulfing another free-living organism that has already gone through primary endosymbiosis. Such is the case of Hatena: the algal endosymbiont provides the photosynthetic capability and light sensitivity (acquired by primary endosymbiosis), while the host provides motility and a cozy, stable home: its cell. Plastids are organelles that harvest light, manufacture pigments, store food and perform various other functions in plants and algae. Plastids are thought to be photosynthetic microbes that were acquired by primary and then secondary endosymbiosis: they have chromosomes encoding their own DNA transcription and translation machinery, as well as some other genes. One strong piece of evidence for secondary rather than primary endosymbiosis is the number of membranes surrounding plastids: three or four membrane layers in some algae, two in plants, strongly suggesting successive endosymbiotic events. Another piece of evidence is molecular: most of the proteins needed for plastids to function are encoded in the host’s nucleus. How and why did the genes travel from the endosymbionts to the host?

Nobody is really sure yet, but here is a working hypothesis: once endosymbiosis occurs, the genome of the endosymbiont becomes mostly redundant. After all, the host takes care of most of the endosymbiont’s nutritional and metabolic needs, and maintains a stable environment in the cell. About 30% of a typical microbial genome is dedicated to genes that stabilize its internal environment in response to events in the external one. Most or all of these genes become redundant once the microbe in question becomes an endosymbiont, and enjoys the hospitality of its host, trusting it to maintain a controlled environment. They either disappear or migrate to the nucleus of the host.

Diatoms: hosting more types of algae for longer than you think

Diatoms are microscopic  algae, so named because they are often shaped from two symmetric lobes — hence “diatoms”. They are photosynthetic, and are thought to compose most of the phytoplankton biomass.

It has been known for a while that diatoms acquired their plastids by a process of secondary endosymbiosis with red algae. The commonly accepted  sequence of events for the diatom / red algae endosymbiotic time-line is shown here:

endosymbiosis-life-cycle

(A) historical diatom (yellow) and red alga; the red ellipse is a generic plastid; (B) algal endosymbiont in diatom; (C) gene migration from alga to the diatom’s nuclear DNA; (D-E) algal cell mostly gone, only the plastid remains.

This is what Ahmed Moustafa and his colleagues also thought about the acquisition of chloroplasts by diatoms. They therefore set out to look for genes of red algae in the nuclear DNA of two diatom species whose genomes have been sequenced. To their surprise, they discovered that 70% of the genes of algal origin in the diatoms were from green algae lineages, not red algae. However, there are no plastids of green algal origin in those diatoms. In particular, there were some genes that exist in the chloroplasts of red algae, but not in the secondary endosymbiotic chloroplasts in diatoms. So what happened? Why is the host’s genome “mostly green” instead of “all red”?

The answer that Moustafa and colleagues proposed was that these diatoms used to have plastids of green algae lineage. The genes that migrated to the diatom nuclear DNA are therefore green in origin. Over evolutionary time, for reasons unknown, red algae endosymbionts displaced the green ones. Many of the red genes that would have otherwise migrated to the nucleus already had their places taken by green genes, and were simply lost.

endosymbiosis-life-cycle-green-1st

A-D: first sequence of events: endosymbiosis of green algae, including gene migration to diatom nucleus;  (E) displacement of green algae by red, through some unknown mechanism; (F-I): endosymbiosis of red algae, including gene migration to nucleus. Nucleus now has a mixture of green lineage and red lineage genes.

Many questions remain open: why did this replacement take place? How prevalent is it? The researchers only looked at two diatom species, whose genomes have been sequenced. One way to answer this question would be a metagenomic analysis of a diatom population. This would mean analyzing samples of DNA sequences taken from many different diatom species, to get a picture of the frequency of red versus green endosymbiont lineage genes in many more diatom genomes. Also, why would one set of endosymbionts be displaced by another? What is the evolutionary time-line in which the endosymbiosis / displacement process occurs? What, if anything, triggers this replacement?

This finding sheds light upon a larger question in evolutionary biology: how big is the role of endosymbiosis in evolution? How many of an organism’s genes are acquired from other organisms? It seems that with this study, the importance of endosymbiosis as a contributor of genes just went up a notch: we see yet a few more cross-growths between the not-so-separate branches of the tree of life.

Finally, A flurry of Red and Green by The Dreamer and the Sleeper covered by Karys Rhea. The webcam self-shoot is grainy, and the sound is not much better than a laptop microphone. But Karys Rhea’s singing shines through.


Moustafa, A., Beszteri, B., Maier, U., Bowler, C., Valentin, K., & Bhattacharya, D. (2009). Genomic Footprints of a Cryptic Plastid Endosymbiosis in Diatoms Science, 324 (5935), 1724-1726 DOI: 10.1126/science.1172983

Endosymbiosis: symbiosis where one partner lives inside the cell of the other
Phototroph: uses light to synthesize food, e.g. plants, algae
Mutualism: a biological interaction between organisms, where each individual derives a benefit
Plastids: plant and algal organelles that manufacture and store important chemical compounds
Phytoplankton: 'plant', photosynthetic plankton
Metagenomics: the study of genetic material recovered directly from environmental samples

Absolut standards: report from the Metagenomics Metadata and Metaanalysis 2009 meeting. Part 1

July 20th, 2009 Comments off

ResearchBlogging.org

The first metagenomics, metadata and metaanalysis meeting, held in Stockholm on June 27, 2009, was a raging success. People were standing all the way back to the hall jostling for elbow room, while all the other concurrent meetings were pitifully empty after word got out about how awesome we were.

OK, I may be exaggerating  slightly, since I was the meeting’s co-organizer, co-chair, program committee co-chair, and bartender. (If you were there and you don’t remember me tending bar then I must have done a good job). Well, maybe I wasn’t a bartender. Fine.

ISMB2009-M3_SIG

So what was the meta(genomics, data, analysis) meeting about?

I’ve talked about metagenomics in several earlier posts. Just in case you are new here: metagenomics is the study of genetic material that comes directly from the environment. It is a technique used to study genetic material from organisms (usually microbes) that cannot be cultured in a lab, and to get a picture of organisms in their natural environment, which often differ from lab clones.

While in genomics we strive to obtain a full picture of an organism’s DNA, in metagenomics we sample the environment for whatever DNA we can get. We are actually merging population biology with genomics. While in population biology our basic unit of study is an organism, in metagenomics it is a DNA sequence. This presents many challenges: properly sampling the microbial habitat and extracting the DNA, understanding which organisms the DNA in the samples came from, gauging sample depth, assembling the sequences, identifying genes, assigning a biological function to those genes, to name a few.  There are many different experimental and computational procedures for doing so, and they should be meticulously documented, as Nikos Kyrpides from the Joint Genome Institute writes in this month’s Nature Biotechnology:

Like molecular biology, genomics has been fueled by the innovative energy
of many interdisciplinary activities. Unlike molecular biology, which has
thrived on the principle of standardized methods and protocols, genomics
has progressed without regard for the critical importance of shared
standards. Now, 14 years since the first complete genome was published
and with more than 900 genome sequences finished, it is astonishing to
observe the lack of standards for so many critical procedures in the
field, ranging from simple data exchange to gene finding, function
prediction and metabolic pathway description.

Now for the kick in the head:

As an example, we compared the genomes of two closely related organisms,
Burkholderia mallei ATCC 23344 (ref. 19) and Burkholderia pseudomallei
K96243
(each sequenced by a different sequencing center)
[...]
we identified 548 genes in B. mallei that are absent from B. pseudomallei
and are potentially related to their different lifestyles. Manual curation
of those 548 genes revealed that, in fact, 497 of them are also in
the B. pseudomallei genome, but there they had not been identified as
'real' genes. The reason for this discrepancy?

The two sequencing centers used different gene finding methods.
The consequence was an almost 90% error rate in the results of our
comparison.

Ouch. Ouch, ouch, ouch. And that is not an anecdotal example. Furthermore, it also applies to metagenomics, even more so, since many of the standard operating procedures (SOPs) in metagenomics are still in the process of inventing themselves.
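For the record, here is the arithmetic behind that “almost 90%”, as a few lines of Python. The two counts are taken straight from the quote above; the variable names are mine.

```python
# Counts quoted from Kyrpides (2009); variable names are illustrative.
genes_reported_missing = 548   # in B. mallei, reported absent from B. pseudomallei
genes_actually_present = 497   # found in B. pseudomallei after manual curation

# Genes that really are specific to B. mallei.
genuinely_absent = genes_reported_missing - genes_actually_present

# Fraction of the reported differences that were artifacts of gene finding.
error_rate = genes_actually_present / genes_reported_missing

print(f"genuinely absent genes: {genuinely_absent}")
print(f"error rate of the naive comparison: {error_rate:.1%}")
```

So only 51 of the 548 reported differences survive curation, and about 90.7% of them come from the two centers using different gene finders.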

Metadata is the “data about the data”: all the habitat data, SOPs and abiotic data that is in dire need of the standardization Kyrpides writes about.

Last, metaanalysis would be the analysis of genomes and metagenomes. Since the M3 meeting was held under the auspices of the International Society for Computational Biology, it attracted mainly computational biologists — the type to analyze, rather than sample and sequence (but the differences are rapidly blurring, as we saw in many talks).

But things are actually looking better for standards. In 2005 the Genomic Standards Consortium was formed to address this problem. Renzo Kottmann from the Max-Planck Institute for Marine Microbiology in Bremen, Germany talked about software development within the GSC, and specifically about his own project: the Genomic Contextual Data Markup Language, or GCDML. GCDML is an XML-based standard for describing everything associated with a genomic or a metagenomic sample: where it was taken from, under what conditions, and which protocols were used to extract, sequence, assemble, finish and analyze the metagenome. Again, my own personal bias here: I am a heavy user of GCDML, as I am writing my own data-insertion software, and have headed such an effort for a while at the University of California San Diego. Here are Kottmann’s slides, and you can also read more about GCDML.

Slides: Functional Metagenome Analysis using Gene Ontology (MEGAN 4), http://www.slideshare.net/djudge/functional-metagenome-analysis-using-gene-ontology-megan-4-1685987 (posted by djudge on SlideShare)

Daniel Richter talked about the functional annotation of metagenomes using Gene Ontology, a technique he developed with Daniel Huson at the University of Tuebingen, Germany. The Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. It is composed of a vocabulary of some 27,000 terms, with hierarchical relationships defined between them, from the general (“catalytic activity”) to the specific (“phosphatase activity”) to the more specific (“Tyrosine phosphatase activity”). (Graph theory prudes: GO is a DAG, not really a hierarchy, I know, I know). Richter assigns functions to sequences hypothesized to be genes using the Lowest Common Ancestor (LCA) approach. LCA works as follows: once a high enough similarity is found between a sequence from a metagenome and a sequence from a reference database, LCA looks for similarities to other, related sequences, where the similarity score is above a certain threshold. It then assigns the most specific GO function that fits all of those hits.
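The LCA step can be sketched in a few lines of Python. This is my own toy illustration, not Richter and Huson's implementation: the term names echo the examples above, but the edges of the mini-DAG, the hit scores and the threshold are all made up.

```python
from typing import Dict, List, Set, Tuple

# Toy fragment of a GO-like DAG: child term -> list of parent terms.
# Illustrative edges only, not copied from the real Gene Ontology.
PARENTS: Dict[str, List[str]] = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "binding": ["molecular function"],
    "phosphatase activity": ["catalytic activity"],
    "kinase activity": ["catalytic activity"],
    "tyrosine phosphatase activity": ["phosphatase activity"],
    "serine phosphatase activity": ["phosphatase activity"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def depth(term: str) -> int:
    """Longest path from the term up to a root (roots have depth 0)."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def most_specific_common_terms(terms: List[str]) -> Set[str]:
    """Deepest terms shared by the ancestor sets of all input terms."""
    if not terms:
        return set()
    common = set.intersection(*(ancestors(t) for t in terms))
    if not common:
        return set()
    deepest = max(depth(t) for t in common)
    return {t for t in common if depth(t) == deepest}


def assign_function(hits: List[Tuple[str, float]], min_score: float) -> Set[str]:
    """Keep hits scoring at least min_score, then find their common term(s)."""
    passing = [term for term, score in hits if score >= min_score]
    return most_specific_common_terms(passing)


if __name__ == "__main__":
    # Hypothetical similarity hits for one metagenomic sequence.
    hits = [("tyrosine phosphatase activity", 95.0),
            ("serine phosphatase activity", 80.0),
            ("binding", 20.0)]
    print(assign_function(hits, min_score=50.0))
```

Note the trade-off the threshold controls: a strict cutoff keeps only close hits and yields a specific term, while a loose cutoff lets in distant hits and pushes the assignment up toward the root of the DAG.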

Jack Gilbert from Plymouth Marine Laboratory, Plymouth, UK talked about a year of sampling the marine microbiome in the Western English Channel. He went through many different sampling and normalization problems.

Tom Matthews from the National Microbiology Laboratory in Canada talked about a profiling pipeline for pathogens, aimed at fast typing of pathogens in case of an outbreak.

There were more presentations, but I think I’ll give it a rest and get back to them in part 2. I am also waiting for some people to upload their slides…. you know who you are!


Kyrpides, N. (2009). Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream Nature Biotechnology, 27 (7), 627-632 DOI: 10.1038/nbt.1552

Kottmann, R., Gray, T., Murphy, S., Kagan, L., Kravitz, S., Lombardot, T., Field, D., & Glöckner, F. (2008). A Standard MIGS/MIMS Compliant XML Schema: Toward the Development of the Genomic Contextual Data Markup Language (GCDML) OMICS: A Journal of Integrative Biology, 12 (2), 115-121 DOI: 10.1089/omi.2008.0A10

Assigning biological functions to genes