The first article in the Journal of Serendipitous and Unexpected Results (JSUR) has been published. Reminder: “JSUR is an open-access forum for researchers seeking to further scientific discovery by sharing surprising or unexpected results. These results should provide guidance toward the verification (or negation) of extant hypotheses.”  (From the JSUR website.) I posted about JSUR before here and here.

And the article is pure blogging gold. But not in the sense you may think: it is actually very good. This is because it uses functional MRI (fMRI) on a dead salmon. Now why didn’t I think of that? So to explain what a dead salmon is doing in an MRI machine, I first have to explain the problem the DSIM (dead salmon in MRI) is solving.

#### Beware the FWER

Suppose you are flipping a coin to test whether it is fair. The coin lands heads side up 9 times out of 10. If you assume that the coin is fair, then the probability that it would come up heads at least 9 out of 10 times is $(10 + 1) \times (1/2)^{10} \approx 0.0107$. This is relatively unlikely, and under a statistical criterion such as p-value < 0.05 (meaning your accepted error rate is 5%), you would declare that the coin is unfair.

Now, suppose you flip 100 coins, 10 times each, to test their fairness. Given that the probability of a fair coin coming up 9 or 10 heads in 10 flips is 0.0107, seeing any particular coin come up heads 9 or 10 times would still be very unlikely. But seeing some coin behave that way, without concern for which one, would be more likely than not. More specifically, the likelihood that all 100 fair coins are identified as fair by this criterion is $(1 - 0.0107)^{100} \approx 0.34$. So there is a good chance (about 0.66) that at least one of 100 fair coins will be falsely identified as unfair! This is because we are not dealing with one statistical test, but with a whole family of them. And we know how large families can be… In other words, using the same significance criterion for a family of tests that we use for a single test will bring up significant results where there are none.
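The arithmetic above is easy to check from the command line. Here is a minimal sketch using awk, assuming the coins are independent; the function names are made up for this illustration.

```shell
# Probability that a fair coin shows at least 9 heads in 10 flips:
# (C(10,9) + C(10,10)) / 2^10 = 11/1024.
p_single() {
    awk 'BEGIN { printf "%.4f\n", (10 + 1) * (1/2)^10 }'
}

# Probability that none of m fair coins reaches 9+ heads: (1 - p)^m.
# m is the first argument; independence between coins is assumed.
p_all_pass() {
    awk -v m="$1" 'BEGIN { p = 11 / 1024; printf "%.2f\n", (1 - p)^m }'
}

# Probability that at least one of m fair coins is falsely flagged.
p_any_flagged() {
    awk -v m="$1" 'BEGIN { p = 11 / 1024; printf "%.2f\n", 1 - (1 - p)^m }'
}

p_single            # prints 0.0107
p_all_pass 100      # prints 0.34
p_any_flagged 100   # prints 0.66
```

Try `p_any_flagged 1000` to see how quickly a false positive becomes a near certainty as the family grows.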

So how do we solve this problem? Well, there are several ways. First, we have to determine the Familywise Error Rate (FWER) or the False Discovery Rate (FDR), two related measures used to estimate how many such false-positive errors we are prone to make in any given family of tests. Then there are correction techniques we can apply, for example the Bonferroni correction, which essentially divides the significance threshold for each particular test by the number of tests. If we loosely treat this as dividing each coin's false-positive probability by 100, the chance that all 100 fair coins pass becomes $(1 - 0.0107/100)^{100} \approx 0.99$. Therefore, the probability of any fair coin being classified as unfair is now reduced to about 0.01: much better.
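For comparison, here is a minimal sketch of the textbook Bonferroni rule, which compares each test's p-value against alpha divided by the number of tests. This is a slightly different calculation from the back-of-the-envelope figure above, and the function names are hypothetical.

```shell
# Bonferroni-corrected per-test threshold: alpha / m.
bonferroni_threshold() {   # usage: bonferroni_threshold ALPHA M
    awk -v a="$1" -v m="$2" 'BEGIN { printf "%.4f\n", a / m }'
}

# Prints "unfair" if p is below the corrected threshold, otherwise "fair".
classify_coin() {          # usage: classify_coin P ALPHA M
    awk -v p="$1" -v a="$2" -v m="$3" \
        'BEGIN { if (p < a / m) print "unfair"; else print "fair" }'
}

bonferroni_threshold 0.05 100    # prints 0.0005
classify_coin 0.0107 0.05 1      # prints unfair (single test, no correction)
classify_coin 0.0107 0.05 100    # prints fair (9+ heads no longer counts)
classify_coin 0.00098 0.05 100   # prints fair (10 heads out of 10 still falls short)
```

Note the side effect: with only 10 flips per coin, the smallest possible p-value (1/1024, about 0.00098) is above the corrected threshold, so under strict Bonferroni no coin could be declared unfair at all. You would need more flips per coin to regain any power.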

(For you statistical purists out there: yes, this is not a rigorous explanation. I am trying to get the gist of it across).

#### This Fish has Ceased to Be

What has that got to do with fMRIs and a dead salmon?

fMRI tests are very popular. Why should they not be? Take someone, stick them in an MRI, show them a picture of their mother-in-law,  see which bits of their brain light up (get more blood, hence are more active) and voila! You’re in the New York Times science supplement under the title “Scientists discover brain region responsible for unmitigated rage.” (Any resemblance to any actual mother-in-law, living or dead, is purely coincidental.) fMRI is a great tool for mapping cognitive processes into specific areas of the brain. It is our tool to connect between mind and brain, so to speak.

The pixels that appear in an fMRI scan are called voxels, or volume picture elements: fMRI scans provide brain slices that are reconstructed in 3D. A typical fMRI scan can contain 130,000 voxels. Tens of thousands of tests can be performed over multiple conditions. With the sheer number of images, can certain voxels light up as false positives? You betcha. Is every voxel significant? Well, to answer that, Craig Bennett and his colleagues took a dead Atlantic salmon, and placed it in an fMRI. The salmon was then shown a series of photographs depicting humans in various social situations. The (dead, remember?) fish was asked to determine which emotion each individual was experiencing. They scanned the salmon’s (did I say it was dead?) brain, and collected the data. They also scanned the brain without showing the fish the pictures. The images were then checked, voxel by voxel, for change between the brain doing picture recognition tasks and the brain at rest. They found several active voxel clusters in the (yes, still dead) salmon’s brain. See below:

"Sagittal and axial images of significant brain voxels in the task > rest contrast. The parameters for this comparison were t(131) > 3.15, p(uncorrected) < 0.001, 3 voxel extent threshold. Two clusters were observed in the salmon central nervous system. One cluster was observed in the medial brain cavity and another was observed in the upper spinal column." (From Bennett et al 2010 JSUR 1:1 1-5)

So, what we have here is a very dead fish that can recognize emotions in humans? Not really: of course they will find some voxels out of the 130,000 lighting up! Just like you would have a good chance of finding some coin, no matter which, out of 100 landing heads-up 9 times out of ten. Of course, once they applied multiple-comparison corrections, the active voxels disappeared.
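To get a feel for the scale, here is a rough sketch of how many voxels would be expected to light up by chance alone. It assumes independent voxels, uses the "typical scan" figure of 130,000 voxels rather than the salmon's actual voxel count, and ignores the 3-voxel extent threshold, so treat it as an order-of-magnitude illustration only. The function name is made up.

```shell
# Expected number of false-positive voxels = number of voxels * per-voxel threshold.
expected_false_positives() {   # usage: expected_false_positives N_VOXELS P
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f\n", n * p }'
}

expected_false_positives 130000 0.001   # prints 130
```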

My conclusion from this: awesome paper, showing us (1) how some serendipitous results should be interpreted: very carefully, with a grain of salt and with the proper FWER and FDR corrections for multiple comparisons. Serendipity might just be spurious, even if it does not seem to be. Also (2) know your statistics, or be a dead fish.

Golden quotes from the paper:

“It is not known if the salmon was male or female, but given the post-mortem state of the subject this was not thought to be a critical variable.”

“Either we have stumbled onto a rather amazing discovery in terms of post-mortem ichthyological cognition, or there is something a bit off with regard to our uncorrected statistical approach.”

Craig M. Bennett, Abigail A. Baird, Michael B. Miller, & George L. Wolford (2010). Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction. JSUR, 1 (1), 1-5. http://jsur.org/v1n1p1


## Now that’s a f***ing big genome!

It isn’t junk DNA: God just commented out a lot of crappy code as he rolled out releases.
— An old bioinformaticians’ joke

(Hey, I never said it was a funny joke…)

Why are some genomes so big? I mean, seriously. Why would the marbled lungfish, with a genome weighing 132.83 picograms (pg), need an estimated 130,000,000,000 bp? It may have to do with the fact that these fish undergo metamorphosis, and the large developmental coding this could entail: some amphibians also have big genomes… then again, some don’t. So the reason for the big lungfish genome is still a mystery.

Marbled Lungfish. Credit: Wikimedia commons

Then there is the genome of Paris japonica, a rare plant whose genome weighs 152.23 pg, making its genome the largest known so far, at a whopping estimated 150,000,000,000 bp. (Humans have a genome size of about 3,000,000,000 bp by comparison. An earlier version of this post gave a wrong number; thanks for catching this error, Jason.) Large genomes do not seem to confer an advantage: in fact, plants with large genomes are at greater risk of extinction, are less adapted to living in polluted soils and are less able to tolerate extreme environmental conditions. Their cell cycle is, of course, longer, so they grow slower than plants with a small genome, and perhaps more errors are also introduced during mitosis and meiosis. The nucleus size and, consequently, the cell size are also bigger, at least in plants. But in their conclusions to the study published in the Botanical Journal of the Linnean Society the authors write that “We are still profoundly ignorant about why some genomes […] are so big and how they operate and function.”
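For the curious: the bp estimates follow from the measured DNA mass. Here is a minimal sketch, assuming the commonly used conversion of 1 pg of DNA to roughly 0.978 billion bp. That factor is an assumption here, since the post does not say which conversion was used, and the function name is made up.

```shell
# Convert a genome mass in picograms to billions of base pairs (Gbp),
# assuming 1 pg ~ 0.978e9 bp (assumed conversion factor, not from the post).
pg_to_gbp() {   # usage: pg_to_gbp PICOGRAMS
    awk -v pg="$1" 'BEGIN { printf "%.1f\n", pg * 0.978 }'
}

pg_to_gbp 132.83   # marbled lungfish: prints 129.9
pg_to_gbp 152.23   # Paris japonica:   prints 148.9
```

Both numbers round to the 130 and 150 billion bp quoted above.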

Paris japonica. Credit: Wikimedia commons

Finally, there are viruses. Not exactly alive, but getting more so as we are discovering viruses with genome sizes that rival those of bacteria and archaea. I have posted before about the Mimivirus: a virus infecting amoebas which is so large it was mis-classified as a bacterium for a decade. At 1,181,404 nucleotides its genome may not seem like much compared with Paris japonica and the marbled lungfish, but this genome is 100-1000 times larger than that of most known viruses. Mimivirus also has tRNA genes, which are used to assemble proteins, and a viral parasite of its own, named Sputnik (“little companion”), all of which makes you wonder whether the working definition for viruses as non-living entities still holds.

This month, Matthias Fischer and his colleagues described a large marine virus, with a genome of 730,000 bp of double-stranded DNA. The virus infects a unicellular eukaryotic bacteria eater named Cafeteria roenbergensis. (Why the odd name for the host? “We found a new species of ciliate during a marine field course in Rønberg and named it Cafeteria roenbergensis because of its voracious and indiscriminate appetite after many dinner discussions in the local cafeteria.” Reminds me of Ali G saying that he will name his son after where he was conceived, which would be “Langley Village”, with the full name being: ‘The bogs in KFC in Langley Village’.) Hence, the virus infecting this creatively-named critter is the Cafeteria roenbergensis virus or CroV. The virus has some 544 predicted protein coding genes, with at least 274 of them expressed during infection. Among the goodies coded by CroV are transcription related genes, DNA repair genes, promoters, and tRNA. Fairly atypical for known viruses. Which, again, raises the question: how much cellular machinery does a virus need to code in its genome to cross the border between life and non-life? Is that even a criterion, or should we also consider the lack of physiology? Still, the majority of genes in CroV, as in Mimivirus and in most known viruses, have no similarity to those in “true” living things. Go figure.

Cafeteria roenbergensis wondering if its date is infected with CroV. Credit: Wikimedia commons

CroV Genome. Credit: Fischer M.G. et al. PNAS 2010

Fischer, M., Allen, M., Wilson, W., & Suttle, C. (2010). Giant virus with a remarkable complement of genes infects marine zooplankton Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1007615107

Pellicer, J., Fay, M., & Leitch, I. (2010). The largest eukaryotic genome of them all? Botanical Journal of the Linnean Society, 164 (1), 10-15 DOI: 10.1111/j.1095-8339.2010.01072.x


## Carnival of Evolution coming here

The 29th edition of the Carnival of Evolution will be hosted here. There are quite a few good things in store: on parrot feathers and lizardfish eyes, on Darwin cartoons, on dogs, dancing and much more.

You can still contribute. So if you are a blogger with a post on evolution, go to the blog carnival site to submit. (Please do not email me.) Submissions will be accepted until October 30. The 29th edition will be posted here November 1.


## Spiders

Warning: somewhat NSFW language.


## Two Workshops on Biological Wikis

This seems very promising: two consecutive workshops on biological Wikis in Naples. If you have a life-science related wiki, plan on doing one, or just want to learn about how collaborative authoring can help your work, this would be a great place to do so. Thanks to Paolo Romano for the information.

Joint NETTAB 2010 and BBCC 2010 workshops focused on Biological Wikis
November 29 – December 1, 2010, Naples, Italy

http://www.nettab.org/2010/

CALL FOR PARTICIPATION

The joint NETTAB and BBCC 2010 workshop on “Biological Wikis” promises to be a great meeting for all researchers involved in the exploitation of wikis in biology.

Come and discuss your ideas and doubts with such scientists as Alex Bateman, Alexander Pico, Andrew Su, Dan Bolser, Robert Hoffmann, Thomas Kelder, Mike Cariaso, Adam Godzik, Luca Toldo and many others who, we hope, will join the workshop.

It’s a great chance to follow smart tutorials and lectures on WikiPathways, WikiGenes, PDBWiki, Gene Wiki, TOPSAN, and a proficient use of Wikipedia and Semantic Wiki. See below a list of keynote speakers and tutorials.

There still is time to submit abstracts for posters and software demonstrations until next October 17, 2010 (a short delay may also be permitted)!
The complete Call is available on-line at http://www.nettab.org/2010/call.html .

Registration is open at http://www.nettab.org/2010/rform.html .
Register by October 29, 2010 to take advantage of early registration fees.
A reduction of 20 euro applies to all fees for members of ISCB.

SCIENTIFIC PROGRAMME

In the style of the other NETTAB workshops, NETTAB 2010 will include: keynote lectures given by leading experts in the field, oral communications from selected contributions, open discussion, selected software demonstrations and posters, as well as tutorials.

Provisional programme ( see http://www.nettab.org/2010/progr.html )

Monday, November 29, 2010 (Tutorial day, open to all interested participants)

The following four tutorials will be given starting at 11.30.

Mining biological pathways using WikiPathways web services and more…
Thomas Kelder, Department of Bioinformatics (BiGCaT), Maastricht University, the Netherlands

How to create your own collaborative publishing project with WikiGenes
Robert Hoffmann, Computational Biology Center, cBIO, Memorial Sloan-Kettering Cancer Center, MSKCC, New York, USA

Everything you wanted to know about Wikipedia but were too afraid to ask
Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
Andrew Su, Bioinformatics and Computational Biology, Genomics Institute of the Novartis Research Foundation (GNF), San Diego, USA

Semantic MediaWiki: a community database and more.
Dan Bolser, College of Life Sciences, University of Dundee, Scotland, United Kingdom

Tuesday, November 30, 2010 (NETTAB workshop day)

A rich scientific programme is foreseen, starting at 9.00. The following five invited talks will be given:

The Pros and Cons of Wikipedia for Scientists
Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom

Collaborative publishing with authorship tracking and reputation system – WikiGenes
Robert Hoffmann, Computational Biology Center, cBIO, Memorial Sloan-Kettering Cancer Center, MSKCC, New York, USA

WikiPathways, community-based curation for biological pathways
Alexander Pico, Gladstone Institute of Cardiovascular Disease, San Francisco, USA

The Gene Wiki: Achieving critical mass and mining for novel annotations
Andrew Su, Bioinformatics and Computational Biology, Genomics Institute of the Novartis Research Foundation (GNF), San Diego, USA

PDBWiki : Success or failure?
Dan Bolser, College of Life Sciences, University of Dundee, Scotland, United Kingdom

Many oral communications and software demonstrations will be presented, including:

SNPedia
Mike Cariaso, Keygene, The Netherlands

TOPSAN: a collaborative annotation environment for structural genomics and beyond
Adam Godzik, Sanford-Burnham Medical Research Institute, La Jolla, CA, USA

A panel discussion with all speakers on the Future of Biological Wikis will close the day. A poster session is also planned.

Wednesday, December 1, 2010 (BBCC workshop day)

The programme of this section is yet to be defined.
It will include oral communications mainly from researchers of the Campania region, around Naples.
For previous editions of the BBCC workshop, please visit http://bioinformatica.isa.cnr.it/BBCC/ .

CALL FOR CONTRIBUTIONS ( see the full Call at http://www.nettab.org/2010/call.html )

Topics

Submitted contributions should address one or more of the following topics:
* Wiki development tools (Wikimedia, Wikimedia extensions, Semantic Wikis, Wiki-coupled CMSs, Other wikis)
* Arising issues for the biomedical domain (Authoritativeness of contributions and sites, Quality assessment, Users acknowledgement, Stimulation of quality contributions, Authorships management and reward, ‘Scientific production’ value for contributions, Management of bioinformatics data types)
* Wikis and collaborative systems for Genomics, proteomics, metabolomics, any -omics, Proteins analysis and visualization, Gene and proteins interactions, Metabolic pathways, Oncology research
* Issues to be tackled by wiki and collaborative research for Genomics, proteomics, metabolomics, any -omics, Proteins analysis and visualization, Gene and proteins interactions, Metabolic pathways, Oncology research

* October 17, 2010: Posters submission (a short delay may be accepted)
  * Decisions announced: October 24, 2010
* October 29, 2010: Early registration ends
* November 29 – December 1, 2010: Workshop and Tutorials

Instructions

Submit your contribution through the EasyChair system at http://www.easychair.org/conferences/?conf=nettab2010.
* All contributions should follow the same format, as specified here: Times New Roman, 12 pt, page A4, left, right and lower margins: 2.0 cm, upper margin: 2.5 cm
* Abstracts for posters should be no more than 3 pages, including tables and figures. They should include: Introduction, Methods, Results, References.

For any further information or clarification:
Web site: http://www.nettab.org/2010/
Email: info @ nettab . org


## Money and Science

Writing grants all the time (another deadline coming Monday, yikes) made me think about money and science, but in a rather oblique way: coins and notes commemorating scientists and scientific achievements. While looking for examples, I found that Alex Pasternack from Motherboard.TV has done a really nice and thorough job already. So have a look, really cool collection of notes & coins commemorating scientists. If you know of any other science-inspired currency, please link to a pic in the comments.


## IgNobel Slides

The IgNobel prizes were announced last week. From Wikipedia: “The Ig Nobel Prizes are an American parody of the Nobel Prizes and are given each year in early October for ten achievements that ‘first make people laugh, and then make them think.’ Organized by the scientific humor magazine Annals of Improbable Research (AIR), they are presented by a group that includes Nobel Laureates at a ceremony at Harvard University’s Sanders Theater.”

This year, Andre Geim became the first IgNobel laureate to receive a Nobel prize. Andre Geim shared the IgNobel with Sir Michael Berry in 2000 for their work on magnetically levitating frogs. He received the Nobel this year, in physics, “for groundbreaking experiments regarding the two-dimensional material graphene”. I do believe that the two prizes are converging somewhat…

For those of you who teach, and whose students crave some comic relief (like mine do), here is a 3-slide show I prepared on this year’s IgNobel. You can download the presentation from Slideshare.


## Life serves viruses

Sometimes I get the feeling that all life on Earth basically serves as a vehicle for viral replication and propagation. Viruses thrive in all three domains, they embed themselves in all creatures’ genomes, they may lie dormant in the genome for eons or decimate whole populations in a few years, and they are the most abundant protein & DNA particle on earth. I am certain that their full impact on evolution is overwhelmingly larger than they are given credit for at present.

In a new article in PLoS Biology, Clément Gilbert and Cédric Feschotte report their discovery of the DNA of hepadnaviridae viruses embedded in the genome of the zebra finch. By comparing the embedded viral DNA to that of current viruses, they dated it to 19 million years ago. This places the genomic infection of animals by hepadnaviridae much earlier than previously thought. The hepadnaviridae also include the infamous hepatitis B virus, or HBV.

Credit: Daniel D. Baleckaitis, From: wikimedia commons

Hepadnaviruses infect a genome on a “one time only” basis, meaning that the same virus cannot jump around the genome and re-replicate like retroviruses (HIV for example) do. Therefore, the integration of each virus provided an event time stamp for the researchers to look at. They concluded that the insertion events into the finch’s genome and its ancestors’ genomes took place over several million years. And, of course, they are still ongoing.

UPDATE: just after I hit “Publish”, two tweets came along from Ian Holmes and Peter Cock about another interesting paper on viruses in the service of evolution (or vice-versa). This one is from Lauren McDaniel and colleagues, titled “High Frequency of Horizontal Gene Transfer in the Oceans”. Gene Transfer Agents or GTAs are virus-like particles that insert DNA into bacterial genomes. McDaniel and colleagues engineered GTAs to contain a marker gene that provides antibiotic resistance. They placed the marked GTAs in bags filled with seawater, and floated the bags in the ocean overnight to simulate natural conditions. They found that 47% of the bacteria in the bags incorporated the GTA-specific antibiotic resistance, meaning that the GTAs had infected those bacteria. Paul Jones was quoted as saying “they’re promiscuous little bastards”, referring to the large number of bacterial species GTAs infect. GTAs have little of their own genome, and they basically are vectors for transferring DNA between bacteria. So wow. Whole Lotta Shakin’ Goin’ On.

Finally, a CGI video of viruses at work. The Resident Evil bit spliced on at the end (2:10), while not scientifically accurate, is kinda cool by itself.

And of course…

Gilbert, C., & Feschotte, C. (2010). Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses PLoS Biology, 8 (9) DOI: 10.1371/journal.pbio.1000495

McDaniel, L., Young, E., Delaney, J., Ruhnau, F., Ritchie, K., & Paul, J. (2010). High Frequency of Horizontal Gene Transfer in the Oceans Science, 330 (6000), 50-50 DOI: 10.1126/science.1192243


## Attack of the Giant Archaea

Archaea are under-rated. For one, most people don’t really know they exist, and if they do, archaea are thought of as a type of bacteria. This goes not only for the general public but also for some of my non-microbiology colleagues. (I had to correct quite a few “archaeobacteria” utterances.) The discovery that Archaea are a separate domain of life, as far from bacteria as we are, has never properly embedded itself in the way people commonly perceive life. There are some mitigating circumstances of course: anyone can intuitively understand the difference between Salmonella and a tiger; but you need to have a more specialized knowledge base to appreciate the difference between Methanococcus (an archaeon) and Salmonella, and understanding this difference is not an intuitive thing. Even in some textbooks for freshman biology, when addressing the diversity of life, it seems that Archaea are relegated to driving on the shoulder of the bacterial highway. Indeed the text usually goes through the formalities of explaining that archaea are a separate domain, and may even draw a nice three-lobed tree to illustrate the concept. But even after reading that, one is left with the feeling that the authors did not really believe what they were saying. The perceived subtext is that archaea are basically those weird non-bacteria (but they kinda are) that live in weird places like the Dead Sea, hot springs in Yellowstone, cow guts, and the arctic circle’s permafrost.

They are not THAT interesting since they don’t really affect our health, aside from causing farts. Hahaha. And much like the population of a faraway land which appears in the news only in the event of a natural disaster or the birth of a two-headed boy, they tend to be appreciated for contributing a certain spice to the microbial world, rather than being strong contributors and essential elements of the biosphere. It is the lack of perceptible distinctiveness, the perceived exoticness and the lack of impact on human affairs that have marginalized archaea, even among microbiologists. (Hard to justify an R01 for studying archaea.)

Let’s fix these misperceptions. For one thing, archaea are everywhere. Soil, marshland & oceans – it is estimated they make up some 20% of the oceanic microbial plankton.

Another thing: archaea affect our planet, and as a consequence, us, profoundly. One example is the role of methanogens in carbon release to the atmosphere. Methane is a greenhouse gas 20 times as potent as carbon dioxide.

Archaea are different: unlike bacteria, their cell walls do not have peptidoglycan, and their membrane biochemistry is different from anything else on earth.
Archaea also interact with other species: other archaea and bacteria. We have not heard of archaeal parasites or pathogens yet, but there are plenty of commensals and mutualists out there.

One really cool example was recently published in Environmental Microbiology: a giant marine archaeon that lives in the shallow waters of the French West Indies. (Got to hand it to the authors, they know how to pick a good research spot.) Specifically, near the roots of mangroves. There is a whole ecosystem going on down there. The archaea form long filaments, 0.1 mm thick and 30 mm long. The filaments are wrapped around the roots of the mangroves. The individual archaeal cells are huge! 0.1 mm is almost visible to the naked eye, and the filaments themselves are visible. Here’s the kicker though: the archaeal filaments are coated with bacteria, which are much smaller. Under the microscope, the whole structure looks like a plant stalk covered with aphids. The bacteria coating the archaeal filaments also contain sulfur globules, which may indicate some symbiotic relationship between the bacteria and the archaea regarding sulfur. In any case, what we have here seems to be a rather complex and fascinating structure, with giant archaea and small bacteria doing…, er… something together.

A. Cross section of the archaeal filament, showing two archaeal cells separated by the zig-zagging line. The circles around are sulfur-containing bacteria. (F) scanning electron microscope of the giant archaeal filament (Arch) coated with bacteria (bc). From Muller et. al Env Microbiol (2010) 12:8 2371- (c)2010 Society for Applied Microbiology and Blackwell Publishing Ltd.

Note to the authors (on an otherwise excellent paper): please don’t use the phrase “98.4% sequence homology”….

Muller, F., Brissac, T., Le Bris, N., Felbeck, H., & Gros, O. (2010). First description of giant Archaea (Thaumarchaeota) associated with putative bacterial ectosymbionts in a sulfidic marine habitat Environmental Microbiology, 12 (8), 2371-2383 DOI: 10.1111/j.1462-2920.2010.02309.x


## Lake Arrowhead Microbial Genomics Conference

Quick post: at the Lake Arrowhead Microbial Genomics Conference. I’m a bad microblogger, but thankfully Jonathan Eisen and Ruchira Datta are doing a great job of covering this conference live. There is a friendfeed room. The Twitter hashtag is #LAMG10. The science, people, food and location are all great. My student, David Ream, is presenting a poster on our work on operon evolution, and has received a lot of feedback. Tomorrow there will be a session devoted to the COMBREX project in which I am involved. This morning: antibiotic resistance (pretty depressing, given the ubiquity of antibiotic resistant microbes in almost everything we eat), followed by an afternoon session on extremophiles. Yay.


## Changing directions

For some reason, this reminds me a lot of the way some of my research has been going recently….


## The open source spammer: extracting email addresses from an openoffice.org document

I’m organizing a workshop later this month (see here, scroll to session V), and I have just received the attendees list from the main conference’s organizers. Since I need to ~~spam~~ send the attendees informative email on the specific workshop, I needed their email addresses. Here’s what I did.

The file itself is an MS Word doc, which I saved in the native openoffice format (.odt) on my system. Now, an openoffice document is really just a bunch of mostly XML documents zipped together. If you do the following:

`unzip  -l conference-delegates.odt`

You get a listing that looks like this:

```
Archive:  conference-delegates.odt
Length      Date    Time    Name
---------  ---------- -----   ----
39     2010-09-01 18:16   mimetype
71244  2010-09-01 18:16   content.xml
94     2010-09-01 18:16   layout-cache
15522  2010-09-01 18:16   styles.xml
1241   2010-09-01 18:16   meta.xml
24852  2010-09-01 18:16   Thumbnails/thumbnail.png
0      2010-09-01 18:16   Configurations2/accelerator/current.xml
0      2010-09-01 18:16   Configurations2/progressbar/
0      2010-09-01 18:16   Configurations2/floater/
0      2010-09-01 18:16   Configurations2/toolbar/
0      2010-09-01 18:16   Configurations2/images/Bitmaps/
0      2010-09-01 18:16   Configurations2/statusbar/
8961   2010-09-01 18:16   settings.xml
1988   2010-09-01 18:16   META-INF/manifest.xml
---------                     -------
123941                     16 files
```

Wow. Which file contains the delegates’ emails in all that? Actually, content.xml contains the textual content of the openoffice.org document. You can open it with your favorite XML viewer and see how it’s constructed (I like Firefox myself for browsing, and XML Copy Editor for more in-depth diagnosis). But for now, we would like to extract the emails. So we unzip content.xml only:

```
unzip conference-delegates.odt content.xml
```

This unzip command will only extract content.xml from the archive that is the .odt file.

When looking at the content.xml file, we see lines like this:

```
<text:a xlink:type="simple" xlink:href="mailto:noone@usc.edu">

<text:span text:style-name="T2">noone@usc.edu</text:span>
</text:span>
</text:a>
</text:p>
```

Which means that “noone’s” (usernames have been changed to protect the innocent) email appears both as text and as hyperlink. It may or may not be that all the delegates’ emails are hyperlinked, so we may expect some duplications we need to get rid of.

To get the email addresses themselves, we use egrep. egrep uses the extended regular expression syntax in searching for emails. What is a good regex for email addresses? There is a good discussion of that at the regex-guru site. I use the rather simple form:

`egrep -o -i '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' content.xml`

Explanation: the -o qualifier prints only the text matching the regex, and -i means a case-insensitive match. egrep is the extended version of grep, which can handle regexes with things like {m,n} repeats. However, the result of our little exercise would still have duplicate emails, because of the hyperlinking tags. Here is how to get rid of the duplicates:

```
egrep -o -i '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' content.xml | sort | uniq
```

sort sorts the output alphabetically, preparing it for uniq to get rid of duplicates.

One last touch-up: we really don’t need to physically extract the content.xml file. “unzip -c” extracts files to stdout. Therefore, we can get the email addresses without cluttering our disk:

```
unzip -c conference-delegates.odt content.xml | \
egrep -o -i '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' | sort | \
uniq > email-these.txt
```

Voila! email-these.txt now contains the emails of the conference delegates.
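If you do this sort of thing often, the pipeline can be wrapped in a small shell function. The following is a minimal sketch rather than a tested tool: the function name `extract_emails` is made up, it uses `grep -E` (the newer spelling of egrep) and `sort -u` (which does the job of `sort | uniq`), and it keeps the same simple regex, so it will still miss addresses with top-level domains longer than four letters. The `\b` word boundary is an extension supported by GNU grep.

```shell
# Read text on stdin and print each distinct email address once.
extract_emails() {
    grep -E -o -i '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' | sort -u
}

# Hypothetical usage with the document from this post
# (unzip -p writes only the file data to stdout):
#   unzip -p conference-delegates.odt content.xml | extract_emails > email-these.txt

# Small demonstration on a line resembling the hyperlinked XML:
printf '%s\n' '<text:a xlink:href="mailto:noone@usc.edu">noone@usc.edu</text:a>' \
    | extract_emails
```

The demonstration prints the address once, even though it appears twice in the input line.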

One last word: it may have been easier just to save the MS-Word doc file as text using the “File -> Save as…” option in openoffice.org. Suppose we saved the file as conference-delegates.txt. We wouldn’t have to muck about with all the XML, or remove the email address duplicates due to hyperlinking. We could have just done:

```
egrep -o -i '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' \
conference-delegates.txt > email-these.txt
```

But where’s the fun in that?

Happy spamming!

## Predator MX: Jack the Rippler

No, not a new hunter-killer drone, nor is it the n+1 installment in the sci-fi horror series. Rather, Myxococcus xanthus. Again.

Source: US Air Force

M. xanthus is a highly cooperative bacterium, as we have already seen: when starved, most cells “commit suicide” while a few form spores, to survive the lean times. But M. xanthus also cooperates when times are good and food plentiful: M. xanthus cells form swarms, which envelop their prey and increase the concentration of digestive enzymes they secrete into the environment.

M. xanthus cells also ripple together to better ingest the nutrients released after digesting their prey. To do so, they use a type of pilus (motility organ, like a bacterial tentacle) called the type IV pilus. But amazingly enough, in mutant M. xanthus that are unable to make these pili, a new mechanism for cooperative swarming evolved. In 2003, Velicer & Yu from the University of Tuebingen used a group of mutant M. xanthus lacking the type IV pili gene to show that cooperative swarming can evolve using an alternative mechanism. The mutant bacteria do that by forming a physical net of sugars and proteins that connects them, and their rippling motion, together. Watching behavior evolve: how cool is that?

Finally, a movie of M. xanthus swarming & rippling:

Velicer, G., & Yu, Y. (2003). Evolution of novel cooperative swarming in the bacterium Myxococcus xanthus. Nature, 425 (6953), 75-78. DOI: 10.1038/nature01908

Berleman, J., Chumley, T., Cheung, P., & Kirby, J. (2006). Rippling Is a Predatory Behavior in Myxococcus xanthus. Journal of Bacteriology, 188 (16), 5888-5895. DOI: 10.1128/JB.00559-06

## When is it a good idea to cheat?

I have written before about bacterial cooperation, and how cheating works, up to a point, in an environment of bacterial cooperation. That post talked about bacterial quorum sensing, the collective signaling mechanism by which bacteria construct supra-cellular structures called biofilms. Biofilms are tough multicellular enclosures that allow bacteria to survive and thrive in hostile environments, and to invade host species. Both studies covered there showed that freeloading does not pay off. Bacteria that do not chip in to build the biofilm, yet benefit from it, are ultimately doomed, and sometimes doom the collective of which they are constituents. That post dealt with the “here and now” aspect of cooperation and cheating.

Life cycle of M. xanthus. Credit: Carla canales / citizendium.org

This post deals with another aspect of bacterial cooperation: how does it evolve? Why cooperate in the first place at all? Every time an individual cooperates, short-term gains are sacrificed for long-term ones, but those long-term ones are contingent upon all or most cooperating individuals doing their bit. Think about standing in line for the bus. If everyone cooperates, we get on the bus faster, but some of us may be forced to stand. On the other hand, shoving your way to the front of the line will assure you a good seat, albeit at the expense of glares from your fellow passengers, and maybe a few altercations along the way. In evolutionary terms, selfishness seems like a sounder strategy than cooperating. After all, if you manage to gain a better position for yourself in life’s pecking order, you pass the genes that enable that to your progeny, and further down the line. Why cooperate or act selflessly in the first place? Why let someone else share the gene pool with you when you can have it all to yourself?

Unless that “someone else” shares genes with you: that is, unless they are related to you in some way. Suddenly, cooperation seems to have evolutionary benefits: you are preserving and passing on some of the same genes. Protecting kin is the most often-used explanation for how cooperation evolved in the first place: kin selection, meaning favoring cooperation with those individuals with whom you share a larger number of genes over those with whom you share fewer. Evolutionary biologists use Hamilton’s rule as a guideline: the higher the benefit of the cooperation, the lower the cost, and the closer the relatedness of the individuals cooperating, the more likely it is that there will be cooperation. Putting it into an equation, cooperation will evolve if the following condition is met:

`rb - c > 0`

Here r is the relatedness (on a scale of 0 to 1, where 0 means no genetic relation and 1 means self), b is the benefit of cooperation and c is the cost. This rule, formulated by William Donald Hamilton, is a centerpiece of evolutionary biology. Imagine going on a day’s hunt where the quarry is a large animal that can feed one hunter for 35 days, but requires at least five hunters to take it down. There are also smaller animals around that can be hunted by one individual, and they supply enough food for one hunter for two days. Is it beneficial to hunt alone or together? Let’s figure the benefits and costs. For hunting the large animal, the one that requires at least five hunters, the benefit is a week of food each (b = 35/5 = 7) for one day of hunting (c = 1). If the individuals are cousins sharing, for the sake of this example, 20% of their genetic material (first cousins actually share 12.5% on average), then:

0.2×7 – 1 = 0.4 is the benefit score

If they are siblings, sharing 50% of the genetic material, then:

0.5×7 – 1 = 2.5 the benefit score is even higher

But what about individual hunting? The benefit of the smaller quarry is 2 days’ worth of food, for one day of hunting, and you do it alone. So r=1 (yourself), b=2 and c=1, giving us:

1×2-1 = 1

In this hypothetical model, a group of siblings will cooperate to hunt big game, while cousins would probably hunt smaller game alone. If you want to dig deeper into how Hamilton’s rule was derived, and further implications of the rule, I recommend this post.
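To see where the dividing line falls, one can solve the same inequalities for r. This is only a sketch under this post's own scoring convention, in which the group-hunt score rb − c is compared against the solo-hunt score of 1; the labels r_min and r_solo are introduced here purely for illustration.

```latex
\begin{align*}
  % Group hunt is worthwhile at all (Hamilton's rule with b = 7, c = 1):
  rb - c > 0 \;&\Longleftrightarrow\; r > \frac{c}{b} = \frac{1}{7} \approx 0.14 = r_{\min} \\
  % Group hunt scores higher than hunting alone (solo score = 1):
  7r - 1 > 1 \;&\Longleftrightarrow\; r > \frac{2}{7} \approx 0.29 = r_{\text{solo}}
\end{align*}
```

With r = 0.2 the cousins clear the first threshold but not the second, while siblings at r = 0.5 clear both, which matches the conclusion above.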

Any mechanism in evolution is examined through the lens of fitness. Fitness is the relative ability to produce and support viable progeny. So if cooperation increases fitness, we can use the following simple graph to explain the difference between cooperating and non-cooperating individuals in a cooperating population using Hamilton’s rule:

Figure 1: Hamilton's rule prediction: the fitness of cooperators (blue) and non-cooperators (red) increases as the number of cooperators among social neighbors (x-axis) increases. The slope of both lines is the benefit (b).

The benefit, b, is the slope of these two lines. The vertical distance between them is the cost, c. Note that for any given frequency of cooperation in the population, the non-cooperating individuals (red line) have a higher fitness than the cooperating ones (blue line). It seems that it “pays off” to be a self-server no matter the social environment you are in, even though you still benefit from being in a cooperating community. Yeah, we all know the type.

But what happens when the difference between cooperating and not cooperating depends on the percentage of cooperators in the population? Not too hard to imagine: if most of the population is playing nicely together and benefiting from it, then this might change the attitude of the selfish individuals more readily than if only a small fraction of the population is cooperating. But as it stands, Hamilton’s rule does not provide for this type of model. However, the following modification of Hamilton’s rule does:

`r ⋅ b - c + m ⋅ d > 0`

Relatedness, r, is now not a scalar (a single number), but a vector (an ordered set of values) r = {r1, r2, …} describing relatedness under different conditions. Ditto the benefit vector, b. b holds the coefficients of the equation describing the fitness of non-cooperators as a function of how many neighboring cooperators there are in the population (red lines). In a linear setting (Figure 1), r = {r1}, b = {b1} and m⋅d = 0, collapsing the expanded equation into the classical Hamilton’s rule. We won’t get into m and d in this post. They are important, though, and you should read the paper to understand the role they play.
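Written out, the reduction to the classical rule looks like this. It is a sketch that uses only the definitions above; the summation index k is notation introduced here.

```latex
\begin{align*}
  \mathbf{r}\cdot\mathbf{b} - c + \mathbf{m}\cdot\mathbf{d} &> 0,
  \qquad \mathbf{r}\cdot\mathbf{b} = \sum_{k} r_k b_k = r_1 b_1 + r_2 b_2 + \cdots \\
  % Linear case: one term in each vector, and m . d = 0
  r_1 b_1 - c &> 0
\end{align*}
```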

Expanding r and b from scalars to vectors enables a richer and more flexible description of Hamilton’s rule, allowing it to describe non-linear relationships like this:

Figure 2: Note two things. First, the relationship between fitness and the fraction of cooperators in the population is not linear. Second, the difference in fitness between cooperators and non-cooperators decreases as the fraction of cooperators in the population goes up. These two phenomena cannot be described by the classic Hamilton's rule equation. They can be described using the modified rule.

This modification of Hamilton’s rule was developed by Jeff Smith and colleagues, at the Department of Biology at Indiana University. Armed with the new equation, Smith and his colleagues decided to see how well it can be applied. They decided to look at Myxococcus xanthus. M. xanthus bacteria behave normally as long as food is abundant: they move around and proliferate by cell division as bacteria do. But when starved, they aggregate, and some cells form resistant spores, while the others die. Some cheating strains sporulate well when in cooperating populations, but do poorly on their own. The scientists mixed a cooperator strain with a cheating strain at different frequencies, starved them, and measured the fraction of each strain in the population of surviving spores. They found the following: first, the fitness effect was non-linear; in fact, it was almost exponential. Second, cooperators were more fit than cheaters at low cooperator frequencies, but cheaters fared better at high cooperator frequencies. So it pays to freeload when most people around you behave nicely. In the case of M. xanthus, the added value to the population is quite high. In fact, the scientists found that cooperation in M. xanthus is very robust and resistant to cheating: cheaters were viable (i.e. had a positive fitness) only in groups that had more than 70% cooperators. So it is only when cheaters have a large cooperating population to buffer their nasty habits that they can thrive.

Figure 3: Relative fitness of cooperators (blue) and cheaters (red) in populations with different relative frequencies of cooperators. Note that the fitness scale is logarithmic: the fitness increase is very much non-linear, as in Figure 2.

Moral of this story: if you’ve got to cheat, make sure there are a lot of nice people around. Otherwise it won’t work out very well. In evolutionary terms, the trait for cooperation and kin selection has evolved to become strongly entrenched, so much so that cheaters can only survive if cushioned by a high frequency of cooperators. Favoring your own and acting selflessly towards them seems to be the way to go, in the case of M. xanthus.

Smith, J., Van Dyken, J., & Zee, P. (2010). A Generalization of Hamilton’s Rule for the Evolution of Microbial Cooperation. Science, 328 (5986), 1700-1703. DOI: 10.1126/science.1189675
