Phound phage phootage

I am finishing up a great weekend at the HHMI Janelia Farm Research Campus. The occasion is an annual symposium celebrating the Phage Genomics course, which is taught at dozens of universities in the USA.

Here are two of our students, Erich Goebel and Morgan Light, next to the poster they presented at the meeting:

This post is based on notes I took at Graham Hatfull’s talk. The major finding he presented was that certain geographic trends are beginning to emerge. For example, phages in the A4 cluster seem to be found mostly east of the Mississippi River.

To date, students taking this course have sequenced, annotated, and deposited 359 mycobacterial phages in phagesdb.org. This makes this educational, crowdsourced genomics project the single largest contributor to the global growth of phage genome data.

Another interesting finding was the large number of recombination events that go on in phage genomes. It seems that phages have evolved a highly efficient mechanism for generating diversity by recombination. This recombination mechanism is already being used as a tool in synthetic biology.

Finally, the North Carolina State University students produced not only some great science, but also a funny movie:

The Transit and Decline of Venus

From this:

To this:

Repost: Shavuot is a Microbial Holiday

Yesterday was Shavuot, that wonderful holiday which includes midnight studies, water-bombing and dairy products. Mmmmm… cheese. A food product heavily embedded in the science of microbiology. Cheese is the founding product of the biotech industry (along with beer and bread).

Boaz asking Ruth on a date to the cheese and wine festival.

So here’s to the Lactobacilli and Lactococci, which are at the center of the production of dairy products. They break down milk sugar (lactose) into lactic acid, which curdles the milk protein casein. Left to its own devices, this process generally produces rotten milk, since other bacteria may join the fray. Cheesemaking starts the process by adding rennet. Rennet was originally, and still is, produced from the fourth stomach of calves. The active ingredient in rennet is chymosin, which calves use to curdle the milk they drink. But 90% of cheeses produced in the US and the UK are now made using recombinant chymosin, produced in the fungus Aspergillus niger.

Lactobacillus sp. Source: Wikipedia

Cheese also needs to ripen. Propionibacterium freudenreichii ferments lactate in the cheese to produce, among other things, carbon dioxide. This produces the holes we see in Swiss cheese. But it’s a finicky bug, and needs its faithful symbiotic companion Lactobacillus helveticus (“Swiss Lactobacillus”) to provide it with the essential amino acids it needs to grow. P. freudenreichii is named after Eduardo von Freudenreich, a 19th-century microbiologist who, among other things, wrote a seminal book on dairy microbiology.

Some cheeses are somewhat riper than others, as Asterix, Obelix and Dogmatix discover.

By the way, Propionibacterium acnes, a relative of Propionibacterium freudenreichii, is a bacterium found on our skin that at times causes… yes, acne. But don’t think about acne when you eat Swiss cheese. (There, I’ve probably put some people off Swiss cheese for quite a while.)

"Acne" comes from "acme" which means point, edge or peak. Here is an acme, but not acne.

Finally, mold. Penicillium roqueforti provides those beautiful blue streaks in the sheep-milk cheese Roquefort… and the taste. Only cheeses aged in the natural Combalou caves of Roquefort-sur-Soulzon may bear the name Roquefort, which is why it is so damn expensive.

And I haven’t even started on Camembert (the real one is made from unpasteurized milk), which is ripened by Penicillium candidum and Penicillium camemberti. Or kefir, made from the eternal kefir grains, a matrix of bacteria and polysaccharides that can be reused forever… but I’m too hungry for cheesecake to continue this.

Job opening: Scientific Curator at the Jackson Laboratory

Scientific Curator – Bioinformatics

Interested individuals should apply online at www.jax.org/careers, referring to job posting #3256. Contact Jeannine Ross at ext. 6045 with questions.

The incumbent in this position plays a critical role in data annotation and curation for the Gene Ontology (GO) and Protein Ontology (PRO) programs at The Jackson Laboratory in Bar Harbor, Maine, through diverse activities to gather, analyze, evaluate and integrate information and analysis results using biomedical ontologies. Activities include, but are not limited to, obtaining data via literature or electronic-based means, determining data object identity/uniqueness, judging information or analyses for appropriateness of incorporation into GO and PRO resources, and evaluating and applying biomedical ontologies. This individual must keep abreast of new scientific developments that are relevant to functional genomics, should attend group meetings and seminars, and should present posters and platform talks at conferences. Team participation in project development and software testing is expected, as well as collaborations with outside research groups and international bioinformatics communities. Assisting with training new curation staff, authoring project proposals, and writing and maintaining curation documentation are some of the additional roles that scientific curators may play.

Required:

·       advanced knowledge in mouse as an experimental organism

·       expert knowledge in specific data areas of biochemistry as well as functional and comparative genomics

·       broad understanding of database principles, biomedical ontologies, and skills with computational analysis techniques and data interpretation

·       exceptional communication and organizational skills

Experience/Education:

·       requires a Doctoral degree in the Life Sciences, and

·       a minimum of 1 – 3 years of experience

Credit: Mr.Thomas, Flickr

Crowdsourcing Genomics II: Unveiling HINdeR and Phrux

About this time last year, I posted about a new course I was going to teach, Phage Genomics. Briefly:

Phage isolation, electron microscopy, DNA sequencing in the first semester, annotation and comparative genomics in the second. And I get to teach the bioinformatics bit: annotation and comparative genomics. Woo-hoo! The great thing about this course is that, unlike most lab courses, the students (and faculty) will be setting up experiments intended not only to teach, but also to discover something new. Also, the results of the research are meaningful: genomics data generated by student participants will be used by other researchers to answer medical, ecological, and evolutionary scientific questions.

The students isolated, sequenced and annotated two previously unknown mycobacteriophages, HINdeR and Phrux. The links are to the Mycobacteriophage Database phagesdb.org, where the sequences and associated metadata (where and when HINdeR and Phrux were found and isolated) can be found. The annotations will be there shortly.

I had a great time teaching this course together with Mitch Balish from my department, who is not only a great teacher, but also shares my vice of keeping the students guessing about when we are being serious and when we are kidding. Mitch is the guy with the goatee in the short-sleeved shirt; I’m the one in the black sweatshirt. Here’s what the students had to say about the course (original site at Miami University). Mitch starts talking at 2:57, I’m at 4:08, and Gary Janssen (who taught the first semester) is at 5:08:

Repost: the Scope(s) of Substance

This tweet from Neil deGrasse Tyson jolted me from a pleasant rest before tomorrow’s race:

…which led to the (in)famous Scopes Trial. On May 5, 1925, John Scopes was charged and subsequently tried, found guilty, and fined $100 for teaching evolution, a violation of Tennessee’s Butler Act. The trial became a battleground for science vs. religion, evolution vs. creationism, and the interpretation of the Establishment Clause and Freedom of Speech in the US Constitution.

I published a blog post two years ago, on the 85th anniversary of the trial, in July 2010. Today marks the 87th anniversary of the arrest, so it seems like a good occasion to repost, especially since there is still some work needed in the area of teaching evolution:

Source Wikimedia Commons. Credit: John D. Croft. Based on: New Scientist Magazine 2006 191:2565 p11

What follows is the original post, “The Scope(s) of Substance”, from July 29, 2010. Still relevant, I believe:

Continue reading Repost: the Scope(s) of Substance →

The Inside Poop

It’s pretty much common knowledge that mother’s milk is the healthiest food for infants, and that it bestows health benefits upon mother and baby that formula feeding cannot match. The unique combination of lipids, sugars, proteins and antibodies is not even close to being rivaled by baby formula manufacturers. With few exceptions, such as when there is a concern that the mother is contagious and may infect the baby, breastmilk is the recommended diet for infants.

As I am interested in things microbiological, I have been especially interested in the effect of breastmilk on the baby gut and gut microbiota. There have actually been quite a few studies on that, but most of them looked at the gut microbiota only. However, we can’t really separate our gut from the microbes that reside in it. The bacteria in the human gut affect the gut (and, in turn, the entire body) and are affected by it. The gut is really a superorgan, composed of a minority of human cells and 10¹⁴ bacterial cells. Most of the cells in the gut are actually bacterial, not human, but the part that is human is important, since, well, it’s “us”. (Well, it’s kinda hard to tell now which “us” is “us” and which “us” is “the bacteria that live in us”.) To understand what goes on there, we need to study both bacterial and human cells. While adult microbiota+gut systems have been studied, mostly for the effect of probiotics, there have been no such studies of baby guts, because you cannot perform invasive procedures on babies for research. In other words, you cannot scrape their colons for gut lining, or epithelial, cells. So there has not been much of an opportunity to study the gut epithelium+microbiome in human infants.

The opportunity came with Robert (“Robb”) Chapkin from Texas A&M University, and Sharon Donovan from the University of Illinois at Urbana-Champaign. Robb has developed a system to isolate gut epithelial cells from feces. We shed millions of cells from our gut when we defecate, and Robb’s lab has a way to fish those gut lining cells out of the stool. Thus, we can sequence their mRNA and find out which genes are transcribed in the baby gut. At the same time, we can analyze the baby’s microbiome. Enter Sharon Donovan’s lab, which studied 12 babies: six breastfed and six formula-fed.

This is where Robb contacted me, and generously invited me to College Station, Texas, about a year and a half ago. Aside from enjoying Texan hospitality (big steaks) and meeting people, Robb brought me into this fascinating study. They needed a bioinformatician to help analyze the gut transcriptome and gut metagenome data. I am very glad they contacted me, since this started a very enjoyable collaboration and a scientific journey whose results are published this week in Genome Biology. I was put in touch with two great statisticians, Ivan Ivanov and Scott Schwartz, also at Texas A&M. We put our heads together and came up with a strategy.

Analysis flowchart. Reproduced from Genome Biology 2012, 13:R32 doi:10.1186/gb-2012-13-4-r32 under BMC CC2.0 license.

Continue reading The Inside Poop →

It’s a smORF world, after all?

Here is a study that looked for a type of gene that the authors felt was neglected by classic genome annotation. The research shows how to employ concepts from molecular evolution to validate the existence of these genes.

Some background: the first question we ask after assembling a genome is “where are the genes?” Not an easy question to answer, since a gene is classically defined as a unit of heredity. It may code for RNA, protein, or sometimes nothing at all. The “unit of heredity” can take several different physical forms. Therefore, the algorithms for finding genes depend on exactly which type of gene one is looking for.

A somewhat more tractable question is “where are the open reading frames?” Open reading frames, or ORFs, are those stretches of DNA that code for proteins. Indeed, most gene-calling software actually identifies ORFs. Many attributes go into an ORF-calling algorithm: the frequency of bases (or k-mers of bases) in the suspected coding regions, the signals for the beginnings and ends of introns, the existence of non-coding regions that aid transcription such as promoters and enhancers, the location on the chromosome relative to other ORFs, and the length of the final product. The latter criterion is actually quite important, as many ORF-calling algorithms discount anything coding for a protein shorter than 100 amino acids as being “too short”. The reason for this length cutoff is that the number of false positives increases dramatically when ORFs coding for proteins shorter than 100 aa (or 300 nucleotides) are called. Therefore, most gene callers simply discard ORFs that encode short peptides.
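To make the length cutoff concrete, here is a minimal Python sketch of a naive ORF scan: it reads the three forward frames, opens an ORF at the first ATG, closes it at the next in-frame stop codon, and keeps only ORFs inside a length window. This is an illustration only, not how any real gene caller works: it ignores the reverse strand, introns, base-composition statistics and promoters, and the function name `find_orfs` and its parameters are my own.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_nt: int = 300,
              max_nt: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based with `end` exclusive; the length includes the
    stop codon. Within a frame, the first ATG after a stop opens an ORF and
    later in-frame ATGs are ignored, so each stop yields at most one (the
    longest) ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length >= min_nt and (max_nt is None or length <= max_nt):
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# A 36-nt toy ORF: too short for a 300-nt cutoff, but inside a 30-300 bp window.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_orfs(toy))                         # []
print(find_orfs(toy, min_nt=30, max_nt=300))  # [(2, 38)]
```

With the default 300-nt floor the toy ORF is thrown away, just as a typical gene caller would throw it away; opening the window down to 30 nt, the smORF range used in the study discussed below, brings it back, along with many more chance ORFs in a real genome.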

But throwing out the baby with the bathwater is not a good solution, since short peptides are known to be responsible for many of life’s activities: mating pheromones, small-compound transporters, hormones, neurotransmitters and regulators of other proteins’ activities, to name a few. Many of these short peptides are the result of the cleavage of larger proteins, which means that the ORFs encoding them are longer than 300 bp. But some may actually have their own ORFs, coding only for them. How can we find these small ORFs, or smORFs? How many of them are there? Is the number of smORFs large enough to make it worth re-annotating genomes?

Gene structure. Source: Wikimedia Commons. Credit: Forluvoft

Emmanuel Ladoukakis from the University of Crete and colleagues from the University of Sussex, UK, have set up a bioinformatic pipeline to look for smORFs in the Drosophila melanogaster genome. Bear with me, there are a few steps in this pipeline. But there’s a lot to learn about genomics just from looking at what they did, and why they took those steps.

Here’s what they did:

1) Find smORF candidates: they looked for all potential smORFs (starting with a start codon and ending with an in-frame stop codon, 30-300 bp long) in those parts of the D. melanogaster genome that were annotated as non-coding. To keep things simple, they looked only for intron-less smORFs: smORFs that are encoded contiguously in the DNA. They found 593,586 potential sequences.

2) Remove transposons: they then removed all candidates that were similar to transposons. Transposons are DNA elements that multiply within the chromosome: something like an internal virus, only usually benign. They may carry bits of other genes they “grab” on the way, but those bits are not functional. That left 556,554 sequences.

3) Big step: look for homologs in another fly species. They then looked for smORFs with similar translated amino-acid sequences in D. pseudoobscura, which diverged from D. melanogaster 25 to 55 million years ago. They compared amino-acid sequences because if there is selection to conserve a smORF, it would act at the protein level, not the DNA level. This step reduced the number of smORF candidates by about 92%, from 556,554 down to 43,210.

4) Another big step: keep only global alignments. Looking at alignments of whole smORF sequences, rather than partial local similarities, left 4,561 smORF candidates, a reduction of about 89% from step 3. We are now down to 0.8% of the original 593,586 smORF candidates.

Quite a filtering process. Note the huge elimination: 99.2% of all the initial smORF candidates are gone. I believe they decided to sacrifice sensitivity in favor of specificity.
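The size of that funnel is easy to check. This short Python sketch recomputes, from the candidate counts quoted above for steps 1 to 4, the percentage kept at each step and the percentage of the original candidates still standing. The numbers come from this post, not from re-running the authors’ pipeline.

```python
from typing import List, Tuple

# Candidate counts quoted in this post for steps 1-4 of the pipeline.
FUNNEL: List[Tuple[str, int]] = [
    ("1) intron-less ORFs, 30-300 bp", 593_586),
    ("2) transposon-like removed", 556_554),
    ("3) homolog in D. pseudoobscura", 43_210),
    ("4) global alignment", 4_561),
]


def funnel_report(funnel: List[Tuple[str, int]]) -> List[Tuple[str, int, float, float]]:
    """Return (step, count, % kept vs. previous step, % kept vs. step 1)."""
    original = funnel[0][1]
    previous = original
    rows = []
    for step, count in funnel:
        rows.append((step, count, 100 * count / previous, 100 * count / original))
        previous = count
    return rows


for step, count, kept_prev, kept_orig in funnel_report(FUNNEL):
    print(f"{step:32s} {count:>8,d} {kept_prev:6.1f}% {kept_orig:6.2f}%")
```

The last row shows that only about 0.77% of the original candidates survive step 4, which is where the 99.2% elimination comes from.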

So they had 4,561 smORF candidates conserved between two flies. Still, how many ORFs got in by chance? Hard to know, but they continued to rely on evolutionary conservation as a guideline. There may be smORFs that appeared independently in melanogaster and pseudoobscura after they separated 55 million years ago, but the main evidence for true smORFs would be their evolutionary conservation between the two fly species.

To get even more specific, they now 5) looked for shared synteny: conservation not only of the sequence, but also of its genomic context, that is, the sequences surrounding it. That brought the number down to 3,314.
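What does “shared synteny” mean operationally? One simple way to express it is sketched here in Python, under my own assumptions and not necessarily the authors’ criterion: represent each chromosome region as an ordered list of feature IDs, and call a smORF pair syntenic if some of the genes flanking the D. melanogaster smORF have orthologs flanking its D. pseudoobscura homolog. The gene IDs, the ortholog table, the window size `k` and the `min_shared` threshold are all hypothetical.

```python
from typing import Dict, List


def neighbors(order: List[str], feature: str, k: int = 2) -> List[str]:
    """Features within k positions of `feature` in an ordered list of features."""
    i = order.index(feature)
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def shares_synteny(dm_order: List[str], dp_order: List[str],
                   dm_smorf: str, dp_smorf: str, orthologs: Dict[str, str],
                   k: int = 2, min_shared: int = 1) -> bool:
    """True if at least `min_shared` neighbors of the D. melanogaster smORF
    have an ortholog among the neighbors of its D. pseudoobscura homolog."""
    dp_near = set(neighbors(dp_order, dp_smorf, k))
    shared = [g for g in neighbors(dm_order, dm_smorf, k)
              if orthologs.get(g) in dp_near]
    return len(shared) >= min_shared


# Hypothetical gene orders and ortholog table.
dm = ["geneA", "geneB", "smorf1", "geneC", "geneD"]
dp = ["x1", "geneB_p", "smorfP", "geneC_p", "x2"]
dp_shuffled = ["geneB_p", "geneC_p", "y1", "y2", "smorfP", "y3"]
orthologs = {"geneB": "geneB_p", "geneC": "geneC_p"}

print(shares_synteny(dm, dp, "smorf1", "smorfP", orthologs))           # True
print(shares_synteny(dm, dp_shuffled, "smorf1", "smorfP", orthologs))  # False
```

Real synteny analyses also take strand, distances and longer-range gene order into account; this only checks whether the immediate neighborhoods share orthologs.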

OK, so they looked for conservation based on homology and based on synteny. Anything more? Well, yes. The next step was to 6) look for evolutionarily selected smORFs. The two evolutionary criteria they used until now were homology and synteny. Now comes a third: selection. If smORF candidates are actually coding, they will be subject to purifying selection, that is, selection that eliminates deleterious mutations. This shows up as a low rate of non-synonymous relative to synonymous substitutions, or a Ka/Ks ratio well below 1. (You can also read about Ka/Ks ratios here.) Finally, 7) by looking at what actually gets transcribed in Drosophila (the transcriptome), they whittled the number down to a final 401.
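For readers who want to see how a Ka/Ks ratio is actually computed, here is a compact Python sketch of one classic estimator, the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. I am not claiming this is the estimator used in the paper; it only illustrates the idea: count synonymous and nonsynonymous sites, count synonymous and nonsynonymous differences between two aligned coding sequences, correct both proportions for multiple hits, and take the ratio. It assumes the standard genetic code and gap-free, in-frame alignments, and it treats mutations to stop codons as nonsynonymous when counting sites.

```python
import math
from itertools import permutations
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def site_counts(codon: str) -> Tuple[float, float]:
    """(synonymous, nonsynonymous) sites in one codon. A change to a stop
    codon counts as nonsynonymous here, which is a simplification."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """(synonymous, nonsynonymous) differences between two codons, averaged
    over every order in which the differing positions could have changed;
    pathways passing through a stop codon are skipped."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    syn_total, non_total, n_paths = 0.0, 0.0, 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:  # every pathway hits a stop: call all changes nonsynonymous
        return 0.0, float(len(positions))
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> Tuple[float, float, float]:
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need two aligned sequences made of whole codons")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue  # skip stop codons, gaps and ambiguous bases
        s1, n1 = site_counts(c1)
        s2, n2 = site_counts(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or nonsynonymous sites")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else float("nan"))


ka, ks, ratio = ka_ks("GCTGCTGCTGCT", "ACTGCCGCTGCT")
print(round(ka, 3), round(ks, 3), round(ratio, 2))  # 0.137 0.304 0.45
```

In the toy example, one of the two differences changes an amino acid (Ala to Thr), yet Ka/Ks comes out around 0.45 because the sequences have twice as many nonsynonymous as synonymous sites. Real smORFs are short, so their Ka/Ks estimates are noisy, and a sketch like this says nothing about statistical significance.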

Search pipeline for Drosophila smORFs. Diagram of the smORF search pipeline followed in this study. The percentages of smORFs passing each filter are indicated. For full details, see Results and Materials and methods. CDS, coding DNA sequence; Dm, Drosophila melanogaster; Dp, Drosophila pseudoobscura; Ka/Ks, ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitution. Ladoukakis et al. Genome Biology 2011 12:R118 doi:10.1186/gb-2011-12-11-r118

So the chosen 401 smORFs are evolutionarily conserved, both in sequence and in synteny, are subject to purifying selection (by Ka/Ks ratio), and produce a transcript. The authors obviously went for specificity over sensitivity: they looked for “good bet” smORFs rather than a large number of candidates. What I like about this study is the way the authors used a large number of evolutionary traits as attributes for identifying smORFs. They were also careful to rule out, as much as possible, the possibility that these smORFs are part of a larger transcript. This is really nice molecular evolution work. There is no experimental evidence yet of the functionality of these smORFs: that is left to future proteomics researchers and fly geneticists. But the idea of a small(er) world of genes, hiding in plain sight among the more familiar large ones, does have its appeal, and may yield some surprises about how our genomes are structured.

Finally, for the evolutionary biologists: read the paper; there is quite a lot more to it than what I wrote. I just gave the highlights.

Ladoukakis, E., Pereira, V., Magny, E., Eyre-Walker, A., & Couso, J. (2011). Hundreds of putatively functional small open reading frames in Drosophila. Genome Biology, 12(11), R118. DOI: 10.1186/gb-2011-12-11-r118

http://genomebiology.com/2011/12/11/R118/abstract

And I should go because?

Found this in my inbox:

Dear Dr.Iddo Friedberg,    

Greeting from OMICS Group!

I came across your contribution entitled “Biopython: freely available Python tools for computational molecular biology and bioinformatics” published in the Journal of Bioinformatics and thought your expertise would be an excellent fit for Toxicology-2012 Conference that OMICS Group is hosting.

I’m just wondering how many legitimate calls for participation I am missing due to the increasing amounts of conference spam in my inbox.

Biocuration 2012

Great meeting: Biocuration 2012, Georgetown University, DC. When I leave a meeting with my head exploding with new ideas and a need to try them all out at once, I know I got my money’s worth, and then some. Even a three-hour flight delay, followed by discovering my car with a dead battery at 1am in the deserted Dayton Airport parking lot, did not dampen my enthusiasm upon return. I will make sure my dome light is off before I leave my car next time, though. What follows are bits and pieces from the meeting that I enjoyed. I’m doing this mostly from memory, two days later, so I may add an addendum once I get my notes together.

What is biocuration? Well, anything that has to do with annotating, labeling, indexing and identifying biological entities; at this conference, almost exclusively genes. Genome databases, especially those of model organisms, employ curators to annotate, check and re-annotate the genomic data. Here’s a more elaborate explanation, taken from the website of the International Society for Biocuration:

Biocuration involves the translation and integration of information relevant to biology into a database or resource that enables integration of the scientific literature as well as large data sets. Accurate and comprehensive representation of biological knowledge, as well as easy access to this data for working scientists and a basis for computational analysis, are primary goals of biocuration.

The goals of biocuration are achieved thanks to the convergent endeavors of biocurators, software developers and researchers in bioinformatics. Biocurators provide essential resources to the biological community such that databases have become an integral part of the tools researchers use on a daily basis for their work.

Day 1 started off with many community annotation tools. I thought that the Wikipedia model for annotation was dead, but maybe I’m wrong. Many community efforts rely on a large number of experts, as opposed to a huge number of non-experts, and that is what the speakers at the first session were discussing. PomBase (whose title drew some chuckles from the French speakers at my table), the Tetrahymena Genome Database Wiki and the Gene Wiki were presented. The Gene Wiki, presented by Andrew Su from TSRI, is a bona fide crowdsourcing approach: not just Wikipedia-like, but actually a set of 10,000 gene definition stubs folded into Wikipedia. Jennifer Harrow from Sanger presented a poster with a tiered access model for annotators: the “blessed annotator”, who has been trained for 3 months and has the run of the wiki, and the “gatekeeper”, who has been trained in a 2-day workshop and whose contributions need to be monitored. There were lots of talks about trusted annotators, etc. Perhaps we should look to cryptography’s “circles of trust” to enable trusted annotations while increasing the number of curators. (I use “curation” and “annotation” interchangeably throughout.)

An afternoon workshop discussed who biocurators are. If you are a biocurator, there’s a good probability you are 31-50 years young (80%), female (60%), with a PhD (76%), and have been through the academic mill and found it to be a bad fit for one reason or another. You like your work, you rarely burn out, you find it challenging and stimulating, and you are not in it for the money. (Few people in non-industry science are.) Actually, since non-profit science runs on soft money, funding is a serious concern, and your job may have a shorter half-life than you would like, as you are probably employed on a 3-5 year contract. Your boss is rarely a biocurator her/himself, which means your job description may sometimes be ill-defined.

After that, there was a whole session devoted to curation workflows and tools. If you are setting up your own genomic database, check these out: WebApollo, CvManGO and Reactome. Attila Csordas from EBI presented PRIDE, a tool for curating proteomic data. While proteomic data are growing, there are few software tools to annotate them, so PRIDE is a welcome player in the field.

Day 2 had a “Genomics, metagenomics, comparative genomics” session, only without the metagenomics. 🙁 What I really liked was the ViralZone resource for viral genomes, from SIB. High time someone did this for the most abundant biological particle on Earth, and the one responsible for most of the diversity in life.

The breakout sessions were my favorite, with a chance to interact with like-minded people interested in similar questions. (That is, those who share my prejudices.) I went to the one organized by Marc Robinson-Rechavi and Frederic Bastian, which dealt with the question of quality in gene annotation. Here is the problem: when we annotate a gene with a function (or functions), we also need to state the evidence that led us to think that this gene does what it does. The most popular vocabulary for annotating genes is the Gene Ontology, or GO. GO provides evidence codes, which allow the curator to state the evidence for the function they assign to a gene. These range from experimental evidence codes such as “Inferred from Mutant Phenotype”, which are always entered by a human curator, to “Inferred from Electronic Annotation”, which involves no human oversight. Evidence codes are used as a proxy for quality: people generally accept that experimental evidence is stronger support for a gene’s function than an electronic inference. That is not necessarily true. Consider high-throughput experiments, in which many genes are annotated wholesale. Even with an (uncharacteristically low) 5% error rate, a single paper used as the source for annotating 5,000 genes would result in 250 wrongly annotated genes. In addition, these types of experiments supply annotations that are not very specific, such as “protein binding” or “embryonic development”, terms that in many cases are too general to be useful. On the other hand, Nives Škunca of ETH Zurich presented a beautiful study showing that fully automated annotations may not be as inferior to human-curated ones as most people think, with some caveats. (Note: Nives also showed her work in a poster that won the best poster award at the meeting, and this work has just been accepted to PLoS Computational Biology. I will try to blog more about it once it’s published; it’s really brilliant.) The discussion revolved around how we should ascertain the quality of annotations, what would be considered a useful annotation, and how we can establish trustworthiness. It seems there is quite a bit of work to be done, as people are only beginning to realize that this is a more complex problem than we thought. A major player in this will be the Evidence Ontology, or ECO, an elaborate ontology in the making that describes lines of evidence for gene annotation.
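To illustrate why an evidence code alone is a crude quality proxy, here is a small Python sketch that labels GO annotations by evidence type and additionally flags experimental annotations coming from a paper that contributed many of them, one possible stand-in for high-throughput work. The experimental set below is GO’s classic list of experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); any code other than those and IEA falls into “other”. The `Annotation` record, the gene and paper IDs, and the 100-annotations-per-paper threshold are hypothetical choices of mine, not part of GO or ECO.

```python
from collections import Counter
from typing import List, NamedTuple

# Classic GO experimental evidence codes; IEA = Inferred from Electronic Annotation.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


class Annotation(NamedTuple):
    gene: str
    go_term: str
    evidence: str   # GO evidence code, e.g. "IMP" or "IEA"
    reference: str  # source paper (placeholder IDs in the example below)


def triage(annotations: List[Annotation], max_per_paper: int = 100) -> List[str]:
    """Label each annotation by evidence type, flagging experimental
    annotations whose source paper contributed more than `max_per_paper`
    experimental annotations (a crude proxy for high-throughput work)."""
    per_paper = Counter(a.reference for a in annotations
                        if a.evidence in EXPERIMENTAL)
    labels = []
    for a in annotations:
        if a.evidence == "IEA":
            labels.append("electronic")
        elif a.evidence in EXPERIMENTAL:
            bulk = per_paper[a.reference] > max_per_paper
            labels.append("experimental (bulk source)" if bulk else "experimental")
        else:
            labels.append("other")
    return labels


# Hypothetical genes and papers; GO:0005515 is "protein binding".
anns = [Annotation("geneA", "GO:0005515", "IEA", "paper-0"),
        Annotation("geneB", "GO:0005515", "IMP", "paper-1")]
anns += [Annotation(f"gene{i}", "GO:0005515", "IDA", "paper-2") for i in range(3)]
print(triage(anns, max_per_paper=2))
```

The particular threshold matters less than the point it makes: the same experimental code can cover a careful single-gene study and a bulk screen, and the code by itself does not tell you which.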

Day 3: Attila Csordas, whom I mentioned earlier, organized an unconference session early in the morning. A few of us gave brief talks there. Ben Good from Andrew Su’s lab talked about harnessing games for biocuration. The idea is to do for biocuration what fold.it has done for protein folding. The Dizeez game quizzes you about diseases related to genes, and scores you according to how well you link genes to diseases. But as Andrew says on his blog:

 Generally, the gene-disease links in structured databases will be reasonably correct (though likely not at all complete). When we analyze the game logs in aggregate, we expect that players’ answers will generally reinforce what’s already known. But given enough game player data, also expect that we’ll see multiple instances of gene-disease links that aren’t reflected in current annotation databases. And these are candidate novel annotations.

So there may be something there, although it is not the “wisdom of the crowds” that is being exploited, since I imagine that only people with advanced degrees in their field can contribute to Dizeez. You can see games from the Su lab on genegames.org. Sean Mooney from the Buck Institute talked about the Statistical Tracking of Ontological Phrases (STOP) project. The idea here is to automatically enrich the GO annotations of genes with other ontologies, to get a more comprehensive description of their function, especially when it comes to disease. I talked about the Critical Assessment of Function Annotations (we finally submitted the paper, yay!). Attila talked about annotating proteomic data.

Great meeting. A big thank you to the organizers; it went without a hitch. Logistics, food and coffee were all fantastic. Looking forward to Cambridge next year! EDIT: a virtual special issue of Database has been published for this meeting. Some of the talks are there as papers. Open Access, of course.

Finally, my favorite promotional item from the meeting:

You. Want. This. Job.

NSF grant funded, woohoo! Now I am hiring a programmer. So if you want to be part of a dynamic, growing lab, do lots of interesting stuff and upgrade yourself from just a great bioinformatician to a super-bioinformatician, this job’s for you. You’ll be working primarily on microbial genome evolution, including setting up a kick-butt multi-genome database, and all sorts of interesting distractions. See below for the nitty-gritty. Original ad here: https://www.miamiujobs.com, job posting number: 0001377. Pass on to interested parties. Three-year position, renewable annually.

Microbiology: Scientific Programmer/Specialist to implement and maintain a genomic database web site; implement data management tools including relational database management applications for efficient storage and retrieval of genomic data; perform other duties as related to the position such as data and project management to ensure data are being processed in an efficient and timely manner; contribute to writing scientific manuscripts.

Required qualifications: BS or BA in Computer Science, bioinformatics, or a related discipline; demonstrated programming experience, particularly in Python and SQL databases; demonstrated web programming experience; knowledge of Linux/Unix; excellent spoken and written communication and documentation skills.

Preferred qualifications: Advanced degree (M.Sc. or Ph.D.) or equivalent in Computer Science, Bioinformatics, Molecular Biology or a related discipline; experience in development of bioinformatic algorithms; knowledge of R programming; experience in development of or contribution to open source projects; experience in collaborative software development such as the use of version control software, writing and following software specifications, participation in code review; knowledge of basic molecular biology; experience with genome browser programming, such as GMOD or equivalent.

Candidates should send a CV or resume and have three letters of reference sent separately to Dr. Iddo Friedberg at Friedberg.lab.jobs ‘at’ gmail ‘dot’ com. Screening of applications begins April 14, 2012 and will continue until the position is filled.

Miami University is an affirmative action/equal opportunity employer with smoke-free campuses. Consumer Information http://www.miami.muohio.edu/about-miami/publications-and-policies/student-consumer-info/. Hard copy upon request.

Ad in PDF.

Dirty Genomics

Repost: a very loose and circular association to Pi Day

(Originally published March 14, 2009)

Happy Pi (π) Day! Americans write dates in the MM/DD/YYYY format instead of the DD/MM/YYYY format used by most of the rest of the world. It is a rather painful and confusing format if you did not grow up with it, causing checks to bounce and leases to expire for those who recently moved to the US, but it has its benefits: write March 14 numerically, as 3/14, and you have the first three digits of Pi. That coincidence is reason enough to celebrate a day around the uber-celebrity of numbers. (Heh, I said “around”.) Everybody’s welcome.

This is the day all geeky bloggers come out and try to: (1) show how smart they are; (2) connect Pi, usually in some improbable and tenuous fashion, to whatever theme they have in their blogs; and (3) make an original observation about Pi that no one else has made before. So that is exactly what I am going to do today.

Sort of.

Well, probably not.

Smarts

Well, I remembered Pi Day, didn’t I? OK, that does not show I’m smart; it just shows my brain is a repository of useless trivia. Look at the time of publication of this post: March 14, 1:59 am, which reads as 3.14159. Hey, Pi to five decimal places in a timestamp, that’s smart! (Not very original, though. Also, I’m actually up at this time finishing a grant proposal.)
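For the literal-minded, here is a tiny Python sketch of the timestamp trick, assuming the post’s original publication time (March 14, 2009, 1:59 am). The helper pi_timestamp is hypothetical, written only for this illustration: it glues the month, day, hour, and minute together in US order and checks the result against the digits of math.pi.

```python
import math
from datetime import datetime


def pi_timestamp(moment: datetime) -> str:
    """Hypothetical helper: render a datetime US-style (month first) as a
    decimal-looking string. The month becomes the integer part; day, hour and
    zero-padded minute are concatenated as the 'decimals'."""
    return f"{moment.month}.{moment.day}{moment.hour}{moment.minute:02d}"


# Publication time of the original post: March 14, 2009, 1:59 am.
posted = datetime(2009, 3, 14, 1, 59)
stamp = pi_timestamp(posted)
print(stamp)                           # 3.14159
print(str(math.pi).startswith(stamp))  # True

# Day-first order, as most of the world writes it, breaks the coincidence.
print(f"{posted.day}.{posted.month}")  # 14.3
```

Swap the month and day and the coincidence disappears, which is the whole (small) point.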

Human rhinovirus capsid (PDB ID 1AYM). Not a perfect sphere, but a close connection to the blog theme.
A post with a less than tenuous connection to Pi

Some virus capsids are icosahedral: not really spherical, but sort of. Bacterial flagellar motors are circular. Micelles are usually spherical, and so are microvesicles. All of these are a good start for Pi topics.

Well, too bad. I actually want to write about circular proteins. Only “circular” in this case does not mean “circle-shaped”, so we are chucking Pi out the window right now. Stick around though, these proteins are really cool.

Formation of a peptide bond

You were probably taught that proteins are linear chains of amino acids that fold into a shape that produces their function. The links connecting the amino acids in the chain are peptide bonds. But there is no real reason why the carboxy terminus (right side) and the amino terminus (left side) should not bond to each other. You might think this has never been observed, or even looked for. Well, the termini do bond, and some proteins are circular, like a snake biting its own tail.

Structure and sequence of the cyclotide kalata B1

Cyclotides are very robust. For one, they are almost immune to proteases, the enzymes that break up proteins. Many proteases attack the ends of the protein chain (exoproteases, so called because they start from the “outside”), but a circular protein has no ends to attack. Their disulfide bonds and short length also make them resistant to endoproteases, as well as to heat, extreme pH, and so on.

What do cyclotides do?

They protect the organism that produces them. All kingdoms of life produce cyclotides, everything from bacteria to rhesus monkeys. (Actually, I am not sure about Archaea.) Cyclotides seem to act through different mechanisms: some form holes in the membrane of the attacking microbe, while plant cyclotides stunt the growth of feeding caterpillars. Interestingly, the same plant peptide, kalata B1, induces uterine contractions in mammals. This is how it was discovered: a physician working in the Democratic Republic of Congo noticed that women in labor were drinking tea made from Oldenlandia affinis to induce childbirth. The active ingredient, kalata B1, was the first cyclotide to be discovered. Since then, cyclotides have been shown to be antibiotic, antiviral, and insecticidal.

Do humans produce cyclotides?

I could not find anything about that in the literature. So I took the amino acid sequence of a recently discovered monkey cyclotide, rhesus theta defensin 1 (RTD1), and BLASTed it against the human genome (TBLASTN: a protein query vs. a nucleotide database translated in all six reading frames). No results. Of course, this five-minute trial proves very little: TBLASTN searches with short queries (RTD1 is only 18 amino acids long) are a bit sticky, since short alignments rarely reach significant scores. If you are a beginning bioinformatics student looking for a course or rotation project, finding candidate cyclotides in humans (or in other genomes) might be a good idea. There are about 100 known sequences, quite a lot for a training set to start from. You can build a profile or an HMM and run some more sensitive searches.
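As a possible starting point for that project, here is a minimal Python sketch of the “build a profile” idea, under loudly stated assumptions: a plain position-specific scoring matrix (PSSM), not an HMM, estimated from gap-free, equal-length aligned peptides, with one pseudocount per residue and a uniform 1/20 background, then slid along a protein query to find the best-scoring window. The function names and the toy peptides are hypothetical placeholders, not real cyclotide data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from gap-free, equal-length aligned peptides.

    Assumes a uniform background frequency (1/20) for every amino acid.
    Returns one dict per alignment column mapping residue -> log2-odds score.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    # Each column's counts plus pseudocounts sum to this total.
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, query, unknown_penalty=-10.0):
    """Slide the PSSM along a protein query.

    Returns (best_score, start_index). Residues outside the 20 standard
    amino acids (e.g. X) get a fixed penalty. Returns (-inf, -1) if the
    query is shorter than the profile.
    """
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(col.get(aa, unknown_penalty) for col, aa in zip(pssm, window))
        best = max(best, (score, start))
    return best


# Made-up toy peptides for illustration only; NOT real cyclotide sequences.
toy_alignment = ["GCACRC", "GCSCRC", "GCACKC"]
pssm = build_pssm(toy_alignment)
score, start = best_window(pssm, "MKTAGCACRCLLE")
print(start)  # 4
```

Scanning a genome rather than a protein would additionally require translating the DNA in all six reading frames, as TBLASTN does, and a real search is better served by an established profile-HMM tool such as HMMER run on a proper multiple alignment of the known sequences.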

But what about Pi?

Sigh... well, here is an XKCD oldie-but-goodie nerd litmus test. Enjoy!


Trabi, M. (2002). Circular proteins: no end in sight. Trends in Biochemical Sciences, 27(3), 132-138. DOI: 10.1016/S0968-0004(02)02057-1

Pelegrini, P., Quirino, B., & Franco, O. (2007). Plant cyclotides: An unusual class of defense compounds. Peptides, 28(7), 1475-1481. DOI: 10.1016/j.peptides.2007.04.025

Wang, C., Hu, S., Martin, J., Sjogren, T., Hajdu, J., Bohlin, L., Claeson, P., Goransson, U., Rosengren, K., Tang, J., Tan, N., & Craik, D. (2009). Combined X-ray and NMR analysis of the stability of the cyclotide cystine knot fold that underpins its insecticidal activity and potential use as drug scaffold. Journal of Biological Chemistry. DOI: 10.1074/jbc.M900021200

The Origin of Gender Symbols in Biology

A quick post for International Women’s Day: how did the gender symbols originate in biology? What do ♀ and ♂ actually stand for?

The answer starts in antiquity, when planets and gods were almost synonymous. Religious rites (at least in Europe) were also associated with the working of metals. Thus each heavenly body was associated with a metal and a god, and was given its own symbol:

1. Sun (gold)
2. Moon (silver)
3. Saturn (lead)
4. Jupiter (tin)
5. Mars (iron)
6. Mercury (mercury, duh)
7. Venus (copper)
After woodcuts by Fritz Kredel, published in Stearn 1962.

 

But how did the symbols of Mars (iron) and Venus (copper) migrate to describe sex in biology? It seems obvious to us that, of all the symbols, that of the god of war should be assigned to males and that of the goddess of love to females (stereotypes notwithstanding), but who was the first to do that?

The answer can be traced to one of the greatest biologists of all time: Carl Linnaeus. He is best known as the father of modern taxonomy: Linnaeus is the reason we uniquely identify organisms using genus and species names in Latin grammatical form, a system known as Linnaean binomial nomenclature. From Homo sapiens to Escherichia coli, we all owe our scientific names to Linnaeus.

But Linnaeus was also the one to appropriate the planet symbols for biology. In his notes, he used the Venus symbol as shorthand for female and the Mars symbol as shorthand for male. He also used Saturn to denote woody plants, the Sun for annual plants, and Jupiter for perennials. As for sex, Linnaeus used the Mercury symbol for hermaphrodite plants. However, that symbol’s meaning has changed over the years, at least in scientific shorthand, and it is now used to denote a virgin female (e.g. in genetic analysis). Mars was also used by Linnaeus, somewhat confusingly, for biennial plants.

But how did the symbols themselves originate? The accepted view now is that they were derived by the Romans from the initial letters of the Greek names of the planets/deities. So Phosphoros (Φωσφόρος, Greek for “Morning Star”, later the planet Venus) was abbreviated to Φκ, and Thouros (Mars) to θρ. These abbreviations were further contracted over the years by metalworkers, astrologers, and alchemists into the modern symbols.

Kronos (Saturn); Zeus (Jupiter); Thouros (Mars); Phosphoros (Venus); Stilbon (Mercury). After Stearn 1962.

 

Stearn, W. T. (1962). The Origin of the Male and Female Symbols of Biology. Taxon, 11(4), 109-113.

Microbial Art

 

We have some really talented students in our department, and I don’t just mean in science. I am honored to present the colorful and hilarious microbial artwork of Amber Beckett, created between gel runs in Natosha Finley’s lab:

Cereal Dilutions. Credit: Amber Beckett

 

Pepe the protein. Credit: Amber Beckett

StreptoCOWcus and Bortadella persussis. Credit: Amber Beckett

 

 
