Displaying posts tagged with “programming”

The genomics programming language

Genomics is a new and exciting programming language based on Brainfsck. Here are the commands:

g  Move pointer to the right.
e  Move pointer to the left.
n  Increment the cell at the pointer.
o  Decrement the cell at the pointer.
m  Jump forward past the matching i if the cell at the current pointer [...]

Short bioinformatics hacks: reading mate-pairs from a fastq file

If you have a merged file of paired-end reads, here is a quick way to read them using Biopython:

    from Bio import SeqIO
    from itertools import izip_longest

    # Loop over pairs of reads
    readiter = SeqIO.parse(open(inpath), "fastq")
    for rec1, rec2 in izip_longest(readiter, readiter):
        print rec1.id # do something with rec1
        print rec2.id # do something [...]
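The excerpt's code is Python 2 (`izip_longest`, `print` statements); in Python 3 the same helper is `itertools.zip_longest`. Here is a rough standard-library sketch of the same pairing trick, so it runs without Biopython. The tiny parser `read_fastq`, the helper `read_pairs`, and the demo reads are assumptions made for illustration, and the parser assumes plain four-line records.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], seq, qual


def read_pairs(handle):
    """Yield (mate1, mate2) tuples from an interleaved FASTQ stream."""
    records = read_fastq(handle)
    # Passing the same iterator twice makes zip_longest pull two
    # consecutive records per step; a trailing unpaired read gets None.
    return zip_longest(records, records)


# Hypothetical interleaved input holding one read pair.
demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\nIIII\n")
pairs = [(rec1[0], rec2[0]) for rec1, rec2 in read_pairs(demo)]
```

The whole trick is that both arguments are the same iterator object, so each step consumes two records rather than one.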

Brainf**k while waiting for a flight

Warning: NSFW language. Brainfuck is a Turing-complete programming language consisting of eight commands, each of which is represented as a single character.

>  Increment the pointer.
<  Decrement the pointer.
+  Increment the cell at the pointer.
-  Decrement the cell at the pointer.
.  Output the ASCII value of the cell at the pointer. [...]
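To make the command list concrete, here is a minimal interpreter sketch in Python. It is not taken from the post. The function name `run_brainfuck`, the 30,000-cell tape, the 8-bit wrap-around, and reading a zero at end of input are common conventions assumed here, and the sketch assumes balanced brackets and a pointer that stays on the tape.

```python
def run_brainfuck(program, input_text=""):
    """Run a Brainfuck program and return its output as a string."""
    # Pre-compute the matching position for every bracket.
    stack, jumps = [], {}
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * 30000          # assumed tape size
    ptr = pc = 0
    output = []
    inputs = iter(input_text)
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256   # assumed 8-bit cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            output.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inputs, "\0"))  # assumed: 0 at end of input
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]      # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]      # go back to the loop start
        pc += 1
    return "".join(output)


# 8 * 8 + 1 = 65, the ASCII code for "A".
result = run_brainfuck("++++++++[>++++++++<-]>+.")
```

Any character that is not one of the eight commands is simply skipped, which is how Brainfuck programs usually carry their comments.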

The Friedberg Lab is Recruiting Graduate Students

The Friedberg Lab is recruiting graduate students, for both Master’s and Ph.D.

WE ARE: A dynamic young lab interested in gene, gene cluster and genome evolution, understanding microbial communities and microbe-host interactions by metagenomic analyses, developing algorithms for understanding gene cluster evolution, and prediction of protein function from protein sequence and structure.

YOU ARE: [...]

Short bioinformatics hacks: merging fastq files

So you received your mate-paired reads in two different files, and you need to merge them for your assembler. Here is a quick Python script to do that. You will need Biopython installed.

    #!/usr/bin/env python
    from Bio import SeqIO
    import itertools
    import sys
    import os
    # Copyright(C) 2011 Iddo Friedberg
    # Released under Biopython [...]
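The excerpt stops before the script's body, so the following is not the original script. It is a small standard-library sketch of the general idea: walk both files in step and write mate 1 followed by mate 2. The names `fastq_records` and `interleave_fastq` and the demo reads are hypothetical, and the code assumes four-line FASTQ records listed in the same order in both files.

```python
import io
from itertools import islice, zip_longest


def fastq_records(handle):
    """Yield each FASTQ record as a list of its four raw lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return  # clean end of file
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave_fastq(forward, reverse, out):
    """Write mate 1 then mate 2 for every pair; return the number of pairs."""
    count = 0
    pairs = zip_longest(fastq_records(forward), fastq_records(reverse))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("input files hold different numbers of reads")
        out.writelines(rec1)
        out.writelines(rec2)
        count += 1
    return count


# Hypothetical forward and reverse files with two reads each.
fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
rev = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nAATT\n+\nIIII\n")
merged = io.StringIO()
n_pairs = interleave_fastq(fwd, rev, merged)
```

With real data, the `io.StringIO` objects would be replaced by open file handles. Failing loudly on unequal read counts is a deliberate choice here, since a silent mismatch would shift every later pair.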

You know your graduate student is frustrated when…

…you find this on the top of the paper pile on his desk:

The open source spammer: extracting email addresses from an openoffice.org document

I’m organizing a workshop later this month (see here, scroll to session V), and I have just received the attendees list from the main conference’s organizers. Since I need to spam (er, send) the attendees informative email about the specific workshop, I needed their email addresses. Here’s what I did. The file itself is MS Word [...]

Short bioinformatic hacks: reading between the genes

In celebration of the biohackathon happening now in Tokyo, I am putting up a script that is oddly missing from many bioinformatic packages: extracting intergenic regions. This one was written together with my student, Ian. As for the biohackathon itself, I’m not there, but I am following the tweets and Brad Chapman’s excellent posts: Day [...]
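The script itself is not part of this excerpt. As a rough illustration of what extracting intergenic regions involves, here is a coordinate-only sketch in Python. The function name `intergenic_regions`, the 1-based inclusive coordinates, the linear (non-circular) genome, and the pooling of genes from both strands are all assumptions, not details from the post.

```python
def intergenic_regions(genes, genome_length, min_length=1):
    """Return (start, end) gaps not covered by any gene.

    genes: iterable of (start, end) pairs, 1-based and inclusive,
    with start <= end; genes from both strands are pooled (assumed).
    """
    regions = []
    covered_to = 0  # right-most position covered by a gene so far
    for start, end in sorted(genes):
        if start - covered_to - 1 >= min_length:
            regions.append((covered_to + 1, start - 1))
        # max() keeps overlapping or nested genes from shrinking coverage.
        covered_to = max(covered_to, end)
    if genome_length - covered_to >= min_length:
        regions.append((covered_to + 1, genome_length))
    return regions


# Hypothetical gene coordinates; the first two genes overlap.
demo_genes = [(100, 200), (150, 300), (400, 500)]
gaps = intergenic_regions(demo_genes, genome_length=600)
```

The returned coordinates can then be used to slice the genome sequence, keeping in mind that Python slices are 0-based and half-open.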

Real programmers use…

A nice take on the vi / emacs wars. Also, real programmers browse the web using Vimperator.

Going to GOA, pt. 2: children of a lesser GO

The source file associated with this post can be downloaded here. Last time, I talked about how to read a GOA gene_associations file into a Python dictionary data structure. Our goal was to find all genes that are annotated as hydrolases in the GOA gene_associations file. The tricky part is, most enzymes are not [...]

The Tao of Programming

I was recently reminded of this classic by Geoffrey James. Here are a few of my favorites. The whole text is available online. In the beginning was the Tao. The Tao gave birth to Space and Time. Therefore Space and Time are Yin and Yang of programming. Programmers that do not comprehend the Tao are [...]

Thankful for…

In no particular order or context. No personal stuff and by no means a complete list:

WordPress (like, duh).
Wikipedia (default for looking up new stuff).
Wikis in general (great lab management tool. Don’t need LIMS).
Open Access Publishing and Creative Commons licensing.
FLOSS licensing (90% of the software I use, and 100% of what [...]

Short Bioinformatics Hacks: Glimmer Splitter

Glimmer is a program that predicts ORFs in bacterial and archaeal genomes. The input is the assembled genome FASTA file; the output is several files of predictions at different stages. The final output file is the .predict file, which looks something like this:

    >NODE_1_length_38001_cov_935.551880
    orf00001      481      362  -2     1.45
    orf00002      451      567  +1     0.59
    [...]
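The post's own script is cut off here, so the following is only a sketch of how a .predict file like the sample above could be read in Python and grouped by contig. The function name `parse_predict` is hypothetical, and reading the five columns as ORF id, start, end, reading frame, and score is an assumption based on the sample lines.

```python
from collections import defaultdict


def parse_predict(lines):
    """Group ORF predictions by the contig header line they follow."""
    orfs = defaultdict(list)
    contig = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep the first word as the contig name.
            contig = line[1:].split()[0]
            continue
        # Assumed column order: id, start, end, frame, score.
        orf_id, start, end, frame, score = line.split()[:5]
        orfs[contig].append(
            (orf_id, int(start), int(end), int(frame), float(score))
        )
    return dict(orfs)


# The two sample lines shown in the excerpt.
demo = """>NODE_1_length_38001_cov_935.551880
orf00001      481      362  -2     1.45
orf00002      451      567  +1     0.59
"""
predictions = parse_predict(demo.splitlines())
```

Once the predictions are keyed by contig, writing one output file per contig is a short loop over the dictionary.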

Short bioinformatics hacks, ch. 2: chunk it.

First, a non-bioinformatic one-liner, which is very relevant to most of us working on 3 different machines simultaneously, not including the 80 in our cluster. ssh-ing and giving your password each time is painful, and makes it almost impossible to do scripted file transfers, like backups. A good solution is shared key ssh in [...]

Short Bioinformatics Hacks, ch. 1

In any programming gig, and that includes bioinformatics, a lot of repetitive scriptology keeps cropping up. I decided to share some of that, pro bono publico, and also because I hope to start some sort of ongoing cookbook for short bioinformatics hacks. If you have any cool short scripts you would like to share, please email [...]