second generation sequencing | Byte Size Biology

Bioinformatics, programming, Software Comments turned off

Short bioinformatics hacks: merging fastq files

By Iddo on August 25th, 2011

So you received your mate-paired reads in two different files, and you need to merge them for your assembler. Here is a quick Python script to do that. You will need Biopython installed.

#!/usr/bin/env python
from Bio import SeqIO
import itertools
import sys
import os
# Copyright(C) 2011 Iddo Friedberg
# Released under Biopython […]
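The script is cut off in this excerpt, so here is a rough sketch of the general idea rather than the original code. It assumes "merging" means interleaving (forward read 1, reverse read 1, forward read 2, and so on), which is what many assemblers expect, and it uses only the standard library instead of Biopython. The names `read_fastq` and `interleave_fastq` are made up for this illustration, and the reader assumes plain four-line FASTQ records with no blank lines.

```python
import io
import itertools


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples: (header, sequence, plus line, quality)."""
    while True:
        record = tuple(handle.readline().rstrip("\n") for _ in range(4))
        if not record[0]:
            return  # an empty header line means end of file
        yield record


def interleave_fastq(forward, reverse, out):
    """Write forward/reverse records alternately; return the number of pairs."""
    pairs = 0
    for fwd, rev in itertools.zip_longest(read_fastq(forward), read_fastq(reverse)):
        if fwd is None or rev is None:
            raise ValueError("input files hold different numbers of reads")
        for record in (fwd, rev):
            out.write("\n".join(record) + "\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the two mate files.
    fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
    rev = io.StringIO("@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nIIII\n")
    merged = io.StringIO()
    interleave_fastq(fwd, rev, merged)
    print(merged.getvalue(), end="")
```

With real data you would pass open file handles in place of the `StringIO` objects. Raising on a length mismatch is a deliberate choice here: silently dropping the unpaired tail would shift every later pair out of register.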


Genomics, Molecular biology 2 comments

Why it’s hard to assemble repetitive DNA regions

By Iddo on January 22nd, 2011

So here are EssOh and OhOne assembling a rather frustrating puzzle containing cows. The same 5-6 cow “characters” are repeated, which is a perfect way to illustrate low-complexity DNA sequences, and why they are hard to assemble, especially when the pieces are small, like those you get from some second generation sequencers.
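For the coders, the puzzle analogy can be put into numbers with a toy sketch. It is not taken from the post: the "genome" below is invented, with the same 8-base "cow" repeated three times between unique flanks, and `unique_read_fraction` is a hypothetical helper that asks what fraction of all possible error-free reads of a given length occur at exactly one position. Reads that occur more than once are the pieces you cannot place with confidence.

```python
from collections import Counter


def unique_read_fraction(genome, read_length):
    """Fraction of all length-`read_length` substrings that occur exactly once."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] == 1) / len(reads)


# Invented toy genome: one repeat (the "cow") between four unique flanks.
REPEAT = "ACGTACGA"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CATTG" + REPEAT + "AGCCT"

if __name__ == "__main__":
    for k in (4, 8, 12):
        print(k, round(unique_read_fraction(genome, k), 2))
```

Short reads that fall entirely inside the repeat look identical wherever they came from, while reads longer than the repeat always carry some unique flank with them, so the unique fraction climbs as read length grows.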


Bioinformatics, Genomics 1 comment

Closing gaps

By Iddo on May 30th, 2010

Geek alert: this post is for coders. So you sequenced your genome, reached an optimally small number of contigs, they look sane, and now you would like to see what you need for the finishing stage. Namely, how many gaps you have and what their sizes are. UPDATE: “might just be worth clarifying this is for […]
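The excerpt stops before it says how the gaps are represented, so the following is only a sketch under one common assumption: that gaps appear as runs of N in the assembled sequence. The function name `gap_sizes` and the example sequence are invented for illustration and are not from the original post.

```python
import re


def gap_sizes(sequence):
    """Return the length of every run of N (upper or lower case) in a sequence."""
    return [len(match.group()) for match in re.finditer(r"[Nn]+", sequence)]


if __name__ == "__main__":
    # Made-up sequence with two gaps.
    scaffold = "ACGTNNNNNACGTTGCANNACG"
    sizes = gap_sizes(scaffold)
    print(len(sizes), "gaps; sizes:", sizes)
```

The number of gaps is simply the length of the returned list, and the same function can be applied to each record of a multi-sequence file in turn.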

