Short bioinformatics hacks: merging fastq files
So you received your mate-paired reads in two separate files, and you need to merge them into a single interleaved file for your assembler. Here is a quick Python script to do that. You will need Biopython installed.
#!/usr/bin/env python
from Bio import SeqIO
import itertools
import sys
import os
# Copyright(C) 2011 Iddo Friedberg
# Released under Biopython license. http://www.biopython.org/DIST/LICENSE
# Do not remove this comment
def merge_fastq(fastq_path1, fastq_path2, outpath):
    outfile = open(outpath,"w")
    fastq_iter1 = SeqIO.parse(open(fastq_path1),"fastq")
    fastq_iter2 = SeqIO.parse(open(fastq_path2),"fastq")
    for rec1, rec2 in getattr(itertools, "izip", zip)(fastq_iter1, fastq_iter2):  # izip exists only on Python 2; fall back to zip
        SeqIO.write([rec1,rec2], outfile, "fastq")
    outfile.close()
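if __name__ == '__main__':
    # Expect exactly two input files: the first and second mates
    if len(sys.argv) != 3:
        sys.exit("Usage: %s reads_1.fastq reads_2.fastq" % sys.argv[0])
    outpath = "%s.merged.fastq" % os.path.splitext(sys.argv[1])[0]
    merge_fastq(sys.argv[1],sys.argv[2],outpath)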
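The neat trick is the for loop in merge_fastq: it zips the two FASTQ iterators (itertools.izip on Python 2; the built-in zip is just as lazy on Python 3) and walks them in parallel, one record from each file at a time, so neither file is ever loaded into memory whole.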
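One thing to keep in mind about zipping: it stops silently at the end of the shorter input, so if one file has lost a few reads, the extra records in the other file are simply dropped. Here is a minimal sketch of the same interleaving idea with a length check added. It uses plain strings as stand-ins for FASTQ records so it runs without Biopython, and the function name interleave is my own invention for illustration, not part of the script above.

```python
from itertools import zip_longest


def interleave(first, second):
    """Yield items alternately from two iterables; fail if lengths differ."""
    missing = object()  # sentinel marking an exhausted input
    for a, b in zip_longest(first, second, fillvalue=missing):
        if a is missing or b is missing:
            raise ValueError("inputs have different numbers of records")
        yield a
        yield b


if __name__ == "__main__":
    forward = iter(["r1/1", "r2/1", "r3/1"])  # stand-ins for records from file 1
    reverse = iter(["r1/2", "r2/2", "r3/2"])  # stand-ins for records from file 2
    print(list(interleave(forward, reverse)))
    # ['r1/1', 'r1/2', 'r2/1', 'r2/2', 'r3/1', 'r3/2']
```

Because interleave is a generator, it is as lazy as zip: it pulls one item from each input at a time. Its output could presumably be handed to SeqIO.write in one call when the inputs are real record iterators, but I have not shown that here.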
How to use this script: save it to a file called merge_fastq (or whatever you like). Then make it executable:
$ chmod +x merge_fastq
And you are ready to go.
$ ./merge_fastq myseq_1_.fastq myseq_2_.fastq
The merged file is named after the first input, with the extension replaced, and is written next to it: myseq_1_.merged.fastq
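If you want to see what output name you will get before running on real data, the naming rule is easy to try out on its own. This is just a sketch that wraps the same os.path.splitext expression the script uses in a helper; the name merged_path is hypothetical and exists only for this example.

```python
import os


def merged_path(first_input):
    """Build the output name the same way the script does."""
    # splitext strips only the last extension from the path
    return "%s.merged.fastq" % os.path.splitext(first_input)[0]


if __name__ == "__main__":
    print(merged_path("myseq_1_.fastq"))       # myseq_1_.merged.fastq
    print(merged_path("run7/myseq_1_.fastq"))  # run7/myseq_1_.merged.fastq
    print(merged_path("myseq_1_.fastq.gz"))    # myseq_1_.fastq.merged.fastq
```

Note the last case: only the final extension is stripped, so a doubled extension leaves ".fastq" in the middle of the output name.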