Short bioinformatics hacks: merging fastq files
So you received your mate-paired reads in two different files, and you need to merge them for your assembler. Here is a quick Python script to do that. You will need Biopython installed.
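#!/usr/bin/env python3
from Bio import SeqIO
import sys
import os

# Copyright(C) 2011 Iddo Friedberg
# Released under Biopython license. http://www.biopython.org/DIST/LICENSE
# Do not remove this comment

def merge_fastq(fastq_path1, fastq_path2, outpath):
    # Interleave the two files: record 1 of file 1, record 1 of file 2, and so on
    with open(fastq_path1) as in1, open(fastq_path2) as in2, \
            open(outpath, "w") as outfile:
        fastq_iter1 = SeqIO.parse(in1, "fastq")
        fastq_iter2 = SeqIO.parse(in2, "fastq")
        # zip is lazy in Python 3, so only one pair of records is held at a time
        for rec1, rec2 in zip(fastq_iter1, fastq_iter2):
            SeqIO.write([rec1, rec2], outfile, "fastq")

if __name__ == '__main__':
    # Output name: the first input path with its extension replaced
    outpath = "%s.merged.fastq" % os.path.splitext(sys.argv[1])[0]
    merge_fastq(sys.argv[1], sys.argv[2], outpath)

The neat trick is in the for loop: zip pairs up the two iterators so you loop over them in parallel, two fastq records at a time. In Python 3 the built-in zip is lazy, so neither file is ever read into memory in full (in Python 2 you would need itertools.izip for the same effect). Note that zip stops at the end of the shorter file, so both files should contain the same number of reads, in the same order.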
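If you want to see the lockstep iteration without installing Biopython, here is a minimal sketch. The read names and the interleave helper are made up for illustration; plain strings stand in for the fastq records.

```python
# Toy stand-ins for the two fastq record iterators (hypothetical read names).
forward = iter(["read1/1", "read2/1", "read3/1"])
reverse = iter(["read1/2", "read2/2", "read3/2"])

def interleave(iter1, iter2):
    """Yield items from two iterators alternately, one pair at a time."""
    for a, b in zip(iter1, iter2):
        yield a
        yield b

merged = list(interleave(forward, reverse))
print(merged)
# ['read1/1', 'read1/2', 'read2/1', 'read2/2', 'read3/1', 'read3/2']
```

Each mate ends up right after its partner, which is the interleaved layout many assemblers expect.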
How to use this script: save it to a file called merge_fastq (or whatever you like). Then:
$ chmod +x merge_fastq
And you are ready to go.
$ ./merge_fastq myseq_1_.fastq myseq_2_.fastq
The merged file will be called myseq_1_.merged.fastq
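The output name comes from the first argument only: the script strips its extension with os.path.splitext and appends .merged.fastq. A small sketch of just that step follows; merged_name is a hypothetical helper, not part of the script, and the second path is an invented example.

```python
import os

def merged_name(first_path):
    """Build the output name the way the script does (hypothetical helper)."""
    root, _ext = os.path.splitext(first_path)
    return "%s.merged.fastq" % root

print(merged_name("myseq_1_.fastq"))      # myseq_1_.merged.fastq
print(merged_name("data/run7_1.fastq"))   # data/run7_1.merged.fastq
```

So the merged file lands next to the first input file, whatever directory you run the script from.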