<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Byte Size Biology &#187; open source software</title>
	<atom:link href="http://bytesizebio.net/index.php/category/free-culture/open-source-software-free-culture/feed/" rel="self" type="application/rss+xml" />
	<link>http://bytesizebio.net</link>
	<description>The musings and ravings of a computational biologist about science, computers, music and, you know, stuff</description>
	<lastBuildDate>Mon, 06 Feb 2012 13:32:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Open Access: the Revolution Will be Convenient</title>
		<link>http://bytesizebio.net/index.php/2011/01/19/open-access-the-revolution-will-be-convenient/</link>
		<comments>http://bytesizebio.net/index.php/2011/01/19/open-access-the-revolution-will-be-convenient/#comments</comments>
		<pubDate>Wed, 19 Jan 2011 10:56:23 +0000</pubDate>
		<dc:creator>Iddo</dc:creator>
				<category><![CDATA[creative commons]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[open access]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[grandmothers]]></category>
		<category><![CDATA[science funding]]></category>

		<guid isPermaLink="false">http://bytesizebio.net/?p=4543</guid>
		<description><![CDATA[Some time ago an article in Linux Journal discussed the adoption of free/open source software (FOSS) by the general public. The article (I can&#8217;t seem to find it now) talked about the people that do not care about the distinction between Free as in Free Beer vs. Free as in Freedom (libre). They want software [...]]]></description>
			<content:encoded><![CDATA[<div>Some time ago an article in <em>Linux Journal</em> discussed the adoption of <a href="http://en.wikipedia.org/wiki/Free_and_open_source_software" target="_blank">free/open source software (FOSS)</a> by the general public. The article (I can&#8217;t seem to find it now) talked about the people that do not care about the distinction between Free as in Free Beer vs. <a href="http://oreilly.com/openbook/freedom/" target="_blank">Free as in Freedom</a> (<em>libre</em>). They want software that works, and they are even willing to pay for it, although free would be nice. Also, the lack of licensing hassles is a serious bonus. The Open Source advocates and developers are the ones who care deeply about the dissemination model: code should be available for modification and reuse. Not because of the price tag, but because not sharing code hinders development. The success stories of the open source model are obvious: Internet and WWW protocols are open source, most servers are Linux based, Mac OS X is partly based on FreeBSD, and I&#8217;m writing this post from a Linux machine on WordPress. Also, the programmers and FOSS advocates are not starving: they are selling books, documentation, maintenance services and penguin T-shirts. My university is switching to Sakai, a <a href="http://sakaiproject.org/" target="_blank">FOSS-based course management system</a>, and they are hiring programmers to maintain it. The IT managers realize (I hope!) that the adoption of Sakai will not be &#8220;free&#8221; as in no $$$, as these programmers will cost money. The benefit of such a system over the closed system we have used so far would be to draw upon the general knowledge of the Sakai user community, and to be able to adopt and adapt modules for a learning system suited to my university&#8217;s needs.</div>
<div>
<div class="wp-caption alignnone" style="width: 442px"><a href="http://farm4.static.flickr.com/3461/3394518027_68115e9b26_o.jpg"><img class=" " title="Grandmother Laptop " src="http://farm4.static.flickr.com/3461/3394518027_68115e9b26_o.jpg" alt="" width="432" height="432" /></a><p class="wp-caption-text">Credit: mcwetboy on Flickr http://www.flickr.com/photos/mcwetboy/3394518027/</p></div>
</div>
<div><a href="http://www.android.com/" target="_blank">Android</a> is a Linux-based operating system for smartphones which works great. One of the reasons <a href="http://www.readwriteweb.com/archives/android_steals_market_share_from_iphone.php">Android gained such a large market share from Apple&#8217;s iPhone</a> is Android&#8217;s <a href="http://ostatic.com/blog/android-to-offer-a-foss-friendly-marketplace" target="_blank">FOSS-friendliness for app developers</a>, as well as the operating system&#8217;s portability to many platforms.</div>
<div>The not-so-successful story is FOSS in desktops. Windows still rules, and frankly up until recently, Linux desktops were not that great. They failed the &#8220;grandmother test&#8221;, in which you got your grandmother who is used to Windows to try and adopt Linux. There was too much under-the-hood knowledge needed for granny to be able to even do her email and word processing on a Linux machine. I believe that now the main hindrance to adopting Linux as a desktop is not the granny test, but simply things like inertia and compatibility of certain software. The Linux desktop is quite usable now.</div>
<div>Which brings us to the point that the adoption of FOSS by most computer users is not one of ideology, but of convenience. If they can get the job done for free, fine. If they have to pay some money for it, fine too, as long as they are not milked into continuous upgrade and support (and sometimes even that works). But they want a convenient and familiar working platform. Linux is a choice for servers because it is much better than Windows Server. Android is cheaper and has more apps than iPhone (in large part due to the open development model), and you are not locked into hardware. Purchasers of Android phones take all of these into consideration, not the openness of the system, since most of them will never use Android in a way which directly exploits its openness. Yes, they do benefit indirectly from openness, but that is not what attracts them.</div>
<div>So what does Open Access (the title) have to do with Open Source?</div>
<div>I believe that the advocates of scientific Open Access publication are in the same situation that Open Source advocates were in a few years ago. Advocates of both OA and FOSS models had to fight interest groups to gain acceptance. The respective fights have been mostly won. Both OA and FOSS have gained enough traction to stay and even be adopted, to some extent, by some of their previous opponents from the respective industries of publishing and software.</div>
<div>However, OA adoption is not yet widespread. According to a recent <a href="http://news.sciencemag.org/scienceinsider/2011/01/quandary-scientists-prefer-readi.html" target="_blank">poll published in Science</a>, only 10% of the published papers are in OA journals, but 90% of scientists support OA. So OA is a good idea, but few adopt it. Reason: by analogy to the Linux desktop, OA does not quite yet fit &#8220;user&#8221; expectations. You might say OA fails the &#8220;old professor&#8221; test. It appears that most scientists care primarily about two things: the perceived prestige of the publication venue, and the associated price tag(*). Also, most of the scientists polled did not care about such things as retaining copyright and <a href="http://creativecommons.org/" target="_blank">Creative Commons</a> (CC) licensing. These are the equivalent of Android users that do not care (or even know) about Open Source licensing. From a non-representative polling of my colleagues, it seems to me that many are unaware of CC and see licensing issues as niceties rather than essentials. So, as in the world of Open Source, it is convenience, rather than ideology, that will determine the adoption of Open Access. How much does it cost? Is it in a &#8220;good&#8221; journal? Those are the equivalent questions to those that your grandmother may ask: &#8220;can I email my grandkids with it&#8221; and &#8220;can I do my taxes with it&#8221;?</div>
<div>So while the Open Access movement, like the FOSS movement, is fueled by an ideal, and people carrying this ideal, the ultimate adoption will be one of convenience and self-interest.</div>
<div>Finally, here is a slideshow of the  Open Access poll highlights, from the website of the <a href="http://project-soap.eu/" target="_blank">Study of Open Access Publishing</a> project.</div>
<div id="__ss_5867180" style="width: 425px;"><strong><a title="SoapFall2010" href="http://www.slideshare.net/ProjectSoap/soapfall2010">SoapFall2010</a></strong><object id="__sse5867180" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=soap-fall2010-101122173809-phpapp02&amp;stripped_title=soapfall2010&amp;userName=ProjectSoap" /><param name="name" value="__sse5867180" /><param name="allowfullscreen" value="true" /><embed id="__sse5867180" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=soap-fall2010-101122173809-phpapp02&amp;stripped_title=soapfall2010&amp;userName=ProjectSoap" name="__sse5867180" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="padding: 5px 0 12px;">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/ProjectSoap">Project SOAP</a>.</div>
</div>
<div>&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</div>
<div>(*) One comment about the price tag: a lot has been said about how libraries have to pay to maintain subscription to toll-access journals, how that fee is rolled over to researchers in terms of overhead, and how open-access can eliminate that. I doubt widespread adoption of the Open Access publication model would make much of a difference, but I confess I don&#8217;t understand very well how the economics of science publication works. Even with a wide adoption of open access, that would mean replacing one line item (overhead) with another (publication fees). Also in the past, University of California researchers have threatened boycotts against Cell Press and Nature Publishing Group when the subscription hikes were deemed too high. Yes, institutional fees are part of the price tag. But also, most researchers would go with a closed-access subscriber-pays model, as long as the price is not perceived as exorbitant.</div>
]]></content:encoded>
			<wfw:commentRss>http://bytesizebio.net/index.php/2011/01/19/open-access-the-revolution-will-be-convenient/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My Hype Cycle</title>
		<link>http://bytesizebio.net/index.php/2010/11/25/my-hype-cycle/</link>
		<comments>http://bytesizebio.net/index.php/2010/11/25/my-hype-cycle/#comments</comments>
		<pubDate>Thu, 25 Nov 2010 21:08:11 +0000</pubDate>
		<dc:creator>Iddo</dc:creator>
				<category><![CDATA[Free Culture]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Funny]]></category>
		<category><![CDATA[hype cycle]]></category>
		<category><![CDATA[science culture]]></category>
		<category><![CDATA[technology culture]]></category>

		<guid isPermaLink="false">http://bytesizebio.net/?p=4302</guid>
		<description><![CDATA[The hype cycle characterizes the over-excitement and subsequent disappointment with new technologies. I expanded this a bit to include research and social trends in science which seem prevalent nowadays. Any views represented in this hype cycle diagram are my own, and in no way represent the  views of my employers, family, friends, neighbors, greengrocer, auto [...]]]></description>
			<content:encoded><![CDATA[<p>The hype cycle characterizes the over-excitement and subsequent disappointment with new technologies. I expanded this a bit to include research and social trends in science which seem prevalent nowadays.</p>
<p>Any views represented in this hype cycle diagram are my own, and in no way represent the  views of my employers, family, friends, neighbors, greengrocer, auto mechanic, my skin microbiome or my internet provider who just slapped me with a 30% fee increase.</p>
<div id="attachment_4303" class="wp-caption alignnone" style="width: 591px"><a href="http://bytesizebio.net/wp-content/uploads/2010/11/hypecycle.png"><img class="size-large wp-image-4303  " title="hypecycle" src="http://bytesizebio.net/wp-content/uploads/2010/11/hypecycle-1024x469.png" alt="" width="581" height="266" /></a><p class="wp-caption-text">Click for full size. Template (without writing) taken from Wikimedia Commons, under GFDL. Credit for template: Jeremy Kemp.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://bytesizebio.net/index.php/2010/11/25/my-hype-cycle/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Bioinformatics Open Source Conference 2010 (and a poll)</title>
		<link>http://bytesizebio.net/index.php/2010/06/14/bioinformatics-open-source-conference-2010-and-a-poll/</link>
		<comments>http://bytesizebio.net/index.php/2010/06/14/bioinformatics-open-source-conference-2010-and-a-poll/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 15:22:29 +0000</pubDate>
		<dc:creator>Iddo</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[Biojava]]></category>
		<category><![CDATA[Bioperl]]></category>
		<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Bioruby]]></category>
		<category><![CDATA[BOSC]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[meeting]]></category>

		<guid isPermaLink="false">http://bytesizebio.net/?p=3745</guid>
		<description><![CDATA[The 11th Annual Bioinformatics Open Source Conference (BOSC) 2010 is coming up in Boston, July 9-10 2010. The BOSC meetings are a great get-together of a community of programmers who are like-minded in their advocacy of open source code for science, and specifically for bioinformatics. The whole thing is run by volunteers who take a [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.open-bio.org/wiki/BOSC_2010" target="_blank">11th Annual Bioinformatics Open Source Conference</a> (BOSC) 2010 is coming up in Boston, July 9-10 2010. The BOSC meetings are a great get-together of a community of programmers who are like-minded in their advocacy of open source code for science, and specifically for bioinformatics. The whole thing is run by volunteers who take a lot of time and effort to bring a top-notch meeting every year, so a big thanks to this year&#8217;s <a href="http://www.open-bio.org/wiki/BOSC_2010#Organizing_Committee" target="_blank">organizing committee</a>!</p>
<p>If you are reading this, and you are in Boston on those dates, consider showing up, it is a great experience. There will also be a <a href="http://www.open-bio.org/wiki/Codefest_2010" target="_blank">codefest</a> on the two days before the meeting. This year&#8217;s topic is <a href="http://www.open-bio.org/wiki/Codefest_2010" target="_blank">cloud computing for bioinformatics</a>. If you like using <a href="http://aws.amazon.com/" target="_blank">AWS</a> for bioinformatics or if you want to learn more, this is your chance. Amazon have provided a <a href="http://aws.amazon.com/education/" target="_blank">grant </a>towards this codefest. (Thanks!) <a href="http://www.biopython.org/wiki/Biopython" target="_blank">Biopython</a>, <a href="http://www.bioperl.org/wiki/Main_Page" target="_blank">Bioperl</a>, <a href="http://www.biojava.org/wiki/Main_Page" target="_blank">Biojava</a> and <a href="http://www.bioruby.org/" target="_blank">Bioruby</a> developers will all be there, tailoring code to the cloud.</p>
<p>Which brings me to the latest poll: if you are a bioinformatics programmer, which of the Bio* packages  are you using in your programming, if any? If more than one, check the one you use most frequently. Poll answers on the right. As with all Internet polls, you must be crazy if you take it at all seriously.</p>
]]></content:encoded>
			<wfw:commentRss>http://bytesizebio.net/index.php/2010/06/14/bioinformatics-open-source-conference-2010-and-a-poll/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>AMOS on Ubuntu</title>
		<link>http://bytesizebio.net/index.php/2010/04/19/amos-on-ubuntu/</link>
		<comments>http://bytesizebio.net/index.php/2010/04/19/amos-on-ubuntu/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 14:44:27 +0000</pubDate>
		<dc:creator>Iddo</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[AMOS]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[software installation]]></category>

		<guid isPermaLink="false">http://bytesizebio.net/?p=3485</guid>
		<description><![CDATA[AMOS is a suite of genome assembly and editing software. It includes assemblers, validation, visualization, and scaffolding tools.  I have been having some issues installing AMOS on Ubuntu  9.10.  Specifically, Ubuntu 9.10 has gcc 4.4, which breaks the compilation of the AMOS release version. However, the development version has been fixed to accommodate that. If [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS" target="_blank">AMOS</a> is a suite of genome assembly and editing software. It includes assemblers, validation, visualization, and scaffolding tools.  I have been having some issues installing AMOS on Ubuntu  9.10.  Specifically, Ubuntu 9.10 has gcc 4.4, which breaks the compilation of the AMOS release version. However, the development version has been fixed to accommodate that.</p>
<p>If you don&#8217;t know which Ubuntu version you are running, type:</p>
<pre>$ lsb_release -a</pre>
<p>No more than fifteen minutes after I posted my Q to the <a href="http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS#Bug_reports_and_support" target="_blank">amos-help</a> mailing list, <a href="http://www.awmc.uq.edu.au/index.html?page=128901" target="_blank">Florent Angly</a> came through with a solution. I am posting his email here.</p>
<blockquote><p>Hi,</p>
<p>This issue was fixed in the development version of AMOS. See below for instructions on how to install this version on Ubuntu:</p>
<p>Download either the regular or development version of AMOS. As of April 4, 2010,<br />
Minimo is only available from the development version of AMOS.<br />
i/ The regular AMOS version is available from <a href="http://sourceforge.net/projects/amos/files/" target="_blank">http://sourceforge.net/projects/amos/files/</a>, e.g.:<br />
$ wget <a href="http://sourceforge.net/projects/amos/files/amos/2.0.8/amos-2.0.8.tar.gz/download" target="_blank">http://sourceforge.net/projects/amos/files/amos/2.0.8/amos-2.0.8.tar.gz/download</a><br />
ii/ The development version of AMOS is in a CVS repository. To get it, run:<br />
$ cvs -z3 -d:pserver:anonymous@amos.cvs.sourceforge.net:/cvsroot/amos co -P AMOS</p>
<p>In the directory where the AMOS files are located, run the following to install<br />
the prerequisites:<br />
$ sudo aptitude install ash coreutils gawk gcc automake mummer mummer-doc libboost-dev</p>
<p>For the Hawkeye component of AMOS, you need Qt3:<br />
$ sudo aptitude install libqt3-headers</p>
<p>For the standard version of AMOS, skip to next step, but for the CVS development version, first, run:<br />
$ ./bootstrap</p>
<p>Then regardless of the version:<br />
$ ./configure --with-Qt-dir=/usr/share/qt3 --prefix=/usr/local/AMOS<br />
$ make<br />
$ make check<br />
$ sudo make install<br />
$ sudo ln -s /usr/local/AMOS/bin/* /usr/local/bin/</p>
<p>Now all the programs shipped in AMOS should be available from the command-line.<br />
For example try:<br />
$ Minimo -h<br />
Regards,</p>
<p>Florent</p></blockquote>
<p>You will need the AMOS development version for Ubuntu 9.10 (and above, presumably), but the regular version for 9.04 (and below). If you are getting the development version, you will also need to install cvs on your machine:</p>
<pre>$ sudo aptitude install cvs</pre>
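<p>If you script your installs, the version rule above can be written down as a small check. This is only a sketch: the function name <strong>needs_development_amos</strong> is made up for illustration, and the "9.10 and above" cutoff is my presumption from the paragraph above, not something tested on every release. The release string is what <strong>lsb_release -rs</strong> would print.</p>
<pre class="brush:python">def needs_development_amos(ubuntu_release):
    """True if this Ubuntu release needs the AMOS development version.

    Hypothetical helper: assumes 9.10 and above need the development version.
    """
    major, minor = (int(part) for part in ubuntu_release.split(".")[:2])
    # Tuples compare element by element, so (10, 4) sorts after (9, 10)
    return (major, minor) >= (9, 10)

for release in ("9.04", "9.10", "10.04"):
    print(release, needs_development_amos(release))</pre>
<p>Comparing the version as a tuple of integers avoids the trap of comparing strings, where "10.04" would sort before "9.10".</p>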
<p>Hope this helps anyone struggling with installing AMOS on Ubuntu or other Linux platforms.</p>
]]></content:encoded>
			<wfw:commentRss>http://bytesizebio.net/index.php/2010/04/19/amos-on-ubuntu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Short bioinformatic hacks: reading between the genes</title>
		<link>http://bytesizebio.net/index.php/2010/02/11/short-bioinformatic-hacks-reading-between-the-genes/</link>
		<comments>http://bytesizebio.net/index.php/2010/02/11/short-bioinformatic-hacks-reading-between-the-genes/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 03:53:10 +0000</pubDate>
		<dc:creator>Iddo</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://bytesizebio.net/?p=3009</guid>
		<description><![CDATA[In celebration of the biohackathon happening now in Tokyo, I am putting up a script that is oddly missing from many bioinformatic packages: extracting intergenic regions. This one was written together with my student, Ian. As for the biohackathon itself, I&#8217;m not there, but I am following the tweets and  Brad Chapman&#8217;s excellent posts: Day [...]]]></description>
			<content:encoded><![CDATA[<p>In celebration of the <a href="http://hackathon3.dbcls.jp/" target="_blank">biohackathon</a> happening now in Tokyo, I am putting up a script that is oddly missing from many bioinformatic packages: extracting intergenic regions. This one was written together with my student, Ian. As for the biohackathon itself, I&#8217;m not there, but I am following the <a href="http://twitter.com/#search?q=%23biohackathon" target="_blank">tweets</a> and  Brad Chapman&#8217;s excellent posts:</p>
<ul>
<li><a href="http://chapmanb.posterous.com/biohackathon-2010-day-1">Day 1</a></li>
<li><a href="http://chapmanb.posterous.com/biohackathon-2010-day-2-python-sparql-query-b">Day 2</a></li>
<li><a href="http://chapmanb.posterous.com/biohackathon-2010-day-3-fish-interoperating-a">Day 3</a></li>
<li><a href="http://chapmanb.posterous.com/biohackathon-2010-day-4-improved-python-sparq">Day 4</a></li>
</ul>
<p>About intergenic regions: intergenic regions are as interesting and sometimes even more interesting than the genes themselves: when you are interested in promoters, transcription factor binding sites or almost any other transcription regulation mechanism. Here&#8217;s a simple script to find intergenic regions. It reads a genbank formatted file and uses the information there to extract the intergenic regions. The sequences are written to a FASTA file.</p>
<pre class="brush:python">#!/usr/bin/env python
import sys
import Bio
from Bio import SeqIO, SeqFeature
from Bio.SeqRecord import SeqRecord
import os

# Copyright(C) 2009 Iddo Friedberg &amp; Ian MC Fleming
# Released under Biopython license. http://www.biopython.org/DIST/LICENSE
# Do not remove this comment
def get_interregions(genbank_path,intergene_length=1):
    seq_record = next(SeqIO.parse(open(genbank_path), "genbank"))
    cds_list_plus = []
    cds_list_minus = []
    intergenic_records = []
    # Loop over the genome file, get the CDS features on each of the strands
    for feature in seq_record.features:
        if feature.type == 'CDS':
            mystart = int(feature.location.start)
            myend = int(feature.location.end)
            if feature.location.strand == -1:
                cds_list_minus.append((mystart,myend,-1))
            elif feature.location.strand == 1:
                cds_list_plus.append((mystart,myend,1))
            else:
                sys.stderr.write("No strand indicated %d-%d. Assuming +\n" %
                                  (mystart, myend))
                cds_list_plus.append((mystart,myend,1))

    for i,pospair in enumerate(cds_list_plus[1:]):
        # Compare current start position to previous end position
        last_end = cds_list_plus[i][1]
        this_start = pospair[0]
        strand = pospair[2]
        if this_start - last_end &gt;= intergene_length:
            intergene_seq = seq_record.seq[last_end:this_start]
            strand_string = "+"
            intergenic_records.append(
                  SeqRecord(intergene_seq,id="%s-ign-%d" % (seq_record.name,i),
                  description="%s %d-%d %s" % (seq_record.name, last_end+1,
                                                        this_start,strand_string)))
    for i,pospair in enumerate(cds_list_minus[1:]):
        last_end = cds_list_minus[i][1]
        this_start = pospair[0]
        strand = pospair[2]
        if this_start - last_end &gt;= intergene_length:
            intergene_seq = seq_record.seq[last_end:this_start]
            strand_string = "-"
            intergenic_records.append(
                  SeqRecord(intergene_seq,id="%s-ign-%d" % (seq_record.name,i),
                  description="%s %d-%d %s" % (seq_record.name, last_end+1,
                                                        this_start,strand_string)))
    outpath = os.path.splitext(os.path.basename(genbank_path))[0] + "_ign.fasta"
    SeqIO.write(intergenic_records, open(outpath,"w"), "fasta")

if __name__ == '__main__':
    if len(sys.argv) == 2:
         get_interregions(sys.argv[1])
    elif len(sys.argv) == 3:
         get_interregions(sys.argv[1],int(sys.argv[2]))
    else:
         print("Usage: get_intergenic.py gb_file [intergenic_length]")
         sys.exit(0)</pre>
<p>What are we seeing here?</p>
<p>Lines 11-15 are the preamble: we read the GenBank file using Biopython&#8217;s GenBank parser in line 12. Because we expect a genome file, which contains a single record, this is a one-time read. Note that this is a rate-limiting step, and can take a couple of seconds. It took me ~2 seconds to read the full <em>E. coli</em> genome on my Linux box. We prepare one list for the CDSs on the + strand (line 13), another one for the CDSs on the minus strand (line 14) and one for all the intergenic records (line 15).</p>
<p>The rest of the code is three loop blocks: in lines 16-28 I loop over the GenBank features, extracting the coordinates of the genes themselves. In lines 30-41 I find the intergenic regions on the + strand. In lines 42-52 I do the same for the &#8220;-&#8221; strand.</p>
<p>Now for a philosophical interlude: although there is a way to read all the intergenic regions in a single pass, I subscribe to the &#8220;code simple&#8221; doctrine of research software writing. Code performance optimization is a low priority for me. I&#8217;d much rather have something that is simple to write, read and modify. I also don&#8217;t want to spend too much time coding an elegant script for elegance&#8217;s sake, especially if I may not use it too much. Historically, scientific code written for research is mostly extinct: thrown away after a short-lived hypothesis was tested and ended its days. Research coding is mostly throwaway glue code. Very rarely does it mature into a product. Then, and only then, can you apply all that fine software engineering you learned in college. Before that, write fast and simple.</p>
<p>But I digress. Line 17 loops over the features in the genome file. In line 18 we check whether the feature is a coding sequence (CDS). If so, we record the start position, end position and the strand the CDS is on. The list cds_list_minus is a list of 3-tuples. Each 3-tuple holds the start position, end position and strand of a CDS on the minus strand. (If you would like to go over the genes, as opposed to coding sequences, change line 18 to:</p>
<pre>if feature.type == 'gene':</pre>
<p>(or better yet, pass an argument that defines it.)</p>
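<p>Here is a rough sketch of what passing that argument could look like. It is not part of the script above: the <strong>Feature</strong> tuple is a stand-in for a Biopython feature so the snippet runs on its own, and the names <strong>split_by_strand</strong> and <strong>feature_type</strong> are made up for illustration.</p>
<pre class="brush:python">from collections import namedtuple

# Stand-in for a Biopython feature (an assumption for this sketch);
# the real script reads these from seq_record.features
Feature = namedtuple("Feature", ["type", "start", "end", "strand"])

def split_by_strand(features, feature_type="CDS"):
    """Return (plus, minus) lists of (start, end, strand) for one feature type."""
    plus, minus = [], []
    for feature in features:
        if feature.type != feature_type:
            continue
        if feature.strand == -1:
            minus.append((feature.start, feature.end, -1))
        else:
            # Strand 1 or no strand at all: treated as plus, as the script does
            plus.append((feature.start, feature.end, 1))
    return plus, minus

features = [
    Feature("gene", 0, 100, 1),
    Feature("CDS", 10, 90, 1),
    Feature("CDS", 200, 300, -1),
]
print(split_by_strand(features))
print(split_by_strand(features, feature_type="gene"))</pre>
<p>With the default you get the CDS coordinates, and with feature_type="gene" you get the gene coordinates, without touching the loop itself.</p>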
<p>cds_list_plus, is, yes, the same as cds_list_minus, only for the plus strand (line 24).</p>
<p>Sometimes, a CDS does not contain information on which strand it is. With genome files, that is usually the case with single stranded viral genomes. Therefore, we put in the default assumption that if there is no strand indication, then the feature is on the plus strand. We generate a warning message nevertheless (lines 25-28).</p>
<p>In lines 30-41 we loop over the plus strand list, and identify the coordinates between the genes. Python&#8217;s <strong>enumerate</strong> function is very useful here. The <strong>enumerate</strong> function allows us to iterate over a list, but at the same time keep track of which index we are in when looping over the list. So in line 30, <strong>pospair</strong> receives the start coordinate, end coordinate and strand of a CDS as a 3-tuple, while <strong>i</strong> receives the index into the plus strand CDS list. Because we loop over the list from its second member onwards, index <strong>i</strong> in the full list points at the member just before <strong>pospair</strong>. In that way, we can look back to the previous list member, find the coordinates where that CDS ends, and where the current CDS begins. The two coordinates make up the beginning and end of the intergenic region between those two genes on that strand. In line 35 we check if the intergenic region length is equal to or larger than a threshold: suppose we are only interested in those intergenic regions that are longer than 100 bases? (The default value is 1, see line 11.) In lines 38-41 we build a Biopython sequence record that contains an informative header, and the sequence of the intergenic region. The description which goes in the sequence header contains the record name, the start and end coordinates of the intergenic region, and the strand. The sequence record is appended to a list, which will eventually get written to a file.</p>
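<p>If the look-back trick with <strong>enumerate</strong> is not obvious, here it is on its own, stripped of anything Biopython. This is a toy sketch: the function name <strong>gaps_between</strong> and the coordinates are made up, and the intervals are assumed to be sorted by start position, as they are in a GenBank file.</p>
<pre class="brush:python">def gaps_between(intervals, min_length=1):
    """Return (gap_start, gap_end) for gaps between consecutive intervals."""
    gaps = []
    for i, current in enumerate(intervals[1:]):
        # enumerate counts from 0 on the sliced list, so intervals[i]
        # is the interval just before `current` in the full list
        last_end = intervals[i][1]
        this_start = current[0]
        if this_start - last_end >= min_length:
            gaps.append((last_end, this_start))
    return gaps

cds_coords = [(0, 100), (150, 300), (300, 400), (450, 500)]
print(gaps_between(cds_coords))
print(gaps_between(cds_coords, min_length=100))</pre>
<p>The first call finds the two 50-base gaps and skips the pair of intervals that touch at 300. The second call returns an empty list, since no gap reaches the 100-base threshold.</p>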
<p>Lines 42-52 are a repeat of lines 30-41, only for the minus strand.  Lines 53 &amp; 54: the list that contains all the intergenic region sequence objects gets written to its own fasta file.</p>
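<p>One thing worth knowing about line 53 is where the output ends up. A small sketch of just that expression, with a made-up input path and a made-up function name <strong>intergenic_outpath</strong>:</p>
<pre class="brush:python">import os

def intergenic_outpath(genbank_path):
    """Build the output file name the same way line 53 of the script does."""
    # basename drops the directory, splitext drops the extension
    return os.path.splitext(os.path.basename(genbank_path))[0] + "_ign.fasta"

# Hypothetical input path, for illustration only
print(intergenic_outpath("/data/genomes/ecoli_k12.gbk"))</pre>
<p>Because the directory part is dropped, the FASTA file is written to whatever directory you run the script from, not next to the GenBank file.</p>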
<p>Finally, lines 56-63 are boilerplate code that makes this script runnable from the command line. Have fun looking at intergenic regions. Let me know if you find something interesting.</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2010/02/READ-BETWEEN-THE-LINES-57228.jpg"><img class="size-large wp-image-3227 alignnone" title="Read between the lines" src="http://bytesizebio.net/wp-content/uploads/2010/02/READ-BETWEEN-THE-LINES-57228-1024x783.jpg" alt="" width="491" height="376" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://bytesizebio.net/index.php/2010/02/11/short-bioinformatic-hacks-reading-between-the-genes/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Thankful for&#8230;</title>
		<link>http://bytesizebio.net/index.php/2009/11/26/thankful-for/</link>
		<comments>http://bytesizebio.net/index.php/2009/11/26/thankful-for/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 22:24:32 +0000</pubDate>
		<dc:creator>Iddo</dc:creator>
				<category><![CDATA[blogging]]></category>
		<category><![CDATA[Free Culture]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blast]]></category>
		<category><![CDATA[miscellaneous]]></category>
		<category><![CDATA[Music]]></category>
		<category><![CDATA[NIH]]></category>
		<category><![CDATA[open access]]></category>
		<category><![CDATA[Open Science]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://bytesizebio.net/?p=2772</guid>
		<description><![CDATA[In no particular order or context. No personal stuff and by no means a complete list: WordPress (like, duh). Wikipedia (default for looking up new stuff) Wikis in general (great lab management tool. Don&#8217;t need LIMS) Open Access Publishing and Creative Commons licensing. FLOSS licensing (90% of the software I use, and 100% of what [...]]]></description>
			<content:encoded><![CDATA[<p>In no particular order or context. No personal stuff and by no means a complete list:</p>
<p><a href="http://wordpress.org" target="_blank">WordPress</a> (like, duh).</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/icon_big.png"><img class="size-thumbnail wp-image-2773 alignnone" title="icon_big" src="http://bytesizebio.net/wp-content/uploads/2009/11/icon_big-150x133.png" alt="icon_big" width="90" height="80" /></a></p>
<p><a href="http://www.wikipedia.org/" target="_blank">Wikipedia</a> (default for looking up new stuff)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/600px-Wikipedia-logo.svg_.png"><img class="size-thumbnail wp-image-2774 alignnone" title="600px-Wikipedia-logo.svg" src="http://bytesizebio.net/wp-content/uploads/2009/11/600px-Wikipedia-logo.svg_-150x150.png" alt="600px-Wikipedia-logo.svg" width="90" height="90" /></a></p>
<p><a href="http://en.wikipedia.org/wiki/Wiki">Wikis in general</a> (great lab management tool. Don&#8217;t need LIMS)</p>
<p><a href="http://www.earlham.edu/~peters/fos/bethesda.htm" target="_blank">Open Access Publishing</a> and <a href="http://creativecommons.org/" target="_blank">Creative Commons</a> licensing.</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/cc.logo_.circle.jpg"><img class="alignnone size-thumbnail wp-image-2776" title="cc.logo.circle" src="http://bytesizebio.net/wp-content/uploads/2009/11/cc.logo_.circle-149x150.jpg" alt="cc.logo.circle" width="89" height="90" /></a></p>
<p><a href="http://www.opensource.org/" target="_blank">FLOSS licensing</a> (90% of the software I use, and 100% of what I write)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/opensource-logo.jpg"><img class="alignnone size-thumbnail wp-image-2777" title="opensource-logo" src="http://bytesizebio.net/wp-content/uploads/2009/11/opensource-logo-150x101.jpg" alt="opensource-logo" width="90" height="61" /></a></p>
<p>Science Bloggers (too numerous to link)</p>
<p>Science <a href="http://twitter.com/" target="_blank">tweeters</a> and <a href="http://friendfeed.com" target="_blank">FriendFeeders</a> (too numerous to link. That&#8217;s how I keep up with things)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/Facebook+Friendfeed-VS-Twitter.png"><img class="alignnone size-thumbnail wp-image-2778" title="Facebook+Friendfeed-VS-Twitter" src="http://bytesizebio.net/wp-content/uploads/2009/11/Facebook+Friendfeed-VS-Twitter-150x141.png" alt="Facebook+Friendfeed-VS-Twitter" width="90" height="85" /></a></p>
<p><a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi" target="_blank">BLAST</a> (Sometimes it feels like bioinformatics should be renamed to blastology)</p>
<p><a href="http://www.latex-project.org/" target="_blank">LaTeX</a> (Wrote my dissertation in LaTeX, and never looked back)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/latex_lion.gif"><img class="alignnone size-thumbnail wp-image-2779" title="latex_lion" src="http://bytesizebio.net/wp-content/uploads/2009/11/latex_lion-145x150.gif" alt="latex_lion" width="71" height="74" /></a></p>
<p><a href="http://openoffice.org" target="_blank">OpenOffice.org</a> (because not everyone uses LaTeX).</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/OpenOfficeLogo.jpg"><img class="alignnone size-thumbnail wp-image-2780" title="OpenOfficeLogo" src="http://bytesizebio.net/wp-content/uploads/2009/11/OpenOfficeLogo-150x129.jpg" alt="OpenOfficeLogo" width="90" height="77" /></a></p>
<p><a href="http://www.citeulike.org/" target="_blank">CiteULike</a> (Keeping my reference library up to date and in good order)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/Citeulike_logo.png"><img class="alignnone size-thumbnail wp-image-2781" title="Citeulike_logo" src="http://bytesizebio.net/wp-content/uploads/2009/11/Citeulike_logo-150x37.png" alt="Citeulike_logo" width="150" height="37" /></a></p>
<p><a href="http://delicious.com/" target="_blank">Delicious</a> (Keeping my bookmarks up to date and in good order)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/delicious_logo.gif"><img class="alignnone size-thumbnail wp-image-2782" title="delicious_logo" src="http://bytesizebio.net/wp-content/uploads/2009/11/delicious_logo-150x150.gif" alt="delicious_logo" width="43" height="43" /></a></p>
<p>Gmail (because finding that document you sent me a month ago would be impossible otherwise)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/super-gmail-logo.png"><img class="alignnone size-thumbnail wp-image-2787" title="super-gmail-logo" src="http://bytesizebio.net/wp-content/uploads/2009/11/super-gmail-logo-150x111.png" alt="super-gmail-logo" width="72" height="54" /></a></p>
<p><a href="http://scholar.google.com/" target="_blank">Google Scholar</a> (For standing on the toes of Hobbits. Or something like that)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/mainG.png"><img class="alignnone size-thumbnail wp-image-2788" title="mainG" src="http://bytesizebio.net/wp-content/uploads/2009/11/mainG-150x84.png" alt="mainG" width="150" height="84" /></a></p>
<p><a href="http://images.google.com/images?client=firefox-a&amp;rls=com.ubuntu:en-US:unofficial&amp;um=1&amp;q=turkey+bird&amp;start=0" target="_blank">GIS</a> (for blogging and making class slides)</p>
<p><a href="http://www.vim.org" target="_blank">Vim</a> (because emacs blows)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/vim-editor_logo.png"><img class="alignnone size-thumbnail wp-image-2792" title="vim-editor_logo" src="http://bytesizebio.net/wp-content/uploads/2009/11/vim-editor_logo-150x150.png" alt="vim-editor_logo" width="54" height="54" /></a></p>
<p><a href="http://python.org" target="_blank">Python</a> (ease &amp; power)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/python_logo_without_textsvg.png"><img class="alignnone size-thumbnail wp-image-2783" title="python_logo_without_textsvg" src="http://bytesizebio.net/wp-content/uploads/2009/11/python_logo_without_textsvg-150x150.png" alt="python_logo_without_textsvg" width="54" height="54" /></a></p>
<p><a href="http://biopython.org">Biopython</a> (OK, conflict of interest here, since I contributed a bit)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/biopython.jpg"><img class="alignnone size-thumbnail wp-image-2784" title="biopython" src="http://bytesizebio.net/wp-content/uploads/2009/11/biopython-150x42.jpg" alt="biopython" width="150" height="42" /></a></p>
<p><a href="http://www.cas.muohio.edu/micro/people/">Friendly colleagues</a> (They certainly are!)</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/umured7.png"><img class="alignnone size-thumbnail wp-image-2785" title="umured7" src="http://bytesizebio.net/wp-content/uploads/2009/11/umured7-150x43.png" alt="umured7" width="150" height="43" /></a></p>
<p>Good students (gotta make my lab page).</p>
<p>Goulash for dinner. Can&#8217;t stand oven turkey.</p>
<p><a href="http://bytesizebio.net/wp-content/uploads/2009/11/turkey.jpg"><img class="alignnone size-thumbnail wp-image-2786" title="turkey" src="http://bytesizebio.net/wp-content/uploads/2009/11/turkey-103x150.jpg" alt="turkey" width="103" height="150" /></a></p>
<p>Music. Especially the latest song that is going around in my head:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="445" height="364" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/4jXgpRdL2-U&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x234900&amp;color2=0x4e9e00&amp;border=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="445" height="364" src="http://www.youtube.com/v/4jXgpRdL2-U&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x234900&amp;color2=0x4e9e00&amp;border=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://bytesizebio.net/index.php/2009/11/26/thankful-for/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

