Shakespeare’s Birthday and Evolution

William Shakespeare was baptized April 26, 1564. His birthday is traditionally commemorated on April 23 (incidentally, that is also the date of his death, in 1616). One interesting connection between Shakespeare and evolution was made by Richard Dawkins in his book The Blind Watchmaker: I am talking about the Weasel program. Weasel is an elegant illustration of the process that drives evolution by natural selection: random variation in offspring, coupled with non-random selection. The connection to Shakespeare is in the quote from Hamlet: “Methinks it is like a weasel”. Achieving this exact sentence through random processes alone (a monkey hacking away at a typewriter)  would take and estimated 1040 generations. But achieving it through random mutations coupled with targeted selection makes for a much shorter number of generations: less than 100, usually.

If you are unfamiliar with the Weasel program, read the Wikipedia entry before reading on.

In honor of The Bard’s birthday, I have penned a short Python program which performs Weasel. You may need to tweak it for Windows (remove the first #! line which is a Linux thing). When running the program, the user selects 3 parameters: the target net result (in the form of a string), the number of offspring per generation, and the mutation rate.

Here is a demo run, with 300 offspring per generation, and a mutation rate of 0.05 per position:

 

   ./weasel.py "methinks it is like a weasel" 300 0.05
   1 `,<o?[}= g!yq[}?*v=.+ :y<<:g
   2 `,<o?[}= gjyq[}?*v=.m :y<<eg
   3 `,<o?[}= gjyq[ ?*v=.m :y<<eg
   4 `,<o?[}= ijyq[ ?*v=.m :y<<eg
   5 `,<o?[k= ijyq[ ?*v=.m :y<<eg
   6 `g<o?[k= ijyq[ l*v=.m :y<<eg
   7 `g<o?[k= ijyq[ l*ve.m :y<<eg
   8 `g<o?[k= ij q[ l*ve.m :y<<eg
   9 `v<o?[k= ij qs l*ve.m :p<<eg
  10 ev<o?[k= ij qs l*ve.v :paieg
  11 ev<o?[k( ij qs l*ke.v :paieg
  12 ev<o?[k( it qs l*ke.v :paieg
  13 ev<o?[k( it qs l_ke.v wpaieg
  14 ev<oi[k( it qs l_ke.v wpaieg
  15 ev<oi[k( it qs l_ke.a wpaieg
  16 ev<oi[k( it [s l_ke.a wpaiel
  17 ev<oi[k( it is l_ke.a wpaiel
  18 ev<oi[k( it is l_ke.a wea*el
  19 ev<oi[k( it is l_ke.a wea*el
  20 mv<oi[k( it is l_ke.a wea*el
  21 mv<oi[k( it is llke\a weasel
  22 mv<,i[k( it is llke\a weasel
  23 mv<hi[k( it is llke\a weasel
  24 mvzhi[k( it is like\a weasel
  25 mvzhi[k( it is like\a weasel
  26 mv(hi[k( it is like\a weasel
  27 mv(hi[ks it is like\a weasel
  28 mv(hi[ks it is like\a weasel
  29 mv(hi[ks it is like\a weasel
  30 mvthi[ks it is like&a weasel
  31 mvthi[ks it is like&a weasel
  32 mvthimks it is like&a weasel
  33 mvthimks it is like&a weasel
  34 mvthimks it is like&a weasel
  35 mvthimks it is like&a weasel
  36 mvthimks it is like&a weasel
  37 mxthimks it is like&a weasel
  38 mxthimks it is like&a weasel
  39 mxthimks it is like&a weasel
  40 mxthimks it is like&a weasel
  41 mxthimks it is like&a weasel
  42 mxthimks it is like&a weasel
  43 mxthimks it is like a weasel
  44 mxthimks it is like a weasel
  45 mxthilks it is like a weasel
  46 mxthilks it is like a weasel
  47 mxthilks it is like a weasel
  48 mxthinks it is like a weasel
  49 mxthinks it is like a weasel
  50 mxthinks it is like a weasel
  51 mxthinks it is like a weasel
  52 mxthinks it is like a weasel
  53 mxthinks it is like a weasel
  54 mxthinks it is like a weasel
  55 mxthinks it is like a weasel
  56 mxthinks it is like a weasel
  57 mxthinks it is like a weasel
  58 methinks it is like a weasel

And here is the Python code:

#!/usr/bin/python
import string
import random
import sys
import copy
# Copyright(C) 2011 Iddo Friedberg
# Released under Biopython license. http://www.biopython.org/DIST/LICENSE
# Do not remove this comment

ALLCHARS = string.lowercase+' '+string.punctuation
def loopweasel(target_string,n_offspring,mut_rate):
    i = 1
    target_string = target_string.lower()
    current_string = list(''.join(random.choice(ALLCHARS)
                        for i in range(len(target_string))))
    print "    %s" % target_string
    while target_string != ''.join(current_string):
        print "%4d %s" % (i,''.join(current_string))
        i += 1
        offsprings = create_offspring(current_string,
                                    n_offspring,mut_rate)
        current_string = evolve_string(offsprings, target_string)
    print "%4d %s" % (i,''.join(current_string))

def create_offspring(current_string,n_offspring,mut_rate):
    offspring_list = []
    for i in (range(n_offspring)):
        offspring = []
        for c in current_string:
            if random.random() < mut_rate:
                offspring.append(random.choice(ALLCHARS))
            else:
                offspring.append(c)
        offspring_list.append(offspring)
    return offspring_list

def diffseq(a,b):
    diffcount = 0
    for i,j in zip(a,b):
        if i != j:
            diffcount += 1
    return diffcount

def evolve_string(offspring_list, target_string):
    best_match = (2000,'')
    for offspring in offspring_list:
        diffscore = diffseq(offspring, target_string)
        if diffscore < best_match[0]:
            best_match = (diffscore,offspring)
    return best_match[1]

if __name__ == '__main__':
    if len(sys.argv) < 4:
        print "Usage: weasel target_string n_offspring mutation_rate"
        print
        print "target_string: the string you would eventually evolve into"
        print "n_offspring: number of offspring per generation"
        print "mutation_rate rate of mutation per position, 0=< m <1"
        sys.exit(1)
    target_string = sys.argv[1]
    n_offspring = int(sys.argv[2])
    mut_rate = float(sys.argv[3])
    for i in target_string:
        if i not in ALLCHARS:
            print "Error, string can only contain %s" % ALLCHARS
            sys.exit(1)
    if mut_rate >= 1.0 or mut_rate < 0:
        print "Error: 0 =< mutation rate < 1"
        sys.exit(1)
    loopweasel(target_string,n_offspring, mut_rate)

A quick guide: “loopweasel” (line 11) is the main bit that loops through generations. In each loop iteration it first calls “create_offspring” (line 25) which creates a list with the “offspring” strings (300 in the above example). Mutations are inserted (or not) in line 30. Control returns to loopweasel, which calls evolve_string (line 44). All offspring strings are score to the target (“methinks it is like a weasel”) string, using the “diffseq” code (line 37). (Is there a built-in way in Python for doing that? I could not find any.) The best scoring match is not he chosen offspring. It is printed, and the loop is repeated. Repeat until best-fitted offspring matches the target string.

It is actually quite fun to play with different mutation rates and offspring / generation numbers. Try it. Then rent a good Shakespeare movie. As a horror fan, I’m going to watch Titus tonight.

Titus Andronicus. Source: Wikipedia

Share and Enjoy:
  • Fark
  • Digg
  • Technorati
  • del.icio.us
  • StumbleUpon
  • Facebook
  • Reddit
  • Twitter
  • FriendFeed
  • PDF
  • email
  • Print
  • Google Bookmarks

10 Responses to “Shakespeare’s Birthday and Evolution”

  1. Nick says:

    Wow, so I’m pretty glad I found out about the Weasel program. I thought it was an amazing concept. So amazing, in fact, that I wrote my own in Java. Thought I might as well share it here: http://write.fm/weasel

    In the current version, instead of having a constant mutation probability per character, it mutates a pre-determined amount of characters per generation. I should probably change that to make it more biology-y.

    Also, it has an adaptive character set. That means it only uses the characters present in the destination string. So it won’t waste time with punctuation or uppercase letters if they aren’t used in the destination string.

  2. What is the fitness function, i.e. the reward for a correct letter?
    Are the correct letters locked in once found?

  3. Iddo says:

    @Bjørn

    The fitness function is in lines 37-42. It is the Hamming distance between “methinks it is like a weasel” and any of the current offspring character strings.

    The correct letters are not locked in. In fact, if you increase the mutation rate enough, you will find you go through many more generations (or until you give up and hit ctrl-C) to get to “methinks it is like a weasel”.

  4. Okay, got it.

    In other words, you choose the one most fit individual each generation to have 300 offspring, so there is no drift.

    If you use Wright-Fisher selection you’ll have drift due to the stochastic selection, which will make the simulation more realistic, but will take much longer to find the target. How large population sizes can you handle?

  5. Iddo says:

    “In other words, you choose the one most fit individual each generation to have 300 offspring, so there is no drift.”

    Well, not me. The weasel program was written by Dawkins, this is only a Python implementation.

    Also, the Weasel program was never intended to model evolution accurately. “[It] was only meant to demonstrate the power of cumulative selection as compared to random selection, and show the complete unrealism of the popular notion of natural selection as ‘monkeys pounding on typewriters’. ” (Wikipedia

    That said, it would be interesting to jack up this program and add some genetic drift model. I might do it some time, or are there any takers?

  6. Actually, I made one a couple of years ago that can use either of these two methods of selection.

    With the same params as you (N=300, mu=0.05, and 100 replicates), I get:
    Winner-takes-all: mean updates to target = 39.98, stdev = 6.5056
    Wright-Fisher does not converge with those parameters. The mutation rate is too high, so the population suffers from mutational meltdown (hovers around 17 correct out of 28 chars).

    But with mu = 0.01 I get this:
    Winner-takes-all: mean updates to target = 58.26, stdev = 14.833
    Wright-Fisher: mean = 705.73, stdev = 141.99

    Fun!

  7. Update: Actually, I just re-read my code, and it turns out I used a fitness function that was 1.2^hits, where hits is the number of correct chars. That function goes up faster than linear, which makes the individuals fitter than just the number of hits when hits>15.

    If I use fitness = hits, winner-takes-all, N=300, mu=0.05, I get mean = 41.95, std = 8.807. Not much different.

  8. There is a one-line Hamming distance function in Python:
    a=”string of 12″
    b=”-fling of –”
    print sum([i!=j for i,j in zip(a,b)])

    This will work with either strings or lists—anything zippable.

  9. Iddo says:

    @gasstationwithoutpumps
    Ha! Nice one-liner. Thanks Kevin!

  10. eryksun says:

    @gasstationwithoutpumps,

    You don’t need a list. sum will work with a generator. You could also handle sequences of different length:

    >>> a = “string of 12″
    >>> b = “-fling of –”
    >>> c = “-fling of –0123456789″
    >>> print(abs(len(a) – len(b)) + sum(i != j for i, j in zip(a, b)))
    5
    >>> print(abs(len(a) – len(c)) + sum(i != j for i, j in zip(a, c)))
    15