Shakespeare’s Birthday and Evolution
William Shakespeare was baptized April 26, 1564. His birthday is traditionally commemorated on April 23 (incidentally, that is also the date of his death, in 1616). One interesting connection between Shakespeare and evolution was made by Richard Dawkins in his book The Blind Watchmaker: I am talking about the Weasel program. Weasel is an elegant illustration of the process that drives evolution by natural selection: random variation in offspring, coupled with non-random selection. The connection to Shakespeare is in the quote from Hamlet: “Methinks it is like a weasel”. Achieving this exact sentence through random processes alone (a monkey hacking away at a typewriter) would take and estimated 1040 generations. But achieving it through random mutations coupled with targeted selection makes for a much shorter number of generations: less than 100, usually.
If you are unfamiliar with the Weasel program, read the Wikipedia entry before reading on.
In honor of The Bard’s birthday, I have penned a short Python program which performs Weasel. You may need to tweak it for Windows (remove the first #! line which is a Linux thing). When running the program, the user selects 3 parameters: the target net result (in the form of a string), the number of offspring per generation, and the mutation rate.
Here is a demo run, with 300 offspring per generation, and a mutation rate of 0.05 per position:
./weasel.py "methinks it is like a weasel" 300 0.05 1 `,<o?[}= g!yq[}?*v=.+ :y<<:g 2 `,<o?[}= gjyq[}?*v=.m :y<<eg 3 `,<o?[}= gjyq[ ?*v=.m :y<<eg 4 `,<o?[}= ijyq[ ?*v=.m :y<<eg 5 `,<o?[k= ijyq[ ?*v=.m :y<<eg 6 `g<o?[k= ijyq[ l*v=.m :y<<eg 7 `g<o?[k= ijyq[ l*ve.m :y<<eg 8 `g<o?[k= ij q[ l*ve.m :y<<eg 9 `v<o?[k= ij qs l*ve.m :p<<eg 10 ev<o?[k= ij qs l*ve.v :paieg 11 ev<o?[k( ij qs l*ke.v :paieg 12 ev<o?[k( it qs l*ke.v :paieg 13 ev<o?[k( it qs l_ke.v wpaieg 14 ev<oi[k( it qs l_ke.v wpaieg 15 ev<oi[k( it qs l_ke.a wpaieg 16 ev<oi[k( it [s l_ke.a wpaiel 17 ev<oi[k( it is l_ke.a wpaiel 18 ev<oi[k( it is l_ke.a wea*el 19 ev<oi[k( it is l_ke.a wea*el 20 mv<oi[k( it is l_ke.a wea*el 21 mv<oi[k( it is llke\a weasel 22 mv<,i[k( it is llke\a weasel 23 mv<hi[k( it is llke\a weasel 24 mvzhi[k( it is like\a weasel 25 mvzhi[k( it is like\a weasel 26 mv(hi[k( it is like\a weasel 27 mv(hi[ks it is like\a weasel 28 mv(hi[ks it is like\a weasel 29 mv(hi[ks it is like\a weasel 30 mvthi[ks it is like&a weasel 31 mvthi[ks it is like&a weasel 32 mvthimks it is like&a weasel 33 mvthimks it is like&a weasel 34 mvthimks it is like&a weasel 35 mvthimks it is like&a weasel 36 mvthimks it is like&a weasel 37 mxthimks it is like&a weasel 38 mxthimks it is like&a weasel 39 mxthimks it is like&a weasel 40 mxthimks it is like&a weasel 41 mxthimks it is like&a weasel 42 mxthimks it is like&a weasel 43 mxthimks it is like a weasel 44 mxthimks it is like a weasel 45 mxthilks it is like a weasel 46 mxthilks it is like a weasel 47 mxthilks it is like a weasel 48 mxthinks it is like a weasel 49 mxthinks it is like a weasel 50 mxthinks it is like a weasel 51 mxthinks it is like a weasel 52 mxthinks it is like a weasel 53 mxthinks it is like a weasel 54 mxthinks it is like a weasel 55 mxthinks it is like a weasel 56 mxthinks it is like a weasel 57 mxthinks it is like a weasel 58 methinks it is like a weasel
And here is the Python code:
#!/usr/bin/python import string import random import sys import copy # Copyright(C) 2011 Iddo Friedberg # Released under Biopython license. http://www.biopython.org/DIST/LICENSE # Do not remove this comment ALLCHARS = string.lowercase+' '+string.punctuation def loopweasel(target_string,n_offspring,mut_rate): i = 1 target_string = target_string.lower() current_string = list(''.join(random.choice(ALLCHARS) for i in range(len(target_string)))) print " %s" % target_string while target_string != ''.join(current_string): print "%4d %s" % (i,''.join(current_string)) i += 1 offsprings = create_offspring(current_string, n_offspring,mut_rate) current_string = evolve_string(offsprings, target_string) print "%4d %s" % (i,''.join(current_string)) def create_offspring(current_string,n_offspring,mut_rate): offspring_list = [] for i in (range(n_offspring)): offspring = [] for c in current_string: if random.random() < mut_rate: offspring.append(random.choice(ALLCHARS)) else: offspring.append(c) offspring_list.append(offspring) return offspring_list def diffseq(a,b): diffcount = 0 for i,j in zip(a,b): if i != j: diffcount += 1 return diffcount def evolve_string(offspring_list, target_string): best_match = (2000,'') for offspring in offspring_list: diffscore = diffseq(offspring, target_string) if diffscore < best_match[0]: best_match = (diffscore,offspring) return best_match[1] if __name__ == '__main__': if len(sys.argv) < 4: print "Usage: weasel target_string n_offspring mutation_rate" print print "target_string: the string you would eventually evolve into" print "n_offspring: number of offspring per generation" print "mutation_rate rate of mutation per position, 0=< m <1" sys.exit(1) target_string = sys.argv[1] n_offspring = int(sys.argv[2]) mut_rate = float(sys.argv[3]) for i in target_string: if i not in ALLCHARS: print "Error, string can only contain %s" % ALLCHARS sys.exit(1) if mut_rate >= 1.0 or mut_rate < 0: print "Error: 0 =< mutation rate < 1" sys.exit(1) loopweasel(target_string,n_offspring, mut_rate)
A quick guide: “loopweasel” (line 11) is the main bit that loops through generations. In each loop iteration it first calls “create_offspring” (line 25) which creates a list with the “offspring” strings (300 in the above example). Mutations are inserted (or not) in line 30. Control returns to loopweasel, which calls evolve_string (line 44). All offspring strings are score to the target (“methinks it is like a weasel”) string, using the “diffseq” code (line 37). (Is there a built-in way in Python for doing that? I could not find any.) The best scoring match is not he chosen offspring. It is printed, and the loop is repeated. Repeat until best-fitted offspring matches the target string.
It is actually quite fun to play with different mutation rates and offspring / generation numbers. Try it. Then rent a good Shakespeare movie. As a horror fan, I’m going to watch Titus tonight.
Wow, so I’m pretty glad I found out about the Weasel program. I thought it was an amazing concept. So amazing, in fact, that I wrote my own in Java. Thought I might as well share it here: http://write.fm/weasel
In the current version, instead of having a constant mutation probability per character, it mutates a pre-determined amount of characters per generation. I should probably change that to make it more biology-y.
Also, it has an adaptive character set. That means it only uses the characters present in the destination string. So it won’t waste time with punctuation or uppercase letters if they aren’t used in the destination string.
What is the fitness function, i.e. the reward for a correct letter?
Are the correct letters locked in once found?
@Bjørn
The fitness function is in lines 37-42. It is the Hamming distance between “methinks it is like a weasel” and any of the current offspring character strings.
The correct letters are not locked in. In fact, if you increase the mutation rate enough, you will find you go through many more generations (or until you give up and hit ctrl-C) to get to “methinks it is like a weasel”.
Okay, got it.
In other words, you choose the one most fit individual each generation to have 300 offspring, so there is no drift.
If you use Wright-Fisher selection you’ll have drift due to the stochastic selection, which will make the simulation more realistic, but will take much longer to find the target. How large population sizes can you handle?
“In other words, you choose the one most fit individual each generation to have 300 offspring, so there is no drift.”
Well, not me. The weasel program was written by Dawkins, this is only a Python implementation.
Also, the Weasel program was never intended to model evolution accurately. “[It] was only meant to demonstrate the power of cumulative selection as compared to random selection, and show the complete unrealism of the popular notion of natural selection as ‘monkeys pounding on typewriters’. ” (Wikipedia
That said, it would be interesting to jack up this program and add some genetic drift model. I might do it some time, or are there any takers?
Actually, I made one a couple of years ago that can use either of these two methods of selection.
With the same params as you (N=300, mu=0.05, and 100 replicates), I get:
Winner-takes-all: mean updates to target = 39.98, stdev = 6.5056
Wright-Fisher does not converge with those parameters. The mutation rate is too high, so the population suffers from mutational meltdown (hovers around 17 correct out of 28 chars).
But with mu = 0.01 I get this:
Winner-takes-all: mean updates to target = 58.26, stdev = 14.833
Wright-Fisher: mean = 705.73, stdev = 141.99
Fun!
Update: Actually, I just re-read my code, and it turns out I used a fitness function that was 1.2^hits, where hits is the number of correct chars. That function goes up faster than linear, which makes the individuals fitter than just the number of hits when hits>15.
If I use fitness = hits, winner-takes-all, N=300, mu=0.05, I get mean = 41.95, std = 8.807. Not much different.
There is a one-line Hamming distance function in Python:
a=”string of 12″
b=”-fling of –”
print sum([i!=j for i,j in zip(a,b)])
This will work with either strings or lists—anything zippable.
@gasstationwithoutpumps
Ha! Nice one-liner. Thanks Kevin!
@gasstationwithoutpumps,
You don’t need a list. sum will work with a generator. You could also handle sequences of different length:
>>> a = “string of 12”
>>> b = “-fling of –”
>>> c = “-fling of –0123456789”
>>> print(abs(len(a) – len(b)) + sum(i != j for i, j in zip(a, b)))
5
>>> print(abs(len(a) – len(c)) + sum(i != j for i, j in zip(a, c)))
15