Monday, June 6, 2011

A Better Prisoner's Dilemma Simulation?

In my previous post I had started work on a python program named generations to try to replicate the simulations of Robert Axelrod comparing differing Prisoner's Dilemma strategies. At the time I was simulating single organisms ('Critters') with different strategies accumulating food over a number of iterations, with the critter holding the most food at the end judged the winner.

The result of this simple simulation was that two strategies - Grudger and Tit-for-tat - were continually vying for the top spot, with Grudger taking the mantle more often than not. In Axelrod's tournament Tit-for-Tat was the unambiguous winner, so I knew my generations script needed some work.

(Read my previous post for an introduction to Prisoner's Dilemma, why it interests me and what the various strategies were.)
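For readers who don't want to click through, the contending strategies can be sketched as simple functions of the opponent's move history. This is my own illustrative rendering of the standard definitions, not the actual code from the generations script:

```python
import random

# Each strategy sees the opponent's past moves ('C' = cooperate,
# 'D' = defect) and returns its own next move.

def tit_for_tat(opponent_history):
    # Cooperate first; thereafter copy the opponent's last move.
    return opponent_history[-1] if opponent_history else 'C'

def grudger(opponent_history):
    # Cooperate until the opponent defects even once, then defect forever.
    return 'D' if 'D' in opponent_history else 'C'

def cheater(opponent_history):
    # Always defect.
    return 'D'

def sucker(opponent_history):
    # Always cooperate.
    return 'C'

def random_strategy(opponent_history):
    # Cooperate or defect with equal probability.
    return random.choice(['C', 'D'])
```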

I've been developing the simulation since. Technically it is a bit of a mess, but when I next refactor I will be focusing on new parts of the model aside from PD and so I wanted to deliver this conclusion to Prisoner's Dilemma now.

In this post I want to talk about how I changed the simulation and what effect these changes had. For the impatient - yes, I have managed to tweak the simulation such that Tit-for-tat is the unambiguously best strategy. Many critters had to die to make it so...

Go forth and multiply

In the beginning my simulation only ran the same 5 organisms against each other, with each accumulating food indefinitely but at different rates as their strategies proved more or less successful. I specifically wanted to uncover how differing frequencies of strategies influenced their own success or that of others, so I had to break free of this fixed model.

To this end I introduced reproduction into the simulation, shifting its focus from measuring accumulated food to measuring reproductive success. My process for controlling reproduction was simple: when a critter accumulated enough food to pay an arbitrary cost of reproduction, another critter of the same strategy was created, and the cost deducted from the parent.
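The reproduction rule is a simple threshold check. A minimal sketch (the cost value and class fields here are my illustration, not the script's actual names):

```python
REPRODUCTION_COST = 50  # arbitrary cost of reproduction (illustrative value)

class Critter:
    def __init__(self, strategy, food=0):
        self.strategy = strategy
        self.food = food

def maybe_reproduce(parent):
    # Once a critter has accumulated enough food to pay the cost,
    # create an offspring of the same strategy and deduct the cost.
    if parent.food >= REPRODUCTION_COST:
        parent.food -= REPRODUCTION_COST
        return Critter(parent.strategy)
    return None
```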

This change meant that any critter would eventually reproduce - and reproduce multiple times - if it was even marginally successful. More successful strategies would accumulate food faster and thus reproduce faster. Since the offspring could eventually reproduce too, this change led to exploding growth in the populations being tracked.

At the end of the simulation I could now see the success of the strategies: A count of the number of individuals with each strategy and their combined amount of food (which would become an increasingly irrelevant metric).

Adding reproduction and the resulting exploding population immediately exposed the computational infeasibility of running the simulation over a large number of iterations in its current form. Each additional critter had to interact with every other, a computational task that grows quadratically with the population. Iteration after iteration, the simulation slowed down.

The results at this stage were unsatisfactory. Over the iteration counts I was prepared to run the simulation for, huge variations were seen from execution to execution. In some executions the Grudgers became the most prolific. In others it was Tit-for-tat. The random, cheater and sucker strategies were competitive on some runs, but not in others.

The source of this variance in outcome was the fortunes of the random strategy and those it interacted with. At this stage the random strategy was the only non-deterministic part of the simulation. While growth was unrestricted, the number of random interactions (interactions involving the Random strategy) steadily increased, even as those interactions shrank as a proportion of the total.

Everyone Eats... Everybody Dies

To put the brakes on the explosive population growth I next added the concept of death to the simulation. A Critter would die if it ever had an amount of food lower than 0. In theory a Critter could be 'suckered' through a string of interactions into this position, but with the success of the more altruistic strategies, this wasn't very likely. To make success in interactions essential, I added an attritional element to the simulation: Critters, regardless of their success in interactions, expend a fixed amount of food each iteration.
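Together the two rules amount to a per-iteration upkeep-and-cull step, roughly like this (the upkeep value and field names are illustrative):

```python
UPKEEP = 1  # fixed food each critter expends per iteration (illustrative)

def apply_attrition(population):
    # Every critter pays a fixed upkeep each iteration; any critter
    # whose food drops below zero dies and is removed.
    survivors = []
    for critter in population:
        critter['food'] -= UPKEEP
        if critter['food'] >= 0:
            survivors.append(critter)
    return survivors
```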

These additions did not drastically change the outcomes of the simulation. Population growth was slowed, but not halted. Within a thousand iterations there were a thousand critters, and every iteration added more critters than the one before. Reaching 1500 iterations took several minutes under these conditions.

The success of the random and cheater strategies declined noticeably, however. In the case of the random strategy, statistically assured runs of bad luck thinned their numbers. The cheater strategy could no longer afford to wait between successful cheats, and against the resurgent grudger and tit-for-tat strategies, success for a cheater most often came when encountering the offspring of these two strategies that had never met this individual cheater before.

A new form of Random

With the decline of the Random strategy in the population I wondered whether a tweak that made cooperation more likely than defection would prosper, or at least be reliably more successful than my initial equal-probability implementation. In a simulated world dominated by altruists that held a grudge, defecting once every two interactions was not performing well.

I changed the random strategy to allow the relative likelihood of cooperation to be weighted, with this weighting breeding true in successive generations. I then added to the simulation's initial state an additional random strategy that was 3 times more likely to cooperate than defect. This strategy was known as Random 3.
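A weighted coin flip is enough to express this. A sketch of the idea (the function name is mine):

```python
import random

def weighted_random_move(coop_weight, defect_weight=1):
    # Random 3 weights cooperation 3 against defection's 1,
    # i.e. it cooperates roughly 75% of the time.
    total = coop_weight + defect_weight
    return 'C' if random.random() < coop_weight / total else 'D'
```

With coop_weight=1 this reduces to the original 50/50 random strategy, so the same code covers both variants.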

To my surprise, the new cooperative random strategy fared no better in the simulation than the completely random strategy. I am still assessing why this might be the case. Grudgers would never forgive a defecting random, but a random strategy less likely to defect would be less likely to continually arouse punitive defections from the tit-for-tat strategy, which should have been a noticeable advantage.

Limited Interactions

Until this point every iteration provided an opportunity for every critter to interact with every other critter. Universal interaction presented some problems. Apart from contributing to the quadratically growing cost of the computation, the sheer predictability of each iteration's interactions - varying only with the addition of offspring - meant that altruistic strategies, once entrenched, gained an inevitable advantage that shielded every altruistic critter from risk. For example, if altruistic critters (i.e. critters with Tit-for-tat or Grudger strategies) made up 75% of the overall population, 75% of an individual altruistic critter's interactions were certain to end in cooperation, making the consequences of the remaining interactions immaterial to the critter's eventual reproductive success.

Furthermore, as a simulation of non-zero sum social interactions, this universal interaction on every iteration was not realistic. No analogous organism can continuously expand its capacity for interaction to all members of a population while that same population undergoes explosive growth.

Because of these issues I changed the available interactions for each critter each iteration: 5 interactions are chosen at random from amongst the possible pairings. Over multiple iterations and across the entire population, a given strategy will still interact with a representative mix of the strategies in the population, but on any given iteration an altruistic critter might interact only with non-altruistic strategies. The effect of this change was not noticeable apart from an improvement in the speed of the simulation.
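The pairing step can be sketched as a random draw from the set of possible combinations. Here I draw enough pairs for each critter to average around 5 interactions per iteration (the exact scheme in the script may differ):

```python
import itertools
import random

INTERACTIONS_PER_CRITTER = 5

def sample_interactions(population):
    # Draw a random subset of all possible pairings rather than
    # letting every critter meet every other critter.
    all_pairs = list(itertools.combinations(population, 2))
    # Each pair involves two critters, so halve the target count.
    target = INTERACTIONS_PER_CRITTER * len(population) // 2
    return random.sample(all_pairs, min(target, len(all_pairs)))
```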

The Constrained Environment

At this stage the Critter population within my simulation was growing without end, albeit at a slower rate, as countless Critters interacted and generated food out of... well, nothing. In any real world ecosystem, however, the environment places a constraint on the size of the population that lives off it. Apart from the need to achieve a minimum of success in the interaction stakes, my critters had an easy life of interacting and reproducing, and their success, especially for altruistic critters in an altruistically dominated world, did not come at the expense of any other critter.

I wanted to place a population cap on my Critters - one that would ideally not sacrifice successful Critters from current and future generations in order to be maintained, and that would exert a selection pressure on the population analogous to the selection pressures in the real world.

To this end I decided not to directly enforce a population cap, but rather, enforce a hard limit to the amount of food available each iteration. With this limit in place, once interactions within an iteration had exhausted the environmental supply of food, no further interactions (and no additions or subtractions of food) were possible. I had no basis to prefer any particular strategy to always appear at the head of the line however, so I was careful to again randomise the interactions which were being considered.
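A sketch of the iteration loop under this food limit (the payoff here is a stand-in constant; the real payoffs depend on the moves played):

```python
import random

def run_iteration(pairings, food_limit, payoff=2):
    # Process interactions in random order so no strategy is
    # systematically first in line; once the iteration's food
    # supply is exhausted, remaining interactions are skipped.
    random.shuffle(pairings)
    food_remaining = food_limit
    processed = []
    for pair in pairings:
        if food_remaining < payoff:
            break
        food_remaining -= payoff
        processed.append(pair)
    return processed
```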

Implementing this constrained environment had a dramatic effect. Firstly, the endless growth was over. For a particular environmental food limit there would be a settled population that never varied by more than 10 or 20 Critters either side. Suddenly running simulations across much larger numbers of iterations was possible. The effect on the outcome was startling as well: in simulations of 10,000 iterations, the cheater and random strategies died out completely early on. The sucker strategy performed well for a time before dwindling and dying out. As before, both guardedly altruistic strategies, grudger and tit-for-tat, performed well out of the gate, but over time the grudger strategy fell by the wayside, leaving tit-for-tat to take the field - just like in the 1980s Axelrod tournament I was trying to replicate.

Enabling Comebacks

That might have been the end of the story, but I wanted to ensure strategies which went extinct early in the simulation wouldn't have fared better in the late game. For example, would a cheater strategy have more success against a field dominated only by Tit-for-tat critters, as opposed to the tit-for-tat/grudger duo seen in the early game? Would a cooperatively slanted Random strategy do better without the Cheater strategy? Could any strategy upset an entrenched tit-for-tat population?

Apart from answering these questions, there was also the matter of realism. The history of evolution shows that, across evolutionary timespans, many organisms retreat to old strategies and ways of life when the environment changes.

I could have simply monitored the environment for strategies that had completely died off and reinjected candidate critters from those strategies at some future time. This seemed overly artificial to me, however, so I instead engineered a constant (and yet small - 1%) chance of mutation in any Critter's offspring. When this mutation happened, a Critter with any strategy could emerge. This ensured that no matter which strategy or strategies were pre-eminent, a 'background noise' of alternative strategies was always waiting in the wings to take advantage of a beneficial change in the environment. (Whether an environment is beneficial depends solely - in this simulation at least - on which other strategies are present in it.)
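The mutation rule itself is tiny. A sketch (the names and the strategy list are illustrative):

```python
import random

MUTATION_RATE = 0.01  # 1% chance an offspring mutates
STRATEGIES = ['cheater', 'sucker', 'random', 'tit_for_tat', 'grudger']

def offspring_strategy(parent_strategy):
    # Offspring usually inherit the parent's strategy, but with a
    # small chance they are born with any strategy at all, keeping
    # a 'background noise' of alternatives in the population.
    if random.random() < MUTATION_RATE:
        return random.choice(STRATEGIES)
    return parent_strategy
```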

There was no amazing change in outcome with this facility for mutation in place. The 'loser strategies' (the randoms, cheaters and later, suckers and grudgers) would invariably go extinct, reappear in small numbers, and perhaps go extinct again before reappearing.

Try it yourself

If I have been vague about specific numbers in this description (such as the amount of food consumed per iteration, the exact payoffs for cheating, defection and cooperation, etc.) it is because I have been acutely aware of how arbitrary some of my costs and allowances have been. Gross changes to these arbitrary numbers will change the economics of strategy success.

The way these economic constants affect the simulations is interesting and worthy of experimentation.
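As a concrete reference point, the payoffs in Axelrod's tournament satisfied temptation > reward > punishment > sucker's payoff (and 2R > T + S). Experimenting with the economics means varying a table like this - the names and values below are the canonical ones, not necessarily the script's:

```python
# Canonical Prisoner's Dilemma payoffs (Axelrod's values):
# TEMPTATION > REWARD > PUNISHMENT > SUCKER, and 2*REWARD > TEMPTATION + SUCKER.
TEMPTATION = 5  # you defect, they cooperate
REWARD = 3      # both cooperate
PUNISHMENT = 1  # both defect
SUCKER = 0      # you cooperate, they defect

def payoffs(my_move, their_move):
    # Returns (my food gained, their food gained) for one interaction.
    table = {
        ('C', 'C'): (REWARD, REWARD),
        ('C', 'D'): (SUCKER, TEMPTATION),
        ('D', 'C'): (TEMPTATION, SUCKER),
        ('D', 'D'): (PUNISHMENT, PUNISHMENT),
    }
    return table[(my_move, their_move)]
```

Doubling the payoff for cheating amounts to doubling TEMPTATION here.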

Question: If the payoff for cheating is doubled, what effect does this have on the population?

Answer: Cheaters lead early before the inevitable population crash. Grudgers rise highest out of the critter carnage.

Question: Does an environment with a higher food-cap, supporting a higher population, have more room for alternative strategies on the fringes?

Answer: Only slightly.

If you want to find out for yourself, download my generations python script (use tag v1.5-PD for this version - as I mentioned in my previous blog generations will not track solely the Prisoner's Dilemma scenario).

The script should be easy to use - run the script without arguments to use default values.

  • The -i or --iterations option lets you specify the number of iterations run
  • The -t or --track option lets you specify an arbitrary critter to track (the initial critter names are c1, s1, r1, r2, t1 and g1 after the strategies. Their offspring are c1_1, c1_2 and c1_1_1, for example, named after their lineage and their birth order from their parent). 
  • The -r or --reporting option lets you specify the reporting interval
  • The -f or --filename option lets you specify a filepath to save a tab-separated datafile to for analysis in spreadsheet and charting programs.
  • Use -h or --help to get usage information.
For example:

Mimas:src bendavies$ ./ -i 10000 -f cheatpayoff.tsv -r 50

Runs the simulation for 10000 iterations, reports every 50 iterations and outputs the population data to cheatpayoff.tsv

I wouldn't pay too much attention to the technical implementation, since the code is somewhat between plugin, event and direct manipulation architectures at the moment.

What is next?

This exercise was interesting to me since it allowed me to observe emergent population trends based on individual strategies. Moreover, I am keen to leave the confines of the prisoner's dilemma simulation and try simulating other behavioural strategies under evolutionary processes. Some questions I wouldn't mind pursuing would be:

  • Can I evolve cooperative strategies from scratch against a selfish population?
  • What factors influence parental investment?
  • How might embryological complexity weigh in reproduction strategies?
  • Can I define a critter genome that is flexible enough to 'mutate slightly', light enough to be computationally feasible and yet expressive enough to allow interesting emergent behaviours and qualities to emerge?
  • How would critters move, if given the choice, in an environment whose food resources did not reset every iteration, but rather depleted? How would fragmentation of population affect interaction strategies?
  • Would it be possible to evolve critters from a common base to two or more very different lifestyles?
All of this will take time, and I need to take a break to do some work on My Web Brain and he3-appengine-lib. Still, I'm tempted to simply dive right in....

If you have any comments on this above exercise please let me know. This has been an absurdly long blog post, but I wanted to share my experience and take the opportunity to examine how this experiment played out.
