The prisoner's dilemma is a fundamental problem in game theory that demonstrates why two people might not cooperate even if it is in both their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence payoffs and gave it the "prisoner's dilemma" name (Poundstone, 1992).
A classic example of the prisoner's dilemma (PD) is presented as follows:
- Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated the prisoners, visit each of them to offer the same deal. If one testifies for the prosecution against the other (defects) and the other remains silent (cooperates), the defector goes free and the silent accomplice receives the full 10-year sentence. If both remain silent, both prisoners are sentenced to only six months in jail for a minor charge. If each betrays the other, each receives a five-year sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation. How should the prisoners act?
If we assume that each player cares only about minimizing his or her own time in jail, then the prisoner's dilemma forms a non-zero-sum game in which two players may each either cooperate with or defect from (betray) the other player. In this game, as in most game theory, the only concern of each individual player (prisoner) is maximizing his or her own payoff, without any concern for the other player's payoff. The unique equilibrium for this game is a Pareto-suboptimal solution, that is, rational choice leads the two players to both play defect, even though each player's individual reward would be greater if they both played cooperatively.
In the classic form of this game, cooperating is strictly dominated by defecting, so that the only possible equilibrium for the game is for all players to defect. No matter what the other player does, one player will always gain a greater payoff by playing defect. Since in any situation playing defect is more beneficial than cooperating, all rational players will play defect, all things being equal.
In the iterated prisoner's dilemma, the game is played repeatedly. Thus each player has an opportunity to punish the other player for previous non-cooperative play. If the number of steps is known by both players in advance, economic theory says that the two players should defect again and again, no matter how many times the game is played. However, this analysis fails to predict the behavior of human players in a real iterated prisoners dilemma situation, and it also fails to predict the optimum algorithm when computer programs play in a tournament. Only when the players play an indefinite or random number of times can cooperation be an equilibrium, technically a subgame perfect equilibrium meaning that both players defecting always remains an equilibrium and there are many other equilibrium outcomes. In this case, the incentive to defect can be overcome by the threat of punishment.
In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games, for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.
I found this to be fascinating. Yes, I'm a total geek.
No comments:
Post a Comment