Mathematics of Game Design: Probability and Statistics by Mendel Schmiedekamp
Probability and statistics are two approaches to modeling real systems. Probability provides a prescriptive model: it builds a simple functional system that stands in for the real one. Statistics provides descriptive models: it takes existing systems and extracts their features and properties. In holistic game design, generating setting out of system uses probability to create a prescriptive model for the setting's properties, while generating system out of setting uses statistics to extract properties of the setting for use in generating the system.
Probability is a study of the occurrence of events. In a system, the chance of a given event is determined by a probability function, P(E), which associates a number between 0 and 1 to each event, E, in the system. This number is the chance or certainty of the event occurring.
For example, in the system defined by adding together the results of a 6-sided and a 10-sided die, P(4) = 1/20, while P(7) = 1/10.
Since setting the probability of each event directly can be very difficult, a variety of rules exist to derive the probabilities of complex events from simpler ones.
The additive rule applies to parallel events which are combined for the purposes of some larger event. For example, the event of the dice mechanic above rolling a total of 4 is actually made up of three events, (1,3), (2,2), and (3,1), where the first number is the 6-sided die and the second is the 10-sided one. The additive rule states that the probability of any one of a set of distinct (mutually exclusive) events occurring is the sum of their respective probabilities. In this case each of these events has probability 1/60, so the total probability is 3/60 = 1/20.
The multiplicative rule applies to sequential events which are combined to generate a single resulting event. For example, the event of rolling (1,3) is actually two events in sequence: rolling a 1 on the 6-sided die, and rolling a 3 on the 10-sided die. The multiplicative rule states that the probability of the resulting event is the product of the probabilities of the sequential events which result in it. We assume the dice are fair (each side has equal probability), so the probability of rolling a 1 on a 6-sided die is 1/6 and the probability of rolling a 3 on a 10-sided die is 1/10. As a result the probability of rolling (1,3) is 1/6 * 1/10 = 1/60.
Using just these two rules we can determine the probability of each value generated by summing the 6-sided and 10-sided dice.
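As a minimal sketch of how the two rules combine, the Python below builds the whole distribution for the 6-sided plus 10-sided mechanic. The names p_d6, p_d10 and sum_distribution are illustrative choices, not anything from the article, and exact fractions are used so the results can be compared with the values quoted above.

```python
from collections import defaultdict
from fractions import Fraction

# Fair dice: every face of a die is equally likely.
p_d6 = {face: Fraction(1, 6) for face in range(1, 7)}
p_d10 = {face: Fraction(1, 10) for face in range(1, 11)}


def sum_distribution():
    """Return {total: probability} for the 6-sided plus 10-sided roll."""
    dist = defaultdict(Fraction)  # missing totals start at probability 0
    for a, pa in p_d6.items():
        for b, pb in p_d10.items():
            # Multiplicative rule: the pair (a, b) has probability pa * pb.
            # Additive rule: distinct pairs with the same total are summed.
            dist[a + b] += pa * pb
    return dict(dist)


dist = sum_distribution()
print(dist[4])  # 1/20
print(dist[7])  # 1/10
```

The probabilities over all totals from 2 to 16 sum to 1, which is a quick sanity check on any distribution built this way.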
A third way to combine probabilities is to consider the probability of some event given another event. This is written P(E|F), the probability of event E given event F. For example, the probability of rolling a 4 on the 6-sided die given that a match (both dice showing the same number) has been rolled is 1/6, because there are 6 equally likely matches and one of them is (4,4).
Independence of events is a very important concept in probability. Two events are independent if and only if the probability of both events occurring is equal to the product of the individual probabilities of the events:
Independence Condition: P(E and F) = P(E)P(F)
This condition is also a formula that can be used when two events are added to a model as independent events. Conceptually this means that these two events do not interact or interfere, and so each is determined independently.
For example, in the above mechanic, the events of getting a match (both dice showing the same number) and getting a 4 on the 6-sided die are independent. The probability of both occurring is 1/60, which is equal to the 1/10 probability of rolling a match times the 1/6 probability of rolling a 4 on the 6-sided die. However, the events of rolling a match and getting a 4 on the 10-sided die are not independent. The probability of both occurring is also 1/60, and the probability of rolling a match is still 1/10, while the probability of rolling a 4 on the 10-sided die is also 1/10, but 1/10 * 1/10 = 1/100 does not equal 1/60. Intuitively these are not independent because the value of the 10-sided die does affect the probability of a match: in particular, if the 10-sided die rolls above a 6 then no match can occur.
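The independence condition can be checked mechanically by counting outcomes. The sketch below assumes fair dice and represents each event as a predicate on a (6-sided, 10-sided) pair; prob and independent are hypothetical helper names introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 60 equally likely (d6, d10) outcomes.
outcomes = list(product(range(1, 7), range(1, 11)))


def prob(event):
    """Probability of an event given as a predicate on (d6, d10)."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))


def independent(e, f):
    """Check the condition P(E and F) = P(E)P(F)."""
    both = prob(lambda o: e(o) and f(o))
    return both == prob(e) * prob(f)


def match(o):
    return o[0] == o[1]


def four_on_d6(o):
    return o[0] == 4


def four_on_d10(o):
    return o[1] == 4


print(independent(match, four_on_d6))   # True
print(independent(match, four_on_d10))  # False
```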
With independence in mind, there are several formulas which can be used to calculate the probability of combined events, whether or not the events are independent:
P(E or F) = P(E) + P(F) - P(E and F)
For example, with the above mechanic the chance of getting a total of 4 is 1/20 and the chance of getting a match is 1/10, but the chance of either is not 1/10 + 1/20 = 3/20, because each of these probabilities counts the event (2,2). So we subtract the double-counted event's probability, 1/60, from the sum and get 3/20 - 1/60 = 2/15.
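One way to see the double counting is to treat events as sets of outcomes, so that "or" becomes set union and "and" becomes set intersection. This is only a sketch under the fair-dice assumption; the set names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 60 equally likely (d6, d10) outcomes, as a set.
outcomes = set(product(range(1, 7), range(1, 11)))


def prob(event):
    """Probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(outcomes))


total_is_4 = {o for o in outcomes if sum(o) == 4}
match = {o for o in outcomes if o[0] == o[1]}

# Counting the union directly sees (2,2) only once.
direct = prob(total_is_4 | match)
# The formula adds both probabilities, then removes the overlap.
by_formula = prob(total_is_4) + prob(match) - prob(total_is_4 & match)

print(direct, by_formula)  # 2/15 2/15
```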
P(E|F) = P(E and F) / P(F)
For example, with the above mechanic the probability of getting a match and a 4 on the 6-sided die is 1/60, and the probability of getting a match is 1/10. Hence the probability of getting a 4 on the 6-sided die if a match is rolled is (1/60) / (1/10) = 1/6.
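Conditioning can also be read as shrinking the set of possible outcomes to those where F happened and then counting inside it. The sketch below, again assuming fair dice and using illustrative variable names, shows that this counting view agrees with the formula.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), range(1, 11)))  # (d6, d10) pairs

# F: a match was rolled. E and F: a match with a 4 on the 6-sided die.
matches = [o for o in outcomes if o[0] == o[1]]
match_and_four = [o for o in matches if o[0] == 4]

# Counting view: only the outcomes in F remain possible.
by_counting = Fraction(len(match_and_four), len(matches))

# Formula view: P(E|F) = P(E and F) / P(F).
p_f = Fraction(len(matches), len(outcomes))
p_e_and_f = Fraction(len(match_and_four), len(outcomes))
by_formula = p_e_and_f / p_f

print(by_counting, by_formula)  # 1/6 1/6
```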
The central feature of statistics is a distribution. This is a list of data, which can be thought of as events and the number of instances of each event. The following is a distribution from a session of the game Exalted. These are unmodified dice pool sizes and the number of times a pool of this size was used in the session.
Mean or Average of a Distribution
The most common statistical analysis is the mean of a distribution. This is the sum of the events, each weighted by its number of instances divided by the total number of instances. For the above distribution the mean is:
(2/116 * 1) + (7/116 * 2) + (13/116 * 3) + ... + (3/116 * 10) ≈ 6.207
The mean is often useful to determine the general bias of the distribution, and provides the expected value if you continue to record these events over time.
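The weighted sum can be sketched in a few lines of Python. The counts below are hypothetical values chosen to keep the arithmetic easy to follow; they are not the Exalted session data, and weighted_mean is an illustrative name.

```python
def weighted_mean(counts):
    """Mean of a distribution given as {value: number of instances}."""
    total = sum(counts.values())
    return sum(value * n for value, n in counts.items()) / total


# Hypothetical counts, not the session data from the article.
example_counts = {1: 1, 2: 3, 3: 4, 4: 2}

# (1*1 + 2*3 + 3*4 + 4*2) / 10 = 27 / 10
print(weighted_mean(example_counts))  # 2.7
```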
Median or Middle of a Distribution
A more useful analysis of a distribution is to find the value which lies in the middle of the distribution. This ensures that very high or very low values don't become too important, a significant problem in using the mean.
In the above example, the median is 7. This indicates that although there are low values in the distribution, the dice pools tend to center around 7, rather than 6.
Often the median provides a clearer understanding of the distribution, as it doesn't let very rare events, called outliers, contaminate the properties of the distribution as much.
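The effect of an outlier can be sketched by comparing the mean and median of a small distribution before and after one extreme value is added. The counts are hypothetical, not the session data, and this median returns the lower of the two middle values when the total is even, which is one common convention among several.

```python
def mean(counts):
    """Mean of a distribution given as {value: number of instances}."""
    total = sum(counts.values())
    return sum(value * n for value, n in counts.items()) / total


def median(counts):
    """Lower middle value of a {value: number of instances} distribution."""
    total = sum(counts.values())
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if 2 * running >= total:  # at least half the instances are covered
            return value


# Hypothetical counts, not the session data from the article.
typical = {1: 1, 2: 3, 3: 4, 4: 2}
with_outlier = {**typical, 40: 1}  # one rare, very large value

print(median(typical), median(with_outlier))        # 3 3
print(mean(typical), round(mean(with_outlier), 2))  # 2.7 6.09
```

A single instance of 40 more than doubles the mean, while the median does not move.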
Most Probable Value or Peak
This analysis is very different from both the median and the mean. Rather than weighing every event, it treats the distribution as the graph of a function and looks for the global maximum. This is done by searching for the largest number of instances. In the case of the above data this is 8.
This value can often be misleading, but it serves as one way to evaluate how well the data fits a Gaussian curve (a centrally peaked distribution), since closeness of the peak to the average is typically a property of Gaussian distributions. This can be useful because Gaussian distributions are easy to model as random systems.
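Finding the peak amounts to a single search for the largest count. The sketch below uses hypothetical counts rather than the session data, and peak is an illustrative name; when two values tie for the largest count, it returns whichever appears first in the dictionary.

```python
def peak(counts):
    """Value with the largest number of instances (first one wins a tie)."""
    return max(counts, key=counts.get)


# Hypothetical counts, not the session data from the article.
example_counts = {1: 1, 2: 3, 3: 4, 4: 2}

print(peak(example_counts))  # 3
```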
Local Maxima and Minima
Another approach is to treat the entire distribution as a curve and find the maxima (points where both neighbors have a lower number of instances) and minima (points where both neighbors have a higher number of instances). In the above distribution, if we ignore the endpoints, there are two maxima, 4 and 8, and one minimum, 6.
This analysis shows that there are two lobes to the data, one in the area below 6 and one in the area above 6. The first peak is shallower and wider, while the second peak is narrower and taller. This indicates why the mean lies between these two peaks.
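The neighbor comparison can be sketched as a single pass over the sorted values. The counts below are hypothetical and chosen to have two lobes; they are not the session data. The sketch ignores the endpoints, as the text does, and treats adjacent keys in sorted order as neighbors, so it assumes no value in the range is missing.

```python
def local_extrema(counts):
    """Return (maxima, minima) of a {value: instances} distribution."""
    values = sorted(counts)
    maxima, minima = [], []
    # Slide a window of three neighbors; endpoints are never the center.
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if counts[cur] > counts[prev] and counts[cur] > counts[nxt]:
            maxima.append(cur)
        elif counts[cur] < counts[prev] and counts[cur] < counts[nxt]:
            minima.append(cur)
    return maxima, minima


# Hypothetical two-lobed counts, not the session data from the article.
example_counts = {1: 2, 2: 5, 3: 3, 4: 1, 5: 4, 6: 7, 7: 2}

print(local_extrema(example_counts))  # ([2, 6], [4])
```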
Often a distribution does not come from a single simple random event, but from multiple random elements. Each element has its own simpler distribution, and these combine to form a more complex one.
One way this can occur is if two distributions are added together. The result is that the distribution will be large where either of the components is large. This final distribution is analogous to our example, which could be considered a combination of a shallow distribution around 4, and a narrow distribution around 8. This combination indicates that there are two different elements which determine the size of a die pool rolled in a game of Exalted, perhaps one in character creation, and one during play.
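A rough sketch of this kind of combination adds the instance counts of two simpler distributions value by value. Both components below are invented for illustration, one wide and shallow and one narrow and tall; they are not derived from the session data, and the names wide_low and narrow_high are hypothetical.

```python
from collections import Counter

# Hypothetical components, invented for illustration only.
wide_low = Counter({2: 2, 3: 4, 4: 5, 5: 4, 6: 2})  # shallow lobe around 4
narrow_high = Counter({6: 1, 7: 4, 8: 9, 9: 3})     # tall lobe around 8

# Adding Counters sums the instances recorded for each value.
combined = wide_low + narrow_high

# Crude text histogram: one '#' per instance.
for size in sorted(combined):
    print(size, "#" * combined[size])
```

The combined counts rise to a peak at 4, dip at 6 where both components are small, and rise again to a taller peak at 8.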
Next Month: Seeds of Enlightenment