Introduction to Discrete Probability Distributions
Discrete versus Continuous Variables
A discrete variable typically originates from a counting process, while a continuous variable usually comes from a measuring process. An easy way to tell them apart is that discrete variables usually take whole-number values, whereas continuous variables can take any value within a range and are often recorded with decimals. For instance, the number of people in a group is a discrete variable because it’s always a whole number, while a person’s weight is continuous since it can be measured to multiple decimal places.
The Probability Distribution for a Discrete Variable
A probability distribution for a discrete variable is simply a listing of all possible outcomes together with the probability associated with each outcome. Since the outcomes are mutually exclusive and cover every possibility, their probabilities must sum to 1. For example, if you’re flipping a coin once, there’s a 1 in 2 chance it will land on heads and a 1 in 2 chance it will land on tails; 1/2 + 1/2 = 1. In this way, measuring probability is similar to using percentages: percentages are measured out of 100, while probability is measured out of 1. This is true for all probability measurements.
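As a rough illustration of the sum-to-1 rule, the sketch below stores a distribution as a Python dictionary and checks it. The function name is_valid_distribution and the "broken" table are made up for this example; only the coin flip comes from the text above.

```python
from fractions import Fraction

def is_valid_distribution(dist):
    """Check that each probability lies in [0, 1] and that they sum to exactly 1."""
    probs = list(dist.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# The single coin flip from the text: outcome -> probability.
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# A deliberately broken table (illustrative only): the probabilities sum to 5/6.
broken = {"heads": Fraction(1, 2), "tails": Fraction(1, 3)}

print(is_valid_distribution(coin))
print(is_valid_distribution(broken))
```

Fractions are used here so the comparison with 1 is exact; with floating-point probabilities a small tolerance would be the safer check.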
Expected Value for a Discrete Variable
The expected value of a discrete variable is the mean of its probability distribution, which corresponds to the population mean. It is calculated by multiplying each possible outcome by its associated probability and then summing the products: E(X) = Σ x * P(X = x).
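The calculation can be sketched in a few lines of Python. The distribution used here (the number of heads in two fair coin flips) is an assumed example, not one taken from this post, and expected_value is a hypothetical helper name.

```python
def expected_value(dist):
    """Sum of each outcome multiplied by its probability."""
    return sum(x * p for x, p in dist.items())

# Assumed example: number of heads in two fair coin flips.
heads_in_two_flips = {0: 0.25, 1: 0.50, 2: 0.25}

# 0 * 0.25 + 1 * 0.50 + 2 * 0.25
mean = expected_value(heads_in_two_flips)
print(mean)
```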
Standard Deviation and Variance of a Discrete Variable
Standard deviation is a measure of how far values typically fall from the mean; it’s also often described as the spread of the distribution. Quantitatively, the standard deviation is the square root of the variance. Variance is a probability-weighted average of squared deviations from the mean, so it can never be negative, and the standard deviation, being its non-negative square root, can never be negative either. What can be positive or negative is the deviation of an individual data point, since each point can lie above or below the mean value. Standard deviation and variance as concepts are also discussed in an earlier post called Basics of Variability.
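A minimal sketch of these two quantities for a discrete distribution follows. The two-coin-flip distribution and the helper names are assumptions made for illustration.

```python
import math

def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def standard_deviation(dist):
    """Square root of the variance."""
    return math.sqrt(variance(dist))

# Assumed example: number of heads in two fair coin flips (mean is 1).
heads_in_two_flips = {0: 0.25, 1: 0.50, 2: 0.25}

var = variance(heads_in_two_flips)           # (0-1)^2*0.25 + 0 + (2-1)^2*0.25
sd = standard_deviation(heads_in_two_flips)
print(var, sd)
```

Because each deviation is squared before it is weighted, deviations below the mean cannot cancel out those above it.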
Binomial Distribution
The binomial distribution is a type of mathematical model. Mathematical models let us calculate the probability of any specific value of the variable of interest. The binomial distribution is used when the discrete variable is the number of events of interest in a sample of n observations.
There are 4 properties of the Binomial Distribution:
- The sample must consist of a fixed number of observations, n.
- Each and every observation can be categorized into one of two mutually exclusive and collectively exhaustive categories.
- The probability of an event of interest, p, is constant across all observations. Therefore the probability of the event of interest not occurring, 1 – p (sometimes called q), is also constant across all observations.
- Observations are all independent. This simply means the outcome of any one observation does not affect the outcome of any other observation.
The Binomial distribution formula:
P (X = x | n, p) = n!/(x! (n – x)!) p^x * (1-p)^(n-x)
where
P(X = x | n, p) = probability of x events of interest, given n and p
n = number of observations
p = probability of an event of interest (prob. of success)
1 – p = q = probability of not having an event of interest (prob. of failure)
x = number of events of interest (no. of successes) in the sample (x = 0, 1, 2, …, n)
n!/(x! (n – x)!) = the number of combinations of x events of interest out of n observations. This calculation does not take into account the order in which the events actually occur. If the order were important, that would involve calculating a permutation, not a combination. Another version of this formula which may be easier to read can be found in the Statistics Formula Glossary post.
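The formula translates fairly directly into Python. In this sketch the function name binomial_pmf is our own choice, and math.comb (available from Python 3.8) supplies the n!/(x! (n – x)!) term; the call with n = 3 and p = 0.5 is an assumed toy case.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x | n, p): combinations * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Assumed toy case: exactly 2 heads in 3 fair coin flips.
# comb(3, 2) = 3, so the result is 3 * 0.5^2 * 0.5^1.
two_heads_in_three = binomial_pmf(2, 3, 0.5)
print(two_heads_in_three)

# The probabilities over x = 0..n should add up to 1 (up to rounding).
total = sum(binomial_pmf(x, 3, 0.5) for x in range(4))
print(total)
```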
When conducting calculations for binomial distributions, there are three distinct types of question that may be encountered: the probability of exactly x events, of at least x events, and of fewer than x events.
Example: There are 10 golf balls in a bag, consisting of 6 orange balls and 4 yellow balls. If we define success as picking an orange ball and failure as not picking an orange ball (and therefore picking a yellow ball), we can illustrate the three types of question.
If 6 golf balls are to be selected at random, one at a time and with replacement (so that the probability of picking an orange ball stays at 6/10 on every draw, as the binomial distribution requires):
- What is the probability of picking exactly 4 orange balls?
P (X = x | n, p) = n!/(x! (n – x)!) p^x * (1-p)^(n-x)
where
P(X = x | n, p) = probability of x events of interest, given n and p
n = number of observations = 6
p = probability of an event of interest = 6/10 = 0.6
1 – p = q = prob. of not having an event of interest = 0.4
x = number of events of interest (no. of successes) in the sample = 4
P (X = 4 | n = 6, p = 0.6) = 6!/(4! (6 – 4)!) 0.6^4 * (1-0.6)^(6-4) = 15 * 0.1296 * 0.16 = 0.3110
- What is the probability of picking at least 4 orange balls?
This equates to: prob. of 4 orange + prob. of 5 orange + prob. of 6 orange
P (X ≥ 4 | n = 6, p = 0.6) = 6!/(4! (6 – 4)!) 0.6^4 * (1-0.6)^(6-4) + 6!/(5! (6 – 5)!) 0.6^5 * (1-0.6)^(6-5) + 6!/(6! (6 – 6)!) 0.6^6 * (1-0.6)^(6-6)
= 0.31104 + 0.186624 + 0.046656 = 0.5443
- What is the probability of picking fewer than 4 orange balls?
This equates to: prob. of 0 orange + prob. of 1 orange + prob. of 2 orange + prob. of 3 orange. Since all the probabilities sum to 1, it is easier to calculate it as the complement:
P (X < 4) = 1 – Prob. of at least 4 orange balls = 1 – 0.5443 = 0.4557
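The three answers can be reproduced with a short script. This is only a sketch: it assumes every draw is independent with p = 0.6, and the variable names are our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x | n, p) for a binomial variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.6  # 6 draws, probability of an orange ball on each draw

# Exactly 4 orange balls.
exactly_4 = binomial_pmf(4, n, p)

# At least 4: add up the probabilities for x = 4, 5 and 6.
at_least_4 = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))

# Fewer than 4: the complement of "at least 4".
fewer_than_4 = 1 - at_least_4

print(round(exactly_4, 4), round(at_least_4, 4), round(fewer_than_4, 4))
```

Summing x = 0 to 3 directly gives the same result for the last question; the complement simply needs fewer terms.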
Mean of the Binomial Distribution
The mean, μ, of the binomial distribution is the product of the sample size, n, and the probability of an event of interest (success), p.
μ = E (X) = np
The mean is the long-run average number of events of interest. It is not necessarily the single most likely value, and it need not even be a value that X can take: in the golf ball example above, μ = 6 * 0.6 = 3.6 orange balls. When a distribution is symmetric, however, the mean and the most likely value coincide. For instance, consider tossing two unbiased dice (the total is not itself a binomial variable, but it illustrates the idea). The totals that may result range from 2 to 12, and the mean is 7. There are six distinct ways to get a total of 7: 1 & 6, 6 & 1, 2 & 5, 5 & 2, 3 & 4, and 4 & 3. This represents 6 out of a total of 36 equally likely possibilities, more than for any other total, so here 7 is both the mean and the most likely result.
Standard Deviation of the Binomial Distribution
The standard deviation of the binomial distribution, σ, is the square root of the variance.
σ = √(Var (X)) = √(np (1-p))
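As a quick check, the sketch below evaluates both shortcut formulas for the golf ball example (n = 6, p = 0.6) and compares them with the mean and standard deviation computed directly from the individual probabilities. The variable names are illustrative.

```python
from math import comb, sqrt

n, p = 6, 0.6

# Shortcut formulas for the binomial distribution.
mu = n * p                      # mean = np
sigma = sqrt(n * p * (1 - p))   # standard deviation = sqrt(np(1 - p))

# Cross-check from the definitions, using P(X = x) for x = 0..n.
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
mu_check = sum(x * px for x, px in enumerate(pmf))
sigma_check = sqrt(sum((x - mu_check) ** 2 * px for x, px in enumerate(pmf)))

print(mu, sigma)
print(mu_check, sigma_check)
```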
Poisson Distribution
The Poisson distribution is another type of mathematical model. It applies when we want to determine the number of occurrences of a particular event in some fixed interval of time or space. This fixed interval is often called an area of opportunity. Within the area of opportunity, there can be multiple occurrences of an event.
There are 4 properties of the Poisson Distribution:
- The area of opportunity must be defined by time, length, surface area etc. Per the Poisson distribution, we can determine the number of times a particular event occurs in a given area of opportunity.
- The probability that an event occurs in a given area of opportunity must be the same for all the areas of opportunity.
- The number of events that occur in one area of opportunity is independent of the number of events that occur in any other area of opportunity.
- The probability that two or more events will occur in any area of opportunity approaches zero as the area of opportunity becomes smaller.
The Poisson distribution formula:
P (X = x | λ) = (e^(-λ) * λ^x)/x!
where
P (X = x | λ) = probability that X = x events in an area of opportunity given λ
λ = expected number of events per unit
e = mathematical constant approximated by 2.71828
x = number of events (x = 0, 1, 2, …, with no upper limit)
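A minimal Python sketch of this formula is shown below, with the exponent on λ taken to be x. The function name poisson_pmf and the value λ = 2.0 are assumptions chosen for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x | lam) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

# Assumed illustration with lam = 2.0 events per area of opportunity.
p_zero = poisson_pmf(0, 2.0)   # reduces to e^(-2)

# x has no upper limit, but the terms shrink quickly, so summing the
# first 50 of them already gets very close to 1.
total = sum(poisson_pmf(x, 2.0) for x in range(50))

print(p_zero, total)
```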
Example: Imagine that the mean number of cars that pass an intersection in a 1-minute interval is 5.0.
- What is the probability that in a given minute, exactly four cars will arrive?
P (X = 4 | λ = 5) = (e^(-5) * (5.0)^4)/4! = 0.1755
- What is the probability that more than four cars will arrive in a given minute?
The probability that more than four cars will arrive is an infinite sum:
P (X > 4) = P (X = 5) + P (X = 6) + P (X = 7) + …
Since all probabilities in a distribution sum to 1:
P (X > 4) = 1 – P(X <= 4) = 1 – [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]
= 1 – (0.0067 + 0.0337 + 0.0842 + 0.1404 + 0.1755)
= 1 – 0.4405
= 0.5595
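The same figures can be reproduced in Python. This is a sketch under the example's assumption of λ = 5 cars per minute; the names are our own.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x | lam) for a Poisson variable."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 5.0  # mean number of cars per 1-minute interval

# Exactly four cars in a given minute.
exactly_4 = poisson_pmf(4, lam)

# More than four cars: 1 minus the probability of 0, 1, 2, 3 or 4 cars.
at_most_4 = sum(poisson_pmf(x, lam) for x in range(5))
more_than_4 = 1 - at_most_4

print(round(exactly_4, 4), round(at_most_4, 4), round(more_than_4, 4))
```

Using the complement avoids having to add up an unbounded number of terms for x = 5, 6, 7 and so on.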
Some of the material in this post was obtained from Statistics for Managers: Using Microsoft Excel, Eighth Edition.