{"id":146,"date":"2019-07-01T18:50:35","date_gmt":"2019-07-01T18:50:35","guid":{"rendered":"http:\/\/blogs.ubalt.edu\/jboettinger\/?p=146"},"modified":"2020-02-19T20:46:44","modified_gmt":"2020-02-20T01:46:44","slug":"statistics-discrete-probability-distributions","status":"publish","type":"post","link":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/2019\/07\/01\/statistics-discrete-probability-distributions\/","title":{"rendered":"Statistics: Discrete Probability Distributions"},"content":{"rendered":"<h1><b>Introduction to Discrete Probability Distributions<\/b><\/h1>\n<h2><b>Discrete versus Continuous Variables<\/b><\/h2>\n<p><span style=\"font-weight: 400\">A <strong>discrete variable<\/strong> typically originates from a counting process while a <strong>continuous variable<\/strong> usually comes from a measuring process. An easy way to make the distinction between a discrete and a continuous variable is that discrete variables are usually whole numbers with no decimals. Continuous variables, on the other hand, frequently take the form of decimals. For instance, the number of people in a group is a discrete variable because it\u2019s always a whole number, while a person\u2019s weight is continuous since it can typically be measured to multiple decimal places.<\/span><\/p>\n<h2><b>The Probability Distribution for a Discrete Variable<\/b><\/h2>\n<p><span style=\"font-weight: 400\">A <strong>probability distribution<\/strong> for a discrete variable is simply a listing of all the possible outcomes and the probability associated with each one. Since the total probability must, by definition, equal 1, the probabilities of all the possible outcomes must sum to 1. For example, if you&#8217;re flipping a coin once, there&#8217;s a 1 in 2 chance it will land on heads, and a 1 in 2 chance it will land on tails; 1\/2 + 1\/2 = 1. In this way, measuring probability is similar to the use of percentages. 
Percentages are always measured out of 100 while probability is always measured out of 1. This is true for all probability measurements.<!--more--><\/span><\/p>\n<h2><b>Expected Value for a Discrete Variable<\/b><\/h2>\n<p><span style=\"font-weight: 400\">The expected value for a discrete variable is essentially the same as the population mean. It is calculated by multiplying each possible outcome by its associated probability and then summing the products.<\/span><\/p>\n<h2><b>Standard Deviation and Variance of a Discrete Variable<\/b><\/h2>\n<p><span style=\"font-weight: 400\"><strong>Standard deviation<\/strong> is a measure of how much the data points typically vary from the mean; it&#8217;s also often described as the spread of the distribution. Quantitatively, the standard deviation is simply the square root of the variance. Variance can never be negative, since it is a weighted average of squared deviations from the mean; otherwise the square root could not be evaluated. The standard deviation, being the non-negative square root, is likewise never negative, although an individual data point&#8217;s deviation from the mean can be positive or negative depending on whether the point lies above or below the mean. Standard deviation and variance as concepts are also discussed in an earlier post called Basics of Variability.\u00a0<\/span><\/p>\n<h2><b>Binomial distribution<\/b><\/h2>\n<p><span style=\"font-weight: 400\">The <strong>binomial distribution<\/strong> is a type of mathematical model. Mathematical models allow us to easily calculate the probability of occurrence of any specific value of the variable of interest. 
The binomial distribution is used in situations where the discrete variable is the number of occurrences in a sample of n observations.<\/span><\/p>\n<p><span style=\"font-weight: 400\">There are 4 properties of the Binomial Distribution:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The sample must consist of a fixed number of observations, n.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Each and every observation can be categorized into one of two mutually exclusive and collectively exhaustive categories.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The probability of an event of interest, p, is constant across all observations. Therefore the probability of a non-event of interest, 1 &#8211; p (sometimes called q) is constant for all observations.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Observations are all independent. This simply means the probability of occurrence of any observation is not dependent on any other observation.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">The Binomial distribution formula:<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X = x | n, p) = <\/span><span style=\"font-weight: 400\">n!\/(<\/span><span style=\"font-weight: 400\">x! 
(n &#8211; x)!)<\/span> <span style=\"font-weight: 400\">p^x *\u00a0<\/span><span style=\"font-weight: 400\">(1-p<\/span><span style=\"font-weight: 400\">)^(<\/span><span style=\"font-weight: 400\">n-x)\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">where\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">P(X = x | n,p) = probability of x events of interest, given n and p<\/span><\/p>\n<p><span style=\"font-weight: 400\">n = number of observations<\/span><\/p>\n<p><span style=\"font-weight: 400\">p = probability of an event of interest (prob. of success)<\/span><\/p>\n<p><span style=\"font-weight: 400\">1 &#8211; p = q = probability of not having an event of interest (prob. of failure)<\/span><\/p>\n<p><span style=\"font-weight: 400\">x = number of events of interest (no. of successes) in the sample (x = 0, 1, 2, \u2026, n)<\/span><\/p>\n<p><span style=\"font-weight: 400\">n!\/(<\/span><span style=\"font-weight: 400\">x! (n-x)!)<\/span> <span style=\"font-weight: 400\">= the number of combinations of x events of interest out of n observations. This calculation does not take into account the order in which the events actually occur. If the order were important, that would involve calculating a permutation, not a combination. To see a video depicting calculation of combinations, please click <\/span><a href=\"https:\/\/www.youtube.com\/watch?v=WWv0RUxDfbs\"><span style=\"font-weight: 400\">here<\/span><\/a><span style=\"font-weight: 400\">. 
Another version of this formula which may be easier to read can be found in the <a href=\"http:\/\/blogs.ubalt.edu\/mathsupportcenter\/2019\/06\/30\/statistics-formula-glossary\/\">Statistics Formula Glossary<\/a> post.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">When conducting calculations for binomial distributions, three types of question are commonly encountered: the probability of exactly x events of interest, of at least x events of interest, and of fewer than x events of interest.\u00a0<\/span><\/p>\n<p><em><span style=\"font-weight: 400\">Example: There are 10 golf balls in a bag, consisting of 6 orange balls and 4 yellow balls. If we define success as picking an orange ball and failure as not picking an orange ball (and therefore picking a yellow ball), we can illustrate the three types of question that may be encountered in calculations.<\/span><\/em><\/p>\n<p><em>If 6 golf balls are to be selected at random (with replacement, so that the probability of picking an orange ball stays the same on every draw):<\/em><\/p>\n<ul>\n<li><span style=\"text-decoration: underline\"><em><span style=\"font-weight: 400\">What is the probability of picking exactly 4 orange balls?<\/span><\/em><\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">P (X = x | n, p) = <\/span><span style=\"font-weight: 400\">n!\/(<\/span><span style=\"font-weight: 400\">x! (n &#8211; x)!)<\/span> <span style=\"font-weight: 400\">p^<\/span><span style=\"font-weight: 400\">x *\u00a0<\/span><span style=\"font-weight: 400\">(1-p<\/span><span style=\"font-weight: 400\">)^(<\/span><span style=\"font-weight: 400\">n-x)<\/span><\/p>\n<p><span style=\"font-weight: 400\">where\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">P(X = x | n,p) = probability of x events of interest, given n and p<\/span><\/p>\n<p><span style=\"font-weight: 400\">n = number of observations = 6<\/span><\/p>\n<p><span style=\"font-weight: 400\">p = probability of an event of interest = 6\/10 = 0.6<\/span><\/p>\n<p><span style=\"font-weight: 400\">1 &#8211; p = q = prob. 
of not having an event of interest = 0.4\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">x = number of events of interest (no. of successes) in the sample (x = 0, 1, 2, \u2026, n) = 4<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X = 4 | n = 6, p = 0.6) = <\/span><span style=\"font-weight: 400\">6!\/(<\/span><span style=\"font-weight: 400\">4! (6 &#8211; 4)!)<\/span> <span style=\"font-weight: 400\">0.6^<\/span><span style=\"font-weight: 400\">4 *\u00a0<\/span><span style=\"font-weight: 400\">(1-0.6<\/span><span style=\"font-weight: 400\">)^(<\/span><span style=\"font-weight: 400\">6-4)<\/span> <span style=\"font-weight: 400\">= <\/span><b>0.3110<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"text-decoration: underline\"><em><span style=\"font-weight: 400\">What is the probability of picking at least 4 orange balls?<\/span><\/em><\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This equates to: prob. of 4 orange + prob. of 5 orange + prob. of 6 orange<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X &gt;= 4 | n = 6, p = 0.6) = <\/span><span style=\"font-weight: 400\">6!\/(<\/span><span style=\"font-weight: 400\">4! (6 &#8211; 4)!)<\/span> <span style=\"font-weight: 400\">0.6^<\/span><span style=\"font-weight: 400\">4 *\u00a0<\/span><span style=\"font-weight: 400\">(1-0.6<\/span><span style=\"font-weight: 400\">)^(<\/span><span style=\"font-weight: 400\">6-4) <\/span><span style=\"font-weight: 400\">+ <\/span><span style=\"font-weight: 400\">6!\/(<\/span><span style=\"font-weight: 400\">5! (6 &#8211; 5)!)<\/span> <span style=\"font-weight: 400\">0.6^<\/span><span style=\"font-weight: 400\">5 *\u00a0<\/span><span style=\"font-weight: 400\">(1-0.6<\/span><span style=\"font-weight: 400\">)^(<\/span><span style=\"font-weight: 400\">6-5)\u00a0<\/span><span style=\"font-weight: 400\">+ <\/span><span style=\"font-weight: 400\">6!\/(<\/span><span style=\"font-weight: 400\">6! 
(6 &#8211; 6)!)<\/span> <span style=\"font-weight: 400\">0.6^<\/span><span style=\"font-weight: 400\">6 *\u00a0<\/span><span style=\"font-weight: 400\">(1-0.6<\/span><span style=\"font-weight: 400\">)^(<\/span><span style=\"font-weight: 400\">6-6)<\/span><\/p>\n<p><span style=\"font-weight: 400\">= 0.31104 + 0.186624 + 0.046656 = <\/span><b>0.5443<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"text-decoration: underline\"><em><span style=\"font-weight: 400\">What is the probability of picking fewer than 4 orange balls?<\/span><\/em><\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This equates to: prob. of 0 orange + prob. of 1 orange + prob. of 2 orange + prob. of 3 orange<\/span><\/p>\n<p><span style=\"font-weight: 400\">= 1 &#8211; Prob. of at least 4 orange balls = 1 &#8211; 0.5443 = <\/span><b>0.4557<\/b><\/p>\n<h2><b>Mean of the Binomial Distribution<\/b><\/h2>\n<p><span style=\"font-weight: 400\">The mean, \u03bc, of the binomial distribution is the product of the sample size, n, and the probability of an event of interest (success), p.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u03bc = E (X) = np<\/span><\/p>\n<p><span style=\"font-weight: 400\">This is the long-run average number of events of interest; it is close to, but not always equal to, the single most likely value. In the golf ball example above, \u03bc = 6 * 0.6 = 3.6 orange balls. The link between the mean and the most likely value can also be seen in a non-binomial example: when tossing two unbiased dice, the sum ranges from 2 to 12 and its mean is 7. Seven is also the most likely sum, because there are six distinct ways to get a value of 7. They are 1 &amp; 6, 6 &amp; 1, 2 &amp; 5, 5 &amp; 2, 3 &amp; 4, and 4 &amp; 3. 
These 6 outcomes out of a total of 36 possibilities make 7 the most likely of all the possible sums.<\/span><\/p>\n<h2><b>Standard Deviation of the Binomial Distribution<\/b><\/h2>\n<p><span style=\"font-weight: 400\">The standard deviation of the binomial distribution, \u03c3, is the square root of the variance.<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u03c3 = \u221a(<\/span><span style=\"font-weight: 400\">Var (X))<\/span><span style=\"font-weight: 400\"> = <\/span>\u221a(<span style=\"font-weight: 400\">np (1-p))<\/span><\/p>\n<h2><b>Poisson distribution<\/b><\/h2>\n<p><span style=\"font-weight: 400\">The <strong>Poisson distribution<\/strong> is another type of mathematical model. The Poisson distribution applies when we want to determine the number of occurrences of a particular event in some fixed interval of time or space. This fixed interval of time or space is often called an area of opportunity. Within the area of opportunity, there can be multiple occurrences of an event.\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">There are 4 properties of the Poisson Distribution:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The area of opportunity must be defined by time, length, surface area, etc. 
With the Poisson distribution, we can find the probability that a particular event occurs a given number of times in that area of opportunity.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The probability that an event occurs in a given area of opportunity must be the same for all the areas of opportunity.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The number of events that occur in each and every area of opportunity is independent of the number of events that occur in any other area of opportunity.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The probability that two or more events will occur in any area of opportunity approaches zero as the area of opportunity becomes smaller.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">The Poisson distribution formula:<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X = x | \u03bb) = (<\/span><span style=\"font-weight: 400\">e^(<\/span><span style=\"font-weight: 400\">-\u03bb)*<\/span><span style=\"font-weight: 400\">\u03bb^(<\/span><span style=\"font-weight: 400\">x<\/span><span style=\"font-weight: 400\">))\/<\/span><span style=\"font-weight: 400\">x!<\/span><\/p>\n<p><span style=\"font-weight: 400\">where\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X = x | \u03bb) = probability that X = x events in an area of opportunity given \u03bb\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u03bb = expected number of events per unit<\/span><\/p>\n<p><span style=\"font-weight: 400\">e = mathematical constant approximated by 2.71828<\/span><\/p>\n<p><span style=\"font-weight: 400\">x = number of events (x = 0, 1, 2, \u2026)<\/span><\/p>\n<p><em><span style=\"font-weight: 400\">Example: Imagine<\/span><span style=\"font-weight: 400\"> that the mean number of cars that pass an intersection in a 1-minute interval is 5.0. 
<\/span><\/em><\/p>\n<ul>\n<li><span style=\"text-decoration: underline\"><em><span style=\"font-weight: 400\">What is the probability that in a given minute, exactly four cars will arrive?\u00a0<\/span><\/em><\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">P (X = 4 | \u03bb = 5) = (<\/span><span style=\"font-weight: 400\">e^(<\/span><span style=\"font-weight: 400\">-5)*<\/span><span style=\"font-weight: 400\">(5.0)^(<\/span><span style=\"font-weight: 400\">4)<\/span><span style=\"font-weight: 400\">)\/<\/span><span style=\"font-weight: 400\">4!<\/span> <span style=\"font-weight: 400\">= <\/span><b>0.1755<\/b><\/p>\n<ul>\n<li><span style=\"text-decoration: underline\"><em><span style=\"font-weight: 400\">What is the probability that more than four cars will arrive in a given minute?<\/span><\/em><\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The probability that more than four cars will arrive:<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X &gt; 4) = P (X = 5) + P (X = 6) + P (X = 7) + \u2026<\/span><\/p>\n<p><span style=\"font-weight: 400\">Since this sum has no final term, and all probabilities in a distribution sum to 1, it is easier to subtract the complement from 1:<\/span><\/p>\n<p><span style=\"font-weight: 400\">P (X &gt; 4) = 1 &#8211; P(X &lt;= 4) = 1 &#8211; [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0= 1 &#8211; (0.0067 + 0.0337 + 0.0842 + 0.1404 + 0.1755)<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0= 1 &#8211; 0.4405<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0=\u00a0 <\/span><b>0.5595<\/b><span style=\"font-weight: 400\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>Some of the material in this post was obtained from\u00a0<em>Statistics for Managers: Using Microsoft Excel, Eighth Edition.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction to Discrete Probability Distributions Discrete versus Continuous Variables A discrete variable typically 
originates from a counting process while a continuous variable usually comes from a measuring process. An easy way to make the distinction between a discrete and a continuous variable is that discrete variables are usually whole numbers with no decimals. Continuous variables [&hellip;]<\/p>\n","protected":false},"author":1362,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[153,148,149,152,158,159,160],"tags":[5,114,111,109,110,113,115,112,41,39],"_links":{"self":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts\/146"}],"collection":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/users\/1362"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/comments?post=146"}],"version-history":[{"count":7,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts\/146\/revisions"}],"predecessor-version":[{"id":597,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts\/146\/revisions\/597"}],"wp:attachment":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/media?parent=146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/categories?post=146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/tags?post=146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}