{"id":93,"date":"2019-06-18T18:53:47","date_gmt":"2019-06-18T18:53:47","guid":{"rendered":"http:\/\/blogs.ubalt.edu\/jboettinger\/?p=93"},"modified":"2019-10-17T20:25:22","modified_gmt":"2019-10-17T20:25:22","slug":"statistics-correlation","status":"publish","type":"post","link":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/2019\/06\/18\/statistics-correlation\/","title":{"rendered":"Statistics: Correlation"},"content":{"rendered":"<h1><b>Introduction to Correlation and Regression<\/b><\/h1>\n<p>So far we&#8217;ve been talking about analyses in which the independent variable is categorical or discrete (e.g., treatment A, B, C) and the dependent variable is continuous (e.g., plant height). When both variables are continuous, however, we can use\u00a0<strong>correlation<\/strong>. A correlation describes three characteristics of a relationship: its <strong>direction<\/strong> (positive or negative), its <strong>form<\/strong> (we usually work with linear relationships), and its <strong>strength<\/strong>. Direction and strength are both summarized by a single number produced by the analysis, the correlation coefficient (r).<\/p>\n<p>A\u00a0<strong>positive correlation<\/strong> is one in which higher values of one variable tend to go with higher values of the other. For example, as height increases, weight also tends to increase. A\u00a0<strong>negative correlation<\/strong> is one in which higher values of one variable tend to go with lower values of the other. For example, as the temperature outside increases, hot chocolate sales tend to decrease. This is what is meant by the\u00a0direction of a correlation. A negative r-value indicates a negative correlation, and a positive r-value indicates a positive correlation.<!--more--><\/p>\n<p>R-values range from -1 to 1. The closer a number is to 1, the stronger the positive relationship.
The closer a number is to -1, the stronger the negative relationship. The closer a number is to 0, the weaker the relationship, whether it is positive or negative.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-126\" src=\"http:\/\/blogs.ubalt.edu\/jboettinger\/wp-content\/uploads\/sites\/1114\/2019\/06\/correlation-300x88.jpg\" alt=\"\" width=\"607\" height=\"178\" srcset=\"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-content\/uploads\/sites\/1114\/2019\/06\/correlation-300x88.jpg 300w, https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-content\/uploads\/sites\/1114\/2019\/06\/correlation-624x183.jpg 624w, https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-content\/uploads\/sites\/1114\/2019\/06\/correlation.jpg 643w\" sizes=\"(max-width: 607px) 100vw, 607px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-127\" src=\"http:\/\/blogs.ubalt.edu\/jboettinger\/wp-content\/uploads\/sites\/1114\/2019\/06\/correlation2real-300x198.jpg\" alt=\"\" width=\"505\" height=\"333\" \/><\/p>\n<h2>Pearson Correlation<\/h2>\n<p>The most common type of correlation is the\u00a0<strong>Pearson Correlation<\/strong>. It measures the degree and direction of the <i><span style=\"font-weight: 400\">linear relationship between two variables. An r-value of exactly 1 or -1 indicates a perfect linear relationship. In that case every change in variable X is matched by a proportional change in variable Y, so all data points fall on a straight line. As with any correlation, r ranges from -1 to 1. It is calculated as the covariability of X and Y divided by the variability of X and Y measured separately: r = SP \/ \u221a(SS<sub>X<\/sub> \u00d7 SS<sub>Y<\/sub>). Here SP is the sum of the products of the deviations of X and Y from their means, and SS<sub>X<\/sub> and SS<sub>Y<\/sub> are the sums of squared deviations for each variable.\u00a0<\/span><\/i><\/p>\n<p>There are some important factors to take into consideration when using and interpreting the Pearson Correlation.<\/p>\n<ol>\n<li><em><span style=\"text-decoration: underline\">Correlation does not demonstrate causation.<\/span><\/em><em>\u00a0<\/em> This is very important to remember: just because two variables are correlated doesn&#8217;t mean that one is causing the other.
There may be another factor (Z) that we haven&#8217;t measured which is the real reason. For example, when ice cream sales go up, so does the number of drownings in an area. Does that mean that eating ice cream causes people to drown? More likely, higher temperatures lead both to more ice cream consumption and to more people going swimming. A correlation alone cannot tell you whether such a third variable is at work.<\/li>\n<li>The value of the correlation is affected by the range of scores in the data. For example, suppose you&#8217;re looking at how height and age correlate. If your sample is made up only of people who are 20 or older, you will probably get a weak correlation, because most adults no longer grow. However, if your sample is made up of people who are 17 or younger, you&#8217;re likely to find a fairly strong positive correlation.<\/li>\n<li><span style=\"font-weight: 400\">Extreme points (outliers) can have a large impact. A single point that lies far from the others can noticeably raise or lower r, so such points should be examined and may sometimes need to be removed.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Correlation cannot be interpreted as a proportion. For example, r = 0.5 does not mean that the relationship is &#8220;50% perfect.&#8221; The proportion of variability that one variable explains in the other is given by r squared, described in the next section.<\/span><\/li>\n<\/ol>\n<h2>Coefficient of Determination<\/h2>\n<p>The\u00a0<strong>coefficient of determination<\/strong> (r squared, or r<sup>2<\/sup>) measures the proportion of variability in one variable that can be determined from its relationship with the other variable. In other words, it tells you how much of the differences in one variable can be explained by differences in a second variable. The example given by Statistics How To is that\u00a0<em>when\u00a0<\/em>a person becomes pregnant is directly related to when they give birth. Link to the whole article <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/coefficient-of-determination-r-squared\/\">here<\/a>.
This measure is usually reported in a form like this: &#8220;75% of the variation in Y can be explained by the variation in X.&#8221; Because the value comes from squaring r, positive and negative correlations of the same size give the same result. For example, r = 0.5 and r = -0.5 both give r squared = 0.25, meaning 25% of the variation in Y can be explained by the variation in X.<\/p>\n<h2>Other Types of Correlation<\/h2>\n<p>While Pearson Correlation is the most commonly used, there are times when the data you collect call for a different kind of correlation. Some are listed below:<\/p>\n<ul>\n<li><strong>Partial Correlation<\/strong>:\u00a0<span style=\"font-weight: 400\">A partial correlation measures the relationship between two variables while controlling for the influence of a third variable by holding it constant.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Spearman correlation<\/strong>: Used when both variables are measured on an ordinal scale (ranks). It is also used when the relationship is consistently in one direction but may not be linear.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Point-biserial correlation<\/strong>: Measures the relationship between two variables when one variable has only two values (a dichotomous variable).<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Phi-coefficient<\/strong>: Used when both variables are dichotomous. Both variables are re-coded to the values 0 and 1.<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Introduction to Correlation and Regression So far we&#8217;ve been talking about analyses which involve variables which are split up into categorical or discrete variables (ex. treatment A, B, C) compared to a dependent variable which is continuous (ex. plant height). However, there is a way to look at two variables which have continuous data:\u00a0correlation.
A [&hellip;]<\/p>\n","protected":false},"author":1347,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[151,153,148,152,158,155,156,150,159,160],"tags":[4,3,5,102,95,97,98,8,100,103,101,106,105,99,7,6,91,90,104,96],"_links":{"self":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts\/93"}],"collection":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/users\/1347"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/comments?post=93"}],"version-history":[{"count":4,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts\/93\/revisions"}],"predecessor-version":[{"id":131,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/posts\/93\/revisions\/131"}],"wp:attachment":[{"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/media?parent=93"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/categories?post=93"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ubalt.edu\/mathsupportcenter\/wp-json\/wp\/v2\/tags?post=93"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}