The following post breaks down the uses of different types of statistical tests. More importantly, it's designed to help you decide which test to use based on the question being asked. This is not a comprehensive list of all the statistical tests out there, so if you feel something is missing that you would like to see included, please leave a comment below. All formulas for the tests presented here can be found in the Statistics Formula Glossary post. At the bottom is a decision tree that may help you visualize the purpose of this post.
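As a rough sketch of how that choice might be encoded, the hypothetical Python function below walks through the questions implied by the tests summarized on this page. It is a simplification written for illustration, not the decision tree from the post itself, and the function and parameter names are our own.

```python
def choose_test(
    both_continuous: bool = False,
    goal_is_prediction: bool = False,
    num_factors: int = 1,
    num_levels: int = 2,
    related_samples: bool = False,
    vs_population_mean: bool = False,
    population_sd_known: bool = False,
) -> str:
    """Suggest one of the tests covered on this page (hypothetical helper)."""
    # Two continuous variables: describe the relationship, or predict one from the other.
    if both_continuous:
        return "linear regression" if goal_is_prediction else "correlation"
    # One sample mean compared against a known population mean.
    if vs_population_mean:
        return "z-test" if population_sd_known else "single sample t-test"
    # Two independent variables (factors), each with two or more levels.
    if num_factors == 2:
        return "two-factor ANOVA"
    if num_factors != 1:
        raise ValueError("designs with more than two factors are not covered here")
    if num_levels < 2:
        raise ValueError("need at least two levels to compare")
    # One factor with exactly two levels (two samples).
    if num_levels == 2:
        return "repeated measures t-test" if related_samples else "independent t-test"
    # One factor with three or more levels.
    if related_samples:
        return "repeated measures one-way ANOVA"
    return "independent one-way ANOVA"


# Example: four watering conditions, with different plants in each group.
print(choose_test(num_levels=4))  # independent one-way ANOVA
```

Each branch mirrors one of the post excerpts below; real research questions often need more branches (for example, assumption checks) than this sketch includes.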
Tag Archives: MATH84
Statistics Formula Glossary
Stats Formula Glossary (Word) as of 7/16/2019
Stats Formula Glossary (PDF) as of 7/16/2019
Attached to this post are a PDF version and a Word document version of a glossary of formulas that may be helpful to keep around when practicing statistical problems for homework or studying for an upcoming test.
Please keep in mind that although these formulas work, they may not be the versions your professors have taught you to use. This formula sheet may also include formulas for problems you don't need to know how to solve for your class. If so, we encourage you to download the Word document version so that you can add to, remove from, or edit the glossary to fit your own needs.
Please keep in mind that this formula sheet may be edited after being posted; the copy you download today may differ from the copy posted tomorrow. This list is by no means a comprehensive collection of all formulas used in statistics. If you have suggestions for formulas you would like to see added, please leave a comment below this post and we will take your suggestion into consideration. Good luck!
Statistics: Regression
Introduction to Linear Regression
Linear regression is a method for determining the best-fitting line through a set of data. In many ways it's similar to correlation, since statistics such as r and r² are still used. The key difference is that the purpose of regression is prediction. The best-fitting line is the one that minimizes the total squared error, that is, the sum of the squared vertical distances between the data points and the line.
The equation used for regression is Y = a + bX or some variation of it. If you remember Y = mx + b from algebra class, this is the same linear equation written with different letters: here b is the slope and a is the y-intercept. Although you may be asked to report r and r², the purpose of regression is to find the values of the slope (b) and the y-intercept (a) that create the line that best fits the data.
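To make the least-squares idea concrete, here is a minimal Python sketch that computes the slope as b = SP/SS_X and the intercept as a = M_Y - bM_X, the standard least-squares solutions. The data values and the helper names `fit_line` and `total_squared_error` are made up for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: sum of products of the X and Y deviations from their means.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS_X: sum of squared X deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x            # slope
    a = mean_y - b * mean_x  # intercept; the line passes through (mean_x, mean_y)
    return a, b


def total_squared_error(xs: list[float], ys: list[float], a: float, b: float) -> float:
    """Sum of squared vertical distances between each point and the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X")  # Y = 2.20 + 0.60X
# Nudging the slope away from the least-squares value increases the total squared error.
print(total_squared_error(xs, ys, a, b) < total_squared_error(xs, ys, a, b + 0.1))  # True
```

The last line illustrates the "best-fitting" claim: any other line through these points has a larger total squared error.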
Statistics: Correlation
Introduction to Correlation and Regression
So far we've been talking about analyses in which the independent variable is categorical or discrete (ex. treatment A, B, C) and the dependent variable is continuous (ex. plant height). However, there is a way to look at two variables that are both continuous: correlation. A correlation describes the characteristics of a relationship: its direction (positive or negative), its form (we usually work with linear relationships), and its strength. Strength and direction are both summarized by the correlation coefficient, r, that the analysis produces.
A positive correlation is one in which higher values of one variable go along with higher values of the other. For example, height and weight: as height increases, weight also tends to increase. A negative correlation is one in which higher values of one variable go along with lower values of the other. For example, as the temperature outside increases, hot chocolate sales tend to decrease. This is what is meant by the direction of a correlation. An r-value with a negative sign means a negative correlation, and a positive r-value means a positive correlation. The closer r is to -1 or +1, the stronger the relationship.
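The sign and size of r come straight out of the Pearson formula r = SP / √(SS_X · SS_Y). The short Python sketch below computes it by hand; the temperature and sales figures are invented to match the hot chocolate example, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP is negative when high X values pair with low Y values.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Invented data: outside temperature vs. hot chocolate sales.
temperature = [0, 5, 10, 15, 20]
cocoa_sales = [50, 42, 35, 30, 22]
r = pearson_r(temperature, cocoa_sales)
print(f"r = {r:.3f}")  # the negative sign marks a negative correlation
```

Because the denominator is always positive, the sign of r is decided entirely by SP, which is why the sign tells you the direction.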
Statistics: Two-Factor ANOVA
Introduction to Two-Factor ANOVAs
So far we've talked about tests that are used when there is one independent variable, with either two levels or more. This is not the limit of how much we can include in a single analysis. In a two-factor ANOVA, there is more than one independent variable, and each of those variables can have two or more levels. Consider this example:
A farmer wants to know the best combination of products to use to maximize her crop yield. She decides to test three different fertilizer brands (A, B, and C) and two different kinds of seeds (Y and Z). Each fertilizer is paired once with each seed type, for a total of 6 (3 × 2) conditions: AY, BY, CY, AZ, BZ, CZ.
A two-factor ANOVA considers more than one factor at a time, as well as the joint impact of those factors. Instead of running a new study every time you want to see how an independent variable affects a specific dependent variable, you can run one experiment with two independent variables, see how each of them affects the dependent variable, and see whether the two independent variables do anything together to affect it. These are called main effects and interactions. Continuing the example, if fertilizers A, B, and C produce different crop yields from one another regardless of seed type, we would say there is a main effect of fertilizer type. If seeds Y and Z produce different crop yields regardless of fertilizer type, we would say there is a main effect of seed type. If the two factors influence each other (for example, if one of the fertilizers works much better specifically when paired with Y seeds), we would say there is an interaction. The defining characteristic of an interaction is that the effect of one factor depends on the level of the second factor: the second factor can amplify or reduce the first factor's effect depending on its level.
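For a balanced design (the same number of scores in every cell), the two main effects and the interaction can be separated by partitioning the sums of squares. The plain-Python sketch below does this; the crop yields are invented purely for illustration, and the function name and result keys are hypothetical. In this made-up data, fertilizer C boosts yield mainly with Y seeds, which shows up as a nonzero interaction term. It computes F ratios only, not p-values.

```python
from itertools import product
from statistics import mean


def two_factor_anova(data: dict[tuple[str, str], list[float]]) -> dict:
    """Sums of squares, df, and F ratios for a balanced two-factor design.

    `data` maps (level of factor 1, level of factor 2) -> scores in that cell.
    Assumes every cell has the same number of scores.
    """
    levels_1 = sorted({l1 for l1, _ in data})
    levels_2 = sorted({l2 for _, l2 in data})
    n = len(next(iter(data.values())))  # scores per cell
    scores = [x for cell in data.values() for x in cell]
    grand = mean(scores)

    ss_total = sum((x - grand) ** 2 for x in scores)
    # Between treatments: how far each cell mean is from the grand mean.
    ss_cells = sum(n * (mean(data[l1, l2]) - grand) ** 2 for l1, l2 in product(levels_1, levels_2))
    ss_within = ss_total - ss_cells
    # Main effects: marginal means of each factor, pooled over the other factor.
    ss_1 = sum(
        n * len(levels_2) * (mean([x for l2 in levels_2 for x in data[l1, l2]]) - grand) ** 2
        for l1 in levels_1
    )
    ss_2 = sum(
        n * len(levels_1) * (mean([x for l1 in levels_1 for x in data[l1, l2]]) - grand) ** 2
        for l2 in levels_2
    )
    # Interaction: between-treatment variability not explained by either main effect.
    ss_int = ss_cells - ss_1 - ss_2

    df = {"factor_1": len(levels_1) - 1, "factor_2": len(levels_2) - 1}
    df["interaction"] = df["factor_1"] * df["factor_2"]
    df["within"] = len(scores) - len(levels_1) * len(levels_2)
    ms_within = ss_within / df["within"]
    ss = {"factor_1": ss_1, "factor_2": ss_2, "interaction": ss_int,
          "within": ss_within, "total": ss_total}
    f = {k: (ss[k] / df[k]) / ms_within for k in ("factor_1", "factor_2", "interaction")}
    return {"SS": ss, "df": df, "F": f}


# Invented crop yields: factor 1 = fertilizer, factor 2 = seed type, 3 plots per cell.
yields = {
    ("A", "Y"): [20.0, 22.0, 21.0], ("A", "Z"): [18.0, 17.0, 19.0],
    ("B", "Y"): [24.0, 25.0, 23.0], ("B", "Z"): [19.0, 20.0, 18.0],
    ("C", "Y"): [30.0, 29.0, 31.0], ("C", "Z"): [20.0, 21.0, 19.0],
}
result = two_factor_anova(yields)
for effect, f_ratio in result["F"].items():
    print(effect, round(f_ratio, 2))
```

A useful check is that the four component sums of squares (two main effects, interaction, within) add back up to the total.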
Statistics: Repeated Measures ANOVA
Repeated Measures One-way ANOVA
Just as we distinguished between independent samples t-tests and repeated measures t-tests, ANOVAs make the same distinction. Independent one-way ANOVAs use samples that are in no way related to each other; each sample is completely random, uses different individuals, and those individuals are not paired in any meaningful way. In a repeated measures one-way ANOVA, the same individuals can take part in multiple treatment conditions, or individuals can be paired with others based on important characteristics or on a relationship to one another (twins, siblings, couples, etc.). What's important to remember is that a repeated measures one-way ANOVA still lets us work with more than two levels, rather than being limited to two as with a t-test.
Advantages:
- Individual differences among participants have little or no influence on the outcome, because participants are either matched on important characteristics or are the same person in multiple conditions.
- Fewer subjects are needed to test all the treatments.
- Ability to assess an effect over time.
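The first advantage can be seen directly in the arithmetic. In one common formulation of the repeated measures ANOVA, the variability due to individual participants (SS_subjects) is pulled out of the error term before the F ratio is formed. Below is a minimal Python sketch under that formulation; the scores are invented and the function name is hypothetical.

```python
from statistics import mean


def repeated_measures_anova(scores: list[list[float]]) -> dict:
    """One-way repeated measures ANOVA.

    `scores[i][j]` is participant i's score in treatment condition j.
    """
    n = len(scores)      # participants
    k = len(scores[0])   # treatment conditions
    all_scores = [x for row in scores for x in row]
    grand = mean(all_scores)

    ss_total = sum((x - grand) ** 2 for x in all_scores)
    condition_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_between = n * sum((m - grand) ** 2 for m in condition_means)
    # Individual differences: each participant's own mean vs. the grand mean.
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in scores)
    # What remains after removing treatment and participant effects is error.
    ss_error = ss_total - ss_between - ss_subjects

    df_between = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_between / df_between) / (ss_error / df_error)
    return {"SS_between": ss_between, "SS_subjects": ss_subjects,
            "SS_error": ss_error, "df": (df_between, df_error), "F": f}


# Invented scores: 4 participants, each measured in 3 conditions.
scores_by_participant = [
    [3, 5, 7],
    [2, 4, 6],
    [4, 5, 9],
    [3, 6, 8],
]
result = repeated_measures_anova(scores_by_participant)
print(f"F{result['df']} = {result['F']:.2f}")  # F(2, 6) = 61.00
```

If the same numbers came from independent groups instead, the 7.0 units of participant variability (SS_subjects) would stay in the error term, and F would be smaller.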
Statistics: Independent One-Way ANOVA
Independent One-way ANOVA
An ANOVA (ANalysis Of VAriance) is a test that is run either to compare multiple independent variables with two or more levels each, or to compare one independent variable with more than two levels. You can technically also run an ANOVA in the same cases where you would run a t-test and reach the same conclusion (with two groups, F = t²), but this isn't common practice, as t-tests are easier to compute by hand.
For the purposes of this post, a one-way ANOVA is a test that compares the means of more than two samples that represent different levels of the same independent variable. An example might be comparing the growth of plants that receive no water (Group 1), a little water (Group 2), a moderate amount of water (Group 3), and a lot of water (Group 4).
A factor is another name for an independent variable. As mentioned earlier, ANOVAs can have more than one factor, but for now we're working with only one, just as before. A level is a group within that independent variable. In the plant example, the groups the plants are placed in are the levels (no water, a little water, a moderate amount of water, a lot of water), and the independent variable itself is water amount.
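The F ratio of an independent one-way ANOVA compares variability between group means to variability within groups. The Python sketch below computes it from scratch for invented plant-growth numbers (four water levels, four plants each), then checks the two-group case against a pooled-variance t-test to show F = t². The helper names `one_way_anova` and `pooled_t` are hypothetical.

```python
import math
from statistics import mean


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Independent one-way ANOVA; returns (F, df_between, df_within)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within groups: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def pooled_t(g1: list[float], g2: list[float]) -> float:
    """Independent-samples t with pooled variance, used for the two-group check."""
    ss1 = sum((x - mean(g1)) ** 2 for x in g1)
    ss2 = sum((x - mean(g2)) ** 2 for x in g2)
    pooled_var = (ss1 + ss2) / (len(g1) + len(g2) - 2)
    se = math.sqrt(pooled_var / len(g1) + pooled_var / len(g2))
    return (mean(g1) - mean(g2)) / se


# Invented growth (cm) for each water level.
water = [
    [2.0, 3.0, 1.0, 2.0],  # no water
    [4.0, 5.0, 4.0, 3.0],  # a little water
    [6.0, 7.0, 6.0, 5.0],  # a moderate amount of water
    [5.0, 6.0, 4.0, 5.0],  # a lot of water
]
f, df_b, df_w = one_way_anova(water)
print(f"F({df_b}, {df_w}) = {f:.2f}")  # F(3, 12) = 17.50

# With only two groups, the ANOVA F equals the squared t statistic.
f2, _, _ = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
t = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(math.isclose(f2, t ** 2))  # True
```

The F = t² relationship holds for the pooled-variance (equal-variance) t-test, which is the version used here.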
Statistics: Repeated Measures t-test
Repeated Measures T-Test
A repeated measures or paired samples design is all about minimizing confounding variables such as participant characteristics, either by using the same person in multiple levels of a factor or by pairing participants across groups based on similar characteristics or relationships and then having them take part in different treatments. Matched subjects is another term for this kind of design, used specifically when different people are matched up by their characteristics. Participants are often matched by age, gender, race, socioeconomic status, or other demographic features, but they can also be matched on other characteristics the researchers consider possible confounds. Twin studies are a good example of this kind of design: each twin must be matched with their own twin, not with someone else's.
To reiterate the differences between a repeated measures t-test and the other kinds of tests you may have learned up to this point: a single sample t-test draws conclusions about a treated population based on a sample mean and an untreated population mean (the population standard deviation is unknown). Independent samples t-tests compare the means of two samples (usually a control/untreated group and a treated group) to draw inferences about differences between those two groups in the broader population, with different, randomly assigned participants in each group. Related samples t-tests are like independent samples t-tests except that they use the same person in multiple test groups, or match people based on their characteristics or relationships, to cut down on extraneous variables that may interfere with the data.
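A repeated measures t-test is usually computed on difference scores: each participant (or matched pair) contributes one difference, and the test asks whether the mean difference is far from 0. Here is a minimal Python sketch assuming a before/after design; the scores and the function name `paired_t` are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Repeated measures (paired) t-test on difference scores; returns (t, df)."""
    # One difference per participant, so stable individual differences
    # shared by both measurements cancel out.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    s_md = stdev(diffs) / math.sqrt(n)  # estimated standard error of the mean difference
    # Null hypothesis: the population mean difference is 0.
    t = mean(diffs) / s_md
    return t, n - 1


# Invented scores for 5 participants measured before and after a treatment.
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 15, 18]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")  # t(4) = 4.24
```

Notice that df depends on the number of pairs, not the total number of scores.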
Statistics: Independent t-tests
Independent t-test
In our last post, we talked about single sample t-tests, which compare the mean of a population with the mean of a sample to look for a difference. With two-sample t-tests, we are instead trying to find a difference between two sample means. More specifically, independent t-tests compare the means of two samples made up of entirely different individuals, for example, a group of pet owners vs. a group of people who don't own pets. These two groups are completely independent of one another. This distinction will be important in a later post.
A more technical explanation of the difference between a single sample and a two-sample t-test is that a single sample t-test draws conclusions about a treated population based on a sample mean and an untreated population mean (the population standard deviation is unknown). Independent samples t-tests compare the means of two samples (usually a control/untreated group and a treated group) to draw inferences about differences between those two groups in the broader population.
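One common version of the independent t-test pools the two sample variances, which assumes both populations have equal variances. The Python sketch below follows that pooled-variance version; the pet-owner scores are invented and `independent_t` is a hypothetical helper name.

```python
import math
from statistics import mean


def independent_t(group1: list[float], group2: list[float]) -> tuple[float, int]:
    """Independent-samples t-test with pooled variance; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    ss1 = sum((x - mean(group1)) ** 2 for x in group1)
    ss2 = sum((x - mean(group2)) ** 2 for x in group2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    # Estimated standard error of the difference between the two sample means.
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean(group1) - mean(group2)) / se_diff, df


# Invented well-being scores for two unrelated groups of people.
pet_owners = [7.0, 8.0, 6.0, 9.0, 7.0]
non_owners = [5.0, 6.0, 6.0, 4.0, 5.0]
t, df = independent_t(pet_owners, non_owners)
print(f"t({df}) = {t:.2f}")  # t(8) = 3.48
```

If the equal-variance assumption is doubtful, Welch's t-test is the usual alternative; it uses separate variances and adjusted degrees of freedom.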
There are some distinct advantages and disadvantages to this approach compared to other approaches. To avoid confusion, we won't describe the other approaches here but will simply list the advantages and disadvantages of this one for your consideration:
Statistics: Introduction to the t-statistic
Introduction to the t-statistic
Z-tests vs. t-tests
Z-tests compare the means of a population and a sample and require information about the population that is usually unavailable, namely its variance/standard deviation. Single sample t-tests also compare a population mean to a sample mean, but they require only one variance/standard deviation, and it comes from the sample. This is where estimated standard error comes in. It is used as an estimate of the real standard error, $\sigma_M$, when the value of $\sigma$ is unknown. It is computed from the sample variance or sample standard deviation and estimates the standard distance between a sample mean, $M$, and the population mean, $\mu$ (or rather, the mean of the distribution of sample means). It's called an "error" because it measures sampling error: how far a sample mean typically falls from the population mean it is meant to estimate. It's "estimated" because it uses the sample standard deviation in place of the unknown population standard deviation. The formula for estimated standard error is $s_M = \frac{s}{\sqrt{n}}$.
The formula for the single sample t-test itself is:
$$t = \frac{M - \mu}{s/\sqrt{n}}$$
with the bottom portion referring to the estimated standard error. You may see the denominator written as $s_M$ instead.
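The only computational difference between the z-test and the single sample t-test is which standard error goes in the denominator. The Python sketch below puts the two side by side; the sample values, the population mean of 5, and the "known" σ of 2 are all invented for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def z_test(sample: list[float], mu: float, sigma: float) -> float:
    """z statistic: requires the population standard deviation sigma."""
    sigma_m = sigma / math.sqrt(len(sample))  # real standard error
    return (mean(sample) - mu) / sigma_m


def single_sample_t(sample: list[float], mu: float) -> tuple[float, int]:
    """Single sample t statistic: sigma is unknown, so it is estimated with s."""
    n = len(sample)
    s = stdev(sample)       # sample standard deviation (divides by n - 1)
    s_m = s / math.sqrt(n)  # estimated standard error
    return (mean(sample) - mu) / s_m, n - 1


sample = [5, 7, 6, 8, 9]
t, df = single_sample_t(sample, mu=5)
print(f"t({df}) = {t:.2f}")  # t(4) = 2.83
print(f"z = {z_test(sample, mu=5, sigma=2):.2f}")  # z = 2.24
```

Because s is itself estimated from the sample, the t statistic is compared against a t distribution with n - 1 degrees of freedom rather than the normal distribution used for z.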