0 mg
In conducting this experiment, the experimenter had two research questions:
To answer these questions, the experimenter intends to use one-way analysis of variance.
Before you crunch the first number in one-way analysis of variance, you must be sure that one-way analysis of variance is the correct technique. That means you need to ask two questions:
Let's address both of those questions.
As we discussed in the previous lesson (see One-Way Analysis of Variance: Fixed Effects ), one-way analysis of variance is only appropriate with one experimental design - a completely randomized design. That is exactly the design used in our cholesterol study, so we can check the experimental design box.
We also learned in the previous lesson that one-way analysis of variance makes three critical assumptions:
Therefore, for the cholesterol study, we need to make sure our data set is consistent with the critical assumptions.
The assumption of independence is the most important assumption. When that assumption is violated, the resulting statistical tests can be misleading.
The independence assumption is satisfied by the design of the study, which features random selection of subjects and random assignment to treatment groups. Randomization tends to distribute effects of extraneous variables evenly across groups.
Violations of normality can be a problem when sample size is small, as it is in this cholesterol study. Therefore, it is important to be on the lookout for any indication of non-normality.
There are many different ways to check for normality. On this website, we describe three at: How to Test for Normality: Three Simple Tests . Given the small sample size, our best option for testing normality is to look at the following descriptive statistics:
The table below shows the mean, median, skewness, and kurtosis for each group from our study.
Statistic | Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
---|---|---|---|
Mean | 258 | 246 | 210 |
Median | 270 | 240 | 210 |
Range | 90 | 60 | 60 |
Skewness | -0.40 | -0.51 | 0.00 |
Kurtosis | -0.18 | -0.61 | 2.00 |
In all three groups, the difference between the mean and median looks small (relative to the range ). And skewness and kurtosis measures are consistent with a normal distribution (i.e., between -2 and +2). These are crude tests, but they provide some confidence for the assumption of normality in each group.
Note: With Excel, you can easily compute the descriptive statistics in Table 1. To see how, go to: How to Test for Normality: Example 1 .
When the normality assumption is satisfied, you can use Hartley's Fmax test to test for homogeneity of variance. Here's how to implement the test:
$$s_j^2 = \frac{\sum_{i=1}^{n_j} \left( X_{i,j} - \bar{X}_j \right)^2}{n_j - 1}$$
where X i, j is the score for observation i in Group j , X j is the mean of Group j , and n j is the number of observations in Group j .
Here is the variance ( s 2 j ) for each group in the cholesterol study.
Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
---|---|---|
1170 | 630 | 450 |
F RATIO = s 2 MAX / s 2 MIN
F RATIO = 1170 / 450
F RATIO = 2.6
where s 2 MAX is the largest group variance, and s 2 MIN is the smallest group variance.
where n is the largest sample size in any group.
Note: The critical F values in the table are based on a significance level of 0.05.
Here, the F ratio (2.6) is smaller than the Fmax value (15.5), so we conclude that the variances are homogeneous.
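As a quick check of the arithmetic above, here is a minimal Python sketch of the Fmax comparison. The group variances and the critical value (15.5) are copied from the text; the function name `hartley_f_ratio` is only an illustrative choice, and the critical value is not computed by the code.

```python
# Group variances reported in the text for the cholesterol study.
variances = {"0 mg": 1170, "50 mg": 630, "100 mg": 450}

# Critical Fmax value quoted in the text (significance level 0.05).
# It is taken from the source table, not derived here.
FMAX_CRITICAL = 15.5


def hartley_f_ratio(group_variances):
    """Return the largest group variance divided by the smallest."""
    values = list(group_variances)
    return max(values) / min(values)


f_ratio = hartley_f_ratio(variances.values())

# Variances are treated as homogeneous when the ratio is below the critical value.
homogeneous = f_ratio < FMAX_CRITICAL
print(f"F ratio = {f_ratio:.1f}, homogeneous = {homogeneous}")
```

With the variances from the study, the ratio falls well below the critical value, which matches the conclusion in the text.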
Note: Other tests, such as Bartlett's test , can also test for homogeneity of variance. For the record, Bartlett's test yields the same conclusion for the cholesterol study; namely, the variances are homogeneous.
Having confirmed that the critical assumptions are tenable, we can proceed with a one-way analysis of variance. That means taking the following steps:
Now, let's execute each step, one-by-one, with our cholesterol medication experiment.
For every experimental design, there is a mathematical model that accounts for all of the independent and extraneous variables that affect the dependent variable. In our experiment, the dependent variable ( X ) is the cholesterol level of a subject, and the independent variable ( β ) is the dosage level administered to a subject.
For example, here is the fixed-effects model for a completely randomized design:
X i j = μ + β j + ε i ( j )
where X i j is the cholesterol level for subject i in treatment group j , μ is the population mean, β j is the effect of the dosage level administered to subjects in group j ; and ε i ( j ) is the effect of all other extraneous variables on subject i in treatment j .
For fixed-effects models, it is common practice to write statistical hypotheses in terms of the treatment effect β j . With that in mind, here is the null hypothesis and the alternative hypothesis for a one-way analysis of variance:
H 0 : β j = 0 for all j
H 1 : β j ≠ 0 for some j
If the null hypothesis is true, the mean score (i.e., mean cholesterol level) in each treatment group should equal the population mean. Thus, if the null hypothesis is true, mean scores in the k treatment groups should be equal. If the null hypothesis is false, at least one pair of mean scores should be unequal.
The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it is actually true. The significance level for an experiment is specified by the experimenter, before data collection begins.
Experimenters often choose significance levels of 0.05 or 0.01. For this experiment, let's use a significance level of 0.05.
Analysis of variance begins by computing a grand mean and group means:
X = ( 1 / 15 ) * ( 210 + 210 + ... + 270 + 240 ) = 238
X 1 = 258
X 2 = 246
X 3 = 210
In the equations above, n is the total sample size across all groups; and n j is the sample size in Group j .
A sum of squares is the sum of squared deviations from a mean score. One-way analysis of variance makes use of three sums of squares:
SSB = 5 * [ ( 238-258 ) 2 + ( 238-246) 2 + ( 238-210 ) 2 ] = 6240
SSW = 2304 + ... + 900 = 9000
SST = 784 + 4 + 1024 + ... + 784 + 784 + 4
SST = 15,240
It turns out that the total sum of squares is equal to the between-groups sum of squares plus the within-groups sum of squares, as shown below:
SST = SSB + SSW
15,240 = 6240 + 9000
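The partition can be verified from the summary statistics alone. The sketch below assumes only the group means, group variances, and group size (5) reported earlier in this lesson. Because the raw scores are not listed here, the total sum of squares is obtained from the identity SST = SSB + SSW rather than from the individual deviations.

```python
# Summary statistics reported in the text for the three dosage groups.
n_per_group = 5
group_means = [258, 246, 210]
group_variances = [1170, 630, 450]

k = len(group_means)

# With equal group sizes, the grand mean is the simple average of the group means.
grand_mean = sum(group_means) / k

# Between-groups sum of squares: squared deviations of group means from the
# grand mean, weighted by group size.
ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)

# Within-groups sum of squares: each sample variance times its (n_j - 1).
ssw = sum((n_per_group - 1) * v for v in group_variances)

# Total sum of squares via the partition identity.
sst = ssb + ssw

print(f"grand mean = {grand_mean}, SSB = {ssb}, SSW = {ssw}, SST = {sst}")
```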
The term degrees of freedom (df) refers to the number of independent sample points used to compute a statistic minus the number of parameters estimated from the sample points.
To illustrate what is going on, let's find the degrees of freedom associated with the various sum of squares computations:
Here, the formula uses k independent sample points, the sample means X j . And it uses one parameter estimate, the grand mean X , which was estimated from the sample points. So, the between-groups sum of squares has k - 1 degrees of freedom ( df BG ).
df BG = k - 1 = 3 - 1 = 2
Here, the formula uses n independent sample points, the individual subject scores X i j . And it uses k parameter estimates, the group means X j , which were estimated from the sample points. So, the within-groups sum of squares has n - k degrees of freedom ( df WG ).
n = Σ n j = 5 + 5 + 5 = 15
df WG = n - k = 15 - 3 = 12
Here, the formula uses n independent sample points, the individual subject scores X i j . And it uses one parameter estimate, the grand mean X , which was estimated from the sample points. So, the total sum of squares has n - 1 degrees of freedom ( df TOT ).
df TOT = n - 1 = 15 - 1 = 14
The degrees of freedom for each sum of squares are summarized in the table below:
Sum of squares | Degrees of freedom |
---|---|
Between-groups | k - 1 = 2 |
Within-groups | n - k = 12 |
Total | n - 1 = 14 |
A mean square is an estimate of population variance. It is computed by dividing a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:
MS = SS / df
To conduct a one-way analysis of variance, we are interested in two mean squares:
MS WG = SSW / df WG
MS WG = 9000 / 12 = 750
MS BG = SSB / df BG
MS BG = 6240 / 2 = 3120
The expected value of a mean square is the average value of the mean square over a large number of experiments.
Statisticians have derived formulas for the expected value of the within-groups mean square ( MS WG ) and for the expected value of the between-groups mean square ( MS BG ). For one-way analysis of variance, the expected value formulas are:
E( MS WG ) = σ ε 2
$$E(MS_{BG}) = \sigma_\varepsilon^2 + \frac{\sum_{j=1}^{k} n_j \beta_j^2}{k - 1}$$
E( MS BG ) = σ ε 2 + nσ β 2
In the equations above, E( MS WG ) is the expected value of the within-groups mean square; E( MS BG ) is the expected value of the between-groups mean square; n is total sample size; k is the number of treatment groups; β j is the treatment effect in Group j ; σ ε 2 is the variance attributable to everything except the treatment effect (i.e., all the extraneous variables); and σ β 2 is the variance due to random selection of treatment levels.
Notice that MS BG should equal MS WG when the variation due to treatment effects ( β j for fixed effects and σ β 2 for random effects) is zero (i.e., when the independent variable does not affect the dependent variable). And MS BG should be bigger than MS WG when the variation due to treatment effects is not zero (i.e., when the independent variable does affect the dependent variable).
Conclusion: By examining the relative size of the mean squares, we can make a judgment about whether an independent variable affects a dependent variable.
Suppose we use the mean squares to define a test statistic F as follows:
F(v 1 , v 2 ) = MS BG / MS WG
F(2, 12) = 3120 / 750 = 4.16
where MS BG is the between-groups mean square, MS WG is the within-groups mean square, v 1 is the degrees of freedom for MS BG , and v 2 is the degrees of freedom for MS WG .
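For readers who want to reproduce the F ratio, here is a small Python sketch that starts from the sums of squares and sample sizes given in the text. The variable names are illustrative only.

```python
# Sums of squares and sample sizes from the cholesterol study.
ssb, ssw = 6240, 9000
k, n = 3, 15  # number of groups, total sample size

# Degrees of freedom for each sum of squares.
df_bg = k - 1  # between groups
df_wg = n - k  # within groups

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_bg = ssb / df_bg
ms_wg = ssw / df_wg

# The test statistic compares between-groups to within-groups variation.
f_ratio = ms_bg / ms_wg
print(f"F({df_bg}, {df_wg}) = {f_ratio:.2f}")
```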
Defined in this way, the F ratio measures the size of MS BG relative to MS WG . The F ratio is a convenient measure that we can use to test the null hypothesis. Here's how:
What does it mean for the F ratio to be significantly greater than one? To answer that question, we need to talk about the P-value.
In an experiment, a P-value is the probability of obtaining a result more extreme than the observed experimental outcome, assuming the null hypothesis is true.
With analysis of variance, the F ratio is the observed experimental outcome that we are interested in. So, the P-value would be the probability that an F statistic would be more extreme (i.e., bigger) than the actual F ratio computed from experimental data.
We can use Stat Trek's F Distribution Calculator to find the probability that an F statistic will be bigger than the actual F ratio observed in the experiment. Enter the between-groups degrees of freedom (2), the within-groups degrees of freedom (12), and the observed F ratio (4.16) into the calculator; then, click the Calculate button.
From the calculator, we see that the P ( F > 4.16 ) equals about 0.04. Therefore, the P-Value is 0.04.
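If an online calculator is not at hand, the same probability can be sketched in Python. The function below relies on a closed form that holds only when the numerator degrees of freedom equal 2, which happens to be the case in this experiment: P(F > x) = (1 + 2x / v2)^(-v2 / 2). The function name is hypothetical. For other degrees of freedom, a statistics library would be needed (for example, SciPy's `scipy.stats.f.sf`, which is not used here).

```python
def f_upper_tail_two_numerator_df(f_value, df_within):
    """Return P(F > f_value) for an F distribution with (2, df_within) df.

    This closed form is valid only when the numerator degrees of freedom
    equal 2; it is not a general F-distribution routine.
    """
    if f_value < 0 or df_within <= 0:
        raise ValueError("f_value must be >= 0 and df_within must be > 0")
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


# Observed F ratio and within-groups degrees of freedom from the text.
p_value = f_upper_tail_two_numerator_df(4.16, 12)
print(f"P(F > 4.16) is approximately {p_value:.3f}")
```

The result rounds to 0.04, in line with the calculator value reported above.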
Recall that we specified a significance level of 0.05 for this experiment. Once you know the significance level and the P-value, the hypothesis test is routine. Here's the decision rule for rejecting or failing to reject the null hypothesis:
Since the P-value (0.04) in our experiment is smaller than the significance level (0.05), we reject the null hypothesis that drug dosage had no effect on cholesterol level. And we conclude that the mean cholesterol level in at least one treatment group differed significantly from the mean cholesterol level in another group.
The hypothesis test tells us whether the independent variable in our experiment has a statistically significant effect on the dependent variable, but it does not address the magnitude of the effect. Here's the issue:
With this in mind, it is customary to supplement analysis of variance with an appropriate measure of effect size. Eta squared (η 2 ) is one such measure. Eta squared is the proportion of variance in the dependent variable that is explained by a treatment effect. The eta squared formula for one-way analysis of variance is:
η 2 = SSB / SST
where SSB is the between-groups sum of squares and SST is the total sum of squares.
Given this formula, we can compute eta squared for this drug dosage experiment, as shown below:
η 2 = SSB / SST = 6240 / 15240 = 0.41
Thus, 41 percent of the variance in our dependent variable (cholesterol level) can be explained by variation in our independent variable (dosage level). It appears that the relationship between dosage level and cholesterol level is significant not only in a statistical sense; it is significant in a practical sense as well.
It is traditional to summarize ANOVA results in an analysis of variance table. The analysis that we just conducted provides all of the information that we need to produce the following ANOVA summary table:
Analysis of Variance Table
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
BG | 6,240 | 2 | 3,120 | 4.16 | 0.04 |
WG | 9,000 | 12 | 750 | | |
Total | 15,240 | 14 | | | |
This ANOVA table allows any researcher to interpret the results of the experiment, at a glance.
The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the F ratio shown in the table, assuming the null hypothesis is true. When the P-value is bigger than the significance level, we fail to reject the null hypothesis; when it is smaller, we reject it. Here, the P-value (0.04) is smaller than the significance level (0.05), so we reject the null hypothesis.
To assess the strength of the treatment effect, an experimenter might compute eta squared (η 2 ). The computation is easy, using sum of squares entries from the ANOVA table, as shown below:
η 2 = SSB / SST = 6,240 / 15,240 = 0.41
For this experiment, an eta squared of 0.41 means that 41% of the variance in the dependent variable can be explained by the effect of the independent variable.
In this lesson, we showed all of the hand calculations for a one-way analysis of variance. In the real world, researchers seldom conduct analysis of variance by hand. They use statistical software. In the next lesson, we'll analyze data from this problem with Excel. Hopefully, we'll get the same result.
The purpose of a one-way ANOVA test is to determine the existence of a statistically significant difference among several group means. The test uses variances to help determine if the means are equal or not. To perform a one-way ANOVA test, there are five basic assumptions to be fulfilled:
The null hypothesis is that all the group population means are the same. The alternative hypothesis is that at least one pair of means is different. For example, if there are k groups:
H 0 : μ 1 = μ 2 = μ 3 = ... = μ k
H a : At least two of the group means μ 1 , μ 2 , μ 3 , ..., μ k are not equal. That is, μ i ≠ μ j for some i ≠ j .
The graphs, a set of box plots representing the distribution of values with the group means indicated by a horizontal line through the box, help in the understanding of the hypothesis test. In the first graph (red box plots), H 0 : μ 1 = μ 2 = μ 3 and the three populations have the same distribution if the null hypothesis is true. The variance of the combined data is approximately the same as the variance of each of the populations.
If the null hypothesis is false, then the variance of the combined data is larger, which is caused by the different means as shown in the second graph (green box plots).
Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.
© Apr 16, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.
ANOVA, short for Analysis of Variance, is a statistical method used to see if there are significant differences between the averages of three or more unrelated groups. This technique is especially useful when comparing more than two groups, which is a limitation of other tests like the t-test and z-test. For example, ANOVA can compare average IQ scores across several countries—like the US, Canada, Italy, and Spain—to see if nationality influences IQ scores. Ronald Fisher developed ANOVA in 1918, expanding the capabilities of previous tests by allowing for the comparison of multiple groups at once. This method is also referred to as Fisher’s analysis of variance, highlighting its ability to analyze how a categorical variable with multiple levels affects a continuous variable.
The use of ANOVA depends on the research design. Commonly, ANOVAs are used in three ways: one-way ANOVA , two-way ANOVA , and N-way ANOVA.
One-Way ANOVA is a statistical method used when we’re looking at the impact of one single factor on a particular outcome. For instance, if we want to explore how IQ scores vary by country, that’s where One-Way ANOVA comes into play. The “one way” part means we’re only considering one independent variable, which in this case is the country, but remember, this country variable can include any number of categories, from just two countries to twenty or more.
Moving a step further, Two-Way ANOVA, also known as factorial ANOVA, allows us to examine the effect of two different factors on an outcome simultaneously. Building on our previous example, we could look at how both country and gender influence IQ scores. This method doesn’t just tell us about the individual effects of each factor but also lets us explore interactions between them. An interaction effect means the impact of one factor might change depending on the level of the other factor. For example, the difference in IQ scores between genders might vary from one country to another, suggesting that the effect of gender on IQ is not consistent across all countries.
When researchers have more than two factors to consider, they turn to N-Way ANOVA, where “n” represents the number of independent variables in the analysis. This could mean examining how IQ scores are influenced by a combination of factors like country, gender, age group, and ethnicity all at once. N-Way ANOVA allows for a comprehensive analysis of how these multiple factors interact with each other and their combined effect on the dependent variable, providing a deeper understanding of the dynamics at play.
In summary, ANOVA is a versatile statistical tool that scales from analyzing the effect of one factor (One-Way ANOVA) to multiple factors (Two-Way or N-Way ANOVA) on an outcome. By using ANOVA, researchers can uncover not just the direct effects of independent variables on a dependent variable but also how these variables interact with each other, offering rich insights into complex phenomena.
Omnibus ANOVA test:
The null hypothesis for an ANOVA is that there is no significant difference among the groups. The alternative hypothesis assumes that there is at least one significant difference among the groups. After cleaning the data, the researcher must test the assumptions of ANOVA. They must then calculate the F -ratio and the associated probability value ( p -value). In general, if the p -value associated with the F is smaller than .05, then the null hypothesis is rejected and the alternative hypothesis is supported. If the null hypothesis is rejected, one concludes that not all of the group means are equal. Post-hoc tests tell the researcher which groups are different from each other.
So what if you find statistical significance? Multiple comparison tests
When you conduct an ANOVA, you are attempting to determine if there is a statistically significant difference among the groups. If you find that there is a difference, you will then need to examine where the group differences lie.
At this point you could run post-hoc tests, which are t tests examining mean differences between the groups. There are several multiple comparison tests that can be conducted that will control for Type I error rate, including the Bonferroni , Scheffé, Dunnett, and Tukey tests.
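To make the idea of controlling the Type I error rate concrete, here is a minimal Python sketch of the Bonferroni adjustment, which divides the overall significance level by the number of pairwise comparisons. The helper name `bonferroni_pairs` is hypothetical, and the group labels are borrowed from the GPA example in this article purely for illustration.

```python
from itertools import combinations


def bonferroni_pairs(groups, alpha=0.05):
    """List every pair of groups and the per-comparison significance level.

    The Bonferroni adjustment divides the overall alpha by the number of
    pairwise comparisons, k * (k - 1) / 2 for k groups.
    """
    pairs = list(combinations(groups, 2))
    if not pairs:
        raise ValueError("at least two groups are needed for a comparison")
    return pairs, alpha / len(pairs)


pairs, per_test_alpha = bonferroni_pairs(["freshmen", "sophomores", "juniors"])
for first, second in pairs:
    print(f"{first} vs. {second}: test at alpha = {per_test_alpha:.4f}")
```

With three groups there are three pairwise comparisons, so each t test would be evaluated at one third of the overall significance level.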
One-way ANOVA: Are there differences in GPA by grade level (freshmen vs. sophomores vs. juniors)?
Two-way ANOVA: Are there differences in GPA by grade level (freshmen vs. sophomores vs. juniors) and gender (male vs. female)?
The level of measurement of the variables and assumptions of the test play an important role in ANOVA. In ANOVA, the dependent variable must be a continuous (interval or ratio) level of measurement. The independent variables in ANOVA must be categorical (nominal or ordinal) variables. Like the t -test, ANOVA is also a parametric test and has some assumptions. ANOVA assumes that the data is normally distributed. The ANOVA also assumes homogeneity of variance, which means that the variance among the groups should be approximately equal. ANOVA also assumes that the observations are independent of each other. Researchers should keep in mind when planning any study to look out for extraneous or confounding variables. ANOVA has methods (i.e., ANCOVA) to control for confounding variables.
Testing of the Assumptions
These assumptions can be tested using statistical software (like Intellectus Statistics!). The assumption of homogeneity of variance can be tested using tests such as Levene’s test or the Brown-Forsythe Test. Normality of the distribution of the scores can be tested using histograms, the values of skewness and kurtosis, or using tests such as Shapiro-Wilk or Kolmogorov-Smirnov. The assumption of independence can be determined from the design of the study.
It is important to note that ANOVA is not robust to violations of the assumption of independence. That is, even if you violate the assumptions of homogeneity or normality, you can often still conduct the test and basically trust the findings, but the results of the ANOVA are invalid if the independence assumption is violated. In general, with violations of homogeneity, the analysis is considered robust if you have equal-sized groups. With violations of normality, continuing with the ANOVA is generally acceptable if you have a large sample size.
Researchers have extended ANOVA in MANOVA and ANCOVA. MANOVA stands for the multivariate analysis of variance. MANOVA is used when there are two or more dependent variables. ANCOVA is the term for analysis of covariance. The ANCOVA is used when the researcher includes one or more covariate variables in the analysis.
To Reference this Page : Statistics Solutions. (2013). ANOVA . Retrieved from https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/anova/
Part of the book series: Springer Undergraduate Mathematics Series ((SUMS))
In this chapter, a method for the analysis of an experiment that has more than two groups of observations is described. The main objective is to determine if there are significant differences among the population means of the groups, which are assumed to be random samples from normally distributed populations. The analysis is based on an examination of variation between and within groups, and is often called the one-way analysis of variance (or one-way ANOVA for short). We explicitly consider all possible sources of variation before carefully explaining how an ANOVA is conducted.
Authors and affiliations.
School of Mathematics, Cardiff University, Cardiff, UK
Jonathan Gillard
© 2020 Springer Nature Switzerland AG
Gillard, J. (2020). One-Way Analysis of Variance (ANOVA). In: A First Course in Statistical Inference. Springer Undergraduate Mathematics Series. Springer, Cham. https://doi.org/10.1007/978-3-030-39561-2_6
DOI : https://doi.org/10.1007/978-3-030-39561-2_6
Published : 21 April 2020
Publisher Name : Springer, Cham
Print ISBN : 978-3-030-39560-5
Online ISBN : 978-3-030-39561-2
eBook Packages : Mathematics and Statistics Mathematics and Statistics (R0)
In the case where one is dealing with $k \ge 3$ samples all of the same size $n$, the calculations involved are much simpler, so let us consider this scenario first.
The strategy behind an ANOVA test relies on estimating the common population variance in two different ways: 1) through the mean of the sample variances -- called the variance within samples and denoted $s^2_w$, and 2) through the variance of the sample means -- called the variance between samples and denoted $s^2_b$.
When the means are not significantly different, the variance of the sample means will be small, relative to the mean of the sample variances. When at least one mean is significantly different from the others, the variance of the sample means will be larger, relative to the mean of the sample variances.
Consequently, precisely when at least one mean is significantly different from the others, the ratio of these estimates $$F = \frac{s^2_b}{s^2_w}$$ which follows an $F$-distribution, will be large (i.e., somewhere in the right tail of the distribution).
To calculate the variance of the sample means, recall that the Central Limit Theorem tells us that $$\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$$ Solving for the variance, $\sigma^2$, we find $$\sigma^2 = n\sigma^2_{\overline{x}}$$ Thus, we can estimate $\sigma^2$ with $$s^2_b = n s^2_{\overline{x}}$$
Calculating the mean of the sample variances is straightforward: we simply average $s^2_1, s^2_2, \ldots, s^2_k$. Thus, $$s^2_w = \frac{\sum s^2_i}{k}$$
Given the construction of these two estimates for the common population variance, their quotient $$F = \frac{s^2_b}{s^2_w}$$ gives us a test statistic that follows an $F$-distribution with $k-1$ degrees of freedom associated with the numerator and $(n-1) + (n-1) + \cdots + (n-1) = k(n-1) = kn - k = N - k$ degrees of freedom associated with the denominator.
The grand mean of a set of samples is the total of all the data values divided by the total sample size (or as a weighted average of the sample means). $$\overline{X}_{GM} = \frac{\sum x}{N} = \frac{\sum n\overline{x}}{\sum n}$$
The total variation (not variance) is the sum of the squares of the differences between each data value and the grand mean. $$SS(T) = \sum (x - \overline{X}_{GM})^2$$
The between group variation due to the interaction between the samples is denoted SS(B) for sum of squares between groups . If the sample means are close to each other (and therefore the grand mean) this will be small. There are k samples involved with one data value for each sample (the sample mean), so there are k-1 degrees of freedom. $$SS(B) = \sum n(\overline{x} - \overline{X}_{GM})^2$$
The variance between the samples, $s^2_b$ is also denoted by MS(B) for mean square between groups . This is the between group variation divided by its degrees of freedom. $$s^2_b = MS(B) = \frac{SS(B)}{k-1}$$
The within group variation due to differences within individual samples is denoted SS(W) for sum of squares within groups . Each sample is considered independently, so no interaction between samples is involved. The degrees of freedom is equal to the sum of the individual degrees of freedom for each sample. Since each sample has degrees of freedom equal to one less than its sample size, and there are $k$ samples, the total degrees of freedom is $k$ less than the total sample size: $df = N - k$. $$SS(W) = \sum df \cdot s^2$$
The variance within samples $s^2_w$ is also denoted by MS(W) for mean square within groups . This is the within group variation divided by its degrees of freedom. It is the weighted average of the variances (weighted with the degrees of freedom). $$s^2_w = MS(W) = \frac{SS(W)}{N-k}$$
Here again we find an $F$ test statistic by dividing the between group variance by the within group variance -- and as before, the degrees of freedom for the numerator are $(k-1)$ and the degrees of freedom for the denominator are $(N-k)$. $$F = \frac{s^2_b}{s^2_w}$$
All of this sounds like a lot to remember, and it is. However, the following table might prove helpful in organizing your thoughts: $$\begin{array}{l|c|c|c|c|} & \textrm{SS} & \textrm{df} & \textrm{MS} & \textrm{F}\\\hline \textrm{Between} & SS(B) & k-1 & \displaystyle{s^2_b = \frac{SS(B)}{k-1}} & \displaystyle{\frac{s^2_b}{s^2_w} = \frac{MS(B)}{MS(W)}}\\\hline \textrm{Within} & SS(W) & N-k & \displaystyle{s^2_w = \frac{SS(W)}{N-k}} & \\\hline \textrm{Total} & SS(W) + SS(B) & N-1 & & \\\hline \end{array}$$
Notice that each Mean Square is just the Sum of Squares divided by its degrees of freedom, and the F value is the ratio of the mean squares.
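The table translates almost line by line into code. The Python sketch below builds the entries from per-sample sizes, means, and variances using the formulas for the grand mean, SS(B), and SS(W) given above. The function name `anova_from_summary` and the unequal-size summary statistics in the usage lines are hypothetical values chosen only to exercise the formulas; they do not come from any data set in this document.

```python
def anova_from_summary(sizes, means, variances):
    """Build one-way ANOVA table entries from per-sample summary statistics."""
    k = len(sizes)
    total_n = sum(sizes)
    if not (k == len(means) == len(variances)) or k < 2 or total_n <= k:
        raise ValueError("need k >= 2 samples with matching summaries and N > k")

    # Grand mean as the weighted average of the sample means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n

    # SS(B) = sum of n * (sample mean - grand mean)^2, with k - 1 df.
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))

    # SS(W) = sum of df * s^2 over the samples, with N - k df.
    ss_w = sum((n - 1) * v for n, v in zip(sizes, variances))

    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (total_n - k)
    return {
        "SS(B)": ss_b, "df_B": k - 1, "MS(B)": ms_b,
        "SS(W)": ss_w, "df_W": total_n - k, "MS(W)": ms_w,
        "SS(T)": ss_b + ss_w, "df_T": total_n - 1,
        "F": ms_b / ms_w,
    }


# Hypothetical summary statistics for three samples of unequal size.
table = anova_from_summary(sizes=[4, 5, 6], means=[10, 12, 14], variances=[2, 3, 4])
for name, value in table.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Working from summary statistics keeps the sketch short; with raw data, the sample means and variances would be computed first and then passed in.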
Importantly, do not simply put the larger variance in the numerator; always divide the between variance by the within variance. If the between variance is smaller than the within variance, then the means are close to each other and you will fail to reject the claim that they are all equal.
The null hypothesis is rejected if the test statistic from the table is greater than the F critical value with k-1 numerator and N-k denominator degrees of freedom.
If the decision is to reject the null, then the conclusion is that at least one of the means is different. However, the ANOVA test does not tell you where the difference lies. For this, you need another test, like the Scheffe' test described below, applied to every possible pairing of samples in the original ANOVA test.
In the previous lessons, we learned how to perform inference for a population mean from one sample and also how to compare population means from two samples (independent and paired). In this lesson, we introduce Analysis of Variance, or ANOVA. ANOVA is a statistical method that analyzes variances to determine whether the means from more than two populations are the same. In other words, we have a quantitative response variable and a categorical explanatory variable with more than two levels. In ANOVA, the categorical explanatory variable is typically referred to as the factor.
Let's use the following example to look at the logic behind what an analysis of variance is after.
We want to see whether the tar contents (in milligrams) for three different brands of cigarettes are different. Two different labs took samples, Lab Precise and Lab Sloppy.
Lab Precise took six samples from each of the three brands and got the following measurements:
Sample | Brand A | Brand B | Brand C |
---|---|---|---|
1 | 10.21 | 11.32 | 11.60 |
2 | 10.25 | 11.20 | 11.90 |
3 | 10.24 | 11.40 | 11.80 |
4 | 9.80 | 10.50 | 12.30 |
5 | 9.77 | 10.68 | 12.20 |
6 | 9.73 | 10.90 | 12.20 |
Average | \(\bar{y}_1= 10.00\) | \(\bar{y}_2= 11.00\) | \(\bar{y}_3= 12.00\) |
Lab Sloppy also took six samples from each of the three brands and got the following measurements:
Sample | Brand A | Brand B | Brand C |
---|---|---|---|
1 | 9.03 | 9.56 | 10.45 |
2 | 10.26 | 13.40 | 9.64 |
3 | 11.60 | 10.68 | 9.59 |
4 | 11.40 | 11.32 | 13.40 |
5 | 8.01 | 10.68 | 14.50 |
6 | 9.70 | 10.36 | 14.42 |
Average | \(\bar{y}_1= 10.00\) | \(\bar{y}_2= 11.00\) | \(\bar{y}_3= 12.00\) |
The sample means from the two labs turned out to be the same and thus the differences in the sample means from the two labs are zero.
From which data set can you draw more conclusive evidence that the means from the three populations are different?
We need to compare the between-sample variation to the within-sample variation. The between-sample variation is the same for both labs, because the brand means are identical, but the within-sample variation is much smaller for Lab Precise. Since the between-sample variation is large compared to the within-sample variation for the data from Lab Precise, we will be more inclined to conclude that the three population means are different using the data from Lab Precise. Since such analysis is based on the analysis of variances for the data set, we call this statistical method the Analysis of Variance (or ANOVA).
Before we go into the details of the test, we need to determine the null and alternative hypotheses. Recall that for a test for two independent means, the null hypothesis was \(\mu_1=\mu_2\). In one-way ANOVA, we want to compare \(t\) population means, where \(t>2\). Therefore, the null hypothesis for analysis of variance for \(t\) population means is:
\(H_0\colon \mu_1=\mu_2=\dots=\mu_t\)
The alternative, however, cannot be set up similarly to the two-sample case. If we wanted to see if two population means are different, the alternative would be \(\mu_1\ne\mu_2\). With more than two groups, the research question is "Are some of the means different?" If we set up the alternative to be \(\mu_1\ne\mu_2\ne\dots\ne\mu_t\), then we would have a test to see if ALL the means are different. This is not what we want. We need to be careful how we set up the alternative. The mathematical version of the alternative is:
\(H_a\colon \mu_i\ne\mu_j\text{ for some }i \text{ and }j \text{ where }i\ne j\)
This means that at least one of the pairs is not equal. The more common presentation of the alternative is:
\(H_a\colon \text{ at least one mean is different}\) or \(H_a\colon \text{ not all the means are equal}\)
Recall that when we compare the means of two populations for independent samples, we use a two-sample t-test with pooled variance when the population variances can be assumed equal.
For more than two populations, the test statistic, \(F\), is the ratio of the between-group sample variance to the within-group sample variance. That is,
\(F=\dfrac{\text{between group variance}}{\text{within group variance}}\)
Under the null hypothesis (and with certain assumptions), both quantities estimate the variance of the random error, and thus the ratio should be close to 1. If the ratio is large, then we have evidence against the null, and hence, we would reject the null hypothesis.
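To make the ratio concrete, here is a minimal sketch that computes it for both labs from the measurements listed above. The helper name `f_statistic` is made up for illustration, and the between and within variances are assumed to be the usual mean squares (sum of squares divided by degrees of freedom).

```python
from statistics import mean

def f_statistic(groups):
    """Ratio of between-group variance to within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between: squared deviations of group means from the grand mean, weighted by n.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within: squared deviations of observations from their own group mean.
    within = sum((y - mean(g)) ** 2 for g in groups for y in g) / (n_total - k)
    return between / within

# Tar content (mg) for Brands A, B, and C, copied from the two tables above.
lab_precise = [
    [10.21, 10.25, 10.24, 9.80, 9.77, 9.73],
    [11.32, 11.20, 11.40, 10.50, 10.68, 10.90],
    [11.60, 11.90, 11.80, 12.30, 12.20, 12.20],
]
lab_sloppy = [
    [9.03, 10.26, 11.60, 11.40, 8.01, 9.70],
    [9.56, 13.40, 10.68, 11.32, 10.68, 10.36],
    [10.45, 9.64, 9.59, 13.40, 14.50, 14.42],
]

f_precise = f_statistic(lab_precise)
f_sloppy = f_statistic(lab_sloppy)
print(f"Lab Precise F = {f_precise:.2f}, Lab Sloppy F = {f_sloppy:.2f}")
```

Both labs share the same numerator because their brand means are identical; only the within-group variance in the denominator differs.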
In the next section, we present the assumptions for this test. In the following section, we present how to find the between group variance, the within group variance, and the F-statistic in the ANOVA table.
Assumptions for the One-Way ANOVA Test
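There are three primary assumptions in ANOVA: the responses for each factor level follow a normal distribution, the distributions have equal variances, and the data are independent.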
A general rule of thumb for equal variances is to compare the smallest and largest sample standard deviations. This is much like the rule of thumb for equal variances for the test for independent means. If the ratio of these two sample standard deviations falls between 0.5 and 2, then there is no strong indication that the equal variance assumption is violated.
Recall the application from the beginning of the lesson. We wanted to see whether the tar contents (in milligrams) for three different brands of cigarettes were different. Lab Precise and Lab Sloppy each took six samples from each of the three brands (A, B and C). Check the assumptions for this example.
The graph shows no obvious violations from Normal, but we should proceed with caution.
Variable | Mean | StDev |
---|---|---|
Precise Brand A | 10.000 | 0.257 |
Precise Brand B | 11.000 | 0.365 |
Precise Brand C | 12.000 | 0.276 |
The smallest standard deviation is 0.257, and twice the value is 0.514. The largest standard deviation is less than this value. Since the sample sizes are the same, it is safe to assume the standard deviations (and thus the variances) are equal.
The samples were taken independently, so there is no indication that this assumption is violated.
The sample size is small. We should check for obvious violations using the Normal Probability Plot.
Variable | Mean | StDev |
---|---|---|
Sloppy Brand A | 10.000 | 1.384 |
Sloppy Brand B | 11.000 | 1.308 |
Sloppy Brand C | 12.000 | 2.360 |
The smallest standard deviation is 1.308, and twice the value is 2.616. The largest standard deviation is less than this value. Since the sample sizes are the same, it is safe to assume the standard deviations (and thus the variances) are equal.
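The rule-of-thumb check for both labs can be written out as a short sketch. The function name `sd_ratio_check` is made up for illustration; the standard deviations are the rounded values from the two tables above.

```python
def sd_ratio_check(std_devs, low=0.5, high=2.0):
    """Return the largest-to-smallest SD ratio and whether it passes the rule of thumb."""
    ratio = max(std_devs) / min(std_devs)
    return ratio, low <= ratio <= high

precise_sds = [0.257, 0.365, 0.276]   # Brands A, B, C for Lab Precise
sloppy_sds = [1.384, 1.308, 2.360]    # Brands A, B, C for Lab Sloppy

precise_ratio, precise_ok = sd_ratio_check(precise_sds)
sloppy_ratio, sloppy_ok = sd_ratio_check(sloppy_sds)
print(f"Lab Precise: ratio = {precise_ratio:.2f}, passes = {precise_ok}")
print(f"Lab Sloppy:  ratio = {sloppy_ratio:.2f}, passes = {sloppy_ok}")
```

Checking that the largest standard deviation is less than twice the smallest, as the text does, is the same comparison written a different way.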
In this section, we present the Analysis of Variance Table for a completely randomized design, such as the tar content example.
Random samples of size \(n_1, …, n_t\) are drawn from the respective \(t\) populations. The data would have the following format:
Population | Data | | | | Mean |
---|---|---|---|---|---|
1 | \(y_{11}\) | \(y_{12}\) | ... | \(y_{1n_1}\) | \(\bar{y}_{1.}\) |
2 | \(y_{21}\) | \(y_{22}\) | ... | \(y_{2n_2}\) | \(\bar{y}_{2.}\) |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
\(t\) | \(y_{t1}\) | \(y_{t2}\) | ... | \(y_{tn_t}\) | \(\bar{y}_{t.}\) |
\(t\): The total number of groups
\(y_{ij}\): The \(j^{th}\) observation from the \(i^{th}\) population.
\(n_i\): The sample size from the \(i^{th}\) population.
\(n_T\): The total sample size: \(n_T=\sum_{i=1}^t n_i\).
\(\bar{y}_{i.}\): The mean of the sample from the \(i^{th}\) population.
\(\bar{y}_{..}\): The mean of the combined data. Also called the overall mean.
Recall that we want to examine the between group variation and the within group variation. We can find an estimate of the variations with the following:
It can be derived that \(\text{TSS } = \text{ SST } + \text{ SSE}\).
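For reference, the three sums of squares are usually defined as follows in the notation above. This is a sketch of the standard forms, assuming that SST denotes the treatment (between group) sum of squares, SSE the error (within group) sum of squares, and TSS the total sum of squares:

\(\text{SST}=\sum_{i=1}^{t} n_i\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2\)

\(\text{SSE}=\sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i.}\right)^2\)

\(\text{TSS}=\sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{..}\right)^2\)

SST measures how far the group means fall from the overall mean, and SSE measures how far individual observations fall from their own group mean.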
We can set up the ANOVA table to help us find the F-statistic.
Source | Df | SS | MS | F | P-value |
---|---|---|---|---|---|
Treatment | \(t-1\) | \(\text{SST}\) | \(\text{MST}=\dfrac{\text{SST}}{t-1}\) | \(\dfrac{\text{MST}}{\text{MSE}}\) | |
Error | \(n_T-t\) | \(\text{SSE}\) | \(\text{MSE}=\dfrac{\text{SSE}}{n_T-t}\) | ||
Total | \(n_T-1\) | \(\text{TSS}\) |
The p-value is found using the F-statistic and the F-distribution. We will not ask you to find the p-value for this test. You will only need to know how to interpret it. If the p-value is less than our predetermined significance level, we will reject the null hypothesis that all the means are equal.
The ANOVA table can easily be obtained with statistical software; hand computation of these quantities is very tedious.
If the null hypothesis is rejected, we conclude that not all the means are equal: that is, at least one mean is different from the other means. The ANOVA test itself provides only statistical evidence of a difference, but not any statistical evidence as to which mean or means are statistically different.
For instance, using the previous example for tar content, if the ANOVA test results in a significant difference in average tar content between the cigarette brands, a follow up analysis would be needed to determine which brand mean or means differ in tar content. Plus we would want to know if one brand or multiple brands were better/worse than another brand in average tar content. To complete this analysis we use a method called multiple comparisons.
Multiple comparison methods analyze all possible pairs of means. For example, with three brands of cigarettes, A, B, and C, if the ANOVA test was significant, then multiple comparison methods would examine the three possible pairwise comparisons: Brand A versus Brand B, Brand A versus Brand C, and Brand B versus Brand C.
These are essentially tests of two means similar to what we learned previously in our lesson for comparing two means. However, the methods here use an adjustment to account for the number of comparisons taking place. Minitab provides three adjustment choices. We will use the Tukey adjustment which is an adjustment on the t-multiplier based on the number of comparisons.
Note! We don’t go into the theory behind the Tukey method. Just note that we only use a multiple comparison technique in ANOVA when we have a significant result.
In the next section, we present an example to walk through the ANOVA results.
Using Minitab to Perform One-Way ANOVA
If the data entered in Minitab are in different columns, then in Minitab we use:
Test the hypothesis that the means are the same vs. at least one is different for both labs. Compare the two labs and comment.
We are testing the following hypotheses:
\(H_0\colon \mu_1=\mu_2=\mu_3\) vs \(H_a\colon\text{ at least one mean is different}\)
The assumptions were discussed in the previous example.
The following is the output for one-way ANOVA for Lab Precise:
Null Hypothesis | All means are equal |
---|---|
Alternative Hypothesis | Not all means are equal |
Significance Level | \(\alpha\)= 0.05 |
Equal variances were assumed for the analysis.
Factor | Levels | Values |
---|---|---|
Factor | 3 | Precise A, Precise B, Precise C |
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
---|---|---|---|---|---|
Factor | 2 | 12.000 | 6.00000 | 65.46 | 0.000 |
Error | 15 | 1.375 | 0.09165 | ||
Total | 17 | 13.375 |
S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
0.302743 | 89.72% | 88.35% | 85.20% |
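The model summary line can be recovered from the ANOVA table itself. The sketch below assumes the usual definitions: S is the square root of the error mean square, R-sq is the share of the total sum of squares explained by the factor, and R-sq(adj) compares mean squares instead of sums of squares. R-sq(pred) is not reproduced because it requires the raw data. The inputs are the rounded table values, so the last digits can differ slightly from the software output.

```python
from math import sqrt

# Values read from the Lab Precise ANOVA table above (rounded as printed).
ss_error, ss_total = 1.375, 13.375
df_error, df_total = 15, 17

s = sqrt(ss_error / df_error)                                  # pooled standard deviation
r_sq = 1 - ss_error / ss_total                                 # proportion of variation explained
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)   # adjusted for degrees of freedom

print(f"S = {s:.4f}, R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}")
```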
The p-value for this test is less than 0.0001. At any reasonable significance level, we would reject the null hypothesis and conclude there is enough evidence in the data to suggest at least one mean tar content is different.
But which ones are different? The next step is to examine the multiple comparisons. Minitab provides the following output:
Factor | N | Mean | StDev | 95% CI |
---|---|---|---|---|
Precise A | 6 | 10.000 | 0.257 | (9.737, 10.263) |
Precise B | 6 | 11.000 | 0.365 | (10.737, 11.263) |
Precise C | 6 | 12.000 | 0.276 | (11.737, 12.263) |
Pooled StDev = 0.302743
Grouping Information Using the Tukey Method and 95% Confidence
Factor | N | Mean | Grouping |
---|---|---|---|
Precise C | 6 | 12.000 | A |
Precise B | 6 | 11.000 | B |
Precise A | 6 | 10.000 | C |
Means that do not share a letter are significantly different.
The Tukey pairwise comparisons suggest that all the means are different. Therefore, Brand C has the highest tar content and Brand A has the lowest.
We are testing the same hypotheses for Lab Sloppy as Lab Precise, and the assumptions were checked. The ANOVA output for Lab Sloppy is:
Factor | Levels | Values |
---|---|---|
Factor | 3 | Sloppy A, Sloppy B, Sloppy C |
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
---|---|---|---|---|---|
Factor | 2 | 12.00 | 6.000 | 1.96 | 0.176 |
Error | 15 | 45.98 | 3.065 | ||
Total | 17 | 57.98 |
S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
1.75073 | 20.70% | 10.12% | 0.00% |
The one-way ANOVA showed statistically significant results for Lab Precise but not for Lab Sloppy. Recall that ANOVA compares the within variation and the between variation. For Lab Precise, the within variation was small compared to the between variation. This resulted in a large F-statistic (65.46) and thus a small p-value. For Lab Sloppy, this ratio was small (1.96), resulting in a large p-value.
20 young pigs are assigned at random among 4 experimental groups. Each group is fed a different diet. (This design is a completely randomized design.) The data are the pig's weight, in kilograms, after being raised on these diets for 10 months ( pig_weights.txt ). We wish to determine whether the mean pig weights are the same for all 4 diets.
First, we set up our hypothesis test:
\(H_0\colon \mu_1=\mu_2=\mu_3=\mu_4\)
\(H_a\colon \text { at least one mean weight is different}\)
Here are the data that were obtained from the four experimental groups, as well as their summary statistics:
Feed 1 | Feed 2 | Feed 3 | Feed 4 |
---|---|---|---|
60.8 | 68.3 | 102.6 | 87.9 |
57.1 | 67.7 | 102.2 | 84.7 |
65.0 | 74.0 | 100.5 | 83.2 |
58.7 | 66.3 | 97.5 | 85.8 |
61.8 | 69.9 | 98.9 | 90.3 |
Descriptive Statistics: Feed 1, Feed 2, Feed 3, Feed 4
Variable | N | N* | Mean | StDev | Minimum | Maximum |
---|---|---|---|---|---|---|
Feed 1 | 5 | 0 | 60.68 | 3.03 | 57.10 | 65.00 |
Feed 2 | 5 | 0 | 69.24 | 2.96 | 66.30 | 74.00 |
Feed 3 | 5 | 0 | 100.34 | 2.16 | 97.50 | 102.60 |
Feed 4 | 5 | 0 | 86.38 | 2.78 | 83.20 | 90.30 |
The smallest standard deviation is 2.16, and the largest is 3.03. Since the rule of thumb is satisfied here, we can say the equal variance assumption is not violated. The description suggests that the samples are independent. There is nothing in the description to suggest the weights come from a normal distribution. The normal probability plots are:
There are no obvious violations from the normal assumption, but we should proceed with caution as the sample sizes are very small.
The ANOVA output is:
Factor | Levels | Values |
---|---|---|
Factor | 4 | Feed 1, Feed 2, Feed 3, Feed 4 |
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
---|---|---|---|---|---|
Factor | 3 | 4703.2 | 1567.73 | 206.72 | 0.000 |
Error | 16 | 121.3 | 7.58 | ||
Total | 19 | 4824.5 |
S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
2.75386 | 97.48% | 97.01% | 96.07% |
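As a check on the software output, the same table can be rebuilt from the raw pig weights with nothing but the standard library. This is a minimal sketch; the function name `anova_table` and the dictionary layout are made up for illustration and are not Minitab's format.

```python
from statistics import mean

# Pig weights (kg) copied from the data table above.
feeds = {
    "Feed 1": [60.8, 57.1, 65.0, 58.7, 61.8],
    "Feed 2": [68.3, 67.7, 74.0, 66.3, 69.9],
    "Feed 3": [102.6, 102.2, 100.5, 97.5, 98.9],
    "Feed 4": [87.9, 84.7, 83.2, 85.8, 90.3],
}

def anova_table(groups):
    """Return the Factor, Error, and Total rows of a one-way ANOVA table."""
    samples = list(groups.values())
    t = len(samples)                                  # number of groups
    n_total = sum(len(s) for s in samples)            # total sample size
    grand_mean = sum(sum(s) for s in samples) / n_total
    ss_factor = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_error = sum((y - mean(s)) ** 2 for s in samples for y in s)
    ms_factor = ss_factor / (t - 1)
    ms_error = ss_error / (n_total - t)
    return {
        "Factor": {"df": t - 1, "SS": ss_factor, "MS": ms_factor,
                   "F": ms_factor / ms_error},
        "Error": {"df": n_total - t, "SS": ss_error, "MS": ms_error},
        "Total": {"df": n_total - 1, "SS": ss_factor + ss_error},
    }

table = anova_table(feeds)
for source, row in table.items():
    print(source, {name: round(value, 2) for name, value in row.items()})
```

The p-value is left out because it requires the F-distribution, which the standard library does not provide.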
The p-value for the test is less than 0.001. With a significance level of 5%, we reject the null hypothesis. The data provide sufficient evidence to conclude that the mean weights of pigs from the four feeds are not all the same.
With a rejection of the null hypothesis leading us to conclude that not all the means are equal (i.e., the mean pig weight for at least one diet differs from the mean pig weight for the other diets), some follow-up questions are:
To answer these questions we analyze the multiple comparison output (the grouping information) and the interval graph.
Factor | N | Mean | StDev | 95% CI |
---|---|---|---|---|
Feed 1 | 5 | 60.68 | 3.03 | (58.07, 63.29) |
Feed 2 | 5 | 69.24 | 2.96 | (66.63, 71.85) |
Feed 3 | 5 | 100.340 | 2.164 | (97.729, 102.951) |
Feed 4 | 5 | 86.38 | 2.78 | (83.77, 88.99) |
Pooled StDev = 2.75386
Factor | N | Mean | Grouping |
---|---|---|---|
Feed 3 | 5 | 100.340 | A |
Feed 4 | 5 | 86.38 | B |
Feed 2 | 5 | 69.24 | C |
Feed 1 | 5 | 60.68 | D |
Each of these factor levels is associated with a grouping letter. If any factor levels have the same letter, then the multiple comparison method did not determine a significant difference between the mean responses. For any factor level that does not share a letter, a significant mean difference was identified. From the lettering we see each Diet Type has a different letter, i.e. no two groups share a letter. Therefore, we can conclude that all four diets resulted in statistically significant different mean pig weights. Furthermore, with the order of the means also provided from highest to lowest, we can say that Feed 3 resulted in the highest mean weight followed by Feed 4, then Feed 2, then Feed 1. This grouping result is supported by the graph of the intervals.
In analyzing the intervals, we reflect back on our lesson in comparing two means: if an interval contained zero, we could not conclude a difference between the two means; if the interval did not contain zero, then a difference between the two means was supported. With four factor levels, there are six possible pairwise comparisons. (Remember the binomial formula where we had the counter for the number of possible outcomes? In this case \(\binom{4}{2} = 6\).) In inspecting each of these six intervals, we find that all six do NOT include zero. Therefore, there is a statistical difference between all four group means; the four types of diet resulted in significantly different mean pig weights.
The one-way ANOVA presented in the Lesson is a simple case. In practice, research questions are rarely this “simple.” ANOVA models become increasingly complex very quickly.
The two-way ANOVA model is briefly introduced here to give you an idea of what to expect in practice. Even two-way ANOVA can be too “simple” for practice.
In two-way ANOVA, there are two factors of interest. When there are two factors, the experimental units get a combination of treatments.
Suppose a researcher is interested in examining how different fertilizers affect the growth of plants. However, the researcher is also interested in the growth of different species of plant. Species is the second factor, making this a two-factor experiment. But... those of you with green thumbs say sometimes different fertilizers are more effective on different species of plants!
This is the idea behind two-way ANOVA. If you are interested in more complex ANOVA models, you should consider taking STAT 502 and STAT 503 .
In this Lesson, we introduced One-way Analysis of Variance (ANOVA). The ANOVA test tests the hypothesis that the population means for the groups are the same against the hypothesis that at least one of the means is different. If the null hypothesis is rejected, we need to perform multiple comparisons to determine which means are different.
Additional hypothesis tests.
In unit 1, we learned the basics of statistics: what they are, how they work, and the mathematical and conceptual principles that guide them. In unit 2, we applied these principles to the process and ideas of hypothesis testing (how we take observed sample data and use it to make inferences about our populations of interest) using one continuous variable and one categorical variable. We will now continue to use this same hypothesis testing logic and procedure on new types of data. We will focus on group mean differences on more than two groups, using Analysis of Variance.
Analysis of variance, often abbreviated ANOVA, serves the same purpose as the t-tests we learned earlier in unit 2: it tests for differences in group means. ANOVA is more flexible in that it can handle any number of groups, unlike t-tests, which are limited to two groups (independent samples) or two time points (paired samples). Thus, the purpose and interpretation of ANOVA will be the same as it was for t-tests, as will the hypothesis testing procedure. However, ANOVA will, at first glance, look much different from a mathematical perspective, though as we will see, the basic logic behind the test statistic for ANOVA is actually the same.
An Analysis of Variance (ANOVA) is an inferential statistical tool that we use to find statistically significant differences among the means of two or more populations.
We calculate variance but the goal is still to compare population mean differences. The test statistic for the ANOVA is called F. It is a ratio of two estimates of the population variance based on the sample data.
Experiments are designed to determine if there is a cause and effect relationship between two variables. In the language of the ANOVA, the factor is the variable hypothesized to cause some change (effect) in the response variable (dependent variable).
An ANOVA conducted on a design in which there is only one factor is called a one-way ANOVA. If an experiment has two factors, then the ANOVA is called a two-way ANOVA. For example, suppose an experiment on the effects of age and gender on reading speed were conducted using three age groups (8 years, 10 years, and 12 years) and the two genders (male and female). The factors would be age and gender. Age would have three levels and gender would have two levels. ANOVAs can also be used for within-group/repeated and between-subjects designs. For this chapter we will focus on between-subjects one-way ANOVA.
In a One-Way ANOVA we compare two types of variance: the variance between groups and the variance within groups, which we will discuss in the next section.
We have seen time and again that scores, be they individual data or group means, will differ naturally. Sometimes this is due to random chance, and other times it is due to actual differences. Our job as scientists, researchers, and data analysts is to determine if the observed differences are systematic and meaningful (via a hypothesis test) and, if so, what is causing those differences. Through this, it becomes clear that, although we are usually interested in the mean or average score, it is the variability in the scores that is key.
Take a look at figure 1, which shows scores for many people on a test of skill used as part of a job application. The x-axis has each individual person, in no particular order, and the y-axis contains the score each person received on the test. As we can see, the job applicants differed quite a bit in their performance, and understanding why that is the case would be extremely useful information. However, there’s no interpretable pattern in the data, especially because we only have information on the test, not on any other variable (remember that the x-axis here only shows individual people and is not ordered or interpretable).
Figure 1. Scores on a job test
Our goal is to explain this variability that we are seeing in the dataset. Let’s assume that as part of the job application procedure we also collected data on the highest degree each applicant earned. With knowledge of what the job requires, we could sort our applicants into three groups: those applicants who have a college degree related to the job, those applicants who have a college degree that is not related to the job, and those applicants who did not earn a college degree. This is a common way that job applicants are sorted, and we can use ANOVA to test if these groups are actually different. Figure 2 presents the same job applicant scores, but now they are color coded by group membership (i.e. which group they belong in). Now that we can differentiate between applicants this way, a pattern starts to emerge: those applicants with a relevant degree (coded red) tend to be near the top, those applicants with no college degree (coded black) tend to be near the bottom, and the applicants with an unrelated degree (coded green) tend to fall into the middle. However, even within these groups, there is still some variability, as shown in Figure 2.
Figure 2. Applicant scores coded by degree earned
This pattern is even easier to see when the applicants are sorted and organized into their respective groups, as shown in Figure 3.
Figure 3. Applicant scores by group
Now that we have our data visualized into an easily interpretable format, we can clearly see that our applicants’ scores differ largely along group lines. Those applicants who do not have a college degree received the lowest scores, those who had a degree relevant to the job received the highest scores, and those who did have a degree but one that is not related to the job tended to fall somewhere in the middle. Thus, we have systematic variance between our groups.
The process and analyses used in ANOVA will take these two sources of variance (systematic variance between groups and random error within groups, or how much groups differ from each other and how much people differ within each group) and compare them to one another to determine if the groups have any explanatory value in our outcome variable. By doing this, we will test for statistically significant differences between the group means, just like we did for t-tests. We will go step by step to break down the math to see how ANOVA actually works.
ANOVA is all about looking at the different sources of variance (i.e. the reasons that scores differ from one another) in a dataset. Fortunately, the way we calculate these sources of variance takes a very familiar form: the Sum of Squares. Before we get into the calculations themselves, we must first lay out some important terminology and notation.
In ANOVA, we are working with two variables, a grouping or explanatory variable and a continuous outcome variable. The grouping variable is our predictor (it predicts or explains the values in the outcome variable) or, in experimental terms, our independent variable, and it is made up of k groups, with k being any whole number 2 or greater. That is, ANOVA requires two or more groups to work, and it is usually conducted with three or more. In ANOVA, we refer to groups as “levels”, so the number of levels is just the number of groups, which again is k. In the above example, our grouping variable was education, which had 3 levels, so k = 3. When we report any descriptive value (e.g. mean, sample size, standard deviation) for a specific group, we will use a subscript 1…k to denote which group it refers to. For example, if we have three groups and want to report the standard deviation s for each group, we would report them as \(s_1\), \(s_2\), and \(s_3\).
Our second variable is our outcome variable. This is the variable on which people differ, and we are trying to explain or account for those differences based on group membership. In the example above, our outcome was the score each person earned on the test. Our outcome variable will still use X for scores as before. When describing the outcome variable using means, we will use subscripts to refer to specific group means. So if we have k = 3 groups, our means will be \(\bar{X}_1\), \(\bar{X}_2\), and \(\bar{X}_3\). We will also have a single mean representing the average of all participants across all groups. This is known as the grand mean, and we use the symbol \(\bar{X}_G\). These different means, the individual group means and the overall grand mean, will be how we calculate our sums of squares.
Finally, we now have to differentiate between several different sample sizes. Our data will now have sample sizes for each group, and we will denote these with a lower case “n” and a subscript, just like with our other descriptive statistics: \(n_1\), \(n_2\), and \(n_3\). We also have the overall sample size in our dataset, and we will denote this with a capital N. The total sample size (N) is just the group sample sizes added together.
One source of variability we identified in Figure 3 of the above example was differences or variability between the groups. That is, the groups clearly had different average levels. The variability arising from these differences is known as the between groups variability, and it is quantified using Between Groups Sum of Squares.
Our calculations for sums of squares in ANOVA will take on the same form as they did for regular calculations of variance. Each observation, in this case the group means, is compared to the overall mean, in this case the grand mean, to calculate a deviation score. These deviation scores are squared so that they do not cancel each other out and sum to zero. The squared deviations are then added up, or summed. There is, however, one small difference. Because each group mean represents a group composed of multiple people, before we sum the squared deviation scores we must multiply them by the number of people within that group. Incorporating this, we find our equation for Between Groups Sum of Squares.
The other source of variability in the figures comes from differences that occur within each group. That is, each individual deviates a little bit from their respective group mean, just like the group means differed from the grand mean. We therefore label this source the Within Groups Sum of Squares. Because we are trying to account for variance based on group-level means, any deviation from the group means indicates an inaccuracy or error. Thus, our within groups variability represents our error in ANOVA.
Our Total Sum of Squares is built from each individual score's deviation from the grand mean, squared and then summed. As with our Within Groups Sum of Squares, we are calculating a deviation score for each individual person, so we do not need to multiply anything by the sample size; that is only done for Between Groups Sum of Squares.
The three sums of squares fit together: the Total Sum of Squares equals the Between Groups Sum of Squares plus the Within Groups Sum of Squares. This will prove to be very convenient, because if we know the values of any two of our sums of squares, it is very quick and easy to find the value of the third. It is also a good way to check calculations: if you calculate each SS by hand, you can make sure that they all fit together in this way, and if not, you know that you made a math mistake somewhere.
We can see from the above formulas that calculating an ANOVA by hand from raw data can take a very, very long time. For this reason, you will not be required to calculate the SS values by hand, but you should still take the time to understand how they fit together and what each one represents to ensure you understand the analysis itself.
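To see how the three sums of squares fit together without doing the arithmetic by hand, here is a minimal sketch on a tiny data set. The scores and group labels are hypothetical, invented only for illustration; they are not the applicant data from the figures.

```python
from statistics import mean

# Hypothetical test scores for three groups of three applicants each.
groups = {
    "no degree": [1, 2, 3],
    "unrelated degree": [4, 5, 6],
    "relevant degree": [7, 8, 9],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between: each group mean's squared deviation from the grand mean, times group size.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within: each score's squared deviation from its own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
# Total: each score's squared deviation from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(ss_between, ss_within, ss_total)
```

If the first two printed values did not add up to the third, that would signal a mistake in one of the calculations.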
All of our sources of variability fit together in meaningful, interpretable ways as we saw above, and the easiest way to do this is to organize them into a table. The ANOVA table, shown in Table 1, is how we calculate our test statistic.
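Source | SS | df | MS | F |
---|---|---|---|---|
Between | \(SS_B\) | \(k-1\) | \(MS_B = \dfrac{SS_B}{k-1}\) | \(F = \dfrac{MS_B}{MS_W}\) |
Within | \(SS_W\) | \(N-k\) | \(MS_W = \dfrac{SS_W}{N-k}\) | |
Total | \(SS_T\) | \(N-1\) | (MS is variance) | |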
Table 1. ANOVA table.
The first column of the ANOVA table, labeled “Source”, indicates which of our sources of variability we are using: between groups, within groups, or total. The second column, labeled “SS”, contains our values for the sums of squares that we learned to calculate above. As noted previously, calculating these by hand takes too long, and so the formulas are not presented in Table 1. However, remember that the Total is the sum of the other two, in case you are only given two SS values and need to calculate the third.
The next column in Table 1, labeled “df”, is our degrees of freedom. As with the sums of squares, there is a different df for each source, and the formulas are presented in the table. Notice that the total degrees of freedom, N – 1, is the same as it was for our regular variance. This matches the \(SS_T\) formulation to again indicate that we are simply taking our familiar variance term and breaking it up into different sources. Also remember that the capital N in the df calculations refers to the overall sample size, not a specific group sample size. Notice that the total row for degrees of freedom, just like for sums of squares, is just the Between and Within rows added together. If you take N – k + k – 1, then the “– k” and “+ k” portions will cancel out, and you are left with N – 1. This is a convenient way to quickly check your calculations.
The third column, labeled “MS”, is our Mean Squares for each source of variance. A “mean square” is just another way to say variability. Each mean square is calculated by dividing the sum of squares by its corresponding degrees of freedom. Notice that we do this for the Between row and the Within row, but not for the Total row. There are two reasons for this. First, our Total Mean Square would just be the variance in the full dataset (put together the formulas to see this for yourself), so it would not be new information. Second, the Mean Square values for Between and Within would not add up to equal the Mean Square Total because they are divided by different denominators. This is in contrast to the first two columns, where the Total row was both the conceptual total (i.e. the overall variance and degrees of freedom) and the literal total of the other two rows.
The final column in the ANOVA table (Table 1), labeled “F”, is our test statistic for ANOVA. The F statistic, just like a t- or z-statistic, is compared to a critical value to see whether we can reject or fail to reject a null hypothesis. Thus, although the calculations look different for ANOVA, we are still doing the same thing that we did in all of Unit 2. We are simply using a new type of data to test our hypotheses. We will see what these hypotheses look like shortly, but first, we must take a moment to address why we are doing our calculations this way.
We will typically work from having Sum of Squares calculated, but here are the basic formulas for the 3 types of Sum of Squares for the ANOVA:
While there are other ways to calculate the SSs, these are the formulas we can use for this class if needed.
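As a minimal sketch of the definitional approach, the code below computes the three sums of squares from raw scores. The group labels and scores are made up for illustration and are not data from this chapter. SS between adds up, for each group, the group size times the squared distance of the group mean from the grand mean; SS within adds up the squared distance of each score from its own group mean; SS total adds up the squared distance of each score from the grand mean.

```python
# Hypothetical scores for three small groups (illustrative only)
groups = {
    "A": [4, 5, 6],
    "B": [7, 8, 9],
    "C": [1, 2, 3],
}

def mean(values):
    return sum(values) / len(values)

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between: group size times squared distance of each group mean from the grand mean
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# Within: squared distance of each score from its own group mean
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

# Total: squared distance of each score from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(ss_between, ss_within, ss_total)
```

With these made-up scores, the Between and Within values add up to the Total, which is the same check described for the SS column of the ANOVA table.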
You may be wondering why we do not just use another t-test to test our hypotheses about three or more groups the way we did in Unit 2. After all, we are still just looking at group mean differences. The reason is that our t-statistic formula can only handle up to two groups, one minus the other. With only two groups, we can move our population parameters for the group means around in our null hypothesis and still get the same interpretation: the means are equal, which can also be concluded if one mean minus the other mean is equal to zero. However, if we tried adding a third mean, we would no longer be able to do this. So, in order to use t-tests to compare three or more means, we would have to run a series of individual group comparisons.
For only three groups, we would have three t-tests: group 1 vs group 2, group 1 vs group 3, and group 2 vs group 3. This may not sound like a lot, especially with the advances in technology that have made running an analysis very fast, but it quickly scales up. With just one additional group, bringing our total to four, we would have six comparisons: group 1 vs group 2, group 1 vs group 3, group 1 vs group 4, group 2 vs group 3, group 2 vs group 4, and group 3 vs group 4. This makes for a logistical and computational nightmare for five or more groups.
When we reject the null hypothesis in a one-way ANOVA, we conclude that the group means are not all the same in the population. But this can indicate different things. With three groups, it can indicate that all three means are significantly different from each other. Or it can indicate that one of the means is significantly different from the other two, but the other two are not significantly different from each other. For this reason, statistically significant one-way ANOVA results are typically followed up with a series of post hoc comparisons of selected pairs of group means to determine which are different from which others.
A bigger issue, however, is our probability of committing a Type I Error. Remember that a Type I error is a false positive, and the chance of committing a Type I error is equal to our significance level, α. This is true if we are only running a single analysis (such as a t -test with only two groups) on a single dataset.
However, when we start running multiple analyses on the same dataset, our Type I error rate increases, raising the probability that we are capitalizing on random chance and rejecting a null hypothesis when we should not. ANOVA, by comparing all groups simultaneously with a single analysis, averts this issue and keeps our error rate at the α we set.
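To see how quickly the problem grows, the sketch below counts the pairwise comparisons for k groups and estimates the chance of at least one false positive across them. The function names are our own, and the error calculation assumes the tests are independent, which is only approximately true for t-tests that share groups, so treat the numbers as a rough guide.

```python
from math import comb

def pairwise_comparisons(k):
    """Number of distinct pairs among k groups."""
    return comb(k, 2)

def familywise_error(alpha, m):
    """Chance of at least one Type I error across m tests.

    Assumes the m tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5, 6):
    m = pairwise_comparisons(k)
    print(k, "groups:", m, "comparisons, error rate about",
          round(familywise_error(0.05, m), 3))
```

Under this simplification, three groups already push the error rate to roughly 14%, and the rate keeps climbing as groups are added.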
So far we have seen what ANOVA is used for, why we use it, and how we use it. Now we can turn to the formal hypotheses we will be testing. As with before, we have a null and an alternative hypothesis to lay out. Our null hypothesis is still the idea of “no difference” in our data. Because we have multiple group means, we simply list them out as equal to each other:
H 0 : There is no difference in the group means. H0: µ1 = µ2 = µ3
We list as many μ parameters as groups we have. In the example above, we have three groups to test (k = 3), so we have three parameters in our null hypothesis. If we had more groups, say, four, we would simply add another μ to the list and give it the appropriate subscript, giving us: H0: µ1 = µ2 = µ3 = µ4. Notice that we do not say that the means are all equal to zero, we only say that they are equal to one another; it does not matter what the actual value is, so long as it holds for all groups equally.
Our alternative hypothesis for ANOVA is a little bit different. Let’s take a look at it and then dive deeper into what it means:
H A : At least 1 mean is different
The first difference is obvious: there is no mathematical statement of the alternative hypothesis in ANOVA. This is due to the second difference: we are not saying which group is going to be different, only that at least one will be. Because we do not hypothesize about which mean will be different, there is no way to write it mathematically. Related to this, we do not have directional hypotheses (greater than or less than) like we did with the z- and t-statistics. Due to this, our alternative hypothesis is always exactly the same: at least one mean is different.
With t-tests, we saw that, if we reject the null hypothesis, we can adopt the alternative, and this made it easy to understand what the differences looked like. In ANOVA, we will still adopt the alternative hypothesis as the best explanation of our data if we reject the null hypothesis. However, when we look at the alternative hypothesis, we can see that it does not give us much information. We will know that a difference exists somewhere, but we will not know where that difference is. The ANOVA is an omnibus test, meaning that it only tells you that differences exist somewhere among the groups. More specifically, at least 1 group is different from the rest. Is only group 1 different but groups 2 and 3 the same? Is it only group 2? Are all three of them different? Based on just our alternative hypothesis, there is no way to be sure. We will come back to this issue later and see how to find out specific differences. For now, just remember that we are testing for any difference in group means, and it does not matter where that difference occurs. Now that we have our hypotheses for ANOVA, let’s work through an example. We will continue to use the data from Figures 1 through 3 for continuity.
Our data come from three groups of 10 people each, all of whom applied for a single job opening: those with no college degree, those with a college degree that is not related to the job opening, and those with a college degree from a relevant field. We want to know if we can use this group membership to account for our observed variability and, by doing so, test if there is a difference between our three group means (k = 3). We will follow the same steps for hypothesis testing as we did in previous chapters. Let’s start, as always, with our hypotheses.
Step 1: State the Hypotheses
Our hypotheses are concerned with the means of groups based on education level, so:
H 0 : There is no difference between educational levels. H0: µ1 = µ2 = µ3
H A : At least 1 educational level is different.
Again, we phrase our null hypothesis in terms of what we are actually looking for, and we use a number of population parameters equal to our number of groups. Our alternative hypothesis is always exactly the same.
Step 2: Find the Critical Value
Our test statistic for ANOVA, as we saw above, is F . Because we are using a new test statistic, we will get a new table: the F distribution table, the top of which is shown in Figure 4:
Figure 4. F distribution table.
The F table only displays critical values for α = 0.05. This is because other significance levels are uncommon and so it is not worth it to use up the space to present them. There are now two degrees of freedom we must use to find our critical value: Numerator and Denominator. These correspond to the numerator and denominator of our test statistic, which, if you look at the ANOVA table presented earlier, are our Between Groups and Within Groups rows, respectively. The df B is the “Degrees of Freedom: Numerator” because it is the degrees of freedom value used to calculate the Mean Square Between, which in turn was the numerator of our F statistic. Likewise, the df W is the “df denom.” (short for denominator) because it is the degrees of freedom value used to calculate the Mean Square Within, which was our denominator for F .
The formula for df B is k – 1, and remember that k is the number of groups we are assessing. In this example, k = 3 so our df B = 2. This tells us that we will use the second column, the one labeled 2, to find our critical value. To find the proper row, we simply calculate the df W , which was N – k. The original prompt told us that we have “three groups of 10 people each,” so our total sample size is 30. This makes our value for df W = 27. If we follow the second column down to the row for 27, we find that our critical value is 3.35. We use this critical value the same way as we did before: it is our criterion against which we will compare our obtained test statistic to determine statistical significance.
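If no F table is at hand, the critical value for this particular case can be checked with a short calculation. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, because in that case the probability of exceeding a value F simplifies to (1 + 2F/df W) raised to the power –df W/2. The function name is our own, and for any other numerator df a table or statistics software is needed.

```python
def f_critical_df1_2(alpha, df_within):
    """Critical F value when the numerator df is exactly 2.

    Solves (1 + 2F/df_within) ** (-df_within / 2) = alpha for F.
    This shortcut is NOT valid for other numerator df.
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)

k = 3          # number of groups
n_total = 30   # three groups of 10 people each

df_between = k - 1        # numerator df
df_within = n_total - k   # denominator df

f_crit = f_critical_df1_2(0.05, df_within)
print(df_between, df_within, round(f_crit, 2))
```

The result agrees with the tabled value of 3.35 for 2 and 27 degrees of freedom.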
Step 3: Calculate the Test Statistic
Now that we have our hypotheses and the criterion we will use to test them, we can calculate our test statistic. To do this, we will fill in the ANOVA table. When we do so, we will work our way from left to right, filling in each cell to get our final answer. The basic steps for calculating ANOVA are to complete the sums of squares, then the degrees of freedom, then the mean squares, and finally F.
We will assume that we are given the SS values as shown below:
Source | SS | df | MS | F |
Between | 8246 | | | |
Within | 3020 | | | |
Total | | | | |
These may seem like random numbers, but remember that they are based on the distances between the groups themselves and within each group. Figure 5 shows the plot of the data with the group means and grand mean included. If we wanted to, we could use this information, combined with our earlier information that each group has 10 people, to calculate the Between Groups Sum of Squares by hand.
However, doing so would take some time, and without the specific values of the data points, we would not be able to calculate our Within Groups Sum of Squares, so we will trust that these values are the correct ones.
Figure 5. Means
We were given the sums of squares values for our first two rows, so we can use those to calculate the Total Sum of Squares.
Source | SS | df | MS | F |
Between | 8246 | | | |
Within | 3020 | | | |
Total | 8246+3020=11266 | | | |
We also calculated our degrees of freedom earlier, so we can fill in those values. Additionally, we know that the total degrees of freedom is N – 1, which is 29. This value of 29 is also the sum of the other two degrees of freedom, so everything checks out.
Source | SS | df | MS | F |
Between | 8246 | 3-1=2 | | |
Within | 3020 | 29-2=27 | | |
Total | 11266 | 30-1=29 | | |
Now we have everything we need to calculate our mean squares. Our MS values for each row are just the SS divided by the df for that row, giving us:
Source | SS | df | MS | F |
Between | 8246 | 2 | 8246/2 = 4123 | |
Within | 3020 | 27 | 3020/27 = 111.85 | |
Total | 11266 | 29 | | |
Remember that we do not calculate a Total Mean Square, so we leave that cell blank. Finally, we have the information we need to calculate our test statistic. F is our MS B divided by MS W .
Source | SS | df | MS | F |
Between | 8246 | 2 | 4123 | 36.86 |
Within | 3020 | 27 | 111.85 | |
Total | 11266 | 29 | | |
So, working our way through the table given only two SS values and the sample size and group size given before, we calculate our test statistic to be F obt = 36.86, which we will compare to the critical value in step 4.
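The left-to-right procedure above can be sketched as a small function that takes the two given SS values, the number of groups, and the total sample size, and returns every cell of the table. The function and key names are our own choices for illustration.

```python
def anova_table(ss_between, ss_within, k, n_total):
    """Fill in a one-way ANOVA table from two sums of squares."""
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }

table = anova_table(8246, 3020, k=3, n_total=30)
for name, value in table.items():
    print(name, round(value, 2))
```

The same function can be used to check the practice tables at the end of the chapter whenever both SS values and the sample sizes are known.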
Step 4: Make a decision
Our obtained test statistic was calculated to be F obt = 36.86 and our critical value was found to be F * = 3.35. Our obtained statistic is larger than our critical value, so we can reject the null hypothesis.
Reject H0. Based on our 3 groups of 10 people, we can conclude that job test scores are statistically significantly different based on education level, F (2,27) = 36.86, p < .05.
Notice that when we report F , we include both degrees of freedom. We always report the numerator then the denominator, separated by a comma. We must also note that, because we were only testing for any difference, we cannot yet conclude which groups are different from the others. We will do so shortly, but first, because we found a statistically significant result, we need to calculate an effect size to see how big of an effect we found.
Recall that the purpose of ANOVA is to take observed variability and see if we can explain those differences based on group membership. To that end, our effect size will be just that: the variance explained. You can think of variance explained as the proportion or percent of the differences we are able to account for based on our groups. We know that the overall observed differences are quantified as the Total Sum of Squares, and that our observed effect of group membership is the Between Groups Sum of Squares. Our effect size, therefore, is the ratio of these two sums of squares: η² = SS B / SS T.
Eta-square is reported as percentage of variance of the outcome/dependent variable explained by the predictor/independent variable.
Although you report variance explained by the predictor/independent variable, you can also use the 𝜂2 guidelines for effect size:
𝜂2 | Size |
0.01 | Small |
0.09 | Medium |
0.25 | Large |
Note: if less than .01, no effect is reported |
Example continued adding on effect size for scores on job application tests
For our example, SS B = 8246 and SS T = 11266, so our values give an effect size, η², of 8246/11266 = 0.73.
So, we are able to explain 73% of the variance in job test scores based on education. This is, in fact, a huge effect size, and most of the time we will not explain nearly that much variance.
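The effect size calculation and the guideline table can be sketched together as follows. The labeling function is our own, and it assumes each guideline value marks the lower bound of its category, which is one reasonable reading of the table rather than a fixed rule.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of total variability explained by group membership."""
    return ss_between / ss_total

def effect_size_label(eta_sq):
    # Assumption: each guideline value is the lower bound of its category
    if eta_sq < 0.01:
        return "no effect"
    if eta_sq < 0.09:
        return "small"
    if eta_sq < 0.25:
        return "medium"
    return "large"

eta_sq = eta_squared(8246, 11266)
label = effect_size_label(eta_sq)
print(round(eta_sq, 2), label)
```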
So, we found that not only do we have a statistically significant result, but that our observed effect was very large! However, we still do not know specifically which groups are different from each other. It could be that they are all different, or that only those who have a relevant degree are different from the others, or that only those who have no degree are different from the others. To find out which is true, we need to do a special analysis called a post hoc test.
A post hoc test is used only after we find a statistically significant result and need to determine where our differences truly came from. The term “post hoc” comes from the Latin for “after the event”. There are many different post hoc tests that have been developed, and most of them will give us similar answers.
A Bonferroni test is perhaps the simplest post hoc analysis. A Bonferroni test is a series of t-tests performed on each pair of groups. As we discussed earlier, the number of comparisons grows quickly with the number of groups, which inflates Type I error rates. To avoid this, a Bonferroni test divides our significance level α by the number of comparisons we are making so that, when they are all run, the individual error rates sum back up to our original Type I error rate. Once we have our new significance level, we simply run independent samples t-tests to look for differences between our pairs of groups. This adjustment is sometimes called a Bonferroni Correction, and it is easy to do by hand if we want to compare obtained p-values to our new corrected α level, but it is more difficult to do when using critical values like we do for our analyses, so we will leave our discussion of it at that.
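Here is a minimal sketch of the correction for three groups. The p-values are invented purely for illustration and do not come from the job application data; only the division of α by the number of comparisons reflects the procedure described above.

```python
from math import comb

alpha = 0.05
k = 3                           # number of groups
n_comparisons = comb(k, 2)      # 3 pairwise comparisons
alpha_corrected = alpha / n_comparisons

# Hypothetical p-values from three independent samples t-tests
p_values = {
    "none vs unrelated": 0.030,
    "none vs relevant": 0.001,
    "unrelated vs relevant": 0.012,
}

# A comparison is significant only if its p-value beats the corrected level
decisions = {pair: p < alpha_corrected for pair, p in p_values.items()}

print(round(alpha_corrected, 4))
for pair, significant in decisions.items():
    print(pair, "significant" if significant else "not significant")
```

Notice that the made-up p-value of .030 would have been significant at the uncorrected α of .05 but is not significant at the corrected level, which is exactly the extra caution the correction is meant to provide.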
Tukey’s Honest Significant Difference (HSD) is a very popular post hoc analysis. This analysis, like Bonferroni’s, makes adjustments based on the number of comparisons, but it makes adjustments to the test statistic when running the comparisons of two groups. These comparisons give us an estimate of the difference between the groups and a confidence interval for the estimate. We use this confidence interval in the same way that we use a confidence interval for a regular independent samples t -test: if it contains 0.00, the groups are not different, but if it does not contain 0.00 then the groups are different.
Example continued adding on post hoc for scores on job application tests: Tukey
Remember we are comparing scores from those who applied for a single job opening: those with no college degree (none), those with a college degree that is not related to the job opening (unrelated), and those with a college degree from a relevant field (relevant).
Below are the differences between the group means and the Tukey’s HSD confidence intervals for the differences:
Comparison | Difference | Tukey’s HSD CI |
None vs Relevant | 40.60 | (28.87, 52.33) |
None vs Unrelated | 19.50 | (7.77, 31.23) |
Relevant vs Unrelated | 21.10 | (9.37, 32.83) |
As we can see, none of these intervals contain 0.00, so we can conclude that all three groups are different from one another.
Another common post hoc test is Scheffe’s Test. Like Tukey’s HSD, Scheffe’s test adjusts the test statistic for how many comparisons are made, but it does so in a slightly different way. The result is a test that is “conservative,” which means that it is less likely to commit a Type I Error, but this comes at the cost of less power to detect effects. We can see this by looking at the confidence intervals that Scheffe’s test gives us for our example.
Example continued adding on post hoc for scores on job application tests: Scheffe
Below are the differences between the group means and the Scheffe confidence intervals for the differences:
Comparison | Difference | Scheffe’s CI |
None vs Relevant | 40.60 | (28.35, 52.85) |
None vs Unrelated | 19.50 | (7.25, 31.75) |
Relevant vs Unrelated | 21.10 | (8.85, 33.35) |
As we can see, these are slightly wider than the intervals we got from Tukey’s HSD. This means that, all other things being equal, they are more likely to contain zero. In our case, however, the results are the same, and we again conclude that all three groups differ from one another.
There are many more post hoc tests than just these three, and they all approach the task in different ways, with some being more conservative and others being more powerful. In general, though, they will give highly similar answers. What is important here is to be able to interpret a post hoc analysis. If you are given post hoc analysis confidence intervals, like the ones seen above, read them the same way we read confidence intervals previously comparing two groups: if they contain zero, there is no difference; if they do not contain zero, there is a difference.
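The reading rule can be sketched in a few lines using the Tukey and Scheffe intervals reported above. The helper names are our own. The code only interprets intervals that have already been computed; it does not calculate them.

```python
# Confidence intervals copied from the two post hoc tables above
tukey = {
    "None vs Relevant": (28.87, 52.33),
    "None vs Unrelated": (7.77, 31.23),
    "Relevant vs Unrelated": (9.37, 32.83),
}
scheffe = {
    "None vs Relevant": (28.35, 52.85),
    "None vs Unrelated": (7.25, 31.75),
    "Relevant vs Unrelated": (8.85, 33.35),
}

def excludes_zero(interval):
    """True when the interval lies entirely above or entirely below zero."""
    low, high = interval
    return low > 0 or high < 0

def width(interval):
    low, high = interval
    return high - low

for pair in tukey:
    print(pair,
          "| Tukey different:", excludes_zero(tukey[pair]),
          "| Scheffe different:", excludes_zero(scheffe[pair]),
          "| Scheffe wider:", width(scheffe[pair]) > width(tukey[pair]))
```

Every interval excludes zero under both tests, and each Scheffe interval is wider than its Tukey counterpart, matching the conclusions drawn in the text.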
We have only just scratched the surface on ANOVA in this chapter. There are many other variations available for the one-way ANOVA presented here. There are also other types of ANOVAs that you are likely to encounter. The first is called a factorial ANOVA . Factorial ANOVAs use multiple grouping variables, not just one, to look for group mean differences. Just as there is no limit to the number of groups in a one-way ANOVA, there is no limit to the number of grouping variables in a Factorial ANOVA, but it becomes very difficult to find and interpret significant results with many factors, so usually they are limited to two or three grouping variables with only a small number of groups in each. Another ANOVA is called a Repeated Measures ANOVA. This is an extension of a repeated measures or matched pairs t -test, but in this case we are measuring each person three or more times to look for a change. We can even combine both of these advanced ANOVAs into mixed designs to test very specific and valuable questions. These topics are far beyond the scope of this text, but you should know about their existence. Our treatment of ANOVA here is a small first step into a much larger world!
Having read the chapter, students should be able to:
Source | SS | df | MS | F |
Between | 60.72 | 3 | 20.24 | 3.88 |
Within | 213.61 | 41 | 5.21 | |
Total | 274.33 | 44 | | |
5. Finish filling out the following ANOVA tables:
Problem 1: N = 14
Source | SS | df | MS | F |
Between | | 2 | 14.10 | |
Within | | | | |
Total | 64.65 | | | |
Source | SS | df | MS | F |
Between | | 2 | | 42.36 |
Within | | 54 | 2.48 | |
Total | | | | |
6. You know that stores tend to charge different prices for similar or identical products, and you want to test whether or not these differences are, on average, statistically significantly different. You go online and collect data from 3 different stores, gathering information on 15 products at each store. You find that the average prices at each store are: Store 1 M = $27.82, Store 2 M= $38.96, and Store 3 M = $24.53. Based on the overall variability in the products and the variability within each store, you find the following values for the Sums of Squares: SST = 683.22, SSW = 441.19. Complete the ANOVA table and use the 4 step hypothesis testing procedure to see if there are systematic price differences between the stores.
7. You and your friend are debating which type of candy is the best. You find data on the average rating for hard candy (e.g. Jolly Ranchers, X̄ = 3.60), chewable candy (e.g. Starburst, X̄ = 4.20), and chocolate (e.g. Snickers, X̄ = 4.40); each type of candy was rated by 30 people. Test for differences in average candy rating using SSB = 16.18 and SSW = 28.74.
8. Administrators at a university want to know if students in different majors are more or less extroverted than others. They provide you with data they have for English majors (X̄ = 3.78, n = 45), History majors (X̄ = 2.23, n = 40), Psychology majors (X̄ = 4.41, n = 51), and Math majors (X̄ = 1.15, n = 28). You find the SSB = 75.80 and SSW = 47.40 and test at α = 0.05.
9. You are assigned to run a study comparing a new medication (X̄ = 17.47, n = 19), an existing medication (X̄ = 17.94, n = 18), and a placebo (X̄ = 13.70, n = 20), with higher scores reflecting better outcomes. Use SSB = 210.10 and SSW = 133.90 to test for differences.
10. You are in charge of assessing different training methods for effectiveness. You have data on 4 methods: Method 1 (X̄ = 87, n = 12), Method 2 (X̄ = 92, n = 14), Method 3 (X̄ = 88, n = 15), and Method 4 (X̄ = 75, n = 11). Test for differences among these means, assuming SSB = 64.81 and SST = 399.45.
1. Variance between groups (SSB), variance within groups (SSW) and total variance (SST).
3. Post hoc tests are run if we reject the null hypothesis in ANOVA; they tell us which specific group differences are significant.
5. Finish filling out the following ANOVA tables:
Source | SS | df | MS | F |
Between | 28.20 | 2 | 14.10 | 4.26 |
Within | 36.45 | 11 | 3.31 | |
Total | 64.65 | 13 | | |
Source | SS | df | MS | F |
Between | 210.10 | 2 | 105.05 | 42.36 |
Within | 133.92 | 54 | 2.48 | |
Total | 344.02 | 56 | | |
7. Step 1: H 0 : μ 1 = μ 2 = μ 3 “There is no difference in average rating of candy quality”, H A : “At least one mean is different.”
Step 3: based on the given SSB and SSW and the computed df from step 2, is:
Source | SS | df | MS | F |
Between | 16.18 | 2 | 8.09 | 24.52 |
Within | 28.74 | 87 | 0.33 | |
Total | 44.92 | 89 | | |
Step 4: F > F*, reject H 0 . Based on the data in our 3 groups, we can say that there is a statistically significant difference in the quality of different types of candy, F(2,87) = 24.52, p < .05. Since the result is significant, we need an effect size: η 2 = 16.18/44.92 = .36, which is a large effect.
Source | SS | df | MS | F |
Between | 210.10 | 2 | 105.05 | 42.36 |
Within | 133.90 | 54 | 2.48 | |
Total | 344.00 | 56 | | |
Step 4: F > F*, reject H 0 . Based on the data in our 3 groups, we can say that there is a statistically significant difference in the effectiveness of the treatments, F(2,54) = 42.36, p < .05. Since the result is significant, we need an effect size: η 2 = 210.10/344.00 = .61, which is a large effect.
Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Previously, we have tested hypotheses about two population means. This chapter examines methods for comparing more than two means. Analysis of variance (ANOVA) is an inferential method used to test the equality of three or more population means.
H 0 : µ 1 = µ 2 = µ 3 = …=µ k
This method is also referred to as single-factor ANOVA because we use a single property, or characteristic, for categorizing the populations. This characteristic is sometimes referred to as a treatment or factor.
A treatment (or factor) is a property, or characteristic, that allows us to distinguish the different populations from one another.
The objectives of ANOVA are to (1) estimate treatment means and the differences between treatment means, and (2) test hypotheses for statistical significance of comparisons of treatment means, where “treatment” or “factor” is the characteristic that distinguishes the populations.
For example, a biologist might compare the effect that three different herbicides may have on seed production of an invasive species in a forest environment. The biologist would want to estimate the mean annual seed production under the three different treatments, while also testing to see which treatment results in the lowest annual seed production. The null and alternative hypotheses are:
H0: µ1= µ2= µ3 | H1: at least one of the means is significantly different from the others |
It would be tempting to test this null hypothesis H 0 : µ 1 = µ 2 = µ 3 by comparing the population means two at a time. If we continue this way, we would need to test three different pairs of hypotheses:
H0: µ1= µ2 | AND | H0: µ1= µ3 | AND | H0: µ2= µ3 |
H1: µ1≠ µ2 | H1: µ1≠ µ3 | H1: µ2≠ µ3 |
If we used a 5% level of significance, each test would have a probability of a Type I error (rejecting the null hypothesis when it is true) of α = 0.05. Each test would have a 95% probability of correctly not rejecting the null hypothesis. If the three tests are treated as independent, the probability that all three correctly do not reject the null hypothesis is $0.95^3 = 0.86$. There is a $1 - 0.95^3 = 0.14$ (14%) probability that at least one test will lead to an incorrect rejection of the null hypothesis. A 14% probability of a Type I error is much higher than the desired alpha of 5% (remember: α is the probability of a Type I error). As the number of populations increases, the probability of making a Type I error using multiple t-tests also increases. Analysis of variance allows us to test the null hypothesis (all means are equal) against the alternative hypothesis (at least one mean is different) with a specified value of α.
In the previous chapter, we used a two-sample t-test to compare the means from two independent samples with a common variance. The sample data are used to compute the test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
is the pooled estimate of the common population variance $\sigma^2$. To test more than two populations, we must extend this idea of pooled variance to include all samples as shown below:
$$S_W^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{n_1 + n_2 + \dots + n_k - k}$$
where $S_W^2$ represents the pooled estimate of the common variance $\sigma^2$, and it measures the variability of the observations within the different populations whether or not $H_0$ is true. This is often referred to as the variance within samples (variation due to error).
If the null hypothesis IS true (all the means are equal), then all the populations are the same, with a common mean $\mu$ and variance $\sigma^2$. Instead of randomly selecting different samples from different populations, we are actually drawing k different samples from one population. We know that the sampling distribution for k means based on n observations will have mean $\mu_{\bar{x}} = \mu$ and variance $\sigma^2/n$ (squared standard error). Since we have drawn k samples of n observations each, we can estimate the variance of the k sample means ($\sigma^2/n$) by
$$S_{\bar{x}}^2 = \frac{\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$$
where $\bar{\bar{x}}$ is the mean of the k sample means (the grand mean).
Consequently, n times the sample variance of the means estimates $\sigma^2$. We designate this quantity as $S_B^2$ such that
$$S_B^2 = n S_{\bar{x}}^2 = \frac{n\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$$
where $S_B^2$ is also an unbiased estimate of the common variance $\sigma^2$, IF $H_0$ IS TRUE. This is often referred to as the variance between samples (variation due to treatment).
Under the null hypothesis that all k populations are identical, we have two estimates of $\sigma^2$ ($S_W^2$ and $S_B^2$). We can use the ratio $S_B^2 / S_W^2$ as a test statistic to test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$. When the null hypothesis is true, this ratio follows an F-distribution with degrees of freedom $df_1 = k - 1$ and $df_2 = N - k$, where k is the number of populations and N is the total number of observations ($N = n_1 + n_2 + \dots + n_k$). The numerator of the test statistic measures the variation between sample means. The estimate of the variance in the denominator depends only on the sample variances and is not affected by the differences among the sample means.
When the null hypothesis is true, the ratio of $S_B^2$ to $S_W^2$ will be close to 1. When the null hypothesis is false, $S_B^2$ will tend to be larger than $S_W^2$ due to the differences among the populations. We will reject the null hypothesis if the F test statistic is larger than the F critical value at a given level of significance (or if the p-value is less than the level of significance).
Tables are a convenient format for summarizing the key results in ANOVA calculations. The following one-way ANOVA table illustrates the required computations and the relationships between the various ANOVA table elements.
The sum of squares for the ANOVA table has the relationship of SSTo = SSTr + SSE where:
Total variation (SSTo) = explained variation (SSTr) + unexplained variation (SSE)
The degrees of freedom also have a similar relationship: df (SSTo) = df (SSTr) + df (SSE)
The Mean Sum of Squares for the treatment and error are found by dividing the Sums of Squares by the degrees of freedom for each. While the Sums of Squares are additive, the Mean Sums of Squares are not. The F-statistic is then found by dividing the Mean Sum of Squares for the treatment (MSTr) by the Mean Sum of Squares for the error (MSE). The MSTr is the $S_B^2$ and the MSE is the $S_W^2$.
$$F = \frac{S_B^2}{S_W^2} = \frac{\text{MSTr}}{\text{MSE}}$$
An environmentalist wanted to determine if the mean acidity of rain differed among Alaska, Florida, and Texas. He randomly selected six rain dates at each site and obtained the following data:
H 0 : μ A = μ F = μ T H 1 : at least one of the means is different
State | Sample size | Sample total | Sample mean | Sample variance |
Alaska | n1 = 6 | 30.2 | 5.033 | 0.0265 |
Florida | n2 = 6 | 27.1 | 4.517 | 0.1193 |
Texas | n3 = 6 | 33.22 | 5.537 | 0.1575 |
Table 3. Summary Table.
Notice that there are differences among the sample means. Are the differences small enough to be explained solely by sampling variability? Or are they of sufficient magnitude so that a more reasonable explanation is that the μ ’s are not all equal? The conclusion depends on how much variation among the sample means (based on their deviations from the grand mean) compares to the variation within the three samples.
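The grand mean is equal to the sum of all observations divided by the total sample size. Using the sample totals in Table 3, the grand mean is (30.2 + 27.1 + 33.22)/18 = 90.52/18 = 5.0289.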
$$\begin{aligned}
\text{SSTo} &= (5.11-5.0289)^2 + (5.01-5.0289)^2 + \dots + (5.24-5.0289)^2 \\
&\quad + (4.87-5.0289)^2 + (4.18-5.0289)^2 + \dots + (4.09-5.0289)^2 \\
&\quad + (5.46-5.0289)^2 + (6.29-5.0289)^2 + \dots + (5.30-5.0289)^2 = 4.6384 \\
\text{SSTr} &= 6(5.033-5.0289)^2 + 6(4.517-5.0289)^2 + 6(5.537-5.0289)^2 = 3.1214 \\
\text{SSE} &= \text{SSTo} - \text{SSTr} = 4.6384 - 3.1214 = 1.5170
\end{aligned}$$
This test is based on df 1 = k – 1 = 2 and df 2 = N – k = 15. For α = 0.05, the F critical value is 3.68. Since the observed F = 15.4372 is greater than the F critical value of 3.68, we reject the null hypothesis. There is enough evidence to state that at least one of the means is different.
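The calculation can be reproduced from the summary statistics alone, as in the sketch below. It uses the sample totals from Table 3 and the more precise sample variances shown in the spreadsheet output later in this example. Two points are our own additions rather than steps taken in the text: SSE is computed directly as the sum of (n – 1) times each sample variance, which is equivalent to SSTo – SSTr, and the p-value uses a closed form that is valid only when the numerator degrees of freedom equal 2. Because intermediate values are not rounded here, F comes out as 15.43 rather than the hand-calculated 15.4372.

```python
n = 6  # observations per state
totals = {"Alaska": 30.2, "Florida": 27.1, "Texas": 33.22}
variances = {"Alaska": 0.026547, "Florida": 0.119347, "Texas": 0.157507}

k = len(totals)
n_total = n * k
grand_mean = sum(totals.values()) / n_total

# Treatment (between) sum of squares from the group means
sstr = sum(n * (total / n - grand_mean) ** 2 for total in totals.values())

# Error (within) sum of squares from the sample variances
sse = sum((n - 1) * var for var in variances.values())

df1, df2 = k - 1, n_total - k
mstr = sstr / df1
mse = sse / df2
f_stat = mstr / mse

# Closed-form p-value, valid ONLY because df1 == 2
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(round(sstr, 4), round(sse, 4), round(f_stat, 2), round(p_value, 6))
```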
Source | DF | SS | MS | F | P |
State | 2 | 3.121 | 1.561 | 15.43 | 0.000 |
Error | 15 | 1.517 | 0.101 | | |
Total | 17 | 4.638 | | | |
S = 0.3180 R-Sq = 67.29% R-Sq(adj) = 62.93%
Individual 95% CIs For Mean Based on Pooled StDev (interval plot not reproduced; axis from 4.40 to 5.60)
Level | N | Mean | StDev |
Alaska | 6 | 5.0333 | 0.1629 |
Florida | 6 | 4.5167 | 0.3455 |
Texas | 6 | 5.5367 | 0.3969 |
Pooled StDev = 0.3180
The p-value (0.000) is less than the level of significance (0.05) so we will reject the null hypothesis.
SUMMARY
Groups | Count | Sum | Average | Variance |
Column 1 | 6 | 30.2 | 5.033333 | 0.026547 |
Column 2 | 6 | 27.1 | 4.516667 | 0.119347 |
Column 3 | 6 | 33.22 | 5.536667 | 0.157507 |
ANOVA
Source of Variation | SS | df | MS | F | p-value | F crit |
Between Groups | 3.121378 | 2 | 1.560689 | 15.43199 | 0.000229 | 3.68232 |
Within Groups | 1.517 | 15 | 0.101133 | |||
Total | 4.638378 | 17 |
The p-value (0.000229) is less than alpha (0.05) so we reject the null hypothesis. There is enough evidence to support the claim that at least one of the means is different.
Once we have rejected the null hypothesis and found that at least one of the treatment means is different, the next step is to identify those differences. There are two approaches that can be used to answer this type of question: contrasts and multiple comparisons.
Contrasts can be used only when there are clear expectations BEFORE starting an experiment, and these are reflected in the experimental design. Contrasts are planned comparisons . For example, mule deer are treated with drug A, drug B, or a placebo to treat an infection. The three treatments are not symmetrical. The placebo is meant to provide a baseline against which the other drugs can be compared. Contrasts are more powerful than multiple comparisons because they are more specific. They are more able to pick up a significant difference. Contrasts are not always readily available in statistical software packages (when they are, you often need to assign the coefficients), or may be limited to comparing each sample to a control.
Multiple comparisons should be used when there are no justified expectations. They are a posteriori, pair-wise tests of significance. For example, we compare the gas mileage for six brands of all-terrain vehicles. We have no prior knowledge to expect any vehicle to perform differently from the rest. Pair-wise comparisons should be performed here, but only if an ANOVA test on all six vehicles rejected the null hypothesis first.
It is NOT appropriate to use a contrast test when suggested comparisons appear only after the data have been collected. We are going to focus on multiple comparisons instead of planned contrasts.
When the null hypothesis is rejected by the F-test, we believe that there are significant differences among the k population means. So, which ones are different? A multiple comparison method identifies which of the means are different while controlling the experiment-wise error rate (the accumulated risk associated with a family of comparisons). There are many multiple comparison methods available.
In the Least Significant Difference (LSD) Test, each individual hypothesis is tested with the Student t-statistic. When the Type I error probability is set at some value α and the variance \(s^2\) has v degrees of freedom, the null hypothesis is rejected for any observed value such that \(|t_o| > t_{\alpha/2,\, v}\). It is an abbreviated version of conducting all possible pair-wise t-tests. This method provides only weak control of the experiment-wise error rate. Fisher’s Protected LSD is somewhat better at controlling this problem.
The Bonferroni inequality is a conservative alternative when software is not available. When conducting n comparisons, \(\alpha_e \le n\,\alpha_c\), therefore \(\alpha_c = \alpha_e / n\). In other words, divide the experiment-wise level of significance by the number of multiple comparisons to get the comparison-wise level of significance. The Bonferroni procedure is based on computing confidence intervals for the differences between each possible pair of μ’s. The critical value for the confidence intervals comes from a table with (N – k) degrees of freedom and k(k – 1)/2 intervals. If a particular interval does not contain zero, the two means are declared to be significantly different from one another. An interval that contains zero indicates that the two means are NOT significantly different.
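As a small illustration of the Bonferroni adjustment, the sketch below counts the pair-wise comparisons among k means and divides the experiment-wise level by that count. The function names are hypothetical, and the values k = 3 and 0.05 are chosen only to match the rain example.

```python
def n_pairwise(k):
    """Number of pair-wise comparisons among k means: k(k - 1)/2."""
    return k * (k - 1) // 2


def bonferroni_alpha(alpha_e, n_comparisons):
    """Comparison-wise level: experiment-wise level divided by the number of comparisons."""
    return alpha_e / n_comparisons


k = 3            # three states in the rain example
alpha_e = 0.05   # experiment-wise level of significance
n = n_pairwise(k)
alpha_c = bonferroni_alpha(alpha_e, n)
print(f"{n} comparisons, comparison-wise alpha = {alpha_c:.4f}")
```

With six groups, as in the all-terrain vehicle example, the same functions give 15 comparisons, so each comparison would be tested at 0.05/15.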
Scheffe’s test is also a conservative method for all possible simultaneous comparisons suggested by the data. This test equates the F statistic of ANOVA with the t-test statistic. Since \(t^2 = F\), then \(t = \sqrt{F}\), so we can substitute \(\sqrt{F(\alpha_e, v_1, v_2)}\) for \(t(\alpha_e, v_2)\) for Scheffe’s statistic.
Tukey’s test provides strong control of the experiment-wise error rate for all pair-wise comparisons of treatment means. This test is also known as the Honestly Significant Difference. This test orders the treatments from smallest to largest and uses the studentized range statistic.
The absolute difference of the two means is used because the location of the two means in the calculated difference is arbitrary, with the sign of the difference depending on which mean is used first. For unequal replications, the Tukey-Kramer approximation is used instead.
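The formula for the studentized range statistic did not survive in this text. For equal replication it is commonly written as shown below; this standard textbook form is offered as an assumption about what was intended. Here \(\bar{y}_i\) and \(\bar{y}_j\) are two treatment means, MSE is the error mean square from the ANOVA table, and n is the common number of replications per treatment.

```latex
q = \frac{\lvert \bar{y}_i - \bar{y}_j \rvert}{\sqrt{MSE / n}}
```

Two means are declared different when q exceeds the tabled critical value for k treatments and N – k error degrees of freedom. For the rain data, comparing Texas and Florida under this form gives \(q = 1.02 / \sqrt{0.1011/6} \approx 7.86\).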
The Student-Newman-Keuls (SNK) test is a multiple range test based on the studentized range statistic, like Tukey’s. The critical value is based on a particular pair of means being tested within the entire set of ordered means. Two or more ranges among means are used for test criteria. While it is similar to Tukey’s in terms of a test statistic, it provides only weak control of the experiment-wise error rate.
Bonferroni, Dunnett’s, and Scheffe’s tests are the most conservative, meaning that the difference between the two means must be greater before concluding a significant difference. The LSD and SNK tests are the least conservative. Tukey’s test is in the middle. Robert Kuehl, author of Design of Experiments: Statistical Principles of Research Design and Analysis (2000), states that the Tukey method provides the best protection against decision errors, along with a strong inference about magnitude and direction of differences.
Let’s go back to our question on mean rain acidity in Alaska, Florida, and Texas. The null and alternative hypotheses were as follows:
\(H_0: \mu_A = \mu_F = \mu_T\) | \(H_1\): at least one of the means is different |
The p-value for the F-test was 0.000229, which is less than our 5% level of significance. We rejected the null hypothesis and had enough evidence to support the claim that at least one of the means was significantly different from another. We will use Bonferroni and Tukey’s methods for multiple comparisons in order to determine which mean(s) is different.
A Bonferroni confidence interval is computed for each pair-wise comparison. For k populations, there will be k(k – 1)/2 multiple comparisons. The confidence interval takes the form of: \((\bar{x}_i - \bar{x}_j) \pm t_{\text{Bonferroni}} \sqrt{MSE\,(1/n_i + 1/n_j)}\)
Where MSE is from the analysis of variance table and the Bonferroni t critical value comes from the Bonferroni Table given below. The Bonferroni t critical value, instead of the student t critical value, combined with the use of the MSE is used to achieve a simultaneous confidence level of at least 95% for all intervals computed. The two means are judged to be significantly different if the corresponding interval does not include zero.
For this problem, k = 3 so there are k ( k – 1)/2= 3(3 – 1)/2 = 3 multiple comparisons. The degrees of freedom are equal to N – k = 18 – 3 = 15. The Bonferroni critical value is 2.69.
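The three intervals can be sketched in a few lines of code. This is a minimal illustration, assuming the usual Bonferroni interval form (difference of means plus or minus the critical value times the square root of MSE(1/n_i + 1/n_j)), the critical value 2.69 quoted above, and MSE rounded to 0.1011. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt

# Sample means from the software output; n = 6 observations per state.
means = {"Alaska": 5.0333, "Florida": 4.5167, "Texas": 5.5367}
n = 6
MSE = 0.1011     # error mean square, rounded (assumption: same rounding as the hand calculation)
T_BONF = 2.69    # Bonferroni critical value quoted in the text (3 intervals, 15 df)


def bonferroni_intervals(means, n, mse, t_crit):
    """Return {(a, b): (lower, upper)} for the difference mean[a] - mean[b]."""
    half_width = t_crit * sqrt(mse * (1 / n + 1 / n))
    intervals = {}
    for a, b in combinations(means, 2):
        diff = means[a] - means[b]
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


intervals = bonferroni_intervals(means, n, MSE, T_BONF)
for (a, b), (lower, upper) in intervals.items():
    print(f"{a} - {b}: ({lower:.4f}, {upper:.4f})")
```

The pairs come out in the order Alaska – Florida, Alaska – Texas, Florida – Texas, which is the order of the first, second, and third intervals discussed next.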
The first confidence interval contains all positive values. This tells you that there is a significant difference between the two means and that the mean rain pH for Alaska is significantly greater than the mean rain pH for Florida.
The second confidence interval contains all negative values. This tells you that there is a significant difference between the two means and that the mean rain pH of Alaska is significantly lower than the mean rain pH of Texas.
The third confidence interval also contains all negative values. This tells you that there is a significant difference between the two means and that the mean rain pH of Florida is significantly lower than the mean rain pH of Texas.
All three states have significantly different levels of rain pH. Texas has the highest rain pH, then Alaska followed by Florida, which has the lowest mean rain pH level. You can use the confidence intervals to estimate the mean difference between the states. For example, the average rain pH in Texas is estimated to be between 0.5262 and 1.5138 higher than the average rain pH in Florida.
Now let’s use the Tukey method for multiple comparisons. We are going to let software compute the values for us. Excel doesn’t do multiple comparisons so we are going to rely on Minitab output.
Source | DF | SS | MS | F | P |
state | 2 | 3.121 | 1.561 | 15.4 | 0.000 |
Error | 15 | 1.517 | 0.101 | ||
Total | 17 | 4.638 | |||
S = 0.3180 | R-Sq = 67.29% | R-Sq(adj) = 62.93% |
We have seen this part of the output before. We now want to focus on the Grouping Information Using Tukey Method. All three states have different letters indicating that the mean rain pH for each state is significantly different. They are also listed from highest to lowest. It is easy to see that Texas has the highest mean rain pH while Florida has the lowest.
state | N | Mean | Grouping |
Texas | 6 | 5.5367 | A |
Alaska | 6 | 5.0333 | B |
Florida | 6 | 4.5167 | C |
Means that do not share a letter are significantly different. |
This next set of confidence intervals is similar to the Bonferroni confidence intervals. They estimate the difference of each pair of means. The individual confidence interval level is set at 97.97% instead of 95% thus controlling the experiment-wise error rate.
Tukey 95% Simultaneous Confidence Intervals |
All Pairwise Comparisons among Levels of state |
Individual confidence level = 97.97% |
state = Alaska subtracted from: |
state | Lower | Center | Upper |
Florida | -0.9931 | -0.5167 | -0.0402 |
Texas | 0.0269 | 0.5033 | 0.9798 |
state = Florida subtracted from: |
state | Lower | Center | Upper |
Texas | 0.5435 | 1.0200 | 1.4965 |
[Interval plots of the pair-wise differences; axis from -0.80 to 1.60]
The first pairing is Florida – Alaska, which results in an interval of (-0.9931, -0.0402). The interval has all negative values, indicating that Florida is significantly lower than Alaska. The second pairing is Texas – Alaska, which results in an interval of (0.0269, 0.9798). The interval has all positive values, indicating that Texas is significantly greater than Alaska. The third pairing is Texas – Florida, which results in an interval of (0.5435, 1.4965). All positive values indicate that Texas is significantly greater than Florida.
The intervals are similar to the Bonferroni intervals with differences in width due to methods used. In both cases, the same conclusions are reached.
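The Tukey intervals in the software output can be approximated by hand. The sketch below assumes the equal-replication form (difference of means plus or minus q times the square root of MSE/n) and a tabled studentized range critical value of 3.67 for α = 0.05, k = 3, and 15 error degrees of freedom. That critical value is not given in the text and is an assumption taken from standard tables; the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt

means = {"Alaska": 5.0333, "Florida": 4.5167, "Texas": 5.5367}
n = 6
MSE = 0.101133   # Within Groups mean square from the ANOVA table
Q_CRIT = 3.67    # assumed tabled value of q(0.05; k = 3, df = 15)


def tukey_intervals(means, n, mse, q_crit):
    """Return {(b, a): (lower, upper)} for mean[b] - mean[a] ("a subtracted from b")."""
    half_width = q_crit * sqrt(mse / n)
    intervals = {}
    for a, b in combinations(means, 2):
        diff = means[b] - means[a]
        intervals[(b, a)] = (diff - half_width, diff + half_width)
    return intervals


tukey = tukey_intervals(means, n, MSE, Q_CRIT)
for (b, a), (lower, upper) in tukey.items():
    print(f"{b} - {a}: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the results agree with the software intervals to about three decimal places. The Tukey half-width (about 0.48) is slightly narrower than the Bonferroni half-width (about 0.49), which accounts for the small differences in width noted above.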
When we use one-way ANOVA and conclude that the differences among the means are significant, we can’t be absolutely sure that the given factor is responsible for the differences. It is possible that variation in some other, unknown factor is responsible. One way to reduce the effect of extraneous factors is to use a completely randomized design, in which each experimental unit has an equal probability of receiving any treatment or belonging to any group. In general, good results require that the experiment be carefully designed and executed.
Natural Resources Biometrics Copyright © 2014 by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.
The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :
The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:
The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .
You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.
The null hypothesis is the claim that there’s no effect in the population.
If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.
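The decision rule can be written as a one-line comparison. This is only an illustrative sketch: the function name `decide` and the example p-values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Return the conventional wording for a hypothesis test decision."""
    # Reject when p <= alpha; otherwise the wording is "fail to reject", never "accept".
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.003))  # hypothetical small p-value
print(decide(0.20))   # hypothetical large p-value
```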
Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept . Be careful not to say you “prove” or “accept” the null hypothesis.
Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).
You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.
The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.
Research question | General null hypothesis (H₀) | Test-specific null hypothesis (H₀) |
Does tooth flossing affect the number of cavities? | Tooth flossing has no effect on the number of cavities. | t test: The mean number of cavities per person does not differ between the flossing group (µ₁) and the non-flossing group (µ₂) in the population; µ₁ = µ₂. |
Does the amount of text highlighted in the textbook affect exam scores? | The amount of text highlighted in the textbook has no effect on exam scores. | Linear regression: There is no relationship between the amount of text highlighted and exam scores in the population; β = 0. |
Does daily meditation decrease the incidence of depression? | Daily meditation does not decrease the incidence of depression.* | Two-proportions test: The proportion of people with depression in the daily-meditation group (p₁) is greater than or equal to the no-meditation group (p₂) in the population; p₁ ≥ p₂. |
*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p₁ = p₂.
The alternative hypothesis (Hₐ) is the other answer to your research question. It claims that there’s an effect in the population.
Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.
The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.
Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.
The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.
Research question | General alternative hypothesis (Hₐ) | Test-specific alternative hypothesis (Hₐ) |
Does tooth flossing affect the number of cavities? | Tooth flossing has an effect on the number of cavities. | t test: The mean number of cavities per person differs between the flossing group (µ₁) and the non-flossing group (µ₂) in the population; µ₁ ≠ µ₂. |
Does the amount of text highlighted in a textbook affect exam scores? | The amount of text highlighted in the textbook has an effect on exam scores. | Linear regression: There is a relationship between the amount of text highlighted and exam scores in the population; β ≠ 0. |
Does daily meditation decrease the incidence of depression? | Daily meditation decreases the incidence of depression. | Two-proportions test: The proportion of people with depression in the daily-meditation group (p₁) is less than the no-meditation group (p₂) in the population; p₁ < p₂. |
Null and alternative hypotheses are similar in some ways:
However, there are important differences between the two types of hypotheses, summarized in the following table.
 | Null hypothesis (H₀) | Alternative hypothesis (Hₐ) |
Definition | A claim that there is no effect in the population. | A claim that there is an effect in the population. |
Symbols used | Equality symbol (=, ≥, or ≤) | Inequality symbol (≠, <, or >) |
p ≤ α | Rejected | Supported |
p > α | Failed to reject | Not supported |
To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.
The only things you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:
Does independent variable affect dependent variable ?
Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.
Statistical test | Null hypothesis (H₀) | Alternative hypothesis (Hₐ) |
Two-sample t test or one-way ANOVA with two groups | The mean dependent variable does not differ between group 1 (µ₁) and group 2 (µ₂) in the population; µ₁ = µ₂. | The mean dependent variable differs between group 1 (µ₁) and group 2 (µ₂) in the population; µ₁ ≠ µ₂. |
One-way ANOVA with three groups | The mean dependent variable does not differ between group 1 (µ₁), group 2 (µ₂), and group 3 (µ₃) in the population; µ₁ = µ₂ = µ₃. | The mean dependent variable of group 1 (µ₁), group 2 (µ₂), and group 3 (µ₃) are not all equal in the population. |
Pearson correlation | There is no correlation between independent variable and dependent variable in the population; ρ = 0. | There is a correlation between independent variable and dependent variable in the population; ρ ≠ 0. |
Simple linear regression | There is no relationship between independent variable and dependent variable in the population; β = 0. | There is a relationship between independent variable and dependent variable in the population; β ≠ 0. |
Two-proportions test | The dependent variable expressed as a proportion does not differ between group 1 (p₁) and group 2 (p₂) in the population; p₁ = p₂. | The dependent variable expressed as a proportion differs between group 1 (p₁) and group 2 (p₂) in the population; p₁ ≠ p₂. |
Note: The template sentences above assume that you’re performing two-tailed tests, since each alternative hypothesis is written with ≠. Two-tailed tests are appropriate for most studies.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.
The null hypothesis is often abbreviated as H₀. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).
The alternative hypothesis is often abbreviated as Hₐ or H₁. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).
A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).
A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.
Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/null-and-alternative-hypotheses/
ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups. A one-way ANOVA uses one independent variable, while a two-way ANOVA uses two independent variables. As a crop researcher, you want to test the effect of three different fertilizer mixtures on crop yield.
A one-way ANOVA ("analysis of variance") compares the means of three or more independent groups to determine if there is a statistically significant difference between the corresponding population means. ... A one-way ANOVA uses the following null and alternative hypotheses: H 0 (null hypothesis): ...
Alternative Hypothesis (multi-sided): In the population, the mean post-test score is significantly different across the three cohorts of students. Study Background This example illustrates how to conduct a one-way analysis of variance to compare the mean differences across three groups.
Alternative Hypothesis (H1): This is the hypothesis that there is a difference between at least two of the group means. ... The Analysis of Variance (ANOVA) is a powerful statistical technique that is used widely across various fields and industries. Here are some of its key applications:
The hypothesis is based on available information and the investigator's belief about the population parameters. The specific test considered here is called analysis of variance (ANOVA) and is a test of hypothesis that is appropriate to compare means of a continuous variable in two or more independent comparison groups.
To account for this P(Type I Error) inflation, we instead will do an analysis of variance (ANOVA) to test the equality between 3 or more population means \(\mu_{1}, \mu_{2}, \mu_{3}, \ldots, \mu_{k}\). ... The null hypothesis will always have the means equal to one another versus the alternative hypothesis that at least one mean is different ...
Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers (Fisher 1925). To test the hypothesis that all treatments have exactly the same effect, the F-test's p values closely approximate the permutation test's p values: the approximation is particularly close when the ...
With that in mind, here is the null hypothesis and the alternative hypothesis for a one-way analysis of variance: Null hypothesis: The null hypothesis states that the independent variable (dosage level) has no effect on the dependent variable (cholesterol level) in any treatment group. Thus, ... β j = 0 for all j. Alternative hypothesis: The ...
The degrees of freedom associated with SSE is n -2 = 49-2 = 47. And the degrees of freedom add up: 1 + 47 = 48. The sums of squares add up: SSTO = SSR + SSE. That is, here: 53637 = 36464 + 17173. Let's tackle a few more columns of the analysis of variance table, namely the " mean square " column, labeled MS, and the F -statistic column labeled F.
The Null and Alternative Hypotheses. The null hypothesis is that all the group population means are the same. The alternative hypothesis is that at least one pair of means is different. ... μ 1 = μ 2 = μ 3 and the three populations have the same distribution if the null hypothesis is true. The variance of the combined data is approximately ...
Chapter 7: One-way ANOVA. One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks at quantitative outcomes with a categorical explanatory variable that has only two levels. The one-way Analysis of Variance (ANOVA) can be ...
For this reason, it is often referred to as the analysis of variance F-test. The following section summarizes the ANOVA F-test. The ANOVA F-test for the slope parameter β 1. The null hypothesis is H 0: β 1 = 0. The alternative hypothesis is H A: β 1 ≠ 0. The test statistic is \(F^*=\frac{MSR}{MSE}\).
ANOVA, short for Analysis of Variance, is a statistical method used to see if there are significant differences between the averages of three or more unrelated groups. ... In general, if the p-value associated with the F is smaller than .05, then the null hypothesis is rejected and the alternative hypothesis is supported. If the null hypothesis ...
In ANOVA, we will still adopt the alternative hypothesis as the best explanation of our data if we reject the null hypothesis. However, when we look at the alternative hypothesis, we can see that it does not give us much information. We will know that a difference exists somewhere, but we will not know where that difference is.
The analysis is based on an examination of variation between and within groups, and is often called the one-way analysis of variance (or one-way ANOVA for short). We explicitly consider all possible sources of variation before carefully explaining how an ANOVA is conducted. ... The alternative hypothesis is \(H_{1}:\mu _{a}\ne \mu _{b}\), where ...
A One-Way Analysis of Variance is a way to test the equality of three or more population means at one time by using sample variances, under the following assumptions: ... The null hypothesis is that all population means are equal, the alternative hypothesis is that at least one mean is different.
In this Lesson, we introduced One-way Analysis of Variance (ANOVA). The ANOVA test tests the hypothesis that the population means for the groups are the same against the hypothesis that at least one of the means is different. If the null hypothesis is rejected, we need to perform multiple comparisons to determine which means are different.
Analysis of variance, often abbreviated to ANOVA for short, serves the same purpose as the t-tests we learned earlier in unit 2: ... Our alternative hypothesis is always exactly the same. Step 2: Find the Critical Value. Our test statistic for ANOVA, as we saw above, is F.
Analysis of variance (ANOVA) is an inferential method used to test the equality of three or more population means. \(H_0: \mu_1= \mu_2= \mu_3= \cdots =\mu_k\) ... Analysis of variance allows us to test the null hypothesis (all means are equal) against the alternative hypothesis (at least one mean is different) with a specified value of α.
The analysis of variance (ANOVA) is a hypothesis-testing technique used to test the claim that three or more populations (or treatment) means are equal by examining the variances of samples that are taken. This is an extension of the two independent samples t-test. ANOVA is based on comparing the variance (or variation) between the data samples ...
Two-Way ANOVA | Examples & When To Use It. Published on March 20, 2020 by Rebecca Bevans.Revised on June 22, 2023. ANOVA (Analysis of Variance) is a statistical test used to analyze the difference between the means of more than two groups. A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables.
The estimate of the variance in the denominator depends only on the sample variances and is not affected by the differences among the sample means. When the null hypothesis is true, the ratio of S B 2 and S W 2 will be close to 1. When the null hypothesis is false, S B 2 will tend to be larger than S W 2 due to the differences among the ...
The null hypothesis (H0) answers "No, there's no effect in the population." The alternative hypothesis (Ha) answers "Yes, there is an effect in the population." The null and alternative are always claims about the population. That's because the goal of hypothesis testing is to make inferences about a population based on a sample.