ANOVA (Analysis of variance) – Formulas, Types, and Examples

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means. It is similar to the t-test, but the t-test is generally used for comparing two means, while ANOVA is used when you have more than two means to compare.

ANOVA is based on comparing the variance (or variation) between the data samples to the variation within each particular sample. If the between-group variance is high and the within-group variance is low, this provides evidence that the means of the groups are significantly different.

ANOVA Terminology

When discussing ANOVA, there are several key terms to understand:

  • Factor : This is another term for the independent variable in your analysis. In a one-way ANOVA, there is one factor, while in a two-way ANOVA, there are two factors.
  • Levels : These are the different groups or categories within a factor. For example, if the factor is ‘diet’ the levels might be ‘low fat’, ‘medium fat’, and ‘high fat’.
  • Response Variable : This is the dependent variable or the outcome that you are measuring.
  • Within-group Variance : This is the variance or spread of scores within each level of your factor.
  • Between-group Variance : This is the variance or spread of scores between the different levels of your factor.
  • Grand Mean : This is the overall mean when you consider all the data together, regardless of the factor level.
  • Treatment Sums of Squares (SS) : This represents the between-group variability. It is the sum of the squared differences between the group means and the grand mean.
  • Error Sums of Squares (SS) : This represents the within-group variability. It’s the sum of the squared differences between each observation and its group mean.
  • Total Sums of Squares (SS) : This is the sum of the Treatment SS and the Error SS. It represents the total variability in the data.
  • Degrees of Freedom (df) : The degrees of freedom are the number of values that have the freedom to vary when computing a statistic. For example, if you have ‘n’ observations in one group, then the degrees of freedom for that group is ‘n-1’.
  • Mean Square (MS) : Mean Square is the average squared deviation and is calculated by dividing the sum of squares by the corresponding degrees of freedom.
  • F-Ratio : This is the test statistic for ANOVAs, and it’s the ratio of the between-group variance to the within-group variance. If the between-group variance is significantly larger than the within-group variance, the F-ratio will be large and likely significant.
  • Null Hypothesis (H0) : This is the hypothesis that there is no difference between the group means.
  • Alternative Hypothesis (H1) : This is the hypothesis that there is a difference between at least two of the group means.
  • p-value : This is the probability of obtaining a test statistic as extreme as the one that was actually observed, assuming that the null hypothesis is true. If the p-value is less than the significance level (usually 0.05), then the null hypothesis is rejected in favor of the alternative hypothesis.
  • Post-hoc tests : These are follow-up tests conducted after an ANOVA when the null hypothesis is rejected, to determine which specific groups’ means (levels) are different from each other. Examples include Tukey’s HSD, Scheffe, Bonferroni, among others.

Types of ANOVA

Types of ANOVA are as follows:

One-way (or one-factor) ANOVA

This is the simplest type of ANOVA, which involves one independent variable. For example, comparing the effect of different types of diet (vegetarian, pescatarian, omnivore) on cholesterol level.

Two-way (or two-factor) ANOVA

This involves two independent variables. It allows for testing the effect of each independent variable on the dependent variable, as well as testing whether there is an interaction effect between the independent variables on the dependent variable.

Repeated Measures ANOVA

This is used when the same subjects are measured multiple times under different conditions, or at different points in time. This type of ANOVA is often used in longitudinal studies.

Mixed Design ANOVA

This combines features of both between-subjects (independent groups) and within-subjects (repeated measures) designs. In this model, one factor is a between-subjects variable and the other is a within-subjects variable.

Multivariate Analysis of Variance (MANOVA)

This is used when there are two or more dependent variables. It tests whether changes in the independent variable(s) correspond to changes in the dependent variables.

Analysis of Covariance (ANCOVA)

This combines ANOVA and regression. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance accounted for by quantitative covariates (interval variables). This allows comparison of an outcome variable between groups, while statistically controlling for the effect of other continuous variables that are not of primary interest.

Nested ANOVA

This model is used when the groups can be clustered into categories. For example, if you were comparing students’ performance from different classrooms and different schools, “classroom” could be nested within “school.”

ANOVA Formulas

ANOVA Formulas are as follows:

Sum of Squares Total (SST)

This represents the total variability in the data. It is the sum of the squared differences between each observation and the overall mean:

SST = Σ (yi - y_mean)²

  • yi represents each individual data point
  • y_mean represents the grand mean (mean of all observations)

Sum of Squares Within (SSW)

This represents the variability within each group or factor level. It is the sum of the squared differences between each observation and its group mean:

SSW = Σ Σ (yij - y_meani)²

  • yij represents each individual data point within a group
  • y_meani represents the mean of the ith group

Sum of Squares Between (SSB)

This represents the variability between the groups. It is the sum of the squared differences between the group means and the grand mean, each weighted by the number of observations in that group:

SSB = Σ ni (y_meani - y_mean)²

  • ni represents the number of observations in each group
  • y_meani represents the mean of the ith group
  • y_mean represents the grand mean

Degrees of Freedom

The degrees of freedom are the number of values that have the freedom to vary when calculating a statistic.

For within groups (dfW):

dfW = N - k

For between groups (dfB):

dfB = k - 1

For total (dfT):

dfT = N - 1

  • N represents the total number of observations
  • k represents the number of groups

Mean Squares

Mean squares are the sum of squares divided by the respective degrees of freedom.

Mean Squares Between (MSB):

MSB = SSB / dfB = SSB / (k - 1)

Mean Squares Within (MSW):

MSW = SSW / dfW = SSW / (N - k)

F-Statistic

The F-statistic is used to test whether the variability between the groups is significantly greater than the variability within the groups:

F = MSB / MSW

If the F-statistic is significantly higher than what would be expected by chance, we reject the null hypothesis that all group means are equal.
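These formulas translate directly into code. Below is a minimal Python sketch that computes each quantity for a list of groups; the function name one_way_anova is our own, not a library routine.

```python
import numpy as np

def one_way_anova(groups):
    """Return SSB, SSW, MSB, MSW, and F for a list of samples (one per group)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    N = sum(len(g) for g in groups)              # total number of observations
    k = len(groups)                              # number of groups (factor levels)
    grand_mean = np.concatenate(groups).mean()

    # SSB: group sizes times squared distances of group means from the grand mean
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # SSW: squared deviations of each observation from its own group mean
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

    msb = ssb / (k - 1)        # Mean Square Between = SSB / dfB
    msw = ssw / (N - k)        # Mean Square Within  = SSW / dfW
    return ssb, ssw, msb, msw, msb / msw
```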

Examples of ANOVA

Example 1:

Suppose a psychologist wants to test the effect of three different types of exercise (yoga, aerobic exercise, and weight training) on stress reduction. The dependent variable is the stress level, which can be measured using a stress rating scale.

Here are hypothetical stress ratings for participants after they followed their assigned exercise regime for a period:

  • Yoga: [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
  • Aerobic Exercise: [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
  • Weight Training: [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]

The psychologist wants to determine if there is a statistically significant difference in stress levels between these different types of exercise.

To conduct the ANOVA:

1. State the hypotheses:

  • Null Hypothesis (H0): There is no difference in mean stress levels between the three types of exercise.
  • Alternative Hypothesis (H1): There is a difference in mean stress levels between at least two of the types of exercise.

2. Calculate the ANOVA statistics:

  • Compute the Sum of Squares Between (SSB), Sum of Squares Within (SSW), and Sum of Squares Total (SST).
  • Calculate the Degrees of Freedom (dfB, dfW, dfT).
  • Calculate the Mean Squares Between (MSB) and Mean Squares Within (MSW).
  • Compute the F-statistic (F = MSB / MSW).

3. Check the p-value associated with the calculated F-statistic.

  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This suggests there is a statistically significant difference in mean stress levels between the three exercise types.

4. Post-hoc tests

  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (exercise types) are different from each other, as in the sketch below.
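In practice these computations are delegated to a library. Here is a minimal sketch using SciPy (an assumed dependency) on the stress data above; for these numbers the F-statistic is about 52.5 with a p-value far below 0.05.

```python
from scipy import stats

yoga    = [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
aerobic = [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
weights = [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]

# One-way ANOVA across the three exercise groups
f_stat, p_value = stats.f_oneway(yoga, aerobic, weights)
print(f"F = {f_stat:.1f}, p = {p_value:.2g}")   # F is about 52.5, p far below 0.001

# Post-hoc pairwise comparisons (requires SciPy >= 1.8)
print(stats.tukey_hsd(yoga, aerobic, weights))
```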

Example 2:

Suppose an agricultural scientist wants to compare the yield of three varieties of wheat. The scientist randomly selects four fields for each variety and plants them. After harvest, the yield from each field is measured in bushels. Here are the hypothetical yields:

The scientist wants to know if the differences in yields are due to the different varieties or just random variation.

Here’s how to apply the one-way ANOVA to this situation:

  • Null Hypothesis (H0): The means of the three populations are equal.
  • Alternative Hypothesis (H1): At least one population mean is different.
  • Calculate the Degrees of Freedom (dfB for between groups, dfW for within groups, dfT for total).
  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This would suggest there is a statistically significant difference in mean yields among the three varieties.
  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (wheat varieties) are different from each other.

How to Conduct ANOVA

Conducting an Analysis of Variance (ANOVA) involves several steps. Here’s a general guideline on how to perform it:

  • Null Hypothesis (H0): The means of all groups are equal.
  • Alternative Hypothesis (H1): At least one group mean is different from the others.
  • The significance level (often denoted as α) is usually set at 0.05. This implies that you are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true.
  • Data should be collected for each group under study. Make sure that the data meet the assumptions of an ANOVA: normality, independence, and homogeneity of variances.
  • Calculate the Degrees of Freedom (df) for each sum of squares (dfB, dfW, dfT).
  • Compute the Mean Squares Between (MSB) and Mean Squares Within (MSW) by dividing the sum of squares by the corresponding degrees of freedom.
  • Compute the F-statistic as the ratio of MSB to MSW.
  • Determine the critical F-value from the F-distribution table using dfB and dfW.
  • If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis.
  • If the p-value associated with the calculated F-statistic is smaller than the significance level (0.05 typically), you reject the null hypothesis.
  • If you rejected the null hypothesis, you can conduct post-hoc tests (like Tukey’s HSD) to determine which specific groups’ means (if you have more than two groups) are different from each other.
  • Regardless of the result, report your findings in a clear, understandable manner. This typically includes reporting the test statistic, p-value, and whether the null hypothesis was rejected; a sketch of the decision rule follows below.
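As a sketch of the computational steps above, the decision rule can be applied either through the critical F-value or through the p-value; both give the same answer. The helper below uses SciPy's F distribution, and the function name f_test_decision is our own.

```python
from scipy import stats

def f_test_decision(f_stat, k, N, alpha=0.05):
    """Apply the ANOVA decision rule in both equivalent forms."""
    df_b, df_w = k - 1, N - k
    f_crit = stats.f.ppf(1 - alpha, df_b, df_w)   # critical F-value from the F table
    p_value = stats.f.sf(f_stat, df_b, df_w)      # upper-tail probability of F
    reject = f_stat > f_crit                      # identical decision to p_value < alpha
    return f_crit, p_value, reject
```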

When to use ANOVA

ANOVA (Analysis of Variance) is used when you have three or more groups and you want to compare their means to see if they are significantly different from each other. It is a statistical method that is used in a variety of research scenarios. Here are some examples of when you might use ANOVA:

  • Comparing Groups : If you want to compare the performance of more than two groups, for example, testing the effectiveness of different teaching methods on student performance.
  • Evaluating Interactions : In a two-way or factorial ANOVA, you can test for an interaction effect. This means you are not only interested in the effect of each individual factor, but also whether the effect of one factor depends on the level of another factor.
  • Repeated Measures : If you have measured the same subjects under different conditions or at different time points, you can use repeated measures ANOVA to compare the means of these repeated measures while accounting for the correlation between measures from the same subject.
  • Experimental Designs : ANOVA is often used in experimental research designs when subjects are randomly assigned to different conditions and the goal is to compare the means of the conditions.

Here are the assumptions that must be met to use ANOVA:

  • Normality : The data should be approximately normally distributed.
  • Homogeneity of Variances : The variances of the groups you are comparing should be roughly equal. This assumption can be tested using Levene’s test or Bartlett’s test.
  • Independence : The observations should be independent of each other. This assumption is violated when observations are related (e.g., twins, matched pairs, repeated measures on the same subjects). Sketches of these assumption checks follow below.
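The normality and homogeneity assumptions can be checked directly from the data. A minimal sketch with SciPy (an assumed dependency), reusing the stress-rating groups from Example 1:

```python
from scipy import stats

groups = [
    [3, 2, 2, 1, 2, 2, 3, 2, 1, 2],   # yoga
    [2, 3, 3, 2, 3, 2, 3, 3, 2, 2],   # aerobic exercise
    [4, 4, 5, 5, 4, 5, 4, 5, 4, 5],   # weight training
]

# Normality: Shapiro-Wilk test per group (a small p-value suggests non-normality)
for g in groups:
    print(stats.shapiro(g))

# Homogeneity of variances: Levene's test (a small p-value suggests unequal variances)
print(stats.levene(*groups))
```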

Applications of ANOVA

The Analysis of Variance (ANOVA) is a powerful statistical technique that is used widely across various fields and industries. Here are some of its key applications:

Agriculture

ANOVA is commonly used in agricultural research to compare the effectiveness of different types of fertilizers, crop varieties, or farming methods. For example, an agricultural researcher could use ANOVA to determine if there are significant differences in the yields of several varieties of wheat under the same conditions.

Manufacturing and Quality Control

ANOVA is used to determine if different manufacturing processes or machines produce different levels of product quality. For instance, an engineer might use it to test whether there are differences in the strength of a product based on the machine that produced it.

Marketing Research

Marketers often use ANOVA to test the effectiveness of different advertising strategies. For example, a marketer could use ANOVA to determine whether different marketing messages have a significant impact on consumer purchase intentions.

Healthcare and Medicine

In medical research, ANOVA can be used to compare the effectiveness of different treatments or drugs. For example, a medical researcher could use ANOVA to test whether there are significant differences in recovery times for patients who receive different types of therapy.

Education

ANOVA is used in educational research to compare the effectiveness of different teaching methods or educational interventions. For example, an educator could use it to test whether students perform significantly differently when taught with different teaching methods.

Psychology and Social Sciences

Psychologists and social scientists use ANOVA to compare group means on various psychological and social variables. For example, a psychologist could use it to determine if there are significant differences in stress levels among individuals in different occupations.

Biology and Environmental Sciences

Biologists and environmental scientists use ANOVA to compare different biological and environmental conditions. For example, an environmental scientist could use it to determine if there are significant differences in the levels of a pollutant in different bodies of water.

Advantages of ANOVA

Here are some advantages of using ANOVA:

Comparing Multiple Groups: One of the key advantages of ANOVA is the ability to compare the means of three or more groups. This makes it more powerful and flexible than the t-test, which is limited to comparing only two groups.

Control of Type I Error: When comparing multiple groups, the chance of making a Type I error (false positive) increases. One of the strengths of ANOVA is that it controls the Type I error rate across all comparisons, in contrast to performing multiple pairwise t-tests, which can inflate the Type I error rate.

Testing Interactions: In factorial ANOVA, you can test not only the main effect of each factor, but also the interaction effect between factors. This can provide valuable insights into how different factors or variables interact with each other.

Handling Continuous and Categorical Variables: ANOVA can handle both continuous and categorical variables. The dependent variable is continuous and the independent variables are categorical.

Robustness: ANOVA is considered robust to violations of normality assumption when group sizes are equal. This means that even if your data do not perfectly meet the normality assumption, you might still get valid results.

Provides Detailed Analysis: ANOVA provides a detailed breakdown of variances and interactions between variables which can be useful in understanding the underlying factors affecting the outcome.

Capability to Handle Complex Experimental Designs: Advanced types of ANOVA (like repeated measures ANOVA, MANOVA, etc.) can handle more complex experimental designs, including those where measurements are taken on the same subjects over time, or when you want to analyze multiple dependent variables at once.

Disadvantages of ANOVA

Some limitations or disadvantages that are important to consider:

Assumptions: ANOVA relies on several assumptions including normality (the data follows a normal distribution), independence (the observations are independent of each other), and homogeneity of variances (the variances of the groups are roughly equal). If these assumptions are violated, the results of the ANOVA may not be valid.

Sensitivity to Outliers: ANOVA can be sensitive to outliers. A single extreme value in one group can affect the sum of squares and consequently influence the F-statistic and the overall result of the test.

Dichotomous Variables: ANOVA is not suitable for dichotomous variables (variables that can take only two values, like yes/no or male/female). It is used to compare the means of groups for a continuous dependent variable.

Lack of Specificity: Although ANOVA can tell you that there is a significant difference between groups, it doesn’t tell you which specific groups are significantly different from each other. You need to carry out further post-hoc tests (like Tukey’s HSD or Bonferroni) for these pairwise comparisons.

Complexity with Multiple Factors: When dealing with multiple factors and interactions in factorial ANOVA, interpretation can become complex. The presence of interaction effects can make main effects difficult to interpret.

Requires Larger Sample Sizes: To detect an effect of a certain size, ANOVA generally requires larger sample sizes than a t-test.

Equal Group Sizes: While not always a strict requirement, ANOVA is most powerful and its assumptions are most likely to be met when groups are of equal or similar sizes.


Hypothesis Testing - Analysis of Variance (ANOVA)

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific test considered here is called analysis of variance (ANOVA) and is a test of hypothesis that is appropriate to compare means of a continuous variable in two or more independent comparison groups. For example, in some clinical trials there are more than two comparison groups. In a clinical trial to evaluate a new medication for asthma, investigators might compare an experimental medication to a placebo and to a standard treatment (i.e., a medication currently being used). In an observational study such as the Framingham Heart Study, it might be of interest to compare mean blood pressure or mean cholesterol levels in persons who are underweight, normal weight, overweight and obese.  

The technique to test for a difference in more than two independent means is an extension of the two independent samples procedure discussed previously, which applies when there are exactly two independent comparison groups. The ANOVA technique applies when there are two or more independent groups. The ANOVA procedure is used to compare the means of the comparison groups and is conducted using the same five-step approach used in the scenarios discussed in previous sections. Because there are more than two groups, however, the computation of the test statistic is more involved. The test statistic must take into account the sample sizes, sample means and sample standard deviations in each of the comparison groups.

If one is examining the means observed among, say, three groups, it might be tempting to perform three separate group-to-group comparisons, but this approach is incorrect because each of these comparisons fails to take into account the total data, and it increases the likelihood of incorrectly concluding that there are statistically significant differences, since each comparison adds to the probability of a type I error. Analysis of variance avoids these problems by asking a more global question, i.e., whether there are significant differences among the groups, without addressing differences between any two groups in particular (although there are additional tests that can do this if the analysis of variance indicates that there are differences among the groups).

The fundamental strategy of ANOVA is to systematically examine variability within groups being compared and also examine variability among the groups being compared.

Learning Objectives

After completing this module, the student will be able to:

  • Perform analysis of variance by hand
  • Appropriately interpret results of analysis of variance tests
  • Distinguish between one and two factor analysis of variance tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

The ANOVA Approach

Consider an example with four independent groups and a continuous outcome measure. The independent groups might be defined by a particular characteristic of the participants such as BMI (e.g., underweight, normal weight, overweight, obese) or by the investigator (e.g., randomizing participants to one of four competing treatments, call them A, B, C and D). Suppose that the outcome is systolic blood pressure, and we wish to test whether there is a statistically significant difference in mean systolic blood pressures among the four groups. The sample data are organized as follows:

 

|  | Group 1 | Group 2 | Group 3 | Group 4 |
|---|---|---|---|---|
| Sample size | n1 | n2 | n3 | n4 |
| Sample mean | X̄1 | X̄2 | X̄3 | X̄4 |
| Sample standard deviation | s1 | s2 | s3 | s4 |

The hypotheses of interest in an ANOVA are as follows:

  • H0: μ1 = μ2 = μ3 = ... = μk
  • H1: Means are not all equal.

where k = the number of independent comparison groups.

In this example, the hypotheses are:

  • H0: μ1 = μ2 = μ3 = μ4
  • H1: The means are not all equal.

The null hypothesis in ANOVA is always that there is no difference in means. The research or alternative hypothesis is always that the means are not all equal and is usually written in words rather than in mathematical symbols. The research hypothesis captures any difference in means and includes, for example, the situation where all four means are unequal, where one is different from the other three, where two are different, and so on. The alternative hypothesis, as shown above, captures all possible situations other than equality of all means specified in the null hypothesis.

Test Statistic for ANOVA

The test statistic for testing H0: μ1 = μ2 = ... = μk is:

F = MSB / MSE

and the critical value is found in a table of probability values for the F distribution with degrees of freedom df1 = k - 1 and df2 = N - k (see "Other Resources").

NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population variances are equal: σ1² = σ2² = ... = σk²). This means that the outcome is equally variable in each of the comparison populations. This assumption is the same as that assumed for appropriate use of the test statistic to test equality of two independent means. It is possible to assess the likelihood that the assumption of equal variances is true, and the test can be conducted in most statistical computing packages. If the variability in the k comparison groups is not similar, then alternative techniques must be used.

The F statistic is computed by taking the ratio of what is called the "between treatment" variability to the "residual or error" variability. This is where the name of the procedure originates. In analysis of variance we are testing for a difference in means (H 0 : means are all equal versus H 1 : means are not all equal) by evaluating variability in the data. The numerator captures between treatment variability (i.e., differences among the sample means) and the denominator contains an estimate of the variability in the outcome. The test statistic is a measure that allows us to assess whether the differences among the sample means (numerator) are more than would be expected by chance if the null hypothesis is true. Recall in the two independent sample test, the test statistic was computed by taking the ratio of the difference in sample means (numerator) to the variability in the outcome (estimated by Sp).  

The decision rule for the F test in ANOVA is set up in a similar way to decision rules we established for t tests. The decision rule again depends on the level of significance and the degrees of freedom. The F statistic has two degrees of freedom. These are denoted df1 and df2, and called the numerator and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:

df1 = k - 1 and df2 = N - k,

where k is the number of comparison groups and N is the total number of observations in the analysis. If the null hypothesis is true, the between treatment variation (numerator) will not exceed the residual or error variation (denominator) and the F statistic will be small. If the null hypothesis is false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-hand) tail of the distribution as shown below.

[Figure: Rejection region for the F test with α = 0.05, df1 = 3 and df2 = 36 (k = 4, N = 40)]

For the scenario depicted here, the decision rule is: Reject H0 if F > 2.87.
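The quoted critical value can be reproduced without a printed table; for example, with SciPy's F distribution (an assumed dependency):

```python
from scipy import stats

# Upper-tail 0.05 critical value of F with df1 = 3 and df2 = 36
print(stats.f.ppf(0.95, dfn=3, dfd=36))   # about 2.87
```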

The ANOVA Procedure

We will next illustrate the ANOVA procedure using the five step approach. Because the computation of the test statistic is involved, the computations are often organized in an ANOVA table. The ANOVA table breaks down the components of variation in the data into variation between treatments and error or residual variation. Statistical computing packages also produce ANOVA tables as part of their standard output for ANOVA, and the ANOVA table is set up as follows: 

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | SSB | k - 1 | MSB = SSB/(k - 1) | F = MSB/MSE |
| Error (or Residual) | SSE | N - k | MSE = SSE/(N - k) |  |
| Total | SST | N - 1 |  |  |

where  

  • X = individual observation,
  • k = the number of treatments or independent comparison groups, and
  • N = total number of observations or total sample size.

The ANOVA table above is organized as follows.

  • The first column is entitled "Source of Variation" and delineates the between treatment and error or residual variation. The total variation is the sum of the between treatment and error variation.
  • The second column is entitled "Sums of Squares (SS)". The between treatment sums of squares is

SSB = Σ nj (X̄j - X̄)²

and is computed by summing the squared differences between each treatment (or group) mean and the overall mean. The squared differences are weighted by the sample sizes per group (nj). The error sums of squares is:

SSE = Σ Σ (X - X̄j)²

and is computed by summing the squared differences between each observation and its group mean (i.e., the squared differences between each observation in group 1 and the group 1 mean, the squared differences between each observation in group 2 and the group 2 mean, and so on). The double summation (ΣΣ) indicates summation of the squared differences within each treatment and then summation of these totals across treatments to produce a single value. (This will be illustrated in the following examples). The total sums of squares is:

SST = Σ Σ (X - X̄)²

and is computed by summing the squared differences between each observation and the overall sample mean. In an ANOVA, data are organized by comparison or treatment groups. If all of the data were pooled into a single sample, SST would reflect the numerator of the sample variance computed on the pooled or total sample. SST does not figure into the F statistic directly. However, SST = SSB + SSE, thus if two sums of squares are known, the third can be computed from the other two.

  • The third column contains degrees of freedom. The between treatment degrees of freedom is df1 = k - 1. The error degrees of freedom is df2 = N - k. The total degrees of freedom is N - 1 (and it is also true that (k - 1) + (N - k) = N - 1).
  • The fourth column contains "Mean Squares (MS)" which are computed by dividing sums of squares (SS) by degrees of freedom (df), row by row. Specifically, MSB = SSB/(k - 1) and MSE = SSE/(N - k). Dividing SST/(N - 1) produces the variance of the total sample. The F statistic is in the rightmost column of the ANOVA table and is computed by taking the ratio of MSB/MSE.

A clinical trial is run to compare weight loss programs and participants are randomly assigned to one of the comparison programs and are counseled on the details of the assigned program. Participants follow the assigned program for 8 weeks. The outcome of interest is weight loss, defined as the difference in weight measured at the start of the study (baseline) and weight measured at the end of the study (8 weeks), measured in pounds.  

Three popular weight loss programs are considered. The first is a low calorie diet. The second is a low fat diet and the third is a low carbohydrate diet. For comparison purposes, a fourth group is considered as a control group. Participants in the fourth group are told that they are participating in a study of healthy behaviors with weight loss only one component of interest. The control group is included here to assess the placebo effect (i.e., weight loss due to simply participating in the study). A total of twenty patients agree to participate in the study and are randomly assigned to one of the four diet groups. Weights are measured at baseline and patients are counseled on the proper implementation of the assigned diet (with the exception of the control group). After 8 weeks, each patient's weight is again measured and the difference in weights is computed by subtracting the 8 week weight from the baseline weight. Positive differences indicate weight losses and negative differences indicate weight gains. For interpretation purposes, we refer to the differences in weights as weight losses and the observed weight losses are shown below.

| Low Calorie | Low Fat | Low Carbohydrate | Control |
|---|---|---|---|
| 8 | 2 | 3 | 2 |
| 9 | 4 | 5 | 2 |
| 6 | 3 | 4 | -1 |
| 7 | 5 | 2 | 0 |
| 3 | 1 | 3 | 3 |

Is there a statistically significant difference in the mean weight loss among the four diets?  We will run the ANOVA using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2 = μ3 = μ4    H1: Means are not all equal    α = 0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is the F statistic for ANOVA, F=MSB/MSE.

  • Step 3. Set up decision rule.  

The appropriate critical value can be found in a table of probabilities for the F distribution (see "Other Resources"). In order to determine the critical value of F we need degrees of freedom, df1 = k - 1 and df2 = N - k. In this example, df1 = k - 1 = 4 - 1 = 3 and df2 = N - k = 20 - 4 = 16. The critical value is 3.24 and the decision rule is as follows: Reject H0 if F > 3.24.

  • Step 4. Compute the test statistic.  

To organize our computations we complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean based on the total sample.  

 

|  | Low Calorie | Low Fat | Low Carbohydrate | Control |
|---|---|---|---|---|
| n | 5 | 5 | 5 | 5 |
| Group mean | 6.6 | 3.0 | 3.4 | 1.2 |

We can now compute

SSB = Σ nj (X̄j - X̄)²

If we pool all N = 20 observations, the overall mean is X̄ = 71/20 = 3.55. So, in this case:

SSB = 5(6.6 - 3.55)² + 5(3.0 - 3.55)² + 5(3.4 - 3.55)² + 5(1.2 - 3.55)² = 75.8

Next we compute

SSE = Σ Σ (X - X̄j)²

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants in the low calorie diet (group mean 6.6):

| X | X - 6.6 | (X - 6.6)² |
|---|---|---|
| 8 | 1.4 | 2.0 |
| 9 | 2.4 | 5.8 |
| 6 | -0.6 | 0.4 |
| 7 | 0.4 | 0.2 |
| 3 | -3.6 | 13.0 |
| Totals | 0 | 21.4 |

For the participants in the low fat diet (group mean 3.0):

| X | X - 3.0 | (X - 3.0)² |
|---|---|---|
| 2 | -1.0 | 1.0 |
| 4 | 1.0 | 1.0 |
| 3 | 0.0 | 0.0 |
| 5 | 2.0 | 4.0 |
| 1 | -2.0 | 4.0 |
| Totals | 0 | 10.0 |

For the participants in the low carbohydrate diet (group mean 3.4):

| X | X - 3.4 | (X - 3.4)² |
|---|---|---|
| 3 | -0.4 | 0.2 |
| 5 | 1.6 | 2.6 |
| 4 | 0.6 | 0.4 |
| 2 | -1.4 | 2.0 |
| 3 | -0.4 | 0.2 |
| Totals | 0 | 5.4 |

For the participants in the control group (group mean 1.2):

| X | X - 1.2 | (X - 1.2)² |
|---|---|---|
| 2 | 0.8 | 0.6 |
| 2 | 0.8 | 0.6 |
| -1 | -2.2 | 4.8 |
| 0 | -1.2 | 1.4 |
| 3 | 1.8 | 3.2 |
| Totals | 0 | 10.6 |

Therefore, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4.

We can now construct the ANOVA table.

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | 75.8 | 4 - 1 = 3 | 75.8/3 = 25.3 | 25.3/3.0 = 8.43 |
| Error (or Residual) | 47.4 | 20 - 4 = 16 | 47.4/16 = 3.0 |  |
| Total | 123.2 | 20 - 1 = 19 |  |  |

  • Step 5. Conclusion.  

We reject H0 because 8.43 > 3.24. We have statistically significant evidence at α = 0.05 to show that there is a difference in mean weight loss among the four diets.

ANOVA is a test that provides a global assessment of a statistical difference in more than two independent means. In this example, we find that there is a statistically significant difference in mean weight loss among the four diets considered. In addition to reporting the results of the statistical test of hypothesis (i.e., that there is a statistically significant difference in mean weight losses at α=0.05), investigators should also report the observed sample means to facilitate interpretation of the results. In this example, participants in the low calorie diet lost an average of 6.6 pounds over 8 weeks, as compared to 3.0 and 3.4 pounds in the low fat and low carbohydrate groups, respectively. Participants in the control group lost an average of 1.2 pounds which could be called the placebo effect because these participants were not participating in an active arm of the trial specifically targeted for weight loss. Are the observed weight losses clinically meaningful?
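The worked example can be checked in software. A minimal sketch with SciPy (an assumed dependency): because f_oneway works from the raw data, no intermediate rounding occurs, so it reports F of about 8.56 rather than the hand-rounded 8.43 (the tables above round each squared deviation to one decimal); the conclusion is the same.

```python
from scipy import stats

low_calorie = [8, 9, 6, 7, 3]
low_fat     = [2, 4, 3, 5, 1]
low_carb    = [3, 5, 4, 2, 3]
control     = [2, 2, -1, 0, 3]

f_stat, p_value = stats.f_oneway(low_calorie, low_fat, low_carb, control)
print(f_stat, p_value)   # F about 8.56, p about 0.001: reject H0
```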

Another ANOVA Example

Calcium is an essential mineral that regulates the heart, is important for blood clotting and for building healthy bones. The National Osteoporosis Foundation recommends a daily calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained in some foods, most adults do not get enough calcium in their diets and take supplements. Unfortunately some of the supplements have side effects such as gastric distress, making them difficult for some patients to take on a regular basis.  

 A study is designed to test whether there is a difference in mean daily calcium intake in adults with normal bone density, adults with osteopenia (a low bone density which may lead to osteoporosis) and adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and osteoporosis are selected at random from hospital records and invited to participate in the study. Each participant's daily calcium intake is measured based on reported food intake and supplements. The data are shown below.   

| Normal Bone Density | Osteopenia | Osteoporosis |
|---|---|---|
| 1200 | 1000 | 890 |
| 1000 | 1100 | 650 |
| 980 | 700 | 1100 |
| 900 | 800 | 900 |
| 750 | 500 | 400 |
| 800 | 700 | 350 |

Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis? We will run the ANOVA using the five-step approach.

H0: μ1 = μ2 = μ3    H1: Means are not all equal    α = 0.05

In order to determine the critical value of F we need degrees of freedom, df1 = k - 1 and df2 = N - k. In this example, df1 = k - 1 = 3 - 1 = 2 and df2 = N - k = 18 - 3 = 15. The critical value is 3.68 and the decision rule is as follows: Reject H0 if F > 3.68.

To organize our computations we will complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean.  

|  | Normal Bone Density | Osteopenia | Osteoporosis |
|---|---|---|---|
| n | 6 | 6 | 6 |
| Group mean | 938.3 | 800.0 | 715.0 |

If we pool all N = 18 observations, the overall mean is 817.8.

We can now compute:

SSB = Σ nj (X̄j - X̄)²

Substituting:

SSB = 6(938.3 - 817.8)² + 6(800.0 - 817.8)² + 6(715.0 - 817.8)² = 152,477.7

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants with normal bone density (group mean 938.3):

| X | X - 938.3 | (X - 938.3)² |
|---|---|---|
| 1200 | 261.6667 | 68,486.9 |
| 1000 | 61.6667 | 3,806.9 |
| 980 | 41.6667 | 1,738.9 |
| 900 | -38.3333 | 1,466.9 |
| 750 | -188.333 | 35,456.9 |
| 800 | -138.333 | 19,126.9 |
| Total | 0 | 130,083.3 |

For participants with osteopenia (group mean 800.0):

| X | X - 800.0 | (X - 800.0)² |
|---|---|---|
| 1000 | 200 | 40,000 |
| 1100 | 300 | 90,000 |
| 700 | -100 | 10,000 |
| 800 | 0 | 0 |
| 500 | -300 | 90,000 |
| 700 | -100 | 10,000 |
| Total | 0 | 240,000 |

For participants with osteoporosis (group mean 715.0):

| X | X - 715.0 | (X - 715.0)² |
|---|---|---|
| 890 | 175 | 30,625 |
| 650 | -65 | 4,225 |
| 1100 | 385 | 148,225 |
| 900 | 185 | 34,225 |
| 400 | -315 | 99,225 |
| 350 | -365 | 133,225 |
| Total | 0 | 449,750 |

Therefore, SSE = 130,083.3 + 240,000 + 449,750 = 819,833.3. We can now construct the ANOVA table.

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | 152,477.7 | 2 | 76,238.6 | 1.395 |
| Error or Residual | 819,833.3 | 15 | 54,655.5 |  |
| Total | 972,311.0 | 17 |  |  |

We do not reject H0 because 1.395 < 3.68. We do not have statistically significant evidence at α = 0.05 to show that there is a difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis. Are the differences in mean calcium intake clinically meaningful? If so, what might account for the lack of statistical significance?
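This example can likewise be verified with SciPy (an assumed dependency); the F-statistic of about 1.39 falls well short of the critical value, and the p-value is well above 0.05.

```python
from scipy import stats

normal       = [1200, 1000, 980, 900, 750, 800]
osteopenia   = [1000, 1100, 700, 800, 500, 700]
osteoporosis = [890, 650, 1100, 900, 400, 350]

f_stat, p_value = stats.f_oneway(normal, osteopenia, osteoporosis)
print(f_stat, p_value)   # F about 1.39; p well above 0.05, so H0 is not rejected
```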

One-Way ANOVA in R

The video below by Mike Marin demonstrates how to perform analysis of variance in R. It also covers some other statistical issues, but the initial part of the video will be useful to you.

Two-Factor ANOVA

The ANOVA tests described above are called one-factor ANOVAs. There is one treatment or grouping factor with k > 2 levels and we wish to compare the means across the different categories of this factor. The factor might represent different diets, different classifications of risk for disease (e.g., osteoporosis), different medical treatments, different age groups, or different racial/ethnic groups. There are situations where it may be of interest to compare means of a continuous outcome across two or more factors. For example, suppose a clinical trial is designed to compare five different treatments for joint pain in patients with osteoarthritis. Investigators might also hypothesize that there are differences in the outcome by sex. This is an example of a two-factor ANOVA where the factors are treatment (with 5 levels) and sex (with 2 levels). In the two-factor ANOVA, investigators can assess whether there are differences in means due to the treatment, by sex or whether there is a difference in outcomes by the combination or interaction of treatment and sex. Higher order ANOVAs are conducted in the same way as one-factor ANOVAs presented here and the computations are again organized in ANOVA tables with more rows to distinguish the different sources of variation (e.g., between treatments, between men and women). The following example illustrates the approach.

Consider the clinical trial outlined above in which three competing treatments for joint pain are compared in terms of their mean time to pain relief in patients with osteoarthritis. Because investigators hypothesize that there may be a difference in time to pain relief in men versus women, they randomly assign 15 participating men to one of the three competing treatments and randomly assign 15 participating women to one of the three competing treatments (i.e., stratified randomization). Participating men and women do not know to which treatment they are assigned. They are instructed to take the assigned medication when they experience joint pain and to record the time, in minutes, until the pain subsides. The data (times to pain relief) are shown below and are organized by the assigned treatment and sex of the participant.

Table of Time to Pain Relief by Treatment and Sex

| Treatment | Male | Female |
|---|---|---|
| A | 12 | 21 |
| A | 15 | 19 |
| A | 16 | 18 |
| A | 17 | 24 |
| A | 14 | 25 |
| B | 14 | 21 |
| B | 17 | 20 |
| B | 19 | 23 |
| B | 20 | 27 |
| B | 17 | 25 |
| C | 25 | 37 |
| C | 27 | 34 |
| C | 29 | 36 |
| C | 24 | 26 |
| C | 22 | 29 |

The analysis in two-factor ANOVA is similar to that illustrated above for one-factor ANOVA. The computations are again organized in an ANOVA table, but the total variation is partitioned into that due to the main effect of treatment, the main effect of sex and the interaction effect. The results of the analysis are shown below (and were generated with a statistical computing package - here we focus on interpretation). 

ANOVA Table for Two-Factor ANOVA

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F | P-Value |
|---|---|---|---|---|---|
| Model | 967.0 | 5 | 193.4 | 20.7 | 0.0001 |
| Treatment | 651.5 | 2 | 325.7 | 34.8 | 0.0001 |
| Sex | 313.6 | 1 | 313.6 | 33.5 | 0.0001 |
| Treatment * Sex | 1.9 | 2 | 0.9 | 0.1 | 0.9054 |
| Error or Residual | 224.4 | 24 | 9.4 |  |  |
| Total | 1191.4 | 29 |  |  |  |

There are 4 statistical tests in the ANOVA table above. The first test is an overall test to assess whether there is a difference among the 6 cell means (cells are defined by treatment and sex). The F statistic is 20.7 and is highly statistically significant with p=0.0001. When the overall test is significant, focus then turns to the factors that may be driving the significance (in this example, treatment, sex or the interaction between the two). The next three statistical tests assess the significance of the main effect of treatment, the main effect of sex and the interaction effect. In this example, there is a highly significant main effect of treatment (p=0.0001) and a highly significant main effect of sex (p=0.0001). The interaction between the two does not reach statistical significance (p=0.91). The table below contains the mean times to pain relief in each of the treatments for men and women (Note that each sample mean is computed on the 5 observations measured under that experimental condition).  
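Two-factor tables like the one above are normally produced by software. A hedged sketch with pandas and statsmodels (assumed dependencies; the column names treatment, sex, and time are our own) that reproduces the partition of the variation:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Times to pain relief from the table above, keyed by treatment
men   = {"A": [12, 15, 16, 17, 14], "B": [14, 17, 19, 20, 17], "C": [25, 27, 29, 24, 22]}
women = {"A": [21, 19, 18, 24, 25], "B": [21, 20, 23, 27, 25], "C": [37, 34, 36, 26, 29]}

rows = [(t, s, y)
        for s, d in (("male", men), ("female", women))
        for t, ys in d.items()
        for y in ys]
df = pd.DataFrame(rows, columns=["treatment", "sex", "time"])

# Model with both main effects and their interaction; anova_lm builds the table
model = smf.ols("time ~ C(treatment) * C(sex)", data=df).fit()
print(anova_lm(model, typ=2))
```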

Mean Time to Pain Relief by Treatment and Gender

| Treatment | Male | Female |
|---|---|---|
| A | 14.8 | 21.4 |
| B | 17.4 | 23.2 |
| C | 25.4 | 32.4 |

Treatment A appears to be the most efficacious treatment for both men and women. The mean times to relief are lower in Treatment A for both men and women and highest in Treatment C for both men and women. Across all treatments, women report longer times to pain relief (See below).  

[Figure: Graph of two-factor ANOVA showing mean time to pain relief by treatment for men and women]

Notice that there is the same pattern of time to pain relief across treatments in both men and women (treatment effect). There is also a sex effect - specifically, time to pain relief is longer in women in every treatment.  

Suppose that the same clinical trial is replicated in a second clinical site and the following data are observed.

Table - Time to Pain Relief by Treatment and Sex - Clinical Site 2

| Treatment | Male | Female |
|---|---|---|
| A | 22 | 21 |
| A | 25 | 19 |
| A | 26 | 18 |
| A | 27 | 24 |
| A | 24 | 25 |
| B | 14 | 21 |
| B | 17 | 20 |
| B | 19 | 23 |
| B | 20 | 27 |
| B | 17 | 25 |
| C | 15 | 37 |
| C | 17 | 34 |
| C | 19 | 36 |
| C | 14 | 26 |
| C | 12 | 29 |

The ANOVA table for the data measured in clinical site 2 is shown below.

Table - Summary of Two-Factor ANOVA - Clinical Site 2

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F | P-Value |
|---|---|---|---|---|---|
| Model | 907.0 | 5 | 181.4 | 19.4 | 0.0001 |
| Treatment | 71.5 | 2 | 35.7 | 3.8 | 0.0362 |
| Sex | 313.6 | 1 | 313.6 | 33.5 | 0.0001 |
| Treatment * Sex | 521.9 | 2 | 260.9 | 27.9 | 0.0001 |
| Error or Residual | 224.4 | 24 | 9.4 |  |  |
| Total | 1131.4 | 29 |  |  |  |

Notice that the overall test is significant (F=19.4, p=0.0001), there is a significant treatment effect, sex effect and a highly significant interaction effect. The table below contains the mean times to relief in each of the treatments for men and women.  

Table - Mean Time to Pain Relief by Treatment and Gender - Clinical Site 2

| Treatment | Male | Female |
|---|---|---|
| A | 24.8 | 21.4 |
| B | 17.4 | 23.2 |
| C | 15.4 | 32.4 |

Notice that now the differences in mean time to pain relief among the treatments depend on sex. Among men, the mean time to pain relief is highest in Treatment A and lowest in Treatment C. Among women, the reverse is true. This is an interaction effect (see below).  

[Figure: Graphic display of the results in the preceding table; the treatment lines cross, reflecting the interaction]

Notice above that the treatment effect varies depending on sex. Thus, we cannot summarize an overall treatment effect (in men, treatment C is best, in women, treatment A is best).    

When interaction effects are present, some investigators do not examine main effects (i.e., do not test for treatment effect because the effect of treatment depends on sex). This issue is complex and is discussed in more detail in a later module. 


One-Way Analysis of Variance: Example

In this lesson, we apply one-way analysis of variance to some fictitious data, and we show how to interpret the results of our analysis.

Note: Computations for analysis of variance are usually handled by a software package. For this example, however, we will do the computations "manually", since the gory details have educational value.

Problem Statement

A pharmaceutical company conducts an experiment to test the effect of a new cholesterol medication. The company selects 15 subjects randomly from a larger population. Each subject is randomly assigned to one of three treatment groups. Within each treatment group, subjects receive a different dose of the new medication. In Group 1, subjects receive 0 mg/day; in Group 2, 50 mg/day; and in Group 3, 100 mg/day.

The treatment levels represent all the levels of interest to the experimenter, so this experiment used a fixed-effects model to select treatment levels for study.

After 30 days, doctors measure the cholesterol level of each subject. The results for all 15 subjects appear in the table below:

| Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
|---|---|---|
| 210 | 210 | 180 |
| 240 | 240 | 210 |
| 270 | 240 | 210 |
| 270 | 270 | 210 |
| 300 | 270 | 240 |

In conducting this experiment, the experimenter had two research questions:

  • Does dosage level have a significant effect on cholesterol level?
  • How strong is the effect of dosage level on cholesterol level?

To answer these questions, the experimenter intends to use one-way analysis of variance.

Is One-Way ANOVA the Right Technique?

Before you crunch the first number in one-way analysis of variance, you must be sure that one-way analysis of variance is the correct technique. That means you need to ask two questions:

  • Is the experimental design compatible with one-way analysis of variance?
  • Does the data set satisfy the critical assumptions required for one-way analysis of variance?

Let's address both of those questions.

Experimental Design

As we discussed in the previous lesson (see One-Way Analysis of Variance: Fixed Effects), one-way analysis of variance is only appropriate with one experimental design: a completely randomized design. That is exactly the design used in our cholesterol study, so we can check the experimental design box.

Critical Assumptions

We also learned in the previous lesson that one-way analysis of variance makes three critical assumptions:

  • Independence. The dependent variable score for each experimental unit is independent of the score for any other unit.
  • Normality. In the population, dependent variable scores are normally distributed within treatment groups.
  • Equality of variance. In the population, the variance of dependent variable scores in each treatment group is equal. (Equality of variance is also known as homogeneity of variance or homoscedasticity.)

Therefore, for the cholesterol study, we need to make sure our data set is consistent with the critical assumptions.

Independence of Scores

The assumption of independence is the most important assumption. When that assumption is violated, the resulting statistical tests can be misleading.

The independence assumption is satisfied by the design of the study, which features random selection of subjects and random assignment to treatment groups. Randomization tends to distribute effects of extraneous variables evenly across groups.

Normal Distributions in Groups

Violations of normality can be a problem when sample size is small, as it is in this cholesterol study. Therefore, it is important to be on the lookout for any indication of non-normality.

There are many different ways to check for normality. On this website, we describe three at: How to Test for Normality: Three Simple Tests. Given the small sample size, our best option for testing normality is to look at the following descriptive statistics:

  • Central tendency. The mean and the median are summary measures used to describe central tendency - the most "typical" value in a set of values. With a normal distribution, the mean is equal to the median.
  • Skewness. Skewness is a measure of the asymmetry of a probability distribution. If observations are equally distributed around the mean, the skewness value is zero; otherwise, the skewness value is positive or negative. As a rule of thumb, skewness between -2 and +2 is consistent with a normal distribution.
  • Kurtosis. Kurtosis is a measure of whether observations cluster around the mean of the distribution or in the tails of the distribution. The normal distribution has a kurtosis value of zero. As a rule of thumb, kurtosis between -2 and +2 is consistent with a normal distribution.

The table below shows the mean, median, skewness, and kurtosis for each group from our study.

|  | Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
|---|---|---|---|
| Mean | 258 | 246 | 210 |
| Median | 270 | 240 | 210 |
| Range | 90 | 60 | 60 |
| Skewness | -0.40 | -0.51 | 0.00 |
| Kurtosis | -0.18 | -0.61 | 2.00 |

In all three groups, the difference between the mean and median looks small (relative to the range). And skewness and kurtosis measures are consistent with a normal distribution (i.e., between -2 and +2). These are crude tests, but they provide some confidence for the assumption of normality in each group.

Note: With Excel, you can easily compute the descriptive statistics in the table above. To see how, go to: How to Test for Normality: Example 1.
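These descriptive statistics are also easy to reproduce with SciPy (an assumed dependency); bias=False applies the small-sample correction that matches the values in the table above.

```python
import numpy as np
from scipy import stats

group1 = [210, 240, 270, 270, 300]   # 0 mg
group2 = [210, 240, 240, 270, 270]   # 50 mg
group3 = [180, 210, 210, 210, 240]   # 100 mg

for g in (group1, group2, group3):
    x = np.asarray(g, dtype=float)
    print(x.mean(), np.median(x),
          stats.skew(x, bias=False),      # -0.40, -0.51, 0.00
          stats.kurtosis(x, bias=False))  # -0.18, -0.61, 2.00
```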

Homogeneity of Variance

When the normality assumption is satisfied, you can use Hartley's Fmax test to test for homogeneity of variance. The first step is to compute the variance of each group:

s²j = Σi ( Xi,j - X̄j )² / ( nj - 1 )

where Xi,j is the score for observation i in Group j, X̄j is the mean of Group j, and nj is the number of observations in Group j.

Here is the variance ( s²j ) for each group in the cholesterol study.

| Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
|---|---|---|
| 1170 | 630 | 450 |

Next, compute the ratio of the largest group variance to the smallest group variance:

F RATIO = s²MAX / s²MIN

F RATIO = 1170 / 450 = 2.6

where s²MAX is the largest group variance, and s²MIN is the smallest group variance.

Finally, compare this ratio to the critical Fmax value from Hartley's table, based on the number of groups and on n - 1 degrees of freedom, where n is the largest sample size in any group.

Note: The critical F values in the table are based on a significance level of 0.05.

Here, the F ratio (2.6) is smaller than the critical Fmax value (15.5), so we conclude that the variances are homogeneous.

Note: Other tests, such as Bartlett's test, can also test for homogeneity of variance. For the record, Bartlett's test yields the same conclusion for the cholesterol study; namely, the variances are homogeneous.
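A short sketch of both checks in Python (SciPy and NumPy assumed):

```python
import numpy as np
from scipy import stats

group1 = [210, 240, 270, 270, 300]
group2 = [210, 240, 240, 270, 270]
group3 = [180, 210, 210, 210, 240]

# Sample variances (ddof=1 gives the n - 1 denominator): 1170, 630, 450
variances = [np.var(g, ddof=1) for g in (group1, group2, group3)]
print(variances, max(variances) / min(variances))   # Fmax ratio = 2.6

# Bartlett's test of equal variances; a large p-value is consistent with homogeneity
print(stats.bartlett(group1, group2, group3))
```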

Analysis of Variance

Having confirmed that the critical assumptions are tenable, we can proceed with a one-way analysis of variance. That means taking the following steps:

  • Specify a mathematical model to describe the causal factors that affect the dependent variable.
  • Write statistical hypotheses to be tested by experimental data.
  • Specify a significance level for a hypothesis test.
  • Compute the grand mean and the mean scores for each group.
  • Compute sums of squares for each effect in the model.
  • Find the degrees of freedom associated with each effect in the model.
  • Based on sums of squares and degrees of freedom, compute mean squares for each effect in the model.
  • Compute a test statistic , based on observed mean squares and their expected values.
  • Find the P value for the test statistic.
  • Accept or reject the null hypothesis , based on the P value and the significance level.
  • Assess the magnitude of the effect of the independent variable, based on sums of squares.

Now, let's execute each step, one-by-one, with our cholesterol medication experiment.

Mathematical Model

For every experimental design, there is a mathematical model that accounts for all of the independent and extraneous variables that affect the dependent variable. In our experiment, the dependent variable (X) is the cholesterol level of a subject, and the independent variable (β) is the dosage level administered to a subject.

For example, here is the fixed-effects model for a completely randomized design:

Xij = μ + βj + εi(j)

where Xij is the cholesterol level for subject i in treatment group j, μ is the population mean, βj is the effect of the dosage level administered to subjects in group j; and εi(j) is the effect of all other extraneous variables on subject i in treatment j.

Statistical Hypotheses

For fixed-effects models, it is common practice to write statistical hypotheses in terms of the treatment effect β j . With that in mind, here is the null hypothesis and the alternative hypothesis for a one-way analysis of variance:

H0: βj = 0 for all j

H1: βj ≠ 0 for some j

If the null hypothesis is true, the mean score (i.e., mean cholesterol level) in each treatment group should equal the population mean. Thus, if the null hypothesis is true, mean scores in the k treatment groups should be equal. If the null hypothesis is false, at least one pair of mean scores should be unequal.

Significance Level

The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it is actually true. The significance level for an experiment is specified by the experimenter, before data collection begins.

Experimenters often choose significance levels of 0.05 or 0.01. For this experiment, let's use a significance level of 0.05.

Mean Scores

Analysis of variance begins by computing a grand mean and group means:

  • Grand mean. The grand mean ( X̄ ) is the mean of all n observations, computed as follows:

X̄ = ( 1 / 15 ) * ( 210 + 210 + ... + 270 + 240 ) = 238

  • Group means. The mean of group j ( X̄j ) is the mean of all observations in group j, computed as follows:

X̄1 = 258

X̄2 = 246

X̄3 = 210

In the equations above, n is the total sample size across all groups; and nj is the sample size in Group j.

Sums of Squares

A sum of squares is the sum of squared deviations from a mean score. One-way analysis of variance makes use of three sums of squares:

  • Between-groups sum of squares. The between-groups sum of squares (SSB) measures variation of the group means around the grand mean. It can be computed from the following formula: SSB = Σj nj ( X̄j - X̄ )²

SSB = 5 * [ ( 238 - 258 )² + ( 238 - 246 )² + ( 238 - 210 )² ] = 6240

  • Within-groups sum of squares. The within-groups sum of squares (SSW) measures variation of scores around their own group mean. It can be computed from the following formula: SSW = Σj Σi ( Xij - X̄j )²

SSW = 2304 + ... + 900 = 9000

  • Total sum of squares. The total sum of squares (SST) measures variation of all scores around the grand mean. It can be computed from the following formula: SST = Σj Σi ( Xij - X̄ )²

SST = 784 + 4 + 1024 + ... + 784 + 784 + 4

SST = 15,240

It turns out that the total sum of squares is equal to the between-groups sum of squares plus the within-groups sum of squares, as shown below:

SST = SSB + SSW

15,240 = 6240 + 9000

Degrees of Freedom

The term degrees of freedom (df) refers to the number of independent sample points used to compute a statistic minus the number of parameters estimated from the sample points.

To illustrate what is going on, let's find the degrees of freedom associated with the various sum of squares computations:

Here, the formula uses k independent sample points, the sample means X̄j. And it uses one parameter estimate, the grand mean X̄, which was estimated from the sample points. So, the between-groups sum of squares has k - 1 degrees of freedom ( dfBG ).

dfBG = k - 1 = 3 - 1 = 2

Here, the formula uses n independent sample points, the individual subject scores Xij. And it uses k parameter estimates, the group means X̄j, which were estimated from the sample points. So, the within-groups sum of squares has n - k degrees of freedom ( dfWG ).

n = Σ nj = 5 + 5 + 5 = 15

dfWG = n - k = 15 - 3 = 12

Here, the formula uses n independent sample points, the individual subject scores Xij. And it uses one parameter estimate, the grand mean X̄, which was estimated from the sample points. So, the total sum of squares has n - 1 degrees of freedom ( dfTOT ).

dfTOT = n - 1 = 15 - 1 = 14

The degrees of freedom for each sum of squares are summarized in the table below:

Sum of squares    Degrees of freedom
Between-groups    k - 1 = 2
Within-groups     n - k = 12
Total             n - 1 = 14

Mean Squares

A mean square is an estimate of population variance. It is computed by dividing a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:

MS = SS / df

To conduct a one-way analysis of variance, we are interested in two mean squares:

\( MS_{WG} = SSW / df_{WG} \)

\( MS_{WG} = 9000 / 12 = 750 \)

\( MS_{BG} = SSB / df_{BG} \)

\( MS_{BG} = 6240 / 2 = 3120 \)

Expected Value

The expected value of a mean square is the average value of the mean square over a large number of experiments.

Statisticians have derived formulas for the expected value of the within-groups mean square ( MS WG  ) and for the expected value of the between-groups mean square ( MS BG  ). For one-way analysis of variance, the expected value formulas are:

Fixed- and Random-Effects:

\( E( MS_{WG} ) = \sigma_\varepsilon^2 \)

Fixed-Effects:

\( E( MS_{BG} ) = \sigma_\varepsilon^2 + \dfrac{ \sum_{j=1}^{k} n_j \beta_j^2 }{ k - 1 } \)

Random-Effects:

\( E( MS_{BG} ) = \sigma_\varepsilon^2 + n \sigma_\beta^2 \)

In the equations above, \( E(MS_{WG}) \) is the expected value of the within-groups mean square; \( E(MS_{BG}) \) is the expected value of the between-groups mean square; \( n \) is the total sample size; \( k \) is the number of treatment groups; \( \beta_j \) is the treatment effect in group \( j \); \( \sigma_\varepsilon^2 \) is the variance attributable to everything except the treatment effect (i.e., all the extraneous variables); and \( \sigma_\beta^2 \) is the variance due to random selection of treatment levels.

Notice that \( MS_{BG} \) should equal \( MS_{WG} \) when the variation due to treatment effects ( \( \beta_j \) for fixed effects and \( \sigma_\beta^2 \) for random effects) is zero (i.e., when the independent variable does not affect the dependent variable). And \( MS_{BG} \) should be bigger than \( MS_{WG} \) when the variation due to treatment effects is not zero (i.e., when the independent variable does affect the dependent variable).

Conclusion: By examining the relative size of the mean squares, we can make a judgment about whether an independent variable affects a dependent variable.

Test Statistic

Suppose we use the mean squares to define a test statistic F as follows:

\( F(v_1, v_2) = MS_{BG} / MS_{WG} \)

\( F(2, 12) = 3120 / 750 = 4.16 \)

where \( MS_{BG} \) is the between-groups mean square, \( MS_{WG} \) is the within-groups mean square, \( v_1 \) is the degrees of freedom for \( MS_{BG} \), and \( v_2 \) is the degrees of freedom for \( MS_{WG} \).

Defined in this way, the F ratio measures the size of \( MS_{BG} \) relative to \( MS_{WG} \). The F ratio is a convenient measure that we can use to test the null hypothesis. Here's how:

  • When the F ratio is close to one, \( MS_{BG} \) is approximately equal to \( MS_{WG} \). This indicates that the independent variable did not affect the dependent variable, so we cannot reject the null hypothesis.
  • When the F ratio is significantly greater than one, \( MS_{BG} \) is bigger than \( MS_{WG} \). This indicates that the independent variable did affect the dependent variable, so we must reject the null hypothesis.

What does it mean for the F ratio to be significantly greater than one? To answer that question, we need to talk about the P-value.

In an experiment, a P-value is the probability of obtaining a result more extreme than the observed experimental outcome, assuming the null hypothesis is true.

With analysis of variance, the F ratio is the observed experimental outcome that we are interested in. So, the P-value would be the probability that an F statistic would be more extreme (i.e., bigger) than the actual F ratio computed from experimental data.

We can use Stat Trek's F Distribution Calculator to find the probability that an F statistic will be bigger than the actual F ratio observed in the experiment. Enter the between-groups degrees of freedom (2), the within-groups degrees of freedom (12), and the observed F ratio (4.16) into the calculator; then, click the Calculate button.

From the calculator, we see that \( P(F > 4.16) \) equals about 0.04. Therefore, the P-value is 0.04.
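The same tail probability can also be computed in software. Here is a minimal sketch, assuming Python with scipy installed; `scipy.stats.f.sf` is the F distribution's survival function, i.e., P(F > x):

```python
from scipy import stats

# P(F > 4.16) for an F distribution with 2 and 12 degrees of freedom
p_value = stats.f.sf(4.16, dfn=2, dfd=12)
print(round(p_value, 2))  # prints 0.04
```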

Hypothesis Test

Recall that we specified a significance level 0.05 for this experiment. Once you know the significance level and the P-value, the hypothesis test is routine. Here's the decision rule for accepting or rejecting the null hypothesis:

  • If the P-value is bigger than the significance level, accept the null hypothesis.
  • If the P-value is equal to or smaller than the significance level, reject the null hypothesis.

Since the P-value (0.04) in our experiment is smaller than the significance level (0.05), we reject the null hypothesis that drug dosage had no effect on cholesterol level. And we conclude that the mean cholesterol level in at least one treatment group differed significantly from the mean cholesterol level in another group.

Magnitude of Effect

The hypothesis test tells us whether the independent variable in our experiment has a statistically significant effect on the dependent variable, but it does not address the magnitude of the effect. Here's the issue:

  • When the sample size is large, you may find that even small differences in treatment means are statistically significant.
  • When the sample size is small, you may find that even big differences in treatment means are not statistically significant.

With this in mind, it is customary to supplement analysis of variance with an appropriate measure of effect size. Eta squared ( \( \eta^2 \) ) is one such measure. Eta squared is the proportion of variance in the dependent variable that is explained by a treatment effect. The eta squared formula for one-way analysis of variance is:

\( \eta^2 = SSB / SST \)

where SSB is the between-groups sum of squares and SST is the total sum of squares.

Given this formula, we can compute eta squared for this drug dosage experiment, as shown below:

\( \eta^2 = SSB / SST = 6240 / 15240 = 0.41 \)

Thus, 41 percent of the variance in our dependent variable (cholesterol level) can be explained by variation in our independent variable (dosage level). It appears that the relationship between dosage level and cholesterol level is significant not only in a statistical sense; it is significant in a practical sense as well.

ANOVA Summary Table

It is traditional to summarize ANOVA results in an analysis of variance table. The analysis that we just conducted provides all of the information that we need to produce the following ANOVA summary table:

Analysis of Variance Table

Source                SS       df   MS      F      P
Between-groups (BG)   6,240    2    3,120   4.16   0.04
Within-groups (WG)    9,000    12   750
Total                 15,240   14

This ANOVA table allows any researcher to interpret the results of the experiment at a glance.

The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the F ratio shown in the table, assuming the null hypothesis is true. When the P-value is bigger than the significance level, we accept the null hypothesis; when it is smaller, we reject it. Here, the P-value (0.04) is smaller than the significance level (0.05), so we reject the null hypothesis.

To assess the strength of the treatment effect, an experimenter might compute eta squared ( \( \eta^2 \) ). The computation is easy, using sum of squares entries from the ANOVA table, as shown below:

\( \eta^2 = SSB / SST = 6{,}240 / 15{,}240 = 0.41 \)

For this experiment, an eta squared of 0.41 means that 41% of the variance in the dependent variable can be explained by the effect of the independent variable.

An Easier Option

In this lesson, we showed all of the hand calculations for a one-way analysis of variance. In the real world, researchers seldom conduct analysis of variance by hand. They use statistical software. In the next lesson, we'll analyze data from this problem with Excel. Hopefully, we'll get the same result.
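As a preview of what the software computation looks like, here is a minimal Python sketch that rebuilds the ANOVA table from the summary quantities derived above (the group means, group size, and within-groups sum of squares come from this lesson; scipy is assumed to be available):

```python
from scipy import stats

# Summary values from the lesson
group_means = [258, 246, 210]    # mean cholesterol level in each treatment group
n_per_group = 5                  # subjects per group
ssw = 9000                       # within-groups sum of squares

k = len(group_means)             # 3 treatment groups
n = k * n_per_group              # 15 subjects in total
grand_mean = sum(group_means) / k            # 238 (groups are equal-sized)

ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)  # 6240
ms_bg = ssb / (k - 1)                        # 3120
ms_wg = ssw / (n - k)                        # 750
f_ratio = ms_bg / ms_wg                      # 4.16
p_value = stats.f.sf(f_ratio, k - 1, n - k)  # about 0.04
eta_squared = ssb / (ssb + ssw)              # about 0.41
```

Running this reproduces every entry of the ANOVA summary table above.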

13.1 One-Way ANOVA

The purpose of a one-way ANOVA test is to determine the existence of a statistically significant difference among several group means. The test uses variances to help determine if the means are equal or not. To perform a one-way ANOVA test, there are five basic assumptions to be fulfilled:

  • Each population from which a sample is taken is assumed to be normal.
  • All samples are randomly selected and independent.
  • The populations are assumed to have equal standard deviations (or variances).
  • The factor is a categorical variable.
  • The response is a numerical variable.

The Null and Alternative Hypotheses

The null hypothesis is that all the group population means are the same. The alternative hypothesis is that at least one pair of means is different. For example, if there are \( k \) groups:

\( H_0\colon \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k \)

\( H_a\colon \) At least two of the group means \( \mu_1, \mu_2, \mu_3, \ldots, \mu_k \) are not equal. That is, \( \mu_i \ne \mu_j \) for some \( i \ne j \).

The graphs, a set of box plots representing the distribution of values with the group means indicated by a horizontal line through the box, help in the understanding of the hypothesis test. In the first graph (red box plots), H 0 : μ 1 = μ 2 = μ 3 and the three populations have the same distribution if the null hypothesis is true. The variance of the combined data is approximately the same as the variance of each of the populations.

If the null hypothesis is false, then the variance of the combined data is larger, which is caused by the different means as shown in the second graph (green box plots).

Source: OpenStax Statistics (Barbara Illowsky and Susan Dean), Section 13.1, adapted from Texas Education Agency (TEA) materials under a Creative Commons Attribution License. Access for free at https://openstax.org/books/statistics/pages/13-1-one-way-anova

ANOVA (Analysis of Variance)

ANOVA, short for Analysis of Variance, is a statistical method used to see if there are significant differences between the averages of three or more unrelated groups. This technique is especially useful when comparing more than two groups, which is a limitation of other tests like the t-test and z-test. For example, ANOVA can compare average IQ scores across several countries—like the US, Canada, Italy, and Spain—to see if nationality influences IQ scores. Ronald Fisher developed ANOVA in 1918, expanding the capabilities of previous tests by allowing for the comparison of multiple groups at once. This method is also referred to as Fisher’s analysis of variance, highlighting its ability to analyze how a categorical variable with multiple levels affects a continuous variable.

The use of ANOVA depends on the research design. Commonly, ANOVAs are used in three ways: one-way ANOVA , two-way ANOVA , and N-way ANOVA.

One-Way ANOVA

One-Way ANOVA is a statistical method used when we’re looking at the impact of one single factor on a particular outcome. For instance, if we want to explore how IQ scores vary by country, that’s where One-Way ANOVA comes into play. The “one way” part means we’re only considering one independent variable, which in this case is the country, but remember, this country variable can include any number of categories, from just two countries to twenty or more.

Two-Way ANOVA

Moving a step further, Two-Way ANOVA, also known as factorial ANOVA, allows us to examine the effect of two different factors on an outcome simultaneously. Building on our previous example, we could look at how both country and gender influence IQ scores. This method doesn’t just tell us about the individual effects of each factor but also lets us explore interactions between them. An interaction effect means the impact of one factor might change depending on the level of the other factor. For example, the difference in IQ scores between genders might vary from one country to another, suggesting that the effect of gender on IQ is not consistent across all countries.

N-Way ANOVA

When researchers have more than two factors to consider, they turn to N-Way ANOVA, where “n” represents the number of independent variables in the analysis. This could mean examining how IQ scores are influenced by a combination of factors like country, gender, age group, and ethnicity all at once. N-Way ANOVA allows for a comprehensive analysis of how these multiple factors interact with each other and their combined effect on the dependent variable, providing a deeper understanding of the dynamics at play.

In summary, ANOVA is a versatile statistical tool that scales from analyzing the effect of one factor (One-Way ANOVA) to multiple factors (Two-Way or N-Way ANOVA) on an outcome. By using ANOVA, researchers can uncover not just the direct effects of independent variables on a dependent variable but also how these variables interact with each other, offering rich insights into complex phenomena.


General Purpose and Procedure

Omnibus ANOVA test:

The null hypothesis for an ANOVA is that there is no significant difference among the groups. The alternative hypothesis assumes that there is at least one significant difference among the groups. After cleaning the data, the researcher must test the assumptions of ANOVA. They must then calculate the F-ratio and the associated probability value (p-value). In general, if the p-value associated with the F is smaller than .05, then the null hypothesis is rejected and the alternative hypothesis is supported. If the null hypothesis is rejected, one concludes that the group means are not all equal. Post-hoc tests tell the researcher which groups are different from each other.

So what if you find statistical significance?  Multiple comparison tests

When you conduct an ANOVA, you are attempting to determine if there is a statistically significant difference among the groups. If you find that there is a difference, you will then need to examine where the group differences lie.

At this point you could run post-hoc tests, which are t-tests examining mean differences between the groups. There are several multiple comparison tests that can be conducted that will control the Type I error rate, including the Bonferroni, Scheffé, Dunnett, and Tukey tests; a sketch of the Bonferroni approach follows.
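Here is a minimal sketch of Bonferroni-adjusted pairwise t-tests (the data below are made up purely for illustration; scipy and statsmodels are assumed to be available):

```python
from itertools import combinations
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Hypothetical samples for three groups
samples = {
    "A": [5.1, 4.8, 5.5, 5.0],
    "B": [6.9, 7.2, 6.8, 7.1],
    "C": [5.0, 5.2, 4.9, 5.3],
}

# Unadjusted pairwise t-tests
pairs = list(combinations(samples, 2))
raw_p = [stats.ttest_ind(samples[a], samples[b]).pvalue for a, b in pairs]

# Bonferroni adjustment controls the family-wise Type I error rate
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
for (a, b), p, r in zip(pairs, adjusted_p, reject):
    print(f"{a} vs {b}: adjusted p = {p:.4f}, reject H0 = {r}")
```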

Research Questions the ANOVA Examines

One-way ANOVA: Are there differences in GPA by grade level (freshmen vs. sophomores vs. juniors)?

Two-way ANOVA: Are there differences in GPA by grade level (freshmen vs. sophomores vs. juniors) and gender (male vs. female)?

Data Level and Assumptions

The level of measurement of the variables and the assumptions of the test play an important role in ANOVA. In ANOVA, the dependent variable must be a continuous (interval or ratio) level of measurement. The independent variables in ANOVA must be categorical (nominal or ordinal) variables. Like the t-test, ANOVA is also a parametric test and has some assumptions. ANOVA assumes that the data are normally distributed. The ANOVA also assumes homogeneity of variance, which means that the variance among the groups should be approximately equal. ANOVA also assumes that the observations are independent of each other. When planning any study, researchers should look out for extraneous or confounding variables; ANOVA has methods (i.e., ANCOVA) to control for them.

Testing of the Assumptions

  • The population from which samples are drawn should be normally distributed.
  • Independence of cases: the sample cases should be independent of each other.
  • Homogeneity of variance: Homogeneity means that the variance among the groups should be approximately equal.

These assumptions can be tested using statistical software (like Intellectus Statistics!). The assumption of homogeneity of variance can be tested using tests such as Levene’s test or the Brown-Forsythe Test.  Normality of the distribution of the scores can be tested using histograms, the values of skewness and kurtosis, or using tests such as Shapiro-Wilk or Kolmogorov-Smirnov. The assumption of independence can be determined from the design of the study.
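As a minimal sketch of these assumption checks in Python (the data below are made up purely for illustration; scipy is assumed):

```python
from scipy import stats

# Hypothetical samples for three groups
g1 = [5.1, 4.8, 5.5, 5.0, 5.2]
g2 = [6.9, 7.2, 6.8, 7.1, 7.0]
g3 = [5.0, 5.2, 4.9, 5.3, 5.1]

# Homogeneity of variance: Levene's test (null hypothesis: equal variances)
levene_stat, levene_p = stats.levene(g1, g2, g3)

# Normality: Shapiro-Wilk test on each group (null hypothesis: normality)
shapiro_p = [stats.shapiro(g).pvalue for g in (g1, g2, g3)]

# Large p-values give no evidence that the assumptions are violated
print(levene_p, shapiro_p)
```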

It is important to note that ANOVA is not robust to violations of the assumption of independence. It is, however, reasonably robust to violations of homogeneity and normality: even if you violate those assumptions, you can conduct the test and basically trust the findings. The results of the ANOVA are invalid, though, if the independence assumption is violated. In general, with violations of homogeneity, the analysis is considered robust if you have equal-sized groups. With violations of normality, continuing with the ANOVA is generally acceptable if you have a large sample size.

Related Analyses: MANOVA and ANCOVA

Researchers have extended ANOVA in MANOVA and ANCOVA. MANOVA stands for the multivariate analysis of variance.  MANOVA is used when there are two or more dependent variables.  ANCOVA is the term for analysis of covariance. The ANCOVA is used when the researcher includes one or more covariate variables in the analysis.



To Reference this Page :  Statistics Solutions. (2013). ANOVA . Retrieved from https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/anova/


One-Way Analysis of Variance (ANOVA)

Jonathan Gillard, School of Mathematics, Cardiff University, Cardiff, UK. Part of the Springer Undergraduate Mathematics Series (SUMS).

Gillard, J. (2020). One-Way Analysis of Variance (ANOVA). In: A First Course in Statistical Inference. Springer Undergraduate Mathematics Series. Springer, Cham. https://doi.org/10.1007/978-3-030-39561-2_6

In this chapter, a method for the analysis of an experiment that has more than two groups of observations is described. The main objective is to determine if there are significant differences among the population means of the groups, which are assumed to be random samples from normally distributed populations. The analysis is based on an examination of variation between and within groups, and is often called the one-way analysis of variance (or one-way ANOVA for short). We explicitly consider all possible sources of variation before carefully explaining how an ANOVA is conducted.


Analysis of Variance (One-way ANOVA)

  • The data involved must be interval or ratio level data.
  • The populations from which the samples were obtained must be normally or approximately normally distributed.
  • The samples must be independent.
  • The variances of the populations must be equal (i.e., homogeneity of variance).

In the case where one is dealing with $k \ge 3$ samples all of the same size $n$, the calculations involved are much simpler, so let us consider this scenario first.

When Sample Sizes are Equal

The strategy behind an ANOVA test relies on estimating the common population variance in two different ways: 1) through the mean of the sample variances -- called the variance within samples and denoted $s^2_w$, and 2) through the variance of the sample means -- called the variance between samples and denoted $s^2_b$.

When the means are not significantly different, the variance of the sample means will be small, relative to the mean of the sample variances. When at least one mean is significantly different from the others, the variance of the sample means will be larger, relative to the mean of the sample variances.

Consequently, precisely when at least one mean is significantly different from the others, the ratio of these estimates $$F = \frac{s^2_b}{s^2_w}$$ which follows an $F$-distribution, will be large (i.e., somewhere in the right tail of the distribution).

To calculate the variance of the sample means, recall that the Central Limit Theorem tells us that $$\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$$ Solving for the variance, $\sigma^2$, we find $$\sigma^2 = n\sigma^2_{\overline{x}}$$ Thus, we can estimate $\sigma^2$ with $$s^2_b = n s^2_{\overline{x}}$$

Calculating the mean of the sample variances is straightforward: we simply average $s^2_1, s^2_2, \ldots, s^2_k$. Thus, $$s^2_w = \frac{\sum s^2_i}{k}$$

Given the construction of these two estimates for the common population variance, their quotient $$F = \frac{s^2_b}{s^2_w}$$ gives us a test statistic that follows an $F$-distribution with $k-1$ degrees of freedom associated with the numerator and $(n-1) + (n-1) + \cdots + (n-1) = k(n-1) = kn - k = N - k$ degrees of freedom associated with the denominator.
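A short numerical sketch of these two estimates (the samples below are made up purely for illustration; numpy assumed):

```python
import numpy as np

# Hypothetical: k = 3 samples, each of equal size n = 4
samples = [
    np.array([4.0, 5.0, 6.0, 5.0]),
    np.array([7.0, 6.0, 8.0, 7.0]),
    np.array([5.0, 6.0, 5.0, 4.0]),
]
k = len(samples)
n = len(samples[0])

# Variance between samples: n times the variance of the sample means
sample_means = np.array([s.mean() for s in samples])
s2_b = n * sample_means.var(ddof=1)

# Variance within samples: the mean of the sample variances
s2_w = np.mean([s.var(ddof=1) for s in samples])

# Test statistic follows an F-distribution with k-1 and k(n-1) df
F = s2_b / s2_w
```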

When Sample Sizes are Unequal

The grand mean of a set of samples is the total of all the data values divided by the total sample size (or, equivalently, a weighted average of the sample means). $$\overline{X}_{GM} = \frac{\sum x}{N} = \frac{\sum n\overline{x}}{\sum n}$$

The total variation (not variance) comprises the sum of the squared differences of each data value from the grand mean. $$SS(T) = \sum (x - \overline{X}_{GM})^2$$

The between group variation, due to the interaction between the samples, is denoted SS(B) for sum of squares between groups . If the sample means are close to each other (and therefore to the grand mean), this will be small. There are k samples involved with one data value for each sample (the sample mean), so there are k-1 degrees of freedom. $$SS(B) = \sum n(\overline{x} - \overline{X}_{GM})^2$$

The variance between the samples, $s^2_b$ is also denoted by MS(B) for mean square between groups . This is the between group variation divided by its degrees of freedom. $$s^2_b = MS(B) = \frac{SS(B)}{k-1}$$

The within group variation, due to differences within individual samples, is denoted SS(W) for sum of squares within groups . Each sample is considered independently, so no interaction between samples is involved. The degrees of freedom are equal to the sum of the individual degrees of freedom for each sample. Since each sample has degrees of freedom equal to one less than its sample size, and there are $k$ samples, the total degrees of freedom is $k$ less than the total sample size: $df = N - k$. $$SS(W) = \sum df \cdot s^2$$

The variance within samples $s^2_w$ is also denoted by MS(W) for mean square within groups . This is the within group variation divided by its degrees of freedom. It is the weighted average of the variances (weighted with the degrees of freedom). $$s^2_w = MS(W) = \frac{SS(W)}{N-k}$$

Here again we find an $F$ test statistic by dividing the between group variance by the within group variance -- and as before, the degrees of freedom for the numerator are $(k-1)$ and the degrees of freedom for the denominator are $(N-k)$. $$F = \frac{s^2_b}{s^2_w}$$

All of this sounds like a lot to remember, and it is. However, the following table might prove helpful in organizing your thoughts: $$\begin{array}{l|c|c|c|c|} & \textrm{SS} & \textrm{df} & \textrm{MS} & \textrm{F}\\\hline \textrm{Between} & SS(B) & k-1 & \displaystyle{s^2_b = \frac{SS(B)}{k-1}} & \displaystyle{\frac{s^2_b}{s^2_w} = \frac{MS(B)}{MS(W)}}\\\hline \textrm{Within} & SS(W) & N-k & \displaystyle{s^2_w = \frac{SS(W)}{N-k}} & \\\hline \textrm{Total} & SS(W) + SS(B) & N-1 & & \\\hline \end{array}$$

Notice that each Mean Square is just the Sum of Squares divided by its degrees of freedom, and the F value is the ratio of the mean squares.
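The formulas for unequal sample sizes translate directly into code. A minimal sketch (made-up samples for illustration; numpy assumed):

```python
import numpy as np

# Hypothetical samples of unequal sizes
samples = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 6.0, 8.0, 7.0]),
    np.array([5.0, 6.0, 5.0, 4.0, 5.0]),
]
k = len(samples)
N = sum(len(s) for s in samples)

# Grand mean: total of all data values divided by total sample size
grand_mean = sum(s.sum() for s in samples) / N

ss_b = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)  # SS(B)
ss_w = sum((len(s) - 1) * s.var(ddof=1) for s in samples)           # SS(W) = sum of df * s^2

ms_b = ss_b / (k - 1)   # MS(B)
ms_w = ss_w / (N - k)   # MS(W)
F = ms_b / ms_w         # compare to the F critical value with (k-1, N-k) df
```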

Importantly, one must not simply put the larger variance in the numerator; always divide the between variance by the within variance. If the between variance is smaller than the within variance, then the means are really close to each other and you will want to fail to reject the claim that they are all equal.

The null hypothesis is rejected if the test statistic from the table is greater than the F critical value with k-1 numerator and N-k denominator degrees of freedom.

If the decision is to reject the null, then the conclusion is that at least one of the means is different. However, the ANOVA test does not tell you where the difference lies. For this, you need another test, such as the Scheffé test, applied to every possible pairing of samples in the original ANOVA test.

Lesson 10: Introduction to ANOVA

In the previous lessons, we learned how to perform inference for a population mean from one sample and also how to compare population means from two samples (independent and paired). In this Lesson, we introduce Analysis of Variance or ANOVA. ANOVA is a statistical method that analyzes variances to determine if the means from more than two populations are the same. In other words, we have a quantitative response variable and a categorical explanatory variable with more than two levels. In ANOVA, the categorical explanatory variable is typically referred to as the factor.

  • Describe the logic behind analysis of variance.
  • Set up and perform one-way ANOVA.
  • Identify the information in the ANOVA table.
  • Interpret the results from ANOVA output.
  • Perform multiple comparisons and interpret the results, when appropriate.

10.1 - Introduction to Analysis of Variance

Let's use the following example to look at the logic behind what an analysis of variance is after.

Application: Tar Content Comparisons

We want to see whether the tar contents (in milligrams) for three different brands of cigarettes are different. Two different labs took samples, Lab Precise and Lab Sloppy.

Lab Precise

Lab Precise took six samples from each of the three brands and got the following measurements:

Sample Brand A Brand B Brand C
1 10.21 11.32 11.60
2 10.25 11.20 11.90
3 10.24 11.40 11.80
4 9.80 10.50 12.30
5 9.77 10.68 12.20
6 9.73 10.90 12.20
Average \(\bar{y}_1= 10.00\) \(\bar{y}_2= 11.00\) \(\bar{y}_3= 12.00\)

[Figure: Dotplot of the three brands for Lab Precise]

Lab Sloppy also took six samples from each of the three brands and got the following measurements:

Sample Brand A Brand B Brand C
1 9.03 9.56 10.45
2 10.26 13.40 9.64
3 11.60 10.68 9.59
4 11.40 11.32 13.40
5 8.01 10.68 14.50
6 9.70 10.36 14.42
Average \(\bar{y}_1= 10.00\) \(\bar{y}_2= 11.00\) \(\bar{y}_3= 12.00\)

[Figure: Dotplot of the six samples taken from each brand for Lab Sloppy]

The sample means from the two labs turned out to be the same and thus the differences in the sample means from the two labs are zero.

From which data set can you draw more conclusive evidence that the means from the three populations are different?

We need to compare the between-sample variation to the within-sample variation. The within-sample variation for Lab Precise is small relative to the between-sample variation, while for Lab Sloppy it is large; therefore, we will be more inclined to conclude that the three population means are different using the data from Lab Precise. Since such analysis is based on the analysis of variances for the data set, we call this statistical method the Analysis of Variance (or ANOVA) .

10.2 - A Statistical Test for One-Way ANOVA

Before we go into the details of the test, we need to determine the null and alternative hypotheses. Recall that for a test for two independent means, the null hypothesis was \(\mu_1=\mu_2\). In one-way ANOVA, we want to compare \(t\) population means, where \(t>2\). Therefore, the null hypothesis for analysis of variance for \(t\) population means is:

\(H_0\colon \mu_1=\mu_2=\cdots=\mu_t\)

The alternative, however, cannot be set up similarly to the two-sample case. If we wanted to see if two population means are different, the alternative would be \(\mu_1\ne\mu_2\). With more than two groups, the research question is “Are some of the means different?” If we set up the alternative to be \(\mu_1\ne\mu_2\ne\cdots\ne\mu_t\), then we would have a test to see if ALL the means are different. This is not what we want. We need to be careful how we set up the alternative. The mathematical version of the alternative is...

\(H_a\colon \mu_i\ne\mu_j\text{ for some }i \text{ and }j \text{ where }i\ne j\)

This means that at least one of the pairs is not equal. The more common presentation of the alternative is:

\(H_a\colon \text{ at least one mean is different}\) or \(H_a\colon \text{ not all the means are equal}\)

Recall that when we compare the means of two populations for independent samples, we use a 2-sample t -test with pooled variance when the population variances can be assumed equal.

For more than two populations, the test statistic, \(F\), is the ratio of between group sample variance and the within-group-sample variance. That is,

\(F=\dfrac{\text{between group variance}}{\text{within group variance}}\)

Under the null hypothesis (and with certain assumptions), both quantities estimate the variance of the random error, and thus the ratio should be close to 1. If the ratio is large, then we have evidence against the null, and hence, we would reject the null hypothesis.

In the next section, we present the assumptions for this test. In the following section, we present how to find the between group variance, the within group variance, and the F-statistic in the ANOVA table.

10.2.1 - ANOVA Assumptions

Assumptions for the One-Way ANOVA Test

There are three primary assumptions in ANOVA:

  • The responses for each factor level have a normal population distribution .
  • These distributions have the same variance .
  • The data are independent .

A general rule of thumb for equal variances is to compare the smallest and largest sample standard deviations. This is much like the rule of thumb for equal variances for the test for independent means. If the ratio of these two sample standard deviations falls within 0.5 to 2, then it may be that the assumption is not violated.
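This rule of thumb is easy to automate. A minimal sketch (hypothetical helper function; numpy assumed):

```python
import numpy as np

def equal_variance_plausible(groups, max_ratio=2.0):
    """Rule of thumb: the largest sample SD should be at most twice the smallest."""
    sds = [np.std(g, ddof=1) for g in groups]
    return max(sds) / min(sds) <= max_ratio
```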

Example 10-1: Tar Content Comparisons

Recall the application from the beginning of the lesson. We wanted to see whether the tar contents (in milligrams) for three different brands of cigarettes were different. Lab Precise and Lab Sloppy each took six samples from each of the three brands (A, B and C). Check the assumptions for this example.

The graph shows no obvious violations from Normal, but we should proceed with caution.

Normal probability plot of Brand A, B, and C

Descriptive Statistics: Precise Brand A, Precise Brand B, Precise Brand C

Variable          Mean     StDev
Precise Brand A   10.000   0.257
Precise Brand B   11.000   0.365
Precise Brand C   12.000   0.276

The smallest standard deviation is 0.257, and twice the value is 0.514. The largest standard deviation is less than this value. Since the sample sizes are the same, it is safe to assume the standard deviations (and thus the variances) are equal.

The samples were taken independently, so there is no indication that this assumption is violated.

The sample size is small. We should check for obvious violations using the Normal Probability Plot.

Normal probability plots for Sloppy Brands A, B and C.

Descriptive Statistics: Sloppy Brand A, Sloppy Brand B, Sloppy Brand C

Variable         Mean     StDev
Sloppy Brand A   10.000   1.384
Sloppy Brand B   11.000   1.308
Sloppy Brand C   12.000   2.360

The smallest standard deviation is 1.308, and twice the value is 2.616. The largest standard deviation is less than this value. Since the sample sizes are the same, it is safe to assume the standard deviations (and thus the variances) are equal.

10.2.2 - The ANOVA Table

In this section, we present the Analysis of Variance Table for a completely randomized design, such as the tar content example.

Random samples of size \(n_1, …, n_t\) are drawn from the respective \(t\) populations. The data would have the following format:

Population   Observations                             Sample mean
1            \(y_{11}, y_{12}, \ldots, y_{1n_1}\)     \(\bar{y}_{1.}\)
2            \(y_{21}, y_{22}, \ldots, y_{2n_2}\)     \(\bar{y}_{2.}\)
...          ...                                      ...
\(t\)        \(y_{t1}, y_{t2}, \ldots, y_{tn_t}\)     \(\bar{y}_{t.}\)

\(t\): The total number of groups

\(y_{ij}\): The \(j^{th}\) observation from the \(i^{th}\) population.

\(n_i\): The sample size from the \(i^{th}\) population.

\(n_T\): The total sample size: \(n_T=\sum_{i=1}^t n_i\).

\(\bar{y}_{i.}\): The mean of the sample from the \(i^{th}\) population.

\(\bar{y}_{..}\): The mean of the combined data. Also called the overall mean.

Recall that we want to examine the between group variation and the within group variation. We can estimate these variations with the following sums of squares:

\(\text{SST}=\sum_{i=1}^{t} n_i\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2\) (the treatment, or between group, sum of squares),

\(\text{SSE}=\sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i.}\right)^2\) (the error, or within group, sum of squares), and

\(\text{TSS}=\sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{..}\right)^2\) (the total sum of squares).

It can be derived that \(\text{TSS } = \text{ SST } + \text{ SSE}\).

We can set up the ANOVA table to help us find the F-statistic.

The ANOVA Table

Source      Df           SS               MS                                          F                                    P-value
Treatment   \(t-1\)      \(\text{SST}\)   \(\text{MST}=\dfrac{\text{SST}}{t-1}\)      \(\dfrac{\text{MST}}{\text{MSE}}\)
Error       \(n_T-t\)    \(\text{SSE}\)   \(\text{MSE}=\dfrac{\text{SSE}}{n_T-t}\)
Total       \(n_T-1\)    \(\text{TSS}\)

The p-value is found using the F-statistic and the F-distribution. We will not ask you to find the p-value for this test. You will only need to know how to interpret it. If the p-value is less than our predetermined significance level, we will reject the null hypothesis that all the means are equal.

The ANOVA table can easily be obtained by statistical software; hand computation of these quantities is very tedious.

10.3 - Multiple Comparisons

If our test of the null hypothesis is rejected , we conclude that not all the means are equal: that is, at least one mean is different from the other means. The ANOVA test itself provides only statistical evidence of a difference, but not any statistical evidence as to which mean or means are statistically different.

For instance, using the previous example for tar content, if the ANOVA test results in a significant difference in average tar content between the cigarette brands, a follow up analysis would be needed to determine which brand mean or means differ in tar content. Plus we would want to know if one brand or multiple brands were better/worse than another brand in average tar content. To complete this analysis we use a method called multiple comparisons.

Multiple comparison methods conduct an analysis of all possible pairwise means. For example, with three brands of cigarettes, A, B, and C, if the ANOVA test was significant, then multiple comparison methods would compare the three possible pairwise comparisons:

  • Brand A to Brand B
  • Brand A to Brand C
  • Brand B to Brand C

These are essentially tests of two means similar to what we learned previously in our lesson for comparing two means. However, the methods here use an adjustment to account for the number of comparisons taking place. Minitab provides three adjustment choices. We will use the Tukey adjustment which is an adjustment on the t-multiplier based on the number of comparisons.

Note! We don’t go into the theory behind the Tukey method. Just note that we only use a multiple comparison technique in ANOVA when we have a significant result.

In the next section, we present an example to walk through the ANOVA results.

Minitab®

Using Minitab to Perform One-Way ANOVA

If the data entered in Minitab are in different columns, then in Minitab we use:

  • Stat > ANOVA > One-Way
  • If the responses are in one column and the factors are in their own column, then select the drop down of 'Response data are in one column for all factor levels.'
  • If the responses are in their own column for each factor level, then select 'Response data are in a separate column for each factor level.'
  • Next, in case we have a significant ANOVA result and want to conduct a multiple comparison analysis, we preemptively click 'Comparisons', check the box for Tukey, and verify that the boxes for 'Interval plot for differences of means' and 'Grouping Information' are also checked.
  • Click OK and OK again.

Example: Tar Content (ANOVA)

Test the hypothesis that the means are the same vs. at least one is different for both labs. Compare the two labs and comment.

We are testing the following hypotheses:

\(H_0\colon \mu_1=\mu_2=\mu_3\) vs \(H_a\colon\text{ at least one mean is different}\)

The assumptions were discussed in the previous example.

The following is the output for one-way ANOVA for Lab Precise:

One-way ANOVA: Precise A, Precise B, Precise C

Null Hypothesis All means are equal
Alternative Hypothesis Not all means are equal
Significance Level \(\alpha\)= 0.05

Equal variances were assumed for the analysis.

Factor Information

Factor   Levels   Values
Factor   3        Precise A, Precise B, Precise C

Analysis of Variance

Source   DF   Adj SS   Adj MS    F-Value   P-Value
Factor   2    12.000   6.00000   65.46     0.000
Error    15   1.375    0.09165
Total    17   13.375

Model Summary

S R-sq R-sq(adj) R-sq(pred)
0.302743 89.72% 88.35% 85.20%

The p-value for this test is less than 0.0001. At any reasonable significance level, we would reject the null hypothesis and conclude there is enough evidence in the data to suggest at least one mean tar content is different.

But which ones are different? The next step is to examine the multiple comparisons. Minitab provides the following output:

Factor N Mean StDev 95% CI
Precise A 6 10.000 0.257 (9.737, 10.263)
Precise B 6 11.000 0.365 (10.737, 11.263)
Precise C 6 12.000 0.276 (11.737, 12.263)

Pooled StDev = 0.302743

Tukey Pairwise Comparisons

Grouping information using the Tukey method and 95% confidence.

Factor N Mean Grouping
Precise C 6 12.000 A
Precise B 6 11.000 B
Precise A 6 10.000 C

Means that do not share a letter are significantly different.

The Tukey pairwise comparisons suggest that all the means are different. Therefore, Brand C has the highest tar content and Brand A has the lowest.

We are testing the same hypotheses for Lab Sloppy as Lab Precise, and the assumptions were checked. The ANOVA output for Lab Sloppy is:

One-way ANOVA: Sloppy A, Sloppy B, Sloppy C

Factor   Levels   Values
Factor   3        Sloppy A, Sloppy B, Sloppy C

Source   DF   Adj SS   Adj MS   F-Value   P-Value
Factor   2    12.00    6.000    1.96      0.176
Error    15   45.98    3.065
Total    17   57.98

S         R-sq     R-sq(adj)   R-sq(pred)
1.75073   20.70%   10.12%      0.00%

The one-way ANOVA showed statistically significant results for Lab Precise but not for Lab Sloppy. Recall that ANOVA compares the within variation and the between variation. For Lab Precise, the within variation was small compared to the between variation. This resulted in a large F-statistic (65.46) and thus a small p-value. For Lab Sloppy, this ratio was small (1.96), resulting in a large p-value.
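The Lab Precise result can be reproduced without Minitab. A minimal Python sketch using the six measurements per brand listed earlier (scipy assumed):

```python
from scipy.stats import f_oneway

# Lab Precise tar measurements (mg) from the table above
brand_a = [10.21, 10.25, 10.24, 9.80, 9.77, 9.73]
brand_b = [11.32, 11.20, 11.40, 10.50, 10.68, 10.90]
brand_c = [11.60, 11.90, 11.80, 12.30, 12.20, 12.20]

f_stat, p_value = f_oneway(brand_a, brand_b, brand_c)
# f_stat is about 65.46 and p_value is far below 0.001,
# matching the Minitab output above
```

Swapping in the Lab Sloppy measurements gives an F of about 1.96 and a p-value of about 0.176, again matching the Minitab output.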

20 young pigs are assigned at random among 4 experimental groups. Each group is fed a different diet. (This design is a completely randomized design.) The data are the pig's weight, in kilograms, after being raised on these diets for 10 months ( pig_weights.txt ). We wish to determine whether the mean pig weights are the same for all 4 diets.

First, we set up our hypothesis test:

\(H_0\colon \mu_1=\mu_2=\mu_3=\mu_4\)

\(H_a\colon \text { at least one mean weight is different}\)

Here are the data that were obtained from the four experimental groups, as well as their summary statistics:

Feed 1 Feed 2 Feed 3 Feed 4
60.8 68.3 102.6 87.9
57.1 67.7 102.2 84.7
65.0 74.0 100.5 83.2
58.7 66.3 97.5 85.8
61.8 69.9 98.9 90.3

Output from Minitab:

Descriptive Statistics: Feed 1, Feed 2, Feed 3, Feed 4

Variable   N   N*   Mean     StDev   Minimum   Maximum
Feed 1     5   0    60.68    3.03    57.10     65.00
Feed 2     5   0    69.24    2.96    66.30     74.00
Feed 3     5   0    100.34   2.16    97.50     102.60
Feed 4     5   0    86.38    2.78    83.20     90.30

The smallest standard deviation is 2.16, and the largest is 3.03. Since the rule of thumb is satisfied here, we can say the equal variance assumption is not violated. The description suggests that the samples are independent. There is nothing in the description to suggest the weights come from a normal distribution. The normal probability plots are:

Probability plot for feed 1-4. The plots show the trend line and 95% confidence interval lines.

There are no obvious violations from the normal assumption, but we should proceed with caution as the sample sizes are very small.

The ANOVA output is:

One-way ANOVA: Feed 1, Feed 2, Feed 3, Feed 4

Factor   Levels   Values
Factor   4        Feed 1, Feed 2, Feed 3, Feed 4

Source   DF   Adj SS   Adj MS    F-Value   P-Value
Factor   3    4703.2   1567.73   206.72    0.000
Error    16   121.3    7.58
Total    19   4824.5

S         R-sq     R-sq(adj)   R-sq(pred)
2.75386   97.48%   97.01%      96.07%

The p-value for the test is less than 0.001. With a significance level of 5%, we reject the null hypothesis. The data provide sufficient evidence to conclude that the mean weights of pigs from the four feeds are not all the same.

With a rejection of the null hypothesis leading us to conclude that not all the means are equal (i.e., the mean pig weight for at least one diet differs from the mean pig weights for the other diets), some follow-up questions are:

  • "Which diet type results in different average pig weights?", and
  • "Is there one particular diet type that produces the largest/smallest mean weight?"

To answer these questions we analyze the multiple comparison output (the grouping information) and the interval graph.

Factor N Mean StDev 95% CI
Feed 1 5 60.68 3.03 (58.07, 63.29)
Feed 2 5 69.24 2.96 (66.63, 71.85)
Feed 3 5 100.340 2.164 (97.729, 102.951)
Feed 4 5 86.38 2.78 (83.77, 88.99)

Pooled StDev = 2.75386

Factor N Mean Grouping
Feed 3 5 100.340 A      
Feed 4 5 86.38   B    
Feed 2 5 69.24     C  
Feed 1 5 60.68       D

Each of these factor levels is associated with a grouping letter. If any factor levels have the same letter, then the multiple comparison method did not determine a significant difference between the mean responses. For any factor levels that do not share a letter, a significant mean difference was identified. From the lettering we see that each diet type has a different letter, i.e., no two groups share a letter. Therefore, we can conclude that all four diets resulted in statistically significantly different mean pig weights. Furthermore, with the order of the means also provided from highest to lowest, we can say that Feed 3 resulted in the highest mean weight, followed by Feed 4, then Feed 2, then Feed 1. This grouping result is supported by the graph of the intervals.

Confidence Interval Plot of the comparisons between the feeds. None of the intervals cover zero which says the corresponding means are significantly different.

In analyzing the intervals, we reflect back on our lesson in comparing two means: if an interval contains zero, we cannot conclude a difference between the two means; if the interval does not contain zero, then a difference between the two means is supported. With four factor levels, there are six possible pairwise comparisons (recall the combinations formula for counting: \({4\choose 2} = 6\)). In inspecting each of these six intervals, we find that all six do NOT include zero. Therefore, there is a statistical difference between all four group means; the four types of diet resulted in significantly different mean pig weights.
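The Tukey comparisons for this example can be reproduced with the pig weight data given above. A minimal sketch (numpy and statsmodels assumed):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Pig weights (kg) from the table above, listed feed by feed
weights = np.array([
    60.8, 57.1, 65.0, 58.7, 61.8,     # Feed 1
    68.3, 67.7, 74.0, 66.3, 69.9,     # Feed 2
    102.6, 102.2, 100.5, 97.5, 98.9,  # Feed 3
    87.9, 84.7, 83.2, 85.8, 90.3,     # Feed 4
])
feeds = np.repeat(["Feed 1", "Feed 2", "Feed 3", "Feed 4"], 5)

# Tukey's HSD: one adjusted comparison per pair of feeds
tukey = pairwise_tukeyhsd(endog=weights, groups=feeds, alpha=0.05)
print(tukey.summary())  # all six pairwise intervals exclude zero
```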

10.4 - Two-Way ANOVA

The one-way ANOVA presented in the Lesson is a simple case. In practice, research questions are rarely this “simple.” ANOVA models become increasingly complex very quickly.

The two-way ANOVA model is briefly introduced here to give you an idea of what to expect in practice. Even two-way ANOVA can be too “simple” for practice.

In two-way ANOVA, there are two factors of interest. When there are two factors, the experimental units get a combination of treatments.

Suppose a researcher is interested in examining how different fertilizers affect the growth of plants. However, the researcher is also interested in the growth of different species of plant. Species is the second factor, making this a two-factor experiment. And, as those of you with green thumbs know, some fertilizers are more effective on some species of plants than on others!

This is the idea behind two-way ANOVA. If you are interested in more complex ANOVA models, you should consider taking STAT 502 and STAT 503 .

10.5 - Summary

In this Lesson, we introduced One-way Analysis of Variance (ANOVA). The ANOVA test tests the hypothesis that the population means for the groups are the same against the hypothesis that at least one of the means is different. If the null hypothesis is rejected, we need to perform multiple comparisons to determine which means are different.


Chapter 14: Analysis of Variance

Additional Hypothesis Tests

In unit 1, we learned the basics of statistics – what they are, how they work, and the mathematical and conceptual principles that guide them. In unit 2, we applied these principles to the process and ideas of hypothesis testing – how we take observed sample data and use it to make inferences about our populations of interest – using one continuous variable and one categorical variable. We will now continue to use this same hypothesis testing logic and procedure on new types of data. We will focus on group mean differences across more than two groups, using Analysis of Variance.

Analysis of variance, often abbreviated to ANOVA for short, serves the same purpose as the t -tests we learned earlier in unit 2: it tests for differences in group means. ANOVA is more flexible in that it can handle any number of groups, unlike t -tests which are limited to two groups (independent samples) or two time points (paired samples). Thus, the purpose and interpretation of ANOVA will be the same as it was for t -tests, as will the hypothesis testing procedure. However, ANOVA will, at first glance, look much different from a mathematical perspective, though as we will see, the basic logic behind the test statistic for ANOVA is actually the same.

ANOVA basics

An Analysis of Variance (ANOVA) is an inferential statistical tool that we use to find statistically significant differences among the means of two or more populations.

We calculate variance but the goal is still to compare population mean differences. The test statistic for the ANOVA is called F. It is a ratio of two estimates of the population variance based on the sample data.

Experiments are designed to determine if there is a cause and effect relationship between two variables. In the language of the ANOVA, the factor is the variable hypothesized to cause some change (effect) in the response variable (dependent variable).

An ANOVA conducted on a design in which there is only one factor is called a one-way ANOVA . If an experiment has two factors, then the ANOVA is called a two-way ANOVA . For example, suppose an experiment on the effects of age and gender on reading speed were conducted using three age groups (8 years, 10 years, and 12 years) and the two genders (male and female). The factors would be age and gender. Age would have three levels and gender would have two levels. ANOVAs can also be used for within-subjects (repeated measures) and between-subjects designs. For this chapter we will focus on the between-subjects one-way ANOVA .

In a One-Way ANOVA we compare two types of variance: the variance between groups and the variance within groups, which we will discuss in the next section.

Observing and Interpreting Variability

We have seen time and again that scores, be they individual data or group means, will differ naturally. Sometimes this is due to random chance, and other times it is due to actual differences. Our job as scientists, researchers, and data analysts is to determine if the observed differences are systematic and meaningful (via a hypothesis test) and, if so, what is causing those differences. Through this, it becomes clear that, although we are usually interested in the mean or average score, it is the variability in the scores that is key.

Take a look at figure 1, which shows scores for many people on a test of skill used as part of a job application. The x-axis has each individual person, in no particular order, and the y-axis contains the score each person received on the test. As we can see, the job applicants differed quite a bit in their performance, and understanding why that is the case would be extremely useful information. However, there’s no interpretable pattern in the data, especially because we only have information on the test, not on any other variable (remember that the x-axis here only shows individual people and is not ordered or interpretable).


Figure 1. Scores on a job test

Our goal is to explain this variability that we are seeing in the dataset. Let’s assume that as part of the job application procedure we also collected data on the highest degree each applicant earned. With knowledge of what the job requires, we could sort our applicants into three groups: those applicants who have a college degree related to the job, those applicants who have a college degree that is not related to the job, and those applicants who did not earn a college degree. This is a common way that job applicants are sorted, and we can use ANOVA to test if these groups are actually different. Figure 2 presents the same job applicant scores, but now they are color coded by group membership (i.e. which group they belong in). Now that we can differentiate between applicants this way, a pattern starts to emerge: those applicants with a relevant degree (coded red) tend to be near the top, those applicants with no college degree (coded black) tend to be near the bottom, and the applicants with an unrelated degree (coded green) tend to fall into the middle. However, even within these groups, there is still some variability, as shown in Figure 2.


Figure 2. Applicant scores coded by degree earned

This pattern is even easier to see when the applicants are sorted and organized into their respective groups, as shown in Figure 3.


Figure 3. Applicant scores by group

Now that we have our data visualized into an easily interpretable format, we can clearly see that our applicants’ scores differ largely along group lines. Those applicants who do not have a college degree received the lowest scores, those who had a degree relevant to the job received the highest scores, and those who did have a degree but one that is not related to the job tended to fall somewhere in the middle. Thus, we have systematic variance between our groups.

The process and analyses used in ANOVA will take these two sources of variance (systematic variance between groups and random error within groups, or how much groups differ from each other and how much people differ within each group) and compare them to one another to determine if the groups have any explanatory value in our outcome variable. By doing this, we will test for statistically significant differences between the group means, just like we did for t – tests. We will go step by step to break down the math to see how ANOVA actually works.


Sources of Variance

ANOVA is all about looking at the different sources of variance (i.e. the reasons that scores differ from one another) in a dataset. Fortunately, the way we calculate these sources of variance takes a very familiar form: the Sum of Squares. Before we get into the calculations themselves, we must first lay out some important terminology and notation.

In ANOVA, we are working with two variables, a grouping or explanatory variable and a continuous outcome variable . The grouping variable is our predictor (it predicts or explains the values in the outcome variable) or, in experimental terms, our independent variable , and it is made up of k groups, with k being any whole number 2 or greater. That is, ANOVA requires two or more groups to work, and it is usually conducted with three or more. In ANOVA, we refer to groups as “levels”, so the number of levels is just the number of groups, which again is k . In the above example, our grouping variable was education, which had 3 levels, so k = 3. When we report any descriptive value (e.g. mean, sample size, standard deviation) for a specific group, we will use a subscript 1… k to denote which group it refers to. For example, if we have three groups and want to report the standard deviation for each group, we would report them as \(s_1\), \(s_2\), and \(s_3\).

Our second variable is our outcome variable. This is the variable on which people differ, and we are trying to explain or account for those differences based on group membership. In the example above, our outcome was the score each person earned on the test. Our outcome variable will still use X for scores as before. When describing the outcome variable using means, we will use subscripts to refer to specific group means. So if we have k = 3 groups, our means will be X̄1, X̄2, and X̄3. We will also have a single mean representing the average of all participants across all groups. This is known as the grand mean, and we use the symbol X̄G. These different means (the individual group means and the overall grand mean) will be how we calculate our sums of squares.

Finally, we now have to differentiate between several different sample sizes. Our data will now have sample sizes for each group, and we will denote these with a lower case “n” and a subscript, just like with our other descriptive statistics: n 1 , n 2 , and n 3 . We also have the overall sample size in our dataset, and we will denote this with a capital N. The total sample size (N) is just the group sample sizes added together.

Between Groups Sum of Squares

One source of variability we identified in Figure 3 of the above example was differences or variability between the groups. That is, the groups clearly had different average levels. The variability arising from these differences is known as the between groups variability, and it is quantified using the Between Groups Sum of Squares.

Our calculations for sums of squares in ANOVA take on the same form as they did for regular calculations of variance. Each observation, in this case the group means, is compared to the overall mean, in this case the grand mean, to calculate a deviation score. These deviation scores are squared so that they do not cancel each other out and sum to zero. The squared deviations are then added up, or summed. There is, however, one small difference. Because each group mean represents a group composed of multiple people, before we sum the deviation scores we must multiply them by the number of people within that group. Incorporating this, we find our equation for the Between Groups Sum of Squares:

$$SS_B = \sum_{j=1}^{k} n_j\left(\bar{X}_j - \bar{X}_G\right)^2$$

Within Groups Sum of Squares

The other source of variability in the figures comes from differences that occur within each group. That is, each individual deviates a little bit from their respective group mean, just like the group means differed from the grand mean. We therefore label this source the Within Groups Sum of Squares. Because we are trying to account for variance based on group-level means, any deviation from the group means indicates an inaccuracy or error. Thus, our within groups variability represents our error in ANOVA.

$$SS_W = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_j\right)^2$$

Total Sum of Squares

Our final source, the Total Sum of Squares, is based on each individual score minus the grand mean. As with our Within Groups Sum of Squares, we are calculating a deviation score for each individual person, so we do not need to multiply anything by the sample size; that is only done for the Between Groups Sum of Squares. Conveniently, the Total Sum of Squares is also the sum of the other two sources:

$$SS_T = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_G\right)^2 = SS_B + SS_W$$

This will prove to be very convenient, because if we know the values of any two of our sums of squares, it is very quick and easy to find the value of the third. It is also a good way to check calculations: if you calculate each SS by hand, you can make sure that they all fit together as shown above, and if not, you know that you made a math mistake somewhere.

We can see from the above formulas that calculating an ANOVA by hand from raw data can take a very, very long time. For this reason, you will not be required to calculate the SS values by hand, but you should still take the time to understand how they fit together and what each one represents to ensure you understand the analysis itself.
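If you would like to see how the three sums of squares fit together in practice, here is a minimal sketch in Python. The scores are made up for illustration; they are not the applicant data from the figures.

```python
# Made-up scores for three groups, just to demonstrate the SS formulas.
groups = {
    "relevant":  [85, 90, 88, 93],
    "unrelated": [70, 75, 72, 78],
    "none":      [55, 60, 58, 62],
}

def mean(values):
    return sum(values) / len(values)

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between: each group mean's squared deviation from the grand mean,
# weighted by that group's sample size.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# Within: each score's squared deviation from its own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

# Total: each score's squared deviation from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# The additivity check described above: SST = SSB + SSW.
print(round(ss_between + ss_within, 6) == round(ss_total, 6))  # True
```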

ANOVA Table

All of our sources of variability fit together in meaningful, interpretable ways as we saw above, and the easiest way to see those relationships is to organize them into a table. The ANOVA table, shown in Table 1, is how we calculate our test statistic.

| Source  | SS | df  | MS | F |
|---------|----|-----|----|---|
| Between | SS | k-1 |    |   |
| Within  | SS | N-k |    |   |
| Total   | SS | N-1 |    |   |

(MS is variance)

Table 1. ANOVA table.

The first column of the ANOVA table, labeled “Source”, indicates which of our sources of variability we are using: between groups, within groups, or total. The second column, labeled “SS”, contains our values for the sums of squares that we learned to calculate above. As noted previously, calculating these by hand takes too long, and so the formulas are not presented in Table 1. However, remember that the Total is the sum of the other two, in case you are only given two SS values and need to calculate the third.

The next column in Table 1, labeled "df", is our degrees of freedom. As with the sums of squares, there is a different df for each source, and the formulas are presented in the table. Notice that the total degrees of freedom, N – 1, is the same as it was for our regular variance. This matches the SS T formulation to again indicate that we are simply taking our familiar variance term and breaking it up into different sources. Also remember that the capital N in the df calculations refers to the overall sample size, not a specific group sample size. Notice that the total row for degrees of freedom, just like for sums of squares, is just the Between and Within rows added together. If you take N – k + k – 1, then the "– k" and "+ k" portions will cancel out, and you are left with N – 1. This is a convenient way to quickly check your calculations.

The third column, labeled “MS”, is our Mean Squares for each source of variance. A “mean square” is just another way to say variability. Each mean square is calculated by dividing the sum of squares by its corresponding degrees of freedom. Notice that we do this for the Between row and the Within row, but not for the Total row. There are two reasons for this. First, our Total Mean Square would just be the variance in the full dataset (put together the formulas to see this for yourself), so it would not be new information. Second, the Mean Square values for Between and Within would not add up to equal the Mean Square Total because they are divided by different denominators. This is in contrast to the first two columns, where the Total row was both the conceptual total (i.e. the overall variance and degrees of freedom) and the literal total of the other two rows.

The final column in the ANOVA table (Table 1), labeled "F", is our test statistic for ANOVA. The F statistic, just like a t- or z-statistic, is compared to a critical value to see whether we can reject or fail to reject a null hypothesis. Thus, although the calculations look different for ANOVA, we are still doing the same thing that we did in all of Unit 2. We are simply using a new type of data to test our hypotheses. We will see what these hypotheses look like shortly, but first, we must take a moment to address why we are doing our calculations this way.

$$F = \frac{MS_B}{MS_W}$$

We will typically work from having Sum of Squares calculated, but here are the basic formulas for the 3 types of Sum of Squares for the ANOVA:

  • Total sum of squares (SST): ∑X² − (∑X)²/N, where the sums run over all N scores
  • Within sum of squares (SSW): add up the sum of squares for each treatment condition
  • Between sum of squares (SSB): SSB = SST − SSW

While there are other ways to calculate the SSs, these are the formulas we can use for this class if needed.

ANOVA and Type I Error

You may be wondering why we do not just use another t -test to test our hypotheses about three or more groups the way we did in Unit 2. After all, we are still just looking at group mean differences. The reason is that our t -statistic formula can only handle up to two groups, one minus the other. With only two groups, we can move our population parameters for the group means around in our null hypothesis and still get the same interpretation: the means are equal, which can also be concluded if one mean minus the other mean is equal to zero. However, if we tried adding a third mean, we would no longer be able to do this. So, in order to use t – tests to compare three or more means, we would have to run a series of individual group comparisons.

For only three groups, we would have three t-tests: group 1 vs group 2, group 1 vs group 3, and group 2 vs group 3. This may not sound like a lot, especially with the advances in technology that have made running an analysis very fast, but it quickly scales up. With just one additional group, bringing our total to four, we would have six comparisons: group 1 vs group 2, group 1 vs group 3, group 1 vs group 4, group 2 vs group 3, group 2 vs group 4, and group 3 vs group 4. This makes for a logistical and computational nightmare for five or more groups. When we reject the null hypothesis in a one-way ANOVA, we conclude that the group means are not all the same in the population. But this can indicate different things. With three groups, it can indicate that all three means are significantly different from each other. Or it can indicate that one of the means is significantly different from the other two, but the other two are not significantly different from each other. For this reason, statistically significant one-way ANOVA results are typically followed up with a series of post hoc comparisons of selected pairs of group means to determine which are different from which others.

A bigger issue, however, is our probability of committing a Type I Error. Remember that a Type I error is a false positive, and the chance of committing a Type I error is equal to our significance level, α. This is true if we are only running a single analysis (such as a t -test with only two groups) on a single dataset.

However, when we start running multiple analyses on the same dataset, our Type I error rate increases, raising the probability that we are capitalizing on random chance and rejecting a null hypothesis when we should not. ANOVA, by comparing all groups simultaneously with a single analysis, averts this issue and keeps our error rate at the α we set.
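To see how quickly the error rate inflates, here is a quick sketch. It assumes the pairwise tests are independent, which is a simplification, but the direction of the effect is the point:

```python
# Familywise Type I error rate when every pair of k groups gets its own
# t-test at alpha = .05 (independence assumed for simplicity).
from math import comb

alpha = 0.05
for k in range(3, 7):
    m = comb(k, 2)                     # number of pairwise comparisons
    familywise = 1 - (1 - alpha) ** m  # P(at least one false positive)
    print(f"{k} groups: {m} tests, familywise error ≈ {familywise:.3f}")
```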

Hypotheses in ANOVA

So far we have seen what ANOVA is used for, why we use it, and how we use it. Now we can turn to the formal hypotheses we will be testing. As before, we have a null and an alternative hypothesis to lay out. Our null hypothesis is still the idea of "no difference" in our data. Because we have multiple group means, we simply list them out as equal to each other:

H0: There is no difference in the group means. H0: µ1 = µ2 = µ3

We list as many μ parameters as groups we have. In the example above, we have three groups to test (k = 3), so we have three parameters in our null hypothesis. If we had more groups, say, four, we would simply add another μ to the list and give it the appropriate subscript, giving us: H0: µ1 = µ2 = µ3 = µ4. Notice that we do not say that the means are all equal to zero, we only say that they are equal to one another; it does not matter what the actual value is, so long as it holds for all groups equally.

Our alternative hypothesis for ANOVA is a little bit different. Let’s take a look at it and then dive deeper into what it means:

HA: At least 1 mean is different

The first difference is obvious: there is no mathematical statement of the alternative hypothesis in ANOVA. This is due to the second difference: we are not saying which group is going to be different, only that at least one will be. Because we do not hypothesize about which mean will be different, there is no way to write it mathematically. Related to this, we do not have directional hypotheses (greater than or less than) like we did with the z- and t-statistics. Due to this, our alternative hypothesis is always exactly the same: at least one mean is different.

With t-tests, we saw that, if we reject the null hypothesis, we can adopt the alternative, and this made it easy to understand what the differences looked like. In ANOVA, we will still adopt the alternative hypothesis as the best explanation of our data if we reject the null hypothesis. However, when we look at the alternative hypothesis, we can see that it does not give us much information. We will know that a difference exists somewhere, but we will not know where that difference is. The ANOVA is an omnibus test, meaning that a significant result only tells you that differences exist; more specifically, at least one group is different from the rest. Is only group 1 different but groups 2 and 3 the same? Is it only group 2? Are all three of them different? Based on just our alternative hypothesis, there is no way to be sure. We will come back to this issue later and see how to find out specific differences. For now, just remember that we are testing for any difference in group means, and it does not matter where that difference occurs. Now that we have our hypotheses for ANOVA, let's work through an example. We will continue to use the data from Figures 1 through 3 for continuity.

Example: Scores on Job Application Tests

Our data come from three groups of 10 people each, all of whom applied for a single job opening: those with no college degree, those with a college degree that is not related to the job opening, and those with a college degree from a relevant field. We want to know if we can use this group membership to account for our observed variability and, by doing so, test if there is a difference between our three group means (k = 3). We will follow the same steps for hypothesis testing as we did in previous chapters.  Let’s start, as always, with our hypotheses.

Step 1: State the Hypotheses

Our hypotheses are concerned with the means of groups based on education level, so:

H0: There is no difference between educational levels. H0: µ1 = µ2 = µ3

HA: At least 1 educational level is different.

Again, we phrase our null hypothesis in terms of what we are actually looking for, and we use a number of population parameters equal to our number of groups. Our alternative hypothesis is always exactly the same.

Step 2: Find the Critical Value

Our test statistic for ANOVA, as we saw above, is F . Because we are using a new test statistic, we will get a new table: the F distribution table, the top of which is shown in Figure 4:


Figure 4. F distribution table.

The F table only displays critical values for α = 0.05. This is because other significance levels are uncommon and so it is not worth it to use up the space to present them. There are now two degrees of freedom we must use to find our critical value: Numerator and Denominator. These correspond to the numerator and denominator of our test statistic, which, if you look at the ANOVA table presented earlier, are our Between Groups and Within Groups rows, respectively. The df B is the “Degrees of Freedom: Numerator” because it is the degrees of freedom value used to calculate the Mean Square Between, which in turn was the numerator of our F statistic. Likewise, the df W is the “df denom.” (short for denominator) because it is the degrees of freedom value used to calculate the Mean Square Within, which was our denominator for F .

The formula for df B is k – 1, and remember that k is the number of groups we are assessing. In this example, k = 3 so our df B = 2. This tells us that we will use the second column, the one labeled 2, to find our critical value. To find the proper row, we simply calculate the df W , which was N – k. The original prompt told us that we have “three groups of 10 people each,” so our total sample size is 30. This makes our value for df W = 27. If we follow the second column down to the row for 27, we find that our critical value is 3.35. We use this critical value the same way as we did before: it is our criterion against which we will compare our obtained test statistic to determine statistical significance.
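If you have Python with scipy handy, you can verify the table lookup; this is just a cross-check on the printed F table, not part of the hand calculation:

```python
# Critical F value for alpha = .05 with 2 numerator and 27 denominator df.
from scipy.stats import f

f_crit = f.ppf(1 - 0.05, 2, 27)
print(round(f_crit, 2))  # 3.35, matching the table
```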

Step 3: Calculate the Test Statistic

Now that we have our hypotheses and the criterion we will use to test them, we can calculate our test statistic. To do this, we will fill in the ANOVA table, working our way from left to right, filling in each cell to get our final answer. Here are the basic steps for calculating an ANOVA:

  • 3 Sum of Square calculations
  • 3 degrees of freedom calculations
  • 2 variance calculations
  • 1 F-score

We will assume that we are given the SS values as shown below:

| Source  | SS   | df | MS | F |
|---------|------|----|----|---|
| Between | 8246 |    |    |   |
| Within  | 3020 |    |    |   |
| Total   |      |    |    |   |

These may seem like random numbers, but remember that they are based on the distances between the groups themselves and within each group. Figure 5 shows the plot of the data with the group means and grand mean included. If we wanted to, we could use this information, combined with our earlier information that each group has 10 people, to calculate the Between Groups Sum of Squares by hand.

However, doing so would take some time, and without the specific values of the data points, we would not be able to calculate our Within Groups Sum of Squares, so we will trust that these values are the correct ones.


Figure 5. Applicant scores with group means and grand mean

We were given the sums of squares values for our first two rows, so we can use those to calculate the Total Sum of Squares.

| Source  | SS                  | df | MS | F |
|---------|---------------------|----|----|---|
| Between | 8246                |    |    |   |
| Within  | 3020                |    |    |   |
| Total   | 8246 + 3020 = 11266 |    |    |   |

We also calculated our degrees of freedom earlier, so we can fill in those values. Additionally, we know that the total degrees of freedom is N – 1, which is 29. This value of 29 is also the sum of the other two degrees of freedom, so everything checks out.

| Source  | SS    | df          | MS | F |
|---------|-------|-------------|----|---|
| Between | 8246  | 3 - 1 = 2   |    |   |
| Within  | 3020  | 29 - 2 = 27 |    |   |
| Total   | 11266 | 30 - 1 = 29 |    |   |

Now we have everything we need to calculate our mean squares. Our MS values for each row are just the SS divided by the df for that row, giving us:

| Source  | SS    | df | MS               | F |
|---------|-------|----|------------------|---|
| Between | 8246  | 2  | 8246/2 = 4123    |   |
| Within  | 3020  | 27 | 3020/27 = 111.85 |   |
| Total   | 11266 | 29 |                  |   |

Remember that we do not calculate a Total Mean Square, so we leave that cell blank. Finally, we have the information we need to calculate our test statistic. F is our MS B divided by MS W .

| Source  | SS    | df | MS     | F     |
|---------|-------|----|--------|-------|
| Between | 8246  | 2  | 4123   | 36.86 |
| Within  | 3020  | 27 | 111.85 |       |
| Total   | 11266 | 29 |        |       |

So, working our way through the table given only two SS values and the sample size and group size given before, we calculate our test statistic to be F obt = 36.86, which we will compare to the critical value in step 4.
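The whole table-filling routine is mechanical enough to script. A small sketch, using only the two given SS values plus the number of groups and the total sample size:

```python
# Complete a one-way ANOVA table from SSB, SSW, k, and N.
ss_between, ss_within = 8246, 3020
k, n_total = 3, 30

ss_total   = ss_between + ss_within    # 11266
df_between = k - 1                     # 2
df_within  = n_total - k               # 27
ms_between = ss_between / df_between   # 4123.0
ms_within  = ss_within / df_within     # ~111.85
f_obt      = ms_between / ms_within    # ~36.86

print(round(f_obt, 2))  # 36.86
```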

Step 4: Make a decision

Our obtained test statistic was calculated to be F obt = 36.86 and our critical value was found to be F * = 3.35. Our obtained statistic is larger than our critical value, so we can reject the null hypothesis.

Reject H0. Based on our 3 groups of 10 people, we can conclude that job test scores are statistically significantly different based on education level, F (2,27) = 36.86, p < .05.

Notice that when we report F , we include both degrees of freedom. We always report the numerator then the denominator, separated by a comma. We must also note that, because we were only testing for any difference, we cannot yet conclude which groups are different from the others. We will do so shortly, but first, because we found a statistically significant result, we need to calculate an effect size to see how big of an effect we found.

Effect Size: Variance Explained

Recall that the purpose of ANOVA is to take observed variability and see if we can explain those differences based on group membership. To that end, our effect size will be just that: the variance explained. You can think of variance explained as the proportion or percent of the differences we are able to account for based on our groups. We know that the overall observed differences are quantified as the Total Sum of Squares, and that our observed effect of group membership is the Between Groups Sum of Squares. Our effect size, therefore, is the ratio of these two sums of squares:

$$\eta^2 = \frac{SS_B}{SS_T}$$

Eta-squared (η²) is reported as the percentage of variance in the outcome/dependent variable explained by the predictor/independent variable.

Although you report variance explained by the predictor/independent variable, you can also use the 𝜂2 guidelines for effect size:

| 𝜂2   | Size   |
|------|--------|
| 0.01 | Small  |
| 0.09 | Medium |
| 0.25 | Large  |

Note: if less than .01, no effect is reported

Example continued: effect size for scores on job application tests

For our example, with SSB = 8246 and SST = 11266, our values give an effect size, 𝜂2, of:

$$\eta^2 = \frac{8246}{11266} = 0.73$$

So, we are able to explain 73% of the variance in job test scores based on education. This is, in fact, a huge effect size, and most of the time we will not explain nearly that much variance.

So, we found that not only do we have a statistically significant result, but that our observed effect was very large! However, we still do not know specifically which groups are different from each other. It could be that they are all different, or that only those who have a relevant degree are different from the others, or that only those who have no degree are different from the others. To find out which is true, we need to do a special analysis called a post hoc test.

Post Hoc Tests

A post hoc test is used only after we find a statistically significant result and need to determine where our differences truly came from. The term “post hoc” comes from the Latin for “after the event”. There are many different post hoc tests that have been developed, and most of them will give us similar answers.

Bonferroni Test

A Bonferroni test is perhaps the simplest post hoc analysis. It is a series of t-tests performed on each pair of groups. As we discussed earlier, the number of comparisons grows quickly as groups are added, which inflates our Type I error rate. To avoid this, a Bonferroni test divides our significance level α by the number of comparisons we are making, so that when they are all run, they sum back up to our original Type I error rate. Once we have our new significance level, we simply run independent samples t-tests to look for differences between our pairs of groups. This adjustment is sometimes called a Bonferroni Correction, and it is easy to apply by hand when comparing obtained p-values to our new corrected α level, but it is more difficult to use with critical values like we do for our analyses, so we will leave our discussion of it at that.
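Still, the correction itself is simple enough to sketch in a few lines (the scores here are hypothetical, and scipy is assumed to be available):

```python
# Bonferroni-corrected pairwise t-tests for three hypothetical groups.
from itertools import combinations
from scipy.stats import ttest_ind

groups = {
    "none":      [52, 55, 60, 58, 63],   # made-up scores
    "unrelated": [68, 75, 72, 70, 77],
    "relevant":  [88, 90, 85, 92, 95],
}

pairs = list(combinations(groups, 2))
alpha_corrected = 0.05 / len(pairs)      # 0.05 / 3 ≈ 0.0167 per test

for a, b in pairs:
    t_stat, p = ttest_ind(groups[a], groups[b])
    verdict = "different" if p < alpha_corrected else "not different"
    print(f"{a} vs {b}: p = {p:.4f} ({verdict})")
```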

Tukey’s Honest Significant Difference

Tukey’s Honest Significant Difference (HSD) is a very popular post hoc analysis. This analysis, like Bonferroni’s, makes adjustments based on the number of comparisons, but it makes adjustments to the test statistic when running the comparisons of two groups. These comparisons give us an estimate of the difference between the groups and a confidence interval for the estimate. We use this confidence interval in the same way that we use a confidence interval for a regular independent samples t -test: if it contains 0.00, the groups are not different, but if it does not contain 0.00 then the groups are different.

Example continued: Tukey post hoc test for scores on job application tests

Remember we are comparing scores from those who applied for a single job opening: those with no college degree (none), those with a college degree that is not related to the job opening (unrelated), and those with a college degree from a relevant field (relevant).

Below are the differences between the group means and the Tukey’s HSD confidence intervals for the differences:

| Comparison            | Difference | Tukey's HSD CI |
|-----------------------|------------|----------------|
| None vs Relevant      | 40.60      | (28.87, 52.33) |
| None vs Unrelated     | 19.50      | (7.77, 31.23)  |
| Relevant vs Unrelated | 21.10      | (9.37, 32.83)  |

As we can see, none of these intervals contain 0.00, so we can conclude that all three groups are different from one another.
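In practice, Tukey's HSD is computed by software. Here is a sketch using statsmodels; the scores are hypothetical stand-ins, not the actual applicant data, so the intervals it prints will not match the table above:

```python
# Tukey's HSD with statsmodels (hypothetical data).
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = [52, 55, 60, 58, 63,            # none
          68, 75, 72, 70, 77,            # unrelated
          88, 90, 85, 92, 95]            # relevant
group = ["none"] * 5 + ["unrelated"] * 5 + ["relevant"] * 5

result = pairwise_tukeyhsd(endog=scores, groups=group, alpha=0.05)
print(result)  # one row per pair: mean difference, CI bounds, reject yes/no
```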

Scheffe’s Test

Another common post hoc test is Scheffe’s Test. Like Tukey’s HSD, Scheffe’s test adjusts the test statistic for how many comparisons are made, but it does so in a slightly different way. The result is a test that is “conservative,” which means that it is less likely to commit a Type I Error, but this comes at the cost of less power to detect effects. We can see this by looking at the confidence intervals that Scheffe’s test gives us for our example.

Example continued: Scheffe post hoc test for scores on job application tests

Below are the differences between the group means and the Scheffe confidence intervals for the differences:

| Comparison            | Difference | Scheffe's CI   |
|-----------------------|------------|----------------|
| None vs Relevant      | 40.60      | (28.35, 52.85) |
| None vs Unrelated     | 19.50      | (7.25, 31.75)  |
| Relevant vs Unrelated | 21.10      | (8.85, 33.35)  |

As we can see, these are slightly wider than the intervals we got from Tukey’s HSD. This means that, all other things being equal, they are more likely to contain zero. In our case, however, the results are the same, and we again conclude that all three groups differ from one another.

There are many more post hoc tests than just these three, and they all approach the task in different ways, with some being more conservative and others being more powerful. In general, though, they will give highly similar answers. What is important here is to be able to interpret a post hoc analysis. If you are given post hoc analysis confidence intervals, like the ones seen above, read them the same way we read confidence intervals previously comparing two groups: if they contain zero, there is no difference; if they do not contain zero, there is a difference.

Other ANOVA Designs

We have only just scratched the surface on ANOVA in this chapter. There are many other variations available for the one-way ANOVA presented here. There are also other types of ANOVAs that you are likely to encounter. The first is called a factorial ANOVA . Factorial ANOVAs use multiple grouping variables, not just one, to look for group mean differences. Just as there is no limit to the number of groups in a one-way ANOVA, there is no limit to the number of grouping variables in a Factorial ANOVA, but it becomes very difficult to find and interpret significant results with many factors, so usually they are limited to two or three grouping variables with only a small number of groups in each. Another ANOVA is called a Repeated Measures ANOVA. This is an extension of a repeated measures or matched pairs t -test, but in this case we are measuring each person three or more times to look for a change. We can even combine both of these advanced ANOVAs into mixed designs to test very specific and valuable questions. These topics are far beyond the scope of this text, but you should know about their existence. Our treatment of ANOVA here is a small first step into a much larger world!

Learning Objectives

Having read the chapter, students should be able to:

  • understand the basic purpose for analysis of variance (ANOVA) and the general logic that underlies the statistical procedure
  • perform an ANOVA to evaluate data from a single factor, between subjects research design
  • understand when post hoc tests are necessary and the purpose that they serve
  • calculate and interpret effect size

Exercises – Ch. 14

  • What are the three pieces of variance analyzed in ANOVA?
  • What does rejecting the null hypothesis in ANOVA tell us? What does it not tell us?
  • What is the purpose of post hoc tests?
  • Based on the ANOVA table below, do you reject or fail to reject the null hypothesis? What is the effect size?

| Source  | SS     | df | MS    | F    |
|---------|--------|----|-------|------|
| Between | 60.72  | 3  | 20.24 | 3.88 |
| Within  | 213.61 | 41 | 5.21  |      |
| Total   | 274.33 | 44 |       |      |

5. Finish filling out the following ANOVA tables:

Problem 1: N = 14

| Source  | SS    | df | MS    | F |
|---------|-------|----|-------|---|
| Between |       | 2  | 14.10 |   |
| Within  |       |    |       |   |
| Total   | 64.65 |    |       |   |

Problem 2:

| Source  | SS | df | MS   | F     |
|---------|----|----|------|-------|
| Between |    | 2  |      | 42.36 |
| Within  |    | 54 | 2.48 |       |
| Total   |    |    |      |       |

6. You know that stores tend to charge different prices for similar or identical products, and you want to test whether or not these differences are, on average, statistically significantly different. You go online and collect data from 3 different stores, gathering information on 15 products at each store. You find that the average prices at each store are: Store 1 M = $27.82, Store 2 M= $38.96, and Store 3 M = $24.53. Based on the overall variability in the products and the variability within each store, you find the following values for the Sums of Squares: SST = 683.22, SSW = 441.19. Complete the ANOVA table and use the 4 step hypothesis testing procedure to see if there are systematic price differences between the stores.

7. You and your friend are debating which type of candy is the best. You find data on the average rating for hard candy (e.g. jolly ranchers, X̄ = 3.60), chewable candy (e.g. starburst, X̄ = 4.20), and chocolate (e.g. snickers, X̄ = 4.40); each type of candy was rated by 30 people. Test for differences in average candy rating using SSB = 16.18 and SSW = 28.74.

8. Administrators at a university want to know if students in different majors are more or less extroverted than others. They provide you with data they have for English majors (X̄ = 3.78, n = 45), History majors (X̄ = 2.23, n = 40), Psychology majors (X̄ = 4.41, n = 51), and Math majors (X̄ = 1.15, n = 28). You find the SSB = 75.80 and SSW = 47.40 and test at α = 0.05.

9. You are assigned to run a study comparing a new medication (X̄ = 17.47, n = 19), an existing medication (X̄ = 17.94, n = 18), and a placebo (X̄ = 13.70, n = 20), with higher scores reflecting better outcomes. Use SSB = 210.10 and SSW = 133.90 to test for differences.

10. You are in charge of assessing different training methods for effectiveness. You have data on 4 methods: Method 1 (X̄ = 87, n = 12), Method 2 (X̄ = 92, n = 14), Method 3 (X̄ = 88, n = 15), and Method 4 (X̄ = 75, n = 11). Test for differences among these means, assuming SSB = 64.81 and SST = 399.45.

Answers to Odd-Numbered Exercises – Ch. 14

1. Variance between groups (SSB), variance within groups (SSW) and total variance (SST).

3. Post hoc tests are run if we reject the null hypothesis in ANOVA; they tell us which specific group differences are significant.

5. Finish filling out the following ANOVA tables:

| Source  | SS    | df | MS    | F    |
|---------|-------|----|-------|------|
| Between | 28.20 | 2  | 14.10 | 4.26 |
| Within  | 36.45 | 11 | 3.31  |      |
| Total   | 64.65 | 13 |       |      |

| Source  | SS     | df | MS     | F     |
|---------|--------|----|--------|-------|
| Between | 210.10 | 2  | 105.05 | 42.36 |
| Within  | 133.92 | 54 | 2.48   |       |
| Total   | 344.02 | 56 |        |       |

7. Step 1: H0: μ1 = μ2 = μ3, "There is no difference in average rating of candy quality"; HA: "At least one mean is different."

Step 3: based on the given SSB and SSW and the degrees of freedom computed in step 2, the completed ANOVA table is:

| Source  | SS    | df | MS   | F     |
|---------|-------|----|------|-------|
| Between | 16.18 | 2  | 8.09 | 24.52 |
| Within  | 28.74 | 87 | 0.33 |       |
| Total   | 44.92 | 89 |      |       |

Step 4: F > F*, reject H 0 . Based on the data in our 3 groups, we can say that there is a statistically significant difference in the quality of different types of candy, F(2,87) = 24.52, p < .05. Since the result is significant, we need an effect size: η 2 = 16.18/44.92 = .36, which is a large effect.

9. Step 3: the completed ANOVA table, based on the given SSB and SSW, is:

| Source  | SS     | df | MS     | F     |
|---------|--------|----|--------|-------|
| Between | 210.10 | 2  | 105.05 | 42.36 |
| Within  | 133.90 | 54 | 2.48   |       |
| Total   | 344.00 | 56 |        |       |

Step 4: F > F*, reject H 0 . Based on the data in our 3 groups, we can say that there is a statistically significant difference in the effectiveness of the treatments, F(2,54) = 42.36, p < .05. Since the result is significant, we need an effect size: η 2 = 210.10/344.00 = .61, which is a large effect.

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Chapter 5: One-Way Analysis of Variance

Previously, we have tested hypotheses about two population means. This chapter examines methods for comparing more than two means. Analysis of variance (ANOVA) is an inferential method used to test the equality of three or more population means.

H0: µ1 = µ2 = µ3 = … = µk

This method is also referred to as single-factor ANOVA because we use a single property, or characteristic, for categorizing the populations. This characteristic is sometimes referred to as a treatment or factor.

A treatment (or factor) is a property, or characteristic, that allows us to distinguish the different populations from one another.

The objectives of ANOVA are (1) to estimate treatment means and the differences between treatment means, and (2) to test hypotheses for the statistical significance of comparisons of treatment means, where "treatment" or "factor" is the characteristic that distinguishes the populations.

For example, a biologist might compare the effect that three different herbicides may have on seed production of an invasive species in a forest environment. The biologist would want to estimate the mean annual seed production under the three different treatments, while also testing to see which treatment results in the lowest annual seed production. The null and alternative hypotheses are:

H0: µ1 = µ2 = µ3
H1: at least one of the means is significantly different from the others

It would be tempting to test this null hypothesis H 0 : µ 1 = µ 2 = µ 3 by comparing the population means two at a time. If we continue this way, we would need to test three different pairs of hypotheses:

H0: µ1 = µ2 vs. H1: µ1 ≠ µ2
H0: µ1 = µ3 vs. H1: µ1 ≠ µ3
H0: µ2 = µ3 vs. H1: µ2 ≠ µ3

If we used a 5% level of significance, each test would have a probability of a Type I error (rejecting the null hypothesis when it is true) of α = 0.05. Each test would have a 95% probability of correctly not rejecting the null hypothesis. The probability that all three tests correctly do not reject the null hypothesis is 0.95³ = 0.86. There is a 1 − 0.95³ = 0.14 (14%) probability that at least one test will lead to an incorrect rejection of the null hypothesis. A 14% probability of a Type I error is much higher than the desired alpha of 5% (remember: α is the same as Type I error). As the number of populations increases, the probability of making a Type I error using multiple t-tests also increases. Analysis of variance allows us to test the null hypothesis (all means are equal) against the alternative hypothesis (at least one mean is different) with a specified value of α.


In the previous chapter, we used a two-sample t-test to compare the means from two independent samples with a common variance. The sample data are used to compute the test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where s_p² is the pooled estimate of the common population variance σ². To test more than two populations, we must extend this idea of pooled variance to include all samples as shown below:

$$S_W^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} = \frac{\sum_{i=1}^{k}(n_i - 1)s_i^2}{N - k}$$

where S_W² represents the pooled estimate of the common variance σ², and it measures the variability of the observations within the different populations whether or not H0 is true. This is often referred to as the variance within samples (variation due to error).

If the null hypothesis IS true (all the means are equal), then all the populations are the same, with a common mean μ and variance σ². Instead of randomly selecting different samples from different populations, we are actually drawing k different samples from one population. We know that the sampling distribution for k means based on n observations will have mean μ and variance σ²/n (the squared standard error). Since we have drawn k samples of n observations each, we can estimate the variance of the k sample means (σ²/n) by

$$s_{\bar{x}}^2 = \frac{\sum_{i=1}^{k}\left(\bar{x}_i - \bar{x}_G\right)^2}{k - 1}$$

Consequently, n times the sample variance of the means estimates σ 2 . We designate this quantity as S B 2 such that

$$S_B^2 = n \cdot s_{\bar{x}}^2 = \frac{n\sum_{i=1}^{k}\left(\bar{x}_i - \bar{x}_G\right)^2}{k - 1}$$

where S B 2  is also an unbiased estimate of the common variance σ 2 , IF H 0 IS TRUE. This is often referred to as the variance between samples (variation due to treatment).

Under the null hypothesis that all k populations are identical, we have two estimates of σ² (S_W² and S_B²). We can use the ratio S_B²/S_W² as a test statistic to test the null hypothesis H0: µ1 = µ2 = … = µk, which follows an F-distribution with degrees of freedom df1 = k − 1 and df2 = N − k, where k is the number of populations and N is the total number of observations (N = n1 + n2 + … + nk). The numerator of the test statistic measures the variation between sample means. The estimate of the variance in the denominator depends only on the sample variances and is not affected by the differences among the sample means.

When the null hypothesis is true, the ratio of S B 2 and S W 2 will be close to 1. When the null hypothesis is false, S B 2 will tend to be larger than S W 2 due to the differences among the populations. We will reject the null hypothesis if the F test statistic is larger than the F critical value at a given level of significance (or if the p-value is less than the level of significance).

Tables are a convenient format for summarizing the key results in ANOVA calculations. The following one-way ANOVA table illustrates the required computations and the relationships between the various ANOVA table elements.

| Source    | df    | SS   | MS                  | F        |
|-----------|-------|------|---------------------|----------|
| Treatment | k − 1 | SSTr | MSTr = SSTr/(k − 1) | MSTr/MSE |
| Error     | N − k | SSE  | MSE = SSE/(N − k)   |          |
| Total     | N − 1 | SSTo |                     |          |

The sum of squares for the ANOVA table has the relationship of SSTo = SSTr + SSE where:

$$SS_{To} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_G\right)^2, \qquad SS_{Tr} = \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}_G\right)^2, \qquad SS_E = \sum_{i=1}^{k}\left(n_i - 1\right)s_i^2$$

Total variation (SSTo) = explained variation (SSTr) + unexplained variation (SSE)

The degrees of freedom also have a similar relationship: df (SSTo) = df (SSTr) + df (SSE)

The Mean Sum of Squares for the treatment and error are found by dividing the Sums of Squares by the degrees of freedom for each. While the Sums of Squares are additive, the Mean Sums of Squares are not. The F-statistic is then found by dividing the Mean Sum of Squares for the treatment (MSTr) by the Mean Sum of Squares for the error (MSE). The MSTr is the S_B² and the MSE is the S_W².

F = S_B²/S_W² = MSTr/MSE

An environmentalist wanted to determine if the mean acidity of rain differed among Alaska, Florida, and Texas. He randomly selected six rain dates at each site and obtained the following data:

[Data table: six rain pH measurements recorded at each of the three sites]

H0: μA = μF = μT
H1: at least one of the means is different

| State   | Sample size | Sample total | Sample mean | Sample variance |
|---------|-------------|--------------|-------------|-----------------|
| Alaska  | n1 = 6      | 30.2         | 5.033       | 0.0265          |
| Florida | n2 = 6      | 27.1         | 4.517       | 0.1193          |
| Texas   | n3 = 6      | 33.22        | 5.537       | 0.1575          |

Table 3. Summary Table.

Notice that there are differences among the sample means. Are the differences small enough to be explained solely by sampling variability? Or are they of sufficient magnitude so that a more reasonable explanation is that the μ ’s are not all equal? The conclusion depends on how much variation among the sample means (based on their deviations from the grand mean) compares to the variation within the three samples.

The grand mean is equal to the sum of all observations divided by the total sample size:

$$\bar{x}_G = \frac{30.2 + 27.1 + 33.22}{18} = \frac{90.52}{18} = 5.0289$$

SSTo = (5.11 − 5.0289)² + (5.01 − 5.0289)² + … + (5.24 − 5.0289)²
     + (4.87 − 5.0289)² + (4.18 − 5.0289)² + … + (4.09 − 5.0289)²
     + (5.46 − 5.0289)² + (6.29 − 5.0289)² + … + (5.30 − 5.0289)² = 4.6384

SSTr = 6(5.033 − 5.0289)² + 6(4.517 − 5.0289)² + 6(5.537 − 5.0289)² = 3.1214

SSE = SSTo – SSTr = 4.6384 – 3.1214 = 1.5170

$$MS_{Tr} = \frac{3.1214}{2} = 1.5607, \qquad MS_E = \frac{1.5170}{15} = 0.1011, \qquad F = \frac{1.5607}{0.1011} = 15.4372$$

This test is based on df 1 = k – 1 = 2 and df 2 = N – k = 15. For α = 0.05, the F critical value is 3.68. Since the observed F = 15.4372 is greater than the F critical value of 3.68, we reject the null hypothesis. There is enough evidence to state that at least one of the means is different.
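Because the test only requires the group sizes, means, and variances, the entire calculation can be reproduced from Table 3 alone. A sketch in Python (scipy is assumed to be available for the p-value at the end):

```python
# One-way ANOVA from the summary statistics in Table 3.
from scipy.stats import f

n = 6                                   # observations per state
means     = {"Alaska": 5.033, "Florida": 4.517, "Texas": 5.537}
variances = {"Alaska": 0.0265, "Florida": 0.1193, "Texas": 0.1575}

k = len(means)
N = k * n
grand_mean = sum(means.values()) / k    # equal n, so a simple average works

ss_tr = sum(n * (m - grand_mean) ** 2 for m in means.values())  # ~3.121
ss_e  = sum((n - 1) * v for v in variances.values())            # ~1.517

f_stat  = (ss_tr / (k - 1)) / (ss_e / (N - k))                  # ~15.44
p_value = f.sf(f_stat, k - 1, N - k)                            # ~0.0002

print(round(f_stat, 2), round(p_value, 6))
```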

Software Solutions


One-way ANOVA: pH vs. State

| Source | DF | SS    | MS    | F     | P     |
|--------|----|-------|-------|-------|-------|
| State  | 2  | 3.121 | 1.561 | 15.43 | 0.000 |
| Error  | 15 | 1.517 | 0.101 |       |       |
| Total  | 17 | 4.638 |       |       |       |

S = 0.3180 R-Sq = 67.29% R-Sq(adj) = 62.93%

Individual 95% CIs for Mean Based on Pooled StDev (interval plot omitted):

| Level   | N | Mean   | StDev  |
|---------|---|--------|--------|
| Alaska  | 6 | 5.0333 | 0.1629 |
| Florida | 6 | 4.5167 | 0.3455 |
| Texas   | 6 | 5.5367 | 0.3969 |

Pooled StDev = 0.3180

The p-value (0.000) is less than the level of significance (0.05) so we will reject the null hypothesis.


ANOVA: Single Factor

| Groups   | Count | Sum   | Average  | Variance |
|----------|-------|-------|----------|----------|
| Column 1 | 6     | 30.2  | 5.033333 | 0.026547 |
| Column 2 | 6     | 27.1  | 4.516667 | 0.119347 |
| Column 3 | 6     | 33.22 | 5.536667 | 0.157507 |

| Source of Variation | SS       | df | MS       | F        | p-value  | F crit  |
|---------------------|----------|----|----------|----------|----------|---------|
| Between Groups      | 3.121378 | 2  | 1.560689 | 15.43199 | 0.000229 | 3.68232 |
| Within Groups       | 1.517    | 15 | 0.101133 |          |          |         |
| Total               | 4.638378 | 17 |          |          |          |         |

The p-value (0.000229) is less than alpha (0.05) so we reject the null hypothesis. There is enough evidence to support the claim that at least one of the means is different.
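A Python counterpart to the Minitab and Excel runs uses scipy's f_oneway. The raw pH values below are a reconstruction chosen to match the sample totals, means, and variances in Table 3, so treat them as illustrative rather than as the original measurements:

```python
# One-way ANOVA on (reconstructed) raw pH data.
from scipy.stats import f_oneway

alaska  = [5.11, 5.01, 4.90, 5.14, 4.80, 5.24]
florida = [4.87, 4.18, 4.40, 4.67, 4.89, 4.09]
texas   = [5.46, 6.29, 5.57, 5.15, 5.45, 5.30]

f_stat, p_value = f_oneway(alaska, florida, texas)
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")  # ≈ F = 15.43, p = 0.000229
```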

Once we have rejected the null hypothesis and found that at least one of the treatment means is different, the next step is to identify those differences. There are two approaches that can be used to answer this type of question: contrasts and multiple comparisons.

Contrasts can be used only when there are clear expectations BEFORE starting an experiment, and these are reflected in the experimental design. Contrasts are planned comparisons . For example, mule deer are treated with drug A, drug B, or a placebo to treat an infection. The three treatments are not symmetrical. The placebo is meant to provide a baseline against which the other drugs can be compared. Contrasts are more powerful than multiple comparisons because they are more specific. They are more able to pick up a significant difference. Contrasts are not always readily available in statistical software packages (when they are, you often need to assign the coefficients), or may be limited to comparing each sample to a control.

Multiple comparisons should be used when there are no justified expectations. They are a posteriori, pair-wise tests of significance. For example, we compare the gas mileage for six brands of all-terrain vehicles. We have no prior knowledge to expect any vehicle to perform differently from the rest. Pair-wise comparisons should be performed here, but only if an ANOVA test on all six vehicles rejected the null hypothesis first.

It is NOT appropriate to use a contrast test when suggested comparisons appear only after the data have been collected. We are going to focus on multiple comparisons instead of planned contrasts.

Multiple Comparisons

When the null hypothesis is rejected by the F-test, we believe that there are significant differences among the k population means. So, which ones are different? Multiple comparison methods are the way to identify which of the means are different while controlling the experiment-wise error (the accumulated risk associated with a family of comparisons). There are many multiple comparison methods available.

In the Least Significant Difference Test, each individual hypothesis is tested with the Student t-statistic. When the Type I error probability is set at some value and the variance s² has v degrees of freedom, the null hypothesis is rejected for any observed value such that |t0| > t(α/2, v). It is an abbreviated version of conducting all possible pair-wise t-tests. This method provides only weak control of the experiment-wise error rate. Fisher's Protected LSD is somewhat better at controlling this problem.

The Bonferroni inequality is a conservative alternative when software is not available. When conducting n comparisons, αe ≤ n·αc, therefore αc = αe/n. In other words, divide the experiment-wise level of significance by the number of multiple comparisons to get the comparison-wise level of significance. The Bonferroni procedure is based on computing confidence intervals for the differences between each possible pair of μ's. The critical value for the confidence intervals comes from a table with (N − k) degrees of freedom and k(k − 1)/2 number of intervals. If a particular interval does not contain zero, the two means are declared to be significantly different from one another. An interval that contains zero indicates that the two means are NOT significantly different.


Scheffe's test is also a conservative method for all possible simultaneous comparisons suggested by the data. This test equates the F statistic of ANOVA with the t-test statistic. Since t² = F, then t = √F, and we can substitute √F(αe, v1, v2) for t(αe, v2) for Scheffe's statistic.

Tukey’s test provides a strong sense of experiment-wise error rate for all pair-wise comparison of treatment means. This test is also known as the Honestly Significant Difference . This test orders the treatments from smallest to largest and uses the studentized range statistic

$$q = \frac{\left|\bar{x}_i - \bar{x}_j\right|}{\sqrt{MSE/n}}$$

The absolute difference of the two means is used because the location of the two means in the calculated difference is arbitrary, with the sign of the difference depending on which mean is used first. For unequal replications, the Tukey-Kramer approximation is used instead.

Student-Newman-Keuls (SNK) test is a multiple range test based on the studentized range statistic like Tukey’s. The critical value is based on a particular pair of means being tested within the entire set of ordered means. Two or more ranges among means are used for test criteria. While it is similar to Tukey’s in terms of a test statistic, it has weak experiment-wise error rates.

Bonferroni, Dunnett’s, and Scheffe’s tests are the most conservative, meaning that the difference between the two means must be greater before concluding a significant difference. The LSD and SNK tests are the least conservative. Tukey’s test is in the middle. Robert Kuehl, author of Design of Experiments: Statistical Principles of Research Design and Analysis (2000), states that the Tukey method provides the best protection against decision errors, along with a strong inference about magnitude and direction of differences.

Let’s go back to our question on mean rain acidity in Alaska, Florida, and Texas. The null and alternative hypotheses were as follows:

H0: μA = μF = μT

H1: at least one of the means is different

The p-value for the F-test was 0.000229, which is less than our 5% level of significance. We rejected the null hypothesis and had enough evidence to support the claim that at least one of the means was significantly different from another. We will use Bonferroni and Tukey’s methods for multiple comparisons in order to determine which mean(s) is different.

Bonferroni Multiple Comparison Method

A Bonferroni confidence interval is computed for each pair-wise comparison. For k populations, there will be k ( k -1)/2 multiple comparisons. The confidence interval takes the form of:

$$\left(\bar{x}_i - \bar{x}_j\right) \pm t_{\text{Bonferroni}}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Where MSE is from the analysis of variance table and the Bonferroni t critical value comes from the Bonferroni Table given below. The Bonferroni t critical value, instead of the student t critical value, combined with the use of the MSE is used to achieve a simultaneous confidence level of at least 95% for all intervals computed. The two means are judged to be significantly different if the corresponding interval does not include zero.

[Bonferroni t critical value table]

For this problem, k = 3 so there are k ( k – 1)/2= 3(3 – 1)/2 = 3 multiple comparisons. The degrees of freedom are equal to N – k = 18 – 3 = 15. The Bonferroni critical value is 2.69.
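If software is available, the Bonferroni critical value can be recovered from the ordinary t distribution by splitting α across the comparisons; a quick check with scipy:

```python
# Bonferroni t critical value: alpha = .05 split over 3 two-sided
# comparisons, with 15 degrees of freedom.
from scipy.stats import t

alpha_e, m, df = 0.05, 3, 15
t_crit = t.ppf(1 - alpha_e / (2 * m), df)
print(round(t_crit, 2))  # ≈ 2.69, matching the table value
```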

x̄_Alaska − x̄_Florida = 0.5167, CI: 0.5167 ± 0.4938 = (0.0229, 1.0105)
x̄_Alaska − x̄_Texas = −0.5033, CI: −0.5033 ± 0.4938 = (−0.9971, −0.0095)
x̄_Florida − x̄_Texas = −1.0200, CI: −1.0200 ± 0.4938 = (−1.5138, −0.5262)

(The common margin of error is 2.69 × √(0.1011 × (1/6 + 1/6)) ≈ 0.4938.)

The first confidence interval contains all positive values. This tells you that there is a significant difference between the two means and that the mean rain pH for Alaska is significantly greater than the mean rain pH for Florida.

The second confidence interval contains all negative values. This tells you that there is a significant difference between the two means and that the mean rain pH of Alaska is significantly lower than the mean rain pH of Texas.

The third confidence interval also contains all negative values. This tells you that there is a significant difference between the two means and that the mean rain pH of Florida is significantly lower than the mean rain pH of Texas.

All three states have significantly different levels of rain pH. Texas has the highest rain pH, then Alaska followed by Florida, which has the lowest mean rain pH level. You can use the confidence intervals to estimate the mean difference between the states. For example, the average rain pH in Texas ranges from 0.5262 to 1.5138 higher than the average rain pH in Florida.

Now let’s use the Tukey method for multiple comparisons. We are going to let software compute the values for us. Excel doesn’t do multiple comparisons so we are going to rely on Minitab output.


One-way ANOVA: pH vs. state

| Source | DF | SS    | MS    | F    | P     |
|--------|----|-------|-------|------|-------|
| state  | 2  | 3.121 | 1.561 | 15.4 | 0.000 |
| Error  | 15 | 1.517 | 0.101 |      |       |
| Total  | 17 | 4.638 |       |      |       |

S = 0.3180   R-Sq = 67.29%   R-Sq(adj) = 62.93%

We have seen this part of the output before. We now want to focus on the Grouping Information Using Tukey Method. All three states have different letters indicating that the mean rain pH for each state is significantly different. They are also listed from highest to lowest. It is easy to see that Texas has the highest mean rain pH while Florida has the lowest.

Grouping Information Using Tukey Method

| state   | N | Mean   | Grouping |
|---------|---|--------|----------|
| Texas   | 6 | 5.5367 | A        |
| Alaska  | 6 | 5.0333 | B        |
| Florida | 6 | 4.5167 | C        |

Means that do not share a letter are significantly different.

This next set of confidence intervals is similar to the Bonferroni confidence intervals. They estimate the difference of each pair of means. The individual confidence interval level is set at 97.97% instead of 95% thus controlling the experiment-wise error rate.

Tukey 95% Simultaneous Confidence Intervals

All Pairwise Comparisons among Levels of state

Individual confidence level = 97.97%

state = Alaska subtracted from:

| state   | Lower   | Center  | Upper   |
|---------|---------|---------|---------|
| Florida | -0.9931 | -0.5167 | -0.0402 |
| Texas   | 0.0269  | 0.5033  | 0.9798  |

state = Florida subtracted from:

| state | Lower  | Center | Upper  |
|-------|--------|--------|--------|
| Texas | 0.5435 | 1.0200 | 1.4965 |

The first pairing is Florida – Alaska, which results in an interval of (-0.9931, -0.0402). The interval has all negative values indicating that Florida is significantly lower than Alaska. The second pairing is Texas – Alaska, which results in an interval of (0.0269, 0.9798). The interval has all positive values indicating that Texas is greater than Alaska. The third pairing is Texas – Florida, which results in an interval from (0.5435, 1.4965). All positive values indicate that Texas is greater than Florida.

The intervals are similar to the Bonferroni intervals with differences in width due to methods used. In both cases, the same conclusions are reached.

When we use one-way ANOVA and conclude that the differences among the means are significant, we can't be absolutely sure that the given factor is responsible for the differences. It is possible that the variation of some other unknown factor is responsible. One way to reduce the effect of extraneous factors is to design an experiment so that it has a completely randomized design. This means that each element has an equal probability of receiving any treatment or belonging to any different group. In general, good results require that the experiment be carefully designed and executed.


Natural Resources Biometrics Copyright © 2014 by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis ( H 0 ): There’s no effect in the population .
  • Alternative hypothesis ( H a or H 1 ) : There’s an effect in the population.


The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis ( H 0 ) answers “No, there’s no effect in the population.”
  • The alternative hypothesis ( H a ) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.


The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept . Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

| Research question | Null hypothesis (H0) | Test-specific statement |
|---|---|---|
| Does tooth flossing affect the number of cavities? | Tooth flossing has no effect on the number of cavities. | Two-sample t test: The mean number of cavities per person does not differ between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 = µ2. |
| Does the amount of text highlighted in the textbook affect exam scores? | The amount of text highlighted in the textbook has no effect on exam scores. | Linear regression: There is no relationship between the amount of text highlighted and exam scores in the population; β = 0. |
| Does daily meditation decrease the incidence of depression? | Daily meditation does not decrease the incidence of depression.* | Two-proportions test: The proportion of people with depression in the daily-meditation group (p1) is greater than or equal to the no-meditation group (p2) in the population; p1 ≥ p2. |

*Note that some researchers prefer to always write the null hypothesis in terms of "no effect" and "=". It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

The alternative hypothesis ( H a ) is the other answer to your research question . It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

Below are examples of research questions and alternative hypotheses to help you get started with formulating your own.

Research question: Does tooth flossing affect the number of cavities?

  • General alternative hypothesis (Ha): Tooth flossing has an effect on the number of cavities.
  • Test-specific alternative hypothesis (two-sample t test): The mean number of cavities per person differs between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 ≠ µ2.

Research question: Does the amount of text highlighted in a textbook affect exam scores?

  • General alternative hypothesis (Ha): The amount of text highlighted in the textbook has an effect on exam scores.
  • Test-specific alternative hypothesis (linear regression): There is a relationship between the amount of text highlighted and exam scores in the population; β ≠ 0.

Research question: Does daily meditation decrease the incidence of depression?

  • General alternative hypothesis (Ha): Daily meditation decreases the incidence of depression.
  • Test-specific alternative hypothesis (two-proportions z test): The proportion of people with depression in the daily-meditation group (p1) is less than the no-meditation group (p2) in the population; p1 < p2.
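
The meditation example calls for a one-sided test. A minimal sketch, assuming the statsmodels library is available and using invented counts:

```python
# Hedged sketch: one-sided two-proportions z test for the meditation example.
# Counts and group sizes are hypothetical, invented purely for illustration.
from statsmodels.stats.proportion import proportions_ztest

depressed = [18, 30]   # hypothetical depression cases: [meditation, no meditation]
n = [200, 200]         # hypothetical group sizes

# H0: p1 ≥ p2; Ha: p1 < p2, so we request the 'smaller' one-sided alternative.
z_stat, p_value = proportions_ztest(count=depressed, nobs=n, alternative='smaller')
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```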

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized below.

  • Definition: The null hypothesis claims that there is no effect in the population; the alternative hypothesis claims that there is an effect in the population.
  • Symbols: The null hypothesis is written with an equality symbol (=, ≥, or ≤); the alternative hypothesis is written with an inequality symbol (≠, <, or >).
  • When your test is statistically significant: The null hypothesis is rejected; the alternative hypothesis is supported.
  • When your test is not statistically significant: You fail to reject the null hypothesis; the alternative hypothesis is not supported.

To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only things you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable?

  • Null hypothesis (H0): Independent variable does not affect dependent variable.
  • Alternative hypothesis (Ha): Independent variable affects dependent variable.

Test-specific template sentences

Once you know the statistical test you'll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The templates below cover common statistical tests.

Statistical test: Two-sample t test (or one-way ANOVA with two groups)

  • Null hypothesis (H0): The mean dependent variable does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2.
  • Alternative hypothesis (Ha): The mean dependent variable differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2.

Statistical test: One-way ANOVA with three groups

  • Null hypothesis (H0): The mean dependent variable does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3.
  • Alternative hypothesis (Ha): The means of the dependent variable for group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population.

Statistical test: Pearson correlation

  • Null hypothesis (H0): There is no correlation between independent variable and dependent variable in the population; ρ = 0.
  • Alternative hypothesis (Ha): There is a correlation between independent variable and dependent variable in the population; ρ ≠ 0.

Statistical test: Simple linear regression

  • Null hypothesis (H0): There is no relationship between independent variable and dependent variable in the population; β = 0.
  • Alternative hypothesis (Ha): There is a relationship between independent variable and dependent variable in the population; β ≠ 0.

Statistical test: Two-proportions z test

  • Null hypothesis (H0): The dependent variable expressed as a proportion does not differ between group 1 (p1) and group 2 (p2) in the population; p1 = p2.
  • Alternative hypothesis (Ha): The dependent variable expressed as a proportion differs between group 1 (p1) and group 2 (p2) in the population; p1 ≠ p2.
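
For reference, the template tests above map onto common Python calls roughly as follows; this is a sketch assuming scipy, and the arrays are placeholders you would replace with your own data:

```python
# Rough mapping from the template tests to scipy calls (placeholder data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1, g2, g3 = rng.normal(size=(3, 25))   # placeholder samples, one row per group
x, y = rng.normal(size=(2, 25))         # placeholder paired variables

stats.ttest_ind(g1, g2)      # two-sample t test: H0 µ1 = µ2
stats.f_oneway(g1, g2, g3)   # one-way ANOVA: H0 µ1 = µ2 = µ3
stats.pearsonr(x, y)         # Pearson correlation: H0 ρ = 0
stats.linregress(x, y)       # simple linear regression: H0 β = 0
# For the two-proportions z test, see statsmodels' proportions_ztest above.
```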

Note: The template sentences above assume that you're performing two-tailed tests, which is why each alternative hypothesis uses an inequality such as ≠. Two-tailed tests are appropriate for most studies.
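
Many implementations let you state the alternative explicitly. A minimal sketch, assuming a recent scipy version that supports the `alternative` argument of `ttest_ind`:

```python
# Hedged sketch: switching between two-tailed and one-tailed alternatives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(loc=0.5, size=40)   # placeholder data
control = rng.normal(loc=0.0, size=40)     # placeholder data

# Two-tailed (Ha: µ1 ≠ µ2) -- matches the template sentences above.
print(stats.ttest_ind(treatment, control, alternative='two-sided').pvalue)

# One-tailed (Ha: µ1 > µ2) -- use only when your hypothesis is directional.
print(stats.ttest_ind(treatment, control, alternative='greater').pvalue)
```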


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
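
One transparent way to see how a test quantifies "could have arisen by chance" is a permutation test. The hand-rolled numpy sketch below (our construction, with invented data) shuffles group labels and counts how often the shuffled difference in means is at least as large as the observed one:

```python
# Hand-rolled permutation test sketch; data are hypothetical.
import numpy as np

rng = np.random.default_rng(7)
group_a = np.array([4.1, 5.0, 4.8, 5.6, 4.9])   # hypothetical measurements
group_b = np.array([5.8, 6.1, 5.5, 6.4, 5.9])   # hypothetical measurements

observed = abs(group_a.mean() - group_b.mean())
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)  # randomly reassign observations to the two groups
    if abs(pooled[:n_a].mean() - pooled[n_a:].mean()) >= observed:
        count += 1

print(f"Permutation p-value: {count / n_perm:.4f}")
```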

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“x affects y because…”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study, the statistical hypotheses correspond logically to the research hypothesis.


