ANOVA in R | A Complete Step-by-Step Guide with Examples
Published on March 6, 2020 by Rebecca Bevans. Revised on November 17, 2022.
ANOVA is a statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables. ANOVA tests whether there is a difference in means of the groups at each level of the independent variable.
The null hypothesis (H0) of the ANOVA is that there is no difference in group means, and the alternative hypothesis (Ha) is that the means are different from one another.
In this guide, we will walk you through the process of a one-way ANOVA (one independent variable) and a two-way ANOVA (two independent variables).
Our sample dataset contains observations from an imaginary study of the effects of fertilizer type and planting density on crop yield.
We will also include examples of how to perform and interpret a two-way ANOVA with an interaction term, and an ANOVA with a blocking variable.
Sample dataset for ANOVA
Table of contents
- Getting started in R
- Step 1: Load the data into R
- Step 2: Perform the ANOVA test
- Step 3: Find the best-fit model
- Step 4: Check for homoscedasticity
- Step 5: Do a post-hoc test
- Step 6: Plot the results in a graph
- Step 7: Report the results
- Frequently asked questions about ANOVA
If you haven’t used R before, start by downloading R and RStudio. Once you have both of these programs downloaded, open RStudio and click on File > New File > R Script.
Now you can copy and paste the code from the rest of this example into your script. To run the code, highlight the lines you want to run and click on the Run button on the top right of the text editor (or press ctrl + enter on the keyboard).
Install and load the packages
First, install the packages you will need for the analysis (this only needs to be done once):
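A sketch of this step, assuming the walkthrough relies on {ggplot2} for plotting, {dplyr} for data summaries, and {AICcmodavg} for the AIC model comparison (other package choices would work equally well):

```r
install.packages(c("ggplot2", "dplyr", "AICcmodavg"))
```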
Then load these packages into your R environment (do this every time you restart the R program):
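```r
library(ggplot2)
library(dplyr)
library(AICcmodavg)
```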
Note that this data was generated for this example; it’s not from a real experiment.
We will use the same dataset for all of our examples in this walkthrough. The only difference between the different analyses is how many independent variables we include and in what combination we include them.
It is common for factors to be read as quantitative variables when importing a dataset into R. To avoid this, you can use the read.csv() command to read in the data, specifying within the command whether each of the variables should be quantitative (“numeric”) or categorical (“factor”).
Use the following code, replacing the path/to/your/file text with the actual path to your file:
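A sketch of the import step, assuming the file is called crop.data.csv and its columns appear in the order density, block, fertilizer, yield (matching the summary described below):

```r
crop.data <- read.csv("path/to/your/file/crop.data.csv",
                      header = TRUE,
                      colClasses = c("factor", "factor", "factor", "numeric"))
```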
Before continuing, you can check that the data has read in correctly:
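For example, with the summary() function:

```r
summary(crop.data)
```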

You should see ‘density’, ‘block’, and ‘fertilizer’ listed as categorical variables with the number of observations at each level (i.e. 48 observations at density 1 and 48 observations at density 2).
‘Yield’ should be a quantitative variable with a numeric summary (minimum, median, mean, maximum).
ANOVA tests whether any of the group means are different from the overall mean of the data by checking the variance of each individual group against the overall variance of the data. If one or more groups falls outside the range of variation predicted by the null hypothesis (all group means are equal), then the test is statistically significant .
We can perform an ANOVA in R using the aov() function. This will calculate the test statistic for ANOVA and determine whether there is significant variation among the groups formed by the levels of the independent variable.
One-way ANOVA
In the one-way ANOVA example, we are modeling crop yield as a function of the type of fertilizer used. First we will use aov() to run the model, then we will use summary() to print the summary of the model.
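Assuming the data frame is named crop.data, as above:

```r
one.way <- aov(yield ~ fertilizer, data = crop.data)
summary(one.way)
```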

The model summary first lists the independent variables being tested in the model (in this case we have only one, ‘fertilizer’) and the model residuals (‘Residual’). All of the variation that is not explained by the independent variables is called residual variance.
The rest of the values in the output table describe the independent variable and the residuals:
- The Df column displays the degrees of freedom for the independent variable (the number of levels in the variable minus 1), and the degrees of freedom for the residuals (the total number of observations minus the number of group means the model estimates).
- The Sum Sq column displays the sum of squares (a.k.a. the total variation between the group means and the overall mean).
- The Mean Sq column is the mean of the sum of squares, calculated by dividing the sum of squares by the degrees of freedom for each parameter.
- The F value column is the test statistic from the F test. This is the mean square of each independent variable divided by the mean square of the residuals. The larger the F value, the more likely it is that the variation caused by the independent variable is real and not due to chance.
- The Pr(>F) column is the p value of the F statistic. This shows how likely it is that the F value calculated from the test would have occurred if the null hypothesis of no difference among group means were true.
The p value of the fertilizer variable is low ( p < 0.001), so it appears that the type of fertilizer used has a real impact on the final crop yield.
Two-way ANOVA
In the two-way ANOVA example, we are modeling crop yield as a function of type of fertilizer and planting density. First we use aov() to run the model, then we use summary() to print the summary of the model.
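For example:

```r
two.way <- aov(yield ~ fertilizer + density, data = crop.data)
summary(two.way)
```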

Adding planting density to the model seems to have made the model better: it reduced the residual variance (the residual sum of squares went from 35.89 to 30.765), and both planting density and fertilizer are statistically significant (p-values < 0.001).
Adding interactions between variables
Sometimes you have reason to think that two of your independent variables have an interaction effect rather than an additive effect.
For example, in our crop yield experiment, it is possible that planting density affects the plants’ ability to take up fertilizer. This might influence the effect of fertilizer type in a way that isn’t accounted for in the two-way model.
To test whether two variables have an interaction effect in ANOVA, simply use an asterisk instead of a plus-sign in the model:
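```r
interaction <- aov(yield ~ fertilizer * density, data = crop.data)
summary(interaction)
```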

In the output table, the ‘fertilizer:density’ variable has a low sum-of-squares value and a high p value, which means there is not much variation that can be explained by the interaction between fertilizer and planting density.
Adding a blocking variable
If you have grouped your experimental treatments in some way, or if you have a confounding variable that might affect the relationship you are interested in testing, you should include that element in the model as a blocking variable. The simplest way to do this is just to add the variable into the model with a ‘+’.
For example, in many crop yield studies, treatments are applied within ‘blocks’ in the field that may differ in soil texture, moisture, sunlight, etc. To control for the effect of differences among planting blocks we add a third term, ‘block’, to our ANOVA.
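For example:

```r
blocking <- aov(yield ~ fertilizer + density + block, data = crop.data)
summary(blocking)
```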

The ‘block’ variable has a low sum-of-squares value (0.486) and a high p value (p = 0.48), so it’s probably not adding much information to the model. It also doesn’t change the sum of squares for the two independent variables, which means that it’s not affecting how much variation in the dependent variable they explain.
There are now four different ANOVA models to explain the data. How do you decide which one to use? Usually you’ll want to use the ‘best-fit’ model – the model that best explains the variation in the dependent variable.
The Akaike information criterion (AIC) is a good test for model fit. AIC calculates the information value of each model by balancing the variation explained against the number of parameters used.
In AIC model selection, we compare the information value of each model and choose the one with the lowest AIC value (a lower number means more information explained!).
The model with the lowest AIC score (listed first in the table) is the best fit for the data:
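One way to build this comparison table is with aictab() from the {AICcmodavg} package, using the four model objects fitted above:

```r
library(AICcmodavg)

model.set <- list(one.way, two.way, interaction, blocking)
model.names <- c("one.way", "two.way", "interaction", "blocking")

aictab(model.set, modnames = model.names)
```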

From these results, it appears that the two.way model is the best fit. The two-way model has the lowest AIC value and carries 71% of the AIC weight, which means it has a 71% chance of being the best model among the full set of models considered.
The model with the blocking term carries an additional 15% of the AIC weight, but because it is more than 2 delta-AIC worse than the best model, it probably isn’t good enough to include in your results.
To check whether the model fits the assumption of homoscedasticity, look at the model diagnostic plots in R using the plot() function:
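```r
par(mfrow = c(2, 2))  # display the four diagnostic plots in a 2x2 grid
plot(two.way)
par(mfrow = c(1, 1))  # reset to the default single-plot layout
```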
The output looks like this:

The diagnostic plots show the unexplained variance (residuals) across the range of the observed data.
Each plot gives a specific piece of information about the model fit, but it’s enough to know that the red line representing the mean of the residuals should be horizontal and centered on zero (or on one, in the scale-location plot), meaning that there are no large outliers that would bias the model.
The normal Q-Q plot compares the quantiles of your model’s residuals against the quantiles of a theoretical normal distribution, so the closer the points lie to a line with a slope of 1, the better. This Q-Q plot is very close, with only a bit of deviation.
From these diagnostic plots we can say that the model fits the assumption of homoscedasticity.
If your model doesn’t fit the assumption of homoscedasticity, you can try the Kruskal-Wallis test instead.
ANOVA tells us if there are differences among group means, but not what the differences are. To find out which groups are statistically different from one another, you can perform a Tukey’s Honestly Significant Difference (Tukey’s HSD) post-hoc test for pairwise comparisons:
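In base R this is done with TukeyHSD() on the saved model object:

```r
tukey.two.way <- TukeyHSD(two.way)
tukey.two.way
```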

From the post-hoc test results, we see that there are statistically significant differences (p < 0.05) between fertilizer types 3 and 1 and between fertilizer types 3 and 2, but the difference between fertilizer types 2 and 1 is not statistically significant. There is also a significant difference between the two levels of planting density.
When plotting the results of a model, it is important to display:
- the raw data
- summary information, usually the mean and standard error of each group being compared
- letters or symbols above each group being compared to indicate the groupwise differences.
Find the groupwise differences
From the ANOVA test we know that both planting density and fertilizer type are significant variables. To display this information on a graph, we need to show which of the combinations of fertilizer type + planting density are statistically different from one another.
To do this, we can run another ANOVA + TukeyHSD test, this time using the interaction of fertilizer and planting density. We aren’t doing this to find out if the interaction term is significant (we already know it’s not), but rather to find out which group means are statistically different from one another so we can add this information to the graph.
Instead of printing the TukeyHSD results in a table, we’ll do it in a graph.
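A sketch of this step: fit a model containing only the interaction term, run TukeyHSD() on it, and plot the resulting confidence intervals:

```r
tukey.plot.aov <- aov(yield ~ fertilizer:density, data = crop.data)
tukey.plot.test <- TukeyHSD(tukey.plot.aov)
plot(tukey.plot.test, las = 1)  # las = 1 prints the axis labels horizontally
```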

The significant groupwise differences are any where the 95% confidence interval doesn’t include zero. This is another way of saying that the p value for these pairwise differences is < 0.05.
From this graph, we can see that the fertilizer + planting density combinations which are significantly different from one another are 3:1-1:1 (read as “fertilizer type 3 + planting density 1 contrasted with fertilizer type 1 + planting density 1”), 1:2-1:1, 2:2-1:1, 3:2-1:1, and 3:2-2:1.
We can make three labels for our graph: A (representing 1:1), B (representing all the intermediate combinations), and C (representing 3:2).
Make a data frame with the group labels
Now we need to make an additional data frame so we can add these groupwise differences to our graph.
First, summarize the original data using fertilizer type and planting density as grouping variables.
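For example, with {dplyr} (the name mean.yield.data is reused below when labeling the graph):

```r
mean.yield.data <- crop.data %>%
  group_by(fertilizer, density) %>%
  summarise(yield = mean(yield))
```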
Next, add the group labels as a new variable in the data frame.
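Assuming the summarized rows come out ordered by fertilizer and then density (1:1, 1:2, 2:1, 2:2, 3:1, 3:2), the letters from the Tukey comparisons above can be attached like this:

```r
# row order assumed: 1:1, 1:2, 2:1, 2:2, 3:1, 3:2
mean.yield.data$group <- c("A", "B", "B", "B", "B", "C")
```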
Your data frame should look like this:

Now we are ready to start making the plot for our report.
Plot the raw data
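A minimal sketch with {ggplot2}, jittering the points slightly so that observations at the same density don’t overlap:

```r
two.way.plot <- ggplot(crop.data, aes(x = density, y = yield, group = fertilizer)) +
  geom_point(position = position_jitter(width = 0.1, height = 0))

two.way.plot
```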

Add the means and standard errors to the graph
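For example, with stat_summary():

```r
two.way.plot <- two.way.plot +
  stat_summary(fun.data = "mean_se", geom = "errorbar", width = 0.2) +
  stat_summary(fun.data = "mean_se", geom = "pointrange")

two.way.plot
```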

This is very hard to read, since all of the different groupings for fertilizer type are stacked on top of one another. We will solve this in the next step.
Split up the data
To show which groups are different from one another, use facet_wrap() to split the data up over the three types of fertilizer. To add labels, use geom_text(), and add the group letters from the mean.yield.data dataframe you made earlier.
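A sketch (the vertical offset that lifts the letters above the points is an arbitrary choice):

```r
two.way.plot <- two.way.plot +
  facet_wrap(~ fertilizer) +
  geom_text(data = mean.yield.data,
            aes(x = density, y = yield + 0.5, label = group))

two.way.plot
```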

Make the graph ready for publication
In this step we will remove the grey background and add axis labels.
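For example (the exact label text is up to you):

```r
two.way.plot <- two.way.plot +
  theme_classic() +
  labs(title = "Crop yield in response to fertilizer mix and planting density",
       x = "Planting density (1 = low density, 2 = high density)",
       y = "Yield (bushels per acre)")

two.way.plot
```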
The final version of your graph looks like this:

In addition to a graph, it’s important to state the results of the ANOVA test. Include:
- A brief description of the variables you tested
- The F value, degrees of freedom, and p values for each independent variable
- What the results mean.
A Tukey post-hoc test revealed that fertilizer mix 3 resulted in a higher average yield than fertilizer mix 1 (a difference of 0.59 bushels/acre) and a higher average yield than fertilizer mix 2 (a difference of 0.42 bushels/acre). Planting density was also significant, with planting density 2 resulting in an average yield 0.46 bushels/acre higher than planting density 1.
The only difference between one-way and two-way ANOVA is the number of independent variables. A one-way ANOVA has one independent variable, while a two-way ANOVA has two.
- One-way ANOVA : Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
- Two-way ANOVA : Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finishing times in a marathon.
All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a t-test instead.
A factorial ANOVA is any ANOVA that uses more than one categorical independent variable . A two-way ANOVA is a type of factorial ANOVA.
Some examples of factorial ANOVAs include:
- Testing the combined effects of vaccination (vaccinated or not vaccinated) and health status (healthy or pre-existing condition) on the rate of flu infection in a population.
- Testing the effects of marital status (married, single, divorced, widowed), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
- Testing the effects of feed type (type A, B, or C) and barn crowding (not crowded, somewhat crowded, very crowded) on the final weight of chickens in a commercial farming operation.
In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.
Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).
If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.
Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).
Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).
You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

Towards Data Science

Dec 23, 2019
Doing and reporting your first ANOVA and ANCOVA in R
How to test and report the impact of a categorical independent variable on an interval dependent variable.
Analysis of Variance, or ANOVA, is a frequently used and fundamental statistical test in many sciences. In its most common form, it analyzes how much of the variance of the dependent variable can be attributed to the independent variable(s) in the model. It is most often used to analyze the impact of a categorical independent variable (e.g., experimental conditions, dog breeds, flower species, etc.) on an interval dependent variable. At its core, an ANOVA provides much of the same information as a simple linear regression (i.e., OLS), but it can be seen as an alternative interface through which this information is accessed. Different scientific domains have different preferences, and these generally determine which test you should use.
An Analysis of Covariance, or ANCOVA, denotes an ANOVA in which one or more additional covariates are controlled for. Imagine you would like to analyze the impact of a dog’s breed on the dog’s weight, controlling for the dog’s age. Without controlling for the dog’s age, you might never be able to identify the true impact of the dog’s breed on its weight. You therefore need to run an ANCOVA to ‘filter out’ the effect of the dog’s age to see if the dog’s breed still influences the weight. Controlling for another covariate might strengthen or weaken the apparent impact of your independent variable of interest.
The dataset
For this exercise, I will use the iris dataset, which is available in core R and which we will load into the working environment under the name df using the following command:
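```r
df <- iris  # copy the built-in iris dataset into a data frame called df
```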
The iris dataset contains variables describing the shape and size of different species of Iris flowers.
A typical hypothesis that one could test using an ANOVA could be if the species of the Iris (the independent categorical variable) has any impact on other features of the flower. In our case, we are going to test whether the species of the Iris has any impact on the petal length (the dependent interval variable).
Ensuring you don’t violate key assumptions
Before running the ANOVA, you must first confirm that a key assumption of the ANOVA is met in your dataset. Key assumptions are conditions that the calculation of your ANOVA results takes for granted; if they are violated, your analysis might yield spurious results.
For an ANOVA, the assumption is the homogeneity of variance. This sounds complicated, but it basically checks that the variances in the different groups created by the categorical independent variable are equal (i.e., the difference between the variances is zero). We can test for the homogeneity of variance by running Levene’s test. Levene’s test is not available in base R, so we will use the car package for it.
Install the package.
Then load the package.
Then run Levene’s test.
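```r
install.packages("car")  # install once

library(car)  # load on every new session

leveneTest(Petal.Length ~ Species, data = df)  # Levene's test for homogeneity of variance
```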
This yields the following output:
As you can see, the test returned a significant outcome. Here it is important to know the hypotheses built into the test: Levene’s test’s null hypothesis, which we would retain if the test came back non-significant, implies that the variance is homogeneous, in which case we could proceed with our ANOVA. However, the test did come back significant, which means that the variances of Petal.Length are significantly different between the species.
Well… talk to your co-authors, colleagues, or supervisors at this point. Technically, you would have to do a robust ANOVA, which provides reliable results even in the face of inhomogeneous variances. However, not all academic disciplines follow this technical guidance… so talk to more senior colleagues in your field.
Anyhow, we will continue this tutorial as if Levene’s test came back insignificant.
Running the actual ANOVA
We do this by specifying the model using the formula notation, the name of the dataset, and the aov command:
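```r
fit <- aov(Petal.Length ~ Species, data = df)
```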
In the command above, you can see that we tell R that we want to know if Species impacts Petal.Length in the dataset df using the aov command (which is the ANOVA command in R), saving the result into the object fit. The two essential things about the above command are the syntax (i.e., the structure, the ~ symbol, the brackets, etc.) and the aov command. Everything else can be modified to fit your data: Petal.Length and Species are names specified by the iris dataset, and df and fit are just names I arbitrarily chose — they could be anything you would want to analyze.
As you might have noticed, R didn’t report any results yet. We need to tell R that we want to access the information saved into the object called fit using the following command:
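```r
summary(fit)
```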
This command yields the following output:
This table gives you a lot of information. Still, the key parts we are interested in are the row Species, as this contains the information for the independent variable we specified, and the columns F value and Pr(>F). If our goal is to reject the null hypothesis (in this case, the null hypothesis is that the species of the Iris don’t have any impact on the petal length) and to accept our actual hypothesis (that the species do have an impact on the petal length), we are looking for high F values and low p values. In our case, the F value is 1180 (which is very high), and the p value is smaller than 0.0000000000000002 (2e-16 written out, and which is — you might have guessed it — very low). This finding supports our hypothesis that the species of the Iris has an impact on the petal length.
Reporting the results of the ANOVA
If we wanted to report this finding, it is good practice to report the means of the individual groups in the data (species in our case). We do this using the describeBy command from the psych package. Use the following command if you haven’t installed the psych package and want to use it for the first time:
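```r
install.packages("psych")
```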
Otherwise, or after installing the psych package, run the following commands.
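```r
library(psych)

describeBy(df$Petal.Length, df$Species)
```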
For the describeBy function, you communicate the variable you want to see described (Petal.Length) and the grouping variable (Species). We need to specify df in front of the variable names as — differently to the formula notation used by the aov command above — the describeBy command doesn’t allow us to specify the dataset separately. Running this command yields the following output:
In this output, we can see the three species Setosa, Versicolor, and Virginica, and in the 3rd column, we are presented with the mean of the values for Petal.Length for the three groups.
This finding could be reported in the following way:
We observed differences in petal lengths between the three species of Iris Setosa (M=1.46), Versicolor (M=4.26), and Virginica (M=5.55). An ANOVA showed that these differences between species were significant, i.e. there was a significant effect of the species on the petal length of the flower, F(2,147)=1180, p<.001.
One could also add a graph illustrating the differences using the package ggplot2. Run the below command to install the ggplot2 package if you haven’t already installed it.
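```r
install.packages("ggplot2")
```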
And then run the command for the graph.
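A sketch of such a graph (a boxplot with the raw observations overlaid is one of several reasonable choices):

```r
library(ggplot2)

ggplot(df, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.5) +  # overlay the raw observations
  theme_minimal() +
  labs(x = "Species", y = "Petal length")
```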
This produces the graph below. Explaining the full syntax of ggplot2 goes beyond this article’s scope, but try to adapt the code and use it for your purposes.
Now imagine you wanted to do the above analysis but while controlling for other features of the flower's size. After all, it might be that the species does not influence petal.length specifically, but more generally, the species influences the overall size of the plant. So the question is: Controlling for other plant size measures, does species still influence the length of the petal? In our analysis, the other metric that represents plant size will be the variable Sepal.Length , which is also available in the iris dataset. So, we specify our extended model just by adding this new covariate. This is the ANCOVA — we are analyzing the impact of a categorical independent variable on an interval dependent variable while controlling for one or more covariates.
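The extended model, with Sepal.Length added as a covariate:

```r
fit2 <- aov(Petal.Length ~ Species + Sepal.Length, data = df)
```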
However, and unlike before, we cannot simply run the summary command on the fit2 object now, because by default, and rather confusingly, base R uses Type I sums of squares. Type I sums of squares are not a problem when performing a simple ANOVA. However, if we are trying to run an ANCOVA, they will lead to wrong results, and we need to use Type III sums of squares. If you are interested in what Type I vs. Type III sums of squares are, I can recommend the Jane Superbrain section at the bottom of page 457 in Andy Field’s book “Discovering Statistics Using R.”
Therefore, we need to use another function from a different package to specify the exact type of errors we wish to use. We will be using the car package. Run the below command to install the car package if you haven’t already installed it. It is the same package as for Levene’s test above, so if you’ve been following the tutorial from the beginning, you might not have to install and load the package. If you’re unsure, just run the commands to be safe.
Then, run the car Anova command on our fit2 object, specifying that we wish to use Type III sums of squares.
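```r
library(car)

Anova(fit2, type = "III")
```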
This produces the following output:
As you can see in our row Species, column Pr(>F), which is the p-value, species still has a significant impact on the length of the petal, even when controlling for the length of the sepal. This could imply that the flowers really have different proportions and aren’t simply bigger or smaller because of the species.
Try running the summary command on the fit2 object to see that it produces incorrect results. If, however, you look at the fit2 object through the summary.lm command, which produces output in the style of a linear model (i.e., OLS) and also uses Type III sums of squares, you get the same correct information as via the Anova command from the car package.
We could report this finding as shown below.
The covariate, sepal length, was significantly related to the flowers’ petal length, F(1,146)=194.95, p<.001. There was also a significant effect of the species of the plant on the petal length after controlling for the effect of the sepal length, F(2,146)=624.99, p<.001.
After completing either the ANOVA or ANCOVA, you should normally run the appropriate post hoc tests to reveal more about the effects. After all, an ANOVA is merely an inferential test, i.e., it tests whether the data is distributed in a way that we would expect if the distribution were random. So far, we only know that there is a relationship between species and petal length: we know that petal length is non-randomly distributed when grouped by species. However, how exactly does species influence petal length? One way of answering this is by breaking down the variance explained by the independent variable of interest into its components. You can read more about this in my article on planned contrasts.
Cross Validated
How to report the results of an anova() model comparison?
I have two models with the same outcome variable and the same independent variables, but Model 1 has two more independent variables than Model 2. I do an anova() test in R to check whether these two variables are important. According to the output they are.
How do I have to present these results in an academic paper? I'd prefer mentioning it within the text, rather than adding an entire table.
Is something like this ok: F(60,916, 60,914) = 128.54, p < 0.001? I've seen some sources use this for anova tests which compare a variable across groups.

- Charlotte, it depends on the publication norms you use. For example, I use APA norms (American Psychological Association) and they recommend reporting an ANOVA in this format: F(2, 60914) = 128.54, p < .001. Note that the first degrees of freedom are the difference in d.f. between your model and its submodel. – Daniel Dostal, Feb 26, 2021
- Thank you @DanielDostal, I also use APA so this solves my question! – Charlotte, Feb 26, 2021
- Follow-up question: Why is it F(2, 60914), and not F(2, 60916)? – Dunen, Aug 1, 2022

How to Report ANOVA Results

Analysis of Variance, or ANOVA, is a statistical technique used to compare the means of two or more samples. ANOVA tests are conducted assuming that the means of the samples analyzed are the same, and produce an “F” statistic used to accept or reject this assumption. In practice, the test is most often used to compare three or more samples. Due to the nature of the technique, reporting it can often be difficult. Using a consistent way to report ANOVA results will save you time and help your readers better understand this test.
Prepare a standard table for your ANOVA results, including a row for every sample type and columns for samples, sum of squares, degrees of freedom, F values and p values.
Start your report with an informal description in plain language. Indicate the type of analysis of variance conducted. Indicate the test conducted, the independent variable and the dependent variable, and enumerate the conditions of the test.
Write the formal conclusions of your test using statistical data. Mention the conclusion, the effects of the variables, and the F and probability values. The conclusion is usually to reject or to support the idea that the independent variable influenced the dependent variable. The values are stated directly in mathematical notation (for example, p = 0.05).
- Identify the independent and dependent variables from your ANOVA test for Step 2.

Stats and R
Table of contents:
- Introduction
- Aim and hypotheses of ANOVA
- Variable type
- Independence
- Equality of variances - homogeneity
- Another method to test normality and homogeneity
- Preliminary analyses
- Interpretations of ANOVA results
- What’s next
- Issue of multiple testing
- Tukey HSD test
- Dunnett’s test
- Other p-values adjustment methods
- Visualization of ANOVA and post-hoc tests on the same plot

ANOVA (ANalysis Of VAriance) is a statistical test to determine whether two or more population means are different. In other words, it is used to compare two or more groups to see if they are significantly different .
In practice, however, the:
- Student t-test is used to compare 2 groups ;
- ANOVA generalizes the t-test beyond 2 groups, so it is used to compare 3 or more groups .
Note that there are several versions of the ANOVA (e.g., one-way ANOVA, two-way ANOVA, mixed ANOVA, repeated measures ANOVA, etc.). In this article, we present the simplest form only—the one-way ANOVA 1 —and we refer to it as ANOVA in the remainder of the article.
Although ANOVA is used to make inferences about the means of different groups, the method is called “analysis of variance ”. It gets this name because it compares the “between” variance (the variance between the different groups) and the “within” variance (the variance within each group). If the between variance is significantly larger than the within variance, the group means are declared to be different. Otherwise, we cannot conclude one way or the other. The two variances are compared to each other by taking their ratio ( \(\frac{variance_{between}}{variance_{within}}\) ) and then comparing this ratio to a threshold from the Fisher probability distribution (a threshold based on a specific significance level, usually 5%).
This is enough theory regarding the ANOVA method for now. In the remainder of this article, we discuss it from a more practical point of view, and in particular we will cover the following points:
- the aim of the ANOVA, when it should be used and the null/alternative hypothesis
- the underlying assumptions of the ANOVA and how to check them
- how to perform the ANOVA in R
- how to interpret results of the ANOVA
- understand the notion of post-hoc test and interpret the results
- how to visualize results of ANOVA and post-hoc tests
Data for the present article is the penguins dataset (an alternative to the well-known iris dataset), accessible via the {palmerpenguins} package :
The dataset contains data for 344 penguins of 3 different species (Adelie, Chinstrap and Gentoo). The dataset contains 8 variables, but we focus only on the flipper length and the species for this article, so we keep only those 2 variables:
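For example (loading {tidyverse} for the pipe and select()):

```r
library(palmerpenguins)
library(tidyverse)

dat <- penguins %>%
  select(species, flipper_length_mm)
```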
(If you are unfamiliar with the pipe operator ( %>% ), you can also select variables with penguins[, c("species", "flipper_length_mm")] . Learn more ways to select variables in the article about data manipulation .)
Below are some basic descriptive statistics and a plot (made with the {ggplot2} package) of our dataset before we proceed to the goal of the ANOVA:
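```r
summary(dat)

# one jittered point per penguin, by species
ggplot(dat) +
  aes(x = species, y = flipper_length_mm, color = species) +
  geom_jitter() +
  theme(legend.position = "none")
```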
Flipper length varies from 172 to 231 mm, with a mean of 200.9 mm. There are respectively 152, 68 and 124 penguins of the species Adelie, Chinstrap and Gentoo.

Here, the factor is the species variable which contains 3 modalities or groups (Adelie, Chinstrap and Gentoo).
As mentioned in the introduction, the ANOVA is used to compare groups (in practice, 3 or more groups). More generally, it is used to:
- study whether measurements are similar across different modalities (also called levels or treatments in the context of ANOVA) of a categorical variable
- compare the impact of the different levels of a categorical variable on a quantitative variable
- explain a quantitative variable based on a qualitative variable
In this context and as an example, we are going to use an ANOVA to help us answer the question: “ Is the length of the flippers different between the 3 species of penguins? ”.
The null and alternative hypothesis of an ANOVA are:
- \(H_0\) : \(\mu_{Adelie} = \mu_{Chinstrap} = \mu_{Gentoo}\) ( \(\Rightarrow\) the 3 species are equal in terms of flipper length)
- \(H_1\) : at least one mean is different ( \(\Rightarrow\) at least one species is different from the other 2 species in terms of flipper length)
Be careful that the alternative hypothesis is not that all means are different. The opposite of all means being equal ( \(H_0\) ) is that at least one mean is different from the others ( \(H_1\) ).
In this sense, if the null hypothesis is rejected, it means that at least one species is different from the other 2, but not necessarily that all 3 species are different from each other. It could be that flipper length for the species Gentoo is different than for the species Chinstrap and Adelie, but flipper length is similar between Chinstrap and Adelie. Other types of test (known as post-hoc tests and covered in this section ) must be performed to test whether all 3 species differ.
Underlying assumptions of ANOVA
As for many statistical tests , there are some assumptions that need to be met in order to be able to interpret the results. When one or several assumptions are not met, although it is technically possible to perform these tests, it would be incorrect to interpret the results and trust the conclusions.
Below are the assumptions of the ANOVA, how to test them and which other tests exist if an assumption is not met:
- Variable type : ANOVA requires a mix of one continuous quantitative dependent variable (which corresponds to the measurements to which the question relates) and one qualitative independent variable (with at least 2 levels which will determine the groups to compare).
- Independence : the data, collected from a representative and randomly selected portion of the total population , should be independent between groups and within each group. The assumption of independence is most often verified based on the design of the experiment and on the good control of experimental conditions rather than via a formal test. If you are still unsure about independence based on the experiment design, ask yourself if one observation is related to another (if one observation has an impact on another) within each group or between the groups themselves. If not, it is most likely that you have independent samples . If observations between samples (forming the different groups to be compared) are dependent (for example, if three measurements have been collected on the same individuals as it is often the case in medical studies when measuring a metric (i) before, (ii) during and (iii) after a treatment), the repeated measures ANOVA should be preferred in order to take into account the dependency between the samples.
- In case of small samples, residuals 2 should follow approximately a normal distribution. The normality assumption can be tested visually thanks to a histogram and a QQ-plot, and/or formally via a normality test such as the Shapiro-Wilk or Kolmogorov-Smirnov test. If, even after a transformation of your data (e.g., logarithmic transformation, square root, Box-Cox, etc.), the residuals still do not follow approximately a normal distribution, the Kruskal-Wallis test can be applied ( kruskal.test(variable ~ group, data = dat) in R). This non-parametric test, robust to non-normal distributions, has the same goal as the ANOVA—compare 3 or more groups—but it uses sample medians instead of sample means to compare groups.
- In case of large samples, normality is not required (this is a common misconception!). By the central limit theorem , sample means of large samples are often well-approximated by a normal distribution even if the data are not normally distributed ( Stevens 2013 ) . 3 It is therefore not required to test the normality assumption when the number of observations in each group/sample is large (usually \(n \ge 30\) ).
- Equality of variances : the variances of the different groups should be equal in the populations (an assumption called homogeneity of the variances, sometimes referred to as homoscedasticity, as opposed to heteroscedasticity if variances are different across groups). This assumption can be tested graphically (by comparing the dispersion in a boxplot or dotplot for instance), or more formally via Levene’s test ( leveneTest(variable ~ group) from the {car} package) or Bartlett’s test, among others. If the hypothesis of equal variances is rejected, another version of the ANOVA can be used: the Welch ANOVA ( oneway.test(variable ~ group, var.equal = FALSE) ). Note that the Welch ANOVA does not require homogeneity of the variances, but the distributions should still follow approximately a normal distribution. Note also that the Kruskal-Wallis test requires neither normality nor homoscedasticity of the variances. 4
- Outliers : there should be no significant outliers in the different groups. If outliers are present, you can:
- use the non-parametric version (i.e., the Kruskal-Wallis test)
- transform your data (logarithmic or Box-Cox transformation, among others)
- or remove them (be careful)
Choosing the appropriate test depending on whether assumptions are met may be confusing so here is a brief summary:
- Check that your observations are independent.
- If variances are equal, use ANOVA .
- If variances are not equal, use the Welch ANOVA .
- If normality is not assumed, use the Kruskal-Wallis test .
Now that we have seen the underlying assumptions of the ANOVA, we review them specifically for our dataset before applying the appropriate version of the test.
The dependent variable flipper_length_mm is a quantitative variable and the independent variable species is a qualitative one (with 3 levels corresponding to the 3 species). So we have a mix of the two types of variable and this assumption is met.
Independence of the observations is assumed as data have been collected from a randomly selected portion of the population and measurements within and between the 3 samples are not related.
The independence assumption is most often verified based on the design of the experiment and on the good control of experimental conditions, as it is the case here.
If you really want to test it more formally, you can, however, test it via a statistical test—the Durbin-Watson test (in R: durbinWatsonTest(res_lm) where res_lm is a linear model). The null hypothesis of this test specifies an autocorrelation coefficient = 0, while the alternative hypothesis specifies an autocorrelation coefficient \(\ne\) 0.
Since the smallest sample size per group (i.e., per species) is 68, we have large samples. Therefore, we do not need to check normality.
Usually, we would directly test the homogeneity of the variances without testing normality. However, for the sake of illustration, we act as if the sample sizes were small in order to illustrate what would need to be done in that case.
Remember that normality of residuals can be tested visually via a histogram and a QQ-plot , and/or formally via a normality test (Shapiro-Wilk test for instance).
Before checking the normality assumption, we first need to compute the ANOVA (more on that in this section ). We then save the results in res_aov :
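```r
res_aov <- aov(flipper_length_mm ~ species, data = dat)
```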
We can now check normality visually:
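```r
par(mfrow = c(1, 2))  # histogram and QQ-plot side by side

# histogram of the residuals
hist(res_aov$residuals)

# QQ-plot with confidence bands, from the {car} package
library(car)
qqPlot(res_aov$residuals, id = FALSE)  # id = FALSE suppresses point labeling
```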

From the histogram and QQ-plot above, we can already see that the normality assumption seems to be met. Indeed, the histogram roughly forms a bell curve, indicating that the residuals follow a normal distribution. Furthermore, the points in the QQ-plot roughly follow the straight line and most of them are within the confidence bands, also indicating that the residuals follow approximately a normal distribution.
Some researchers stop here and assume that normality is met, while others also test the assumption via a formal normality test . It is your choice to test it (i) only visually, (ii) only via a normality test, or (iii) both visually AND via a normality test. Bear in mind, however, the two following points:
- ANOVA is quite robust to small deviations from normality. This means that it is not an issue (from the perspective of the interpretation of the ANOVA results) if a small number of points deviates slightly from the normality,
- normality tests are sometimes quite conservative, meaning that the null hypothesis of normality may be rejected due to a limited deviation from normality. This is especially the case with large samples as power of the test increases with the sample size.
In practice, I tend to prefer the (i) visual approach only, but again, this is a matter of personal choice and also depends on the context of the analysis.
Still for the sake of illustration, we also now test the normality assumption via a normality test. You can use the Shapiro-Wilk test or the Kolmogorov-Smirnov test, among others.
Remember that the null and alternative hypothesis of these tests are:
- \(H_0\) : data come from a normal distribution
- \(H_1\) : data do not come from a normal distribution
In R, we can test normality of the residuals with the Shapiro-Wilk test thanks to the shapiro.test() function:
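```r
shapiro.test(res_aov$residuals)
```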
The p-value of the Shapiro-Wilk test on the residuals is larger than the usual significance level of \(\alpha = 5\%\), so we do not reject the hypothesis that the residuals follow a normal distribution ( p-value = 0.261).
This result is in line with the visual approach. In our case, the normality assumption is thus met both visually and formally.
Side note: Recall that the p-value is the probability of having observations as extreme as the ones we have observed in the sample(s) given that the null hypothesis is true. If the p-value \(< \alpha\) (indicating that it is not likely to observe the data we have in the sample given that the null hypothesis is true), the null hypothesis is rejected; otherwise the null hypothesis is not rejected. See more about p-value and significance level if you are unfamiliar with those important statistical concepts.
Remember that if the normality assumption were not met, some transformation(s) would need to be applied on the raw data in the hope that the residuals would better fit a normal distribution, or you would need to use the non-parametric version of the ANOVA—the Kruskal-Wallis test.
As pointed out by a reader (see comments at the very end of the article), the normality assumption can also be tested on the “raw” data (i.e., the observations) instead of the residuals. However, if you test the normality assumption on the raw data, it must be tested for each group separately as the ANOVA requires normality in each group .
Testing normality on all residuals or on the observations per group is equivalent, and will give similar results. Indeed, saying “The distribution of Y within each group is normally distributed” is the same as saying “The residuals are normally distributed”.
Remember that residuals are the distance between the actual value of Y and the mean value of Y for a specific value of X, so the grouping variable is induced in the computation of the residuals.
So in summary, in ANOVA you actually have two options for testing normality:
- Checking normality separately for each group on the “raw” data (Y values)
- Checking normality on all residuals (but not per group)
In practice, you will see that it is often easier to just use the residuals and check them all together, especially if you have many groups or few observations per group.
If you are still not convinced: remember that an ANOVA is a special case of a linear model. If your independent variable were a continuous variable (instead of a categorical variable), the only option left would be to check normality on the residuals, which is precisely what is done for testing normality in linear regression models.
Assuming residuals follow a normal distribution, it is now time to check whether the variances are equal across species or not. The result will have an impact on whether we use the ANOVA or the Welch ANOVA.
This can again be verified visually—via a boxplot or dotplot —or more formally via a statistical test (Levene’s test, among others).
Visually, we have:
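```r
# boxplot
boxplot(flipper_length_mm ~ species, data = dat)

# dotplot, via the {lattice} package
library(lattice)
dotplot(flipper_length_mm ~ species, data = dat)
```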

Both the boxplot and the dotplot show a similar variance for the different species. In the boxplot, this can be seen by the fact that the boxes and the whiskers have a comparable size for all species.
There are a couple of outliers as shown by the points outside the whiskers, but this does not change the fact that the dispersion is more or less the same between the different species.
In the dotplot, this can be seen by the fact that points for all 3 species have more or less the same range , a sign of the dispersion and thus the variance being similar.
As with the normality assumption, if you feel that the visual approach is not sufficient, you can formally test for equality of the variances with a Levene’s or Bartlett’s test. Note that Levene’s test is less sensitive to departures from normality than Bartlett’s test.
The null and alternative hypothesis for both tests are:
- \(H_0\) : variances are equal
- \(H_1\) : at least one variance is different
In R, the Levene’s test can be performed thanks to the leveneTest() function from the {car} package:
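```r
library(car)

leveneTest(flipper_length_mm ~ species, data = dat)
```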
The p -value being larger than the significance level of 0.05, we do not reject the null hypothesis, so we cannot reject the hypothesis that variances are equal between species ( p -value = 0.719).
This result is also in line with the visual approach, so the homogeneity of variances is met both visually and formally.
For your information, it is also possible to test the homogeneity of the variances and the normality of the residuals visually (and both at the same time) via the plot() function:
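For instance, placing the scale-location plot next to the normal QQ-plot (which diagnostics to draw is a matter of choice; these two cover homogeneity and normality):

```r
par(mfrow = c(1, 2))  # combine the two diagnostic plots

plot(res_aov, which = 3)  # spread of the residuals across fitted values (homogeneity)
plot(res_aov, which = 2)  # normal QQ-plot of the residuals (normality)
```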

The plot on the left-hand side shows that there is no evident relationship between residuals and fitted values (the mean of each group), so homogeneity of variances is assumed. If homogeneity of variances were violated, the red line would not be flat (horizontal).
The plot on the right-hand side shows that the residuals follow approximately a normal distribution, so normality is assumed. If normality were violated, the points would consistently deviate from the dashed line.
There are several techniques to detect outliers . In this article, we focus on the most simple one (yet very efficient)—the visual approach via a boxplot:
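```r
boxplot(flipper_length_mm ~ species, data = dat)
```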

There is one outlier in the group Adelie , as defined by the interquartile range criterion. This point is, however, not seen as a significant outlier so we can assume that the assumption of no significant outliers is met.
We showed that all assumptions of the ANOVA are met.
We can thus proceed to the implementation of the ANOVA in R, but first, let’s do some preliminary analyses to better understand the research question.
A good practice before actually performing the ANOVA in R is to visualize the data in relation to the research question. The best way to do so is to draw and compare boxplots of the quantitative variable flipper_length_mm for each species.
This can be done with the boxplot() function in base R (the same code as the visual check of equal variances):
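```r
boxplot(flipper_length_mm ~ species, data = dat)
```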

Or with the {ggplot2} package :
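```r
ggplot(dat) +
  aes(x = species, y = flipper_length_mm) +
  geom_boxplot()
```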

The boxplots above show that, at least for our sample, penguins of the species Gentoo seem to have the longest flippers, and penguins of the species Adelie the shortest.
Besides a boxplot for each species, it is also a good practice to compute some descriptive statistics such as the mean and standard deviation by species.
This can be done, for instance, with the aggregate() function:
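```r
aggregate(flipper_length_mm ~ species,
          data = dat,
          FUN = function(x) round(c(mean = mean(x), sd = sd(x)), digits = 2))
```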
or with the summarise() and group_by() functions from the {dplyr} package:
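```r
library(dplyr)

dat %>%
  group_by(species) %>%
  summarise(
    mean = round(mean(flipper_length_mm, na.rm = TRUE), digits = 2),
    sd = round(sd(flipper_length_mm, na.rm = TRUE), digits = 2)
  )
```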
The mean is also the lowest for Adelie and the highest for Gentoo. Boxplots and descriptive statistics are, however, not enough to conclude that flippers are significantly different in the 3 populations of penguins.
As you guessed by now, only the ANOVA can help us to make inference about the population given the sample at hand, and help us to answer the initial research question “Is the length of the flippers different between the 3 species of penguins?”.
ANOVA in R can be done in several ways, of which two are presented below:
- With the oneway.test() function:
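```r
oneway.test(flipper_length_mm ~ species,
            data = dat,
            var.equal = TRUE)  # assuming equal variances
```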
- With the summary() and aov() functions:
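```r
res_aov <- aov(flipper_length_mm ~ species, data = dat)  # saved earlier for the assumption checks
summary(res_aov)
```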
As you can see from the two outputs above, the test statistic ( F = in the first method and F value in the second one) and the p -value ( p-value in the first method and Pr(>F) in the second one) are exactly the same for both methods, which means that in case of equal variances, results and conclusions will be unchanged.
The advantage of the first method is that it is easy to switch from the ANOVA (used when variances are equal) to the Welch ANOVA (used when variances are un equal). This can be done by replacing var.equal = TRUE by var.equal = FALSE , as presented below:
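```r
oneway.test(flipper_length_mm ~ species,
            data = dat,
            var.equal = FALSE)  # Welch ANOVA: equal variances not assumed
```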
The advantage of the second method, however, is that:
- the full ANOVA table (with degrees of freedom, mean squares, etc.) is printed, which may be of interest in some (theoretical) cases
- results of the ANOVA ( res_aov ) can be saved for later use (especially useful for post-hoc tests )
Given that the p-value is smaller than 0.05, we reject the null hypothesis, so we reject the hypothesis that all means are equal. Therefore, we can conclude that at least one species is different from the others in terms of flipper length ( p-value < 2.2e-16).
(For the sake of illustration, if the p-value were larger than 0.05: we could not reject the null hypothesis that all means are equal, so we could not reject the hypothesis that the 3 considered species of penguins are equal in terms of flipper length.)
A nice and easy way to report results of an ANOVA in R is with the report() function from the {report} package:
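```r
library(report)

report(res_aov)
```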
As you can see, the function interprets the results for you and indicates a large and significant main effect of the species on the flipper length ( p -value < .001).
Note that the report() function can be used for other analyses. See more tips and tricks in R if you find this one useful.
If the null hypothesis is not rejected ( p -value \(\ge\) 0.05), it means that we do not reject the hypothesis that all groups are equal. The ANOVA more or less stops here.
Other types of analyses can be performed of course, but—given the data at hand—we could not prove that at least one group was different so we usually do not go further with the ANOVA.
On the contrary, if the null hypothesis is rejected (as it is in our case, since the p-value < 0.05), we have shown that at least one group is different. We can decide to stop here if we are only interested in testing whether all species are equal in terms of flipper length.
But most of the time, when we showed thanks to an ANOVA that at least one group is different, we are also interested in knowing which one(s) is(are) different. Results of an ANOVA, however, do NOT tell us which group(s) is(are) different from the others.
To test this, we need to use other types of test, referred to as post-hoc tests (in Latin, “after this”, so after obtaining statistically significant ANOVA results) or multiple pairwise-comparison tests. 5
This family of statistical tests is the topic of the following sections.
Post-hoc test
In order to see which group(s) is(are) different from the others, we need to compare groups 2 by 2 . In practice, since there are 3 species, we are going to compare species 2 by 2 as follows:
- Chinstrap versus Adelie
- Gentoo vs. Adelie
- Gentoo vs. Chinstrap
In theory, we could compare the species with 3 Student's t-tests, since a t-test is used precisely to compare 2 groups.
However, if several t-tests are performed, the issue of multiple testing (also referred to as multiplicity) arises. In short, when several statistical tests are performed, some will have p -values below \(\alpha\) purely by chance, even if all null hypotheses are in fact true.
To demonstrate the problem, consider our case where we have 3 hypotheses to test and a desired significance level of 0.05.
The probability of observing at least one significant result (at least one p -value < 0.05) just due to chance is:
\[\begin{equation} \begin{split} P(\text{at least 1 sig. result}) & = 1 - P(\text{no sig. results}) \\ & = 1 - (1 - 0.05)^3 \\ & = 0.142625 \end{split} \end{equation}\]
So, with as few as 3 tests being considered, we already have a 14.26% chance of observing at least one significant result, even if all of the tests are actually not significant.
And as the number of groups increases, the number of comparisons increases as well, so the probability of having a significant result simply due to chance keeps increasing.
For example, with 10 groups we need to make 45 comparisons and the probability of having at least one significant result by chance becomes \(1 - (1 - 0.05)^{45} = 90\%\) . So it is very likely to observe a significant result just by chance when comparing 10 groups, and when we have 14 groups or more we are almost certain (99%) to have a false positive!
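These probabilities are easy to verify in R (a small sketch; the function name is just for illustration):

```r
# Probability of at least one false positive among m independent tests at alpha
p_at_least_one <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

p_at_least_one(3)              # 0.142625 with 3 comparisons
p_at_least_one(choose(10, 2))  # ~0.90 with 45 comparisons (10 groups)
p_at_least_one(choose(14, 2))  # ~0.99 with 91 comparisons (14 groups)
```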
Post-hoc tests take into account that multiple tests are done and deal with the problem by adjusting \(\alpha\) in some way, so that the probability of observing at least one significant result due to chance remains below our desired significance level. 6
Post-hoc tests in R and their interpretation
Post-hoc tests are a family of statistical tests so there are several of them. The most common ones are:
- Tukey HSD , used to compare all groups to each other (so all possible comparisons of 2 groups).
- Dunnett , used to make comparisons with a reference group . For example, consider 2 treatment groups and one control group. If you only want to compare the 2 treatment groups with respect to the control group, and you do not want to compare the 2 treatment groups to each other, the Dunnett’s test is preferred.
- Bonferroni correction if one has a set of planned comparisons to do.
The Bonferroni correction is simple: you simply divide the desired global \(\alpha\) level by the number of comparisons.
In our example, we have 3 comparisons so if we want to keep a global \(\alpha = 0.05\) , we have \(\alpha' = \frac{0.05}{3} = 0.0167\) . We can then simply perform a Student’s t-test for each comparison, and compare the obtained \(p\) -values with this new \(\alpha'\) .
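Equivalently, R can apply the Bonferroni correction to the p -values directly (a sketch, again assuming the data frame dat):

```r
# Bonferroni-adjusted pairwise t-tests; compare the adjusted p-values to 0.05
pairwise.t.test(dat$flipper_length_mm, dat$species,
                p.adjust.method = "bonferroni")
```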
The other two post-hoc tests are presented in the next sections.
Note that variances are assumed to be equal for all three methods (unless you use the Welch’s t-test instead of the Student’s t-test with the Bonferroni correction). If variances are not equal, you can use the Games-Howell test, among others.
In our case, since there is no “reference” species and we are interested in comparing all species, we are going to use the Tukey HSD test.
In R, the Tukey HSD test is done as follows. This is where the second method to perform the ANOVA comes handy because the results ( res_aov ) are reused for the post-hoc test:
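A sketch with the {multcomp} package (the object name post_test is a naming assumption):

```r
library(multcomp)  # install.packages("multcomp") if needed

# Tukey HSD, reusing the saved ANOVA results
post_test <- glht(res_aov, linfct = mcp(species = "Tukey"))
summary(post_test)
```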
In the output of the Tukey HSD test, we are interested in the table displayed after Linear Hypotheses: , and more precisely, in the first and last column of the table. The first column shows the comparisons which have been made; the last column ( Pr(>|t|) ) shows the adjusted 7 p -values for each comparison (with the null hypothesis being the two groups are equal and the alternative hypothesis being the two groups are different).
It is these adjusted p -values that are used to test whether two groups are significantly different or not, and we can be confident that the entire set of comparisons collectively has an error rate of 0.05.
In our example, we tested:
- Chinstrap versus Adelie (line Chinstrap - Adelie == 0 )
- Gentoo vs. Adelie (line Gentoo - Adelie == 0 )
- Gentoo vs. Chinstrap (line Gentoo - Chinstrap == 0 )
All three adjusted p -values are smaller than 0.05, so we reject the null hypothesis for all comparisons, which means that all species are significantly different in terms of flipper length.
The results of the post-hoc test can be visualized with the plot() function:
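```r
par(mar = c(3, 8, 3, 3))  # widen the left margin so the group labels fit
plot(post_test)
```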

We see that the confidence intervals do not cross the zero line, which indicates that all groups are significantly different.
Note that the Tukey HSD test can also be done in R with the TukeyHSD() function:
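For instance (the object name res_tukey is a naming assumption):

```r
res_tukey <- TukeyHSD(res_aov)
res_tukey
```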
With this code, it is the column p adj (also the last column) which is of interest. Notice that the conclusions are the same as above: all species are significantly different in terms of flipper length.
The results can also be visualized with the plot() function:
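```r
plot(res_tukey)
```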

We have seen in this section that as the number of groups increases, the number of comparisons also increases. And as the number of comparisons increases , the post-hoc analysis must lower the individual significance level even further, which leads to lower statistical power (so a difference between group means in the population is less likely to be detected).
One method to mitigate this and increase the statistical power is by reducing the number of comparisons. This reduction allows the post-hoc procedure to use a larger individual error rate to achieve the desired global error rate.
While comparing all possible groups with a Tukey HSD test is a common approach, many studies have a control group and several treatment groups. For these studies, you may need to compare the treatment groups only to the control group, which reduces the number of comparisons.
Dunnett's test does precisely this: it compares all other groups to a group taken as the reference, but it does not compare all groups to each other.
So to recap:
- the Tukey HSD test compares all groups , but at the cost of less power
- the Dunnett's test only makes comparisons with a reference group , but with the benefit of more power
Now, again for the sake of illustration, consider that the species Adelie is the reference species and we are only interested in comparing the reference species against the other 2 species. In that scenario, we would use the Dunnett’s test.
In R, the Dunnett’s test is done as follows (the only difference with the code for the Tukey HSD test is in the line linfct = mcp(species = "Dunnett") ):
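```r
library(multcomp)
post_test <- glht(res_aov, linfct = mcp(species = "Dunnett"))
summary(post_test)
```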
The interpretation is the same as for the Tukey HSD test, except that in the Dunnett's test we only compare:
- Chinstrap versus Adelie
- Gentoo versus Adelie
Both adjusted p -values (displayed in the last column) are below 0.05, so we reject the null hypothesis for both comparisons.
This means that both the species Chinstrap and Gentoo are significantly different from the reference species Adelie in terms of flipper length. (Nothing can be said about the comparison between Chinstrap and Gentoo, though.)
Again, the results of the post-hoc test can be visualized with the plot() function:
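```r
par(mar = c(3, 8, 3, 3))
plot(post_test)
```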

We see that the confidence intervals do not cross the zero line, which indicates that both the species Gentoo and Chinstrap are significantly different from the reference species Adelie.
Note that in R, by default, the reference category for a factor variable is the first category in alphabetical order. This is the reason that, by default, the reference species is Adelie.
The reference category can be changed with the relevel() function (or with the {questionr} addin ). Considering that we want Gentoo as the reference category instead of Adelie:
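```r
dat$species <- relevel(dat$species, ref = "Gentoo")
levels(dat$species)  # Gentoo should now come first
```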
Gentoo now being the first category of the three, it is indeed considered as the reference level.
In order to perform the Dunnett’s test with the new reference we first need to rerun the ANOVA to take into account the new reference:
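A sketch (res_aov2 is a naming assumption):

```r
res_aov2 <- aov(flipper_length_mm ~ species, data = dat)
```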
We can then run the Dunnett's test with the new results of the ANOVA:
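```r
post_test2 <- glht(res_aov2, linfct = mcp(species = "Dunnett"))
summary(post_test2)
```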

From the results above, we conclude that the Adelie and Chinstrap species are significantly different from the Gentoo species in terms of flipper length (adjusted p -values < 1e-10).
Note that even if your study does not have a reference group which you can compare to the other groups, it is still often better to do multiple comparisons determined by your research questions than to do all-pairwise tests. By reducing the number of post-hoc comparisons to only what is necessary, and no more, you maximize statistical power. 8
For the interested readers, note that you can use other p -value adjustment methods with the pairwise.t.test() function:
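```r
pairwise.t.test(dat$flipper_length_mm, dat$species)  # Holm adjustment by default
```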
By default, the Holm method is applied but other methods exist. See ?p.adjust for all available options.
If you are interested in including results of ANOVA and post-hoc tests on the same plot (directly on the boxplots), here are two pieces of code which may be of interest to you.
The first one is edited by me based on the code found in this article :
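That code is not reproduced here; as a simpler stand-in, a hedged sketch with {ggpubr} produces a comparable combined display:

```r
library(ggpubr)
ggboxplot(dat, x = "species", y = "flipper_length_mm") +
  stat_compare_means(method = "anova")  # prints the ANOVA p-value on the plot
```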

And the second method is from the {ggstatsplot} package:
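A minimal sketch of that call (argument choices here are illustrative, not necessarily those used for the plot below):

```r
library(ggstatsplot)
ggbetweenstats(
  data = dat,
  x = species,
  y = flipper_length_mm,
  type = "parametric",      # parametric test, i.e., ANOVA
  pairwise.display = "all"  # display all pairwise comparisons
)
```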

As you can see on the above plot, boxplots by species are presented together with p -values of the ANOVA (after p = in the subtitle of the plot) and p -values of the post-hoc tests (above each comparison).
Besides the fact that these methods can be used to combine a visual representation and statistical results on the same plot, they also have the advantage that you can perform multiple ANOVA tests at once. See more information in this article .
In this article, we reviewed the goals and hypotheses of an ANOVA and the assumptions that need to be verified before trusting the results (namely, independence, normality and homogeneity of variances). We then showed how to do an ANOVA in R and how to interpret the results .
An article about ANOVA would not be complete without discussing post-hoc tests , in particular the Tukey HSD test (to compare all groups) and the Dunnett's test (to compare a reference group to all other groups).
Last but not least, we showed how to visualize the data and the results of the ANOVA and post-hoc tests in the same plot.
Thanks for reading.
As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.
(Note that this article is available for download on my Gumroad page .)
Note that it is called one-way or one-factor ANOVA because the means relate to the different modalities of a single independent variable, or factor. ↩︎
Residuals (denoted \(\epsilon\) ) are the differences between the observed values of the dependent variable ( \(y\) ) and the predicted values ( \(\hat{y}\) ). In the context of ANOVA, residuals correspond to the differences between the observed values and the mean of all values for that group. ↩︎
Stevens ( 2013 ) wrote, in p. 57, “Numerous studies have examined the effect of violations of assumptions in ANOVA, and an excellent summary of this literature has been provided by Glass, Peckham, and Sanders (1972). Their review indicates that non normality has only a slight effect on the type I error rate, even for very skewed or kurtotic distributions. For example, the actual \(\alpha\) s for some very non-normal populations were only .055 or .06: very minor deviations from the nominal level of .05. […] The basic reason is the Central Limit Theorem , which states that the sum of independent observations having any distribution whatsoever approaches a normal distribution as the number of observations increases. To be somewhat more specific, Bock (1975) notes,”even for distributions which depart markedly from normality, sums of 50 or more observations approximate to normality. For moderately non-normal distributions the approximation is good with as few as 10 to 20 observations” (p. 111). Now since the sums of independent observations approach normality rapidly, so do the means, and the sampling distribution of F is based on means. Thus the sampling distribution of F is only slightly affected, and therefore the critical values when sampling from normal and non-normal distributions will not differ by much. Lack of normality due to skewness also has only a slight effect on power (a few hundredths).” ↩︎
As long as you use the Kruskal-Wallis test to, in fine , compare groups, homoscedasticity is not required. If you wish to compare medians, the Kruskal-Wallis test requires homoscedasticity. See more information about the difference in this article . ↩︎
Note that, as discussed in the comments at the end of the article, post-hoc tests can under some circumstances be done directly (without an ANOVA). See the comments or Hsu ( 1996 ) for more details. ↩︎
Note that you could in principle apply the Bonferroni correction to all tests. For example, in the example above, with 3 tests and a global desired significance level of \(\alpha\) = 0.05, we would only reject a null hypothesis if the p -value is less than \(\frac{0.05}{3}\) = 0.0167. This method is, however, known to be quite conservative, leading to a potentially high rate of false negatives. ↩︎
The p -values are adjusted to keep the global significance level to the desired level. ↩︎
Thanks Michael Friendly for this suggestion. ↩︎
Comparing Multiple Means in R
The ANOVA test (or Analysis of Variance ) is used to compare the mean of multiple groups. The term ANOVA is a little misleading. Although the name of the technique refers to variances, the main goal of ANOVA is to investigate differences in means.
This chapter describes the different types of ANOVA for comparing independent groups , including:
- One-way ANOVA : an extension of the independent samples t-test for comparing the means in a situation where there are more than two groups. This is the simplest case of ANOVA test where the data are organized into several groups according to only one single grouping variable (also called factor variable). Other synonyms are: 1 way ANOVA , one-factor ANOVA and between-subject ANOVA .
- Two-way ANOVA : used to evaluate simultaneously the effect of two different grouping variables on a continuous outcome variable. Other synonyms are: two-factorial design , factorial ANOVA or two-way between-subjects ANOVA .
- Three-way ANOVA : used to evaluate simultaneously the effect of three different grouping variables on a continuous outcome variable. Other synonyms are: factorial ANOVA or three-way between-subjects ANOVA .
Note that, the independent grouping variables are also known as between-subjects factors .
The main goal of two-way and three-way ANOVA is, respectively, to evaluate if there is a statistically significant interaction effect between two and three between-subjects factors in explaining a continuous outcome variable.
You will learn how to:
- Compute and interpret the different types of ANOVA in R for comparing independent groups.
- Check ANOVA test assumptions
- Perform post-hoc tests , multiple pairwise comparisons between groups to identify which groups are different
- Visualize the data using box plots, add ANOVA and pairwise comparisons p-values to the plot
Assume that we have 3 groups to compare, as illustrated in the image below. The dashed line indicates the group mean. The figure shows the variation between the means of the groups (panel A) and the variation within each group (panel B), also known as residual variance .
The idea behind the ANOVA test is very simple: if the average variation between groups is large enough compared to the average variation within groups, then you could conclude that at least one group mean is not equal to the others.
Thus, it’s possible to evaluate whether the differences between the group means are significant by comparing the two variance estimates. This is why the method is called analysis of variance even though the main goal is to compare the group means.

Briefly, the mathematical procedure behind the ANOVA test is as follows:
- Compute the within-group variance , also known as residual variance . This tells us how different each participant is from their own group mean (see figure, panel B).
- Compute the variance between group means (see figure, panel A).
- Produce the F-statistic as the ratio of variance.between.groups/variance.within.groups .
Note that a lower F value (F < 1) indicates that there is no significant difference between the means of the samples being compared.
A higher ratio, however, implies that the variation among group means is large compared to the variation of the individual observations within each group.
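This ratio can be computed by hand; a minimal sketch on the built-in PlantGrowth data (used throughout this chapter) makes the procedure concrete:

```r
# Manual one-way ANOVA F-statistic on PlantGrowth (weight by group)
data("PlantGrowth")
y <- PlantGrowth$weight
g <- PlantGrowth$group
k <- nlevels(g)   # number of groups
n <- length(y)    # total sample size

group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

ss_between <- sum(group_n * (group_means - mean(y))^2)
ss_within  <- sum((y - group_means[g])^2)

ms_between <- ss_between / (k - 1)  # variance between group means
ms_within  <- ss_within / (n - k)   # residual (within-group) variance

ms_between / ms_within  # F-statistic; matches summary(aov(weight ~ group, PlantGrowth))
```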
The ANOVA test makes the following assumptions about the data:
- Independence of the observations . Each subject should belong to only one group. There is no relationship between the observations in each group. Having repeated measures for the same participants is not allowed.
- No significant outliers in any cell of the design
- Normality . The data for each design cell should be approximately normally distributed.
- Homogeneity of variances . The variance of the outcome variable should be equal in every cell of the design.
Before computing ANOVA test, you need to perform some preliminary tests to check if the assumptions are met.
Note that, if the above assumptions are not met, there is a non-parametric alternative ( Kruskal-Wallis test ) to the one-way ANOVA.
Unfortunately, there are no non-parametric alternatives to the two-way and the three-way ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.
If both tests lead you to the same conclusions, you might not choose to transform the outcome variable and carry on with the two-way/three-way ANOVA on the original data.
It’s also possible to perform robust ANOVA test using the WRS2 R package.
No matter your choice, you should report what you did in your results.
Make sure you have the following R packages:
- tidyverse for data manipulation and visualization
- ggpubr for creating easily publication ready plots
- rstatix provides pipe-friendly R functions for easy statistical analyses
- datarium : contains required data sets for this chapter
Load required R packages:
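```r
library(tidyverse)  # data manipulation and visualization
library(ggpubr)     # publication-ready plots
library(rstatix)    # pipe-friendly statistical tests
```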
Key R functions: anova_test() [rstatix package], wrapper around the function car::Anova() .
One-way ANOVA
Here, we’ll use the built-in R data set named PlantGrowth . It contains the weight of plants obtained under a control and two different treatment conditions.
Load and inspect the data by using the function sample_n_by() to display one random row by groups:
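```r
data("PlantGrowth")
set.seed(123)
PlantGrowth %>% sample_n_by(group, size = 1)
```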
Show the levels of the grouping variable:
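```r
levels(PlantGrowth$group)
```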
If the levels are not automatically in the correct order, re-order them as follows:
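```r
PlantGrowth <- PlantGrowth %>%
  reorder_levels(group, order = c("ctrl", "trt1", "trt2"))
```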
The one-way ANOVA can be used to determine whether the mean plant growth is significantly different between the three conditions.
Compute some summary statistics (count, mean and sd) of the variable weight organized by groups:
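```r
PlantGrowth %>%
  group_by(group) %>%
  get_summary_stats(weight, type = "mean_sd")
```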
Create a box plot of weight by group :
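```r
ggboxplot(PlantGrowth, x = "group", y = "weight")
```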

Outliers can be easily identified using box plot methods, implemented in the R function identify_outliers() [rstatix package].
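```r
PlantGrowth %>%
  group_by(group) %>%
  identify_outliers(weight)
```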
There were no extreme outliers.
Note that, in the situation where you have extreme outliers, this can be due to: 1) data entry errors, 2) measurement errors, or 3) unusual values.
You can include the outlier in the analysis anyway if you do not believe the result will be substantially affected. This can be evaluated by comparing the result of the ANOVA test with and without the outlier.
It’s also possible to keep the outliers in the data and perform robust ANOVA test using the WRS2 package.

Normality assumption
The normality assumption can be checked by using one of the following two approaches:
- Analyzing the ANOVA model residuals to check the normality for all groups together. This approach is easier and it’s very handy when you have many groups or if there are few data points per group.
- Check normality for each group separately . This approach might be used when you have only a few groups and many data points per group.
In this section, we’ll show you how to proceed for both option 1 and 2.
Check normality assumption by analyzing the model residuals . QQ plot and Shapiro-Wilk test of normality are used. QQ plot draws the correlation between a given data and the normal distribution.
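A sketch of this check (the object name model is a convention reused below):

```r
# Build the linear model
model <- lm(weight ~ group, data = PlantGrowth)
# QQ plot of the residuals
ggqqplot(residuals(model))
# Shapiro-Wilk test of normality on the residuals
shapiro_test(residuals(model))
```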

In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.13), so we can assume normality.
Check normality assumption by groups . Computing Shapiro-Wilk test for each group level. If the data is normally distributed, the p-value should be greater than 0.05.
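```r
PlantGrowth %>%
  group_by(group) %>%
  shapiro_test(weight)
```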
The weights were normally distributed (p > 0.05) for each group, as assessed by the Shapiro-Wilk test of normality.
Note that, if your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
QQ plot draws the correlation between a given data and the normal distribution. Create QQ plots for each group level:
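```r
ggqqplot(PlantGrowth, "weight", facet.by = "group")
```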

All the points fall approximately along the reference line, for each cell. So we can assume normality of the data.
If you have doubt about the normality of the data, you can use the Kruskal-Wallis test , which is the non-parametric alternative to one-way ANOVA test.
Homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
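```r
plot(model, 1)  # residuals vs fitted values
```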

In the plot above, there is no evident relationship between residuals and fitted values (the mean of each group), which is good. So, we can assume the homogeneity of variances.
- It’s also possible to use the Levene’s test to check the homogeneity of variances :
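```r
PlantGrowth %>% levene_test(weight ~ group)
```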
From the output above, we can see that the p-value is > 0.05, which is not significant. This means that there is no significant difference between the variances across groups. Therefore, we can assume the homogeneity of variances in the different treatment groups.
In a situation where the homogeneity of variance assumption is not met, you can compute the Welch one-way ANOVA test using the function welch_anova_test() [rstatix package]. This test does not require the assumption of equal variances.
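Hedged sketches of both the Welch alternative and the standard computation (the object name res.aov is a convention reused below):

```r
# Welch one-way ANOVA (no equal-variance assumption)
PlantGrowth %>% welch_anova_test(weight ~ group)

# Standard one-way ANOVA (equal variances assumed), saved for reporting
res.aov <- PlantGrowth %>% anova_test(weight ~ group)
res.aov
```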
In the table above, the column ges corresponds to the generalized eta squared (effect size). It measures the proportion of the variability in the outcome variable (here, plant weight ) that can be explained in terms of the predictor (here, treatment group ). An effect size of 0.26 (26%) means that 26% of the variation in weight can be accounted for by the treatment conditions.
From the above ANOVA table, it can be seen that there are significant differences between groups, which are highlighted with "*": F(2, 27) = 4.85, p = 0.016, eta2[g] = 0.26.
- F indicates that we are comparing to an F-distribution (F-test); (2, 27) indicates the degrees of freedom in the numerator (DFn) and the denominator (DFd), respectively; 4.85 indicates the obtained F-statistic value
- p specifies the p-value
- ges is the generalized effect size (amount of variability due to the factor)
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. Key R function: tukey_hsd() [rstatix].
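```r
pwc <- PlantGrowth %>% tukey_hsd(weight ~ group)
pwc
```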
The output contains the following columns:
- estimate : estimate of the difference between means of the two groups
- conf.low , conf.high : the lower and the upper end point of the confidence interval at 95% (default)
- p.adj : p-value after adjustment for the multiple comparisons.
It can be seen from the output, that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).
We could report the results of one-way ANOVA as follow:
A one-way ANOVA was performed to evaluate if the plant growth was different for the 3 different treatment groups: ctrl (n = 10), trt1 (n = 10) and trt2 (n = 10).
Data is presented as mean +/- standard deviation. Plant growth was statistically significantly different between different treatment groups, F(2, 27) = 4.85, p = 0.016, generalized eta squared = 0.26.
Plant growth decreased in the trt1 group (4.66 +/- 0.79) compared to the ctrl group (5.03 +/- 0.58). It increased in the trt2 group (5.53 +/- 0.44) compared to the trt1 and ctrl groups.
Tukey post-hoc analyses revealed that the increase from trt1 to trt2 (0.87, 95% CI (0.17 to 1.56)) was statistically significant (p = 0.012), but no other group differences were statistically significant.

The classical one-way ANOVA test requires an assumption of equal variances for all groups. In our example, the homogeneity of variance assumption turned out to be fine: the Levene test is not significant.
How do we save our ANOVA test, in a situation where the homogeneity of variance assumption is violated?
- The Welch one-way test is an alternative to the standard one-way ANOVA in the situation where the homogeneity of variance can’t be assumed (i.e., Levene test is significant).
- In this case, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
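```r
# Games-Howell post-hoc test (no equal-variance assumption)
PlantGrowth %>% games_howell_test(weight ~ group)
```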

You can also perform pairwise comparisons using pairwise t-test with no assumption of equal variances:
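```r
PlantGrowth %>%
  pairwise_t_test(weight ~ group, pool.sd = FALSE)
```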
Two-way ANOVA
We’ll use the jobsatisfaction dataset [datarium package], which contains the job satisfaction score organized by gender and education levels.
In this study, a researcher wants to evaluate if there is a significant two-way interaction between gender and education_level in explaining the job satisfaction score. An interaction effect occurs when the effect of one independent variable on the outcome variable depends on the level of the other independent variable. If an interaction effect does not exist, main effects can be reported.
Load the data and inspect one random row by groups:
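```r
data("jobsatisfaction", package = "datarium")
set.seed(123)
jobsatisfaction %>% sample_n_by(gender, education_level, size = 1)
```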
In this example, the effect of "education_level" is our focal variable , that is, our primary concern. It is thought that the effect of "education_level" will depend on one other factor, "gender", which is called a moderator variable .
Compute the mean and the SD (standard deviation) of the score by groups:
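```r
jobsatisfaction %>%
  group_by(gender, education_level) %>%
  get_summary_stats(score, type = "mean_sd")
```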
Create a box plot of the score by gender levels, colored by education levels:
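```r
bxp <- ggboxplot(jobsatisfaction, x = "gender", y = "score",
                 color = "education_level", palette = "jco")
bxp
```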

Identify outliers in each cell design:
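```r
jobsatisfaction %>%
  group_by(gender, education_level) %>%
  identify_outliers(score)
```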
Check normality assumption by analyzing the model residuals . QQ plot and Shapiro-Wilk test of normality are used.
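```r
model <- lm(score ~ gender * education_level, data = jobsatisfaction)
ggqqplot(residuals(model))
shapiro_test(residuals(model))
```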

Check normality assumption by groups . Computing Shapiro-Wilk test for each combinations of factor levels:
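```r
jobsatisfaction %>%
  group_by(gender, education_level) %>%
  shapiro_test(score)
```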
The scores were normally distributed (p > 0.05) in each cell, as assessed by the Shapiro-Wilk test of normality.
Create QQ plots for each cell of design:
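```r
ggqqplot(jobsatisfaction, "score", ggtheme = theme_bw()) +
  facet_grid(gender ~ education_level)
```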

This can be checked using the Levene’s test:
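```r
jobsatisfaction %>% levene_test(score ~ gender * education_level)
```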
The Levene’s test is not significant (p > 0.05). Therefore, we can assume the homogeneity of variances in the different groups.
In the R code below, the asterisk represents the interaction effect and the main effect of each variable (and all lower-order interactions).
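```r
res.aov <- jobsatisfaction %>%
  anova_test(score ~ gender * education_level)
res.aov
```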
There was a statistically significant interaction between gender and level of education for job satisfaction score, F(2, 52) = 7.34, p = 0.002 .
A significant two-way interaction indicates that the impact that one factor (e.g., education_level) has on the outcome variable (e.g., job satisfaction score) depends on the level of the other factor (e.g., gender) (and vice versa). So, you can decompose a significant two-way interaction into:
- Simple main effect : run one-way model of the first variable at each level of the second variable,
- Simple pairwise comparisons : if the simple main effect is significant, run multiple pairwise comparisons to determine which groups are different.
For a non-significant two-way interaction , you need to determine whether you have any statistically significant main effects from the ANOVA output. A significant main effect can be followed up by pairwise comparisons between groups.
Procedure for significant two-way interaction
Compute simple main effects.
In our example, you could therefore investigate the effect of education_level at every level of gender or investigate the effect of gender at every level of the variable education_level .
Here, we’ll run a one-way ANOVA of education_level at each levels of gender .
Note that, if you have met the assumptions of the two-way ANOVA (e.g., homogeneity of variances), it is better to use the overall error term (from the two-way ANOVA) as input in the one-way ANOVA model. This will make it easier to detect any statistically significant differences if they exist (Keppel & Wickens, 2004; Maxwell & Delaney, 2004).
When you have failed the homogeneity of variances assumptions, you might consider running separate one-way ANOVAs with separate error terms.
In the R code below, we’ll group the data by gender and analyze the simple main effects of education level on Job Satisfaction score. The argument error is used to specify the ANOVA model from which the pooled error sum of squares and degrees of freedom are to be calculated.
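```r
# Overall model, supplying the pooled error term
model <- lm(score ~ gender * education_level, data = jobsatisfaction)
jobsatisfaction %>%
  group_by(gender) %>%
  anova_test(score ~ education_level, error = model)
```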
The simple main effect of “education_level” on job satisfaction score was statistically significant for both male and female (p < 0.0001).
In other words, there is a statistically significant difference in mean job satisfaction score between males educated to either school, college or university level, F(2, 52) = 132, p < 0.0001. The same conclusion holds true for females , F(2, 52) = 62.8, p < 0.0001.
Note that statistical significance of the simple main effect analyses was accepted at a Bonferroni-adjusted alpha level of 0.025. This corresponds to the level at which you declare statistical significance (i.e., p < 0.05) divided by the number of simple main effects you are computing (i.e., 2).
Compute pairwise comparisons
A statistically significant simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. We’ll now perform multiple pairwise comparisons between the different education_level groups by gender .
You can run and interpret all possible pairwise comparisons using a Bonferroni adjustment. This can be easily done using the function emmeans_test() [rstatix package], a wrapper around the emmeans package, which needs to be installed. Emmeans stands for estimated marginal means (aka least square means or adjusted means).
Compare the score of the different education levels by gender levels:
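```r
pwc <- jobsatisfaction %>%
  group_by(gender) %>%
  emmeans_test(score ~ education_level, p.adjust.method = "bonferroni")
pwc
```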
There was a significant difference of job satisfaction score between all groups for both males and females (p < 0.05).
Procedure for non-significant two-way interaction
Inspect main effects.
If the two-way interaction is not statistically significant, you need to consult the main effect for each of the two variables (gender and education_level) in the ANOVA output.
In our example, there was a statistically significant main effect of education_level (F(2, 52) = 187.89, p < 0.0001) on the job satisfaction score. However, the main effect of gender was not significant, F(1, 52) = 0.74, p = 0.39.
Perform pairwise comparisons between education level groups to determine which groups are significantly different. Bonferroni adjustment is applied. This analysis can be done using the function pairwise_t_test() [rstatix package] or the function emmeans_test() .
- Pairwise t-test:
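```r
jobsatisfaction %>%
  pairwise_t_test(score ~ education_level, p.adjust.method = "bonferroni")
```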
All pairwise differences were statistically significant (p < 0.05).
- Pairwise comparisons using Emmeans test. You need to specify the overall model, from which the overall degrees of freedom are to be calculated. This will make it easier to detect any statistically significant differences if they exist.
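```r
model <- lm(score ~ gender * education_level, data = jobsatisfaction)
jobsatisfaction %>%
  emmeans_test(score ~ education_level,
               p.adjust.method = "bonferroni",
               model = model)
```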
A two-way ANOVA was conducted to examine the effects of gender and education level on job satisfaction score.
Residual analysis was performed to test for the assumptions of the two-way ANOVA. Outliers were assessed by box plot method, normality was assessed using Shapiro-Wilk’s normality test and homogeneity of variances was assessed by Levene’s test.
There were no extreme outliers, residuals were normally distributed (p > 0.05) and there was homogeneity of variances (p > 0.05).
There was a statistically significant interaction between gender and education level on job satisfaction score, F(2, 52) = 7.33, p = 0.0016, eta2[g] = 0.22 .
Consequently, an analysis of simple main effects for education level was performed with statistical significance receiving a Bonferroni adjustment. There was a statistically significant difference in mean “job satisfaction” scores for both males ( F(2, 52) = 132, p < 0.0001 ) and females ( F(2, 52) = 62.8, p < 0.0001 ) educated to either school, college or university level.
All pairwise comparisons were analyzed between the different education_level groups organized by gender . There was a significant difference of Job Satisfaction score between all groups for both males and females (p < 0.05).

Three-Way ANOVA
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.
We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Load the data and inspect one random row by group combinations:
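```r
data("headache", package = "datarium")
set.seed(123)
headache %>% sample_n_by(gender, risk, treatment, size = 1)
```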
In this example, the effect of the treatment types is our focal variable , that is our primary concern. It is thought that the effect of treatments will depend on two other factors, “gender” and “risk” level of migraine, which are called moderator variables .
Compute the mean and the standard deviation (SD) of pain_score by groups:
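```r
headache %>%
  group_by(gender, risk, treatment) %>%
  get_summary_stats(pain_score, type = "mean_sd")
```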
Create a box plot of pain_score by treatment , color lines by risk groups and facet the plot by gender:
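```r
bxp <- ggboxplot(headache, x = "treatment", y = "pain_score",
                 color = "risk", palette = "jco", facet.by = "gender")
bxp
```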

Identify outliers by groups:
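```r
headache %>%
  group_by(gender, risk, treatment) %>%
  identify_outliers(pain_score)
```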
It can be seen that, the data contain one extreme outlier (id = 57, female at high risk of migraine taking drug X)
Outliers can be due to: 1) data entry errors, 2) measurement errors or 3) unusual values.
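As in the previous sections, normality can be checked on the residuals of the full model (a sketch; the object name model is a convention reused below):

```r
model <- lm(pain_score ~ gender * risk * treatment, data = headache)
ggqqplot(residuals(model))
shapiro_test(residuals(model))
```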

In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.4), so we can assume normality.
Check normality assumption by groups . Computing Shapiro-Wilk test for each combinations of factor levels.
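```r
headache %>%
  group_by(gender, risk, treatment) %>%
  shapiro_test(pain_score)
```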
The pain scores were normally distributed (p > 0.05) except for one group (female at high risk of migraine taking drug X, p = 0.0086), as assessed by Shapiro-Wilk’s test of normality.
Create QQ plot for each cell of design:
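```r
ggqqplot(headache, "pain_score", ggtheme = theme_bw()) +
  facet_grid(gender + risk ~ treatment, labeller = "label_both")
```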

All the points fall approximately along the reference line, except for one group (female at high risk of migraine taking drug X), where we already identified an extreme outlier.
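The three-way model itself can be fitted with anova_test() (a sketch, reusing the res.aov naming convention):

```r
res.aov <- headache %>%
  anova_test(pain_score ~ gender * risk * treatment)
res.aov
```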
There was a statistically significant three-way interaction between gender, risk and treatment, F(2, 60) = 7.41, p = 0.001 .
If there is a significant three-way interaction effect , you can decompose it into:
- Simple two-way interaction : run two-way interaction at each level of third variable,
- Simple simple main effect : run one-way model at each level of second variable, and
- Simple simple pairwise comparisons : run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction , you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
In this section we’ll describe the procedure for a significant three-way interaction.
Compute simple two-way interactions
You are free to decide which two variables will form the simple two-way interactions and which variable will act as the third (moderator) variable. In our example, we want to evaluate the effect of risk*treatment interaction on pain_score at each level of gender.
Note that, when doing the two-way interaction analysis, it’s better to use the overall error term (or residuals) from the three-way ANOVA result, obtained previously using the whole dataset. This is particularly recommended when the homogeneity of variance assumption is met (Keppel & Wickens, 2004).
The use of group-specific error terms is "safer" in the presence of violations of the assumptions. The pooled error terms, however, have greater power – particularly with small sample sizes – but are susceptible to problems if there are any violations of the assumptions.
In the R code below, we’ll group the data by gender and fit the treatment*risk two-way interaction. The argument error is used to specify the three-way ANOVA model from which the pooled error sum of squares and degrees of freedom are to be calculated.
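```r
# Pooled error term from the full three-way model
model <- lm(pain_score ~ gender * risk * treatment, data = headache)
headache %>%
  group_by(gender) %>%
  anova_test(pain_score ~ risk * treatment, error = model)
```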
There was a statistically significant simple two-way interaction between risk and treatment ( risk:treatment ) for males, F(2, 60) = 5.25, p = 0.008, but not for females, F(2, 60) = 2.87, p = 0.065.
For males, this result suggests that the effect of treatment on “pain_score” depends on one’s “risk” of migraine. In other words, the risk moderates the effect of the type of treatment on pain_score.
Note that statistical significance of a simple two-way interaction was accepted at a Bonferroni-adjusted alpha level of 0.025. This corresponds to the level at which you declare statistical significance (i.e., p < 0.05) divided by the number of simple two-way interactions you are computing (i.e., 2).
Compute simple simple main effects
A statistically significant simple two-way interaction can be followed up with simple simple main effects . In our example, you could therefore investigate the effect of treatment on pain_score at every level of risk or investigate the effect of risk at every level of treatment .
You will only need to do this for the simple two-way interaction for “males” as this was the only simple two-way interaction that was statistically significant. The error term again comes from the three-way ANOVA.
Group the data by gender and risk and analyze the simple simple main effects of treatment on pain_score:
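```r
treatment.effect <- headache %>%
  group_by(gender, risk) %>%
  anova_test(pain_score ~ treatment, error = model)  # model from the sketch above
treatment.effect %>% filter(gender == "male")
```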
In the table above, we only need the results for the simple simple main effects of treatment for: (1) “males” at “low” risk; and (2) “males” at “high” risk.
Statistical significance was accepted at a Bonferroni-adjusted alpha level of 0.025, that is, 0.05 divided by the number of simple simple main effects you are computing (i.e., 2).
There was a statistically significant simple simple main effect of treatment for males at high risk of migraine, F(2, 60) = 14.8, p < 0.0001, but not for males at low risk of migraine, F(2, 60) = 0.66, p = 0.521.
This analysis indicates that, the type of treatment taken has a statistically significant effect on pain_score in males who are at high risk.
In other words, the mean pain_score in the treatment X, Y and Z groups was statistically significantly different for males who are at high risk, but not for males at low risk.
Compute simple simple comparisons
A statistically significant simple simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. This can be easily done using the function emmeans_test() [rstatix package] described in the previous section.
Compare the different treatments by gender and risk variables:
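```r
pwc <- headache %>%
  group_by(gender, risk) %>%
  emmeans_test(pain_score ~ treatment, p.adjust.method = "bonferroni")
pwc %>% filter(gender == "male", risk == "high")
```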
In the pairwise comparisons table above, we are interested only in the simple simple comparisons for males at a high risk of a migraine headache. In our example, there are three possible combinations of group differences.
For male at high risk, there was a statistically significant mean difference between treatment X and treatment Y of 10.4 (p.adj < 0.001), and between treatment X and treatment Z of 13.1 (p.adj < 0.0001).
However, the difference between treatment Y and treatment Z (2.66) was not statistically significant, p.adj = 0.897.
A three-way ANOVA was conducted to determine the effects of gender, risk and treatment on migraine headache episode pain_score .
Residual analysis was performed to test for the assumptions of the three-way ANOVA. Normality was assessed using Shapiro-Wilk’s normality test and homogeneity of variances was assessed by Levene’s test.
Residuals were normally distributed (p > 0.05) and there was homogeneity of variances (p > 0.05).
Statistical significance was accepted at the p < 0.025 level for simple two-way interactions and simple simple main effects. There was a statistically significant simple two-way interaction between risk and treatment for males, F(2, 60) = 5.2, p = 0.008, but not for females, F(2, 60) = 2.8, p = 0.065.
All simple simple pairwise comparisons, between the different treatment groups, were run for males at high risk of migraine with a Bonferroni adjustment applied.
There was a statistically significant mean difference between treatment X and treatment Y. However, the difference between treatment Y and treatment Z, was not statistically significant.

This article describes how to compute and interpret ANOVA in R. We also explain the assumptions made by ANOVA tests and provide practical examples of R code to check whether the test assumptions are met.
Comments (28)
First, I LOVE your site – it is incredibly informative and easy to follow. Thanks!!
I am trying to calculate the simple mean error as in the 2-way anova above, though I would like to do it for many variables at once, so I am trying to use the “map2” function of the purrr package. I have been unsuccessful and cannot figure out how to make this work.
Here is my data:

```
df # A tibble: 6 x 15
#> id edge trt nl lm md c mgg mgcm p sp ap
#> 1 1 S C 1.80 -1.13 1.75 -0.303 2.94 1.02 1.60 1166. 1.10
#> 2 2 S T NA NA NA NA NA NA NA NA NA
#> 3 3 D C 1.60 -1.55 1.17 -0.341 1.85 0.787 1.41 663. 0.899
#> 4 4 D T 1.34 -2.22 0.962 -0.332 1.65 0.750 0.674 63.6 0.543
#> 5 5 S C 1.80 -0.165 2.16 -0.285 3.14 1.09 2.21 496. 1.16
#> 6 6 S T 2.14 0.250 2.44 -0.219 3.36 1.16 2.31 911. 1.20
#> # … with 3 more variables: la, lacm, lacmd
```

I've successfully run all the two-way anovas I need, but the mapped call fails:

```
models_1 % group_by(edge) %>% map2(ind.trans[4:15], models_1, ~ anova_test(.x, error = .y, type = 3))
#> Error in df %>% group_by(edge) %>% map2(ind.trans[4:15], models_1, ~anova_test(.x, : could not find function "%>%"
```

I receive the following error, even though the number of columns I'm calling and the number of models match: Error: Mapped vectors must have consistent lengths: `.x` has length 15, `.y` has length 12
Thanks in advance! Created on 2020-05-19 by the [reprex package]( https://reprex.tidyverse.org ) (v0.3.0)
Would you please provide a reproducible example as described at How to Include Reproducible R Script Examples in Datanovia Comments . You will need the pubr R package, which can be installed using devtools::install_github(“kassambara/pubr”)
Hi, About the post hoc test, I would like to know what is the difference between applying a pairwise.t.test and TukeyHSD test if I got a significant result from a Two Way ANOVA?
Thank you! It was very-very helpful!
I appreciate your positive feedback, thank you!
Hi, about your simple main effect and simple two-way interaction: you noted that they need a Bonferroni-adjusted p-value, yet I have noticed that many people do not do the adjustment. Is it necessary?
I like your site. It’s helpful! Thanks. I am doing a three-way ANOVA, and I found that although you noted that it needs a Bonferroni-adjusted alpha level for simple two-way interaction and simple simple main effect in three-way ANOVA, many articles did not do the adjustment. I wonder if it is necessary to do the adjustment, or it’s ok if people don’t do it?
Hello, I have trouble adding the p-value of my post-hoc test into a box plot. This message appears after running the code of the plot: Warning message: Computation failed in `stat_bracket()`: Internal error in `vec_proxy_assign_opts()`: `proxy` of type `double` incompatible with `value` proxy of type `integer`.
this is my code:
hope you can help me 🙂
Hi, you have some errors in your code. It should look as follows:
Thank you for your help; however, the message is still appearing with your code.
Warning message: Computation failed in `stat_bracket()`: Internal error in `vec_proxy_assign_opts()`: `proxy` of type `double` incompatible with `value` proxy of type `integer`.
Does this error appears, when using the demo data in this tutorial? Would you please send your data, so that I can check?
Yes, it also happens with the demo data tutorial, so something must be wrong with my R. Should I download it again?
Probably the problem is with the operating system. I received the same message on a 64 bit system but the calculations were correct on a 32 bit system. Please check it at your place
Great that it works! The issue should be something related with your R sessions.
First of all, thanks for the explanation. It’s very clear and easy to follow. I am concerned about the normality of the residuals assumption. I’m working with a 3×2 factorial design, with repeated measures in one of the two factors. In each of the six combinations, I have 8 -10 measures.
Do I have to check normality for each combination of factors or is it just one test with all the residuals?
Thanks very much.
Hello, thank you so much for this.
I have a simple question. What does “%>%” mean?
In R, %>% is the pipe operator. It takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a “THEN”.
Take, for example, the following code chunk:
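A sketch of the kind of chunk being described (the filter threshold is purely illustrative):

```r
iris %>%
  subset(Sepal.Length > 5) %>%
  print()
```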
The code chunk above will translate to something like “you take the Iris data, then you subset the data and then you print the data”.
I can’t get the following code to work: headache %>% group_by(gender, risk, treatment) %>% shapiro_test(pain_score)
When I do this with my data I get an error: Error: Problem with `mutate()` input `data`. x Problem with `mutate()` input `data`. x sample size must be between 3 and 5000 ℹ Input `data` is `map(.data$data, .f, …)`. ℹ Input `data` is `map(.data$data, .f, …)`.
However, I can run the following format: PlantGrowth %>% group_by(group) %>% shapiro_test(weight)
It appears that it is the group_by function that is failing. When I use either of my variables in the group_by() the shapiro_test() works. However, put together it always fails.
Also, I could imagine the work around: running the shapiro_test() two times, one for each variable.
e.g. Something like this twice, changing the group_by() variable each time. PlantGrowth %>% group_by(group) %>% shapiro_test(weight)
Is that appropriate or will that ignore the influence of one variable on each test?
I also notice my other tibbles are shorter and appear to be only comparing the levels in one variable instead of two levels in both variables. Probably a related problem.
Dear Alboukadel Kassambara, Thanks a lot for your great job, this tutorial about ANOVA is the best I have ever seen, and I can't stop using your R packages.
I have a question for you or anybody in this kind community: In my data set, I'm doing a 3-way ANOVA, and then pairwise comparisons, with Bonferroni correction. My concern is about the N chosen by the algorithm to compute the correction. I have a lot of subsets (4x3x5), and even if my df is about 100 rows, I think that my pwc (and so my Bonferroni correction) should be computed only within each subset (about 5 rows) independently. The PWC are OK because of the use of the group_by function, but the Bonferroni correction uses the overall N instead of the N of the subset (leading to adjusted p-values divided by 100 instead of by 4-5). Am I wrong in my interpretation of the Bonferroni correction? If the idea seems right, does anybody have an idea on how to resolve that? (I mean, not strictly dividing all p-values manually myself, recreating p.signif symbols, etc.) Thanks a lot to everybody! Gaetan.
Dear Mr. Kassambara,
thank you so much for this amazing tutorial! It is the best I ever worked through, made so many things much clearer for me now and really helped me out with my thesis about animal movement data.
Kind regards
Thank you for your detailed descriptions. I have a question about how to switch the comparisons. For example, in the two-way ANOVA example, how would you write the model to compare gender at each education_level? Instead of comparing education_level of each gender as in the example.
Hi, thanks for the example. I am trying a three-way ANOVA but am struggling with the code. This command gives me an error and cannot generate the ANOVA table:

```
treatment.effect % group_by(gender, risk) %>% anova_test(pain_score ~ treatment, error = model)
treatment.effect %>% filter(gender == "male")
```

Error: Input must be a vector, not a object.

Backtrace: 1. treatment.effect %>% filter(gender == "male") 11. vctrs:::stop_scalar_type(…) 12. vctrs:::stop_vctrs(msg, "vctrs_error_scalar_type", actual = x)

Does anyone know how I can fix that?
Does anyone know how I can fix that?
Hi, thanks for a very helpful article. I have a question.
When I run the code for the three-way ANOVA, the simple simple main effects with this script:

```
treatment.effect % group_by(gender, risk) %>% anova_test(pain_score ~ treatment, error = model)
treatment.effect %>% filter(gender == "male")
```

it always shows this error notification: Error: Input must be a vector, not a object.
Do you know how to fix it?
Hello, Everything worked so fine until the end with the final plot. The simple bxp is the same as showed but when I add the:
The simple bxp plots males first then female but after adding the stat_pvalue_manual() and labs() is inverted and pvalue lines are misplaced.
Can anyone explain why I get this error message when I run emmeans-test ; Error in contrast.emmGrid(res.emmeans, by = grouping.vars, method = method, : Nonconforming number of contrast coefficients
Do we not need to check if the data is balanced?
Dear Alboukadel Kassambara, I have a question, as we do Levene test when the assumption of normality doesn’t meet but in two way ANOVA the normality assumption is getting met in your results, if am not wrong you should have done the Bartlett test and according to the Bartlett p-value comes 0.02 which means it’s violating the assumption of homogeneity but Levene shows 0.06 value which means it meets the assumption, I’m a little bit confused why did you use Levene instead of Bartlett please clarify it. Thanks
One-Way ANOVA Test in R
What is one-way ANOVA test?
ANOVA test hypotheses:
- Null hypothesis: the means of the different groups are the same
- Alternative hypothesis: At least one sample mean is not equal to the others.
Note that, if you have only two groups, you can use the t-test . In this case, the F-test and the t-test are equivalent.
Here we describe the requirements for the ANOVA test . The ANOVA test can be applied only when:
- The observations are obtained independently and randomly from the population defined by the factor levels
- The data of each factor level are normally distributed.
- These normal populations have a common variance. ( Levene’s test can be used to check this.)
Assume that we have 3 groups (A, B, C) to compare:
- Compute the common variance, which is called variance within samples ( \(S^2_{within}\) ) or residual variance .
- Compute the mean of each group
- Compute the variance between sample means ( \(S^2_{between}\) )
- Produce F-statistic as the ratio of \(S^2_{between}/S^2_{within}\) .
Note that a lower ratio (ratio < 1) indicates that there is no significant difference between the means of the samples being compared, while a higher ratio implies that the variation among group means is significant.
Visualize your data and compute one-way ANOVA in R
Prepare your data as specified here: Best practices for preparing your data set for R
Save your data in an external .txt (tab-delimited) or .csv file
Import your data into R as follows:
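A hedged sketch of the import, assuming the data frame is named my_data (reused below):

```r
# Tab-delimited .txt file:
my_data <- read.delim(file.choose())
# Or a .csv file:
# my_data <- read.csv(file.choose())
```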
Here, we’ll use the built-in R data set named PlantGrowth . It contains the weight of plants obtained under a control and two different treatment conditions.
To have an idea of what the data look like, we use the function sample_n() [in dplyr package]. The sample_n() function randomly picks a few of the observations in the data frame to print out:
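```r
library(dplyr)
my_data <- PlantGrowth
set.seed(1234)
sample_n(my_data, 10)
```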
In R terminology, the column "group" is called a factor and the different categories ("ctrl", "trt1", "trt2") are named factor levels. The levels are ordered alphabetically.
If the levels are not automatically in the correct order, re-order them as follows:
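For example, fixing the level order explicitly:

```r
my_data$group <- factor(my_data$group,
                        levels = c("ctrl", "trt1", "trt2"))
levels(my_data$group)
```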
It’s possible to compute summary statistics (mean and sd) by groups using the dplyr package.
- Compute summary statistics by groups - count, mean, sd:
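A sketch with dplyr:

```r
library(dplyr)

my_data %>%
  group_by(group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )
```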
To use R base graphs, read this: R base graphs. Here, we'll use the ggpubr R package for easy ggplot2-based data visualization.
Install the latest version of ggpubr from GitHub (recommended), or install the stable version from CRAN; both options are shown below.
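A sketch of both options (the GitHub route assumes the devtools package):

```r
# Latest development version from GitHub (recommended)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

# Or the stable version from CRAN
install.packages("ggpubr")
```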
- Visualize your data with ggpubr:
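For example, a box plot and a mean plot of weight by group (the colors and ordering are illustrative choices):

```r
library(ggpubr)

# Box plots of weight by group
ggboxplot(my_data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")

# Mean plot: group means with standard-error bars and jittered points
ggline(my_data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")
```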
If you still want to use R base graphs, type the following scripts:
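A sketch (the mean plot assumes the gplots package is installed):

```r
# Box plot with base R
boxplot(weight ~ group, data = my_data,
        xlab = "Treatment", ylab = "Weight", frame = FALSE)

# Mean plot with 95% confidence intervals
library(gplots)
plotmeans(weight ~ group, data = my_data,
          xlab = "Treatment", ylab = "Weight",
          main = "Mean plot with 95% CI")
```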
We want to know if there is any significant difference between the average weights of plants in the 3 experimental conditions.
The R function aov() can be used to answer this question. The function summary.aov() is used to summarize the analysis of variance model.
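A minimal sketch, continuing with my_data:

```r
# Fit the one-way ANOVA model
res.aov <- aov(weight ~ group, data = my_data)

# Summary of the analysis
summary(res.aov)
```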
The output includes the columns F value and Pr(>F) corresponding to the p-value of the test.
As the p-value is less than the significance level of 0.05, we can conclude that there are significant differences between the groups, highlighted with "*" in the model summary.
Multiple pairwise-comparison between the means of groups
In one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don’t know which pairs of groups are different.
It’s possible to perform multiple pairwise-comparison, to determine if the mean difference between specific pairs of group are statistically significant.
As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD ()) for performing multiple pairwise-comparison between the means of groups.
The function TukeyHSD() takes the fitted ANOVA as an argument:
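For example, passing the model res.aov fitted above:

```r
TukeyHSD(res.aov)
```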
- diff : difference between means of the two groups
- lwr , upr : the lower and the upper end point of the confidence interval at 95% (default)
- p adj : p-value after adjustment for the multiple comparisons.
It can be seen from the output that only the difference between trt2 and trt1 is significant, with an adjusted p-value of 0.012.
It’s possible to use the function glht () [in multcomp package] to perform multiple comparison procedures for an ANOVA. glht stands for general linear hypothesis tests. The simplified format is as follow:
- model : a fitted model, for example an object returned by aov ().
- lincft (): a specification of the linear hypotheses to be tested. Multiple comparisons in ANOVA models are specified by objects returned from the function mcp ().
Use glht() to perform multiple pairwise-comparisons for a one-way ANOVA:
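A sketch, reusing the fitted model res.aov from above:

```r
library(multcomp)

# Tukey contrasts on the factor "group", via general linear hypotheses
summary(glht(res.aov, linfct = mcp(group = "Tukey")))
```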
The function pairwise.t.test() can also be used to calculate pairwise comparisons between group levels, with corrections for multiple testing:
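For example, with Benjamini-Hochberg adjustment (matching the output described below):

```r
pairwise.t.test(my_data$weight, my_data$group,
                p.adjust.method = "BH")
```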
The result is a table of p-values for the pairwise comparisons. Here, the p-values have been adjusted by the Benjamini-Hochberg method.
Check ANOVA assumptions: test validity?
The ANOVA test assumes that the data are normally distributed and that the variance across groups is homogeneous. We can check this with some diagnostic plots.
The residuals versus fits plot can be used to check the homogeneity of variances.
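For a fitted aov object, the first diagnostic plot shows residuals versus fitted values:

```r
plot(res.aov, 1)
```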
In the resulting plot, there is no evident relationship between residuals and fitted values (the mean of each group), which is good. So, we can assume homogeneity of variances.
Points 17, 15, 4 are detected as outliers, which can severely affect normality and homogeneity of variance. It can be useful to remove outliers to meet the test assumptions.
It’s also possible to use Bartlett’s test or Levene’s test to check the homogeneity of variances .
We recommend Levene’s test , which is less sensitive to departures from normal distribution. The function leveneTest () [in car package] will be used:
From the output above we can see that the p-value is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.
The classical one-way ANOVA test requires an assumption of equal variances for all groups. In our example, the homogeneity of variance assumption turned out to be fine: the Levene test is not significant.
How do we save our ANOVA test in a situation where the homogeneity of variance assumption is violated?
An alternative procedure (the Welch one-way test), which does not require this assumption, is implemented in the function oneway.test(); both of the following uses are sketched after this list:
- ANOVA test with no assumption of equal variances
- Pairwise t-tests with no assumption of equal variances
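A minimal sketch of both, using the same my_data as above:

```r
# Welch one-way test: no equal-variance assumption
oneway.test(weight ~ group, data = my_data)

# Pairwise t-tests without pooling the standard deviations
pairwise.t.test(my_data$weight, my_data$group,
                p.adjust.method = "BH", pool.sd = FALSE)
```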
Normality plot of residuals . In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution. A 45-degree reference line is also plotted.
The normal probability plot of residuals is used to check the assumption that the residuals are normally distributed. It should approximately follow a straight line.
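For a fitted aov object, the second diagnostic plot is this normal Q-Q plot:

```r
plot(res.aov, 2)
```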
As all the points fall approximately along this reference line, we can assume normality.
This conclusion is supported by the Shapiro-Wilk test on the ANOVA residuals (W = 0.96, p = 0.6), which finds no indication that normality is violated.
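For example:

```r
# Extract the residuals and run the Shapiro-Wilk normality test
aov_residuals <- residuals(object = res.aov)
shapiro.test(x = aov_residuals)
```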
Note that a non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when the ANOVA assumptions are not met.
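For example:

```r
kruskal.test(weight ~ group, data = my_data)
```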
- Import your data from a tab-delimited .txt file into my_data.
- Visualize your data: ggpubr::ggboxplot(my_data, x = "group", y = "weight", color = "group")
- Compute the one-way ANOVA test: summary(aov(weight ~ group, data = my_data))
- Tukey multiple pairwise comparisons: TukeyHSD(res.aov)
- Two-Way ANOVA Test in R
- MANOVA Test in R: Multivariate Analysis of Variance
- Kruskal-Wallis Test in R (non parametric alternative to one-way ANOVA)
- Quick-R: ANOVA/MANOVA: http://www.statmethods.net/stats/anova.html
- Quick-R: (M)ANOVA Assumptions: http://www.statmethods.net/stats/anovaAssumptions.html
- R and Analysis of Variance: http://personality-project.org/r/r.guide/r.anova.html
This analysis has been performed using R software (ver. 3.2.4).
Interpret the key results for Crossed Gage R&R Study
In this topic:
- Step 1: Use the ANOVA table to identify significant factors and interactions
- Step 2: Assess the variation for each source of measurement error
- Step 3: Examine the graphs for more information on the gage study
- Part: The variation that is from the parts.
- Operator: The variation that is from the operators.
- Operator*Part: The variation that is from the operator and part interaction. An interaction exists when an operator measures different parts differently.
- Error or repeatability: The variation that is not explained by part, operator, or the operator and part interaction.
If you select the Xbar and R option for Method of Analysis , Minitab does not display the ANOVA table.
If the p-value for the operator and part interaction is 0.05 or higher, Minitab removes the interaction because it is not significant and generates a second ANOVA table without the interaction.
Key Result: P
In these results, the p-value is 0.974, so Minitab generates a second two-way ANOVA table that omits the interaction from the final model.
- Total Gage R&R: The sum of the repeatability and the reproducibility variance components.
- Repeatability: The variability in measurements when the same operator measures the same part multiple times.
- Reproducibility: The variability in measurements when different operators measure the same part.
- Part-to-Part: The variability in measurements due to different parts.
Ideally, very little of the variability should be due to repeatability and reproducibility. Differences between parts (Part-to-Part) should account for most of the variability.
Key Results: VarComp, %Contribution
The %Contribution for part-to-part variation is 93.18%. Minitab divides the part-to-part variance component value, approximately 0.0285, by the total variation, approximately 0.0305, and multiplies by 100%. When the %Contribution from part-to-part variation is high, the measurement system can reliably distinguish between parts.
Key Results: %Study Var
Use the percent study variation (%Study Var) to compare the measurement system variation to the total variation. The %Study Var uses the process variation, as defined by 6 times the process standard deviation. Minitab displays the %Tolerance column when you enter a tolerance value, and Minitab displays the %Process column when you enter a historical standard deviation.
According to AIAG guidelines, if the measurement system variation is less than 10% of the process variation, then the measurement system is acceptable. Because the %Study Var, the %Tolerance, and the %Process are all greater than 10%, the measurement system might need improvement. For more information, go to Is my measurement system acceptable? .

Key Results: Components of Variation graph
The components of variation graph shows the variation from the sources of measurement error. Minitab displays bars for %Tolerance when you enter a tolerance value, and Minitab displays bars for %Process when you enter a historical standard deviation.
This graph shows that part-to-part variability is higher than the variability from repeatability and reproducibility, but the total gage R&R variation is higher than 10% and might be unacceptable.

Quick way to plot an anova
To perform an ANOVA in R I normally follow two steps:
1) I compute the ANOVA summary with the function aov();
2) I reorganise the data, aggregating by subject and condition, to visualise the results.
I wonder whether this reorganisation of the data is always necessary to see the results, or whether there is a function to plot the results quickly.
Thanks for your suggestions
- 1 It would help if you provided a reproducible example with sample input data. What exactly are you plotting? – MrFlick May 18, 2017 at 13:32
I think what you mean is to illustrate the result of your test with a figure? ANOVAs are usually illustrated with box plots.
You can make a box plot with the base R functions plot() or boxplot():
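A minimal sketch with the built-in iris data, standing in for your own response and grouping variables:

```r
# Box plot of a response by group
boxplot(Sepal.Length ~ Species, data = iris,
        xlab = "Species", ylab = "Sepal length")
```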
Or with ggplot
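The ggplot2 equivalent of the same sketch:

```r
library(ggplot2)

ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme_minimal()
```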
Hope this helps

- 1 In addition to my answer I think you should have a look at this post r-bloggers.com/one-way-analysis-of-variance-anova – Nico Coallier May 18, 2017 at 13:50
- This is exactly what I was looking for! Tks! – Guillon May 18, 2017 at 14:00
- I have edited the post, so you can see the significance level on the figure :) – Nico Coallier May 18, 2017 at 14:20
- Awesome! tks again 4 your answer – Guillon May 19, 2017 at 10:03
How to Report a One-Way ANOVA in APA Style
To report a one-way ANOVA, include a description of each dependent and independent variable, the F-value with its two degrees-of-freedom numbers, and the corresponding p-value.
How do I report degrees of freedom in ANOVA APA?
In APA format, degrees of freedom for ANOVA tests are reported much like those for a t-test, except that there are two degrees-of-freedom numbers: first the degrees of freedom between the groups, then the degrees of freedom within the groups, separated by a comma.
The American Psychological Association (APA) style guide is very specific about how to report results from ANOVA tests. In general, you should report the degrees of freedom for both the between-groups comparisons and the within-group comparisons.
For example, if you were testing the effects of three different treatments on a group of participants, you would have two degrees of freedom for the between-groups comparison (3 − 1 = 2) and N − 3 degrees of freedom for the within-groups comparison, where N is the total number of participants.
In addition to reporting the degrees of freedom, it is also important to explain what they represent. In the context of ANOVA, degrees of freedom refer to the number of values in the data that are free to vary once the relevant statistics are fixed.
For example, if you have a data set with 10 values and their mean is known, there are 9 degrees of freedom, because the 10th value can be deduced from the other 9.
When reporting results from an ANOVA test in APA style, it is also important to report which post hoc test was used to make pairwise comparisons between groups. A post hoc test is a statistical test that is used to compare two or more groups after an ANOVA has been conducted.
There are many different post hoc tests that can be used, but the most common are the Tukey HSD and the Bonferroni correction. If you use a different post hoc test, be sure to specify which one you used in your results section.
Here is an example of how to report results from an ANOVA test in APA style: a one-way ANOVA revealed that there was a statistically significant difference in mean exam score between at least two groups, F(2, 27) = 4.545, p = 0.02.
How do you report F test results in APA?
The F statistic should be reported rounded to two decimal places, followed by the significance level: for example, F(1, 145) = 5.43, p < .05.
How do I report ANOVA results in APA Style?
The F value (the F statistic) is reported in parentheses together with the degrees of freedom between groups and within groups, followed by the p-value. For example, in reporting the results of an ANOVA on reading-ability scores, you would give the two degrees-of-freedom numbers in parentheses after F, then the F value, then the p-value.
Statistics Made Easy
The Complete Guide: How to Report Regression Results
In statistics, linear regression models are used to quantify the relationship between one or more predictor variables and a response variable .
We can use the following general format to report the results of a simple linear regression model :
Simple linear regression was used to test if [predictor variable] significantly predicted [response variable]. The fitted regression model was: [fitted regression equation]. The overall regression was statistically significant (R² = [R² value], F(df regression, df residual) = [F-value], p = [p-value]). It was found that [predictor variable] significantly predicted [response variable] (β = [β-value], p = [p-value]).
And we can use the following format to report the results of a multiple linear regression model :
Multiple linear regression was used to test if [predictor variable 1], [predictor variable 2], … significantly predicted [response variable]. The fitted regression model was: [fitted regression equation]. The overall regression was statistically significant (R² = [R² value], F(df regression, df residual) = [F-value], p = [p-value]). It was found that [predictor variable 1] significantly predicted [response variable] (β = [β-value], p = [p-value]). It was found that [predictor variable 2] did not significantly predict [response variable] (β = [β-value], p = [p-value]).
The following examples show how to report regression results for both a simple linear regression model and a multiple linear regression model.
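In R, the bracketed quantities in these templates can be pulled from a fitted model. A sketch with hypothetical hours/score data (the variable names and simulated values are illustrative only):

```r
set.seed(42)
# Hypothetical data: hours studied and exam scores for 20 students
df <- data.frame(hours = runif(20, 0, 10))
df$score <- 65 + 5 * df$hours + rnorm(20, sd = 5)

model <- lm(score ~ hours, data = df)
s <- summary(model)

s$r.squared                                   # R-squared value
s$fstatistic                                  # F value, df regression, df residual
coef(s)["hours", c("Estimate", "Pr(>|t|)")]   # beta and p-value for the predictor
```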
Example: Reporting Results of Simple Linear Regression
Suppose a professor would like to use the number of hours studied to predict the exam score that students will receive on a certain exam. He collects data for 20 students and fits a simple linear regression model.
The following screenshot shows the output of the regression model:

Here is how to report the results of the model:
Simple linear regression was used to test if hours studied significantly predicted exam score. The fitted regression model was: Exam score = 67.1617 + 5.2503*(hours studied). The overall regression was statistically significant (R² = .73, F(1, 18) = 47.99, p < .001). It was found that hours studied significantly predicted exam score (β = 5.2503, p < .001).
Example: Reporting Results of Multiple Linear Regression
Suppose a professor would like to use the number of hours studied and the number of prep exams taken to predict the exam score that students will receive on a certain exam. He collects data for 20 students and fits a multiple linear regression model.

Multiple linear regression was used to test if hours studied and prep exams taken significantly predicted exam score. The fitted regression model was: Exam Score = 67.67 + 5.56*(hours studied) – 0.60*(prep exams taken). The overall regression was statistically significant (R² = 0.73, F(2, 17) = 23.46, p < .001). It was found that hours studied significantly predicted exam score (β = 5.56, p < .001). It was found that prep exams taken did not significantly predict exam score (β = -0.60, p = 0.52).
Additional Resources
How to Read and Interpret a Regression Table Understanding the Null Hypothesis for Linear Regression Understanding the F-Test of Overall Significance in Regression