Understanding P-values | Definition and Examples

Published on July 16, 2020 by Rebecca Bevans. Revised on June 22, 2023.

The p value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

P values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p value, the more likely you are to reject the null hypothesis.

Table of contents

  • What is a null hypothesis?
  • What exactly is a p value?
  • How do you calculate the p value?
  • P values and statistical significance
  • Reporting p values
  • Caution when using p values
  • Frequently asked questions about p-values

What is a null hypothesis?

All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.

For example, in a two-tailed t test, the null hypothesis is that the difference between two groups is zero.

  • Null hypothesis (H0): there is no difference in longevity between the two groups.
  • Alternative hypothesis (HA or H1): there is a difference in longevity between the two groups.


What exactly is a p value?

The p value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number calculated by a statistical test using your data.

The p value tells you how often you would expect to see a test statistic as extreme or more extreme than the one calculated by your statistical test if the null hypothesis of that test was true. The p value gets smaller as the test statistic calculated from your data gets further away from the range of test statistics predicted by the null hypothesis.

The p value is a proportion: if your p value is 0.05, that means that 5% of the time you would see a test statistic at least as extreme as the one you found if the null hypothesis was true.

How do you calculate the p value?

P values are usually automatically calculated by your statistical program (R, SPSS, etc.).

You can also find tables for estimating the p value of your test statistic online. These tables show, based on the test statistic and degrees of freedom (number of observations minus number of independent variables) of your test, how frequently you would expect to see that test statistic under the null hypothesis.
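For instance, here is a minimal sketch of what such a program does under the hood, using Python's scipy.stats; the test statistic and degrees of freedom below are made-up illustration values, not taken from any example in this article:

```python
# A minimal sketch of how statistical software turns a test statistic
# into a p value; the numbers are illustrative only.
from scipy import stats

t_statistic = 2.1  # hypothetical t statistic from a two-tailed t test
df = 18            # hypothetical degrees of freedom

# Two-tailed p value: probability of a test statistic at least this
# extreme, in either direction, if the null hypothesis were true.
p_value = 2 * stats.t.sf(abs(t_statistic), df)
print(round(p_value, 3))  # ~0.05 for these illustrative numbers
```

The table lookup described above is essentially a precomputed version of this same cumulative-distribution calculation.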

The calculation of the p value depends on the statistical test you are using to test your hypothesis:

  • Different statistical tests have different assumptions and generate different test statistics. You should choose the statistical test that best fits your data and matches the effect or relationship you want to test.
  • The number of independent variables you include in your test changes how large or small the test statistic needs to be to generate the same p value.

No matter what test you use, the p value always describes the same thing: how often you can expect to see a test statistic as extreme or more extreme than the one calculated from your test.

P values and statistical significance

P values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.

Statistical significance is another way of saying that the p value of a statistical test is small enough to reject the null hypothesis of the test.

How small is small enough? The most common threshold is p < 0.05; that is, when you would expect to find a test statistic as extreme as the one calculated by your test only 5% of the time. But the threshold depends on your field of study – some fields prefer thresholds of 0.01, or even 0.001.

The threshold value for determining statistical significance is also known as the alpha value.


Reporting p values

P values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p values in context – for example, the correlation coefficient in a linear regression, or the average difference between treatment groups in a t test.

Caution when using p values

P values are often interpreted as your risk of rejecting the null hypothesis of your test when the null hypothesis is actually true.

In reality, the risk of rejecting the null hypothesis is often higher than the p value, especially when looking at a single study or when using small sample sizes. This is because the smaller your frame of reference, the greater the chance that you stumble across a statistically significant pattern completely by accident.

P values are also often interpreted as supporting or refuting the alternative hypothesis. This is not the case. The p value can only tell you how consistent your data are with the null hypothesis. It cannot tell you whether your alternative hypothesis is true, or why.


Frequently asked questions about p-values

A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

A low p-value does not, however, prove your alternative hypothesis. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

If the p -value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.


p-value Calculator

  • What is p-value?
  • How do I calculate p-value from test statistic?
  • How to interpret p-value
  • How to use the p-value calculator to find p-value from test statistic
  • How do I find p-value from Z-score?
  • How do I find p-value from t?
  • p-value from chi-square score (χ² score)
  • p-value from F-score

Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.

P-values appear all over science, yet many people find the concept a bit intimidating. Don't worry – in this article, we will explain not only what the p-value is but also how to interpret p-values correctly . Have you ever been curious about how to calculate the p-value by hand? We provide you with all the necessary formulae as well!

🙋 If you want to revise some basics from statistics, our normal distribution calculator is an excellent place to start.

What is p-value?

Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample. It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H0 is true!

More intuitively, p-value answers the question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

It is the alternative hypothesis that determines what "extreme" actually means, so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for the test statistic, x for the value it produced for a given sample, and Pr(event | H0) is the probability of an event, calculated under the assumption that H0 is true:

Left-tailed test: p-value = Pr(S ≤ x | H0)

Right-tailed test: p-value = Pr(S ≥ x | H0)

Two-tailed test:

p-value = 2 × min{Pr(S ≤ x | H0), Pr(S ≥ x | H0)}

(By min{a,b}, we denote the smaller of the numbers a and b.)

If the distribution of the test statistic under H0 is symmetric about 0, then: p-value = 2 × Pr(S ≥ |x| | H0)

or, equivalently: p-value = 2 × Pr(S ≤ -|x| | H0)

As a picture is worth a thousand words, let us illustrate these definitions. Here, we use the fact that the probability can be neatly depicted as the area under the density curve for a given distribution. We give two sets of pictures: one for a symmetric distribution and the other for a skewed (non-symmetric) distribution.

  • Symmetric case: normal distribution:

p-values for symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

  • Non-symmetric case: chi-squared distribution:

p-values for non-symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

In the last picture (two-tailed p-value for skewed distribution), the area of the left-hand side is equal to the area of the right-hand side.

How do I calculate p-value from test statistic?

To determine the p-value, you need to know the distribution of your test statistic under the assumption that the null hypothesis is true. Then, with the help of the cumulative distribution function (cdf) of this distribution, we can express the probability of the test statistic being at least as extreme as its value x for the sample:

Left-tailed test:

p-value = cdf(x)

Right-tailed test:

p-value = 1 - cdf(x)

Two-tailed test:

p-value = 2 × min{cdf(x), 1 - cdf(x)}

If the distribution of the test statistic under H0 is symmetric about 0, then a two-sided p-value can be simplified to p-value = 2 × cdf(-|x|), or, equivalently, p-value = 2 - 2 × cdf(|x|).
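As an illustration, the three formulas above translate directly into code. This is a minimal sketch using Python's scipy.stats to supply the cdf; the chi-squared distribution and the test statistic value 7.81 are assumptions chosen for demonstration:

```python
# Sketch of the three p-value formulas above, with the cdf supplied by SciPy.
from scipy import stats

def p_left(cdf, x):
    return cdf(x)                       # p-value = cdf(x)

def p_right(cdf, x):
    return 1 - cdf(x)                   # p-value = 1 - cdf(x)

def p_two_tailed(cdf, x):
    return 2 * min(cdf(x), 1 - cdf(x))  # p-value = 2 × min{cdf(x), 1 - cdf(x)}

# Skewed null distribution: chi-squared with 3 degrees of freedom,
# with an assumed test statistic of x = 7.81.
cdf = stats.chi2(df=3).cdf
print(p_right(cdf, 7.81))       # ≈ 0.05
print(p_two_tailed(cdf, 7.81))  # ≈ 0.10
```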

The probability distributions that are most widespread in hypothesis testing tend to have complicated cdf formulae, and finding the p-value by hand may not be possible. You'll likely need to resort to a computer or to a statistical table, where people have gathered approximate cdf values.

Well, you now know how to calculate the p-value, but… why do you need to calculate this number in the first place? In hypothesis testing, the p-value approach is an alternative to the critical value approach . Recall that the latter requires researchers to pre-set the significance level, α, which is the probability of rejecting the null hypothesis when it is true (so of type I error ). Once you have your p-value, you just need to compare it with any given α to quickly decide whether or not to reject the null hypothesis at that significance level, α. For details, check the next section, where we explain how to interpret p-values.

How to interpret p-value

As we have mentioned above, the p-value is the answer to the following question: assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

What does that mean for you? Well, there are two possibilities:

  • A high p-value means that your data is highly compatible with the null hypothesis; and
  • A small p-value provides evidence against the null hypothesis , as it means that your result would be very improbable if the null hypothesis were true.

However, it may happen that the null hypothesis is true, but your sample is highly unusual! For example, imagine we studied the effect of a new drug and got a p-value of 0.03 . This means that in 3% of similar studies, random chance alone would still be able to produce the value of the test statistic that we obtained, or a value even more extreme, even if the drug had no effect at all!

The question "what is p-value" can also be answered as follows: p-value is the smallest level of significance at which the null hypothesis would be rejected. So, if you now want to make a decision on the null hypothesis at some significance level α , just compare your p-value with α :

  • If p-value ≤ α , then you reject the null hypothesis and accept the alternative hypothesis; and
  • If p-value > α , then you don't have enough evidence to reject the null hypothesis.

Obviously, the fate of the null hypothesis depends on α . For instance, if the p-value was 0.03 , we would reject the null hypothesis at a significance level of 0.05 , but not at a level of 0.01 . That's why the significance level should be stated in advance and not adapted conveniently after the p-value has been established! A significance level of 0.05 is the most common value, but there's nothing magical about it. Here, you can see what too strong a faith in the 0.05 threshold can lead to. It's always best to report the p-value, and allow the reader to make their own conclusions.

Also, bear in mind that subject-area expertise (and common sense) is crucial. Otherwise, by mindlessly applying statistical principles, you can easily arrive at a statistically significant result even though the conclusion is 100% untrue.

How to use the p-value calculator to find p-value from test statistic

As our p-value calculator is here at your service, you no longer need to wonder how to find p-value from all those complicated test statistics! Here are the steps you need to follow:

1. Pick the alternative hypothesis: two-tailed, right-tailed, or left-tailed.

2. Tell us the distribution of your test statistic under the null hypothesis: is it N(0,1), t-Student, chi-squared, or Snedecor's F? If you are unsure, check the sections below, as they are devoted to these distributions.

3. If needed, specify the degrees of freedom of the test statistic's distribution.

4. Enter the value of the test statistic computed for your data sample.

5. Our calculator determines the p-value from the test statistic and provides the decision to be made about the null hypothesis. The standard significance level is 0.05 by default.

Go to the advanced mode if you need to increase the precision with which the calculations are performed or change the significance level.

How do I find p-value from Z-score?

In terms of the cumulative distribution function (cdf) of the standard normal distribution, which is traditionally denoted by Φ, the p-value is given by the following formulas:

Left-tailed z-test:

p-value = Φ(Z score)

Right-tailed z-test:

p-value = 1 - Φ(Z score)

Two-tailed z-test:

p-value = 2 × Φ(−|Z score|)

or, equivalently:

p-value = 2 - 2 × Φ(|Z score|)
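For example, here is a sketch of these formulas in Python, with Φ supplied by scipy.stats.norm.cdf; the Z-score of 1.92 is an assumed example value:

```python
# p-values from a Z-score, with Φ given by scipy.stats.norm.cdf.
from scipy.stats import norm

z = 1.92  # assumed example Z-score

p_left  = norm.cdf(z)            # Φ(Z score)
p_right = 1 - norm.cdf(z)        # 1 - Φ(Z score)
p_two   = 2 * norm.cdf(-abs(z))  # 2 × Φ(-|Z score|)

print(p_left, p_right, p_two)    # ≈ 0.9726, 0.0274, 0.0549
```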

🙋 To learn more about Z-tests, head to Omni's Z-test calculator .

We use the Z-score if the test statistic approximately follows the standard normal distribution N(0,1) . Thanks to the central limit theorem, you can count on the approximation if you have a large sample (say at least 50 data points) and treat your distribution as normal.

A Z-test most often refers to testing the population mean, the difference between two population means, or the difference between two proportions. You can also find Z-tests in maximum likelihood estimation.

How do I find p-value from t?

The p-value from the t-score is given by the following formulas, in which cdf t,d stands for the cumulative distribution function of the t-Student distribution with d degrees of freedom:

Left-tailed t-test:

p-value = cdf t,d (t score)

Right-tailed t-test:

p-value = 1 - cdf t,d (t score)

Two-tailed t-test:

p-value = 2 × cdf t,d (−|t score|)

or, equivalently:

p-value = 2 - 2 × cdf t,d (|t score|)
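A corresponding sketch in Python, with cdf t,d supplied by scipy.stats.t.cdf; the t-score and degrees of freedom are assumed example values:

```python
# p-values from a t-score, with cdf_{t,d} given by scipy.stats.t.cdf.
from scipy.stats import t

t_score, d = 2.1, 18  # assumed example t-score and degrees of freedom

p_left  = t.cdf(t_score, d)
p_right = 1 - t.cdf(t_score, d)
p_two   = 2 * t.cdf(-abs(t_score), d)

print(p_left, p_right, p_two)  # ≈ 0.975, 0.025, 0.050
```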

Use the t-score option if your test statistic follows the t-Student distribution . This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails – the exact shape depends on the parameter called the degrees of freedom . If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from the normal distribution N(0,1).

The most common t-tests are those for population means with an unknown population standard deviation, or for the difference between means of two populations , with either equal or unequal yet unknown population standard deviations. There's also a t-test for paired (dependent) samples .

🙋 To get more insights into t-statistics, we recommend using our t-test calculator .

p-value from chi-square score (χ² score)

Use the χ²-score option when performing a test in which the test statistic follows the χ²-distribution.

This distribution arises if, for example, you take the sum of squared variables, each following the normal distribution N(0,1). Remember to check the number of degrees of freedom of the χ²-distribution of your test statistic!

How do you find the p-value from the χ²-score? You can do it with the help of the following formulas, in which cdf χ²,d denotes the cumulative distribution function of the χ²-distribution with d degrees of freedom:

Left-tailed χ²-test:

p-value = cdf χ²,d (χ² score)

Right-tailed χ²-test:

p-value = 1 - cdf χ²,d (χ² score)

Remember that χ²-tests for goodness-of-fit and independence are right-tailed tests! (see below)

Two-tailed χ²-test:

p-value = 2 × min{cdf χ²,d (χ² score), 1 - cdf χ²,d (χ² score)}

(By min{a,b}, we denote the smaller of the numbers a and b.)
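In Python, a sketch of the same calculations with scipy.stats.chi2.cdf; the χ²-score of 11.07 and d = 5 are assumed example values (11.07 is approximately the 95th percentile of this distribution, so the right-tailed p-value comes out near 0.05):

```python
# p-values from a χ²-score, with cdf_{χ²,d} given by scipy.stats.chi2.cdf.
from scipy.stats import chi2

chi2_score, d = 11.07, 5  # assumed example χ²-score and degrees of freedom

p_right = 1 - chi2.cdf(chi2_score, d)  # right-tailed: the most common case
p_two   = 2 * min(chi2.cdf(chi2_score, d), 1 - chi2.cdf(chi2_score, d))

print(p_right, p_two)  # ≈ 0.050, 0.100
```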

The most popular tests which lead to a χ²-score are the following:

Testing whether the variance of normally distributed data has some pre-determined value. In this case, the test statistic has the χ²-distribution with n - 1 degrees of freedom, where n is the sample size. This can be a one-tailed or two-tailed test .

Goodness-of-fit test checks whether the empirical (sample) distribution agrees with some expected probability distribution. In this case, the test statistic follows the χ²-distribution with k - 1 degrees of freedom, where k is the number of classes into which the sample is divided. This is a right-tailed test .

Independence test is used to determine if there is a statistically significant relationship between two variables. In this case, its test statistic is based on the contingency table and follows the χ²-distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows, and c is the number of columns in this contingency table. This also is a right-tailed test .

p-value from F-score

Finally, the F-score option should be used when you perform a test in which the test statistic follows the F-distribution, also known as the Fisher–Snedecor distribution. The exact shape of an F-distribution depends on two degrees of freedom.

To see where those degrees of freedom come from, consider the independent random variables X and Y , which both follow the χ²-distributions with d 1 and d 2 degrees of freedom, respectively. In that case, the ratio (X/d 1 )/(Y/d 2 ) follows the F-distribution, with (d 1 , d 2 ) -degrees of freedom. For this reason, the two parameters d 1 and d 2 are also called the numerator and denominator degrees of freedom .

The p-value from F-score is given by the following formulae, where we let cdf F,d1,d2 denote the cumulative distribution function of the F-distribution, with (d 1 , d 2 ) -degrees of freedom:

Left-tailed F-test:

p-value = cdf F,d1,d2 (F score)

Right-tailed F-test:

p-value = 1 - cdf F,d1,d2 (F score)

Two-tailed F-test:

p-value = 2 × min{cdf F,d1,d2 (F score), 1 - cdf F,d1,d2 (F score)}
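A sketch of these formulas in Python with scipy.stats.f.cdf; the F-score of 3.10 and the (3, 20) degrees of freedom are assumed example values:

```python
# p-values from an F-score, with cdf_{F,d1,d2} given by scipy.stats.f.cdf.
from scipy.stats import f

f_score, d1, d2 = 3.10, 3, 20  # assumed example F-score and degrees of freedom

p_right = 1 - f.cdf(f_score, d1, d2)  # right-tailed: the most common case
p_two   = 2 * min(f.cdf(f_score, d1, d2), 1 - f.cdf(f_score, d1, d2))

print(p_right, p_two)  # ≈ 0.050, 0.100
```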

Below we list the most important tests that produce F-scores. All of them are right-tailed tests .

A test for the equality of variances in two normally distributed populations . Its test statistic follows the F-distribution with (n - 1, m - 1) -degrees of freedom, where n and m are the respective sample sizes.

ANOVA is used to test the equality of means in three or more groups that come from normally distributed populations with equal variances. We arrive at the F-distribution with (k - 1, n - k) -degrees of freedom, where k is the number of groups, and n is the total sample size (in all groups together).

A test for overall significance of regression analysis . The test statistic has an F-distribution with (k - 1, n - k) -degrees of freedom, where n is the sample size, and k is the number of variables (including the intercept).

Once the above test has established that a linear relationship is present in your data sample, you can calculate the coefficient of determination, R², which indicates the strength of this relationship. You can do it by hand or use our coefficient of determination calculator.

A test to compare two nested regression models . The test statistic follows the F-distribution with (k 2 - k 1 , n - k 2 ) -degrees of freedom, where k 1 and k 2 are the numbers of variables in the smaller and bigger models, respectively, and n is the sample size.

You may notice that the F-test of an overall significance is a particular form of the F-test for comparing two nested models: it tests whether our model does significantly better than the model with no predictors (i.e., the intercept-only model).

Can p-value be negative?

No, the p-value cannot be negative. This is because probabilities cannot be negative, and the p-value is the probability of the test statistic satisfying certain conditions.

What does a high p-value mean?

A high p-value means that under the null hypothesis, there's a high probability that for another sample, the test statistic will generate a value at least as extreme as the one observed in the sample you already have. A high p-value doesn't allow you to reject the null hypothesis.

What does a low p-value mean?

A low p-value means that under the null hypothesis, there's little probability that for another sample, the test statistic will generate a value at least as extreme as the one observed for the sample you already have. A low p-value is evidence in favor of the alternative hypothesis – it allows you to reject the null hypothesis.



How to Interpret a P-Value Greater Than 0.05 (With Examples)

A hypothesis test is used to test whether or not some hypothesis about a population parameter is true.

Whenever we perform a hypothesis test, we always define a null and alternative hypothesis:

  • Null Hypothesis (H0): The sample data occurs purely from chance.
  • Alternative Hypothesis (HA): The sample data is influenced by some non-random cause.

When performing a hypothesis test, we must specify the significance level to use.

Common choices for a significance level include:

  • α = 0.01
  • α = 0.05
  • α = 0.10

If the p-value of the hypothesis test is less than the specified significance level, then we can reject the null hypothesis and conclude that we have sufficient evidence to say that the alternative hypothesis is true.

If the p-value is not less than the specified significance level, then we fail to reject the null hypothesis and conclude that we do not have sufficient evidence to say that the alternative hypothesis is true.
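In code, this decision rule is a single comparison. A minimal sketch (the helper function decide is hypothetical, not part of any statistics library):

```python
# Sketch of the reject / fail-to-reject decision rule described above.
def decide(p_value, alpha=0.05):
    if p_value < alpha:
        return "Reject H0: sufficient evidence for the alternative hypothesis."
    return "Fail to reject H0: insufficient evidence for the alternative hypothesis."

print(decide(0.2338))  # the p-value from Example 1 below -> fail to reject
print(decide(0.01))    # an illustrative significant result -> reject
```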

The following examples explain how to interpret a p-value greater than .05 in practice.

Example 1: Interpret P-Value Greater Than 0.05 (Biology)

Suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-year period than they normally do, which is currently 20 inches.

To test this, she applies the fertilizer to each of the plants in her laboratory for three months.

She then performs a hypothesis test using the following hypotheses:

The null hypothesis (H0): μ = 20 inches (the fertilizer will have no effect on the mean plant growth)

The alternative hypothesis (HA): μ > 20 inches (the fertilizer will cause mean plant growth to increase)

Upon conducting a hypothesis test for a mean using a significance level of α = .05, the biologist receives a p-value of 0.2338 .

Since the p-value of 0.2338 is greater than the significance level of 0.05 , the biologist fails to reject the null hypothesis.

Thus, she concludes that there is not sufficient evidence to say that the fertilizer leads to increased plant growth.

Example 2: Interpret P-Value Greater Than 0.05 (Manufacturing)

A mechanical engineer believes that a new production process will reduce the number of faulty widgets produced at a certain factory, which is currently 3 faulty widgets per batch.

To test this, he uses the new process to produce a new batch of widgets.

He then performs a hypothesis test using the following hypotheses:

The null hypothesis (H0): μ = 3 (the new process will have no effect on the mean number of faulty widgets per batch)

The alternative hypothesis (HA): μ < 3 (the new process will cause a reduction in the mean number of faulty widgets per batch)

The engineer performs a hypothesis test for a mean using a significance level of α = .05 and receives a p-value of 0.134 .

Since the p-value of 0.134  is greater than the significance level of 0.05 , the engineer fails to reject the null hypothesis.

Thus, he concludes that there is not sufficient evidence to say that the new process leads to a reduction in the mean number of faulty widgets produced in each batch.

Additional Resources

The following tutorials provide additional information about p-values:

  • An Explanation of P-Values and Statistical Significance
  • Statistical vs. Practical Significance
  • P-Value vs. Alpha: What's the Difference?



9.3 - The P-Value Approach

Example 9-4


Up until now, we have used the critical region approach in conducting our hypothesis tests. Now, let's take a look at an example in which we use what is called the P-value approach.

Among patients with lung cancer, usually, 90% or more die within three years. As a result of new forms of treatment, it is felt that this rate has been reduced. In a recent study of n = 150 lung cancer patients, y = 128 died within three years. Is there sufficient evidence at the \(\alpha = 0.05\) level, say, to conclude that the death rate due to lung cancer has been reduced?

The sample proportion is:

\(\hat{p}=\dfrac{128}{150}=0.853\)

The null and alternative hypotheses are:

\(H_0 \colon p = 0.90\) and \(H_A \colon p < 0.90\)

The test statistic is, therefore:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}=\dfrac{0.853-0.90}{\sqrt{\dfrac{0.90(0.10)}{150}}}=-1.92\)

And, the rejection region is Z ≤ −1.645.

Since the test statistic Z = −1.92 < −1.645, we reject the null hypothesis. There is sufficient evidence at the \(\alpha = 0.05\) level to conclude that the rate has been reduced.

Example 9-4 (continued)

What if we set the significance level \(\alpha\) = P (Type I Error) to 0.01? Is there still sufficient evidence to conclude that the death rate due to lung cancer has been reduced?

In this case, with \(\alpha = 0.01\), the rejection region is Z ≤ −2.33. That is, we reject if the test statistic falls in the rejection region defined by Z ≤ −2.33.

Because the test statistic Z = −1.92 > −2.33, we do not reject the null hypothesis. There is insufficient evidence at the \(\alpha = 0.01\) level to conclude that the rate has been reduced.


In the first part of this example, we rejected the null hypothesis when \(\alpha = 0.05\). And, in the second part of this example, we failed to reject the null hypothesis when \(\alpha = 0.01\). There must be some level of \(\alpha\), then, in which we cross the threshold from rejecting to not rejecting the null hypothesis. What is the smallest \(\alpha \text{ -level}\) that would still cause us to reject the null hypothesis?

We would, of course, reject any time our test statistic −1.92 fell at or below the critical value:

That is, we would reject if the critical value were −1.645, −1.83, or −1.92. But, we wouldn't reject if the critical value were −1.93. The \(\alpha \text{ -level}\) associated with the test statistic −1.92 is called the P-value. It is the smallest \(\alpha \text{ -level}\) that would lead to rejection. In this case, the P-value is:

P(Z < −1.92) = 0.0274
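As a quick check of this arithmetic, here is a sketch in Python using scipy.stats.norm for the standard normal cdf; it reproduces the text's numbers, including the rounding of the test statistic to two decimal places:

```python
# Reproducing Example 9-4: the test statistic and its one-tailed P-value.
from math import sqrt
from scipy.stats import norm

p0, n = 0.90, 150
p_hat = 0.853  # 128/150, rounded to three decimals as in the text

z = round((p_hat - p0) / sqrt(p0 * (1 - p0) / n), 2)
print(z)                      # -1.92

print(round(norm.cdf(z), 4))  # P(Z < -1.92) = 0.0274
```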

So far, all of the examples we've considered have involved a one-tailed hypothesis test in which the alternative hypothesis involved either a less than (<) or a greater than (>) sign. What happens if we weren't sure of the direction in which the proportion could deviate from the hypothesized null value? That is, what if the alternative hypothesis involved a not-equal sign (≠)? Let's take a look at an example.


What if we wanted to perform a " two-tailed " test? That is, what if we wanted to test:

\(H_0 \colon p = 0.90\) versus \(H_A \colon p \ne 0.90\)

at the \(\alpha = 0.05\) level?

Let's first consider the critical value approach. If we allow for the possibility that the sample proportion could either prove to be too large or too small, then we need to specify a threshold value, that is, a critical value, in each tail of the distribution. In this case, we divide the "significance level" \(\alpha\) by 2 and put \(\alpha/2 = 0.025\) in each tail.

That is, our rejection rule is that we should reject the null hypothesis \(H_0 \text{ if } Z ≥ 1.96\) or we should reject the null hypothesis \(H_0 \text{ if } Z ≤ −1.96\). Alternatively, we can write that we should reject the null hypothesis \(H_0 \text{ if } |Z| ≥ 1.96\). Because our test statistic is −1.92, we just barely fail to reject the null hypothesis, because 1.92 < 1.96. In this case, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.

Now for the P-value approach. Again, needing to allow for the possibility that the sample proportion is either too large or too small, we multiply the P-value we obtain for the one-tailed test by 2.

That is, the P-value is:

\(P=P(|Z|\geq 1.92)=P(Z>1.92 \text{ or } Z<-1.92)=2 \times 0.0274=0.055\)

Because the P-value 0.055 is (just barely) greater than the significance level \(\alpha = 0.05\), we barely fail to reject the null hypothesis. Again, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.

Let's close this example by formalizing the definition of a P-value, as well as summarizing the P-value approach to conducting a hypothesis test.

The P-value is the smallest significance level \(\alpha\) that leads us to reject the null hypothesis.

Alternatively (and the way I prefer to think of P-values), the P-value is the probability that we'd observe a more extreme statistic than we did if the null hypothesis were true.

If the P-value is small, that is, if \(P ≤ \alpha\), then we reject the null hypothesis \(H_0\).

Note!


By the way, to test \(H_0 \colon p = p_0\), some statisticians will use the test statistic:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\)

rather than the one we've been using:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\)

One advantage of doing so is that the interpretation of the confidence interval — does it contain \(p_0\)? — is always consistent with the hypothesis test decision, as illustrated here:

For the sake of ease, let:

\(se(\hat{p})=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

Two-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p \ne p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \geq z_{\alpha/2}\) or if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha/2}\)

which is equivalent to rejecting the null hypothesis:

if \(\hat{p}-p_0 \geq z_{\alpha/2}se(\hat{p})\) or if \(\hat{p}-p_0 \leq -z_{\alpha/2}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha/2}se(\hat{p})\) or if \(p_0 \leq \hat{p}-z_{\alpha/2}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the \(\left(1-\alpha\right)100\%\) confidence interval!

Left-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p < p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha}\)

if \(\hat{p}-p_0 \leq -z_{\alpha}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the upper \(\left(1-\alpha\right)100\%\) confidence interval:

\((0,\hat{p}+z_{\alpha}se(\hat{p}))\)
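A small numerical sketch of this equivalence in Python, plugging in the numbers from Example 9-4. Note that this variant uses se(p̂) in the denominator, so the test statistic comes out near −1.63 rather than the −1.92 computed earlier, and the decision flips to "fail to reject"; the point is only that the test and the confidence interval always agree with each other:

```python
# Sketch of the test/confidence-interval equivalence for the left-tailed case,
# using the se(p_hat) variant of the test statistic and Example 9-4's numbers.
from math import sqrt
from scipy.stats import norm

p_hat, p0, n, alpha = 0.853, 0.90, 150, 0.05

se = sqrt(p_hat * (1 - p_hat) / n)   # se(p_hat) ≈ 0.0289
z = (p_hat - p0) / se                # ≈ -1.63 (not -1.92: different denominator)
z_alpha = norm.ppf(1 - alpha)        # ≈ 1.645

upper = p_hat + z_alpha * se         # upper limit of the interval (0, upper)
reject_by_test = z <= -z_alpha       # False here: -1.63 > -1.645
reject_by_ci = p0 >= upper           # False here: 0.90 < upper ≈ 0.9005

print(reject_by_test, reject_by_ci)  # the two decisions always agree
```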


Module 7 - Comparing Continuous Outcomes


What to Report


A test statistic enables us to determine a p-value, which is the probability (ranging from 0 to 1) of observing sample data as extreme (different) or more extreme if the null hypothesis were true. The smaller the p-value, the more incompatible the data are with the null hypothesis.

A p-value ≤ 0.05 is an arbitrary but commonly used criterion for determining whether an observed difference is "statistically significant" or not. While it does not take into account the possible effects of bias or confounding, a p-value ≤ 0.05 indicates that, if the null hypothesis were true, sampling error (chance) alone would produce differences this large 5% of the time or less. Furthermore, while it does not indicate certainty, it suggests that the null hypothesis is probably not true, so we reject the null hypothesis and accept the alternative hypothesis if the p-value is less than or equal to 0.05. The 0.05 criterion is also called the "alpha level," indicating the probability of incorrectly rejecting the null hypothesis.

A p-value > 0.05 would be interpreted by many as "not statistically significant," meaning that there was not sufficiently strong evidence to reject the null hypothesis and conclude that the groups are different. This does not mean that the groups are the same. If the evidence for a difference is weak (not statistically significant), we fail to reject the null, but we never "accept the null," i.e., we cannot conclude that they are the same – only that there is insufficient evidence to conclude that they are different.

While commonly used, p-values have fallen into some disfavor recently because the 0.05 criterion tends to devolve into a hard and fast rule that distinguishes "significantly different" from "not significantly different."

"A P value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a P value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on patients' blood glucose levels without having a therapeutic effect."

[Monya Baker: Statisticians issue warning over misuse of P values. Nature, March 7, 2016]

Consider two studies evaluating the same hypothesis. Both studies find a small difference between the comparison groups, but for one study the p-value = 0.06, and the authors conclude that the groups are "not significantly different"; the second study finds p = 0.04, and the authors conclude that the groups are significantly different. Which is correct? Perhaps one solution is to simply report the p-value and let the reader come to their own conclusion.

Many researchers and practitioners now prefer confidence intervals, because they focus on the estimated effect size and how precise the estimate is rather than "Is there an effect?"

Also note that the meaning of "significant" depends on the audience. To scientists it means "statistically significant," i.e., that p ≤ 0.05, but to a lay audience significant means "important."

  • Measure of effect: the magnitude of the difference between the groups, e.g., difference in means, risk ratio, risk difference, odds ratio, etc.
  • P-value: The probability of observing differences this great or greater if the null hypothesis is true.
  • Confidence interval: a measure of the precision of the measure of effect. The confidence interval estimates the range of values compatible with the evidence.

Many public health researchers and practitioners prefer confidence intervals, since p-values give less information and are often interpreted inappropriately. When reporting results one should provide all three of these.



“ P < 0.05” Might Not Mean What You Think: American Statistical Association Clarifies P Values

In 2011, the U.S. Supreme Court unanimously ruled in Matrixx Initiatives Inc. v. Siracusano that investors could sue a drug company for failing to report adverse drug effects—even though they were not statistically significant.

Describing the case in the April 2, 2011, issue of the Wall Street Journal , Carl Bialik wrote, “A group of mathematicians has been trying for years to have a core statistical concept debunked. Now the Supreme Court might have done it for them.” That conclusion may have been overly optimistic, since misguided use of the P value continued unabated. However, in 2014 concerns about misinterpretation and misuse of P values led the American Statistical Association (ASA) Board to convene a panel of statisticians and experts from a variety of disciplines to draft a policy statement on the use of P values and hypothesis testing. After a year of discussion, ASA published a consensus statement in American Statistician (doi:10.1080/00031305.2016.1154108).

The statement consists of six principles in nontechnical language on the proper interpretation of P values, hypothesis testing, science and policy decision-making, and the necessity for full reporting and transparency of research studies. However, assembling a short, clear statement by such a diverse group took longer and was more contentious than expected. Participants wrote supplementary commentaries, available online with the published statement.

The panel discussed many misconceptions about P values. Test your knowledge: Which of the following is true?

  • P > 0.05 is the probability that the null hypothesis is true.
  • 1 minus the P value is the probability that the alternative hypothesis is true.
  • A statistically significant test result (P ≤ 0.05) means that the test hypothesis is false or should be rejected.
  • A P value greater than 0.05 means that no effect was observed.

If you answered “none of the above,” you may understand this slippery concept better than many researchers. The ASA panel defined the P value as “the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”

Why is the exact definition so important? Many authors use statistical software that presumably is based on the correct definition. “It’s very easy for researchers to get papers published and survive based on knowledge of what statistical packages are out there but not necessarily how to avoid the problems that statistical packages can create for you if you don’t understand their appropriate use,” said Barnett S. Kramer, M.D., M.P.H., JNCI ’s former editor in chief and now director of the National Cancer Institute’s Division of Cancer Prevention. (Kramer was not on the ASA panel.)

Part of the problem lies in how people interpret P values. According to the ASA statement, “A conclusion does not immediately become ‘true’ on one side of the divide and ‘false’ on the other.” Valuable information may be lost because researchers may not pursue “insignificant” results. Conversely, small effects with “significant” P values may be biologically or clinically unimportant. At best, such practices may slow scientific progress and waste resources. At worst, they may cause grievous harm when adverse effects go unreported. The Supreme Court case involved the drug Zicam, which caused permanent hearing loss in some users. Another drug, rofecoxib (Vioxx), was taken off the market because of adverse cardiovascular effects. The drug companies involved did not report those adverse effects because of lack of statistical significance in the original drug tests ( Rev. Soc. Econ. 2016;74:83–97; doi:10.1080/00346764.2016.1150730).

ASA panelists encouraged using alternative methods “that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes Factors; and other approaches such as decision-theoretic modeling and false discovery rates.” However, any method can be used invalidly. “If success is defined based on passing some magic threshold, biases may continue to exert their influence regardless of whether the threshold is defined by a P value, Bayes factor, false-discovery rate, or anything else,” wrote panelist John Ioannidis, Ph.D., professor of medicine and of health research and policy at Stanford University School of Medicine in Stanford, Calif.

Some panelists argued that the P value per se is not the problem and that it has its proper uses. A P value can sometimes be “more informative than an interval”—such as when “the predictor of interest is a multicategorical variable,” said Clarice Weinberg, Ph.D., who was not on the panel. “While it is true that P values are imperfect measures of the extent of evidence against the null hypothesis, confidence intervals have a host of problems of their own,” said Weinberg, deputy chief of the Biostatistics and Computational Biology Branch and principal investigator of the National Institute of Environmental Health Sciences in Research Triangle Park, N.C.

“If success is defined based on passing some magic threshold, biases may continue to exert their influence regardless of whether the threshold is defined by a P value, Bayes factor, false-discovery rate, or anything else.”

Beyond simple misinterpretation of the P value and the associated loss of information, authors consciously or unconsciously but routinely engage in data dredging (aka fishing, P -hacking) and selective reporting. “Any statistical technique can be misused and it can be manipulated especially after you see the data generated from the study,” Kramer said. “You can fish through a sea of data and find one positive finding and then convince yourself that even before you started your study that would have been the key hypothesis and it has a lot of plausibility to the investigator.”

In response to those practices and concerns about replicability in science, some journals have banned the P value and inferential statistics. Others, such as JNCI , require confidence intervals and effect sizes, which “convey what a P value does not: the magnitude and relative importance of an effect,” wrote panel member Regina Nuzzo, Ph.D., professor of mathematics and computer sciences at Gallaudet University in Washington, D.C. ( Nature 2014;506:150–2).

How can practice improve? Panel members emphasized the need for full reporting and transparency by authors as well as changes in statistics education. In his commentary, Don Berry, Ph.D., professor of biostatistics at the University of Texas M.D. Anderson Cancer Center in Houston, urged researchers to report every aspect of the study. “The specifics of data collection and curation and even your intentions and motivation are critical for inference. What have you not told the statistician? Have you deleted some data points or experimental units, possibly because they seemed to be outliers?” he wrote.

Kramer advised researchers to “consult a statistician when writing a grant application rather than after the study is finished; limit the number of hypotheses to be tested to a realistic number that doesn’t increase the false discovery rate; be conservative in interpreting the data; don’t consider P = 0.05 as a magic number; and whenever possible, provide confidence intervals.” He also suggested, “Webinars and symposia on this issue will be useful to clinical scientists and bench researchers because they’re often not trained in these principles.” As the ASA statement concludes, “No single index should substitute for scientific reasoning.”


What Can You Say When Your P-Value is Greater Than 0.05?

Topics: Hypothesis Testing , Statistics

P-values are frequently misinterpreted, which causes many problems. I won't rehash those problems here, since we have rebutted the concerns over p-values in Part 1. But the fact remains that the p-value will continue to be one of the most frequently used tools for deciding if a result is statistically significant.

While I generally like to believe that people want to be honest and objective—especially smart people who do research and analyze data that may affect other people's lives—here are 500 pieces of evidence that fly in the face of that belief.

We'll get back to that in a minute. But first, a quick review...

What's a P-Value, and How Do I Interpret It?

Most of us first encounter p-values when we conduct simple hypothesis tests, although they also are integral to many more sophisticated methods. Let's use Minitab Statistical Software to do a quick review of how they work (if you want to follow along and don't have Minitab, the full package is available free for 30 days ). We're going to compare fuel consumption for two different kinds of furnaces to see if there's a difference between their means. 

Go to File > Open Worksheet , and click the "Look in Minitab Sample Data Folder" button. Open the sample data set named Furnace.mtw , and choose Stat > Basic Statistics > 2 Sample t... from the menu. In the dialog box, enter "BTU.In" for Samples, and enter "Damper" for Sample IDs.

Press OK and Minitab returns the following output, in which I've highlighted the p-value. 

[Minitab 2-sample t-test output, with the p-value highlighted: 0.7]

In the majority of analyses, an alpha of 0.05 is used as the cutoff for significance. If the p-value is less than 0.05, we reject the null hypothesis that there's no difference between the means and conclude that a significant difference does exist. If the p-value is larger than 0.05, we cannot conclude that a significant difference exists. 

That's pretty straightforward, right?  Below 0.05, significant. Over 0.05, not significant. 
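If you wanted to reproduce this kind of analysis outside Minitab, a rough equivalent is a two-sample t-test in Python's scipy.stats. The BTU.In numbers below are hypothetical stand-ins, since the Furnace.mtw measurements aren't reproduced in this post:

```python
# Rough non-Minitab equivalent of the 2-sample t-test above, using SciPy.
# The BTU.In numbers are hypothetical stand-ins for the Furnace.mtw data.
from scipy import stats

btu_damper_1 = [9.2, 8.7, 10.1, 7.9, 9.5, 8.8, 10.4, 9.0]  # hypothetical
btu_damper_2 = [9.0, 9.6, 8.5, 10.2, 9.3, 8.9, 9.8, 9.1]   # hypothetical

# Welch's t-test (does not assume equal variances).
result = stats.ttest_ind(btu_damper_1, btu_damper_2, equal_var=False)
print(result.pvalue)  # compare against the 0.05 cutoff described above
```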


"Missed It By That Much!"

In the example above, the result is clear: a p-value of 0.7 is so much higher than 0.05 that you can't apply any wishful thinking to the results.  But what if your p-value is really, really close to 0.05?  

Like, what if you had a p-value of 0.06? 

That's not significant. 

Oh. Okay, what about 0.055?

Not significant. 

How about 0.051?

It's still not statistically significant, and data analysts should not try to pretend otherwise.  A p-value is not a negotiation: if p > 0.05, the results are not significant. Period.

So, what should I say when I get a p-value that's higher than 0.05?  

How about saying this? "The results were not statistically significant." If that's what the data tell you, there is nothing wrong with saying so. 

No Matter How Thin You Slice It, It's Still Baloney.

Which brings me back to the blog post  I referenced at the beginning. Do give it a read, but the bottom line is that the author cataloged 500 different ways that contributors to scientific journals have used language to obscure their results (or lack thereof). 

As a student of language, I confess I find the list fascinating...but also upsetting. It's not right: These contributors are educated people who certainly understand A) what a p-value higher than 0.05 signifies, and B) that manipulating words to soften that result is deliberately deceptive. Or, to put it in words that are less soft, it's a damned lie.

Nonetheless, it happens frequently. 

Here are just a few of my favorites of the 500 different ways people have reported results that were not significant, accompanied by the p-values to which these creative interpretations applied:  

  • a certain trend toward significance (p=0.08)
  • approached the borderline of significance (p=0.07)
  • at the margin of statistical significance (p<0.07)
  • close to being statistically significant (p=0.055)
  • fell just short of statistical significance (p=0.12)
  • just very slightly missed the significance level (p=0.086)
  • near-marginal significance (p=0.18)
  • only slightly non-significant (p=0.0738)
  • provisionally significant (p=0.073)

and my very favorite:

  • quasi-significant (p=0.09)

I'm not sure what "quasi-significant" is even supposed to mean, but it  sounds  quasi-important, as long as you don't think about it too hard. But there's still no getting around the fact that a p-value of 0.09 is not a statistically significant result. 

The blogger does not address the question of whether the opposite situation occurs. Do contributors ever write that a p-value of, say, 0.049999 is:

  • quasi-insignificant
  • only slightly significant
  • provisionally insignificant
  • just on the verge of being non-significant
  • at the margin of statistical non-significance

I'll go out on a limb and posit that describing a p-value just under 0.05 in ways that diminish its statistical significance just   doesn't happen . However, downplaying statistical non-significance would appear to be almost endemic. 

That's why I find the above-referenced post so disheartening. It's distressing that you can so easily gather so many examples of bad behavior by data analysts who almost certainly know better .

You would never use language to try to obscure the outcome of your analysis, would you?


What Is the Null Hypothesis & When Do You Reject the Null Hypothesis?


A null hypothesis is a statistical concept suggesting no significant difference or relationship between measured variables. It’s the default assumption unless empirical evidence proves otherwise.

The null hypothesis states no relationship exists between the two variables being studied (i.e., one variable does not affect the other).

The null hypothesis is the statement that a researcher or an investigator wants to disprove.

Testing the null hypothesis can tell you whether your results are due to the effects of manipulating the independent variable or due to random chance.

How to Write a Null Hypothesis

Null hypotheses (H0) start as research questions that the investigator rephrases as statements indicating no effect or relationship between the independent and dependent variables.

It is a default position that your research aims to challenge or confirm.

For example, if studying the impact of exercise on weight loss, your null hypothesis might be:

There is no significant difference in weight loss between individuals who exercise daily and those who do not.


When Do We Reject the Null Hypothesis?

We reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect. This often occurs when the p-value (probability of observing the data given the null hypothesis is true) is below a predetermined significance level.

If the collected data does not meet the expectation of the null hypothesis, a researcher can conclude that the data lacks sufficient evidence to back up the null hypothesis, and thus the null hypothesis is rejected. 

Rejecting the null hypothesis means that a relationship does exist between a set of variables and the effect is statistically significant (p ≤ 0.05).

If the data collected from the random sample are not statistically significant, then researchers fail to reject the null hypothesis and conclude that there is insufficient evidence of a relationship between the variables.

You need to perform a statistical test on your data in order to evaluate how consistent it is with the null hypothesis. A p-value is one statistical measurement used to validate a hypothesis against observed data.

Calculating the p-value is a critical part of null-hypothesis significance testing because it quantifies how strongly the sample data contradicts the null hypothesis.

The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
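To see how a test statistic maps to a p-value, here is a small illustrative sketch in Python. The z statistic of 2.17 is an invented value; for a two-tailed test, the p-value is the area in both tails of the reference distribution beyond |z|.

```python
from scipy.stats import norm

z = 2.17  # hypothetical test statistic
p_two_tailed = 2 * norm.sf(abs(z))  # total area in both tails beyond |z|

print(f"p = {p_two_tailed:.4f}")  # about 0.030, below the common 0.05 cutoff
```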

Usually, a researcher uses a confidence level of 95% or 99% (a significance level of 0.05 or 0.01) as a general guideline for deciding whether to reject or keep the null.

When your p-value is less than or equal to your significance level, you reject the null hypothesis.

In other words, smaller p-values are taken as stronger evidence against the null hypothesis. Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis.

In this case, the sample data provide insufficient evidence to conclude that the effect exists in the population.

Because you can never know with complete certainty whether there is an effect in the population, your inferences about a population will sometimes be incorrect.

When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s called a type II error.
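A quick way to see what a type I error rate means in practice is a simulation sketch (Python with SciPy, assumed parameters) in which the null hypothesis is true by construction, so every rejection at α = 0.05 is a type I error. Over many simulated experiments, roughly 5% reject anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
alpha = 0.05
n_experiments = 10_000
false_rejections = 0

for _ in range(n_experiments):
    # Both samples come from the same population, so H0 is true
    a = rng.normal(loc=0, scale=1, size=30)
    b = rng.normal(loc=0, scale=1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p <= alpha:
        false_rejections += 1  # a type I error

print(f"Type I error rate: {false_rejections / n_experiments:.3f}")  # close to 0.05
```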

Why Do We Never Accept The Null Hypothesis?

The reason we do not say “accept the null” is because we are always assuming the null hypothesis is true and then conducting a study to see if there is evidence against it. And, even if we don’t find evidence against it, a null hypothesis is not accepted.

A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. 

It is risky to conclude that the null hypothesis is true merely because we did not find evidence to reject it. It is always possible that researchers elsewhere have disproved the null hypothesis, so we cannot accept it as true, but instead, we state that we failed to reject the null. 

One can either reject the null hypothesis, or fail to reject it, but can never accept it.

Why Do We Use The Null Hypothesis?

We can never prove with 100% certainty that a hypothesis is true; we can only collect evidence that supports a theory. However, testing a hypothesis can set the stage for rejecting or retaining it within a certain confidence level.

The null hypothesis is useful because it can tell us whether the results of our study are due to random chance or the manipulation of a variable (with a certain level of confidence).

A null hypothesis is rejected if the observed data would be highly unlikely to occur under it, and retained (not rejected) if the observed outcome is consistent with the position it holds.

Rejecting the null hypothesis sets the stage for further experimentation to see if a relationship between two variables exists. 

Hypothesis testing is a critical part of the scientific method as it helps decide whether the results of a research study support a particular theory about a given population. Hypothesis testing is a systematic way of backing up researchers’ predictions with statistical analysis.

It helps provide sufficient statistical evidence that either favors or rejects a certain hypothesis about the population parameter. 

Purpose of a Null Hypothesis 

  • The primary purpose of the null hypothesis is to provide a precise, testable assumption that the data can potentially disprove.
  • Whether rejected or retained, the null hypothesis can help further progress a theory in many scientific cases.
  • A null hypothesis can be used to ascertain how consistent the outcomes of multiple studies are.

Do you always need both a Null Hypothesis and an Alternative Hypothesis?

The null (H0) and alternative (Ha or H1) hypotheses are two competing claims that describe the effect of the independent variable on the dependent variable. They are mutually exclusive, which means that only one of the two hypotheses can be true. 

While the null hypothesis states that there is no effect in the population, an alternative hypothesis states that there is an effect or relationship between the two variables.

The goal of hypothesis testing is to make inferences about a population based on a sample. In order to undertake hypothesis testing, you must express your research hypothesis as a null and alternative hypothesis. Both hypotheses are required to cover every possible outcome of the study. 

What is the difference between a null hypothesis and an alternative hypothesis?

The alternative hypothesis is the complement to the null hypothesis. The null hypothesis states that there is no effect or no relationship between variables, while the alternative hypothesis claims that there is an effect or relationship in the population.

It is the claim that you expect or hope will be true. The null hypothesis and the alternative hypothesis are always mutually exclusive, meaning that only one can be true at a time.

What are some problems with the null hypothesis?

One major problem with the null hypothesis is that researchers typically assume that failing to reject the null means the experiment has failed. However, either outcome of a hypothesis test is a positive result: even if the null is not refuted, the researchers still learn something new.

Why can a null hypothesis not be accepted?

We can either reject or fail to reject a null hypothesis, but never accept it. If your test fails to detect an effect, this is not proof that the effect doesn’t exist. It just means that your sample did not have enough evidence to conclude that it exists.

We can’t accept a null hypothesis because a lack of evidence against it does not prove that an effect doesn’t exist. Instead, we fail to reject it.

Failing to reject the null indicates that the sample did not provide sufficient evidence to conclude that an effect exists.

If the p-value is greater than the significance level, then you fail to reject the null hypothesis.


Science News

How the strange idea of ‘statistical significance’ was born.

A mathematical ritual has led researchers astray for decades

Research in social and biomedical science often uses a statistical method known as null hypothesis testing to determine whether results are “statistically significant.” A P value less than 0.05 is considered significant.

By Bruce Bower

August 12, 2021 at 12:00 pm

In the middle of the 20th century, the field of psychology had a problem. In the wake of the Manhattan Project and in the early days of the space race, the so-called “hard sciences” were producing tangible, highly publicized results. Psychologists and other social scientists looked on enviously. Their results were squishy and difficult to quantify.

Psychologists in particular wanted a statistical skeleton key to unlock true experimental insights. It was an unrealistic burden to place on statistics, but the longing for a mathematical seal of approval burned hot. So psychology textbook writers and publishers created one, and called it statistical significance.

By calculating just one number from their experimental results, called a P value, researchers could now deem those results “statistically significant.” That was all it took to claim — even if mistakenly — that an interesting and powerful effect had been demonstrated. The idea took off, and soon legions of researchers were reporting statistically significant results.


To make matters worse, psychology journals began to publish papers only if they reported statistically significant findings, prompting a surprisingly large number of investigators to massage their data — either by gaming the system or cheating — to get below the P value of 0.05 that granted that status. Inevitably, bogus findings and chance associations began to proliferate.

As editor of a journal called Memory & Cognition from 1993 to 1997, Geoffrey Loftus of the University of Washington tried valiantly to yank psychologists out of their statistical rut. At the start of his tenure, Loftus published an editorial telling researchers to stop mindlessly calculating whether experimental results are statistically significant or not (SN: 5/16/13). That common practice impeded scientific progress, he warned.

Keep it simple, Loftus advised. Remember that a picture is worth a thousand reckonings of statistical significance. In that spirit, he recommended reporting straightforward averages to compare groups of volunteers in a psychology experiment. Graphs could show whether individuals’ scores covered a broad range or clumped around the average, enabling a calculation of whether the average score would likely change a little or a lot in a repeat study. In this way, researchers could evaluate, say, whether volunteers scored better on a difficult math test if first allowed to write about their thoughts and feelings for 10 minutes, versus sitting quietly for 10 minutes.

Loftus might as well have tried to lasso a runaway train. Most researchers kept right on touting the statistical significance of their results.

“Significance testing is all about how the world isn’t and says nothing about how the world is,” Loftus later said when looking back on his attempt to change how psychologists do research.


What’s remarkable is not only that mid-20th century psychology textbook writers and publishers fabricated significance testing out of a mishmash of conflicting statistical techniques (SN: 6/7/97). It’s also that their weird creation was embraced by many other disciplines over the next few decades. It didn’t matter that eminent statisticians and psychologists panned significance testing from the start. The concocted calculation proved highly popular in social sciences, biomedical and epidemiological research, neuroscience and biological anthropology.

A human hunger for certainty fueled that academic movement. Lacking unifying theories to frame testable predictions, scientists studying the mind and other human-related topics rallied around a statistical routine. Repeating the procedure provided a false but comforting sense of having tapped into the truth. Known formally as null hypothesis significance testing, the practice assumes a null hypothesis (no difference, or no correlation, between experimental groups on measures of interest) and then rejects that hypothesis if the P value for observed data came out to less than 5 percent (P < .05).

A P value is the probability of an observed (or more extreme) result arising from chance alone, assuming the null hypothesis is true.

The problem is that slavishly performing this procedure absolves researchers of having to develop theories that make specific, falsifiable predictions — the fundamental elements of good science. Rejecting a null hypothesis doesn’t tell an investigator anything new. It only creates an opportunity to speculate about why an effect might have occurred. Statistically significant results are rarely used as a launching pad for testing alternative explanations of those findings.

Psychologist Gerd Gigerenzer, director of the Harding Risk Literacy Center in Berlin, considers it more accurate to call null hypothesis significance testing “the null ritual.”

Here’s an example of the null ritual in action. A 2012 study published in Science concluded that volunteers’ level of religious belief declined after viewing pictures of Auguste Rodin’s statue The Thinker, in line with an idea that mental reflection causes people to question their faith in supernatural entities. In this study, the null hypothesis predicted that volunteers’ religious beliefs would stay the same, on average, after seeing The Thinker, assuming that the famous sculpture has no effect on viewers’ spiritual convictions.

The null ritual dictated that the researchers calculate whether group differences in religious beliefs before and after perusing the statue would have occurred by chance in no more than one out of 20 trials, or no more than 5 percent of the time. That’s what P < .05 means. By meeting that threshold, the result was tagged statistically significant, and not likely due to mere chance.

If that sounds reasonable, hold on. Even after meeting an arbitrary 5 percent threshold for statistical significance, the study hadn’t demonstrated that statue viewers were losing their religion. Researchers could only conjecture about why that might be the case, because the null ritual forced them to assume that there is no effect. Talk about running in circles.

To top it off, an independent redo of The Thinker study found no statistically significant decline in religious beliefs among viewers of the pensive statue. Frequent failures to confirm statistically significant results have triggered a crisis of confidence in sciences wedded to the null ritual (SN: 8/27/18).

Some journals now require investigators to fork over their research designs and experimental data before submitting research papers for peer review. The goal is to discourage data fudging and to up the odds of publishing results that can be confirmed by other researchers.

Only 6 of 53 ‘landmark’ cancer studies (about 11 percent) could be replicated by a biotechnology firm over a decade.

But the real problem lies in the null ritual itself, Gigerenzer says. In the early 20 th century, and without ever calculating the statistical significance of anything, Wolfgang Köhler developed Gestalt laws of perception, Jean Piaget formulated a theory of how thinking develops in children and Ivan Pavlov discovered principles of classical conditioning. Those pioneering scientists typically studied one or a handful of individuals using the types of simple statistics endorsed decades later by Loftus.

From 1940 to 1955, psychologists concerned with demonstrating the practical value of their field, especially to educators, sought an objective tool for telling real from chance findings. Rather than acknowledging that conflicting statistical approaches existed, psychology textbook writers and publishers mashed those methods into the one-size-fits-all P value, Gigerenzer says.

One inspiration for the null ritual came from British statistician Ronald Fisher. Starting in the 1930s, Fisher devised a type of significance testing to analyze the likelihood of a null hypothesis, which a researcher could propose as either an effect or no effect. Fisher wanted to calculate the exact statistical significance associated with, say, using a particular fertilizer deemed promising for crop yields.

Around the same time, statisticians Jerzy Neyman and Egon Pearson argued that testing a single null hypothesis is useless. Instead, they insisted on determining which of at least two alternative hypotheses best explained experimental results. Neyman and Pearson calculated an experiment’s probability of accepting a hypothesis that’s actually true, something left unexamined in Fisher’s null hypothesis test.

Psychologists’ null ritual folded elements of both approaches into a confusing hodge-podge. Researchers often don’t realize that statistically significant results don’t prove that a true effect has been discovered.

And about half of surveyed medical, biological and psychological researchers wrongly assume that finding no statistical significance in a study means that there was no actual effect. A closer analysis may reveal findings consistent with a real effect, especially when the original results fell just short of the arbitrary cutoff for statistical significance.

Statistical errors

A study of German psychology professors and students found that most agreed with at least one false statement about the meaning of a P value. [Chart: frequency of misconceptions about P values among psychology students, professors who teach statistics, and professors who do not teach statistics]

It’s well past time to dump the null ritual, says psychologist and applied statistician Richard Morey of Cardiff University, Wales. Researchers need to focus on developing theories of mind and behavior that lead to testable predictions. In that brave new scientific world, investigators will choose which of many statistical tools best suits their needs. “Statistics offer ways to figure out how to doubt what you’re seeing,” Morey says.

There’s no doubt that the illusion of finding truth in statistical significance still appeals to researchers in many fields. Morey hopes that, perhaps within a few decades, the null ritual’s reign of errors will end.


What is the significance of a P-value of 0.000?

A P-value of 0.000 indicates that there is a very low probability of obtaining the observed results by chance alone. This suggests that the results are strongly statistically significant and unlikely to have occurred through random sampling error. In other words, the results are highly unlikely to be a coincidence and provide strong evidence against the null hypothesis. Such a low P-value suggests that the observed effect is likely to be real and not due to chance, making it a valuable measure in statistical analysis.

How to Interpret a P-Value of 0.000

When you run a statistical test, whether it’s a chi-square test, a test for a population mean, a test for a population proportion, a linear regression, or any other test, you’re often interested in the resulting p-value from that test.

A p-value simply tells you the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.

If the p-value is less than the significance level, we reject the null hypothesis.

So, when you get a p-value of 0.000, you should compare it to the significance level. Common significance levels include 0.1, 0.05, and 0.01.

Since 0.000 is lower than all of these significance levels, we would reject the null hypothesis in each case.

Let’s walk through an example to clear things up.

Example: Getting a P-Value of 0.000

A factory claims that they produce tires that each weigh 200 pounds.

An auditor comes in and tests the null hypothesis that the mean weight of a tire is 200 pounds against the alternative hypothesis that the mean weight of a tire is not 200 pounds, using a 0.05 level of significance.

The null hypothesis (H0): μ = 200

The alternative hypothesis (Ha): μ ≠ 200

Upon conducting a hypothesis test for a mean, the auditor gets a p-value of 0.000.

Since the p-value of 0.000 is less than the significance level of 0.05, the auditor rejects the null hypothesis.

Thus, he concludes that there is sufficient evidence to say that the true average weight of a tire is not 200 pounds.
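Here is a minimal sketch of the auditor's test in Python with SciPy. The tire weights are invented so that the sample mean sits far from 200 pounds, which is what drives the p-value toward zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
tire_weights = rng.normal(loc=190, scale=3, size=50)  # hypothetical sample of 50 tires

# H0: mu = 200 vs Ha: mu != 200 (ttest_1samp is two-sided by default)
t_stat, p_value = stats.ttest_1samp(tire_weights, popmean=200)

print(f"t = {t_stat:.2f}, p = {p_value:.3g}")  # p is astronomically small
```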

What a P-Value of 0.000 Means

Whether you use Microsoft Excel, a TI-84 calculator, SPSS, or some other software to compute the p-value of a statistical test, often the p-value is not exactly 0.000, but rather something extremely small like 0.000000000023.

Most software displays only three decimal places, though, which is why the p-value shows up as 0.000.

If you conduct a statistical test using a significance level of 0.1, 0.05, or 0.01 (or any significance level greater than 0.000) and get a p-value of 0.000, then reject the null hypothesis.
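A short sketch of that display artifact, using a made-up tiny p-value: formatted to three decimal places it prints as 0.000, but in scientific notation its true (nonzero) size is visible.

```python
p_value = 2.3e-11  # hypothetical tiny p-value from some test

print(f"{p_value:.3f}")  # -> 0.000, as most software displays it
print(f"{p_value:.2e}")  # -> 2.30e-11: very small, but not zero
print(p_value <= 0.05)   # -> True, so reject H0 at any common alpha
```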

