How to conduct a meta-analysis in eight steps: a practical guide

  • Open access
  • Published: 30 November 2021
  • Volume 72, pages 1–19 (2022)


  • Christopher Hansen 1 ,
  • Holger Steinmetz 2 &
  • Jörn Block 3 , 4 , 5  


1 Introduction

“Scientists have known for centuries that a single study will not resolve a major issue. Indeed, a small sample study will not even resolve a minor issue. Thus, the foundation of science is the cumulation of knowledge from the results of many studies.” (Hunter et al. 1982 , p. 10)

Meta-analysis is a central method for knowledge accumulation in many scientific fields (Aguinis et al. 2011c; Kepes et al. 2013). Similar to a narrative review, it serves as a synopsis of a research question or field. However, going beyond a narrative summary of key findings, a meta-analysis adds value in providing a quantitative assessment of the relationship between two target variables or the effectiveness of an intervention (Gurevitch et al. 2018). It can also be used to test competing theoretical assumptions against each other or to identify important moderators that explain why the results of primary studies differ (Aguinis et al. 2011b; Bergh et al. 2016). Rooted in the synthesis of the effectiveness of medical and psychological interventions in the 1970s (Glass 2015; Gurevitch et al. 2018), meta-analysis is nowadays also an established method in management research and related fields.

The increasing importance of meta-analysis in management research has resulted in the publication of guidelines in recent years that discuss the merits and best practices in various fields, such as general management (Bergh et al. 2016; Combs et al. 2019; Gonzalez-Mulé and Aguinis 2018), international business (Steel et al. 2021), economics and finance (Geyer-Klingeberg et al. 2020; Havranek et al. 2020), marketing (Eisend 2017; Grewal et al. 2018), and organizational studies (DeSimone et al. 2020; Rudolph et al. 2020). These articles discuss existing and emerging methods and propose solutions for frequently encountered problems. This editorial briefly summarizes the insights of these papers; provides a workflow of the essential steps in conducting a meta-analysis; suggests state-of-the-art methodological procedures; and points to other articles for in-depth investigation. Thus, this article has two goals: (1) based on the findings of previous editorials and methodological articles, it defines methodological recommendations for meta-analyses submitted to Management Review Quarterly (MRQ); and (2) it serves as a practical guide for researchers who have little experience with meta-analysis as a method but plan to conduct one in the future.

2 Eight steps in conducting a meta-analysis

2.1 Step 1: defining the research question

The first step in conducting a meta-analysis, as with any other empirical study, is the definition of the research question. Most importantly, the research question determines the realm of constructs to be considered or the type of interventions whose effects shall be analyzed. When defining the research question, two challenges may arise. First, when defining an adequate study scope, researchers must consider that the number of publications has grown exponentially in many fields of research in recent decades (Fortunato et al. 2018). On the one hand, a larger number of studies increases the potentially relevant literature basis and enables researchers to conduct meta-analyses. On the other hand, screening a large number of potentially relevant studies can result in an unmanageable workload. Thus, Steel et al. (2021) highlight the importance of balancing manageability and relevance when defining the research question. Second, like the number of primary studies, the number of meta-analyses in management research has also grown strongly in recent years (Geyer-Klingeberg et al. 2020; Rauch 2020; Schwab 2015). Therefore, it is likely that one or several meta-analyses already exist for many topics of high scholarly interest. However, this should not deter researchers from investigating their research questions. One possibility is to consider moderators or mediators of a relationship that have previously been ignored. For example, a meta-analysis about startup performance could investigate the impact of different ways to measure the performance construct (e.g., growth vs. profitability vs. survival time) or certain characteristics of the founders as moderators. Another possibility is to replicate previous meta-analyses and test whether their findings can be confirmed with an updated sample of primary studies or newly developed methods. Frequent replications and updates of meta-analyses are important contributions to cumulative science and are increasingly called for by the research community (Anderson and Kichkha 2017; Steel et al. 2021). Consistent with its focus on replication studies (Block and Kuckertz 2018), MRQ therefore also invites authors to submit replication meta-analyses.

2.2 Step 2: literature search

2.2.1 Search strategies

Similar to conducting a literature review, the search process of a meta-analysis should be systematic, reproducible, and transparent, resulting in a sample that includes all relevant studies (Fisch and Block 2018; Gusenbauer and Haddaway 2020). There are several strategies for identifying relevant primary studies when compiling meta-analytical datasets (Harari et al. 2020). First, previous meta-analyses on the same or a related topic may provide lists of included studies that offer a good starting point to identify and become familiar with the relevant literature. This practice is also applicable to topic-related literature reviews, which often summarize the central findings of the reviewed articles in systematic tables. Both article types likely include the most prominent studies of a research field. The most common and important search strategy, however, is a keyword search in electronic databases (Harari et al. 2020). This strategy will probably yield the largest number of relevant studies, particularly so-called ‘grey literature’, which may not be covered by literature reviews. Gusenbauer and Haddaway (2020) provide a detailed overview of 34 scientific databases, of which 18 are multidisciplinary or have a focus on management sciences, along with their suitability for literature synthesis. To prevent biased results due to the scope or journal coverage of a single database, researchers should use at least two different databases (DeSimone et al. 2020; Martín-Martín et al. 2021; Mongeon and Paul-Hus 2016). However, a database search can easily lead to an overload of potentially relevant studies. For example, key term searches in Google Scholar for “entrepreneurial intention” and “firm diversification” resulted in more than 660,000 and 810,000 hits, respectively (Footnote 1). Therefore, a precise research question and precise search terms using Boolean operators are advisable (Gusenbauer and Haddaway 2020). To address the challenge of identifying relevant articles among the growing number of publications, (semi)automated approaches using text mining and machine learning (Bosco et al. 2017; O’Mara-Eves et al. 2015; Ouzzani et al. 2016; Thomas et al. 2017) can also be promising and time-saving search tools. In addition, some electronic databases offer the possibility to track forward citations of influential studies and thereby identify further relevant articles. Finally, collecting unpublished or undetected studies through conferences, personal contact with (leading) scholars, or listservs can be strategies to increase the study sample size (Grewal et al. 2018; Harari et al. 2020; Pigott and Polanin 2020).

2.2.2 Study inclusion criteria and sample composition

Next, researchers must decide which studies to include in the meta-analysis. Some guidelines for literature reviews recommend limiting the sample to studies published in renowned academic journals to ensure the quality of findings (e.g., Kraus et al. 2020). For meta-analysis, however, Steel et al. (2021) advocate the inclusion of all available studies, including grey literature, to prevent selection biases based on availability, cost, familiarity, and language (Rothstein et al. 2005), or the “Matthew effect”, which denotes the phenomenon that highly cited articles are found faster than less cited articles (Merton 1968). Harrison et al. (2017) find that the effects reported in published management studies are inflated on average by 30% compared to unpublished studies. This so-called publication bias or “file drawer problem” (Rosenthal 1979) results from academia’s preference for publishing statistically significant results over insignificant ones. Owen and Li (2021) show that publication bias is particularly severe when variables of interest are used as key variables rather than control variables. To capture the true effect size of a target variable or relationship, the inclusion of all types of research outputs is therefore recommended (Polanin et al. 2016). Different test procedures to identify publication bias are discussed subsequently in Step 7.

In addition to the decision of whether to include certain study types (i.e., published vs. unpublished studies), there can be other reasons to exclude studies that are identified in the search process. These reasons can be manifold and are primarily related to the specific research question and methodological peculiarities. For example, studies identified by keyword search might not qualify thematically after all, may use unsuitable variable measurements, or may not report usable effect sizes. Furthermore, there might be multiple studies by the same authors using similar datasets. If they do not differ sufficiently in terms of their sample characteristics or variables used, only one of these studies should be included to prevent bias from duplicates (Wood 2008 ; see this article for a detection heuristic).

In general, the screening process should be conducted stepwise, beginning with a removal of duplicate citations from different databases, followed by abstract screening to exclude clearly unsuitable studies and a final full-text screening of the remaining articles (Pigott and Polanin 2020 ). A graphical tool to systematically document the sample selection process is the PRISMA flow diagram (Moher et al. 2009 ). Page et al. ( 2021 ) recently presented an updated version of the PRISMA statement, including an extended item checklist and flow diagram to report the study process and findings.

2.3 Step 3: choice of the effect size measure

2.3.1 Types of effect sizes

The two most common meta-analytical effect size measures in management studies are (z-transformed) correlation coefficients and standardized mean differences (Aguinis et al. 2011a; Geyskens et al. 2009). However, meta-analyses in management science and related fields are not limited to these two effect size measures; the appropriate measure depends on the subfield of investigation (Borenstein 2009; Stanley and Doucouliagos 2012). In economics and finance, researchers are more interested in elasticities and marginal effects extracted from regression models than in pure bivariate correlations (Stanley and Doucouliagos 2012). Regression coefficients can also be converted to partial correlation coefficients based on their t-statistics to make regression results comparable across studies (Stanley and Doucouliagos 2012). Although some meta-analyses in management research have combined bivariate and partial correlations in their study samples, Aloe (2015) and Combs et al. (2019) advise researchers against this practice. Most importantly, they argue that the effect size strength of partial correlations depends on the other variables included in the regression model and is therefore not comparable to bivariate correlations (Schmidt and Hunter 2015), resulting in a possible bias of the meta-analytic results (Roth et al. 2018). We endorse this opinion. If at all, we recommend separate analyses for each measure. In addition to these measures, survival rates, risk ratios or odds ratios, which are common measures in medical research (Borenstein 2009), can be suitable effect sizes for specific management research questions, such as understanding the determinants of the survival of startup companies. In sum, the choice of a suitable effect size is rarely left entirely to the researcher: it is typically dictated by the research question and the conventions of the specific research field (Cheung and Vijayakumar 2016).
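To illustrate the conversion mentioned above, a partial correlation can be recovered from a regression coefficient’s t-statistic and its residual degrees of freedom (Stanley and Doucouliagos 2012). The following minimal sketch uses R, in line with the software recommended in Step 5; the function names and the simple variance approximation are illustrative, not taken from the original article:

    # Partial correlation from a reported t-statistic and residual degrees of freedom;
    # the sign of the partial correlation follows the sign of the coefficient.
    partial_r <- function(t, df) t / sqrt(t^2 + df)

    # One common approximation of its sampling variance when meta-analyzing
    # partial correlations (assumption: this simple approximation suffices here).
    partial_r_var <- function(r_p, df) (1 - r_p^2)^2 / df

    partial_r(t = 2.5, df = 120)   # approximately 0.22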

2.3.2 Conversion of effect sizes to a common measure

After the primary effect size measure for the meta-analysis has been defined, it may become necessary in the later coding process to convert study findings that are reported in other effect size metrics. For example, a study might report only descriptive statistics for two study groups but no correlation coefficient, even though the correlation is used as the primary effect size measure in the meta-analysis. Different effect size measures can be harmonized using conversion formulae, which are provided by standard method books such as Borenstein et al. (2009) or Lipsey and Wilson (2001). Online effect size calculators for meta-analysis also exist (Footnote 2).
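As an example of such a conversion, a standardized mean difference reported for two groups can be transformed into a (point-biserial) correlation using the formulas collected in Borenstein et al. (2009). A minimal sketch in R; the variable names are illustrative:

    # Convert a standardized mean difference (Cohen's d) to a correlation r,
    # with a correction factor for unequal group sizes (Borenstein et al. 2009).
    d_to_r <- function(d, n1, n2) {
      a <- (n1 + n2)^2 / (n1 * n2)
      d / sqrt(d^2 + a)
    }

    d_to_r(d = 0.5, n1 = 50, n2 = 50)   # approximately 0.24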

2.4 Step 4: choice of the analytical method used

Choosing which meta-analytical method to use is directly connected to the research question of the meta-analysis. Research questions in meta-analyses can address a relationship between constructs or an effect of an intervention in a general manner, or they can focus on moderating or mediating effects. Four meta-analytical methods are primarily used in contemporary management research (Combs et al. 2019; Geyer-Klingeberg et al. 2020) and allow the investigation of these different types of research questions: traditional univariate meta-analysis, meta-regression, meta-analytic structural equation modeling, and qualitative meta-analysis (Hoon 2013). While the first three are quantitative, the last summarizes qualitative findings. Table 1 summarizes the key characteristics of the three quantitative methods.

2.4.1 Univariate meta-analysis

In its traditional form, a meta-analysis reports a weighted mean effect size for the relationship or intervention under investigation and provides information on the magnitude of variance among primary studies (Aguinis et al. 2011c; Borenstein et al. 2009). Accordingly, it serves as a quantitative synthesis of a research field (Borenstein et al. 2009; Geyskens et al. 2009). Prominent traditional approaches have been developed, for example, by Hedges and Olkin (1985) or Hunter and Schmidt (1990, 2004). However, going beyond its simple summary function, the traditional approach has limitations in explaining the observed variance among findings (Gonzalez-Mulé and Aguinis 2018). To identify moderators (or boundary conditions) of the relationship of interest, meta-analysts can create subgroups and investigate differences between those groups (Borenstein and Higgins 2013; Hunter and Schmidt 2004). Potential moderators can be study characteristics (e.g., whether a study is published vs. unpublished), sample characteristics (e.g., study country, industry focus, or type of survey/experiment participants), or measurement artifacts (e.g., different types of variable measurements). The univariate approach is thus suitable to identify the overall direction of a relationship and can serve as a good starting point for additional analyses. However, due to its limitations in examining boundary conditions and developing theory, the univariate approach on its own is now often viewed as insufficient (Rauch 2020; Shaw and Ertug 2017).
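As a brief illustration, the R package metafor (recommended in Step 5) can pool z-transformed correlations under a random-effects model. The data frame below is hypothetical; only the escalc() and rma() calls reflect the actual metafor workflow:

    library(metafor)

    # Hypothetical coding sheet with one correlation per study (illustrative values)
    dat <- data.frame(
      study     = paste0("Study ", 1:6),
      ri        = c(0.21, 0.35, 0.08, 0.27, 0.15, 0.30),  # coded correlations
      ni        = c(120, 80, 200, 150, 95, 240),          # sample sizes
      published = c(1, 1, 0, 1, 0, 1),                    # study characteristic
      firm_age  = c(12, 35, 8, 20, 15, 27)                # continuous sample characteristic
    )

    # Compute Fisher z-transformed correlations (yi) and their sampling variances (vi)
    dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)

    # Random-effects model for the pooled effect size
    res <- rma(yi, vi, data = dat, method = "REML")
    summary(res)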

2.4.2 Meta-regression analysis

Meta-regression analysis (Hedges and Olkin 1985; Lipsey and Wilson 2001; Stanley and Jarrell 1989) aims to investigate the heterogeneity among observed effect sizes by testing multiple potential moderators simultaneously. In meta-regression, the coded effect size is used as the dependent variable and is regressed on a list of moderator variables. These moderator variables can be categorical variables, as described previously for the traditional univariate approach, or (semi)continuous variables such as country scores that are merged with the meta-analytical data. Thus, meta-regression analysis overcomes a disadvantage of the traditional approach, which only allows moderators to be investigated one at a time using dichotomized subgroups (Combs et al. 2019; Gonzalez-Mulé and Aguinis 2018). These possibilities allow a more fine-grained analysis of research questions related to moderating effects. However, Schmidt (2017) critically notes that the number of effect sizes in the meta-analytical sample must be sufficiently large to produce reliable results when investigating multiple moderators simultaneously in a meta-regression. For further reading, Tipton et al. (2019) outline the technical, conceptual, and practical developments of meta-regression over the last decades. Gonzalez-Mulé and Aguinis (2018) provide an overview of methodological choices and develop evidence-based best practices for future meta-analyses in management using meta-regression.
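Continuing the hypothetical example from above, a meta-regression in metafor simply adds moderators to the model formula; the moderator names are again illustrative:

    # Mixed-effects meta-regression: effect sizes regressed on study-level moderators
    res_mr <- rma(yi, vi,
                  mods = ~ factor(published) + firm_age,  # categorical and continuous moderators
                  data = dat, method = "REML")
    summary(res_mr)  # moderator coefficients, omnibus QM test, residual heterogeneity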

2.4.3 Meta-analytic structural equation modeling (MASEM)

MASEM combines meta-analysis and structural equation modeling and allows researchers to simultaneously investigate the relationships among several constructs in a path model. Researchers can use MASEM to test several competing theoretical models against each other or to identify mediation mechanisms in a chain of relationships (Bergh et al. 2016). This method is typically performed in two steps (Cheung and Chan 2005): in Step 1, a pooled correlation matrix is derived, which includes the meta-analytical mean effect sizes for all variable combinations; Step 2 then uses this matrix to fit the path model. While MASEM relied primarily on traditional univariate meta-analysis to derive the pooled correlation matrix in its early years (Viswesvaran and Ones 1995), more advanced methods, such as the GLS approach (Becker 1992, 1995) or the TSSEM approach (Cheung and Chan 2005), have subsequently been developed. Cheung (2015a) and Jak (2015) provide overviews of these approaches in their books, with exemplary code. For datasets with more complex data structures, Wilson et al. (2016) also developed a multilevel approach that is related to the TSSEM approach in the second step. Bergh et al. (2016) discuss nine decision points and develop best practices for MASEM studies.
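As a minimal sketch of the two-stage (TSSEM) approach, the R package metaSEM (Cheung 2015b) pools a set of correlation matrices in stage 1; stage 2 then fits the hypothesized model on the pooled matrix. The example below uses the Digman97 dataset shipped with the package and is meant only as an entry point, not as a full MASEM analysis:

    library(metaSEM)

    # Stage 1 of TSSEM: pool the studies' correlation matrices under a random-effects model
    stage1 <- tssem1(Cov = Digman97$data, n = Digman97$n,
                     method = "REM", RE.type = "Diag")
    summary(stage1)   # pooled correlation matrix and between-study heterogeneity

    # Stage 2 would fit the hypothesized path or factor model on this pooled matrix
    # via tssem2(), with the model specified in RAM notation (see Cheung 2015a; Jak 2015).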

2.4.4 Qualitative meta-analysis

While the approaches explained above focus on quantitative outcomes of empirical studies, qualitative meta-analysis aims to synthesize qualitative findings from case studies (Hoon 2013 ; Rauch et al. 2014 ). The distinctive feature of qualitative case studies is their potential to provide in-depth information about specific contextual factors or to shed light on reasons for certain phenomena that cannot usually be investigated by quantitative studies (Rauch 2020 ; Rauch et al. 2014 ). In a qualitative meta-analysis, the identified case studies are systematically coded in a meta-synthesis protocol, which is then used to identify influential variables or patterns and to derive a meta-causal network (Hoon 2013 ). Thus, the insights of contextualized and typically nongeneralizable single studies are aggregated to a larger, more generalizable picture (Habersang et al. 2019 ). Although still the exception, this method can thus provide important contributions for academics in terms of theory development (Combs et al., 2019 ; Hoon 2013 ) and for practitioners in terms of evidence-based management or entrepreneurship (Rauch et al. 2014 ). Levitt ( 2018 ) provides a guide and discusses conceptual issues for conducting qualitative meta-analysis in psychology, which is also useful for management researchers.

2.5 Step 5: choice of software

Software solutions to perform meta-analyses range from built-in functions or additional packages of statistical software to software purely focused on meta-analyses and from commercial to open-source solutions. However, in addition to personal preferences, the choice of the most suitable software depends on the complexity of the methods used and the dataset itself (Cheung and Vijayakumar 2016 ). Meta-analysts therefore must carefully check if their preferred software is capable of performing the intended analysis.

Among commercial software providers, Stata (from version 16 on) offers built-in commands to perform various meta-analytical procedures and to produce the associated plots (Palmer and Sterne 2016). For SPSS and SAS, there exist several macros for meta-analyses provided by scholars, such as David B. Wilson or Andy P. Field and Raphael Gillett (Field and Gillett 2010) (Footnotes 3 and 4). For researchers using the open-source software R (R Core Team 2021), Polanin et al. (2017) provide an overview of 63 meta-analysis packages and their functionalities. For new users, they recommend the package metafor (Viechtbauer 2010), which includes most necessary functions and for which the author Wolfgang Viechtbauer provides tutorials on his project website (Footnotes 5 and 6). In addition to packages and macros for statistical software, templates for Microsoft Excel have also been developed to conduct simple meta-analyses, such as Meta-Essentials by Suurmond et al. (2017) (Footnote 7). Finally, programs purely dedicated to meta-analysis also exist, such as Comprehensive Meta-Analysis (Borenstein et al. 2013) or RevMan by The Cochrane Collaboration (2020).

2.6 Step 6: coding of effect sizes

2.6.1 Coding sheet

The first step in the coding process is the design of the coding sheet. A universal template does not exist because the design of the coding sheet depends on the methods used, the respective software, and the complexity of the research design. For univariate meta-analysis or meta-regression, data are typically coded in wide format. In its simplest form, when investigating a correlational relationship between two variables using the univariate approach, the coding sheet would contain a column for the study name or identifier, the effect size coded from the primary study, and the study sample size. However, such simple relationships are unlikely in management research because the included studies are typically not identical but differ in several respects. With more complex data structures or moderator variables being investigated, additional columns are added to the coding sheet to reflect the data characteristics. These variables can be coded as dummy, factor, or (semi)continuous variables and later used to perform a subgroup analysis or meta-regression. For MASEM, the required data input format can deviate depending on the method used (e.g., TSSEM requires a list of correlation matrices as data input). For qualitative meta-analysis, the coding scheme typically summarizes the key qualitative findings and important contextual and conceptual information (see Hoon (2013) for a coding scheme for qualitative meta-analysis). Figure 1 shows an exemplary coding scheme for a quantitative meta-analysis on the correlational relationship between top-management team diversity and profitability. In addition to effect and sample sizes, information about the study country, firm type, and variable operationalizations is coded. The list could be extended by further study and sample characteristics.

Fig. 1 Exemplary coding sheet for a meta-analysis on the relationship (correlation) between top-management team diversity and profitability
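To make the wide-format layout concrete, a coding sheet of this kind could be set up as a simple data frame in R; the column names and values below are purely illustrative and mirror the variables named in the caption of Fig. 1:

    # Illustrative wide-format coding sheet (one row per coded effect size)
    coding_sheet <- data.frame(
      study_id          = c("Study A (2015)", "Study B (2018)"),
      r                 = c(0.12, 0.25),            # coded correlation (effect size)
      n                 = c(240, 310),              # study sample size
      country           = c("US", "DE"),            # sample characteristic
      firm_type         = c("listed", "family"),    # sample characteristic
      diversity_measure = c("Blau index", "ratio"), # variable operationalization
      profit_measure    = c("ROA", "ROE")           # variable operationalization
    )
    str(coding_sheet)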

2.6.2 Inclusion of moderator or control variables

It is generally important to consider the intended research model and relevant nontarget variables before coding a meta-analytic dataset. For example, study characteristics can be important moderators or function as control variables in a meta-regression model. Similarly, control variables may be relevant in a MASEM approach to reduce confounding bias. Coding additional variables or constructs afterwards can be arduous if the sample of primary studies is large. However, the decision to include respective moderator or control variables, as in any empirical analysis, should always be based on strong (theoretical) rationales about how these variables can impact the investigated effect (Bernerth and Aguinis 2016; Bernerth et al. 2018; Thompson and Higgins 2002). While substantive moderators refer to theoretical constructs that act as buffers or enhancers of a supposed causal process, methodological moderators are features of the respective research designs that denote the methodological context of the observations and are important to control for systematic statistical particularities (Rudolph et al. 2020). Havranek et al. (2020) provide a list of recommended variables to code as potential moderators. While researchers may have clear expectations about the effects of some of these moderators, expectations for others may be more tentative, and moderator analysis may be approached in a rather exploratory fashion. Thus, we argue that researchers should make full use of the meta-analytical design to obtain insights about potential context dependence that a primary study cannot achieve.

2.6.3 Treatment of multiple effect sizes in a study

A long-debated issue in conducting meta-analyses is whether to use only one or all available effect sizes for the same construct within a single primary study. For meta-analyses in management research, this question is fundamental because many empirical studies, particularly those relying on company databases, use multiple variables for the same construct to perform sensitivity analyses, resulting in multiple relevant effect sizes. In this case, researchers can either (randomly) select a single value, calculate a study average, or use the complete set of effect sizes (Bijmolt and Pieters 2001; López-López et al. 2018). Multiple effect sizes from the same study enrich the meta-analytic dataset and allow researchers to investigate the heterogeneity of the relationship of interest, such as different variable operationalizations (López-López et al. 2018; Moeyaert et al. 2017). However, including more than one effect size from the same study violates the independence assumption of observations (Cheung 2019; López-López et al. 2018), which can lead to biased results and erroneous conclusions (Gooty et al. 2021). We follow the recommendation of current best-practice guides to take advantage of all available effect size observations but to carefully account for interdependencies using appropriate methods such as multilevel models, panel regression models, or robust variance estimation (Cheung 2019; Geyer-Klingeberg et al. 2020; Gooty et al. 2021; López-López et al. 2018; Moeyaert et al. 2017).
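One way to account for such dependencies in metafor is a multilevel model with effect sizes nested in studies, optionally combined with cluster-robust standard errors. The sketch below assumes a hypothetical long-format data frame dat_long with one row per effect size and the columns named in the comments:

    # Multilevel model: effect sizes (es_id) nested within studies (study_id)
    res_ml <- rma.mv(yi, vi,
                     random = ~ 1 | study_id/es_id,
                     data = dat_long)
    summary(res_ml)

    # Cluster-robust variance estimation at the study level as an additional safeguard
    robust(res_ml, cluster = dat_long$study_id)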

2.7 Step 7: analysis

2.7.1 Outlier analysis and tests for publication bias

Before conducting the primary analysis, some preliminary sensitivity analyses might be necessary to ensure the robustness of the meta-analytical findings (Rudolph et al. 2020). First, influential outlier observations could potentially bias the observed results, particularly if the number of total effect sizes is small. Several statistical methods can be used to identify outliers in meta-analytical datasets (Aguinis et al. 2013; Viechtbauer and Cheung 2010). However, there is a debate about whether to keep or omit these observations. In any case, the studies concerned should be closely inspected to infer an explanation for their deviating results. As in any other primary study, outliers can be a valid representation, albeit of a different population, measure, construct, design, or procedure. Thus, inferences about outliers can provide the basis for identifying potential moderators (Aguinis et al. 2013; Steel et al. 2021). On the other hand, outliers can indicate invalid research, for instance, when unrealistically strong correlations are due to construct overlap (i.e., lack of a clear demarcation between independent and dependent variables), invalid measures, or simply typing errors when coding effect sizes. An advisable step is therefore to compare the results both with and without outliers and to make the decision about excluding outlier observations with careful consideration (Geyskens et al. 2009; Grewal et al. 2018; Kepes et al. 2013). Moreover, instead of focusing only on the size of an outlier, its leverage should be considered; Viechtbauer and Cheung (2010) propose considering a combination of standardized deviation and a study’s leverage.
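In metafor, the diagnostics proposed by Viechtbauer and Cheung (2010) are available through the influence() function; a brief sketch, continuing the hypothetical model res from above (the study dropped in the sensitivity check is picked arbitrarily for illustration):

    # Outlier and influence diagnostics: studentized residuals, Cook's distances, hat values, etc.
    inf <- influence(res)
    print(inf)
    plot(inf)

    # Sensitivity check: refit the model without a suspected influential study
    res_sens <- rma(yi, vi, data = dat, subset = (study != "Study 3"), method = "REML")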

Second, as mentioned in the context of the literature search, publication bias may be an issue. Publication bias can be examined in multiple ways (Rothstein et al. 2005). First, the funnel plot is a simple graphical tool that can provide an overview of the effect size distribution and help to detect publication bias (Stanley and Doucouliagos 2010). A funnel plot can also help identify potential outliers. As mentioned above, a graphical display of deviation (e.g., studentized residuals) and leverage (Cook’s distance) can help detect the presence of outliers and evaluate their influence (Viechtbauer and Cheung 2010). Moreover, several statistical procedures can be used to test for publication bias (Harrison et al. 2017; Kepes et al. 2012), including subgroup comparisons between published and unpublished studies, Begg and Mazumdar’s (1994) rank correlation test, cumulative meta-analysis (Borenstein et al. 2009), the trim-and-fill method (Duval and Tweedie 2000a, b), Egger et al.’s (1997) regression test, failsafe N (Rosenthal 1979), and selection models (Hedges and Vevea 2005; Vevea and Woods 2005). In examining potential publication bias, Kepes et al. (2012) and Harrison et al. (2017) both recommend not relying on a single test but rather using multiple conceptually different test procedures (the so-called “triangulation approach”).
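Several of these procedures are implemented in metafor; a brief sketch for the hypothetical model res fitted earlier (in line with the triangulation idea, no single test should be interpreted in isolation):

    # Graphical and statistical checks for publication bias / small-study effects
    funnel(res)      # funnel plot for visual inspection of asymmetry
    regtest(res)     # Egger-type regression test for funnel-plot asymmetry
    ranktest(res)    # Begg and Mazumdar rank correlation test
    trimfill(res)    # trim-and-fill estimate adjusted for potentially missing studies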

2.7.2 Model choice

After controlling and correcting for the potential presence of impactful outliers or publication bias, the next step in meta-analysis is the primary analysis, where meta-analysts must decide between two different types of models that are based on different assumptions: fixed-effects and random-effects (Borenstein et al. 2010 ). Fixed-effects models assume that all observations share a common mean effect size, which means that differences are only due to sampling error, while random-effects models assume heterogeneity and allow for a variation of the true effect sizes across studies (Borenstein et al. 2010 ; Cheung and Vijayakumar 2016 ; Hunter and Schmidt 2004 ). Both models are explained in detail in standard textbooks (e.g., Borenstein et al. 2009 ; Hunter and Schmidt 2004 ; Lipsey and Wilson 2001 ).

In general, the presence of heterogeneity is likely in management meta-analyses because most studies do not have identical empirical settings, which can yield different effect size strengths or directions for the same investigated phenomenon. For example, the identified studies have been conducted in different countries with different institutional settings, or the type of study participants varies (e.g., students vs. employees, blue-collar vs. white-collar workers, or manufacturing vs. service firms). Thus, the vast majority of meta-analyses in management research and related fields use random-effects models (Aguinis et al. 2011a ). In a meta-regression, the random-effects model turns into a so-called mixed-effects model because moderator variables are added as fixed effects to explain the impact of observed study characteristics on effect size variations (Raudenbush 2009 ).
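In metafor, the two model types differ only in the method argument; a minimal comparison for the hypothetical data coded above:

    # Fixed-effects (common-effect) model: assumes one true effect shared by all studies
    res_fe <- rma(yi, vi, data = dat, method = "FE")

    # Random-effects model: allows the true effects to vary across studies
    res_re <- rma(yi, vi, data = dat, method = "REML")
    summary(res_re)   # tau^2 and I^2 quantify between-study heterogeneity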

2.8 Step 8: reporting results

2.8.1 Reporting in the article

The final step in performing a meta-analysis is reporting its results. Most importantly, all steps and methodological decisions should be comprehensible to the reader. DeSimone et al. ( 2020 ) provide an extensive checklist for journal reviewers of meta-analytical studies. This checklist can also be used by authors when performing their analyses and reporting their results to ensure that all important aspects have been addressed. Alternative checklists are provided, for example, by Appelbaum et al. ( 2018 ) or Page et al. ( 2021 ). Similarly, Levitt et al. ( 2018 ) provide a detailed guide for qualitative meta-analysis reporting standards.

For quantitative meta-analyses, tables reporting results should include all important information and test statistics, including mean effect sizes; standard errors and confidence intervals; the number of observations and study samples included; and heterogeneity measures. If the meta-analytic sample is rather small, a forest plot provides a good overview of the individual findings and their precision. However, this figure becomes less practical for meta-analyses with several hundred effect sizes. Results displayed in tables and figures must also be explained verbally in the results and discussion sections. Most importantly, authors must answer the primary research question, i.e., whether there is a positive, negative, or no relationship between the variables of interest, or whether the examined intervention has a certain effect. These results should be interpreted with regard to their magnitude (or significance), both economically and statistically. When discussing meta-analytical results, authors must also convey the complexity of the findings, including the identified heterogeneity and important moderators, future research directions, and theoretical relevance (DeSimone et al. 2019). In particular, the discussion of identified heterogeneity and underlying moderator effects is critical; omitting this information can lead to false conclusions among readers, who may interpret the reported mean effect size as universal for all included primary studies and ignore the variability of findings when citing the meta-analytic results in their research (Aytug et al. 2012; DeSimone et al. 2019).
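For a correlation-based meta-analysis, the pooled Fisher z estimate is typically back-transformed to the correlation metric for reporting, and a forest plot can display the individual and pooled effects; a short sketch for the hypothetical model res:

    # Back-transform the pooled Fisher z estimate and its confidence interval to r
    predict(res, transf = transf.ztor)

    # Forest plot of the individual effect sizes and the pooled estimate
    forest(res, slab = dat$study)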

2.8.2 Open-science practices

Another increasingly important topic is the public provision of meta-analytical datasets and statistical code via open-source repositories. Open-science practices allow for the validation of results and for the reuse of coded data in subsequent meta-analyses (Polanin et al. 2020), contributing to the development of cumulative science. Steel et al. (2021) refer to open-science meta-analyses as a step towards “living systematic reviews” (Elliott et al. 2017) with continuous updates in real time. MRQ supports this development and encourages authors to make their datasets publicly available. Moreau and Gamble (2020), for example, provide various templates and video tutorials for conducting open-science meta-analyses. There exist several open-science repositories, such as the Open Science Framework (OSF; for a tutorial, see Soderberg 2018), to preregister studies and make documents publicly available. Furthermore, several initiatives in the social sciences have been established to develop dynamic meta-analyses, such as metaBUS (Bosco et al. 2015, 2017), MetaLab (Bergmann et al. 2018), or PsychOpen CAMA (Burgard et al. 2021).

3 Conclusion

This editorial provides a comprehensive overview of the essential steps in conducting and reporting a meta-analysis with references to more in-depth methodological articles. It also serves as a guide for meta-analyses submitted to MRQ and other management journals. MRQ welcomes all types of meta-analyses from all subfields and disciplines of management research.

Footnote 1: Gusenbauer and Haddaway (2020), however, point out that Google Scholar is not appropriate as a primary search engine due to a lack of reproducibility of search results.

Footnote 2: One effect size calculator by David B. Wilson is accessible via: https://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-Home.php.

Footnote 3: The macros of David B. Wilson can be downloaded from: http://mason.gmu.edu/~dwilsonb/.

Footnote 4: The macros of Field and Gillett (2010) can be downloaded from: https://www.discoveringstatistics.com/repository/fieldgillett/how_to_do_a_meta_analysis.html.

Footnote 5: The metafor tutorials can be found via: https://www.metafor-project.org/doku.php.

Footnote 6: metafor does not currently provide functions to conduct MASEM. For MASEM, users can, for instance, use the package metaSEM (Cheung 2015b).

Footnote 7: The Meta-Essentials workbooks can be downloaded from: https://www.erim.eur.nl/research-support/meta-essentials/.

Aguinis H, Dalton DR, Bosco FA, Pierce CA, Dalton CM (2011a) Meta-analytic choices and judgment calls: Implications for theory building and testing, obtained effect sizes, and scholarly impact. J Manag 37(1):5–38


Aguinis H, Gottfredson RK, Joo H (2013) Best-practice recommendations for defining, identifying, and handling outliers. Organ Res Methods 16(2):270–301


Aguinis H, Gottfredson RK, Wright TA (2011b) Best-practice recommendations for estimating interaction effects using meta-analysis. J Organ Behav 32(8):1033–1043

Aguinis H, Pierce CA, Bosco FA, Dalton DR, Dalton CM (2011c) Debunking myths and urban legends about meta-analysis. Organ Res Methods 14(2):306–331

Aloe AM (2015) Inaccuracy of regression results in replacing bivariate correlations. Res Synth Methods 6(1):21–27

Anderson RG, Kichkha A (2017) Replication, meta-analysis, and research synthesis in economics. Am Econ Rev 107(5):56–59

Appelbaum M, Cooper H, Kline RB, Mayo-Wilson E, Nezu AM, Rao SM (2018) Journal article reporting standards for quantitative research in psychology: the APA publications and communications BOARD task force report. Am Psychol 73(1):3–25

Aytug ZG, Rothstein HR, Zhou W, Kern MC (2012) Revealed or concealed? Transparency of procedures, decisions, and judgment calls in meta-analyses. Organ Res Methods 15(1):103–133

Begg CB, Mazumdar M (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50(4):1088–1101. https://doi.org/10.2307/2533446

Bergh DD, Aguinis H, Heavey C, Ketchen DJ, Boyd BK, Su P, Lau CLL, Joo H (2016) Using meta-analytic structural equation modeling to advance strategic management research: Guidelines and an empirical illustration via the strategic leadership-performance relationship. Strateg Manag J 37(3):477–497

Becker BJ (1992) Using results from replicated studies to estimate linear models. J Educ Stat 17(4):341–362

Becker BJ (1995) Corrections to “Using results from replicated studies to estimate linear models.” J Edu Behav Stat 20(1):100–102

Bergmann C, Tsuji S, Piccinini PE, Lewis ML, Braginsky M, Frank MC, Cristia A (2018) Promoting replicability in developmental research through meta-analyses: Insights from language acquisition research. Child Dev 89(6):1996–2009

Bernerth JB, Aguinis H (2016) A critical review and best-practice recommendations for control variable usage. Pers Psychol 69(1):229–283

Bernerth JB, Cole MS, Taylor EC, Walker HJ (2018) Control variables in leadership research: A qualitative and quantitative review. J Manag 44(1):131–160

Bijmolt TH, Pieters RG (2001) Meta-analysis in marketing when studies contain multiple measurements. Mark Lett 12(2):157–169

Block J, Kuckertz A (2018) Seven principles of effective replication studies: Strengthening the evidence base of management research. Manag Rev Quart 68:355–359

Borenstein M (2009) Effect sizes for continuous data. In: Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis. Russell Sage Foundation, pp 221–235

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2009) Introduction to meta-analysis. John Wiley, Chichester


Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2010) A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods 1(2):97–111

Borenstein M, Hedges L, Higgins J, Rothstein H (2013) Comprehensive meta-analysis (version 3). Biostat, Englewood, NJ

Borenstein M, Higgins JP (2013) Meta-analysis and subgroups. Prev Sci 14(2):134–143

Bosco FA, Steel P, Oswald FL, Uggerslev K, Field JG (2015) Cloud-based meta-analysis to bridge science and practice: Welcome to metaBUS. Person Assess Decis 1(1):3–17

Bosco FA, Uggerslev KL, Steel P (2017) MetaBUS as a vehicle for facilitating meta-analysis. Hum Resour Manag Rev 27(1):237–254

Burgard T, Bošnjak M, Studtrucker R (2021) Community-augmented meta-analyses (CAMAs) in psychology: potentials and current systems. Zeitschrift Für Psychologie 229(1):15–23

Cheung MWL (2015a) Meta-analysis: A structural equation modeling approach. John Wiley & Sons, Chichester

Cheung MWL (2015b) metaSEM: An R package for meta-analysis using structural equation modeling. Front Psychol 5:1521

Cheung MWL (2019) A guide to conducting a meta-analysis with non-independent effect sizes. Neuropsychol Rev 29(4):387–396

Cheung MWL, Chan W (2005) Meta-analytic structural equation modeling: a two-stage approach. Psychol Methods 10(1):40–64

Cheung MWL, Vijayakumar R (2016) A guide to conducting a meta-analysis. Neuropsychol Rev 26(2):121–128

Combs JG, Crook TR, Rauch A (2019) Meta-analytic research in management: contemporary approaches unresolved controversies and rising standards. J Manag Stud 56(1):1–18. https://doi.org/10.1111/joms.12427

DeSimone JA, Köhler T, Schoen JL (2019) If it were only that easy: the use of meta-analytic research by organizational scholars. Organ Res Methods 22(4):867–891. https://doi.org/10.1177/1094428118756743

DeSimone JA, Brannick MT, O’Boyle EH, Ryu JW (2020) Recommendations for reviewing meta-analyses in organizational research. Organ Res Methods 56:455–463

Duval S, Tweedie R (2000a) Trim and fill: a simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56(2):455–463

Duval S, Tweedie R (2000b) A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J Am Stat Assoc 95(449):89–98

Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315(7109):629–634

Eisend M (2017) Meta-Analysis in advertising research. J Advert 46(1):21–35

Elliott JH, Synnot A, Turner T, Simmons M, Akl EA, McDonald S, Salanti G, Meerpohl J, MacLehose H, Hilton J, Tovey D, Shemilt I, Thomas J (2017) Living systematic review: 1. Introduction—the why, what, when, and how. J Clin Epidemiol 91:23–30. https://doi.org/10.1016/j.jclinepi.2017.08.010

Field AP, Gillett R (2010) How to do a meta-analysis. Br J Math Stat Psychol 63(3):665–694

Fisch C, Block J (2018) Six tips for your (systematic) literature review in business and management research. Manag Rev Quart 68:103–106

Fortunato S, Bergstrom CT, Börner K, Evans JA, Helbing D, Milojević S, Petersen AM, Radicchi F, Sinatra R, Uzzi B, Vespignani A (2018) Science of science. Science 359(6379). https://doi.org/10.1126/science.aao0185

Geyer-Klingeberg J, Hang M, Rathgeber A (2020) Meta-analysis in finance research: Opportunities, challenges, and contemporary applications. Int Rev Finan Anal 71:101524

Geyskens I, Krishnan R, Steenkamp JBE, Cunha PV (2009) A review and evaluation of meta-analysis practices in management research. J Manag 35(2):393–419

Glass GV (2015) Meta-analysis at middle age: a personal history. Res Synth Methods 6(3):221–231

Gonzalez-Mulé E, Aguinis H (2018) Advancing theory by assessing boundary conditions with metaregression: a critical review and best-practice recommendations. J Manag 44(6):2246–2273

Gooty J, Banks GC, Loignon AC, Tonidandel S, Williams CE (2021) Meta-analyses as a multi-level model. Organ Res Methods 24(2):389–411. https://doi.org/10.1177/1094428119857471

Grewal D, Puccinelli N, Monroe KB (2018) Meta-analysis: integrating accumulated knowledge. J Acad Mark Sci 46(1):9–30

Gurevitch J, Koricheva J, Nakagawa S, Stewart G (2018) Meta-analysis and the science of research synthesis. Nature 555(7695):175–182

Gusenbauer M, Haddaway NR (2020) Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res Synth Methods 11(2):181–217

Habersang S, Küberling-Jost J, Reihlen M, Seckler C (2019) A process perspective on organizational failure: a qualitative meta-analysis. J Manage Stud 56(1):19–56

Harari MB, Parola HR, Hartwell CJ, Riegelman A (2020) Literature searches in systematic reviews and meta-analyses: A review, evaluation, and recommendations. J Vocat Behav 118:103377

Harrison JS, Banks GC, Pollack JM, O’Boyle EH, Short J (2017) Publication bias in strategic management research. J Manag 43(2):400–425

Havránek T, Stanley TD, Doucouliagos H, Bom P, Geyer-Klingeberg J, Iwasaki I, Reed WR, Rost K, Van Aert RCM (2020) Reporting guidelines for meta-analysis in economics. J Econ Surveys 34(3):469–475

Hedges LV, Olkin I (1985) Statistical methods for meta-analysis. Academic Press, Orlando

Hedges LV, Vevea JL (2005) Selection methods approaches. In: Rothstein HR, Sutton A, Borenstein M (eds) Publication bias in meta-analysis: prevention, assessment, and adjustments. Wiley, Chichester, pp 145–174

Hoon C (2013) Meta-synthesis of qualitative case studies: an approach to theory building. Organ Res Methods 16(4):522–556

Hunter JE, Schmidt FL (1990) Methods of meta-analysis: correcting error and bias in research findings. Sage, Newbury Park

Hunter JE, Schmidt FL (2004) Methods of meta-analysis: correcting error and bias in research findings, 2nd edn. Sage, Thousand Oaks

Hunter JE, Schmidt FL, Jackson GB (1982) Meta-analysis: cumulating research findings across studies. Sage Publications, Beverly Hills

Jak S (2015) Meta-analytic structural equation modelling. Springer, New York, NY

Kepes S, Banks GC, McDaniel M, Whetzel DL (2012) Publication bias in the organizational sciences. Organ Res Methods 15(4):624–662

Kepes S, McDaniel MA, Brannick MT, Banks GC (2013) Meta-analytic reviews in the organizational sciences: Two meta-analytic schools on the way to MARS (the Meta-Analytic Reporting Standards). J Bus Psychol 28(2):123–143

Kraus S, Breier M, Dasí-Rodríguez S (2020) The art of crafting a systematic literature review in entrepreneurship research. Int Entrepreneur Manag J 16(3):1023–1042

Levitt HM (2018) How to conduct a qualitative meta-analysis: tailoring methods to enhance methodological integrity. Psychother Res 28(3):367–378

Levitt HM, Bamberg M, Creswell JW, Frost DM, Josselson R, Suárez-Orozco C (2018) Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: the APA publications and communications board task force report. Am Psychol 73(1):26

Lipsey MW, Wilson DB (2001) Practical meta-analysis. Sage Publications, Inc.

López-López JA, Page MJ, Lipsey MW, Higgins JP (2018) Dealing with effect size multiplicity in systematic reviews and meta-analyses. Res Synth Methods 9(3):336–351

Martín-Martín A, Thelwall M, Orduna-Malea E, López-Cózar ED (2021) Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations. Scientometrics 126(1):871–906

Merton RK (1968) The Matthew effect in science: the reward and communication systems of science are considered. Science 159(3810):56–63

Moeyaert M, Ugille M, Natasha Beretvas S, Ferron J, Bunuan R, Van den Noortgate W (2017) Methods for dealing with multiple outcomes in meta-analysis: a comparison between averaging effect sizes, robust variance estimation and multilevel meta-analysis. Int J Soc Res Methodol 20(6):559–572

Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS medicine. 6(7):e1000097

Mongeon P, Paul-Hus A (2016) The journal coverage of Web of Science and Scopus: a comparative analysis. Scientometrics 106(1):213–228

Moreau D, Gamble B (2020) Conducting a meta-analysis in the age of open science: Tools, tips, and practical recommendations. Psychol Methods. https://doi.org/10.1037/met0000351

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S (2015) Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 4(1):1–22

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A (2016) Rayyan—a web and mobile app for systematic reviews. Syst Rev 5(1):1–10

Owen E, Li Q (2021) The conditional nature of publication bias: a meta-regression analysis. Polit Sci Res Methods 9(4):867–877

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372. https://doi.org/10.1136/bmj.n71

Palmer TM, Sterne JAC (eds) (2016) Meta-analysis in stata: an updated collection from the stata journal, 2nd edn. Stata Press, College Station, TX

Pigott TD, Polanin JR (2020) Methodological guidance paper: High-quality meta-analysis in a systematic review. Rev Educ Res 90(1):24–46

Polanin JR, Tanner-Smith EE, Hennessy EA (2016) Estimating the difference between published and unpublished effect sizes: a meta-review. Rev Educ Res 86(1):207–236

Polanin JR, Hennessy EA, Tanner-Smith EE (2017) A review of meta-analysis packages in R. J Edu Behav Stat 42(2):206–242

Polanin JR, Hennessy EA, Tsuji S (2020) Transparency and reproducibility of meta-analyses in psychology: a meta-review. Perspect Psychol Sci 15(4):1026–1041. https://doi.org/10.1177/17456916209064

R Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Rauch A (2020) Opportunities and threats in reviewing entrepreneurship theory and practice. Entrep Theory Pract 44(5):847–860

Rauch A, van Doorn R, Hulsink W (2014) A qualitative approach to evidence–based entrepreneurship: theoretical considerations and an example involving business clusters. Entrep Theory Pract 38(2):333–368

Raudenbush SW (2009) Analyzing effect sizes: Random-effects models. In: Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis, 2nd edn. Russell Sage Foundation, New York, NY, pp 295–315

Rosenthal R (1979) The file drawer problem and tolerance for null results. Psychol Bull 86(3):638

Rothstein HR, Sutton AJ, Borenstein M (2005) Publication bias in meta-analysis: prevention, assessment and adjustments. Wiley, Chichester

Roth PL, Le H, Oh I-S, Van Iddekinge CH, Bobko P (2018) Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution. J Appl Psychol 103(6):644–658. https://doi.org/10.1037/apl0000293

Rudolph CW, Chang CK, Rauvola RS, Zacher H (2020) Meta-analysis in vocational behavior: a systematic review and recommendations for best practices. J Vocat Behav 118:103397

Schmidt FL (2017) Statistical and measurement pitfalls in the use of meta-regression in meta-analysis. Career Dev Int 22(5):469–476

Schmidt FL, Hunter JE (2015) Methods of meta-analysis: correcting error and bias in research findings. Sage, Thousand Oaks

Schwab A (2015) Why all researchers should report effect sizes and their confidence intervals: Paving the way for meta–analysis and evidence–based management practices. Entrepreneurship Theory Pract 39(4):719–725. https://doi.org/10.1111/etap.12158

Shaw JD, Ertug G (2017) The suitability of simulations and meta-analyses for submissions to Academy of Management Journal. Acad Manag J 60(6):2045–2049

Soderberg CK (2018) Using OSF to share data: A step-by-step guide. Adv Methods Pract Psychol Sci 1(1):115–120

Stanley TD, Doucouliagos H (2010) Picture this: a simple graph that reveals much ado about research. J Econ Surveys 24(1):170–191

Stanley TD, Doucouliagos H (2012) Meta-regression analysis in economics and business. Routledge, London

Stanley TD, Jarrell SB (1989) Meta-regression analysis: a quantitative method of literature surveys. J Econ Surveys 3:54–67

Steel P, Beugelsdijk S, Aguinis H (2021) The anatomy of an award-winning meta-analysis: Recommendations for authors, reviewers, and readers of meta-analytic reviews. J Int Bus Stud 52(1):23–44

Suurmond R, van Rhee H, Hak T (2017) Introduction, comparison, and validation of Meta-Essentials: a free and simple tool for meta-analysis. Res Synth Methods 8(4):537–553

The Cochrane Collaboration (2020). Review Manager (RevMan) [Computer program] (Version 5.4).

Thomas J, Noel-Storr A, Marshall I, Wallace B, McDonald S, Mavergames C, Glasziou P, Shemilt I, Synnot A, Turner T, Elliot J (2017) Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol 91:31–37

Thompson SG, Higgins JP (2002) How should meta-regression analyses be undertaken and interpreted? Stat Med 21(11):1559–1573

Tipton E, Pustejovsky JE, Ahmadi H (2019) A history of meta-regression: technical, conceptual, and practical developments between 1974 and 2018. Res Synth Methods 10(2):161–179

Vevea JL, Woods CM (2005) Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychol Methods 10(4):428–443

Viechtbauer W (2010) Conducting meta-analyses in R with the metafor package. J Stat Softw 36(3):1–48

Viechtbauer W, Cheung MWL (2010) Outlier and influence diagnostics for meta-analysis. Res Synth Methods 1(2):112–125

Viswesvaran C, Ones DS (1995) Theory testing: combining psychometric meta-analysis and structural equations modeling. Pers Psychol 48(4):865–885

Wilson SJ, Polanin JR, Lipsey MW (2016) Fitting meta-analytic structural equation models with complex datasets. Res Synth Methods 7(2):121–139. https://doi.org/10.1002/jrsm.1199

Wood JA (2008) Methodology for dealing with duplicate study effects in a meta-analysis. Organ Res Methods 11(1):79–95


Open Access funding enabled and organized by Projekt DEAL. No funding was received to assist with the preparation of this manuscript.

Author information

Authors and affiliations

University of Luxembourg, Luxembourg, Luxembourg: Christopher Hansen

Leibniz Institute for Psychology (ZPID), Trier, Germany: Holger Steinmetz

Trier University, Trier, Germany: Jörn Block

Erasmus University Rotterdam, Rotterdam, The Netherlands: Jörn Block

Wittener Institut für Familienunternehmen, Universität Witten/Herdecke, Witten, Germany: Jörn Block


Corresponding author

Correspondence to Jörn Block.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: see Table 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Hansen, C., Steinmetz, H. & Block, J. How to conduct a meta-analysis in eight steps: a practical guide. Manag Rev Q 72, 1–19 (2022). https://doi.org/10.1007/s11301-021-00247-4


  • Published: 30 January 2023

A systematic review and meta-analysis of the evidence on learning during the COVID-19 pandemic

  • Bastian A. Betthäuser   ORCID: orcid.org/0000-0002-4544-4073 1 , 2 , 3 ,
  • Anders M. Bach-Mortensen   ORCID: orcid.org/0000-0001-7804-7958 2 &
  • Per Engzell   ORCID: orcid.org/0000-0002-2404-6308 3 , 4 , 5  

Nature Human Behaviour, volume 7, pages 375–385 (2023)

68k Accesses

96 Citations

1961 Altmetric


  • Social policy

To what extent has the learning progress of school-aged children slowed down during the COVID-19 pandemic? A growing number of studies address this question, but findings vary depending on context. Here we conduct a pre-registered systematic review, quality appraisal and meta-analysis of 42 studies across 15 countries to assess the magnitude of learning deficits during the pandemic. We find a substantial overall learning deficit (Cohen’s d  = −0.14, 95% confidence interval −0.17 to −0.10), which arose early in the pandemic and persists over time. Learning deficits are particularly large among children from low socio-economic backgrounds. They are also larger in maths than in reading and in middle-income countries relative to high-income countries. There is a lack of evidence on learning progress during the pandemic in low-income countries. Future research should address this evidence gap and avoid the common risks of bias that we identify.


The coronavirus disease 2019 (COVID-19) pandemic has led to one of the largest disruptions to learning in history. To a large extent, this is due to school closures, which are estimated to have affected 95% of the world’s student population 1 . But even when face-to-face teaching resumed, instruction has often been compromised by hybrid teaching, and by children or teachers having to quarantine and miss classes. The effect of limited face-to-face instruction is compounded by the pandemic’s consequences for children’s out-of-school learning environment, as well as their mental and physical health. Lockdowns have restricted children’s movement and their ability to play, meet other children and engage in extra-curricular activities. Children’s wellbeing and family relationships have also suffered due to economic uncertainties and conflicting demands of work, care and learning. These negative consequences can be expected to be most pronounced for children from low socio-economic family backgrounds, exacerbating pre-existing educational inequalities.

It is critical to understand the extent to which learning progress has changed since the onset of the COVID-19 pandemic. We use the term ‘learning deficit’ to encompass both a delay in expected learning progress, as well as a loss of skills and knowledge already gained. The COVID-19 learning deficit is likely to affect children’s life chances through their education and labour market prospects. At the societal level, it can have important implications for growth, prosperity and social cohesion. As policy-makers across the world are seeking to limit further learning deficits and to devise policies to recover learning deficits that have already been incurred, assessing the current state of learning is crucial. A careful assessment of the COVID-19 learning deficit is also necessary to weigh the true costs and benefits of school closures.

A number of narrative reviews have sought to summarize the emerging research on COVID-19 and learning, mostly focusing on learning progress relatively early in the pandemic 2 , 3 , 4 , 5 , 6 . Moreover, two reviews harmonized and synthesized existing estimates of learning deficits during the pandemic 7 , 8 . In line with the narrative reviews, these two reviews find a substantial reduction in learning progress during the pandemic. However, this finding is based on a relatively small number of studies (18 and 10 studies, respectively). The limited evidence that was available at the time these reviews were conducted also precluded them from meta-analysing variation in the magnitude of learning deficits over time and across subjects, different groups of students or country contexts.

In this Article, we conduct a systematic review and meta-analysis of the evidence on COVID-19 learning deficits 2.5 years into the pandemic. Our primary pre-registered research question was ‘What is the effect of the COVID-19 pandemic on learning progress amongst school-age children?’, and we address this question using evidence from studies examining changes in learning outcomes during the pandemic. Our second pre-registered research aim was ‘To examine whether the effect of the COVID-19 pandemic on learning differs across different social background groups, age groups, boys and girls, learning areas or subjects, national contexts’.

We contribute to the existing research in two ways. First, we describe and appraise the up-to-date body of evidence, including its geographic reach and quality. More specifically, we ask the following questions: (1) what is the state of the evidence, in terms of the available peer-reviewed research and grey literature, on learning progress of school-aged children during the COVID-19 pandemic?, (2) which countries are represented in the available evidence? and (3) what is the quality of the existing evidence?

Our second contribution is to harmonize, synthesize and meta-analyse the existing evidence, with special attention to variation across different subpopulations and country contexts. On the basis of the identified studies, we ask (4) to what extent has the learning progress of school-aged children changed since the onset of the pandemic?, (5) how has the magnitude of the learning deficit (if any) evolved since the beginning of the pandemic?, (6) to what extent has the pandemic reinforced inequalities between children from different socio-economic backgrounds?, (7) are there differences in the magnitude of learning deficits between subject domains (maths and reading) and between age groups (primary and secondary students)? and (8) to what extent does the magnitude of learning deficits vary across national contexts?

Below, we report our answers to each of these questions in turn. The questions correspond to the analysis plan set out in our pre-registered protocol ( https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021249944 ), but we have adjusted the order and wording to aid readability. We had planned to examine gender differences in learning progress during the pandemic, but found there to be insufficient evidence to conduct this subgroup analysis, as the large majority of the identified studies do not provide evidence on learning deficits separately by gender. We also planned to examine how the magnitude of learning deficits differs across groups of students with varying exposures to school closures. This was not possible as the available data on school closures lack sufficient depth with respect to variation of school closures within countries, across grade levels and with respect to different modes of instruction, to meaningfully examine this association.

The state of the evidence

Our systematic review identified 42 studies on learning progress during the COVID-19 pandemic that met our inclusion criteria. To be included in our systematic review and meta-analysis, studies had to use a measure of learning that can be standardized (using Cohen’s d ) and base their estimates on empirical data collected since the onset of the COVID-19 pandemic (rather than making projections based on pre-COVID-19 data). As shown in Fig. 1 , the initial literature search resulted in 5,153 hits after removal of duplicates. All studies were double screened by the first two authors. The formal database search process identified 15 eligible studies. We also hand searched relevant preprint repositories and policy databases. Further, to ensure that our study selection was as up to date as possible, we conducted two full forward and backward citation searches of all included studies on 15 February 2022, and on 8 August 2022. The citation and preprint hand searches allowed us to identify 27 additional eligible studies, resulting in a total of 42 studies. Most of these studies were published after the initial database search, which illustrates that the body of evidence continues to expand. Most studies provide multiple estimates of COVID-19 learning deficits, separately for maths and reading and for different school grades. The number of estimates ( n  = 291) is therefore larger than the number of included studies ( n  = 42).

Figure 1: Flow diagram of the study identification and selection process, following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

The geographic reach of evidence is limited

Table 1 presents all included studies and estimates of COVID-19 learning deficits (in brackets), grouped by the 15 countries represented: Australia, Belgium, Brazil, Colombia, Denmark, Germany, Italy, Mexico, the Netherlands, South Africa, Spain, Sweden, Switzerland, the UK and the United States. About half of the estimates ( n  = 149) are from the United States, 58 are from the UK, a further 70 are from other European countries and the remaining 14 estimates are from Australia, Brazil, Colombia, Mexico and South Africa. As this list shows, there is a strong over-representation of studies from high-income countries, a dearth of studies from middle-income countries and no studies from low-income countries. This skewed representation should be kept in mind when interpreting our synthesis of the existing evidence on COVID-19 learning deficits.

The quality of evidence is mixed

We assessed the quality of the evidence using an adapted version of the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool 9 . More specifically, we analysed the risk of bias of each estimate from confounding, sample selection, classification of treatments, missing data, the measurement of outcomes and the selection of reported results. A.M.B.-M. and B.A.B. performed the risk-of-bias assessments, which were independently checked by the respective other author. We then assigned each study an overall risk-of-bias rating (low, moderate, serious or critical) based on the estimate and domain with the highest risk of bias.
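As an illustration of this aggregation rule, the following Python sketch (with hypothetical domain ratings, not taken from any study in the review) derives an overall rating as the most severe rating across the assessed domains.

```python
# Severity ordering of ROBINS-I ratings, from least to most severe.
SEVERITY = {"low": 0, "moderate": 1, "serious": 2, "critical": 3}

def overall_robins_i_rating(domain_ratings):
    """Return the overall risk-of-bias rating for one study,
    defined as the most severe rating across its bias domains."""
    return max(domain_ratings.values(), key=lambda rating: SEVERITY[rating])

# Hypothetical ratings for one study:
ratings = {
    "confounding": "moderate",
    "sample selection": "serious",
    "classification of treatments": "low",
    "missing data": "moderate",
    "measurement of outcomes": "low",
    "selection of reported results": "low",
}
print(overall_robins_i_rating(ratings))  # -> "serious"
```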

Figure 2a shows the distribution of all studies of COVID-19 learning deficits according to their risk-of-bias rating separately for each domain (top six rows), as well as the distribution of studies according to their overall risk of bias rating (bottom row). The overall risk of bias was considered ‘low’ for 15% of studies, ‘moderate’ for 30% of studies, ‘serious’ for 25% of studies and ‘critical’ for 30% of studies.

Figure 2: a, Domain-specific and overall distribution of studies of COVID-19 learning deficits by risk-of-bias rating using ROBINS-I, including studies rated to be at critical risk of bias (n = 19 out of a total of n = 61 studies shown in this figure). In line with ROBINS-I guidance, studies rated to be at critical risk of bias were excluded from all analyses and other figures in this article and in the Supplementary Information (including b). b, z curve: distribution of the z scores of all estimates included in the meta-analysis (n = 291) to test for publication bias. The dotted line indicates z = 1.96 (P = 0.050), the conventional threshold for statistical significance. The overlaid curve shows a normal distribution. The absence of a spike in the distribution of the z scores just above the threshold for statistical significance, and the absence of a slump just below it, indicate the absence of evidence for publication bias.

In line with ROBINS-I guidance, we excluded studies rated to be at critical risk of bias ( n  = 19) from all of our analyses and figures, except for Fig. 2a , which visualizes the distribution of studies according to their risk of bias 9 . These are thus not part of the 42 studies included in our meta-analysis. Supplementary Table 2 provides an overview of these studies as well as the main potential sources of risk of bias. Moreover, in Supplementary Figs. 3 – 6 , we replicate all our results excluding studies deemed to be at serious risk of bias.

As shown in Fig. 2a , common sources of potential bias were confounding, sample selection and missing data. Studies rated at risk of confounding typically compared only two timepoints, without accounting for longer time trends in learning progress. The main causes of selection bias were the use of convenience samples and insufficient consideration of self-selection by schools or students. Several studies found evidence of selection bias, often with students from a low socio-economic background or schools in deprived areas being under-represented after (as compared with before) the pandemic, but this was not always adjusted for. Some studies also reported a higher amount of missing data post-pandemic, again generally without adjustment, and several studies did not report any information on missing data. For an overview of the risk-of-bias ratings for each domain of each study, see Supplementary Fig. 1 and Supplementary Tables 1 and 2 .

No evidence of publication bias

Publication bias can occur if authors self-censor to conform to theoretical expectations, or if journals favour statistically significant results. To mitigate this concern, we include not only published papers, but also preprints, working papers and policy reports.

Moreover, Fig. 2b tests for publication bias by showing the distribution of z -statistics for the effect size estimates of all identified studies. The dotted line indicates z  = 1.96 ( P  = 0.050), the conventional threshold for statistical significance. The overlaid curve shows a normal distribution. If there was publication bias, we would expect a spike just above the threshold, and a slump just below it. There is no indication of this. Moreover, we do not find a left-skewed distribution of P values (see P curve in Supplementary Fig. 2a ), or an association between estimates of learning deficits and their standard errors (see funnel plot in Supplementary Fig. 2b ) that would suggest publication bias. Publication bias thus does not appear to be a major concern.
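The z-curve diagnostic described above is straightforward to reproduce. The following Python sketch, assuming a hypothetical set of effect estimates and standard errors, computes z = estimate / standard error and compares how many estimates fall just below versus just above the conventional |z| = 1.96 threshold; a surplus just above the threshold would be consistent with publication bias.

```python
import numpy as np

# Hypothetical effect estimates (Cohen's d) and their standard errors.
est = np.array([-0.12, -0.20, -0.05, -0.31, -0.09, -0.15, -0.02, -0.24])
se  = np.array([ 0.04,  0.06,  0.05,  0.10,  0.03,  0.07,  0.05,  0.09])

z = est / se  # z statistic of each estimate

# Crude check for a spike just above the significance threshold.
just_below = int(np.sum((np.abs(z) > 1.64) & (np.abs(z) <= 1.96)))
just_above = int(np.sum((np.abs(z) > 1.96) & (np.abs(z) <= 2.33)))
print(f"|z| in (1.64, 1.96]: {just_below}; |z| in (1.96, 2.33]: {just_above}")
# A roughly smooth distribution of |z| around 1.96 (no spike above, no slump
# below) is what Fig. 2b shows for the estimates included in the meta-analysis.
```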

Having assessed the quality of the existing evidence, we now present the substantive results of our meta-analysis, focusing on the magnitude of COVID-19 learning deficits and on the variation in learning deficits over time, across different groups of students, and across country contexts.

Learning progress slowed substantially during the pandemic

Figure 3 shows the effect sizes that we extracted from each study (averaged across grades and learning subject) as well as the pooled effect size (red diamond). Effects are expressed in standard deviations, using Cohen's d. Estimates are pooled using inverse variance weights. The pooled effect size across all studies is d = −0.14, t(41) = −7.30, two-tailed P < 0.001, 95% confidence interval (CI) −0.17 to −0.10. Under normal circumstances, students generally improve their performance by around 0.4 standard deviations per school year 10 , 11 , 12 . Thus, the overall effect of d = −0.14 suggests that students lost out on 0.14/0.4, or about 35%, of a school year's worth of learning. On average, the learning progress of school-aged children has slowed substantially during the pandemic.

Figure 3: Effect sizes are expressed in standard deviations, using Cohen's d, with 95% CI, and are sorted by magnitude.
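To make the pooling step concrete, the following is a minimal Python sketch (with hypothetical study-level estimates, not the data underlying Fig. 3) of an inverse-variance weighted average and the conversion of the pooled Cohen's d into a share of a typical school year's learning gain of 0.4 s.d.; the published estimate uses a random-effects model, which additionally accounts for between-study variance.

```python
import numpy as np

# Hypothetical study-level effect sizes (Cohen's d) and standard errors.
d  = np.array([-0.10, -0.18, -0.07, -0.22, -0.13])
se = np.array([ 0.03,  0.05,  0.04,  0.08,  0.06])

w = 1.0 / se**2                        # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)   # fixed-effect pooled estimate
se_pooled = np.sqrt(1.0 / np.sum(w))
ci = (d_pooled - 1.96 * se_pooled, d_pooled + 1.96 * se_pooled)

share_of_school_year = d_pooled / 0.4  # 0.4 s.d. is roughly one school year of learning

print(f"pooled d = {d_pooled:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"about {share_of_school_year:.0%} of a school year's worth of learning")
```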

Learning deficits arose early in the pandemic and persist

One may expect that children were able to recover learning that was lost early in the pandemic, after teachers and families had time to adjust to the new learning conditions and after structures for online learning and for recovering early learning deficits were set up. However, existing research on teacher strikes in Belgium 13 and Argentina 14 , shortened school years in Germany 15 and disruptions to education during World War II 16 suggests that learning deficits are difficult to compensate and tend to persist in the long run.

Figure 4 plots the magnitude of estimated learning deficits (on the vertical axis) by the date of measurement (on the horizontal axis). The colour of the circles reflects the relevant country, the size of the circles indicates the sample size for a given estimate and the line displays a linear trend. The figure suggests that learning deficits opened up early in the pandemic and have neither closed nor substantially widened since then. We find no evidence that the slope coefficient differs from zero (β months = −0.00, two-tailed P = 0.097, 95% CI −0.01 to 0.00). This implies that efforts by children, parents, teachers and policy-makers to adjust to the changed circumstances have been successful in preventing further learning deficits but so far have been unable to reverse them. As shown in Supplementary Fig. 8, the pattern of persistent learning deficits also emerges within each of the three countries for which we have a relatively large number of estimates at different timepoints: the United States, the UK and the Netherlands. However, it is important to note that these estimates of learning deficits are based on distinct samples of students. Future research should continue to follow the learning progress of cohorts of students in different countries to reveal how learning deficits of these cohorts have developed and continue to develop since the onset of the pandemic.

Figure 4: The horizontal axis displays the date on which learning progress was measured. The vertical axis displays estimated learning deficits, expressed in standard deviations (s.d.) using Cohen's d. The colour of the circles reflects the respective country, the size of the circles indicates the sample size for a given estimate and the line displays a linear trend with a 95% CI. The trend line is estimated as a linear regression using ordinary least squares, with standard errors clustered at the study level (n = 42 clusters). β months = −0.00, two-tailed P = 0.097, 95% CI −0.01 to 0.00.
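As an illustration of the trend estimation described in Fig. 4, here is a minimal Python sketch that regresses hypothetical deficit estimates on the month of measurement with standard errors clustered at the study level, using statsmodels; the published analysis additionally weights estimates by inverse variance, which is omitted here for brevity.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per estimate, with the month of measurement
# (relative to the onset of the pandemic), the estimated learning deficit
# (Cohen's d) and an identifier for the study the estimate comes from.
df = pd.DataFrame({
    "months":  [2, 4, 6, 9, 12, 15, 18, 24],
    "deficit": [-0.10, -0.15, -0.12, -0.18, -0.13, -0.16, -0.11, -0.14],
    "study":   [1, 1, 2, 2, 3, 3, 4, 4],
})

X = sm.add_constant(df["months"])  # intercept + linear time trend
fit = sm.OLS(df["deficit"], X).fit(
    cov_type="cluster", cov_kwds={"groups": df["study"]}  # study-clustered SEs
)
print(fit.params["months"], fit.pvalues["months"])  # slope and its p-value
```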

Socio-economic inequality in education increased

Existing research on the development of learning gaps during summer vacations 17 , 18 , disruptions to schooling during the Ebola outbreak in Sierra Leone and Guinea 19 , and the 2005 earthquake in Pakistan 20 shows that the suspension of face-to-face teaching can increase educational inequality between children from different socio-economic backgrounds. Learning deficits during the COVID-19 pandemic are likely to have been particularly pronounced for children from low socio-economic backgrounds. These children have been more affected by school closures than children from more advantaged backgrounds 21 . Moreover, they are likely to be disadvantaged with respect to their access and ability to use digital learning technology, the quality of their home learning environment, the learning support they receive from teachers and parents, and their ability to study autonomously 22 , 23 , 24 .

Most studies we identify examine changes in socio-economic inequality during the pandemic, attesting to the importance of the issue. As studies use different measures of socio-economic background (for example, parental income, parental education, free school meal eligibility or neighbourhood disadvantage), pooling the estimates is not possible. Instead, we code all estimates according to whether they indicate a reduction, no change or an increase in learning inequality during the pandemic. Figure 5 displays this information. Estimates that indicate an increase in inequality are shown on the right, those that indicate a decrease on the left and those that suggest no change in the middle. Squares represent estimates of changes in inequality during the pandemic in reading performance, and circles represent estimates of changes in inequality in maths performance. The shading represents when in the pandemic educational inequality was measured, differentiating between the first, second and third year of the pandemic. Estimates are also arranged horizontally by grade level. A large majority of estimates indicate an increase in educational inequality between children from different socio-economic backgrounds. This holds for both maths and reading, across primary and secondary education, at each stage of the pandemic, and independently of how socio-economic background is measured.

Figure 5: Each circle/square refers to one estimate of over-time change in inequality in maths/reading performance (n = 211). Estimates that find a decrease/no change/increase in inequality are grouped on the left/middle/right. Within these categories, estimates are ordered horizontally by school grade. The shading indicates when in the pandemic a given measure was taken.

Learning deficits are larger in maths than in reading

Available research on summer learning deficits 17 , 25 , student absenteeism 26 , 27 and extreme weather events 28 suggests that learning progress in mathematics is more dependent on formal instruction than learning progress in reading. This might be because parents are better equipped to help their children with reading, and because children advance their reading skills (but not their maths skills) when reading for enjoyment outside of school. Figure 6a shows that, similarly to earlier disruptions to learning, the estimated learning deficits during the COVID-19 pandemic are larger for maths than for reading (mean difference δ = −0.07, t(41) = −4.02, two-tailed P < 0.001, 95% CI −0.11 to −0.04). This difference is statistically significant and robust to dropping estimates from individual countries (Supplementary Fig. 9).

Figure 6: Each plot shows the distribution of COVID-19 learning deficit estimates for the respective subgroup, with the box marking the interquartile range and the white circle denoting the median. Whiskers mark upper and lower adjacent values: the furthest observation within 1.5 interquartile ranges on either side of the box. a, Learning subject (reading versus maths). Median: reading −0.09, maths −0.18. Interquartile range: reading −0.15 to −0.02, maths −0.23 to −0.09. b, Level of education (primary versus secondary). Median: primary −0.12, secondary −0.12. Interquartile range: primary −0.19 to −0.05, secondary −0.21 to −0.06. c, Country income level (high versus middle). Median: high −0.12, middle −0.37. Interquartile range: high −0.20 to −0.05, middle −0.65 to −0.30.

No evidence of variation across grade levels

One may expect learning deficits to be smaller for older than for younger children, as older children may be more autonomous in their learning and better able to cope with a sudden change in their learning environment. However, older students were subject to longer school closures in some countries, such as Denmark 29 , based partly on the assumption that they would be better able to learn from home. This may have offset any advantage that older children would otherwise have had in learning remotely.

Figure 6b shows the distribution of estimates of learning deficits for students at the primary and secondary level, respectively. Our analysis yields no evidence of variation in learning deficits across grade levels (mean difference δ = −0.01, t(41) = −0.59, two-tailed P = 0.556, 95% CI −0.06 to 0.03). Given the limited number of available estimates, however, we cannot rule out that learning deficits differ between primary and secondary students.

Learning deficits are larger in poorer countries

Low- and middle-income countries were already struggling with a learning crisis before the pandemic. Despite large expansions of the proportion of children in school, children in low- and middle-income countries still perform poorly by international standards, and inequality in learning remains high 30 , 31 , 32 . The pandemic is likely to deepen this learning crisis and to undo past progress. Schools in low- and middle-income countries have not only been closed for longer, but have also had fewer resources to facilitate remote learning 33 , 34 . Moreover, the economic resources, availability of digital learning equipment and ability of children, parents, teachers and governments to support learning from home are likely to be lower in low- and middle-income countries 35 .

As discussed above, most evidence on COVID-19 learning deficits comes from high-income countries. We found no studies on low-income countries that met our inclusion criteria, and evidence from middle-income countries is limited to Brazil, Colombia, Mexico and South Africa. Figure 6c groups the estimates of COVID-19 learning deficits in these four middle-income countries together (on the right) and compares them with estimates from high-income countries (on the left). The learning deficit is appreciably larger in middle-income countries than in high-income countries (mean difference δ  = −0.29, t (41) = −2.78, two-tailed P  = 0.008, 95% CI −0.50 to −0.08). In fact, the three largest estimates of learning deficits in our sample are from middle-income countries (Fig. 3 ) 36 , 37 , 38 .

More than two years into the COVID-19 pandemic, a growing number of studies examine the learning progress of school-aged children during the pandemic. This paper first systematically reviews the existing literature on the learning progress of school-aged children during the pandemic and appraises its geographic reach and quality. Second, it harmonizes, synthesizes and meta-analyses the existing evidence to examine the extent to which learning progress has changed since the onset of the pandemic, and how it varies across different groups of students and across country contexts.

Our meta-analysis suggests that learning progress has slowed substantially during the COVID-19 pandemic. The pooled effect size of d = −0.14 implies that students lost out on about 35% of a normal school year's worth of learning. This confirms initial concerns that substantial learning deficits would arise during the pandemic 10 , 39 , 40 . But our results also suggest that fears of an accumulation of learning deficits as the pandemic continues have not materialized 41 , 42 . On average, learning deficits emerged early in the pandemic and have neither closed nor widened substantially. Future research should continue to follow the learning progress of cohorts of students in different countries to reveal how learning deficits of these cohorts have developed and continue to develop since the onset of the pandemic.

Most studies that we identify find that learning deficits have been largest for children from disadvantaged socio-economic backgrounds. This holds across different timepoints during the pandemic, countries, grade levels and learning subjects, and independently of how socio-economic background is measured. It suggests that the pandemic has exacerbated educational inequalities between children from different socio-economic backgrounds, which were already large before the pandemic 43 , 44 . Policy initiatives to compensate learning deficits need to prioritize support for children from low socio-economic backgrounds in order to allow them to recover the learning they lost during the pandemic.

There is a need for future research to assess how the COVID-19 pandemic has affected gender inequality in education. So far, there is very little evidence on this issue. The large majority of the studies that we identify do not examine learning deficits separately by gender.

Comparing estimates of learning deficits across subjects, we find that learning deficits tend to be larger in maths than in reading. As noted above, this may be because parents and children have been better placed to compensate for missed school-based instruction in reading, for example by reading at home. Accordingly, there are grounds for policy initiatives to prioritize the compensation of learning deficits in maths and other science subjects.

A limitation of this study and the existing body of evidence on learning progress during the COVID-19 pandemic is that the existing studies primarily focus on high-income countries, while there is a dearth of evidence from low- and middle-income countries. This is particularly concerning because the small number of existing studies from middle-income countries suggests that learning deficits have been particularly severe in these countries. Learning deficits are likely to be even larger in low-income countries, considering that these countries already faced a learning crisis before the pandemic, generally implemented longer school closures, and were under-resourced and ill-equipped to facilitate remote learning 32 , 33 , 34 , 35 , 45 . It is critical that this evidence gap on low- and middle-income countries is addressed swiftly, and that the infrastructure to collect and share data on educational performance in middle- and low-income countries is strengthened. Collecting and making available these data is a key prerequisite for fully understanding how learning progress and related outcomes have changed since the onset of the pandemic 46 .

A further limitation is that about half of the studies that we identify are rated as having a serious or critical risk of bias. We seek to limit the risk of bias in our results by excluding all studies rated to be at critical risk of bias from all of our analyses. Moreover, in Supplementary Figs. 3 – 6 , we show that our results are robust to further excluding studies deemed to be at serious risk of bias. Future studies should minimize risk of bias in estimating learning deficits by employing research designs that appropriately account for common sources of bias. These include a lack of accounting for secular time trends, non-representative samples and imbalances between treatment and comparison groups.

The persistence of learning deficits two and a half years into the pandemic highlights the need for well-designed, well-resourced and decisive policy initiatives to recover learning deficits. Policy-makers, schools and families will need to identify and realize opportunities to complement and expand on regular school-based learning. Experimental evidence from low- and middle-income countries suggests that even relatively low-tech and low-cost learning interventions can have substantial, positive effects on students’ learning progress in the context of remote learning. For example, sending SMS messages with numeracy problems accompanied by short phone calls was found to lead to substantial learning gains in numeracy in Botswana 47 . Sending motivational text messages successfully limited learning losses in maths and Portuguese in Brazil 48 .

More evidence is needed to assess the effectiveness of other interventions for limiting or recovering learning deficits. Potential avenues include the use of the often extensive summer holidays to offer summer schools and learning camps, extending school days and school weeks, and organizing and scaling up tutoring programmes. Further potential lies in developing, advertising and providing access to learning apps, online learning platforms or educational TV programmes that are free at the point of use. Many countries have already begun investing substantial resources to capitalize on some of these opportunities. If these interventions prove effective, and if the momentum of existing policy efforts is maintained and expanded, the disruptions to learning during the pandemic may be a window of opportunity to improve the education afforded to children.

Methods

Eligibility criteria

We consider all types of primary research, including peer-reviewed publications, preprints, working papers and reports, for inclusion. To be eligible for inclusion, studies have to measure learning progress using test scores that can be standardized across studies using Cohen’s d . Moreover, studies have to be in English, Danish, Dutch, French, German, Norwegian, Spanish or Swedish.

Search strategy and study identification

We identified relevant studies using the following steps. First, we developed a Boolean search string defining the population (school-aged children), exposure (the COVID-19 pandemic) and outcomes of interest (learning progress). The full search string can be found in Section 1.1 of the Supplementary Information. Second, we used this string to search the following academic databases: Coronavirus Research Database, the Education Resources Information Centre, International Bibliography of the Social Sciences, Politics Collection (PAIS index, policy file index, political science database and worldwide political science abstracts), Social Science Database, Sociology Collection (applied social science index and abstracts, sociological abstracts and sociology database), Cumulative Index to Nursing and Allied Health Literature, and Web of Science. Third, we hand-searched multiple preprint and working paper repositories (Social Science Research Network, Munich Personal RePEc Archive, IZA, National Bureau of Economic Research, OSF Preprints, PsyArXiv, SocArXiv and EdArXiv) and relevant policy websites, including the websites of the Organization for Economic Co-operation and Development, the United Nations, the World Bank and the Education Endowment Foundation. Fourth, we periodically posted our protocol via Twitter in order to crowdsource additional relevant studies not identified through the search. All titles and abstracts identified in our search were double-screened using the Rayyan online application 49 . Our initial search was conducted on 27 April 2021, and we conducted two forward and backward citation searches of all eligible studies identified in the above steps, on 14 February 2022, and on 8 August 2022, to ensure that our analysis includes recent relevant research.

Data extraction

From the studies that meet our inclusion criteria we extracted all estimates of learning deficits during the pandemic, separately for maths and reading and for different school grades. We also extracted the corresponding sample size, standard error, date(s) of measurement, author name(s) and country. Last, we recorded whether studies differentiate between children’s socio-economic background, which measure is used to this end and whether studies find an increase, decrease or no change in learning inequality. We contacted study authors if any of the above information was missing in the study. Data extraction was performed by B.A.B. and validated independently by A.M.B.-M., with discrepancies resolved through discussion and by conferring with P.E.

Measurement and standardization

We standardize all estimates of learning deficits during the pandemic using Cohen's d, which expresses effect sizes in terms of standard deviations. Cohen's d is calculated as the difference in the mean learning gain in a given subject (maths or reading) over two comparable periods before and after the onset of the pandemic, divided by the pooled standard deviation of learning progress in this subject:

$$d = \frac{\overline{\Delta}_{\text{pandemic}} - \overline{\Delta}_{\text{pre-pandemic}}}{s_{\text{pooled}}}$$

where $\overline{\Delta}$ denotes the mean learning gain over the respective period and $s_{\text{pooled}}$ denotes the pooled standard deviation of learning progress in the given subject.

Effect sizes expressed as β coefficients are converted to Cohen’s d :

We use a binary indicator for whether the study outcome is maths or reading. One study does not differentiate the outcome but includes a composite of maths and reading scores 50 .
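To illustrate this standardization, the sketch below computes Cohen's d from hypothetical mean learning gains before and during the pandemic, using the standard pooled-standard-deviation formula; the exact pooling used by individual primary studies may differ.

```python
import numpy as np

def cohens_d(gain_pandemic, gain_before, sd_pandemic, sd_before, n_pandemic, n_before):
    """Standardize a learning-deficit estimate as Cohen's d.

    gain_*: mean learning gain over comparable periods during / before the pandemic
    sd_*:   standard deviation of learning progress in each period
    n_*:    number of students observed in each period
    """
    pooled_sd = np.sqrt(
        ((n_pandemic - 1) * sd_pandemic**2 + (n_before - 1) * sd_before**2)
        / (n_pandemic + n_before - 2)
    )
    return (gain_pandemic - gain_before) / pooled_sd

# Hypothetical gains expressed in test-score points:
print(cohens_d(gain_pandemic=3.0, gain_before=4.4,
               sd_pandemic=10.0, sd_before=10.0,
               n_pandemic=800, n_before=820))  # ≈ -0.14
```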

Level of education

We distinguish between primary and secondary education. We first consulted the original studies for this information. Where this was not stated in a given study, students’ age was used in conjunction with information about education systems from external sources to determine the level of education 51 .

Country income level

We follow the World Bank’s classification of countries into four income groups: low, lower-middle, upper-middle and high income. Four countries in our sample are in the upper-middle-income group: Brazil, Colombia, Mexico and South Africa. All other countries are in the high-income group.
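As a simple illustration of this coding step, the countries in the sample can be mapped to their income group with a small lookup table (a sketch reflecting the classification reported above).

```python
# World Bank income-group coding for the countries in the sample.
UPPER_MIDDLE_INCOME = {"Brazil", "Colombia", "Mexico", "South Africa"}

def income_group(country: str) -> str:
    """Return the income group used in the subgroup analysis."""
    return "upper-middle income" if country in UPPER_MIDDLE_INCOME else "high income"

print(income_group("Brazil"))   # -> "upper-middle income"
print(income_group("Denmark"))  # -> "high income"
```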

Data synthesis

We synthesize our data using three synthesis techniques. First, we generate a forest plot, based on all available estimates of learning progress during the pandemic. We pool estimates using a random-effects restricted maximum likelihood model and inverse variance weights to calculate an overall effect size (Fig. 3 ) 52 . Second, we code all estimates of changes in educational inequality between children from different socio-economic backgrounds during the pandemic, according to whether they indicate an increase, a decrease or no change in educational inequality. We visualize the resulting distribution using a harvest plot (Fig. 5 ) 53 . Third, given that the limited amount of available evidence precludes multivariate or causal analyses, we examine the bivariate association between COVID-19 learning deficits and the months in which learning was measured using a scatter plot (Fig. 4 ), and the bivariate association between COVID-19 learning deficits and subject, grade level and countries’ income level, using a series of violin plots (Fig. 6 ). The reported estimates, CIs and statistical significance tests of these bivariate associations are based on common-effects models with standard errors clustered by study, and two-sided tests. With respect to statistical tests reported, the data distribution was assumed to be normal, but this was not formally tested. The distribution of estimates of learning deficits is shown separately for the different moderator categories in Fig. 6 .
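For readers who want to reproduce the pooling, the sketch below implements a DerSimonian–Laird random-effects estimator in Python on hypothetical inputs. This moment-based estimator is a simpler alternative to the restricted maximum likelihood model used for Fig. 3; dedicated meta-analysis software implements the REML variant directly.

```python
import numpy as np

def dersimonian_laird(d, se):
    """Random-effects pooled estimate via the DerSimonian-Laird method
    (a moment-based alternative to the REML estimator used in the article).

    d, se: arrays of study-level effect sizes and their standard errors.
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    w = 1.0 / se**2                          # fixed-effect (inverse-variance) weights
    d_fe = np.sum(w * d) / np.sum(w)         # fixed-effect pooled estimate
    q = np.sum(w * (d - d_fe) ** 2)          # Cochran's Q (heterogeneity statistic)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(d) - 1)) / c)  # between-study variance
    w_re = 1.0 / (se**2 + tau2)              # random-effects weights
    d_re = np.sum(w_re * d) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return d_re, se_re, tau2

# Hypothetical study estimates:
pooled, se_pooled, tau2 = dersimonian_laird([-0.10, -0.18, -0.07, -0.22],
                                            [0.03, 0.05, 0.04, 0.08])
print(f"pooled d = {pooled:.3f} (SE {se_pooled:.3f}), tau^2 = {tau2:.4f}")
```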

Pre-registration

We prospectively registered a protocol of our systematic review and meta-analysis in the International Prospective Register of Systematic Reviews (CRD42021249944) on 19 April 2021 ( https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021249944 ).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data used in the analyses for this manuscript were compiled by the authors based on the studies identified in the systematic review. The data are available on the Open Science Framework repository ( https://doi.org/10.17605/osf.io/u8gaz ). For our systematic review, we searched the following databases: Coronavirus Research Database ( https://proquest.libguides.com/covid19 ), Education Resources Information Centre database ( https://eric.ed.gov ), International Bibliography of the Social Sciences ( https://about.proquest.com/en/products-services/ibss-set-c/ ), Politics Collection ( https://about.proquest.com/en/products-services/ProQuest-Politics-Collection/ ), Social Science Database ( https://about.proquest.com/en/products-services/pq_social_science/ ), Sociology Collection ( https://about.proquest.com/en/products-services/ProQuest-Sociology-Collection/ ), Cumulative Index to Nursing and Allied Health Literature ( https://www.ebsco.com/products/research-databases/cinahl-database ) and Web of Science ( https://clarivate.com/webofsciencegroup/solutions/web-of-science/ ). We also searched the following preprint and working paper repositories: Social Science Research Network ( https://papers.ssrn.com/sol3/DisplayJournalBrowse.cfm ), Munich Personal RePEc Archive ( https://mpra.ub.uni-muenchen.de ), IZA ( https://www.iza.org/content/publications ), National Bureau of Economic Research ( https://www.nber.org/papers?page=1&perPage=50&sortBy=public_date ), OSF Preprints ( https://osf.io/preprints/ ), PsyArXiv ( https://psyarxiv.com ), SocArXiv ( https://osf.io/preprints/socarxiv ) and EdArXiv ( https://edarxiv.org ).

Code availability

All code needed to replicate our findings is available on the Open Science Framework repository ( https://doi.org/10.17605/osf.io/u8gaz ).

The Impact of COVID-19 on Children. UN Policy Briefs (United Nations, 2020).

Donnelly, R. & Patrinos, H. A. Learning loss during Covid-19: An early systematic review. Prospects (Paris) 51 , 601–609 (2022).

Hammerstein, S., König, C., Dreisörner, T. & Frey, A. Effects of COVID-19-related school closures on student achievement: a systematic review. Front. Psychol. https://doi.org/10.3389/fpsyg.2021.746289 (2021).

Panagouli, E. et al. School performance among children and adolescents during COVID-19 pandemic: a systematic review. Children 8 , 1134 (2021).


Patrinos, H. A., Vegas, E. & Carter-Rau, R. An Analysis of COVID-19 Student Learning Loss (World Bank, 2022).

Zierer, K. Effects of pandemic-related school closures on pupils’ performance and learning in selected countries: a rapid review. Educ. Sci. 11 , 252 (2021).


König, C. & Frey, A. The impact of COVID-19-related school closures on student achievement: a meta-analysis. Educ. Meas. Issues Pract. 41 , 16–22 (2022).

Storey, N. & Zhang, Q. A meta-analysis of COVID learning loss. Preprint at EdArXiv (2021).

Sterne, J. A. et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ https://doi.org/10.1136/bmj.i4919 (2016).

Azevedo, J. P., Hasan, A., Goldemberg, D., Iqbal, S. A. & Geven, K. Simulating the Potential Impacts of COVID-19 School Closures on Schooling and Learning Outcomes: A Set of Global Estimates (World Bank, 2020).

Bloom, H. S., Hill, C. J., Black, A. R. & Lipsey, M. W. Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. J. Res. Educ. Effectiveness 1 , 289–328 (2008).

Hill, C. J., Bloom, H. S., Black, A. R. & Lipsey, M. W. Empirical benchmarks for interpreting effect sizes in research. Child Dev. Perspect. 2 , 172–177 (2008).

Belot, M. & Webbink, D. Do teacher strikes harm educational attainment of students? Labour 24 , 391–406 (2010).

Jaume, D. & Willén, A. The long-run effects of teacher strikes: evidence from Argentina. J. Labor Econ. 37 , 1097–1139 (2019).

Cygan-Rehm, K. Are there no wage returns to compulsory schooling in Germany? A reassessment. J. Appl. Econ. 37 , 218–223 (2022).

Ichino, A. & Winter-Ebmer, R. The long-run educational cost of World War II. J. Labor Econ. 22 , 57–87 (2004).

Cooper, H., Nye, B., Charlton, K., Lindsay, J. & Greathouse, S. The effects of summer vacation on achievement test scores: a narrative and meta-analytic review. Rev. Educ. Res. 66 , 227–268 (1996).

Allington, R. L. et al. Addressing summer reading setback among economically disadvantaged elementary students. Read. Psychol. 31 , 411–427 (2010).

Smith, W. C. Consequences of school closure on access to education: lessons from the 2013–2016 Ebola pandemic. Int. Rev. Educ. 67 , 53–78 (2021).

Andrabi, T., Daniels, B. & Das, J. Human capital accumulation and disasters: evidence from the Pakistan earthquake of 2005. J. Hum. Resour . https://doi.org/10.35489/BSG-RISE-WP_2020/039 (2021).

Parolin, Z. & Lee, E. K. Large socio-economic, geographic and demographic disparities exist in exposure to school closures. Nat. Hum. Behav. 5 , 522–528 (2021).

Goudeau, S., Sanrey, C., Stanczak, A., Manstead, A. & Darnon, C. Why lockdown and distance learning during the COVID-19 pandemic are likely to increase the social class achievement gap. Nat. Hum. Behav. 5 , 1273–1281 (2021).


Bailey, D. H., Duncan, G. J., Murnane, R. J. & Au Yeung, N. Achievement gaps in the wake of COVID-19. Educ. Researcher 50 , 266–275 (2021).

van de Werfhorst, H. G. Inequality in learning is a major concern after school closures. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.2105243118 (2021).

Alexander, K. L., Entwisle, D. R. & Olson, L. S. Schools, achievement, and inequality: a seasonal perspective. Educ. Eval. Policy Anal. 23 , 171–191 (2001).

Aucejo, E. M. & Romano, T. F. Assessing the effect of school days and absences on test score performance. Econ. Educ. Rev. 55 , 70–87 (2016).

Gottfried, M. A. The detrimental effects of missing school: evidence from urban siblings. Am. J. Educ. 117 , 147–182 (2011).

Goodman, J. Flaking Out: Student Absences and Snow Days as Disruptions of Instructional Time (National Bureau of Economic Research, 2014).

Birkelund, J. F. & Karlson, K. B. No evidence of a major learning slide 14 months into the COVID-19 pandemic in Denmark. European Societies https://doi.org/10.1080/14616696.2022.2129085 (2022).

Angrist, N., Djankov, S., Goldberg, P. K. & Patrinos, H. A. Measuring human capital using global learning data. Nature 592 , 403–408 (2021).


Torche, F. in Social Mobility in Developing Countries: Concepts, Methods, and Determinants (eds Iversen, V., Krishna, A. & Sen, K.) 139–171 (Oxford Univ. Press, 2021).

World Development Report 2018: Learning to Realize Education’s Promise (World Bank, 2018).

Policy Brief: Education during COVID-19 and Beyond (United Nations, 2020).

One Year into COVID-19 Education Disruption: Where Do We Stand? (UNESCO, 2021).

Azevedo, J. P., Hasan, A., Goldemberg, D., Geven, K. & Iqbal, S. A. Simulating the potential impacts of COVID-19 school closures on schooling and learning outcomes: a set of global estimates. World Bank Res. Observer 36 , 1–40 (2021).


Ardington, C., Wills, G. & Kotze, J. COVID-19 learning losses: early grade reading in South Africa. Int. J. Educ. Dev. 86 , 102480 (2021).

Hevia, F. J., Vergara-Lope, S., Velásquez-Durán, A. & Calderón, D. Estimation of the fundamental learning loss and learning poverty related to COVID-19 pandemic in Mexico. Int. J. Educ. Dev. 88 , 102515 (2022).

Lichand, G., Doria, C. A., Leal-Neto, O. & Fernandes, J. P. C. The impacts of remote learning in secondary education during the pandemic in Brazil. Nat. Hum. Behav. 6 , 1079–1086 (2022).

Major, L. E., Eyles, A., Machin, S. et al. Learning Loss since Lockdown: Variation across the Home Nations (Centre for Economic Performance, London School of Economics and Political Science, 2021).

Di Pietro, G., Biagi, F., Costa, P., Karpinski, Z. & Mazza, J. The Likely Impact of COVID-19 on Education: Reflections Based on the Existing Literature and Recent International Datasets (Publications Office of the European Union, 2020).

Fuchs-Schündeln, N., Krueger, D., Ludwig, A. & Popova, I. The Long-Term Distributional and Welfare Effects of COVID-19 School Closures (National Bureau of Economic Research, 2020).

Kaffenberger, M. Modelling the long-run learning impact of the COVID-19 learning shock: actions to (more than) mitigate loss. Int. J. Educ. Dev. 81 , 102326 (2021).

Attewell, P. & Newman, K. S. Growing Gaps: Educational Inequality around the World (Oxford Univ. Press, 2010).

Betthäuser, B. A., Kaiser, C. & Trinh, N. A. Regional variation in inequality of educational opportunity across Europe. Socius https://doi.org/10.1177/23780231211019890 (2021).

Angrist, N. et al. Building back better to avert a learning catastrophe: estimating learning loss from COVID-19 school shutdowns in Africa and facilitating short-term and long-term learning recovery. Int. J. Educ. Dev. 84 , 102397 (2021).

Conley, D. & Johnson, T. Opinion: Past is future for the era of COVID-19 research in the social sciences. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.2104155118 (2021).

Angrist, N., Bergman, P. & Matsheng, M. Experimental evidence on learning using low-tech when school is out. Nat. Hum. Behav. 6 , 941–950 (2022).

Lichand, G., Christen, J. & van Egeraat, E. Do Behavioral Nudges Work under Remote Learning? Evidence from Brazil during the Pandemic (Univ. Zurich, 2022).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 1–10 (2016).

Tomasik, M. J., Helbling, L. A. & Moser, U. Educational gains of in-person vs. distance learning in primary and secondary schools: a natural experiment during the COVID-19 pandemic school closures in Switzerland. Int. J. Psychol. 56 , 566–576 (2021).

Eurybase: The Information Database on Education Systems in Europe (Eurydice, 2021).

Borenstein, M., Hedges, L. V., Higgins, J. P. & Rothstein, H. R. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res. Synth. Methods 1 , 97–111 (2010).

Ogilvie, D. et al. The harvest plot: a method for synthesising evidence about the differential effects of interventions. BMC Med. Res. Methodol. 8 , 1–7 (2008).

Gore, J., Fray, L., Miller, A., Harris, J. & Taggart, W. The impact of COVID-19 on student learning in New South Wales primary schools: an empirical study. Aust. Educ. Res. 48 , 605–637 (2021).

Gambi, L. & De Witte, K. The Resiliency of School Outcomes after the COVID-19 Pandemic: Standardised Test Scores and Inequality One Year after Long Term School Closures (FEB Research Report Department of Economics, 2021).

Maldonado, J. E. & De Witte, K. The effect of school closures on standardised student test outcomes. Br. Educ. Res. J. 48 , 49–94 (2021).

Vegas, E. COVID-19’s Impact on Learning Losses and Learning Inequality in Colombia (Center for Universal Education at Brookings, 2022).

Depping, D., Lücken, M., Musekamp, F. & Thonke, F. in Schule während der Corona-Pandemie. Neue Ergebnisse und Überblick über ein dynamisches Forschungsfeld (eds Fickermann, D. & Edelstein, B.) 51–79 (Münster & New York: Waxmann, 2021).

Ludewig, U. et al. Die COVID-19 Pandemie und Lesekompetenz von Viertklässler*innen: Ergebnisse der IFS-Schulpanelstudie 2016–2021 (Institut für Schulentwicklungsforschung, Univ. Dortmund, 2022).

Schult, J., Mahler, N., Fauth, B. & Lindner, M. A. Did students learn less during the COVID-19 pandemic? Reading and mathematics competencies before and after the first pandemic wave. Sch. Eff. Sch. Improv. https://doi.org/10.1080/09243453.2022.2061014 (2022).

Schult, J., Mahler, N., Fauth, B. & Lindner, M. A. Long-term consequences of repeated school closures during the COVID-19 pandemic for reading and mathematics competencies. Front. Educ. https://doi.org/10.3389/feduc.2022.867316 (2022).

Bazoli, N., Marzadro, S., Schizzerotto, A. & Vergolini, L. Learning Loss and Students’ Social Origins during the COVID-19 Pandemic in Italy (FBK-IRVAPP Working Papers 3, 2022).

Borgonovi, F. & Ferrara, A. The effects of COVID-19 on inequalities in educational achievement in Italy. Preprint at SSRN https://doi.org/10.2139/ssrn.4171968 (2022).

Contini, D., Di Tommaso, M. L., Muratori, C., Piazzalunga, D. & Schiavon, L. Who lost the most? Mathematics achievement during the COVID-19 pandemic. BE J. Econ. Anal. Policy 22 , 399–408 (2022).

Engzell, P., Frey, A. & Verhagen, M. D. Learning loss due to school closures during the COVID-19 pandemic. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.2022376118 (2021).

Haelermans, C. Learning Growth and Inequality in Primary Education: Policy Lessons from the COVID-19 Crisis (The European Liberal Forum (ELF)-FORES, 2021).

Haelermans, C. et al. A Full Year COVID-19 Crisis with Interrupted Learning and Two School Closures: The Effects on Learning Growth and Inequality in Primary Education (Maastricht Univ., Research Centre for Education and the Labour Market (ROA), 2021).

Haelermans, C. et al. Sharp increase in inequality in education in times of the COVID-19-pandemic. PLoS ONE 17 , e0261114 (2022).

Schuurman, T. M., Henrichs, L. F., Schuurman, N. K., Polderdijk, S. & Hornstra, L. Learning loss in vulnerable student populations after the first COVID-19 school closure in the Netherlands. Scand. J. Educ. Res. https://doi.org/10.1080/00313831.2021.2006307 (2021).

Arenas, A. & Gortazar, L. Learning Loss One Year after School Closures (Esade Working Paper, 2022).

Hallin, A. E., Danielsson, H., Nordström, T. & Fälth, L. No learning loss in Sweden during the pandemic evidence from primary school reading assessments. Int. J. Educ. Res. 114 , 102011 (2022).

Blainey, K. & Hannay, T. The Impact of School Closures on Autumn 2020 Attainment (RS Assessment from Hodder Education and SchoolDash, 2021).

Blainey, K. & Hannay, T. The Impact of School Closures on Spring 2021 Attainment (RS Assessment from Hodder Education and SchoolDash, 2021).

Blainey, K. & Hannay, T. The Effects of Educational Disruption on Primary School Attainment in Summer 2021 (RS Assessment from Hodder Education and SchoolDash, 2021).

Understanding Progress in the 2020/21 Academic Year: Complete Findings from the Autumn Term (London: Department for Education, 2021).

Understanding Progress in the 2020/21 Academic Year: Initial Findings from the Spring Term (London: Department for Education, 2021).

Impact of COVID-19 on Attainment: Initial Analysis (Brentford: GL Assessment, 2021).

Rose, S. et al. Impact of School Closures and Subsequent Support Strategies on Attainment and Socio-emotional Wellbeing in Key Stage 1: Interim Paper 1 (National Foundation for Educational Research (NFER) and Education Endowment Foundation (EEF) , 2021).

Rose, S. et al. Impact of School Closures and Subsequent Support Strategies on Attainment and Socio-emotional Wellbeing in Key Stage 1: Interim Paper 2 (National Foundation for Educational Research (NFER) and Education Endowment Foundation (EEF), 2021).

Weidmann, B. et al. COVID-19 Disruptions: Attainment Gaps and Primary School Responses (Education Endowment Foundation, 2021).

Bielinski, J., Brown, R. & Wagner, K. No Longer a Prediction: What New Data Tell Us About the Effects of 2020 Learning Disruptions (Illuminate Education, 2021).

Domingue, B. W., Hough, H. J., Lang, D. & Yeatman, J. Changing Patterns of Growth in Oral Reading Fluency During the COVID-19 Pandemic. PACE Working Paper (Policy Analysis for California Education, 2021).

Domingue, B. et al. The effect of COVID on oral reading fluency during the 2020–2021 academic year. AERA Open https://doi.org/10.1177/23328584221120254 (2022).

Kogan, V. & Lavertu, S. The COVID-19 Pandemic and Student Achievement on Ohio’s Third-Grade English Language Arts Assessment (Ohio State Univ., 2021).

Kogan, V. & Lavertu, S. How the COVID-19 Pandemic Affected Student Learning in Ohio: Analysis of Spring 2021 Ohio State Tests (Ohio State Univ., 2021).

Kozakowski, W., Gill, B., Lavallee, P., Burnett, A. & Ladinsky, J. Changes in Academic Achievement in Pittsburgh Public Schools during Remote Instruction in the COVID-19 Pandemic (Institute of Education Sciences (IES), US Department of Education, 2020).

Kuhfeld, M. & Lewis, K. Student Achievement in 2021–2022: Cause for Hope and Continued Urgency (NWEA, 2022).

Lewis, K., Kuhfeld, M., Ruzek, E. & McEachin, A. Learning during COVID-19: Reading and Math Achievement in the 2020–21 School Year (NWEA, 2021).

Locke, V. N., Patarapichayatham, C. & Lewis, S. Learning Loss in Reading and Math in US Schools Due to the COVID-19 Pandemic (Istation, 2021).

Pier, L., Christian, M., Tymeson, H. & Meyer, R. H. COVID-19 Impacts on Student Learning: Evidence from Interim Assessments in California. PACE Working Paper (Policy Analysis for California Education, 2021).


Acknowledgements

This work was supported by Carlsberg Foundation grant CF19-0102 (A.M.B.-M.); a Leverhulme Trust Large Centre Grant (P.E.); the Swedish Research Council for Health, Working Life and Welfare (FORTE) grant 2016-07099 (P.E.); and the French National Research Agency (ANR) as part of the 'Investissements d'Avenir' programme LIEPP (ANR-11-LABX-0091 and ANR-11-IDEX-0005-02) and the Université Paris Cité IdEx (ANR-18-IDEX-0001) (P.E.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Centre for Research on Social Inequalities (CRIS), Sciences Po, Paris, France

Bastian A. Betthäuser

Department of Social Policy and Intervention, University of Oxford, Oxford, UK

Bastian A. Betthäuser & Anders M. Bach-Mortensen

Nuffield College, University of Oxford, Oxford, UK

Bastian A. Betthäuser & Per Engzell

Social Research Institute, University College London, London, UK

Per Engzell

Swedish Institute for Social Research, Stockholm University, Stockholm, Sweden


Contributions

B.A.B., A.M.B.-M. and P.E. designed the study; B.A.B., A.M.B.-M. and P.E. planned and implemented the search and screened studies; B.A.B., A.M.B.-M. and P.E. extracted relevant data from studies; B.A.B., A.M.B.-M. and P.E. conducted the quality appraisal; B.A.B., A.M.B.-M. and P.E. conducted the data analysis and visualization; B.A.B., A.M.B.-M. and P.E. wrote the manuscript.

Corresponding author

Correspondence to Bastian A. Betthäuser.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Guilherme Lichand, Sébastien Goudeau and Christoph König for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary methods, results, figures, tables, PRISMA Checklist and references.

Reporting Summary

Peer review file


About this article

Cite this article

Betthäuser, B.A., Bach-Mortensen, A.M. & Engzell, P. A systematic review and meta-analysis of the evidence on learning during the COVID-19 pandemic. Nat Hum Behav 7, 375–385 (2023). https://doi.org/10.1038/s41562-022-01506-4


Received: 24 June 2022

Accepted: 30 November 2022

Published: 30 January 2023

Issue Date: March 2023

DOI: https://doi.org/10.1038/s41562-022-01506-4


Meta-Analysis and Research Synthesis in Education

by Harsh Suri and John Hattie. Last reviewed: 29 July 2020. Last modified: 24 July 2013. DOI: 10.1093/obo/9780199756810-0091

This bibliography begins with this introduction, followed by a general overview of meta-analysis and research synthesis methods in education. The third section cites references to illustrate how the concept of statistical integration of research findings dates back to the early 20th century. Citations in the fourth section highlight early calls from educational researchers to recognize the process of synthesizing research as a scholarly endeavor in its own right. As evident from the citations in the fifth section, it was in the 1980s that monographs devoted exclusively to research synthesis methods began to be published. Since then, a number of books on research synthesis methods have appeared. The sixth section cites some of the most comprehensive books on research synthesis methods, including contributions from the key figures in meta-analysis at the time. The citations in the seventh section illustrate how meta-analysis has become increasingly popular over time. Several large organizations have been set up, and software developed, to support systematic reviews of research; these are cited in the eighth section. Citations in the ninth section illustrate critiques of systematic reviews. As qualitative research becomes more prominent in education, sophisticated discussions of the issues associated with synthesizing qualitative research have also been published, some of which are cited in the tenth section. The eleventh section presents a methodologically inclusive account of current developments in research synthesis methods. The final section cites examples of journals devoted exclusively to publishing research reviews, as well as exemplary research syntheses using different methodologies. Individual methods of research synthesis are discussed chronologically, in the order in which they became popular in educational research. Accordingly, this bibliography starts with a discussion of statistical methods of integrating research, which parallels the dominance of quantitative research in education until the 1970s. As the popularity and diversity of qualitative research methods in educational research have grown, more methodologically inclusive discussions of research synthesis methods are becoming popular, as described in the later sections of this bibliography.

Suri and Clarke 2009 comprehensively maps how meta-analysis and research synthesis methods have advanced in education. As noted in most texts on meta-analysis (many of which are listed in later sections), the authors observe that meta-analysts have played an important role in formalizing the methodology of research synthesis. To facilitate comparisons across studies, meta-analysis has allowed the bringing together of many seemingly diverse findings from individual primary research studies onto a common metric called an effect size. After appropriate statistical adjustments of individual effect sizes according to their respective study features, a cumulative effect size can be calculated that is indicative of the direction and the magnitude of the overall trend of findings. This is followed by tests to identify substantive, methodological, or contextual study features that may be potential moderators of the effect size. Meta-analysts have systematized the entire process of research synthesis by identifying the main tasks in each phase of a research synthesis, highlighting critical decision points within each phase, and allowing discussion of the relative merits of different choices at each decision point. They advocate explicit statement and justification of the decisions made at each stage of the research synthesis from hypothesis formulation, data selection, evaluation, analysis, and interpretation to public presentation. There are several types of sensitivity analyses that can examine the dependence of the findings on the assumptions made about the nature of the data. Over the past three decades, meta-analysts have conducted numerous investigations to examine the robustness of their techniques and have explored ways of refining these techniques, as well as examining many substantive uses in the field of education. Meta-analysis now has become but one (important) method in integrative research synthesis. As a large proportion of primary research in education does not lend itself to a meta-analysis, there is a growing interest in methodologically inclusive discussions of quantitative and qualitative research synthesis in education. The field is now rich in methods, debates, and uses of research synthesis of primary literature. It is further continuing to become more diverse as more researchers are drawing insights from interpretive, participatory, and critical traditions to the process of synthesizing research.

Suri, Harsh, and David Clarke. 2009. Advancements in research synthesis methods: From a methodologically inclusive perspective. Review of Educational Research 79.1: 395–430.

DOI: 10.3102/0034654308326349

In addition to summarizing the key advancements in research synthesis methods from different methodological persuasions, the authors also identify suitable domains of applicability and present common critiques raised over these developments. They deliberately highlight the contributions that are less frequently cited in the dominant literature on meta-analysis. Available online for purchase or by subscription.


What Is Meta-Analysis? Definition, Research & Examples

Appinio Research · 01.02.2024 · 38 min read


Are you looking to harness the power of data and uncover meaningful insights from a multitude of research studies? In a world overflowing with information, meta-analysis emerges as a guiding light, offering a systematic and quantitative approach to distilling knowledge from a sea of research.

This guide will demystify the art and science of meta-analysis, walking you through the process, from defining your research question to interpreting the results. Whether you're an academic researcher, a policymaker, or a curious mind eager to explore the depths of data, this guide will equip you with the tools and understanding needed to undertake robust and impactful meta-analyses.

What Is a Meta-Analysis?

Meta-analysis is a quantitative research method that involves the systematic synthesis and statistical analysis of data from multiple individual studies on a particular topic or research question. It aims to provide a comprehensive and robust summary of existing evidence by pooling the results of these studies, often leading to more precise and generalizable conclusions.

The primary purpose of meta-analysis is to:

  • Quantify Effect Sizes:  Determine the magnitude and direction of an effect or relationship across studies.
  • Evaluate Consistency:  Assess the consistency of findings among studies and identify sources of heterogeneity.
  • Enhance Statistical Power:  Increase the statistical power to detect significant effects by combining data from multiple studies.
  • Generalize Results:  Provide more generalizable results by analyzing a more extensive and diverse sample of participants or contexts.
  • Examine Subgroup Effects:  Explore whether the effect varies across different subgroups or study characteristics.

Importance of Meta-Analysis

Meta-analysis plays a crucial role in scientific research and evidence-based decision-making. Here are key reasons why meta-analysis is highly valuable:

  • Enhanced Precision:  By pooling data from multiple studies, meta-analysis provides a more precise estimate of the effect size, reducing the impact of random variation.
  • Increased Statistical Power:  The combination of numerous studies enhances statistical power, making it easier to detect small but meaningful effects.
  • Resolution of Inconsistencies:  Meta-analysis can help resolve conflicting findings in the literature by systematically analyzing and synthesizing evidence.
  • Identification of Moderators:  It allows for the identification of factors that may moderate the effect, helping to understand when and for whom interventions or treatments are most effective.
  • Evidence-Based Decision-Making:  Policymakers, clinicians, and researchers use meta-analysis to inform evidence-based decision-making, leading to more informed choices in healthcare, education, and other fields.
  • Efficient Use of Resources:  Meta-analysis can guide future research by identifying gaps in knowledge, reducing duplication of efforts, and directing resources to areas with the most significant potential impact.

Types of Research Questions Addressed

Meta-analysis can address a wide range of research questions across various disciplines. Some common types of research questions that meta-analysis can tackle include:

  • Treatment Efficacy:  Does a specific medical treatment, therapy, or intervention have a significant impact on patient outcomes or symptoms?
  • Intervention Effectiveness:  How effective are educational programs, training methods, or interventions in improving learning outcomes or skills?
  • Risk Factors and Associations:  What are the associations between specific risk factors, such as smoking or diet, and the likelihood of developing certain diseases or conditions?
  • Impact of Policies:  What is the effect of government policies, regulations, or interventions on social, economic, or environmental outcomes?
  • Psychological Constructs:  How do psychological constructs, such as self-esteem, anxiety, or motivation, influence behavior or mental health outcomes?
  • Comparative Effectiveness:  Which of two or more competing interventions or treatments is more effective for a particular condition or population?
  • Dose-Response Relationships:  Is there a dose-response relationship between exposure to a substance or treatment and the likelihood or severity of an outcome?

Meta-analysis is a versatile tool that can provide valuable insights into a wide array of research questions, making it an indispensable method in evidence synthesis and knowledge advancement.

Meta-Analysis vs. Systematic Review

In evidence synthesis and research aggregation, meta-analysis and systematic reviews are two commonly used methods, each serving distinct purposes while sharing some similarities. Let's explore the differences and similarities between these two approaches.

Meta-Analysis

  • Purpose:  Meta-analysis is a statistical technique used to combine and analyze quantitative data from multiple individual studies that address the same research question. The primary aim of meta-analysis is to provide a single summary effect size that quantifies the magnitude and direction of an effect or relationship across studies.
  • Data Synthesis:  In meta-analysis, researchers extract and analyze numerical data, such as means, standard deviations, correlation coefficients, or odds ratios, from each study. These effect size estimates are then combined using statistical methods to generate an overall effect size and associated confidence interval.
  • Quantitative:  Meta-analysis is inherently quantitative, focusing on numerical data and statistical analyses to derive a single effect size estimate.
  • Main Outcome:  The main outcome of a meta-analysis is the summary effect size, which provides a quantitative estimate of the research question's answer.

Systematic Review

  • Purpose:  A systematic review is a comprehensive and structured overview of the available evidence on a specific research question. While systematic reviews may include meta-analysis, their primary goal is to provide a thorough and unbiased summary of the existing literature.
  • Data Synthesis:  Systematic reviews involve a meticulous process of literature search, study selection, data extraction, and quality assessment. Researchers may narratively synthesize the findings, providing a qualitative summary of the evidence.
  • Qualitative:  Systematic reviews are often qualitative in nature, summarizing and synthesizing findings in a narrative format. They do not always involve statistical analysis.
  • Main Outcome:  The primary outcome of a systematic review is a comprehensive narrative summary of the existing evidence. While some systematic reviews include meta-analyses, not all do so.

Key Differences

  • Nature of Data:  Meta-analysis primarily deals with quantitative data and statistical analysis, while systematic reviews encompass both quantitative and qualitative data, often presenting findings in a narrative format.
  • Focus on Effect Size:  Meta-analysis focuses on deriving a single, quantitative effect size estimate, whereas systematic reviews emphasize providing a comprehensive overview of the literature, including study characteristics, methodologies, and key findings.
  • Synthesis Approach:  Meta-analysis is a quantitative synthesis method, while systematic reviews may use both quantitative and qualitative synthesis approaches.

Commonalities

  • Structured Process:  Both meta-analyses and systematic reviews follow a structured and systematic process for literature search, study selection, data extraction, and quality assessment.
  • Evidence-Based:  Both approaches aim to provide evidence-based answers to specific research questions, offering valuable insights for decision-making in various fields.
  • Transparency:  Both meta-analyses and systematic reviews prioritize transparency and rigor in their methodologies to minimize bias and enhance the reliability of their findings.

While meta-analysis and systematic reviews share the overarching goal of synthesizing research evidence, they differ in their approach and main outcomes. Meta-analysis is quantitative, focusing on effect sizes, while systematic reviews provide comprehensive overviews, utilizing both quantitative and qualitative data to summarize the literature. Depending on the research question and available data, one or both of these methods may be employed to provide valuable insights for evidence-based decision-making.

How to Conduct a Meta-Analysis?

Planning a meta-analysis is a critical phase that lays the groundwork for a successful and meaningful study. We will explore each component of the planning process in more detail, ensuring you have a solid foundation before diving into data analysis.

How to Formulate Research Questions?

Your research questions are the guiding compass of your meta-analysis. They should be precise and tailored to the topic you're investigating. To craft effective research questions:

  • Clearly Define the Problem:  Start by identifying the specific problem or topic you want to address through meta-analysis.
  • Specify Key Variables:  Determine the essential variables or factors you'll examine in the included studies.
  • Frame Hypotheses:  If applicable, create clear hypotheses that your meta-analysis will test.

For example, if you're studying the impact of a specific intervention on patient outcomes, your research question might be: "What is the effect of Intervention X on Patient Outcome Y in published clinical trials?"

Eligibility Criteria

Eligibility criteria define the boundaries of your meta-analysis. By establishing clear criteria, you ensure that the studies you include are relevant and contribute to your research objectives. Key considerations for eligibility criteria include:

  • Study Types:  Decide which types of studies will be considered (e.g., randomized controlled trials, cohort studies, case-control studies).
  • Publication Time Frame:  Specify the publication date range for included studies.
  • Language:  Determine whether studies in languages other than your primary language will be included.
  • Geographic Region:  If relevant, define any geographic restrictions.

Your eligibility criteria should strike a balance between inclusivity and relevance. Excluding certain studies based on valid criteria ensures the quality and relevance of the data you analyze.

Search Strategy

A robust search strategy is fundamental to identifying all relevant studies. To create an effective search strategy:

  • Select Databases:  Choose appropriate databases that cover your research area (e.g., PubMed, Scopus, Web of Science).
  • Keywords and Search Terms:  Develop a comprehensive list of relevant keywords and search terms related to your research questions.
  • Search Filters:  Utilize search filters and Boolean operators (AND, OR) to refine your search queries.
  • Manual Searches:  Consider conducting hand-searches of key journals and reviewing the reference lists of relevant studies for additional sources.

Remember that the goal is to cast a wide net while maintaining precision to capture all relevant studies.

Data Extraction

Data extraction is the process of systematically collecting information from each selected study. It involves retrieving key data points, including:

  • Study Characteristics:  Author(s), publication year, study design, sample size, duration, and location.
  • Outcome Data:  Effect sizes, standard errors, confidence intervals, p-values, and any other relevant statistics.
  • Methodological Details:  Information on study quality, risk of bias, and potential sources of heterogeneity.

Creating a standardized data extraction form is essential to ensure consistency and accuracy throughout this phase. Spreadsheet software, such as Microsoft Excel, is commonly used for data extraction.
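To make the idea of a standardized extraction form concrete, here is a minimal sketch of such a template built with pandas; the column names and the single example row are hypothetical and would need to be adapted to your own coding manual and research question.

```python
import pandas as pd

# Hypothetical data extraction template; adjust the columns to your coding manual.
columns = [
    "study_id", "authors", "year", "country", "study_design",
    "sample_size", "mean_treatment", "sd_treatment", "n_treatment",
    "mean_control", "sd_control", "n_control",
    "effect_size", "effect_size_metric", "risk_of_bias",
]

extraction_form = pd.DataFrame(columns=columns)

# Example of how one (fictional) study could be recorded.
extraction_form.loc[0] = [
    "S001", "Doe et al.", 2021, "US", "RCT",
    120, 24.3, 5.1, 60, 21.8, 5.4, 60,
    None, "Cohen's d", "low",
]

extraction_form.to_csv("extraction_form.csv", index=False)  # share with co-reviewers
print(extraction_form.head())
```

Storing the form as a CSV file makes it easy to share with co-reviewers and to re-import later for the analysis phase.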

Quality Assessment

Assessing the quality of included studies is crucial to determine their reliability and potential impact on your meta-analysis. Various quality assessment tools and checklists are available, depending on the study design. Some commonly used tools include:

  • Newcastle-Ottawa Scale:  Used for assessing the quality of non-randomized studies (e.g., cohort, case-control studies).
  • Cochrane Risk of Bias Tool:  Designed for evaluating randomized controlled trials.

Quality assessment typically involves evaluating aspects such as study design, sample size, data collection methods, and potential biases. This step helps you weigh the contribution of each study to the overall analysis.

How to Conduct a Literature Review?

Conducting a thorough literature review is a critical step in the meta-analysis process. We will explore the essential components of a literature review, from designing a comprehensive search strategy to establishing clear inclusion and exclusion criteria and, finally, the study selection process.

Comprehensive Search

To ensure the success of your meta-analysis, it's imperative to cast a wide net when searching for relevant studies. A comprehensive search strategy involves:

  • Selecting Relevant Databases:  Identify databases that cover your research area comprehensively, such as PubMed, Scopus, Web of Science, or specialized databases specific to your field.
  • Creating a Keyword List:  Develop a list of relevant keywords and search terms related to your research questions. Think broadly and consider synonyms, acronyms, and variations.
  • Using Boolean Operators:  Utilize Boolean operators (AND, OR) to combine keywords effectively and refine your search.
  • Applying Filters:  Employ search filters (e.g., publication date range, study type) to narrow down results based on your eligibility criteria.

Remember that the goal is to leave no relevant stone unturned, as missing key studies can introduce bias into your meta-analysis.

Inclusion and Exclusion Criteria

Clearly defined inclusion and exclusion criteria are the gatekeepers of your meta-analysis. These criteria ensure that the studies you include meet your research objectives and maintain the quality of your analysis. Consider the following factors when establishing criteria:

  • Study Types:  Determine which types of studies are eligible for inclusion (e.g., randomized controlled trials, observational studies, case reports).
  • Publication Time Frame:  Specify the time frame within which studies must have been published.
  • Language:  Decide whether studies in languages other than your primary language will be included or excluded.
  • Geographic Region:  If applicable, define any geographic restrictions.
  • Relevance to Research Questions:  Ensure that selected studies align with your research questions and objectives.

Your inclusion and exclusion criteria should strike a balance between inclusivity and relevance. Rigorous criteria help maintain the quality and applicability of the studies included in your meta-analysis.

Study Selection Process

The study selection process involves systematically screening and evaluating each potential study to determine whether it meets your predefined inclusion criteria. Here's a step-by-step guide:

  • Screen Titles and Abstracts:  Begin by reviewing the titles and abstracts of the retrieved studies. Exclude studies that clearly do not meet your inclusion criteria.
  • Full-Text Assessment:  Assess the full text of potentially relevant studies to confirm their eligibility. Pay attention to study design, sample size, and other specific criteria.
  • Data Extraction:  For studies that meet your criteria, extract the necessary data, including study characteristics, effect sizes, and other relevant information.
  • Record Exclusions:  Keep a record of the reasons for excluding studies. This transparency is crucial for the reproducibility of your meta-analysis.
  • Resolve Discrepancies:  If multiple reviewers are involved, resolve any disagreements through discussion or a third-party arbitrator.

Maintaining a clear and organized record of your study selection process is essential for transparency and reproducibility. Software tools like EndNote or Covidence can facilitate the screening and data extraction process.

By following these systematic steps in conducting a literature review, you ensure that your meta-analysis is built on a solid foundation of relevant and high-quality studies.

Data Extraction and Management

As you progress in your meta-analysis journey, the data extraction and management phase becomes paramount. We will delve deeper into the critical aspects of this phase, including the data collection process, data coding and transformation, and how to handle missing data effectively.

Data Collection Process

The data collection process is the heart of your meta-analysis, where you systematically extract essential information from each selected study. To ensure accuracy and consistency:

  • Create a Data Extraction Form:  Develop a standardized data extraction form that includes all the necessary fields for collecting relevant data. This form should align with your research questions and inclusion criteria.
  • Data Extractors:  Assign one or more reviewers to extract data from the selected studies. Ensure they are familiar with the form and the specific data points to collect.
  • Double-Check Accuracy:  Implement a verification process where a second reviewer cross-checks a random sample of data extractions to identify discrepancies or errors.
  • Extract All Relevant Information:  Collect data on study characteristics, participant demographics, outcome measures, effect sizes, confidence intervals, and any additional information required for your analysis.
  • Maintain Consistency:  Use clear guidelines and definitions for data extraction to ensure uniformity across studies.

Data Coding and Transformation

After data collection, you may need to code and transform the extracted data to ensure uniformity and compatibility across studies. This process involves:

  • Coding Categorical Variables:  If studies report data differently, code categorical variables consistently. For example, ensure that categories like "male" and "female" are coded consistently across studies.
  • Standardizing Units of Measurement:  Convert all measurements to a common unit if studies use different measurement units. For instance, if one study reports height in inches and another in centimeters, standardize to one unit for comparability.
  • Calculating Effect Sizes:  Calculate effect sizes and their standard errors or variances if they are not directly reported in the studies. Common effect size measures include Cohen's d, odds ratio (OR), and hazard ratio (HR).
  • Data Transformation:  Transform data if necessary to meet assumptions of statistical tests. Common transformations include log transformation for skewed data or arcsine transformation for proportions.
  • Heterogeneity Adjustment:  Consider using transformation methods to address heterogeneity among studies, such as applying the Freeman-Tukey double arcsine transformation for proportions.

The goal of data coding and transformation is to make sure that data from different studies are compatible and can be effectively synthesized during the analysis phase. Spreadsheet software like Excel or statistical software like R can be used for these tasks.
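As a minimal sketch of the transformations described above, the snippet below converts a hypothetical 2×2 outcome table into a log odds ratio with its standard error (the scale on which odds ratios are typically pooled) and standardizes a measurement reported in inches to centimeters; all numbers are invented for illustration.

```python
import math

# Hypothetical 2x2 table from one study:
#                events  non-events
# treatment        a=30        b=70
# control          c=45        d=55
a, b, c, d = 30, 70, 45, 55

# Odds ratios are usually pooled on the log scale, where the sampling
# distribution is approximately normal.
log_or = math.log((a * d) / (b * c))
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
print(f"log OR = {log_or:.3f}, SE = {se_log_or:.3f}, OR = {math.exp(log_or):.3f}")

# Unit standardization: convert a mean height reported in inches to centimeters
# so it is comparable with studies that report centimeters.
mean_height_in = 66.5
mean_height_cm = mean_height_in * 2.54
print(f"{mean_height_in} in = {mean_height_cm:.1f} cm")
```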

Handling Missing Data

Missing data is a common challenge in meta-analysis, and how you handle it can impact the validity and precision of your results. Strategies for handling missing data include:

  • Contact Authors:  If feasible, contact the authors of the original studies to request missing data or clarifications.
  • Imputation:  Consider using appropriate imputation methods to estimate missing values, but exercise caution and report the imputation methods used.
  • Sensitivity Analysis:  Conduct sensitivity analyses to assess the impact of missing data on your results by comparing the main analysis to alternative scenarios.

Remember that transparency in reporting how you handled missing data is crucial for the credibility of your meta-analysis.

By following these steps in data extraction and management, you will ensure the integrity and reliability of your meta-analysis dataset.

Meta-Analysis Example

Meta-analysis is a versatile research method that can be applied to various fields and disciplines, providing valuable insights by synthesizing existing evidence.

Example: Analyzing the Impact of Advertising Campaigns on Sales

Background:  A market research agency is tasked with assessing the effectiveness of advertising campaigns on sales outcomes for a range of consumer products. They have access to multiple studies and reports conducted by different companies, each analyzing the impact of advertising on sales revenue.

Meta-Analysis Approach:

  • Study Selection:  Identify relevant studies that meet specific inclusion criteria, such as the type of advertising campaign (e.g., TV commercials, social media ads), the products examined, and the sales metrics assessed.
  • Data Extraction:  Collect data from each study, including details about the advertising campaign (e.g., budget, duration), sales data (e.g., revenue, units sold), and any reported effect sizes or correlations.
  • Effect Size Calculation:  Calculate effect sizes (e.g., correlation coefficients) based on the data provided in each study, quantifying the strength and direction of the relationship between advertising and sales.
  • Data Synthesis:  Employ meta-analysis techniques to combine the effect sizes from the selected studies. Compute a summary effect size and its confidence interval to estimate the overall impact of advertising on sales.
  • Publication Bias Assessment:  Use funnel plots and statistical tests to assess the potential presence of publication bias, ensuring that the meta-analysis results are not unduly influenced by selective reporting.

Findings:  Through meta-analysis, the market research agency discovers that advertising campaigns have a statistically significant and positive impact on sales across various product categories. The findings provide evidence for the effectiveness of advertising efforts and assist companies in making data-driven decisions regarding their marketing strategies.

This example illustrates how meta-analysis can be applied outside academia, for instance by market research agencies evaluating the impact of advertising campaigns. By systematically synthesizing existing evidence, meta-analysis empowers decision-makers with valuable insights for informed choices and evidence-based strategies.
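As a small numerical illustration of the synthesis step in this example, the sketch below pools three invented correlation coefficients between advertising spend and sales using Fisher's z transformation and inverse-variance (fixed-effect) weights; the study values are hypothetical.

```python
import math

# Hypothetical (correlation, sample size) pairs from three advertising studies.
studies = [(0.32, 150), (0.25, 300), (0.41, 90)]

# Fisher's z transformation makes correlations approximately normal,
# with variance 1 / (n - 3).
zs = [math.atanh(r) for r, _ in studies]   # z = 0.5 * ln((1 + r) / (1 - r))
ws = [n - 3 for _, n in studies]           # inverse-variance weights

z_pooled = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
se_pooled = math.sqrt(1 / sum(ws))
ci_low, ci_high = z_pooled - 1.96 * se_pooled, z_pooled + 1.96 * se_pooled

# Back-transform to the correlation scale for reporting.
print(f"pooled r = {math.tanh(z_pooled):.3f} "
      f"(95% CI {math.tanh(ci_low):.3f} to {math.tanh(ci_high):.3f})")
```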

How to Assess Study Quality and Bias?

Ensuring the quality and reliability of the studies included in your meta-analysis is essential for drawing accurate conclusions. We'll show you how you can assess study quality using specific tools, evaluate potential bias, and address publication bias.

Quality Assessment Tools

Quality assessment tools provide structured frameworks for evaluating the methodological rigor of each included study. The choice of tool depends on the study design. Here are some commonly used quality assessment tools:

For Randomized Controlled Trials (RCTs):

  • Cochrane Risk of Bias Tool:  This tool assesses the risk of bias in RCTs based on six domains: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, and selective reporting.
  • Jadad Scale:  A simpler tool specifically for RCTs, the Jadad Scale focuses on randomization, blinding, and the handling of withdrawals and dropouts.

For Observational Studies:

  • Newcastle-Ottawa Scale (NOS):  The NOS assesses the quality of cohort and case-control studies based on three categories: selection, comparability, and outcome.
  • ROBINS-I:  Designed for non-randomized studies of interventions, the Risk of Bias in Non-randomized Studies of Interventions tool evaluates bias in domains such as confounding, selection bias, and measurement bias.
  • MINORS:  The Methodological Index for Non-Randomized Studies (MINORS) assesses non-comparative studies and includes items related to study design, reporting, and statistical analysis.

Bias Assessment

Evaluating potential sources of bias is crucial to understanding the limitations of the included studies. Common sources of bias include:

  • Selection Bias:  Occurs when the selection of participants is not random or representative of the target population.
  • Performance Bias:  Arises when participants or researchers are aware of the treatment or intervention status, potentially influencing outcomes.
  • Detection Bias:  Occurs when outcome assessors are not blinded to the treatment groups.
  • Attrition Bias:  Results from incomplete data or differential loss to follow-up between treatment groups.
  • Reporting Bias:  Involves selective reporting of outcomes, where only positive or statistically significant results are published.

To assess bias, reviewers often use the quality assessment tools mentioned earlier, which include domains related to bias, or they may specifically address bias concerns in the narrative synthesis.

We'll move on to the core of meta-analysis: data synthesis. We'll explore different effect size measures, fixed-effect versus random-effects models, and techniques for assessing and addressing heterogeneity among studies.

Data Synthesis

Now that you've gathered data from multiple studies and assessed their quality, it's time to synthesize this information effectively.

Effect Size Measures

Effect size measures quantify the magnitude of the relationship or difference you're investigating in your meta-analysis. The choice of effect size measure depends on your research question and the type of data provided by the included studies. Here are some commonly used effect size measures:

Continuous Outcome Data:

  • Cohen's d:  Measures the standardized mean difference between two groups. It's suitable for continuous outcome variables.
  • Hedges' g:  Similar to Cohen's d but incorporates a correction factor for small sample sizes.

Binary Outcome Data:

  • Odds Ratio (OR):  Used for dichotomous outcomes, such as success/failure or presence/absence.
  • Risk Ratio (RR):  Similar to OR but used when the outcome is relatively common.

Time-to-Event Data:

  • Hazard Ratio (HR):  Used in survival analysis to assess the risk of an event occurring over time.
  • Risk Difference (RD):  Measures the absolute difference in event rates between two groups.

Selecting the appropriate effect size measure depends on the nature of your data and the research question. When effect sizes are not directly reported in the studies, you may need to calculate them using available data, such as means, standard deviations, and sample sizes.

Formula for Cohen's d:

d = (Mean of Group A - Mean of Group B) / Pooled Standard Deviation
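Assuming a study reports group means, standard deviations, and sample sizes, the sketch below implements the formula above, adds Hedges' g with the usual small-sample correction, and computes an approximate standard error for the standardized mean difference; the input values are made up.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                          / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

def hedges_g(d, n_a, n_b):
    """Cohen's d multiplied by the small-sample correction factor J."""
    j = 1 - 3 / (4 * (n_a + n_b) - 9)
    return j * d

def se_smd(d, n_a, n_b):
    """Approximate standard error of a standardized mean difference."""
    return math.sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b)))

# Hypothetical study: intervention group vs. control group.
d = cohens_d(24.3, 5.1, 60, 21.8, 5.4, 60)
g = hedges_g(d, 60, 60)
print(f"d = {d:.3f}, g = {g:.3f}, approx. SE = {se_smd(d, 60, 60):.3f}")
```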

Fixed-Effect vs. Random-Effects Models

In meta-analysis, you can choose between fixed-effect and random-effects models to combine the results of individual studies:

Fixed-Effect Model:

  • Assumes that all included studies share a common true effect size.
  • Accounts for only within-study variability (sampling error).
  • Appropriate when studies are very similar or when there's minimal heterogeneity.

Random-Effects Model:

  • Acknowledges that there may be variability in effect sizes across studies.
  • Accounts for both within-study variability (sampling error) and between-study variability (real differences between studies).
  • More conservative and applicable when there's substantial heterogeneity.

The choice between these models should be guided by the degree of heterogeneity observed among the included studies. If heterogeneity is significant, the random-effects model is often preferred, as it provides a more robust estimate of the overall effect.
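The following sketch shows, under simplified assumptions, how the two models differ computationally: both pool with inverse-variance weights, but the random-effects model adds an estimate of the between-study variance (here the DerSimonian–Laird estimator, one common choice) to each study's variance before weighting. The effect sizes and variances are invented.

```python
import math

# Hypothetical per-study effect sizes (e.g., Hedges' g) and their variances.
effects = [0.42, 0.18, 0.55, 0.30, 0.05]
variances = [0.04, 0.02, 0.09, 0.03, 0.05]

def pooled(effects, variances, tau2=0.0):
    """Inverse-variance pooled effect; tau2 = 0 gives the fixed-effect model."""
    weights = [1 / (v + tau2) for v in variances]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

# Fixed-effect estimate and Cochran's Q (needed for the DL tau^2 estimate).
fe_est, fe_se = pooled(effects, variances)
w = [1 / v for v in variances]
q = sum(wi * (e - fe_est) ** 2 for wi, e in zip(w, effects))
df = len(effects) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)              # DerSimonian-Laird estimator

re_est, re_se = pooled(effects, variances, tau2)
print(f"fixed effect  : {fe_est:.3f} (SE {fe_se:.3f})")
print(f"random effects: {re_est:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.3f}")
```

Note how the random-effects standard error is larger whenever tau² is greater than zero, reflecting the extra between-study uncertainty.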

Forest Plots

Forest plots are graphical representations commonly used in meta-analysis to display the results of individual studies along with the combined summary estimate. Key components of a forest plot include:

  • Vertical Line:  Represents the null effect (e.g., no difference or no effect).
  • Horizontal Lines:  Represent the confidence intervals for each study's effect size estimate.
  • Diamond or Square:  Represents the summary effect size estimate, with its width indicating the confidence interval around the summary estimate.
  • Study Names:  Listed on the left side of the plot, identifying each study.

Forest plots help visualize the distribution of effect sizes across studies and provide insights into the consistency and direction of the findings.
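As a rough sketch rather than a polished figure, the matplotlib code below draws a basic forest plot from hypothetical effect sizes and confidence intervals, with a dashed vertical line at the null value and a diamond marker for an assumed summary estimate.

```python
import matplotlib.pyplot as plt

# Hypothetical study results: (label, effect, lower CI, upper CI).
studies = [
    ("Study A", 0.42, 0.03, 0.81),
    ("Study B", 0.18, -0.10, 0.46),
    ("Study C", 0.55, -0.04, 1.14),
    ("Study D", 0.30, -0.04, 0.64),
]
summary = ("Summary (RE)", 0.30, 0.10, 0.50)   # assumed pooled estimate

rows = studies + [summary]
y = list(range(len(rows), 0, -1))              # plot from top to bottom

fig, ax = plt.subplots(figsize=(6, 3))
for (label, est, lo, hi), yi in zip(rows, y):
    marker = "D" if label.startswith("Summary") else "s"
    ax.errorbar(est, yi, xerr=[[est - lo], [hi - est]],
                fmt=marker, color="black", capsize=3)

ax.axvline(0, linestyle="--", color="grey")    # null effect (no difference)
ax.set_yticks(y)
ax.set_yticklabels([r[0] for r in rows])
ax.set_xlabel("Standardized mean difference")
plt.tight_layout()
plt.show()
```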

Heterogeneity Assessment

Heterogeneity refers to the variability in effect sizes among the included studies. It's important to assess and understand heterogeneity as it can impact the interpretation of your meta-analysis results. Standard methods for assessing heterogeneity include:

  • Cochran's Q Test:  A statistical test that assesses whether there is significant heterogeneity among the effect sizes of the included studies.
  • I² Statistic:  A measure that quantifies the proportion of total variation in effect sizes that is due to heterogeneity. I² values range from 0% to 100%, with higher values indicating greater heterogeneity.

Assessing heterogeneity is crucial because it informs your choice of meta-analysis model (fixed-effect vs. random-effects) and whether subgroup analyses or sensitivity analyses are warranted to explore potential sources of heterogeneity.
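To show how these diagnostics are computed, the self-contained sketch below derives Cochran's Q, its chi-square p-value, and I² from the same kind of invented effect sizes and within-study variances used in the earlier sketches (scipy is used only for the p-value).

```python
from scipy.stats import chi2

# Hypothetical effect sizes and within-study variances.
effects = [0.42, 0.18, 0.55, 0.30, 0.05]
variances = [0.04, 0.02, 0.09, 0.03, 0.05]

weights = [1 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
p_value = chi2.sf(q, df)            # Q ~ chi-square with k-1 df under homogeneity
i2 = max(0.0, (q - df) / q) * 100   # % of total variation due to heterogeneity

print(f"Q = {q:.2f} (df = {df}, p = {p_value:.3f}), I^2 = {i2:.1f}%")
```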

How to Interpret Meta-Analysis Results?

With the data synthesis complete, it's time to make sense of the results of your meta-analysis.

Meta-Analytic Summary

The meta-analytic summary is the culmination of your efforts in data synthesis. It provides a consolidated estimate of the effect size and its confidence interval, combining the results of all included studies. To interpret the meta-analytic summary effectively:

  • Effect Size Estimate:  Understand the primary effect size estimate, such as Cohen's d, odds ratio, or hazard ratio, and its associated confidence interval.
  • Significance:  Determine whether the summary effect size is statistically significant. This is indicated when the confidence interval does not include the null value (e.g., 0 for Cohen's d or 1 for odds ratio).
  • Magnitude:  Assess the magnitude of the effect size. Is it large, moderate, or small, and what are the practical implications of this magnitude?
  • Direction:  Consider the direction of the effect. Is it in the hypothesized direction, or does it contradict the expected outcome?
  • Clinical or Practical Significance:  Reflect on the clinical or practical significance of the findings. Does the effect size have real-world implications?
  • Consistency:  Evaluate the consistency of the findings across studies. Are most studies in agreement with the summary effect size estimate, or are there outliers?

Subgroup Analyses

Subgroup analyses allow you to explore whether the effect size varies across different subgroups of studies or participants. This can help identify potential sources of heterogeneity or assess whether the intervention's effect differs based on specific characteristics. Steps for conducting subgroup analyses:

  • Define Subgroups:  Clearly define the subgroups you want to investigate based on relevant study characteristics (e.g., age groups, study design, intervention type).
  • Analyze Subgroups:  Calculate separate summary effect sizes for each subgroup and compare them to the overall summary effect.
  • Assess Heterogeneity:  Evaluate whether subgroup differences are statistically significant. If so, this suggests that the effect size varies significantly among subgroups.
  • Interpretation:  Interpret the subgroup findings in the context of your research question. Are there meaningful differences in the effect across subgroups? What might explain these differences?

Subgroup analyses can provide valuable insights into the factors influencing the overall effect size and help tailor recommendations for specific populations or conditions.
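One way these steps can be operationalized, sketched below under the simplifying assumption of a fixed-effect model within each subgroup, is to pool each subgroup separately and then compare subgroups with a between-group Q statistic. The subgroup labels, effect sizes, and variances are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical per-study effects, variances, and a subgroup label for each study
effects = np.array([0.40, 0.35, 0.10, 0.05, 0.15])
variances = np.array([0.04, 0.05, 0.03, 0.06, 0.04])
subgroups = np.array(["adults", "adults", "children", "children", "children"])

def pooled_fixed(e, v):
    """Fixed-effect (inverse-variance) pooled estimate and its variance."""
    w = 1.0 / v
    return np.sum(w * e) / np.sum(w), 1.0 / np.sum(w)

# Pool each subgroup separately
group_estimates = {}
for g in np.unique(subgroups):
    mask = subgroups == g
    group_estimates[g] = pooled_fixed(effects[mask], variances[mask])

# Between-subgroup heterogeneity: Q_between compared to chi-square with (groups - 1) df
overall, _ = pooled_fixed(effects, variances)
Q_between = sum((est - overall) ** 2 / var for est, var in group_estimates.values())
df = len(group_estimates) - 1
p_between = stats.chi2.sf(Q_between, df)

for g, (est, var) in group_estimates.items():
    print(f"{g}: estimate = {est:.3f} (SE = {var ** 0.5:.3f})")
print(f"Q_between = {Q_between:.2f}, df = {df}, p = {p_between:.3f}")
```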

Sensitivity Analyses

Sensitivity analyses are conducted to assess the robustness of your meta-analysis results by exploring how different choices or assumptions might affect the findings. Common sensitivity analyses include:

  • Exclusion of Low-Quality Studies:  Repeating the meta-analysis after excluding studies with low quality or a high risk of bias.
  • Changing Effect Size Measure:  Re-running the analysis using a different effect size measure to assess whether the choice of measure significantly impacts the results.
  • Publication Bias Adjustment:  Applying methods like the trim-and-fill procedure to adjust for potential publication bias.
  • Subsample Analysis:  Analyzing a subset of studies based on specific criteria or characteristics to investigate their impact on the summary effect.

Sensitivity analyses help assess the robustness and reliability of your meta-analysis results, providing a more comprehensive understanding of the potential influence of various factors.
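A common way to implement the last point is a leave-one-out analysis, sketched below with hypothetical study data: the pooled estimate is recomputed with each study excluded in turn to check whether any single study drives the result.

```python
import numpy as np

# Hypothetical effect sizes and variances for six studies
effects = np.array([0.30, 0.25, 0.80, 0.20, 0.35, 0.28])
variances = np.array([0.05, 0.04, 0.02, 0.06, 0.05, 0.04])

def pooled_fixed(e, v):
    """Fixed-effect (inverse-variance) pooled estimate."""
    w = 1.0 / v
    return np.sum(w * e) / np.sum(w)

full_estimate = pooled_fixed(effects, variances)
print(f"All studies: {full_estimate:.3f}")

# Leave-one-out: recompute the pooled estimate with each study excluded
for i in range(len(effects)):
    keep = np.arange(len(effects)) != i
    loo = pooled_fixed(effects[keep], variances[keep])
    print(f"Without study {i + 1}: {loo:.3f} (shift = {loo - full_estimate:+.3f})")
```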

Reporting and Publication

The final stages of your meta-analysis involve preparing your findings for publication.

Manuscript Preparation

When preparing your meta-analysis manuscript, consider the following:

  • Structured Format:  Organize your manuscript following a structured format, including sections such as introduction, methods, results, discussion, and conclusions.
  • Clarity and Conciseness:  Write your findings clearly and concisely, avoiding jargon or overly technical language. Use tables and figures to enhance clarity.
  • Transparent Methods:  Provide detailed descriptions of your methods, including eligibility criteria, search strategy, data extraction, and statistical analysis.
  • Incorporate Tables and Figures:  Present your meta-analysis results using tables and forest plots to visually convey key findings.
  • Interpretation:  Interpret the implications of your findings, discussing the clinical or practical significance and limitations.

Transparent Reporting Guidelines

Adhering to transparent reporting guidelines ensures that your meta-analysis is transparent, reproducible, and credible. Some widely recognized guidelines include:

  • PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses):  PRISMA provides a checklist and flow diagram for reporting systematic reviews and meta-analyses, enhancing transparency and rigor.
  • MOOSE (Meta-analysis of Observational Studies in Epidemiology):  MOOSE guidelines are designed for meta-analyses of observational studies and provide a framework for transparent reporting.
  • ROBINS-I (Risk Of Bias In Non-randomised Studies of Interventions):  If your meta-analysis includes non-randomized studies, use the ROBINS-I tool to assess the risk of bias in those studies and report the assessment.

Following these guidelines enhances the quality of your research and aids readers and reviewers in assessing the rigor of your study.

PRISMA Statement

The PRISMA statement is a valuable resource for conducting and reporting systematic reviews and meta-analyses. Key elements of PRISMA include:

  • Title:  Clearly indicate that your paper is a systematic review or meta-analysis.
  • Structured Abstract:  Provide a structured summary of your study, including objectives, methods, results, and conclusions.
  • Transparent Reporting:  Follow the PRISMA checklist, which covers items such as the rationale, eligibility criteria, search strategy, data extraction, and risk of bias assessment.
  • Flow Diagram:  Include a flow diagram illustrating the study selection process.

By adhering to the PRISMA statement, you enhance the transparency and credibility of your meta-analysis, facilitating its acceptance for publication and aiding readers in evaluating the quality of your research.

Conclusion for Meta-Analysis

Meta-analysis is a powerful tool that allows you to combine and analyze data from multiple studies to find meaningful patterns and make informed decisions. It helps you see the bigger picture and draw more accurate conclusions than individual studies alone. Whether you're in healthcare, education, business, or any other field, the principles of meta-analysis can be applied to enhance your research and decision-making processes. Remember that conducting a successful meta-analysis requires careful planning, attention to detail, and transparency in reporting. By following the steps outlined in this guide, you can embark on your own meta-analysis journey with confidence, contributing to the advancement of knowledge and evidence-based practices in your area of interest.

Meta-Analysis – Guide with Definition, Steps & Examples

Published by Owen Ingram on April 26th, 2023; revised on April 26, 2023

“A meta-analysis is a formal, epidemiological, quantitative study design that uses statistical methods to generalise the findings of the selected independent studies.”

Meta-analysis and systematic review are among the most rigorous strategies for synthesising research evidence. When researchers start looking for the best available evidence concerning their research work, they are advised to begin from the top of the evidence pyramid. The evidence available in the form of meta-analyses or systematic reviews addressing important questions is significant in academics because it informs decision-making.

What is Meta-Analysis?

Meta-analysis estimates the overall (pooled) effect by systematically synthesising or merging the results of individual independent research studies. Meta-analysis isn’t only about achieving a wider population by combining several smaller studies. It involves systematic methods to evaluate the inconsistencies in participants, variability (also known as heterogeneity), and findings to check how sensitive the conclusions are to the selected systematic review protocol.

When Should you Conduct a Meta-Analysis?

Meta-analysis has become a widely used research method in medical sciences and other fields for several reasons. The technique involves summarising the results of the independent studies identified through a systematic review.

The Cochrane Handbook explains that “an important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention” (section 10.2).

A researcher or a practitioner should choose meta-analysis when the following outcomes are desirable.

  • To generate new hypotheses or settle controversies arising from conflicting research studies. Meta-analysis makes it possible to quantify and evaluate variable results and to identify the extent of conflict in the literature.

  • To find research gaps left unfilled and address questions not posed by individual studies. Primary research studies involve specific types of participants and interventions. A review of these studies with variable characteristics and methodologies can allow the researcher to gauge the consistency of findings across a wider range of participants and interventions. With the help of meta-analysis, the reasons for differences in the effect can also be explored.

  • To provide convincing evidence. Estimating the effects with a larger combined sample size and a wider range of interventions can provide convincing evidence. Many academic studies are based on very small datasets, so their estimated intervention effects in isolation are not fully reliable.

Elements of a Meta-Analysis

Deeks et al. (2019), Haidich (2010), and Grant & Booth (2009) explored the characteristics, strengths, and weaknesses of conducting a meta-analysis. These are briefly explained below.

Characteristics: 

  • A systematic review must be completed before conducting the meta-analysis because it provides a summary of the findings of the individual studies synthesised. 
  • You can only conduct a meta-analysis by synthesising studies in a systematic review. 
  • The studies selected for statistical analysis for the purpose of meta-analysis should be similar in terms of comparison, intervention, and population. 

Strengths: 

  • A meta-analysis takes place after the systematic review. The end product is a comprehensive and reliable quantitative analysis. 
  • It adds value and weight to existing studies that have limited practical value on their own. 
  • Policy-makers and academicians cannot base their decisions on individual research studies. Meta-analysis provides them with a consolidated and robust analysis of the evidence on which to base informed decisions. 

Criticisms: 

  • A meta-analysis uses studies exploring similar topics, and finding sufficiently similar studies can be challenging.
  • If the individual studies are biased, or if reporting biases or methodological flaws are present, the results of the meta-analysis can be misleading.

Steps of Conducting the Meta-Analysis 

The process of conducting the meta-analysis has remained a topic of debate among researchers and scientists. However, the following 5-step process is widely accepted. 

Step 1: Research Question

The first step involves identifying a research question and proposing a hypothesis. The potential clinical or practical significance of the research question is then explained, and the study design and analytical plan are justified.

Step 2: Systematic Review 

The purpose of a systematic review (SR) is to address a research question by identifying all relevant studies that meet the required quality standards for inclusion. While established journals typically serve as the primary source for identified studies, it is important to also consider unpublished data to avoid publication bias or the exclusion of studies with negative results.

While some meta-analyses may limit their focus to randomized controlled trials (RCTs) for the sake of obtaining the highest quality evidence, other experimental and quasi-experimental studies may be included if they meet the specific inclusion/exclusion criteria established for the review.

Step 3: Data Extraction

After selecting studies for the meta-analysis, researchers extract summary data or outcomes, as well as sample sizes and measures of data variability for both intervention and control groups. The choice of outcome measures depends on the research question and the type of study, and may include numerical or categorical measures.

For instance, numerical means may be used to report differences in scores on a questionnaire or changes in a measurement, such as blood pressure. In contrast, risk measures like odds ratios (OR) or relative risks (RR) are typically used to report differences in the probability of belonging to one category or another, such as vaginal birth versus cesarean birth.

Step 4: Standardisation and Weighting Studies

After gathering all the required data, the fourth step involves computing suitable summary measures from each study for further examination. These measures are typically referred to as Effect Sizes and indicate the difference in average scores between the control and intervention groups. For instance, it could be the variation in blood pressure changes between study participants who used drug X and those who used a placebo.

Since the units of measurement often differ across the included studies, standardization is necessary to create comparable effect size estimates. Standardization is accomplished by determining, for each study, the average score for the intervention group, subtracting the average score for the control group, and dividing the result by the relevant measure of variability in that dataset.

In some cases, the results of certain studies must carry more weight than others. Larger studies, as measured by their sample sizes, are deemed to produce more precise estimates of effect size than smaller studies. Additionally, studies with less variability in data, such as a smaller standard deviation or narrower confidence intervals, are typically regarded as higher quality in study design. A weighting statistic that incorporates both of these factors, known as inverse-variance weighting, is commonly employed.
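A minimal sketch of these calculations follows, assuming hypothetical group means, standard deviations, and sample sizes for three studies: Cohen's d is computed for each study and then weighted by the inverse of its approximate variance.

```python
import numpy as np

# Hypothetical summary data for three studies (intervention vs. control)
mean_int = np.array([12.0, 10.5, 11.2])
sd_int = np.array([4.0, 3.5, 4.2])
n_int = np.array([50, 80, 35])
mean_ctl = np.array([10.0, 10.0, 9.8])
sd_ctl = np.array([4.2, 3.8, 4.0])
n_ctl = np.array([48, 85, 40])

# Pooled within-study standard deviation, then Cohen's d (standardized mean difference)
sd_pooled = np.sqrt(((n_int - 1) * sd_int**2 + (n_ctl - 1) * sd_ctl**2) / (n_int + n_ctl - 2))
d = (mean_int - mean_ctl) / sd_pooled

# Approximate variance of d; inverse-variance weights favour larger, less variable studies
var_d = (n_int + n_ctl) / (n_int * n_ctl) + d**2 / (2 * (n_int + n_ctl))
weights = 1.0 / var_d

pooled_d = np.sum(weights * d) / np.sum(weights)
print("Per-study d:", np.round(d, 3))
print("Weights:", np.round(weights, 1))
print(f"Inverse-variance pooled d = {pooled_d:.3f}")
```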

Step 5: Absolute Effect Estimation

The ultimate step in conducting a meta-analysis is to choose and utilize an appropriate model for comparing Effect Sizes among diverse studies. Two popular models for this purpose are the Fixed Effects and Random Effects models. The Fixed Effects model relies on the premise that each study is evaluating a common treatment effect, implying that all studies would have estimated the same Effect Size if sample variability were equal across all studies.

Conversely, the Random Effects model posits that the true treatment effects in individual studies may vary from each other, and endeavors to consider this additional source of interstudy variation in Effect Sizes. The existence and magnitude of this latter variability is usually evaluated within the meta-analysis through a test for ‘heterogeneity.’
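The contrast between the two models can be sketched as follows on hypothetical data: the fixed-effect estimate uses inverse-variance weights directly, while the random-effects estimate adds a between-study variance component (tau², here estimated with the common DerSimonian-Laird method) to each study's variance before weighting.

```python
import numpy as np

# Hypothetical effect sizes and within-study variances
effects = np.array([0.10, 0.35, 0.50, 0.22, 0.05])
variances = np.array([0.02, 0.05, 0.04, 0.03, 0.06])
w = 1.0 / variances

# Fixed-effect model: assumes one common true effect
fixed = np.sum(w * effects) / np.sum(w)

# DerSimonian-Laird estimate of the between-study variance tau²
Q = np.sum(w * (effects - fixed) ** 2)
df = len(effects) - 1
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects model: total variance = within-study + between-study
w_re = 1.0 / (variances + tau2)
random = np.sum(w_re * effects) / np.sum(w_re)
se_random = np.sqrt(1.0 / np.sum(w_re))

print(f"Fixed-effect estimate  : {fixed:.3f}")
print(f"tau² (DerSimonian-Laird): {tau2:.4f}")
print(f"Random-effects estimate: {random:.3f} "
      f"(95% CI {random - 1.96 * se_random:.3f} to {random + 1.96 * se_random:.3f})")
```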

Forest Plot

The results of a meta-analysis are often presented visually using a “Forest Plot”. This type of plot displays, for each study included in the analysis, a horizontal line that indicates the effect size estimate and its 95% confidence interval (here, for the risk ratio used). Figure A provides an example of a hypothetical forest plot in which drug X reduces the risk of death in all three studies.

However, the first study was larger than the other two, and as a result, the estimates from the smaller studies were not statistically significant. This is indicated by the confidence interval lines emanating from their boxes crossing the value of 1. The size of the boxes represents the relative weight assigned to each study by the meta-analysis. The combined estimate of the drug’s effect, represented by the diamond, provides a more precise estimate, with the diamond spanning the combined risk ratio estimate and the limits of its 95% confidence interval.

Figure-A: Hypothetical Forest Plot

Relevance to Practice and Research 

  Evidence Based Nursing commentaries often include recently published systematic reviews and meta-analyses, as they can provide new insights and strengthen recommendations for effective healthcare practices. Additionally, they can identify gaps or limitations in current evidence and guide future research directions.

The quality of the data available for synthesis is a critical factor in the strength of conclusions drawn from meta-analyses, and this is influenced by the quality of individual studies and the systematic review itself. However, meta-analysis cannot overcome issues related to underpowered or poorly designed studies.

Therefore, clinicians may still encounter situations where the evidence is weak or uncertain, and where higher-quality research is required to improve clinical decision-making. While such findings can be frustrating, they remain important for informing practice and highlighting the need for further research to fill gaps in the evidence base.

Methods and Assumptions in Meta-Analysis 

Ensuring the credibility of findings is imperative in all types of research, including meta-analyses. To validate the outcomes of a meta-analysis, the researcher must confirm that the research techniques used were accurate in measuring the intended variables. Typically, researchers establish the validity of a meta-analysis by testing the outcomes for homogeneity or the degree of similarity between the results of the combined studies.

Homogeneity is preferred in meta-analyses as it allows the data to be combined without needing adjustments to suit the study’s requirements. To determine homogeneity, researchers assess heterogeneity, its opposite. Two widely used statistical methods for evaluating heterogeneity in research results are Cochran’s Q test and the I² statistic (also known as the I² index).

Difference Between Meta-Analysis and Systematic Reviews

Meta-analysis and systematic reviews are both research methods used to synthesise evidence from multiple studies on a particular topic. However, there are some key differences between the two.

Systematic reviews involve a comprehensive and structured approach to identifying, selecting, and critically appraising all available evidence relevant to a specific research question. This process involves searching multiple databases, screening the identified studies for relevance and quality, and summarizing the findings in a narrative report.

Meta-analysis, on the other hand, involves using statistical methods to combine and analyze the data from multiple studies, with the aim of producing a quantitative summary of the overall effect size. Meta-analysis requires the studies to be similar enough in terms of their design, methodology, and outcome measures to allow for meaningful comparison and analysis.

Therefore, systematic reviews are broader in scope and summarize the findings of all studies on a topic, while meta-analyses are more focused on producing a quantitative estimate of the effect size of an intervention across multiple studies that meet certain criteria. In some cases, a systematic review may be conducted without a meta-analysis if the studies are too diverse or the quality of the data is not sufficient to allow for statistical pooling.

Software Packages For Meta-Analysis

Meta-analysis can be done through software packages, including free and paid options. One of the most commonly used software packages for meta-analysis is RevMan by the Cochrane Collaboration.

Assessing the Quality of Meta-Analysis 

Assessing the quality of a meta-analysis involves evaluating the methods used to conduct the analysis and the quality of the studies included. Here are some key factors to consider:

  • Study selection: The studies included in the meta-analysis should be relevant to the research question and meet predetermined criteria for quality.
  • Search strategy: The search strategy should be comprehensive and transparent, including databases and search terms used to identify relevant studies.
  • Study quality assessment: The quality of included studies should be assessed using appropriate tools, and this assessment should be reported in the meta-analysis.
  • Data extraction: The data extraction process should be systematic and clearly reported, including any discrepancies that arose.
  • Analysis methods: The meta-analysis should use appropriate statistical methods to combine the results of the included studies, and these methods should be transparently reported.
  • Publication bias: The potential for publication bias should be assessed and reported in the meta-analysis, including any efforts to identify and include unpublished studies.
  • Interpretation of results: The results should be interpreted in the context of the study limitations and the overall quality of the evidence.
  • Sensitivity analysis: Sensitivity analysis should be conducted to evaluate the impact of study quality, inclusion criteria, and other factors on the overall results.

Overall, a high-quality meta-analysis should be transparent in its methods and clearly report the limitations of the included studies and the overall quality of the evidence.


Examples of Meta-Analysis

  • Stanley, T.D., & Jarrell, S.B. (1989). Meta-regression analysis: A quantitative method of literature surveys. Journal of Economic Surveys, 3(2), 161–170.
  • Datta, D.K., Pinches, G.E., & Narayanan, V.K. (1992). Factors influencing wealth creation from mergers and acquisitions: A meta-analysis. Strategic Management Journal, 13, 67–84.
  • Glass, G. (1983). Synthesising empirical research: Meta-analysis. In S.A. Ward & L.J. Reed (Eds.), Knowledge structure and use: Implications for synthesis and interpretation. Philadelphia: Temple University Press.
  • Wolf, F.M. (1986). Meta-analysis: Quantitative methods for research synthesis. Sage University Paper No. 59.
  • Hunter, J.E., Schmidt, F.L., & Jackson, G.B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.

Frequently Asked Questions

What is a meta-analysis in research?

Meta-analysis is a statistical method used to combine results from multiple studies on a specific topic. By pooling data from various sources, meta-analysis can provide a more precise estimate of the effect size of a treatment or intervention and identify areas for future research.

Why is meta-analysis important?

Meta-analysis is important because it combines and summarizes results from multiple studies to provide a more precise and reliable estimate of the effect of a treatment or intervention. This helps clinicians and policymakers make evidence-based decisions and identify areas for further research.

What is an example of a meta-analysis?

A meta-analysis of studies evaluating physical exercise’s effect on depression in adults is an example. Researchers gathered data from 49 studies involving a total of 2669 participants. The studies used different types of exercise and measures of depression, which made it difficult to compare the results.

Through meta-analysis, the researchers calculated an overall effect size and determined that exercise was associated with a statistically significant reduction in depression symptoms. The study also identified that moderate-intensity aerobic exercise, performed three to five times per week, was the most effective. The meta-analysis provided a more comprehensive understanding of the impact of exercise on depression than any single study could provide.

What is the definition of meta-analysis in clinical research?

Meta-analysis in clinical research is a statistical technique that combines data from multiple independent studies on a particular topic to generate a summary or “meta” estimate of the effect of a particular intervention or exposure.

This type of analysis allows researchers to synthesise the results of multiple studies, potentially increasing the statistical power and providing more precise estimates of treatment effects. Meta-analyses are commonly used in clinical research to evaluate the effectiveness and safety of medical interventions and to inform clinical practice guidelines.

Is meta-analysis qualitative or quantitative?

Meta-analysis is a quantitative method used to combine and analyze data from multiple studies. It involves the statistical synthesis of results from individual studies to obtain a pooled estimate of the effect size of a particular intervention or treatment. Therefore, meta-analysis is considered a quantitative approach to research synthesis.

Introduction to systematic review and meta-analysis

Systematic reviews and meta-analyses present results by combining and analyzing data from different studies conducted on similar research topics. In recent years, systematic reviews and meta-analyses have been actively performed in various fields including anesthesiology. These research methods are powerful tools that can overcome the difficulties in performing large-scale randomized controlled trials. However, the inclusion of studies with any biases or improperly assessed quality of evidence in systematic reviews and meta-analyses could yield misleading results. Therefore, various guidelines have been suggested for conducting systematic reviews and meta-analyses to help standardize them and improve their quality. Nonetheless, accepting the conclusions of many studies without understanding the meta-analysis can be dangerous. Therefore, this article provides an easy introduction to clinicians on performing and understanding meta-analyses.

Introduction

A systematic review collects all possible studies related to a given topic and design, and reviews and analyzes their results [ 1 ]. During the systematic review process, the quality of studies is evaluated, and a statistical meta-analysis of the study results is conducted on the basis of their quality. A meta-analysis is a valid, objective, and scientific method of analyzing and combining different results. Usually, in order to obtain more reliable results, a meta-analysis is mainly conducted on randomized controlled trials (RCTs), which have a high level of evidence [ 2 ] ( Fig. 1 ). Since 1999, various papers have presented guidelines for reporting meta-analyses of RCTs. Following the Quality of Reporting of Meta-analyses (QUORUM) statement [ 3 ], and the appearance of registers such as Cochrane Library’s Methodology Register, a large number of systematic literature reviews have been registered. In 2009, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [ 4 ] was published, and it greatly helped standardize and improve the quality of systematic reviews and meta-analyses [ 5 ].

Fig. 1. Levels of evidence.

In anesthesiology, the importance of systematic reviews and meta-analyses has been highlighted, and they provide diagnostic and therapeutic value to various areas, including not only perioperative management but also intensive care and outpatient anesthesia [6–13]. Systematic reviews and meta-analyses include various topics, such as comparing various treatments of postoperative nausea and vomiting [ 14 , 15 ], comparing general anesthesia and regional anesthesia [ 16 – 18 ], comparing airway maintenance devices [ 8 , 19 ], comparing various methods of postoperative pain control (e.g., patient-controlled analgesia pumps, nerve block, or analgesics) [ 20 – 23 ], comparing the precision of various monitoring instruments [ 7 ], and meta-analysis of dose-response in various drugs [ 12 ].

Thus, literature reviews and meta-analyses are being conducted in diverse medical fields, and the aim of highlighting their importance is to help better extract accurate, good quality data from the flood of data being produced. However, a lack of understanding about systematic reviews and meta-analyses can lead to incorrect outcomes being derived from the review and analysis processes. If readers indiscriminately accept the results of the many meta-analyses that are published, incorrect data may be obtained. Therefore, in this review, we aim to describe the contents and methods used in systematic reviews and meta-analyses in a way that is easy to understand for future authors and readers of systematic review and meta-analysis.

Study Planning

It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical methods on estimates from two or more different studies to form a pooled estimate [ 1 ]. Following a systematic review, if it is not possible to form a pooled estimate, it can be published as is without progressing to a meta-analysis; however, if it is possible to form a pooled estimate from the extracted data, a meta-analysis can be attempted. Systematic reviews and meta-analyses usually proceed according to the flowchart presented in Fig. 2 . We explain each of the stages below.

Fig. 2. Flowchart illustrating a systematic review.

Formulating research questions

A systematic review attempts to gather all available empirical research by using clearly defined, systematic methods to obtain answers to a specific question. A meta-analysis is the statistical process of analyzing and combining results from several similar studies. Here, the definition of the word “similar” is not made clear, but when selecting a topic for the meta-analysis, it is essential to ensure that the different studies present data that can be combined. If the studies contain data on the same topic that can be combined, a meta-analysis can even be performed using data from only two studies. However, study selection via a systematic review is a precondition for performing a meta-analysis, and it is important to clearly define the Population, Intervention, Comparison, Outcomes (PICO) parameters that are central to evidence-based research. In addition, the research topic should be selected on the basis of logical evidence, and it is important to choose a topic that is of interest to readers but for which the evidence has not yet been clearly established [ 24 ].

Protocols and registration

In systematic reviews, prior registration of a detailed research plan is very important. In order to make the research process transparent, primary/secondary outcomes and methods are set in advance, and in the event of changes to the method, other researchers and readers are informed when, how, and why. Many studies are registered with an organization like PROSPERO ( http://www.crd.york.ac.uk/PROSPERO/ ), and the registration number is recorded when reporting the study, in order to share the protocol at the time of planning.

Defining inclusion and exclusion criteria

Information is included on the study design, patient characteristics, publication status (published or unpublished), language used, and research period. If there is a discrepancy between the number of patients included in the study and the number of patients included in the analysis, this needs to be clearly explained while describing the patient characteristics, to avoid confusing the reader.

Literature search and study selection

In order to secure a proper basis for evidence-based research, it is essential to perform a broad search that includes as many studies as possible that meet the inclusion and exclusion criteria. Typically, the three bibliographic databases Medline, Embase, and Cochrane Central Register of Controlled Trials (CENTRAL) are used. In domestic studies, the Korean databases KoreaMed, KMBASE, and RISS4U may be included. Effort is required to identify not only published studies but also abstracts, ongoing studies, and studies awaiting publication. Among the studies retrieved in the search, the researchers remove duplicate studies, select studies that meet the inclusion/exclusion criteria based on the abstracts, and then make the final selection of studies based on their full text. In order to maintain transparency and objectivity throughout this process, study selection is conducted independently by at least two investigators. When there is an inconsistency in opinions, it is resolved through debate or by a third reviewer. The methods for this process also need to be planned in advance. It is essential to ensure the reproducibility of the literature selection process [ 25 ].

Quality of evidence

However well planned the systematic review or meta-analysis is, if the quality of evidence in the studies is low, the quality of the meta-analysis decreases and incorrect results can be obtained [ 26 ]. Even when using randomized studies with a high quality of evidence, evaluating the quality of evidence precisely helps determine the strength of recommendations in the meta-analysis. One method of evaluating the quality of evidence in non-randomized studies is the Newcastle-Ottawa Scale, provided by the Ottawa Hospital Research Institute 1) . However, we are mostly focusing on meta-analyses that use randomized studies.

If the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) system ( http://www.gradeworkinggroup.org/ ) is used, the quality of evidence is evaluated on the basis of the study limitations, inaccuracies, incompleteness of outcome data, indirectness of evidence, and risk of publication bias, and this is used to determine the strength of recommendations [ 27 ]. As shown in Table 1 , the study limitations are evaluated using the “risk of bias” method proposed by Cochrane 2) . This method classifies bias in randomized studies as “low,” “high,” or “unclear” on the basis of the presence or absence of six processes (random sequence generation, allocation concealment, blinding participants or investigators, incomplete outcome data, selective reporting, and other biases) [ 28 ].

The Cochrane Collaboration’s Tool for Assessing the Risk of Bias [ 28 ]

Data extraction

Two different investigators extract data based on the objectives and form of the study; thereafter, the extracted data are reviewed. Since the size and format of each variable are different, the size and format of the outcomes are also different, and slight changes may be required when combining the data [ 29 ]. If there are differences in the size and format of the outcome variables that cause difficulties combining the data, such as the use of different evaluation instruments or different evaluation timepoints, the analysis may be limited to a systematic review. The investigators resolve differences of opinion by debate, and if they fail to reach a consensus, a third reviewer is consulted.

Data Analysis

The aim of a meta-analysis is to derive conclusions with greater power and accuracy than could be achieved in the individual studies. Therefore, before analysis, it is crucial to evaluate the direction of effect, size of effect, homogeneity of effects among studies, and strength of evidence [ 30 ]. Thereafter, the data are reviewed qualitatively and quantitatively. If it is determined that the different research outcomes cannot be combined, all the results and characteristics of the individual studies are displayed in a table or in a descriptive form; this is referred to as a qualitative review. A meta-analysis is a quantitative review, in which the clinical effectiveness is evaluated by calculating the weighted pooled estimate for the interventions in at least two separate studies.

The pooled estimate is the outcome of the meta-analysis, and is typically explained using a forest plot ( Figs. 3 and ​ and4). 4 ). The black squares in the forest plot are the odds ratios (ORs) and 95% confidence intervals in each study. The area of the squares represents the weight reflected in the meta-analysis. The black diamond represents the OR and 95% confidence interval calculated across all the included studies. The bold vertical line represents a lack of therapeutic effect (OR = 1); if the confidence interval includes OR = 1, it means no significant difference was found between the treatment and control groups.

Fig. 3. Forest plot analyzed by two different models using the same data. (A) Fixed-effect model. (B) Random-effect model. The figure depicts individual trials as filled squares with the relative sample size and the solid line as the 95% confidence interval of the difference. The diamond shape indicates the pooled estimate and uncertainty for the combined effect. The vertical line indicates the treatment group shows no effect (OR = 1). Moreover, if the confidence interval includes 1, then the result shows no evidence of difference between the treatment and control groups.

Fig. 4. Forest plot representing homogeneous data.

Dichotomous variables and continuous variables

In data analysis, outcome variables can be considered broadly in terms of dichotomous variables and continuous variables. When combining data from continuous variables, the mean difference (MD) and standardized mean difference (SMD) are used ( Table 2 ).

Summary of Meta-analysis Methods Available in RevMan [ 28 ]

The MD is the absolute difference in mean values between the groups, and the SMD is the mean difference between groups divided by the standard deviation. When results are presented in the same units, the MD can be used, but when results are presented in different units, the SMD should be used. When the MD is used, the combined units must be shown. A value of “0” for the MD or SMD indicates that the effects of the new treatment method and the existing treatment method are the same. A value lower than “0” means the new treatment method is less effective than the existing method, and a value greater than “0” means the new treatment is more effective than the existing method.

When combining data for dichotomous variables, the OR, risk ratio (RR), or risk difference (RD) can be used. The RR and RD can be used for RCTs, quasi-experimental studies, or cohort studies, and the OR can be used for other case-control studies or cross-sectional studies. However, because the OR is difficult to interpret, using the RR and RD, if possible, is recommended. If the outcome variable is a dichotomous variable, it can be presented as the number needed to treat (NNT), which is the minimum number of patients who need to be treated in the intervention group, compared to the control group, for a given event to occur in at least one patient. Based on Table 3 , in an RCT, if x is the probability of the event occurring in the control group and y is the probability of the event occurring in the intervention group, then x = c/(c + d), y = a/(a + b), and the absolute risk reduction (ARR) = x − y. NNT can be obtained as the reciprocal, 1/ARR.

Calculation of the Number Needed to Treat from a Dichotomous (2 × 2) Table
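The arithmetic above can be sketched with hypothetical 2 × 2 counts, where a and b are the event and non-event counts in the intervention group and c and d are the corresponding counts in the control group; the numbers are invented for illustration.

```python
# Hypothetical 2 x 2 table counts
a, b = 15, 85   # intervention group: events, non-events
c, d = 30, 70   # control group: events, non-events

x = c / (c + d)          # event probability in the control group
y = a / (a + b)          # event probability in the intervention group
arr = x - y              # absolute risk reduction
nnt = 1 / arr            # number needed to treat

print(f"Control risk x = {x:.3f}, intervention risk y = {y:.3f}")
print(f"ARR = {arr:.3f}, NNT = {nnt:.1f}")
```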

Fixed-effect models and random-effect models

In order to analyze effect size, two types of models can be used: a fixed-effect model or a random-effect model. A fixed-effect model assumes that the effect of treatment is the same, and that variation between results in different studies is due to random error. Thus, a fixed-effect model can be used when the studies are considered to have the same design and methodology, or when the variability in results within a study is small, and the variance is thought to be due to random error. Three common methods are used for weighted estimation in a fixed-effect model: 1) inverse variance-weighted estimation 3) , 2) Mantel-Haenszel estimation 4) , and 3) Peto estimation 5) .

A random-effect model assumes heterogeneity between the studies being combined, and these models are used when the studies are assumed different, even if a heterogeneity test does not show a significant result. Unlike a fixed-effect model, a random-effect model assumes that the size of the effect of treatment differs among studies. Thus, differences in variation among studies are thought to be due to not only random error but also between-study variability in results. Therefore, weight does not decrease greatly for studies with a small number of patients. Among methods for weighted estimation in a random-effect model, the DerSimonian and Laird method 6) is mostly used for dichotomous variables, as the simplest method, while inverse variance-weighted estimation is used for continuous variables, as with fixed-effect models. These four methods are all used in Review Manager software (The Cochrane Collaboration, UK), and are described in a study by Deeks et al. [ 31 ] ( Table 2 ). However, when the number of studies included in the analysis is less than 10, the Hartung-Knapp-Sidik-Jonkman method 7) can better reduce the risk of type 1 error than does the DerSimonian and Laird method [ 32 ].

Fig. 3 shows the results of analyzing outcome data using a fixed-effect model (A) and a random-effect model (B). As shown in Fig. 3 , while the results from large studies are weighted more heavily in the fixed-effect model, studies are given relatively similar weights irrespective of study size in the random-effect model. Although identical data were being analyzed, as shown in Fig. 3 , the significant result in the fixed-effect model was no longer significant in the random-effect model. One representative example of the small study effect in a random-effect model is the meta-analysis by Li et al. [ 33 ]. In a large-scale study, intravenous injection of magnesium was unrelated to acute myocardial infarction, but in the random-effect model, which included numerous small studies, the small study effect resulted in an association being found between intravenous injection of magnesium and myocardial infarction. This small study effect can be controlled for by using a sensitivity analysis, which is performed to examine the contribution of each of the included studies to the final meta-analysis result. In particular, when heterogeneity is suspected in the study methods or results, by changing certain data or analytical methods, this method makes it possible to verify whether the changes affect the robustness of the results, and to examine the causes of such effects [ 34 ].

Heterogeneity

A homogeneity test examines whether the degree of heterogeneity among effect sizes is greater than would be expected from sampling error alone; in other words, it tests whether the effect sizes calculated from the several studies can be regarded as the same. Three approaches can be used: 1) visual inspection of the forest plot, 2) Cochran’s Q test (chi-squared), and 3) Higgins’ I² statistic. In the forest plot, as shown in Fig. 4 , greater overlap between the confidence intervals indicates greater homogeneity. For the Q statistic, when the P value of the chi-squared test, calculated from the forest plot in Fig. 4 , is less than 0.1, this is considered to indicate statistical heterogeneity and a random-effect model can be used. Finally, I² can be used [ 35 ].

I², calculated as I² = (Q - df)/Q × 100% (with df = k - 1 for k studies, and negative values set to zero), returns a value between 0% and 100%. A value less than 25% is considered to show strong homogeneity, a value of 50% is average, and a value greater than 75% indicates strong heterogeneity.

Even when the data cannot be shown to be homogeneous, a fixed-effect model can be used, ignoring the heterogeneity, and all the study results can be presented individually, without combining them. However, in many cases, a random-effect model is applied, as described above, and a subgroup analysis or meta-regression analysis is performed to explain the heterogeneity. In a subgroup analysis, the data are divided into subgroups that are expected to be homogeneous, and these subgroups are analyzed. This needs to be planned in the predetermined protocol before starting the meta-analysis. A meta-regression analysis is similar to a normal regression analysis, except that the heterogeneity between studies is modeled. This process involves regressing the study effect estimates on covariates measured at the study level, and it is usually not considered when the number of studies is less than 10. Here, univariate and multivariate regression analyses can both be considered.
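As a minimal sketch of a meta-regression, the snippet below regresses hypothetical study effect sizes on a single study-level covariate (for example, mean participant age) using weighted least squares with inverse-variance weights. Dedicated meta-analysis software additionally models residual between-study heterogeneity, which this simple fixed-effect version omits.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level data: effect sizes, variances, and a covariate (mean age)
effects = np.array([0.10, 0.25, 0.40, 0.55, 0.30, 0.45, 0.20, 0.35, 0.50, 0.15])
variances = np.array([0.04, 0.03, 0.05, 0.02, 0.06, 0.03, 0.04, 0.05, 0.02, 0.06])
mean_age = np.array([25, 30, 35, 40, 32, 38, 28, 34, 42, 27])

# Weighted least squares: each study weighted by the inverse of its variance
X = sm.add_constant(mean_age)          # intercept + covariate
model = sm.WLS(effects, X, weights=1.0 / variances)
result = model.fit()

print(result.params)    # intercept and slope: change in effect size per unit of covariate
print(result.pvalues)   # naive p-values for this simple fixed-effect meta-regression
```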

Publication bias

Publication bias is the most common type of reporting bias in meta-analyses. This refers to the distortion of meta-analysis outcomes due to the higher likelihood of publication of statistically significant studies rather than non-significant studies. In order to test the presence or absence of publication bias, first, a funnel plot can be used ( Fig. 5 ). Studies are plotted on a scatter plot with effect size on the x-axis and precision or total sample size on the y-axis. If the points form an upside-down funnel shape, with a broad base that narrows towards the top of the plot, this indicates the absence of a publication bias ( Fig. 5A ) [ 29 , 36 ]. On the other hand, if the plot shows an asymmetric shape, with no points on one side of the graph, then publication bias can be suspected ( Fig. 5B ). Second, to test publication bias statistically, Begg and Mazumdar’s rank correlation test 8) [ 37 ] or Egger’s test 9) [ 29 ] can be used. If publication bias is detected, the trim-and-fill method 10) can be used to correct the bias [ 38 ]. Fig. 6 displays results that show publication bias in Egger’s test, which has then been corrected using the trim-and-fill method using Comprehensive Meta-Analysis software (Biostat, USA).
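A minimal sketch of Egger's regression test on hypothetical effect sizes and standard errors follows: the standardized effect (effect divided by its standard error) is regressed on precision (one divided by the standard error), and an intercept that departs clearly from zero suggests funnel plot asymmetry.

```python
import numpy as np
from scipy import stats

# Hypothetical effect sizes (e.g., log risk ratios) and their standard errors
effects = np.array([0.45, 0.30, 0.55, 0.20, 0.60, 0.15, 0.50, 0.25])
se = np.array([0.30, 0.15, 0.35, 0.10, 0.40, 0.08, 0.32, 0.12])

standardized = effects / se     # standard normal deviates
precision = 1.0 / se

# Egger's test: regress standardized effects on precision; test the intercept against zero
slope, intercept, r, p_slope, _ = stats.linregress(precision, standardized)

n = len(effects)
resid = standardized - (intercept + slope * precision)
s2 = np.sum(resid ** 2) / (n - 2)
se_intercept = np.sqrt(s2 * (1.0 / n + precision.mean() ** 2
                             / np.sum((precision - precision.mean()) ** 2)))
t_stat = intercept / se_intercept
p_intercept = 2 * stats.t.sf(abs(t_stat), n - 2)

print(f"Egger intercept = {intercept:.3f}, p = {p_intercept:.3f}")
```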

Fig. 5. Funnel plot showing the effect size on the x-axis and sample size on the y-axis as a scatter plot. (A) Funnel plot without publication bias. The individual plots are broader at the bottom and narrower at the top. (B) Funnel plot with publication bias. The individual plots are located asymmetrically.

Fig. 6. Funnel plot adjusted using the trim-and-fill method. White circles: comparisons included. Black circles: inputted comparisons using the trim-and-fill method. White diamond: pooled observed log risk ratio. Black diamond: pooled inputted log risk ratio.

Result Presentation

When reporting the results of a systematic review or meta-analysis, the analytical content and methods should be described in detail. First, a flowchart is displayed with the literature search and selection process according to the inclusion/exclusion criteria. Second, a table is shown with the characteristics of the included studies. A table should also be included with information related to the quality of evidence, such as GRADE ( Table 4 ). Third, the results of data analysis are shown in a forest plot and funnel plot. Fourth, if the results use dichotomous data, the NNT values can be reported, as described above.

The GRADE Evidence Quality for Each Outcome

N: number of studies, ROB: risk of bias, PON: postoperative nausea, POV: postoperative vomiting, PONV: postoperative nausea and vomiting, CI: confidence interval, RR: risk ratio, AR: absolute risk.

When Review Manager software (The Cochrane Collaboration, UK) is used for the analysis, two types of P values are given. The first is the P value from the z-test, which tests the null hypothesis that the intervention has no effect. The second P value is from the chi-squared test, which tests the null hypothesis for a lack of heterogeneity. The statistical result for the intervention effect, which is generally considered the most important result in meta-analyses, is the z-test P value.

A common mistake when reporting results is, given a z-test P value greater than 0.05, to say there was “no statistical significance” or “no difference.” When evaluating statistical significance in a meta-analysis, a P value lower than 0.05 can be explained as “a significant difference in the effects of the two treatment methods.” However, the P value may appear non-significant whether or not there is a difference between the two treatment methods. In such a situation, it is better to announce “there was no strong evidence for an effect,” and to present the P value and confidence intervals. Another common mistake is to think that a smaller P value is indicative of a more significant effect. In meta-analyses of large-scale studies, the P value is more greatly affected by the number of studies and patients included, rather than by the significance of the results; therefore, care should be taken when interpreting the results of a meta-analysis.

When performing a systematic literature review or meta-analysis, if the quality of studies is not properly evaluated or if proper methodology is not strictly applied, the results can be biased and the outcomes can be incorrect. However, when systematic reviews and meta-analyses are properly implemented, they can yield powerful results that could usually only be achieved using large-scale RCTs, which are difficult to perform in individual studies. As our understanding of evidence-based medicine increases and its importance is better appreciated, the number of systematic reviews and meta-analyses will keep increasing. However, indiscriminate acceptance of the results of all these meta-analyses can be dangerous, and hence, we recommend that their results be received critically on the basis of a more accurate understanding.

1) http://www.ohri.ca .

2) http://methods.cochrane.org/bias/assessing-risk-bias-included-studies .

3) The inverse variance-weighted estimation method is useful if the number of studies is small with large sample sizes.

4) The Mantel-Haenszel estimation method is useful if the number of studies is large with small sample sizes.

5) The Peto estimation method is useful if the event rate is low or one of the two groups shows zero incidence.

6) The most popular and simplest statistical method used in Review Manager and Comprehensive Meta-analysis software.

7) Alternative random-effect model meta-analysis that has more adequate error rates than does the common DerSimonian and Laird method, especially when the number of studies is small. However, even with the Hartung-Knapp-Sidik-Jonkman method, when there are less than five studies with very unequal sizes, extra caution is needed.

8) The Begg and Mazumdar rank correlation test uses the correlation between the ranks of effect sizes and the ranks of their variances [ 37 ].

9) The degree of funnel plot asymmetry as measured by the intercept from the regression of standard normal deviates against precision [ 29 ].

10) If there are more small studies on one side, we expect the suppression of studies on the other side. Trimming yields the adjusted effect size and reduces the variance of the effects by adding the original studies back into the analysis as a mirror image of each study.

Meta-Research: Blinding reduces institutional prestige bias during initial review of applications for a young investigator award

Anne E Hultgren, Nicole MF Patras, Jenna Hicks

Arnold and Mabel Beckman Foundation, United States; Health Research Alliance, United States

Abstract

Organizations that fund research are keen to ensure that their grant selection processes are fair and equitable for all applicants. In 2020, the Arnold and Mabel Beckman Foundation introduced blinding to the first stage of the process used to review applications for Beckman Young Investigator (BYI) awards: applicants were instructed to blind the technical proposal in their initial Letter of Intent by omitting their name, gender, gender-identifying pronouns, and institutional information. Here we examine the impact of this change by comparing the data on gender and institutional prestige of the applicants in the first four years of the new policy (BYI award years 2021–2024) with data on the last four years of the old policy (2017–2020). We find that under the new policy, the distribution of applicants invited to submit a full application shifted from those affiliated with institutions regarded as more prestigious to those outside of this group, and that this trend continued through to the final program awards. We did not find evidence of a shift in the distribution of applicants with respect to gender.

Studies on the impact of blinding in peer review, including studies that either fully remove the identities of applicants or use other methods to mask or change perceived applicant identity, have shown mixed results as to the benefits of blinding with respect to bias against certain populations. With respect to gender, the range in outcomes includes studies that show a reduction in bias with blinded reviews ( Johnson and Kirk, 2020 ; Goldin and Rouse, 2000 ), those that show no effect between blinded and unblinded reviews ( Tomkins et al., 2017 ; Forscher et al., 2019 ; Marsh et al., 2008 ; Ross et al., 2006 ), and those that find unblinded reviews have less bias in their outcomes ( Ersoy and Pate, 2022 ). Studies evaluating outcomes with respect to race similarly show a range of outcomes, from a reduction in bias with blinded reviews ( Nakamura et al., 2021 ) to no effect between blinded and unblinded reviews ( Forscher et al., 2019 ).

Other studies have examined whether blinding can reduce bias with respect to author institutional affiliation, and several studies have shown a reduction in bias towards highly prestigious institutions and authors ( Tomkins et al., 2017 ; Ross et al., 2006 ; Ersoy and Pate, 2022 ; Nakamura et al., 2021 ; Sun et al., 2022 ). Additionally, one study that examined a process in which an initial blinded review was followed by an unblinded review found that the reviewers were more likely to increase their scores in the second (unblinded) stage if the author was considered to be from an institution with additional resources or to have extensive prior experience, as evidenced through the comments collected from the reviewers ( Solans-Domènech et al., 2017 ). Notably, these studies have used different proxies for ‘institutional prestige’, including published ranked lists of institutions from independent entities ( Tomkins et al., 2017 ; Ross et al., 2006 ), the age of the institution itself ( Marsh et al., 2008 ), a list of institutions chosen by the study authors ( Ersoy and Pate, 2022 ), publication records ( Sun et al., 2022 ), and the size of the institution by student population ( Murray et al., 2016 ).

The mission of the Arnold and Mabel Beckman Foundation (AMBF) is to fund innovative research projects, especially those that open new avenues of research in chemistry and life sciences. In particular, the foundation’s Beckman Young Investigators (BYI) program seeks to satisfy Dr Arnold O Beckman’s directive to ‘support young scientists that do not yet have the clout to receive major federal research grants,’ and we strive to support outstanding young scientists who are moving into new, transformative areas of scientific inquiry. It stands to reason that our mission could be more effectively fulfilled if we ensure that our review process is insulated as much as possible from implicit or explicit gender or institutional prestige bias by reviewers.

In this article we report the results of a multi-year study to assess the impact of blinding gender and institutional affiliation in the first stage of the application process for BYI awards, which involves applicants submitting a technical proposal as part of an initial Letter of Intent (LOI). In 2020 applicants were instructed to blind their technical proposals by omitting their name, gender, gender-identifying pronouns, and institutional information. We have examined the impact of this change in policy by comparing data on all stages of the process (from the initial review through to the award of grants) for the four years before the new policy was introduced (2017–2020) and the first four years of the new policy (2021–2024). We were not able to perform a similar analysis with regard to race, as our application process during the unblinded review years did not collect race or ethnicity information from applicants.

Throughout the years included in this study, 2017–2024, the criteria for applicant eligibility as well as the overall review process were not substantively changed. Thus, in the absence of institutional prestige or gender bias in our review process, the distribution of advancing LOIs and program awards before and after blinding should be similar. This was the case when considering gender distribution, providing no evidence of gender bias in either the unblinded or the blinded reviews. However, upon blinding at the LOI stage, we did find a reduction in the relative advantage of more prestigious institutions in advancing to a full application invitation and in receiving a program award. We therefore conclude that there was an institutional prestige bias in our review process, and that blinding helped to reduce the impact of that bias. This reduction in bias brings us closer to our goal of the equitable allocation of research funding resources based on scientific merit that is not influenced by implicit or explicit institutional prestige bias from reviewers.

Study limitations

The application and review processes described in this study were conducted during the normal operations of AMBF, which introduced several limitations. We neither requested from applicants, nor created ourselves, blinded and unblinded versions of the same applications, so we could not review the same set of research proposals in both formats and compare the outcomes. We also did not explicitly test whether the institution had a direct impact on reviews by, for example, taking a research proposal from a highly ranked institution and changing the affiliation of the applicant to a lower ranked institution, and vice versa, in order to compare review outcomes with the original and modified institutional affiliations.

In addition, we appoint a new set of reviewers to assist with the BYI review process every year; therefore, the mix of reviewer technical expertise and potential biases varies from year to year, and we do not have multi-year data using the same set of reviewers. We also did not ask reviewers to provide a list of institutions that they perceive to have high prestige, and therefore the ranked lists we used in this study may or may not accurately reflect the implicit biases of our reviewers.

Finally, our unblinded applications did not ask applicants to self-report their gender, so the data used in the unblinded gender analysis were curated manually from applicant names, which may have led to some errors in gender assignment. In 2020, AMBF began collecting self-reported gender from applicants for our internal use, and those data were used for the later years of this study.

The total number of reviewed LOIs, Full Application invitations, and BYI Program Awards in program years 2017–2020 (unblinded) and 2021–2024 (blinded) is presented in Table 1; the 2024 Program Awards have not been completed as of manuscript preparation.

Table 1. Numbers of Letters of Intent (LOIs), Full Application Invitations, and BYI Program Awards for 2017–2024.

Institutional prestige

To examine the extent to which institutional prestige bias may have influenced our LOI review process, we developed eight institutional ranking schemas, each divided into five institutional categories, based on published rankings of institutions from four independent organizations and on an internal analysis of historical AMBF funding trends, described further in the Methods section. We then calculated the Relative Advantage–Full Application across institutional categories for each schema: the ratio of the percentage of LOIs submitted in a particular institutional category that received an invitation to submit a full application to the percentage of all LOIs that received a full application invitation. If there were no implicit or explicit institutional prestige bias from our reviewers in the LOI review process, we would expect the Relative Advantage–Full Application to be the same in the unblinded and blinded reviews. Therefore, we examined the difference in Relative Advantage–Full Application between the unblinded and blinded reviews to determine whether there was institutional prestige bias in our review process. We repeated the analysis to determine the Relative Advantage–Award for each institutional category: the ratio of the percentage of LOIs submitted in a particular institutional category that received a program award to the percentage of all LOIs that received a program award. Again, we examined the difference in Relative Advantage–Award between the unblinded and blinded reviews to determine whether there was institutional prestige bias in our review process.
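
As a concrete illustration of this metric, the short Python sketch below computes the Relative Advantage for each category from raw counts; the counts and the helper name relative_advantage are hypothetical examples for illustration, not the Foundation's actual data or code.

```python
def relative_advantage(category_counts, category_successes):
    """Relative Advantage per category: the share of that category's LOIs
    that advanced (to a full application invitation or a program award)
    divided by the overall advancement rate across all LOIs."""
    total_lois = sum(category_counts.values())
    total_successes = sum(category_successes.values())
    overall_rate = total_successes / total_lois
    return {
        cat: (category_successes[cat] / category_counts[cat]) / overall_rate
        for cat in category_counts
    }

# Illustrative counts only (not the real BYI data): LOIs received and
# full application invitations per institutional category.
lois = {"1-10": 120, "11-25": 110, "26-50": 100, "51-100": 90, "Other": 180}
invites = {"1-10": 35, "11-25": 28, "26-50": 20, "51-100": 17, "Other": 30}

print(relative_advantage(lois, invites))
# Values above 1.0 mean a category advanced more often than average;
# values below 1.0 mean it advanced less often than average.
```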

Figure 1 shows the average Relative Advantage–Full Application afforded to an LOI applicant within each institutional category, for each of the eight ranked institutional lists used in this study, comparing the unblinded and blinded reviews. During the unblinded reviews, the Relative Advantage–Full Application for an LOI from the '1–10' institutional categories ranged from 1.4 to 1.7 times the average percentage, and the range for the '11–25' institutional categories was 1.2–1.4 times the average, illustrating a consistent bias in favor of more prestigious institutions. After requiring the submission of blinded LOIs, the Relative Advantage–Full Application for both the '1–10' and '11–25' institutional categories decreased to 1.0–1.3 times the average. Importantly, the Relative Advantage–Full Application for applicants from the 'Other' institutional category increased from 0.60–0.83 times the average during unblinded reviews to 0.81–0.93 times the average with the blinded reviews. This increase is significant as the potentially transformative ideas from junior faculty at these institutions, which might otherwise have been lost to the institutional prestige bias in the unblinded review, are now being considered at the full application review stage. Finally, we found that the ranges for the Relative Advantage–Full Application for applicants in the '26–50' and '51–100' categories remained essentially unchanged between the unblinded and blinded review processes, at 0.83–1.0 times and 0.80–0.95 times the average percentage respectively.

Figure 1. Relative Advantage–Full Application.

Ratio of the percentage of LOI applicants in different institutional categories receiving an invitation to submit a Full Application, compared to the percentage of any LOI applicant receiving an invitation to submit a Full Application during unblinded reviews (solid bars, left) and blinded reviews (hatched bars, right). The eight different institutional rankings used in the study were: ( A ) NCSES/NSF-2018; ( B ) NCSES/NSF-2020; ( C ) Shanghai Ranking-2018; ( D ) Shanghai Ranking-2023; ( E ) Times Higher-2018; ( F ) Times Higher-2023; ( G ) CWTS Leiden: 2018–2021; ( H ) AMBF historical funding: 1990–2018.

Figure 1—source data 1

BYI LOIs, Full Application Invitations, and Program Awards by institutional category.

All LOIs Received, Full Application Invitations, and Program Awards by institutional category from 2017 to 2020 (unblinded) and 2021–2024 (blinded).

To test for systemic bias, the average values and ranges of the Relative Advantage–Full Application by institutional category, and the results of Chi-squared tests of association between institutional category and full application invitation status within the blinded and unblinded processes, are presented for each institutional ranking schema in Table 2. The relationship between institutional category and full application invitation status was statistically significant when reviews were unblinded, across all institutional ranking schemas. This association represents a medium-sized effect (as measured by Cramer's V; range = 0.14–0.23, average = 0.20). After changing to the blinded review process, the relationship between institutional category and full application invitation status was not consistently statistically significant (depending on the institutional ranking schema), and the effect size of the association decreased compared to that under the unblinded review process (Cramer's V range = 0.08–0.15, average = 0.13). We find that the change to the blinded review process resulted in a consistent decrease in the Chi-squared as well as the Cramer's V statistic for each institutional ranking list, and we conclude that blinded review reduced the impact of institutional prestige bias on full application invitation rates.

Table 2. The average value and ranges of the Relative Advantage–Full Application by institutional category, with Chi-squared association tests and Cramer's V statistics of unblinded and blinded LOI reviews for full application invitations, for the eight institutional rankings used in the study.

In addition to the individual institutional ranking lists used above, we also created a Consensus Institutional Ranking list by averaging the ranks of the 96 institutions that appeared on at least five of the individual lists. We reasoned that this consensus list might best mirror how our reviewers encounter these individual lists over time and how they consolidate this information into their own heuristic of institutional prestige. We then repeated the calculation of the Relative Advantage–Full Application against the Consensus Institutional Ranking list.
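
A minimal sketch of how such a consensus list could be assembled is shown below, assuming each individual ranking is available as an ordered list of institution names; the function name and the toy three-list example are illustrative only, and the actual consensus list is given in Table S2 in Supplementary file 2.

```python
from statistics import mean

def consensus_ranking(ranked_lists, min_appearances=5):
    """Average each institution's rank across the lists it appears on,
    keep only institutions appearing on at least `min_appearances` lists,
    and return them sorted by average rank (best first)."""
    ranks = {}
    for ranked in ranked_lists:  # each list: institution names in rank order
        for position, institution in enumerate(ranked, start=1):
            ranks.setdefault(institution, []).append(position)
    averaged = {
        inst: mean(positions)
        for inst, positions in ranks.items()
        if len(positions) >= min_appearances
    }
    return sorted(averaged, key=averaged.get)

# Toy example with three short lists and a lower threshold, for illustration only.
lists = [["A", "B", "C"], ["B", "A", "D"], ["A", "C", "B"]]
print(consensus_ranking(lists, min_appearances=2))  # -> ['A', 'B', 'C']
```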

Figure 2 shows the average Relative Advantage–Full Application afforded to an LOI applicant within each category of the consensus list, comparing the unblinded and blinded reviews. During the unblinded reviews, the Relative Advantage–Full Application for an LOI from the '1–10' and '11–25' categories in the consensus list was 1.6 times and 1.3 times the average percentage respectively, confirming a consistent bias in favor of more prestigious institutions. After requiring the submission of blinded LOIs, the Relative Advantage–Full Application for both the '1–10' and '11–25' categories decreased to 1.2 times the average. Importantly, the Relative Advantage–Full Application for applicants from the '26–50' and 'Other' categories again increased, from 0.85 times and 0.70 times the average during unblinded reviews to 1.0 times and 0.88 times the average with the blinded reviews. Finally, consistent with the findings for the individual lists, we found that the Relative Advantage–Full Application for applicants in the '51–96' category of the consensus list decreased from 0.80 times the average under the unblinded review process to 0.73 times the average under the blinded process.

Figure 2. Relative Advantage–Full Application with Consensus Institutional Ranking.

Ratio of the percentage of LOI applicants in each category in the consensus listing receiving an invitation to submit a Full Application, relative to the percentage of any LOI applicant receiving an invitation to submit a Full Application during unblinded reviews (solid bars, left) and blinded reviews (hatched bars, right; three years of data (2021–2023)).

The average values and ranges of the Relative Advantage–Full Application by institutional category in the consensus list, and the results of Chi-squared tests of association between institutional category and full application invitation status within the blinded and unblinded processes, are presented in Table 3. As before, the relationship between institutional category and full application invitation status was statistically significant when reviews were unblinded. This association represents a medium-sized effect (Cramer's V = 0.22). After changing to the blinded review process, the effect size of the association decreased compared to that under the unblinded review process (Cramer's V = 0.15). We find that the change to the blinded review process resulted in a consistent decrease in the Chi-squared as well as the Cramer's V statistic. Thus, we conclude that the analysis with the Consensus Institutional Ranking list also shows that blinded review reduced the impact of institutional prestige bias on full application invitation rates.

Table 3. The average value and ranges by category in the consensus ranking, with Chi-squared association tests and Cramer's V statistics of unblinded and blinded LOI reviews for full application invitations.

We continued the analysis of our review outcomes relative to institutional prestige through to program awards for the study years, to determine whether the change in institutional representation present in the submitted full applications also extended to program awards. We calculated the Relative Advantage–Award across all institutional category schemas described above: the ratio of the percentage of LOIs submitted in a particular institutional category that received a program award to the percentage of all LOIs that received a program award. As before, if institutional prestige bias did not influence the awardee selection process, the Relative Advantage–Award would be the same between unblinded and blinded reviews, and we examined the difference in Relative Advantage–Award to determine whether there was institutional prestige bias in our selection process. Due to the limited number of program awards selected each year, we would expect an average of two awards per category each year if there were no institutional prestige bias. The analysis of program awards with the Consensus Institutional Ranking list is presented in Figure 3 and Table 4, and the results of the analysis for all institutional ranking schemas are presented in Figure 1—figure supplement 1 and Table S1 in Supplementary file 1. In addition, the blinded review data include only three years of awardees, as the fourth year of awardee selection had not been finalized as of manuscript preparation.

Figure 3. Relative Advantage–Awards with Consensus Institutional Ranking.

Ratio of the percentage of LOI applicants in each category in the consensus listing receiving a Program Award, compared to the percentage of any LOI applicant receiving a Program Award during unblinded reviews (solid bars, left) and blinded reviews (hatched bars, right; three years of data (2021–2023)).

Table 4. Relative Advantage–Award with Consensus Institutional Ranking.

The average value and ranges for consensus categories, with Chi-squared association test and Cramer’s V statistic of unblinded and blinded LOI reviews through program awards. Analysis of blinded reviews relied on three years of data (2021–2023).

Figure 3 shows the average Relative Advantage–Award afforded to an LOI applicant within the categories of the consensus list, comparing the unblinded and blinded reviews. During the unblinded reviews, the average Relative Advantage–Award for an LOI from the '1–10' and '11–25' categories was 2.5 times and 2.0 times the average percentage respectively. With this relative advantage, on average 75% of AMBF's annual program awards went to the top 25 institutions on this list, out of the 287 institutions that applied to the BYI Program during the study years. After requiring the submission of blinded LOIs, the Relative Advantage–Award for the '1–10' and '11–25' categories decreased to 1.8 times and 1.4 times the average respectively, which represents an average of 45% of the annual program awards. While still an advantage for the top-ranked institutions, the decrease in awards to these institutions opens opportunities for highly rated candidates from other institutions to receive a BYI Program award. The Relative Advantage–Award for the '26–50' category increased with the blinded reviews from 0.23 to 0.91 times the average percentage, and for the 'Other' category from 0.42 to 0.83 times the average. Finally, we found that the Relative Advantage–Award for applicants in the '51–96' category decreased moderately, from 0.54 to 0.45 times the average percentage, during blinded reviews.

Again, the average values and ranges of the Relative Advantage–Award by category in the consensus list, and the results of Chi-squared tests of association between institutional category and program award status within the blinded and unblinded processes, are presented in Table 4. The full set of average values and ranges for each institutional ranking schema is presented in Table S1 in Supplementary file 1. The relationship between category in the consensus list and program award status was statistically significant when reviews were unblinded, with a medium-sized effect (Cramer's V = 0.15). After changing to the blinded review process, the relationship between category and program award status was not statistically significant, and the effect size of the association decreased compared to that under the unblinded review process. We find that the change to the blinded review process resulted in a decrease in the Chi-squared as well as the Cramer's V statistic for the consensus list, and we conclude that blinded review at the LOI stage also contributed to reducing the impact of institutional prestige bias on program award rates.

In addition to institutional prestige bias, we assessed our review process for potential gender bias, with assignment of applicants into female or male gender categories as described in the Methods section. Figure 4 shows the percentage of female applicants who submitted an LOI, received a full application invitation, and received a program award over the years of the study. The percentage of full application invitations to female applicants is consistent with the percentage of LOIs received from female applicants over the years studied, including both unblinded and blinded reviews. The percentage of program awards shows higher variability, but the average over the study years again tracks the percentage of female LOI applicants, providing no evidence of gender bias in the review process.

Figure 4. Outcomes for female applicants.

Percentage of female LOI applicants to receive a Full Application invitation and Program Award by year. Between 2017 and 2020 the reviews of initial LOIs were not blinded; from 2021 onwards the reviews of initial LOIs were blinded; Program Awards for 2024 had not been finalized as of manuscript preparation.

Figure 4—source data 1

BYI Letters of Intent (LOIs), Full Application Invitations, and Program Awards by gender category.

All LOIs Received, Full Application Invitations, and Program Awards by gender category for 2017–2020 (unblinded) and 2021–2024 (blinded).

Using the same methodology as the analysis of institutional prestige, we calculated the Relative Advantage–Gender as the ratio of the percentage of each gender receiving a full application invitation or program award to the percentage of all applicants receiving a full application invitation or program award. Table 5 presents the average values and ranges of the Relative Advantage–Gender for the two gender categories and the results of Chi-squared tests of association between gender category and full application invitation or program award status within the blinded and unblinded processes. The relationship between gender category and full application invitation or program award status is not statistically significant, and the effect size is negligible (Cramer's V of 0.00–0.03 across all conditions studied). We therefore conclude that our review process and subsequent program awards did not demonstrate gender bias, either before or after blinding the LOI reviews.

Table 5. Relative Advantage–Gender.

The average value and ranges for gender categories, with Chi-squared association test and Cramer’s V statistic of unblinded and blinded LOI reviews in full application invitations and program awards. Analysis of blinded reviews relied on three years of data (2021–2023).

Overall, this work illustrates one of the major challenges AMBF and other funders face when supporting young scientists. We seek to find exceptional individuals conducting transformative science at all institutions, not just those at institutions perceived by reviewers to have higher prestige. However, in the drive to support the most exciting and innovative researchers and ideas, it is apparent that the qualitative metric of institutional affiliation (whether applied explicitly or implicitly) often used in applicant assessment may not be a good proxy for scientific innovation. The bias we found towards prestigious institutions in the unblinded reviews reveals that faculty at institutions with a lesser reputation are not afforded the same allowances from reviewers as applicants at institutions with a greater reputation.

We observed that the institutional prestige bias in our review process was reduced, but not eliminated, by blinding the LOI applications, indicating that there may be other bona fide measurable differences between institutional categories. The origin of this difference may be a combination of the quality of the candidates themselves, the physical research infrastructure of their universities, and the resources and support available to junior faculty at well-resourced institutions, including support in submitting applications to funding opportunities. There may also have been additional factors that influenced our results between the application cycles. For example, we received fewer applications overall in the blinded years of our study due to COVID-19 research disruptions and hiring freezes across institutions. However, LOIs were received in each institutional category consistently across the studied years, indicating that our initial populations were composed of LOIs with similar institutional diversity, as shown in Table 6. Additionally, we informed our reviewers at the beginning of the review process that the purpose of blinding the LOIs was in part to study whether there was institutional prestige bias in our reviews. This awareness may have affected the scoring of proposals if reviewers were more willing to advance LOIs or full applications with weaknesses attributable to a lack of mentorship or access to resources, which might indicate that the applicant was from an institution without the same level of research support infrastructure.

Table 6. Percentage of LOIs received per institutional category for the eight institutional rankings and the consensus ranking.

There is no universally accepted ranking of institutions, and therefore there are many possible methods that can be used to define and measure the conceptual quality of institutional prestige. Every reviewer brings to the review process their own unique perception of institutional prestige based on the qualities that they value most, which may also influence any implicit or explicit bias. For this study, we chose several lists that used different measurable variables in their institutional rankings, including federal research funding received, research publications and citations by faculty, prestigious awards received by faculty, and student outcomes, among other variables. These variables were likely important to our reviewers and are relevant to our mission of supporting basic science; however, they may not have covered all relevant variables or may have been too restrictive. For example, the Leiden ranking list had the lowest correlation with our review outcomes, possibly because the filter settings we used for that list included publications within biomedical and related journals, which may not accurately reflect our total applicant and reviewer pool, drawn from disciplines across chemistry, biology, and the life sciences. Also, the ranked list based on historical AMBF funding was not provided to any reviewers and has not been previously published, but was included because of our own internal reviewer recruitment practices.

Anecdotally, reviewers who participated in the BYI LOI reviews reported that the blinded materials were easier to read, that there was a significant reduction in their workload to review the same number of LOIs, and that the review meetings were shorter as it was easier to focus solely on the merits of the technical proposal during the discussion. Given these observed benefits in reducing institutional prestige bias and the positive comments from our reviewers, AMBF plans to continue using blinded review for the BYI LOI application step, and we have also adopted these same review methods for our Postdoctoral Fellowship program application process. In sharing these results, AMBF hopes this information will be useful for others evaluating their own application review processes with the goal of instituting more equitable practices (Franko et al., 2022), especially for organizations with missions and funding programs similar to our own.

Future directions

AMBF will continue to monitor our application metrics as we strive to continuously improve our internal methods and processes to ensure a fair evaluation for every applicant to our programs. AMBF is also increasing its outreach and the materials available to assist applicants, and is encouraging interested applicants from institutions that may not have the same level of internal resources to consider applying. Finally, AMBF will continue to diversify our reviewer cohort to ensure multiple perspectives are included at all levels of the review process.

Experimental design

The AMBF BYI Program application process, shown in Figure 5 , starts with an open call for LOIs from junior faculty who are within the first four years of a tenure-track appointment at US institutions, followed by a limited number of invited full applications selected from the submitted LOIs. The Foundation broadly distributes an announcement to research institutions annually when the program opens, and all US institutions that have tenure track, or equivalent, positions are encouraged to apply. The LOI review step was selected as the focus of this analysis as it is the first gate in the application process, and it is the step with the most applications and therefore the highest workload for our reviewers. Past studies have shown that implicit bias is strongest when reviewer workload is high ( Kahneman, 2011 ). The unblinded LOI included a three-page technical proposal, the applicant’s bio sketch, and a chart of external funding either pending or received. During unblinded reviews, all three documents were provided to reviewers. When we transitioned to blinded reviews, the LOI was extended to a four-page blinded technical proposal which was shared with reviewers, and the applicant’s bio sketch, self-reported gender and ethnicity information, and external funding charts were collected by the Foundation and used internally only.

Figure 5. Schematic of the BYI Application Review Process.

For the blinding process, we provided instructions for applicants to blind their own technical proposals prior to submission by not including their name, gender, gender-identifying pronouns, or institutional information, along with specific formatting to follow for referencing publications (AMBF, 2023). Submitted LOIs were reviewed by Foundation staff for eligibility and compliance with the blinding process prior to assigning LOIs to review panels through our online portal. This method was used to reduce the administrative burden on AMBF staff members; staff did not edit or modify any submitted LOIs to comply with the blinding requirements. We found that compliance with the blinded technical proposal preparation was very high, and those applicants who blatantly did not follow the blinding rules were generally ineligible to apply based on other Foundation criteria. All eligible LOIs were assigned a random four-digit number to be used as the proposal identifier for the reviewers throughout the LOI review process and discussions.

In compliance with the Foundation’s standard practice, reviewers were recruited from the population of tenured researchers at US institutions, also including past BYI award recipients. Panels composed of three reviewers were assigned sets of eligible LOIs to review, while ensuring that no reviewer was assigned an applicant from their same institution. For all program years in this study, written reviewer scores and comments were collected from each reviewer independently and then LOI review panel calls were held via teleconference to discuss the merits and concerns of the LOIs and select those to advance to full application. Each panel was instructed on the number of LOIs they could recommend advancing to the Full Application stage and each panel completed their selections independently of the other panels. The outcome of the LOI review each year was to select about 100 total applicants who were then invited to submit full applications.

During the blinded review process, reviewers were asked to assess the merits of the technical proposal as well as to confirm that the blinding rules were followed by the applicant. If any reviewer identified an applicant as not complying with the blinding rules, that LOI was discussed at the start of the review panel meeting and could be disqualified from consideration with a majority vote of the review panel. Notably, an application was not disqualified if a reviewer believed they could infer the identity of the author as long as the blinding rules were followed. Only four LOIs were eliminated from consideration by review panel vote in the years of blinded review.

The process of full application submission and review remained similar during all years of the study. The selected LOI applicants were invited to submit a full application which consisted of an updated six-page research proposal, proposed budget and timeline, bio sketch, letters of recommendation from advisors and colleagues, and institutional support forms. The full applications were reviewed by four panels of three reviewers each, composed of a subset of the reviewers who had assisted in the LOI review stage. Two of the panels reviewed full applications in biology related fields and the other two panels reviewed full applications in chemistry related fields. Written reviewer scores and comments were collected from each reviewer independently and then the biology and chemistry panels convened for a day and a half to discuss each proposal. On the first day, each of the four panels selected their top eight full applications, for a total of 16 advancing biology full applications and 16 advancing chemistry full applications. On the second day, these full applications were discussed within combined biology and chemistry panels to select the top 16 candidates to invite to an interview. The interviews with the candidates were conducted with members of the BYI Program Executive Committee, in-person through 2019 and virtually in subsequent years. The Interview Committee provided recommendations for program awards to the AMBF Board of Directors for final approval.

Institutional categories

To study whether there was a change in the institutional prestige of selected LOIs before and after blinding the review process, we developed a method to assign each LOI to an institutional category related to the institutional prestige of the applicant's affiliated institution, including associated medical schools and research institutes. There are several rankings and lists of universities and research institutions that use metrics that could be associated with prestige within the scientific research community. We examined the impact of our review process against seven different ranked lists, over multiple years, as well as a ranking of institutions that have historically received funding from AMBF. We reasoned that these lists, which encompass research funding, publications, faculty awards, and teaching, would include the universities and colleges with chemistry, life sciences, engineering, and interdisciplinary research programs that had recently applied for and received science funding through peer review processes similar to ours, and are therefore the institutions most likely to be identified as having prestigious reputations by our reviewers.

Table S2 in Supplementary file 2 contains, in rank order, the top 100 institutions for each of the eight institutional lists used in this study:

Lists based on data released by the National Center for Science and Engineering Statistics (NCSES), which is part of the National Science Foundation (NSF). The data we used were for federal obligations for science and engineering funding to universities and colleges in 2018 (NSF, 2018a) and in 2020 (NSF, 2020). The data come from an annual survey in which federal funding agencies are asked to report their obligations for science and engineering funding (NSF, 2018b).

The Shanghai Academic Ranking of World Universities, which annually ranks universities by several academic or research performance indicators, including alumni and staff winning Nobel Prizes and Fields Medals, highly cited researchers, papers published in Nature and Science, papers indexed in major citation indices, and the per capita academic performance of an institution (Shanghai Ranking, 2023). The list used in this study was filtered for universities in the US, and for the years 2018 and 2023. The top 100 ranked universities (including ties) were included.

The Times Higher Education World University Rankings are based on 13 calibrated performance indicators that measure an institution’s performance across four areas: teaching, research, knowledge transfer and international outlook ( Times Higher Education, 2023 ). The list used in this study was filtered for Research Universities in the United States, and for years 2018 and 2023. The top 100 ranked universities were included.

The Leiden Ranking, published by the Centre for Science and Technology Studies at Leiden University, is based on bibliographic data on scientific publications, in particular on articles published in scientific journals, using Web of Science as the primary data source (CWTS, 2023). The list used in the study was filtered for the time period 2018–2021, the discipline of 'Biomedical and health sciences', and the United States, and was sorted by scientific impact.

A ranked list of funding received by institution from AMBF in the years 1990–2018, including funding from the BYI Program, the Beckman Scholars Program, and the Arnold O Beckman Postdoctoral Fellows Program. We often recruit reviewers from our past awardees, which may have introduced an implicit bias towards these institutions through the structure of our reviewer pool.

For the NCSES, Shanghai and Times Higher lists, the ranks of the universities did not change appreciably between the years examined. Among the lists, there were some universities with consistent ranks, such as Johns Hopkins University with ranks between 1 and 15, and the University of Virginia with ranks between 46 and 62. However, some universities had much larger discrepancies among the lists, such as the California Institute of Technology with ranks of 2, 7, 24, 53, and 66, and not appearing in the top 100 on one list. As an additional ranked list to use in the study, we created a 'consensus' list by averaging the ranks for the 96 institutions that appeared on a majority (at least five) of the selected lists (see Table S2 in Supplementary file 2). We divided each list into five institutional categories, with the top category including just 10% of the ranked institutions, as the top institutions disproportionately secure most of the research funding (Lauer and Roychowdhury, 2021; NSF, 2018a; NSF, 2020):

‘1–10’: The first ten institutions in the ranked list.

‘11–25’: The next 15 institutions in the ranked list.

‘26–50’: The next 25 institutions in the ranked list.

‘51–100’: The remaining 50 institutions in the ranked list; or 51–96 for the consensus list.

’Other’: The institutions that applied to the AMBF BYI Program during the study years that were not included in the categories above.

As we often receive more LOI applications from institutions ranked highly on these lists, the five categories also divide the LOIs received into corresponding groups. Table 6 presents the percentage of LOIs received in each institutional category, averaged across all study years, by ranking list.

After defining the institutional categories, we assigned each LOI to an institutional category based on the applicant's affiliation, and then analyzed the number of LOIs received, the number of full application invitations, and the number of program awards made by the Foundation in each institutional category. The full list of LOIs, Full Application Invitations, and Program Awards by year is included in Table S3 in Supplementary file 3, sorted alphabetically by the 287 institutions that submitted LOIs during the study years. If there were no bias towards institutional prestige in our reviews, we would expect the percentage of LOIs advancing to a full application invitation and to a program award within each institutional category, relative to the total percentage of LOIs advancing to a full application invitation and to a program award, to be the same before and after the blinded reviews.
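
The sketch below illustrates this bookkeeping step under simplified assumptions: each LOI record is an (institution, advanced) pair, institutions are mapped to categories from their rank on a single list (unranked institutions fall into 'Other'), and per-category counts are tallied. The data structures and names are hypothetical, not the Foundation's actual records or code.

```python
from collections import Counter

def category_for_rank(rank):
    """Map a ranked-list position to the study's five categories:
    1-10, 11-25, 26-50, 51-100, and 'Other' for unranked institutions."""
    if rank is None:
        return "Other"
    if rank <= 10:
        return "1-10"
    if rank <= 25:
        return "11-25"
    if rank <= 50:
        return "26-50"
    if rank <= 100:
        return "51-100"
    return "Other"

def tally_by_category(lois, institution_rank):
    """Count LOIs received and advancements (e.g. full application
    invitations) per institutional category. `lois` is a list of
    (institution, advanced) pairs; `institution_rank` maps an institution
    name to its position on one ranked list."""
    received, advanced = Counter(), Counter()
    for institution, did_advance in lois:
        cat = category_for_rank(institution_rank.get(institution))
        received[cat] += 1
        if did_advance:
            advanced[cat] += 1
    return received, advanced

# Illustrative data only.
ranks = {"Univ A": 3, "Univ B": 40, "College C": 150}
sample_lois = [("Univ A", True), ("Univ B", False), ("College C", True), ("Univ A", False)]
print(tally_by_category(sample_lois, ranks))
```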

To study whether there was a gender bias based on applicant characteristics in our review process before and after blinding the reviews, we assigned each LOI to a category of female or male. For the unblinded years, applicants were not asked to self-identify their gender in the application materials, and the data presented are based on AMBF research of applicant names and affiliations to assign each LOI to a category of female or male. For the blinded application years, applicants did self-identify their gender during LOI submission; the 'female' category includes applicants who identify as female, and the 'male' category includes applicants who identify as male, trans-male (self-reported by one LOI applicant), or non-binary (self-reported by five LOI applicants, all of whom have traditionally male first names). If there were no gender bias in our reviews, we would expect the percentage of LOIs that advance to a full application invitation and to a program award within each gender category to equal the total percentage of LOIs that advance to a full application invitation and to a program award.

Statistical analysis

A Chi-squared test for independence (McHugh, 2013) was used to examine the association between institutional category and invitation status (invited or not invited to submit a full application) and award status (awarded or not awarded), as well as between gender and invitation status and award status. Observed frequencies were compared to expected frequencies calculated from the average invitation and award rates for all LOIs. Results of statistical analyses were considered statistically significant at P < 0.05. Cramer's V statistic is also provided as a measure of effect size for the Chi-squared tests.
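
For reference, a minimal Python sketch of this kind of analysis is shown below, using scipy.stats.chi2_contingency and computing Cramer's V directly from the Chi-squared statistic; the contingency table values are illustrative only, not the study data.

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_and_cramers_v(table):
    """Chi-squared test of independence plus Cramer's V effect size
    for a contingency table (rows = institutional or gender categories,
    columns = advanced / did not advance)."""
    table = np.asarray(table)
    chi2, p_value, dof, _expected = chi2_contingency(table)
    n = table.sum()
    min_dim = min(table.shape) - 1
    cramers_v = np.sqrt(chi2 / (n * min_dim))
    return chi2, p_value, cramers_v

# Illustrative counts only: rows are five institutional categories,
# columns are (invited to full application, not invited).
counts = [[35, 85], [28, 82], [20, 80], [17, 73], [30, 150]]
chi2, p, v = chi2_and_cramers_v(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, Cramer's V = {v:.2f}")
```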

Data availability

All data generated or analysed during this study are included in the manuscript and associated source data files.

Decision letter and author response

Peter Rodgers, Senior and Reviewing Editor, eLife, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Blinding Reduces Institutional Prestige Bias During Initial Review of Applications for a Young Investigator Award" to eLife for consideration as a Feature Article. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by Peter Rodgers of the eLife Features Team. Both reviewers have opted to remain anonymous.

The reviewers and editors have discussed the reviews and we have drafted this decision letter to help you prepare a revised submission. Addressing the comments listed below will require substantial work, so you may prefer to submit the manuscript to a different journal: if you decide to do this, please let me know.

In this manuscript the authors report that blinding the identity of applicants during the first round of review for applications for a Beckman Young Investigator Award led to a reduction in prestige bias. This topic is of central importance for fair scientific funding allocation. The manuscript is clearly written, and some of the key limitations are openly discussed. However, there are a number of concerns that need to be addressed, notably concerning the proxy for prestige used by the authors. Moreover, the authors' dataset could also be used to explore possible gender bias in the review for these awards.

Essential revisions

1. The manuscript, as it stands, does not bring much more to the topic than what has been published in previous papers, such as Solans-Domenech et al., 2017, which analyzed a larger sample (and is not cited by the authors – see point 4 below). In order to improve the manuscript I would advise the authors to expand the scope as follows:

1a. The option and justification given by the authors of not studying the effect of gender is not convincing. Although applicants were not asked about their gender before 2021, it could still be possible to perform an analysis based on a proxy where first names would be associated to gender.

1b. The following point is optional: It would be interesting to check the final outcome of the evaluation, at the second step, when the proposal is not blinded anymore, and to assess if the final set of successful candidates (in terms of affiliations) differ from those before the blinding step was introduced.

2. The prestige ranking is based on total funding size, which is certainly correlated with the prestige of the institutions. However, this measure is also highly correlated with the size of the institutions, which might not accurately reflect the prestige: for example, Harvard, Yale and MIT are in the second tier; Chicago, Berkeley and Caltech are in the third tier; and there are also some very prestigious institutions (such as Rockefeller University, Fred Hutchinson Cancer Research Center, and the Mayo Clinic) in the bottom tier.

This point could be addressed by using a modified prestige ranking obtained by normalizing the funding size with respect to the size of the institutions (e.g., the number of researchers) might be more appropriate.

Another approach would be to use other university rankings (eg Shanghai, THE, Leiden, etc).

Our suggestion is that several different proxies for prestige should be used in the study (eg two or more of NSF, NSF normalized, Shanghai, THE, Leiden, and maybe even the proxy based on previous performance when applying to Beckman), and that the results of all the proxies studied should be reported.

3. There are a number of points regarding the statistical analysis that need to be addressed

3a. Regarding the NSF ranking: the 2018 data has been used, but as the study covers 8 years, the authors should at least have tested if rankings from different years would give different results.

3b. The division of prestige categories into four groups based on equal partition of total national funding, while reasonable, still seems a bit arbitrary. It might be of interest to test the robustness of the results with respect to the number of categories chosen.

3c. The sample size in the treatment group (data from 2021-2023) is substantially smaller than the sample size in the control group (data from 2016-2020). Therefore, the fact that the p-value becomes insignificant might simply be because of reduced sample size instead of reduced prestige bias. Down-sampling the institutions for 2016-2020 might make the comparison more fair. Or alternatively, a difference-in-difference design might be able to show the effect more clearly.

4. The discussion of the existing literature on bias in research grant evaluation needs to be improved. I recommend that the authors cite and discuss the following articles:

Forscher, P.S., Cox, W.T.L., Brauer, M. et al. (2019) Little race or gender bias in an experiment of initial review of NIH R01 grant proposals. Nat Hum Behav 3:257-264.

Marsh et al. (2008) Improving the peer-review process for grant applications: Reliability, validity, bias, and generalizability. American Psychologist, 63:160-168.

Solans-Domenech et al. (2017) Blinding applicants in a first-stage peer-review process of biomedical research grants: An observational study. Res. Eval. 26:181-189. DOI:10.1093/reseval/rvx021

I also recommend that the authors cite and discuss the following article about prestige bias in the peer review of papers submitted to a conference:

Tomkins et al. (2017) Reviewer bias in single-versus double-blind peer review. PNAS 114:12708-12713.

Author response

Essential revisions, points 1 and 1a (see decision letter above):

The analysis of gender distribution in the LOIs, Full Application invitations, and Program Awards before and after blinding is included in the revision. In collecting these data, we found no evidence of gender bias in the Full Application invitations or Program Awards either before or after blinding.

We extended our analysis to include Program Awards based on institutional affiliation and included the results in the discussion.

Point 2 (proxies for institutional prestige; see decision letter above):

We implemented the reviewers' suggestion and expanded our analysis to include eight different lists of institutional ranks, and then developed a "Consensus List" based on reviewing the trends and differences between these lists. The analysis results using all nine of these lists are included in the revised manuscript.

With regard to normalizing the NCSES federal funding data by institution size, we did consult with the program team at NSF NCSES to ask if they also collect data that could help with this analysis. While one of the surveys they send to institutions annually (the HERD survey) does include a question about the number of faculty engaged in scientific research, they commented that compliance with this question is lower than usual, and that not all of the institutions on the top 100 list would be included in the HERD survey (private communication). With the inclusion of the other university rankings, we did not pursue this question about normalization any further.

Points 3 and 3a (statistical analysis; see decision letter above):

We included additional years of rankings in our study and discussion.

Regarding point 3b: We respectfully disagree that this division was based only on the concentration of funding received at the top institutions, although that is a real effect reported in the references cited. The number of categories was also based on the division of our LOIs into these categories (see Table 6 in the Methods section). We also added Supplemental Table 3 with the full data set of the number of LOIs, Full Application invitations, and Program Awards by institution, to help the reader see how the applications are distributed among the institutions each year.

Regarding point 3c: For this revision, we were able to include an additional year of blinded review analysis, and we balanced the study with four years of unblinded and four years of blinded data. We did still receive a smaller number of LOIs during some of the blinded study years (which we attribute to COVID disruptions). We considered this difference in sample size in developing the "Relative Advantage" metric that we used for our analysis: if you down-sample the LOIs received in the unblinded years across the institutional categories, the "Relative Advantage" ratio remains the same.

Point 4 (discussion of the existing literature; see decision letter above):

Thank you for the additional references, which have been consulted and included in the revised manuscript.

Author details

Anne E Hultgren is at the Arnold and Mabel Beckman Foundation, Irvine, United States


Nicole MF Patras is at the Arnold and Mabel Beckman Foundation, Irvine, United States

Jenna Hicks is at the Health Research Alliance, Research Park, United States

No external funding was received for this work.

Acknowledgements

We thank everyone who has reviewed applications for the BYI. We also thank the Board of Directors of the AMBF for supporting the trial reported here, and our Scientific Advisory Council and the BYI Executive Committee for their help and support. We also thank Amanda Casey and Dr Kim Orth (University of Texas Southwestern), and Dr Michael May (Dana Point Analytics) for assistance with data analysis and for visualization recommendations.

Publication history

  • Received: August 31, 2023
  • Accepted: February 29, 2024
  • Version of Record published: March 25, 2024 (version 1)

© 2024, Hultgren et al.

This article is distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use and redistribution provided that the original author and source are credited.


Keywords: meta-research, institutional prestige bias, peer review, gender bias, double-blind peer review
  • Part of Collection

Meta-research

Meta-Research: A Collection of Articles



Community based complex interventions to sustain independence in older people: systematic review and network meta-analysis

Linked editorial: Maintaining independence in older adults

  • Thomas F Crocker, associate professor 1,
  • Joie Ensor, associate professor 2 3,
  • Natalie Lam, research fellow 1,
  • Magda Jordão, research fellow 1,
  • Ram Bajpai, lecturer in biostatistics 3,
  • Matthew Bond, medical statistics research assistant 3,
  • Anne Forster, professor of ageing and stroke research 1,
  • Richard D Riley, professor of biostatistics 2 3,
  • Deirdre Andre, library research support advisor 4,
  • Caroline Brundle, elderly care researcher 1,
  • Alison Ellwood, aged care researcher 1,
  • John Green, rehabilitation research programme manager 1,
  • Matthew Hale, geriatric academic fellow 1,
  • Lubena Mirza, elderly care researcher 1,
  • Jessica Morgan, geriatric medicine doctor 5,
  • Ismail Patel, elderly care researcher 1,
  • Eleftheria Patetsini, ageing research assistant 1,
  • Matthew Prescott, trial manager 1,
  • Ridha Ramiz, ageing research assistant 1,
  • Oliver Todd, clinical lecturer 1,
  • Rebecca Walford, anaesthetics doctor 5,
  • John Gladman, professor of medicine in older people, consultant geriatrician 6 7,
  • Andrew Clegg, professor of geriatric medicine 1
  • 1 Academic Unit for Ageing and Stroke Research (University of Leeds), Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
  • 2 Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  • 3 Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
  • 4 Research Support Team, Leeds University Library, University of Leeds, Leeds, UK
  • 5 Geriatric Medicine, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
  • 6 Centre for Rehabilitation and Ageing Research, Academic Unit of Injury, Inflammation and Recovery Sciences, University of Nottingham, Nottingham, UK
  • 7 Health Care of Older People, Nottingham University Hospitals NHS Trust, Nottingham, UK
  • Correspondence to: T F Crocker medtcro@leeds.ac.uk (or @AUASResearch on Twitter/X)
  • Accepted 14 February 2024

Objective To synthesise evidence of the effectiveness of community based complex interventions, grouped according to their intervention components, to sustain independence for older people.

Design Systematic review and network meta-analysis.

Data sources Medline, Embase, CINAHL, PsycINFO, CENTRAL, clinicaltrials.gov, and International Clinical Trials Registry Platform from inception to 9 August 2021 and reference lists of included studies.

Eligibility criteria Randomised controlled trials or cluster randomised controlled trials with ≥24 weeks’ follow-up studying community based complex interventions for sustaining independence in older people (mean age ≥65 years) living at home, with usual care, placebo, or another complex intervention as comparators.

Main outcomes Living at home, activities of daily living (personal/instrumental), care home placement, and service/economic outcomes at 12 months.

Data synthesis Interventions were grouped according to a specifically developed typology. Random effects network meta-analysis estimated comparative effects; Cochrane’s revised tool (RoB 2) structured risk of bias assessment. Grading of recommendations assessment, development and evaluation (GRADE) network meta-analysis structured certainty assessment.
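The data synthesis section names the statistical machinery (random effects network meta-analysis, RoB 2, GRADE) without showing the underlying calculation. As a rough illustration only, the sketch below implements the standard DerSimonian-Laird random-effects pooling of log odds ratios for a single pairwise comparison, using invented study data; the review itself fitted a full network model across 63 intervention combinations in dedicated software, so this is not a reproduction of the authors' analysis.

```python
# Minimal sketch of DerSimonian-Laird random-effects pooling of log odds ratios.
# Study estimates below are invented for illustration; the review's actual analysis
# was a network meta-analysis across many interventions, not this single comparison.
import numpy as np

log_or = np.array([0.30, 0.10, 0.45, -0.05])   # hypothetical per-study log odds ratios
se = np.array([0.20, 0.25, 0.30, 0.22])        # hypothetical standard errors

v = se**2                       # within-study variances
w_fixed = 1 / v                 # fixed-effect (inverse-variance) weights
mu_fixed = np.sum(w_fixed * log_or) / np.sum(w_fixed)

# Cochran's Q and the DerSimonian-Laird estimate of between-study variance tau^2
Q = np.sum(w_fixed * (log_or - mu_fixed)**2)
df = len(log_or) - 1
C = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights incorporate tau^2, widening the pooled interval
w_re = 1 / (v + tau2)
mu_re = np.sum(w_re * log_or) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

or_pooled = np.exp(mu_re)
ci_low, ci_high = np.exp(mu_re - 1.96 * se_re), np.exp(mu_re + 1.96 * se_re)
print(f"Pooled OR {or_pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), tau^2 = {tau2:.3f}")
```

In practice, network meta-analyses of this kind are run in dedicated routines (for example, the netmeta package in R) that handle the whole network of comparisons, consistency checks, and ranking, rather than a hand-coded pairwise pooling like the one above.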

Results The review included 129 studies (74 946 participants). Nineteen intervention components, including “multifactorial action from individualised care planning” (a process of multidomain assessment and management leading to tailored actions), were identified in 63 combinations. For living at home, compared with no intervention/placebo, evidence favoured multifactorial action from individualised care planning including medication review and regular follow-ups (routine review) (odds ratio 1.22, 95% confidence interval 0.93 to 1.59; moderate certainty); multifactorial action from individualised care planning including medication review without regular follow-ups (2.55, 0.61 to 10.60; low certainty); combined cognitive training, medication review, nutritional support, and exercise (1.93, 0.79 to 4.77; low certainty); and combined activities of daily living training, nutritional support, and exercise (1.79, 0.67 to 4.76; low certainty). Risk screening or the addition of education and self-management strategies to multifactorial action from individualised care planning and routine review with medication review may reduce odds of living at home. For instrumental activities of daily living, evidence favoured multifactorial action from individualised care planning and routine review with medication review (standardised mean difference 0.11, 95% confidence interval 0.00 to 0.21; moderate certainty). Two interventions may reduce instrumental activities of daily living: combined activities of daily living training, aids, and exercise; and combined activities of daily living training, aids, education, exercise, and multifactorial action from individualised care planning and routine review with medication review and self-management strategies. For personal activities of daily living, evidence favoured combined exercise, multifactorial action from individualised care planning, and routine review with medication review and self-management strategies (0.16, −0.51 to 0.82; low certainty). For homecare recipients, evidence favoured addition of multifactorial action from individualised care planning and routine review with medication review (0.60, 0.32 to 0.88; low certainty). High risk of bias and imprecise estimates meant that most evidence was low or very low certainty. Few studies contributed to each comparison, impeding evaluation of inconsistency and frailty.
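When extracting effect sizes for a synthesis, reviewers often have only a point estimate and its 95% confidence interval, as reported in an abstract like this one. A common back-calculation recovers the log odds ratio and its standard error from those three numbers; the snippet below applies it to the first estimate quoted above (OR 1.22, 95% CI 0.93 to 1.59) and shows why an interval that crosses 1 corresponds to a two-sided p value above 0.05. This is a generic illustration, not part of the authors' analysis.

```python
# Back-calculate a log odds ratio and its standard error from a reported
# 95% confidence interval, then derive z and a two-sided p value.
# Uses the estimate quoted in the abstract (OR 1.22, 95% CI 0.93 to 1.59).
from math import log, sqrt, erfc

or_point, ci_low, ci_high = 1.22, 0.93, 1.59

log_or = log(or_point)
se = (log(ci_high) - log(ci_low)) / (2 * 1.96)   # CI width on the log scale / (2 * 1.96)

z = log_or / se
p_two_sided = erfc(abs(z) / sqrt(2))             # equals 2 * (1 - Phi(|z|))

print(f"log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p_two_sided:.3f}")
# The interval includes 1 (p > 0.05), consistent with the abstract pairing this
# direction of effect with a certainty rating rather than a claim of significance.
```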

Conclusions The intervention most likely to sustain independence is individualised care planning including medicines optimisation and regular follow-up reviews resulting in multifactorial action. Homecare recipients may particularly benefit from this intervention. Unexpectedly, some combinations may reduce independence. Further research is needed to investigate which combinations of interventions work best for different participants and contexts.

Registration PROSPERO CRD42019162195.

Contributors: AC, TC, JGl, and RRi conceived of the study. AC, TC, JE, AF, JGl, MJ, NL, EP, and RRi designed the study. DA, TC, and NL developed the search strategy. DA executed the database and trial register searches. TC, JGr, MJ, NL, JM, EP, RRa, and RW selected the studies. RB, CB, TC, JGr, MH, MJ, NL, JM, LM, EP, IP, RRa, OT, and RW extracted data. AC and JGl assessed frailty. AC, TC, AE, AF, JGl, MJ, and NL did intervention grouping. CB, TC, MJ, NL, and MP assessed risk of bias. MJ liaised with patient and public involvement groups. RB, MB, and JE prepared and analysed effectiveness data. JE and RRi supervised the meta-analyses. TC and NL assessed the certainty of the evidence. TC and NL did economic and narrative synthesis. AC, TC, JE, AF, JGl, NL, RRi, and OT interpreted the findings. TC wrote the first draft of the manuscript with input from AC, JE, AF, and JGl. All authors contributed to critically reviewing or revising the manuscript. CB, MB, RB, AC, TC, JE, JGl, JGr, MH, MJ, NL, JM, LM, EP, IP, MP, RRa, OT, and RW collectively accessed and verified the underlying data reported in the manuscript. JE and NL contributed equally. TC is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: This project was funded by the National Institute for Health Research (NIHR) Health Technology Assessment programme (NIHR128862) and will be published in full in Health Technology Assessment. 176 The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. The grant applicants designed the overarching systematic review and network meta-analysis; the funders approved the protocol. The funders have not been involved in any aspect of the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

Competing interests: All authors have completed the ICMJE uniform disclosure form at https://www.icmje.org/disclosure-of-interest/ and declare: AC, TC, JE, AF, JGl, MJ, NL, and RRi had financial support from the NIHR Health Technology Assessment Programme for the submitted work; MB had financial support from the PhD Graduate Teaching Fund at the University of Liverpool for the submitted work; DA declares payment made to her employer, University of Leeds Library, from the Academic Unit for Ageing and Stroke Research, Bradford Institute for Health Research, for services that included contributions to the submitted work; TC, AC, and AF received research funding from NIHR Programme Grants for Applied Research; AC and AF also received research funding from NIHR HSDR Programme; AC also received research funding from Health Data Research UK, NIHR ARC Yorkshire and Humber, NIHR Leeds BRC, and Dunhill Medical Trust; AF also declares NIHR Senior Investigator award, National Institute for Health (USA) payment for panel membership in 2021 and 2022, and University of Leeds Governor representative on the Governors Board of Bradford Teaching Hospitals NHS Foundation Trust; MB and MP received NIHR pre-doctoral fellowship funding; RB is supported by matched funding awarded to the NIHR Applied Research Collaboration (West Midlands) and is a member of the data monitoring committee for the Predict and Prevent AECOPD Trial and College of Experts, Versus Arthritis; AC is a member of NIHR HTA Commissioned Research Funding Committee and Dunhill Medical Trust Research Grants Committee; RRi received personal payments for training courses provided in-house to universities (Leeds, Aberdeen, Exeter, LSHTM) and other organisations (Roche), has received personal payments from the BMJ and BMJ Medicine as their statistical editor, is a co-convenor of the Cochrane Prognosis Methods Group and on the Editorial Board of Diagnostic and Prognostic Research, and Research Synthesis Methods, but receives no income for these roles, receives personal payment for being the external examiner of the MSc Medical Statistics, London School of Hygiene and Tropical Medicine, was previously an external examiner for the MSc Medical Statistics at University of Leicester, has written two textbooks for which he receives royalties from sales (Prognosis Research in Healthcare, and Individual Participant Data Meta-analysis), is a lead editor on an upcoming book (Cochrane Handbook for Prognosis Reviews, Wiley, 2025), for which he will receive royalties from sales, has received consulting fees for a training course on IPD meta-analysis from Roche in 2018, the NIHR HTA grant paid for travel to Leeds for one meeting, and is a member of the NIHR Doctoral Research Fellowships grant panel, and a member of the MRC Better Methods Better Research grant panel—for the latter, he receives an attendance fee; MH declares NIHR Academic Clinical Fellowship; OT declares NIHR Academic Clinical Lectureship and Dunhill Medical Trust Doctoral Research Fellowship RTF107/0117; no other relationships or activities that could appear to have influenced the submitted work.

Transparency: The lead author (the manuscript's guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: To ensure that we clearly communicate our findings with patients and members of the public, we spent time discussing the intervention components we had identified with Frailty Oversight Group members. Through this work, we developed and refined our descriptions of the components. Frailty Oversight Group members helped to draft and revise the plain language summary of our findings (see appendix 2). This will be included in the final report to be published in Health Technology Assessment, 176 as well as on our website. We have produced short video presentations of our work that are also available. Our findings will be publicised on social media and at our regular research engagement events in the local community. As this systematic review was secondary research using aggregated, anonymised data, we do not have any contact details for the participants in the primary research to enable us to share the findings directly with them.

Provenance and peer review: Not commissioned; externally peer reviewed.

Data availability statement

The data associated with this paper will be openly available indefinitely upon publication under a Creative Commons attribution license from the University of Leeds Data Repository. Summary effect estimates and findings from network meta-analyses: https://doi.org/10.5518/1377 ; risk of bias judgments: https://doi.org/10.5518/1386 .

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.


What the Data Says About Pandemic School Closures, Four Years Later

The more time students spent in remote instruction, the further they fell behind. And, experts say, extended closures did little to stop the spread of Covid.

By Sarah Mervosh, Claire Cain Miller and Francesca Paris

Four years ago this month, schools nationwide began to shut down, igniting one of the most polarizing and partisan debates of the pandemic.

Some schools, often in Republican-led states and rural areas, reopened by fall 2020. Others, typically in large cities and states led by Democrats, would not fully reopen for another year.

A variety of data — about children’s academic outcomes and about the spread of Covid-19 — has accumulated in the time since. Today, there is broad acknowledgment among many public health and education experts that extended school closures did not significantly stop the spread of Covid, while the academic harms for children have been large and long-lasting.

While poverty and other factors also played a role, remote learning was a key driver of academic declines during the pandemic, research shows — a finding that held true across income levels.

Source: Fahle, Kane, Patterson, Reardon, Staiger and Stuart, “School District and Community Factors Associated With Learning Loss During the COVID-19 Pandemic.” Score changes are measured from 2019 to 2022. In-person means a district offered traditional in-person learning, even if not all students were in-person.

“There’s fairly good consensus that, in general, as a society, we probably kept kids out of school longer than we should have,” said Dr. Sean O’Leary, a pediatric infectious disease specialist who helped write guidance for the American Academy of Pediatrics, which recommended in June 2020 that schools reopen with safety measures in place.

There were no easy decisions at the time. Officials had to weigh the risks of an emerging virus against the academic and mental health consequences of closing schools. And even schools that reopened quickly, by the fall of 2020, have seen lasting effects.

But as experts plan for the next public health emergency, whatever it may be, a growing body of research shows that pandemic school closures came at a steep cost to students.

The longer schools were closed, the more students fell behind.

At the state level, more time spent in remote or hybrid instruction in the 2020-21 school year was associated with larger drops in test scores, according to a New York Times analysis of school closure data and results from the National Assessment of Educational Progress, an authoritative exam administered to a national sample of fourth- and eighth-grade students.

At the school district level, that finding also holds, according to an analysis of test scores from third through eighth grade in thousands of U.S. districts, led by researchers at Stanford and Harvard. In districts where students spent most of the 2020-21 school year learning remotely, they fell more than half a grade behind in math on average, while in districts that spent most of the year in person they lost just over a third of a grade.

(A separate study of nearly 10,000 schools found similar results.)

Such losses can be hard to overcome, without significant interventions. The most recent test scores, from spring 2023, show that students, overall, are not caught up from their pandemic losses , with larger gaps remaining among students that lost the most ground to begin with. Students in districts that were remote or hybrid the longest — at least 90 percent of the 2020-21 school year — still had almost double the ground to make up compared with students in districts that allowed students back for most of the year.

Some time in person was better than no time.

As districts shifted toward in-person learning as the year went on, students who were offered a hybrid schedule (a few hours or days a week in person, with the rest online) did better, on average, than those in places where school was fully remote, but worse than those in places that had school fully in person.

[Chart: Students in hybrid or remote learning, 2020-21. Annotations mark when some schools returned online as Covid-19 cases surged and vaccinations started for high-priority groups, when teachers became eligible for the Covid vaccine in more than half of states, and that most districts ended the year in person or hybrid. Source: Burbio audit of more than 1,200 school districts representing 47 percent of U.S. K-12 enrollment. Note: Learning mode was defined based on the most in-person option available to students.]

Income and family background also made a big difference.

A second factor associated with academic declines during the pandemic was a community’s poverty level. Comparing districts with similar remote learning policies, poorer districts had steeper losses.

But in-person learning still mattered: Looking at districts with similar poverty levels, remote learning was associated with greater declines.

A community’s poverty rate and the length of school closures had a “roughly equal” effect on student outcomes, said Sean F. Reardon, a professor of poverty and inequality in education at Stanford, who led a district-level analysis with Thomas J. Kane, an economist at Harvard.

[Chart note: Score changes are measured from 2019 to 2022. Poorest and richest are the top and bottom 20 percent of districts by percent of students on free/reduced lunch. Mostly in-person and mostly remote are districts that offered traditional in-person learning for more than 90 percent or less than 10 percent of the 2020-21 year.]

But the combination — poverty and remote learning — was particularly harmful. For each week spent remote, students in poor districts experienced steeper losses in math than peers in richer districts.

That is notable, because poor districts were also more likely to stay remote for longer.

Some of the country’s largest poor districts are in Democratic-leaning cities that took a more cautious approach to the virus. Poor areas, and Black and Hispanic communities, also suffered higher Covid death rates, making many families and teachers in those districts hesitant to return.

“We wanted to survive,” said Sarah Carpenter, the executive director of Memphis Lift, a parent advocacy group in Memphis, where schools were closed until spring 2021.

“But I also think, man, looking back, I wish our kids could have gone back to school much quicker,” she added, citing the academic effects.

Other things were also associated with worse student outcomes, including increased anxiety and depression among adults in children’s lives, and the overall restriction of social activity in a community, according to the Stanford and Harvard research.

Even short closures had long-term consequences for children.

While being in school was on average better for academic outcomes, it wasn’t a guarantee. Some districts that opened early, like those in Cherokee County, Ga., a suburb of Atlanta, and Hanover County, Va., lost significant learning and remain behind.

At the same time, many schools are seeing more anxiety and behavioral outbursts among students. And chronic absenteeism from school has surged across demographic groups.

These are signs, experts say, that even short-term closures, and the pandemic more broadly, had lasting effects on the culture of education.

“There was almost, in the Covid era, a sense of, ‘We give up, we’re just trying to keep body and soul together,’ and I think that was corrosive to the higher expectations of schools,” said Margaret Spellings, an education secretary under President George W. Bush who is now chief executive of the Bipartisan Policy Center.

Closing schools did not appear to significantly slow Covid’s spread.

Perhaps the biggest question that hung over school reopenings: Was it safe?

That was largely unknown in the spring of 2020, when schools first shut down. But several experts said that had changed by the fall of 2020, when there were initial signs that children were less likely to become seriously ill, and growing evidence from Europe and parts of the United States that opening schools, with safety measures, did not lead to significantly more transmission.

“Infectious disease leaders have generally agreed that school closures were not an important strategy in stemming the spread of Covid,” said Dr. Jeanne Noble, who directed the Covid response at the U.C.S.F. Parnassus emergency department.

Politically, though, there remains some disagreement about when, exactly, it was safe to reopen school.

Republican governors who pushed to open schools sooner have claimed credit for their approach, while Democrats and teachers’ unions have emphasized their commitment to safety and their investment in helping students recover.

“I do believe it was the right decision,” said Jerry T. Jordan, president of the Philadelphia Federation of Teachers, which resisted returning to school in person over concerns about the availability of vaccines and poor ventilation in school buildings. Philadelphia schools waited to partially reopen until the spring of 2021, a decision Mr. Jordan believes saved lives.

“It doesn’t matter what is going on in the building and how much people are learning if people are getting the virus and running the potential of dying,” he said.

Pandemic school closures offer lessons for the future.

Though the next health crisis may have different particulars, with different risk calculations, the consequences of closing schools are now well established, experts say.

In the future, infectious disease experts said, they hoped decisions would be guided more by epidemiological data as it emerged, taking into account the trade-offs.

“Could we have used data to better guide our decision making? Yes,” said Dr. Uzma N. Hasan, division chief of pediatric infectious diseases at RWJBarnabas Health in Livingston, N.J. “Fear should not guide our decision making.”

Source: Fahle, Kane, Patterson, Reardon, Staiger and Stuart, “School District and Community Factors Associated With Learning Loss During the Covid-19 Pandemic.”

The study used estimates of learning loss from the Stanford Education Data Archive . For closure lengths, the study averaged district-level estimates of time spent in remote and hybrid learning compiled by the Covid-19 School Data Hub (C.S.D.H.) and American Enterprise Institute (A.E.I.) . The A.E.I. data defines remote status by whether there was an in-person or hybrid option, even if some students chose to remain virtual. In the C.S.D.H. data set, districts are defined as remote if “all or most” students were virtual.
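The methods note above describes averaging two district-level estimates of time spent in remote or hybrid learning, one from the Covid-19 School Data Hub and one from the American Enterprise Institute. Purely as an illustration of that averaging step, the sketch below merges two hypothetical data frames on a district identifier and averages their remote-share columns; the column names and values are invented and do not reflect the actual C.S.D.H. or A.E.I. data schemas.

```python
# Illustrative only: average two district-level estimates of the share of the
# 2020-21 school year spent in remote or hybrid learning, as the methods note
# describes. Column names and values are hypothetical, not the real CSDH/AEI data.
import pandas as pd

csdh = pd.DataFrame({
    "district_id": ["A1", "B2", "C3"],
    "remote_share_csdh": [0.80, 0.35, 0.10],
})
aei = pd.DataFrame({
    "district_id": ["A1", "B2", "C3"],
    "remote_share_aei": [0.75, 0.40, 0.05],
})

# Join the two sources on district and take the mean of the two estimates
merged = csdh.merge(aei, on="district_id", how="inner")
merged["remote_share_avg"] = merged[["remote_share_csdh", "remote_share_aei"]].mean(axis=1)
print(merged)
```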

An earlier version of this article misstated a job description of Dr. Jeanne Noble. She directed the Covid response at the U.C.S.F. Parnassus emergency department. She did not direct the Covid response for the University of California, San Francisco health system.


Sarah Mervosh covers education for The Times, focusing on K-12 schools.

Claire Cain Miller writes about gender, families and the future of work for The Upshot. She joined The Times in 2008 and was part of a team that won a Pulitzer Prize in 2018 for public service for reporting on workplace sexual harassment issues.

Francesca Paris is a Times reporter working with data and graphics for The Upshot.


How People Are Really Using GenAI

  • Marc Zao-Sanders


The top 100 use cases as reported by users on Reddit, Quora, and other forums.

There are many use cases for generative AI, spanning a vast number of areas of domestic and work life. Looking through thousands of comments on sites such as Reddit and Quora, the author’s team found that the use of this technology is as wide-ranging as the problems we encounter in our lives. The 100 categories they identified can be divided into six top-level themes, which give an immediate sense of what generative AI is being used for: Technical Assistance & Troubleshooting (23%), Content Creation & Editing (22%), Personal & Professional Support (17%), Learning & Education (15%), Creativity & Recreation (13%), Research, Analysis & Decision Making (10%).

It’s been a little over a year since ChatGPT brought generative AI into the mainstream. In that time, we’ve ridden a wave of excitement about the current utility and future impact of large language models (LLMs). These tools already have hundreds of millions of weekly users, analysts are projecting a multi-trillion dollar contribution to the economy, and there’s now a growing array of credible competitors to OpenAI.


  • Marc Zao-Sanders is CEO and co-founder of filtered.com, which develops algorithmic technology to make sense of corporate skills and learning content. He’s the author of Timeboxing – The Power of Doing One Thing at a Time. Find Marc on LinkedIn or at www.marczaosanders.com.


