
Validity – Types, Examples and Guide


Validity

Definition:

Validity refers to the extent to which a concept, measure, or study accurately represents the meaning or reality it is intended to capture. It is a fundamental concept in research and assessment, concerned with the soundness and appropriateness of the conclusions, inferences, or interpretations drawn from the data or evidence collected.

Research Validity

Research validity refers to the degree to which a study accurately measures or reflects what it claims to measure. In other words, research validity concerns whether the conclusions drawn from a study are based on accurate, reliable and relevant data.

Validity is a concept used in logic and research methodology to assess the strength of an argument or the quality of a research study. It refers to the extent to which a conclusion or result is supported by evidence and reasoning.

How to Ensure Validity in Research

Ensuring validity in research involves several steps and considerations throughout the research process. Here are some key strategies to help maintain research validity:

Clearly Define Research Objectives and Questions

Start by clearly defining your research objectives and formulating specific research questions. This helps focus your study and ensures that you are addressing relevant and meaningful research topics.

Use appropriate research design

Select a research design that aligns with your research objectives and questions. Different types of studies, such as experimental, observational, qualitative, or quantitative, have specific strengths and limitations. Choose the design that best suits your research goals.

Use reliable and valid measurement instruments

If you are measuring variables or constructs, ensure that the measurement instruments you use are reliable and valid. This involves using established and well-tested tools or developing your own instruments through rigorous validation processes.

Ensure a representative sample

When selecting participants or subjects for your study, aim for a sample that is representative of the population you want to generalize to. Consider factors such as age, gender, socioeconomic status, and other relevant demographics to ensure your findings can be generalized appropriately.

Address potential confounding factors

Identify potential confounding variables or biases that could impact your results. Implement strategies such as randomization, matching, or statistical control to minimize the influence of confounding factors and increase internal validity.
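As a rough illustration (not tied to any specific study), simple random assignment can be scripted in a few lines. The Python sketch below assigns hypothetical participant IDs to treatment and control groups at random; the function name and group labels are placeholders.

```python
import random

def randomly_assign(participant_ids, seed=42):
    """Randomly assign participants to a treatment or control group.

    Randomization helps spread unmeasured confounders roughly evenly
    across conditions, which supports internal validity.
    """
    rng = random.Random(seed)   # fixed seed so the allocation can be reproduced
    ids = list(participant_ids)
    rng.shuffle(ids)
    midpoint = len(ids) // 2
    return {"treatment": ids[:midpoint], "control": ids[midpoint:]}

groups = randomly_assign(range(1, 21))
print(groups["treatment"])
print(groups["control"])
```

Because the assignment is random, differences between the groups on unmeasured variables should shrink as the sample grows.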

Minimize measurement and response biases

Be aware of measurement biases and response biases that can occur during data collection. Use standardized protocols, clear instructions, and trained data collectors to minimize these biases. Employ techniques like blinding or double-blinding in experimental studies to reduce bias.

Conduct appropriate statistical analyses

Ensure that the statistical analyses you employ are appropriate for your research design and data type. Select statistical tests that are relevant to your research questions and use robust analytical techniques to draw accurate conclusions from your data.
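For example, comparing two independent groups on a continuous outcome often calls for a t-test. The sketch below is a minimal illustration, assuming SciPy is available and using invented scores; Welch's version is used because it does not require equal group variances.

```python
import numpy as np
from scipy import stats

# Hypothetical outcome scores for two independent groups
treatment = np.array([72, 75, 78, 80, 74, 77, 79, 81])
control = np.array([68, 70, 73, 71, 69, 72, 74, 70])

# Welch's t-test (equal_var=False) is a safer default when the two
# groups may not have equal variances.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```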

Consider external validity

While it may not always be possible to achieve high external validity, be mindful of the generalizability of your findings. Clearly describe your sample and study context to help readers understand the scope and limitations of your research.

Peer review and replication

Submit your research for peer review by experts in your field. Peer review helps identify potential flaws, biases, or methodological issues that can impact validity. Additionally, encourage replication studies by other researchers to validate your findings and enhance the overall reliability of the research.

Transparent reporting

Clearly and transparently report your research methods, procedures, data collection, and analysis techniques. Provide sufficient details for others to evaluate the validity of your study and replicate your work if needed.

Types of Validity

There are several types of validity that researchers consider when designing and evaluating studies. Here are some common types of validity:

Internal Validity

Internal validity relates to the degree to which a study accurately identifies causal relationships between variables. It addresses whether the observed effects can be attributed to the manipulated independent variable rather than confounding factors. Threats to internal validity include selection bias, history effects, maturation of participants, and instrumentation issues.

External Validity

External validity concerns the generalizability of research findings to the broader population or real-world settings. It assesses the extent to which the results can be applied to other individuals, contexts, or timeframes. Factors that can limit external validity include sample characteristics, research settings, and the specific conditions under which the study was conducted.

Construct Validity

Construct validity examines whether a study adequately measures the intended theoretical constructs or concepts. It focuses on the alignment between the operational definitions used in the study and the underlying theoretical constructs. Construct validity can be threatened by issues such as poor measurement tools, inadequate operational definitions, or a lack of clarity in the conceptual framework.

Content Validity

Content validity refers to the degree to which a measurement instrument or test adequately covers the entire range of the construct being measured. It assesses whether the items or questions included in the measurement tool represent the full scope of the construct. Content validity is often evaluated through expert judgment, reviewing the relevance and representativeness of the items.

Criterion Validity

Criterion validity determines the extent to which a measure or test is related to an external criterion or standard. It assesses whether the results obtained from a measurement instrument align with other established measures or outcomes. Criterion validity can be divided into two subtypes: concurrent validity, which examines the relationship between the measure and the criterion at the same time, and predictive validity, which investigates the measure’s ability to predict future outcomes.
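As a small worked example (with invented numbers, not data from any real study), predictive validity is often summarised as the correlation between test scores and a criterion measured later.

```python
import numpy as np
from scipy import stats

# Hypothetical data: selection-test scores at hiring and supervisor
# ratings of job performance collected six months later.
test_scores = np.array([55, 62, 70, 48, 85, 77, 66, 90, 58, 73])
performance = np.array([3.1, 3.4, 3.9, 2.8, 4.6, 4.1, 3.6, 4.8, 3.0, 4.0])

# The validity coefficient is simply the correlation between the
# measure and the criterion.
r, p = stats.pearsonr(test_scores, performance)
print(f"Predictive validity coefficient: r = {r:.2f} (p = {p:.3f})")
```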

Face Validity

Face validity refers to the degree to which a measurement or test appears, on the surface, to measure what it intends to measure. It is a subjective assessment based on whether the items seem relevant and appropriate to the construct being measured. Face validity is often used as an initial evaluation before conducting more rigorous validity assessments.

Importance of Validity

Validity is crucial in research for several reasons:

  • Accurate Measurement: Validity ensures that the measurements or observations in a study accurately represent the intended constructs or variables. Without validity, researchers cannot be confident that their results truly reflect the phenomena they are studying. Validity allows researchers to draw accurate conclusions and make meaningful inferences based on their findings.
  • Credibility and Trustworthiness: Validity enhances the credibility and trustworthiness of research. When a study demonstrates high validity, it indicates that the researchers have taken appropriate measures to ensure the accuracy and integrity of their work. This strengthens the confidence of other researchers, peers, and the wider scientific community in the study’s results and conclusions.
  • Generalizability: Validity helps determine the extent to which research findings can be generalized beyond the specific sample and context of the study. By addressing external validity, researchers can assess whether their results can be applied to other populations, settings, or situations. This information is valuable for making informed decisions, implementing interventions, or developing policies based on research findings.
  • Sound Decision-Making: Validity supports informed decision-making in various fields, such as medicine, psychology, education, and social sciences. When validity is established, policymakers, practitioners, and professionals can rely on research findings to guide their actions and interventions. Validity ensures that decisions are based on accurate and trustworthy information, which can lead to better outcomes and more effective practices.
  • Avoiding Errors and Bias: Validity helps researchers identify and mitigate potential errors and biases in their studies. By addressing internal validity, researchers can minimize confounding factors and alternative explanations, ensuring that the observed effects are genuinely attributable to the manipulated variables. Validity assessments also highlight measurement errors or shortcomings, enabling researchers to improve their measurement tools and procedures.
  • Progress of Scientific Knowledge: Validity is essential for the advancement of scientific knowledge. Valid research contributes to the accumulation of reliable and valid evidence, which forms the foundation for building theories, developing models, and refining existing knowledge. Validity allows researchers to build upon previous findings, replicate studies, and establish a cumulative body of knowledge in various disciplines. Without validity, the scientific community would struggle to make meaningful progress and establish a solid understanding of the phenomena under investigation.
  • Ethical Considerations: Validity is closely linked to ethical considerations in research. Conducting valid research ensures that participants’ time, effort, and data are not wasted on flawed or invalid studies. It upholds the principle of respect for participants’ autonomy and promotes responsible research practices. Validity is also important when making claims or drawing conclusions that may have real-world implications, as misleading or invalid findings can have adverse effects on individuals, organizations, or society as a whole.

Examples of Validity

Here are some examples of validity in different contexts:

  • Logical validity (valid argument): All men are mortal. John is a man. Therefore, John is mortal. This argument is logically valid because the conclusion follows logically from the premises.
  • Logical validity (invalid argument): If it is raining, then the ground is wet. The ground is wet. Therefore, it is raining. This argument is not logically valid because there could be other reasons for the ground being wet, such as watering the plants.
  • Construct validity: In a study examining the relationship between caffeine consumption and alertness, the researchers use established measures of both variables, ensuring that they are accurately capturing the concepts they intend to measure.
  • Construct validity: A researcher develops a new questionnaire to measure anxiety levels. They administer the questionnaire to a group of participants and find that it correlates highly with other established anxiety measures, indicating good construct validity for the new questionnaire.
  • External validity (low): A study on the effects of a particular teaching method is conducted in a controlled laboratory setting. The findings may lack external validity because the conditions in the lab may not accurately reflect real-world classroom settings.
  • External validity (high): A research study on the effects of a new medication includes participants from diverse backgrounds and age groups, increasing the external validity of the findings to a broader population.
  • Internal validity (high): In an experiment, a researcher manipulates the independent variable (e.g., a new drug) and controls for other variables to ensure that any observed effects on the dependent variable (e.g., symptom reduction) are indeed due to the manipulation.
  • Internal validity (low): A researcher conducts a study examining the relationship between exercise and mood by administering questionnaires to participants. The study lacks internal validity because it does not control for other potential factors that could influence mood, such as diet or stress levels.
  • Face validity: A teacher develops a new test to assess students' knowledge of a particular subject. The items on the test appear relevant to the topic and align with what one would expect to find on such a test, so the test appears to measure what it intends to measure.
  • Face validity: A company develops a new customer satisfaction survey. The questions seem to address key aspects of the customer experience, so the survey appears appropriate for assessing customer satisfaction.
  • Content validity: A team of experts reviews a comprehensive curriculum for a high school biology course to ensure that it covers all the essential topics and concepts students need for a thorough understanding of biology. The curriculum is representative of the domain it intends to cover.
  • Content validity: A researcher develops a questionnaire to assess career satisfaction. The questions encompass various dimensions of job satisfaction, such as salary, work-life balance, and career growth, so the questionnaire adequately represents the different aspects of career satisfaction.
  • Criterion validity: A company wants to evaluate the effectiveness of a new employee selection test. They administer the test to a group of job applicants and later assess the job performance of those who were hired. A strong correlation between test scores and subsequent job performance indicates that the test is predictive of job success.
  • Criterion validity: A researcher wants to determine whether a new medical diagnostic tool accurately identifies a specific disease. They compare the results of the tool with the gold-standard diagnostic method and find a high level of agreement, indicating that the new tool is valid for diagnosing the disease.

Where to Write About Validity in A Thesis

In a thesis, discussions related to validity are typically included in the methodology and results sections. Here are some specific places where you can address validity within your thesis:

Research Design and Methodology

In the methodology section, provide a clear and detailed description of the measures, instruments, or data collection methods used in your study. Discuss the steps taken to establish or assess the validity of these measures. Explain the rationale behind the selection of specific validity types relevant to your study, such as content validity, criterion validity, or construct validity. Discuss any modifications or adaptations made to existing measures and their potential impact on validity.

Measurement Procedures

In the methodology section, elaborate on the procedures implemented to ensure the validity of measurements. Describe how potential biases or confounding factors were addressed, controlled, or accounted for to enhance internal validity. Provide details on how you ensured that the measurement process accurately captures the intended constructs or variables of interest.

Data Collection

In the methodology section, discuss the steps taken to collect data and ensure data validity. Explain any measures implemented to minimize errors or biases during data collection, such as training of data collectors, standardized protocols, or quality control procedures. Address any potential limitations or threats to validity related to the data collection process.

Data Analysis and Results

In the results section, present the analysis and findings related to validity. Report any statistical tests, correlations, or other measures used to assess validity. Provide interpretations and explanations of the results obtained. Discuss the implications of the validity findings for the overall reliability and credibility of your study.

Limitations and Future Directions

In the discussion or conclusion section, reflect on the limitations of your study, including limitations related to validity. Acknowledge any potential threats or weaknesses to validity that you encountered during your research. Discuss how these limitations may have influenced the interpretation of your findings and suggest avenues for future research that could address these validity concerns.

Applications of Validity

Validity is applicable in various areas and contexts where research and measurement play a role. Here are some common applications of validity:

Psychological and Behavioral Research

Validity is crucial in psychology and behavioral research to ensure that measurement instruments accurately capture constructs such as personality traits, intelligence, attitudes, emotions, or psychological disorders. Validity assessments help researchers determine if their measures are truly measuring the intended psychological constructs and if the results can be generalized to broader populations or real-world settings.

Educational Assessment

Validity is essential in educational assessment to determine if tests, exams, or assessments accurately measure students’ knowledge, skills, or abilities. It ensures that the assessment aligns with the educational objectives and provides reliable information about student performance. Validity assessments help identify if the assessment is valid for all students, regardless of their demographic characteristics, language proficiency, or cultural background.

Program Evaluation

Validity plays a crucial role in program evaluation, where researchers assess the effectiveness and impact of interventions, policies, or programs. By establishing validity, evaluators can determine if the observed outcomes are genuinely attributable to the program being evaluated rather than extraneous factors. Validity assessments also help ensure that the evaluation findings are applicable to different populations, contexts, or timeframes.

Medical and Health Research

Validity is essential in medical and health research to ensure the accuracy and reliability of diagnostic tools, measurement instruments, and clinical assessments. Validity assessments help determine if a measurement accurately identifies the presence or absence of a medical condition, measures the effectiveness of a treatment, or predicts patient outcomes. Validity is crucial for establishing evidence-based medicine and informing medical decision-making.

Social Science Research

Validity is relevant in various social science disciplines, including sociology, anthropology, economics, and political science. Researchers use validity to ensure that their measures and methods accurately capture social phenomena, such as social attitudes, behaviors, social structures, or economic indicators. Validity assessments support the reliability and credibility of social science research findings.

Market Research and Surveys

Validity is important in market research and survey studies to ensure that the survey questions effectively measure consumer preferences, buying behaviors, or attitudes towards products or services. Validity assessments help researchers determine if the survey instrument is accurately capturing the desired information and if the results can be generalized to the target population.

Limitations of Validity

Here are some limitations of validity:

  • Construct Validity: Limitations of construct validity include the potential for measurement error, inadequate operational definitions of constructs, or the failure to capture all aspects of a complex construct.
  • Internal Validity: Limitations of internal validity may arise from confounding variables, selection bias, or the presence of extraneous factors that could influence the study outcomes, making it difficult to attribute causality accurately.
  • External Validity: Limitations of external validity can occur when the study sample does not represent the broader population, when the research setting differs significantly from real-world conditions, or when the study lacks ecological validity, i.e., the findings do not reflect real-world complexities.
  • Measurement Validity: Limitations of measurement validity can arise from measurement error, inadequately designed or flawed measurement scales, or limitations inherent in self-report measures, such as social desirability bias or recall bias.
  • Statistical Conclusion Validity: Limitations in statistical conclusion validity can occur due to sampling errors, inadequate sample sizes, or improper statistical analysis techniques, leading to incorrect conclusions or generalizations.
  • Temporal Validity: Limitations of temporal validity arise when the study results become outdated due to changes in the studied phenomena, interventions, or contextual factors.
  • Researcher Bias: Researcher bias can affect the validity of a study. Biases can emerge through the researcher’s subjective interpretation, influence of personal beliefs, or preconceived notions, leading to unintentional distortion of findings or failure to consider alternative explanations.
  • Ethical Validity: Limitations can arise if the study design or methods involve ethical concerns, such as the use of deceptive practices, inadequate informed consent, or potential harm to participants.


About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Reliability vs Validity in Research | Differences, Types & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method, technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.

It's important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research.


Reliability and validity are closely related, but they mean different things. A measurement can be reliable without being valid. However, if a measurement is valid, it is usually also reliable.

What is reliability?

Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.

What is validity?

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world.

High reliability is one indicator that a measurement is valid. If a method is not reliable, it probably isn’t valid.

However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may not accurately reflect the real situation.

Validity is harder to assess than reliability, but it is even more important. To obtain useful results, the methods you use to collect your data must be valid: the research must be measuring what it claims to measure. This ensures that your discussion of the data and the conclusions you draw are also valid.


Reliability can be estimated by comparing different versions of the same measurement. Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory. Methods of estimating reliability and validity are usually split up into different types.

Types of reliability

Different types of reliability can be estimated through various statistical methods.

Types of validity

The validity of a measurement can be estimated based on three main types of evidence. Each type can be evaluated through expert judgement or statistical methods.

To assess the validity of a cause-and-effect relationship, you also need to consider internal validity (the design of the experiment) and external validity (the generalisability of the results).

The reliability and validity of your results depend on creating a strong research design, choosing appropriate methods and samples, and conducting the research carefully and consistently.

Ensuring validity

If you use scores or ratings to measure variations in something (such as psychological traits, levels of ability, or physical properties), it’s important that your results reflect the real variations as accurately as possible. Validity should be considered in the very earliest stages of your research, when you decide how you will collect your data .

  • Choose appropriate methods of measurement

Ensure that your method and measurement technique are of high quality and targeted to measure exactly what you want to know. They should be thoroughly researched and based on existing knowledge.

For example, to collect data on a personality trait, you could use a standardised questionnaire that is considered reliable and valid. If you develop your own questionnaire, it should be based on established theory or the findings of previous studies, and the questions should be carefully and precisely worded.

  • Use appropriate sampling methods to select your subjects

To produce valid generalisable results, clearly define the population you are researching (e.g., people from a specific age range, geographical location, or profession). Ensure that you have enough participants and that they are representative of the population.

Ensuring reliability

Reliability should be considered throughout the data collection process. When you use a tool or technique to collect data, it’s important that the results are precise, stable, and reproducible.

  • Apply your methods consistently

Plan your method carefully to make sure you carry out the same steps in the same way for each measurement. This is especially important if multiple researchers are involved.

For example, if you are conducting interviews or observations, clearly define how specific behaviours or responses will be counted, and make sure questions are phrased the same way each time.

  • Standardise the conditions of your research

When you collect your data, keep the circumstances as consistent as possible to reduce the influence of external factors that might create variation in the results.

For example, in an experimental setup, make sure all participants are given the same information and tested under the same conditions.

It’s appropriate to discuss reliability and validity in various sections of your thesis or dissertation or research paper. Showing that you have taken them into account in planning your research and interpreting the results makes your work more credible and trustworthy.


Middleton, F. (2022, October 10). Reliability vs Validity in Research | Differences, Types & Examples. Scribbr. Retrieved 22 April 2024, from https://www.scribbr.co.uk/research-methods/reliability-or-validity/


Research-Methodology

Research validity in surveys relates to the extent to which the survey measures the elements that need to be measured. In simple terms, validity refers to how well an instrument measures what it is intended to measure.

Reliability alone is not enough; measures need to be both reliable and valid. For example, if a weighing scale is off by 4 kg (it deducts 4 kg from the actual weight), it can be described as reliable, because the scale displays the same weight every time we measure a specific item. However, the scale is not valid because it does not display the actual weight of the item.

Research validity can be divided into two groups: internal and external. It can be specified that "internal validity refers to how the research findings match reality, while external validity refers to the extent to which the research findings can be replicated to other environments" (Pelissier, 2008, p.12).

Moreover, validity can also be divided into five types:

1. Face Validity is the most basic type of validity and is associated with the highest level of subjectivity, because it is not based on any scientific approach. In other words, a test may be specified as valid by a researcher simply because it seems valid, without an in-depth scientific justification.

Example: the questionnaire design for a study that analyses issues of employee performance can be assessed as valid because each individual question appears to address specific and relevant aspects of employee performance.

2. Construct Validity relates to the assessment of the suitability of a measurement tool for measuring the phenomenon being studied. Application of construct validity can be effectively facilitated with the involvement of a panel of experts closely familiar with the measure and the phenomenon.

Example: with the application of construct validity, the level of leadership competency in any given organisation can be assessed by devising a questionnaire to be answered by operational-level employees, asking questions about the levels of their motivation to carry out their duties on a daily basis.

3. Criterion-Related Validity involves comparing test results with an outcome. This type of validity correlates the results of an assessment with another criterion of assessment.

Example: the nature of customer perception of the brand image of a specific company can be assessed by organising a focus group. The same issue can also be assessed through a questionnaire answered by current and potential customers of the brand. The higher the level of correlation between the focus group and questionnaire findings, the higher the level of criterion-related validity.

4. Formative Validity refers to the assessment of the effectiveness of the measure in terms of providing information that can be used to improve specific aspects of the phenomenon.

Example: when developing initiatives to increase the effectiveness of organisational culture, if the measure is able to identify specific weaknesses of organisational culture, such as employee-manager communication barriers, then the formative validity of the measure can be assessed as adequate.

5. Sampling Validity (similar to content validity) ensures that the measure covers a broad range of areas within the concept under study. No measure is able to cover all items and elements within the phenomenon; therefore, important items and elements are selected using a specific sampling pattern, depending on the aims and objectives of the study.

Example: when assessing the leadership style exercised in a specific organisation, assessment of decision-making style alone would not suffice; other issues related to leadership style, such as organisational culture, the personality of leaders, and the nature of the industry, need to be taken into account as well.

John Dudovskiy, Research-Methodology

Chapter 5: Psychological Measurement

Reliability and Validity of Measurement

Learning Objectives

  • Define reliability, including the different types and how they are assessed.
  • Define validity, including the different types and how they are assessed.
  • Describe the kinds of evidence that would be relevant to assessing the reliability and validity of a particular measure.

Again, measurement involves assigning scores to individuals so that they represent some characteristic of the individuals. But how do researchers know that the scores actually represent the characteristic, especially when it is a construct like intelligence, self-esteem, depression, or working memory capacity? The answer is that they conduct research using the measure to confirm that the scores make sense based on their understanding of the construct being measured. This is an extremely important point. Psychologists do not simply  assume  that their measures work. Instead, they collect data to demonstrate  that they work. If their research does not demonstrate that a measure works, they stop using it.

As an informal example, imagine that you have been dieting for a month. Your clothes seem to be fitting more loosely, and several friends have asked if you have lost weight. If at this point your bathroom scale indicated that you had lost 10 pounds, this would make sense and you would continue to use the scale. But if it indicated that you had gained 10 pounds, you would rightly conclude that it was broken and either fix it or get rid of it. In evaluating a measurement method, psychologists consider two general dimensions: reliability and validity.

Reliability

Reliability  refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

Test-Retest Reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time.  Test-retest reliability  is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing Pearson's r. Figure 5.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. Pearson's r for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

[Figure 5.2: Scatterplot of Rosenberg Self-Esteem Scale scores at time 1 (x-axis) and time 2 (y-axis), showing fairly consistent scores.]
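In practice, the test-retest correlation can be computed directly from the two sets of scores. The sketch below uses invented scale totals (not the data behind Figure 5.2) and assumes SciPy is available:

```python
import numpy as np
from scipy import stats

# Hypothetical totals on a self-esteem scale, administered twice, a week apart
time_1 = np.array([22, 25, 18, 30, 27, 21, 24, 29, 19, 26])
time_2 = np.array([23, 24, 19, 29, 28, 20, 25, 30, 18, 27])

r, _ = stats.pearsonr(time_1, time_2)
print(f"Test-retest correlation: r = {r:+.2f}")  # +.80 or greater is usually taken as good
```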

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.

Internal Consistency

A second kind of reliability is internal consistency, which is the consistency of people's responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people's scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people's responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioural and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants' bets were consistently high or low across trials.

Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a  split-half correlation . This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined. For example, Figure 5.3 shows the split-half correlation between several university students’ scores on the even-numbered items and their scores on the odd-numbered items of the Rosenberg Self-Esteem Scale. Pearson’s  r  for these data is +.88. A split-half correlation of +.80 or greater is generally considered good internal consistency.

[Figure 5.3: Scatterplot of scores on the even-numbered items (x-axis) and odd-numbered items (y-axis) of the Rosenberg Self-Esteem Scale, showing fairly consistent scores.]
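The split-half calculation can also be scripted. The following sketch (hypothetical responses, NumPy and SciPy assumed) totals the odd- and even-numbered items separately and correlates the two half-scores:

```python
import numpy as np
from scipy import stats

# Hypothetical responses: rows are respondents, columns are the 10 items
# of a scale, each scored 1-4.
responses = np.array([
    [3, 4, 3, 4, 3, 4, 3, 3, 4, 3],
    [2, 2, 3, 2, 2, 3, 2, 2, 2, 3],
    [4, 4, 4, 3, 4, 4, 4, 4, 3, 4],
    [1, 2, 1, 2, 2, 1, 2, 1, 2, 1],
    [3, 3, 2, 3, 3, 3, 2, 3, 3, 3],
    [2, 3, 2, 2, 3, 2, 3, 2, 2, 2],
])

odd_half = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, ... (columns 0, 2, 4, ...)
even_half = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

r, _ = stats.pearsonr(odd_half, even_half)
print(f"Split-half correlation: r = {r:+.2f}")
```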

Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called  Cronbach’s α  (the Greek letter alpha). Conceptually, α is the mean of all possible split-half correlations for a set of items. For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic. Again, a value of +.80 or greater is generally taken to indicate good internal consistency.
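In practice, α is usually computed from item and total-score variances rather than by averaging every possible split-half correlation. A minimal sketch, using hypothetical responses, might look like this:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix.

    Standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: rows are respondents, columns are five items scored 1-4
responses = np.array([
    [3, 4, 3, 4, 3],
    [2, 2, 3, 2, 2],
    [4, 4, 4, 3, 4],
    [1, 2, 1, 2, 2],
    [3, 3, 2, 3, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```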

Interrater Reliability

Many behavioural measures involve significant judgment on the part of an observer or a rater.  Inter-rater reliability  is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does in fact have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other. Inter-rater reliability would also have been measured in Bandura’s Bobo doll study. In this case, the observers’ ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated. Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
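For categorical judgments, Cohen's κ can be computed with scikit-learn's cohen_kappa_score (assuming that library is available); the observer codes below are invented purely for illustration:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes from two observers rating the same 12 video clips
# as showing an aggressive act (1) or not (0).
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

kappa = cohen_kappa_score(rater_a, rater_b)  # agreement corrected for chance
print(f"Cohen's kappa: {kappa:.2f}")
```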

Validity

Validity is the extent to which the scores from a measure represent the variable they are intended to. But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. As an absurd example, imagine someone who believes that people's index finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people's index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. The fact that one person's index finger is a centimetre longer than another's would indicate nothing about which one had higher self-esteem.

Discussions of validity usually divide it into several distinct “types.” But a good way to interpret these types is that they are other kinds of evidence—in addition to reliability—that should be taken into account when judging the validity of a measure. Here we consider three basic kinds: face validity, content validity, and criterion validity.

Face Validity

Face validity  is the extent to which a measurement method appears “on its face” to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity. Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people's intuitions about human behaviour, which are frequently wrong. It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of 567 different statements applies to them—where many of the statements do not have any obvious relationship to the construct that they measure. For example, the items "I enjoy detective or mystery stories" and "The sight of blood doesn't frighten me or make me sick" both measure the suppression of aggression. In this case, it is not the participants' literal answers to these questions that are of interest, but rather whether the pattern of the participants' responses to a series of questions matches those of individuals who tend to suppress their aggression.

Content Validity

Content validity  is the extent to which a measure “covers” the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that he or she thinks positive thoughts about exercising, feels good about exercising, and actually exercises. So to have good content validity, a measure of people’s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

Criterion Validity

Criterion validity  is the extent to which people’s scores on a measure are correlated with other variables (known as  criteria ) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people’s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people’s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People's scores on this measure should be correlated with their participation in "extreme" activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have "predicted" a future outcome).

Criteria can also include other measures of the same construct. For example, one would expect new measures of test anxiety or physical risk taking to be positively correlated with existing measures of the same constructs. This is known as convergent validity .

Assessing convergent validity requires collecting data using the measure. Researchers John Cacioppo and Richard Petty did this when they created their self-report Need for Cognition Scale to measure how much people value and engage in thinking (Cacioppo & Petty, 1982) [1] . In a series of studies, they showed that people’s scores were positively correlated with their scores on a standardized academic achievement test, and that their scores were negatively correlated with their scores on a measure of dogmatism (which represents a tendency toward obedience). In the years since it was created, the Need for Cognition Scale has been used in literally hundreds of studies and has been shown to be correlated with a wide variety of other variables, including the effectiveness of an advertisement, interest in politics, and juror decisions (Petty, Briñol, Loersch, & McCaslin, 2009) [2] .

Discriminant Validity

Discriminant validity , on the other hand, is the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people’s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.

When they created the Need for Cognition Scale, Cacioppo and Petty also provided evidence of discriminant validity by showing that people’s scores were not correlated with certain other variables. For example, they found only a weak correlation between people’s need for cognition and a measure of their cognitive style—the extent to which they tend to think analytically by breaking ideas into smaller parts or holistically in terms of “the big picture.” They also found no correlation between people’s need for cognition and measures of their test anxiety and their tendency to respond in socially desirable ways. All these low correlations provide evidence that the measure is reflecting a conceptually distinct construct.
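Evidence for convergent and discriminant validity often comes down to a pattern of correlations. The sketch below uses invented scores to contrast the two: the new measure should correlate strongly with an existing measure of the same construct and only weakly with a measure of a conceptually distinct one.

```python
import numpy as np

# Hypothetical scores for ten participants
new_self_esteem = np.array([30, 25, 28, 22, 35, 27, 31, 24, 29, 33])
existing_self_esteem = np.array([32, 24, 27, 23, 36, 26, 30, 25, 28, 34])  # same construct
mood_today = np.array([5, 6, 5, 4, 6, 5, 4, 6, 5, 4])                      # distinct construct

convergent_r = np.corrcoef(new_self_esteem, existing_self_esteem)[0, 1]
discriminant_r = np.corrcoef(new_self_esteem, mood_today)[0, 1]

print(f"Convergent validity (expected to be high): r = {convergent_r:+.2f}")
print(f"Discriminant validity (expected to be low): r = {discriminant_r:+.2f}")
```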

Key Takeaways

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
Exercises

  • Practice: Ask several friends to complete the Rosenberg Self-Esteem Scale. Then assess its internal consistency by making a scatterplot to show the split-half correlation (even- vs. odd-numbered items). Compute Pearson's r too if you know how.
  • Discussion: Think back to the last college exam you took and think of the exam as a psychological measure. What construct do you think it was intended to measure? Comment on its face and content validity. What data could you collect to assess its reliability and criterion validity?

References

  • Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.
  • Petty, R. E., Briñol, P., Loersch, C., & McCaslin, M. J. (2009). The need for cognition. In M. R. Leary & R. H. Hoyle (Eds.), Handbook of individual differences in social behaviour (pp. 318–329). New York, NY: Guilford Press.

Glossary

  • Reliability: The consistency of a measure.
  • Test-retest reliability: The consistency of a measure over time.
  • Test-retest correlation: The consistency of a measure on the same group of people at different times.
  • Internal consistency: The consistency of people's responses across the items on a multiple-item measure.
  • Split-half correlation: A method of assessing internal consistency by splitting the items into two sets and examining the relationship between them.
  • Cronbach's α: A statistic in which α is the mean of all possible split-half correlations for a set of items.
  • Inter-rater reliability: The extent to which different observers are consistent in their judgments.
  • Validity: The extent to which the scores from a measure represent the variable they are intended to.
  • Face validity: The extent to which a measurement method appears to measure the construct of interest.
  • Content validity: The extent to which a measure "covers" the construct of interest.
  • Criterion validity: The extent to which people's scores on a measure are correlated with other variables that one would expect them to be correlated with.
  • Criteria: In reference to criterion validity, variables that one would expect to be correlated with the measure.
  • Concurrent validity: When the criterion is measured at the same time as the construct.
  • Predictive validity: When the criterion is measured at some point in the future (after the construct has been measured).
  • Convergent validity: When new measures positively correlate with existing measures of the same constructs.
  • Discriminant validity: The extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Grad Coach

Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp. In the course, we unpack the basics of methodology using straightforward language and loads of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements .

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure .

For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall job satisfaction. If this set of scales focused purely on only one dimension of job satisfaction, say pay satisfaction, this would not be a valid measurement, as it only captures one aspect of the multidimensional construct. In other words, pay satisfaction alone is only one contributing factor toward overall job satisfaction, and therefore it’s not a valid way to measure someone’s job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it . Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide will be pretty useless . Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure .  In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey . Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.

What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability. In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon, under the same conditions.

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements. And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight 🙂

It’s worth mentioning that reliability also extends to the person using the measurement instrument. For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha, which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct. In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept.
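
To make this a little more concrete, here’s a minimal sketch in Python (using a small, made-up matrix of Likert responses, so the numbers are purely illustrative) of how Cronbach’s alpha is commonly computed from the item variances and the variance of the summed scale:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item (column)
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of each person's summed score
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses from 5 people to 4 Likert items (1-5 scale)
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(scores), 2))
```

Values closer to 1 suggest the items hang together well; many texts treat roughly 0.7 or above as acceptable internal consistency, although the appropriate threshold depends on the context.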

Reliability reflects whether an instrument produces consistent results when applied to the same phenomenon, under the same conditions.

Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions. So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.

Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas on August 16th, 2021, revised on October 26, 2023

A researcher must test the collected data before drawing any conclusions. Every research design needs to be concerned with reliability and validity to measure the quality of the research.

What is Reliability?

Reliability refers to the consistency of the measurement. Reliability shows how trustworthy the score of a test is. If the collected data shows the same results after being tested using various methods and sample groups, the information is reliable. Note, however, that reliability alone does not guarantee validity: a reliable method produces consistent results, but those results are not necessarily accurate.

Example: If you weigh yourself on a weighing scale several times throughout the day and get the same result each time, these are considered reliable results obtained through repeated measures.

Example: If a teacher gives the same maths test to her students and repeats it the next week with the same questions, and the students obtain the same scores, the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. Validity shows how suitable a specific test is for a particular situation. If the results are accurate according to the researcher’s situation, explanation, and prediction, then the research is valid.

If the method of measuring is accurate, it will produce accurate results. However, a reliable method is not automatically valid: reliability is necessary for validity, but it is not sufficient. In contrast, if a method is not reliable, it cannot be valid.

Example: Your weighing scale shows a different result each time you weigh yourself within a day, even after handling it carefully and weighing yourself before and after meals. Your weighing machine might be malfunctioning. This means your method has low reliability, and hence you are getting inconsistent results that cannot be valid.

Example: Suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same responses from the various participants, the reliability of the questionnaire is high. That consistency supports its validity, but validity still has to be checked separately: the questions must actually measure the product’s quality.

Most of the time, validity is difficult to assess even when the process of measurement is reliable, because it isn’t easy to determine whether the results reflect the real situation.

Example: If the weighing scale shows the same result, let’s say 70 kg, each time, even though your actual weight is 55 kg, then the weighing scale is malfunctioning. Because it shows consistent results, it is reliable, but because those results are inaccurate, it is not valid. In other words, the method has high reliability but low validity.

Internal Vs. External Validity

One of the key features of randomised designs is that they tend to have high internal validity; their external validity depends on how representative the sample and study setting are.

Internal validity is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be due to the experiment conducted, and no external factor should influence the variables.

Example: extraneous variables such as age, level, height, and grade.

External validity  is the ability to identify and generalise your study outcomes to the population at large. The relationship between the study’s situation and the situations outside the study is considered external validity.

Threats to Internal Validity

Threats to External Validity

How to Assess Reliability and Validity

Reliability can be measured by comparing the consistency of the procedure and its results. There are various methods to measure validity and reliability. Reliability can be measured through various statistical methods depending on the type of reliability, as explained below:

Types of Reliability

Types of Validity

As we discussed above, the reliability of a measurement alone cannot determine its validity. Validity is difficult to measure even if the method is reliable. The following types of tests are conducted to measure validity.

How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants
  • Make the participants familiar with the criteria of assessment.
  • Train the participants appropriately.
  • Analyse the research items regularly to avoid poor performance.

How to Increase Validity?

Ensuring validity is also not an easy job. Some ways to improve validity are given below:

  • Minimise reactivity as a first concern.
  • Reduce the Hawthorne effect.
  • Keep respondents motivated.
  • Keep the interval between the pre-test and post-test from being too lengthy.
  • Minimise dropout rates.
  • Ensure inter-rater reliability.
  • Match the control and experimental groups with each other.

How to Implement Reliability and Validity in your Thesis?

According to experts, it is helpful to implement the concepts of reliability and validity, especially in a thesis or dissertation, where these concepts are widely adopted. The method for implementation is given below:

Frequently Asked Questions

What is reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.
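
As an illustration, here is a minimal Python sketch (with made-up scores for six participants) of the usual way test-retest reliability is quantified, namely as the correlation between the two administrations; an intraclass correlation coefficient is another common choice:

```python
import numpy as np

# Hypothetical scores for the same six participants at time 1 and time 2
time1 = np.array([24, 31, 28, 35, 22, 30])
time2 = np.array([25, 30, 27, 36, 21, 31])

# Test-retest reliability reported as the Pearson correlation between administrations
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))  # values close to 1 indicate stable, consistent scores over time
```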

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.

J Caring Sci. 2015 Jun; 4(2)

Design and Implementation Content Validity Study: Development of an instrument for measuring Patient-Centered Communication

Vahid Zamanzadeh

1 Department of Medical-Surgical Nursing, Faculty of Nursing and Midwifery, Tabriz University of Medical Sciences, Tabriz, Iran

Akram Ghahramanian

Maryam Rassouli

2 Department of Pediatrics Nursing, Faculty of Nursing and Midwifery, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Abbas Abbaszadeh

Hamid Alavi-Majd

3 Department of Biostatistics, Faculty of Para Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Ali-Reza Nikanfar

4 Hematology and Oncology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran

Introduction: The importance of content validity in instrument psychometrics and its relevance to reliability have made it an essential step in instrument development. This article attempts to give an overview of the content validity process and to explain the complexity of this process by introducing an example.

Methods: We carried out a methodological study to examine the content validity of a patient-centered communication instrument through a two-step process (development and judgment). At the first step, domain determination, sampling (item generation) and instrument formation were performed; at the second step, the content validity ratio, content validity index and modified kappa statistic were calculated. Suggestions of the expert panel and item impact scores were used to examine the instrument's face validity.

Results: From a set of 188 items, the content validity process identified seven dimensions including trust building (eight items), informational support (seven items), emotional support (five items), problem solving (seven items), patient activation (10 items), intimacy/friendship (six items) and spirituality strengthening (14 items). The content validity study revealed that this instrument enjoys an appropriate level of content validity. The overall content validity index of the instrument using the universal agreement approach was low; however, it can be advocated with respect to the high number of content experts, which makes consensus difficult, and the high value of the S-CVI with the average approach, which was equal to 0.93.

Conclusion: This article illustrates acceptable quantitative indices for the content validity of a new instrument and outlines them during the design and psychometric evaluation of a patient-centered communication measuring instrument.

Introduction

In most studies, researchers study complex constructs for which valid and reliable instruments are needed. 1 Validity, which is defined as the ability of an instrument to measure the properties of the construct under study, 2 is a vital factor in selecting or applying an instrument. It is commonly assessed in three forms: content, construct, and criterion-related validity. 3 Since content validity is a prerequisite for the other forms of validity, it should receive the highest priority during instrument development. Validity is not the property of an instrument, but the property of the scores achieved by an instrument used for a specific purpose with a specific group of respondents. Therefore, validity evidence should be obtained for each study in which an instrument is used. 4

Content validity, also known as definition validity and logical validity, 5 can be defined as the ability of the selected items to reflect the variables of the construct in the measure. This type of validity addresses the degree to which the items of an instrument sufficiently represent the content domain. It also answers the question of the extent to which the sample of items in an instrument is a comprehensive sample of the content. 1 , 6 - 8 This type of validity provides preliminary evidence of the construct validity of an instrument. 9 In addition, it can provide information on the representativeness and clarity of items and help improve an instrument through recommendations from an expert panel. 6 , 10 If an instrument lacks content validity, it is impossible to establish reliability for it. 11 On the other hand, although more resources must be spent on a content validity study initially, it decreases the need for resources in future reviews of an instrument during the psychometric process. 1

Despite the fact that, in instrument development, content validity is a critical step 12 and a trigger mechanism linking abstract concepts to visible and measurable indices, 7 it is often studied superficially and transiently. This problem might be due to the fact that the methods used to assess content validity in the medical research literature are not discussed in depth 12 and sufficient details have rarely been provided on the content validity process in a single resource. 13 It is possible that students do not realize the complexities in this critical process. 12 Meanwhile, a number of experts have questioned the historical legitimacy of content validity as a real type of validity. 14 - 16 These challenges to the value and merit of content validity have arisen from a lack of distinction between content validity and face validity, un-standardized mechanisms to determine content validity, and its previously un-quantified nature. 3 This article aims to discuss the content validity process and to illustrate how to quantify it using an example instrument, designed to measure patient-centered communication between patients with cancer and nurses as key members of the health care team in oncology wards of Iran.

Nurse-patient communication

To improve patients’ outcomes, nurses cannot perform health services such as physical care, emotional support and exchanging information with their patients without establishing a relationship with them. 17 During recent decades, patient-centered communication has been defined as communication in which patients’ viewpoints are actively sought by the treatment team, 18 and as a relationship with patients, based on trust, respect, and reciprocity, with mutually negotiated goals and expectations, that can be an important support and buffer for cancer patients experiencing distress. 19

Communication serves to build and maintain this relationship, to transmit information, to provide support, and to make treatment decisions. Patient-centered communication between providers and cancer patients can significantly affect clinical outcomes 20 and, as an important element, improves patient satisfaction, treatment compliance, and health outcomes; 21 , 22 however, recent evidence demonstrates that communication in cancer care may often be suboptimal, particularly with regard to the emotional experience of the patient. 23

Despite its public acceptance, there is little consensus on the meaning and operationalization of the concept of patient-centered communication, 19 , 24 and a serious limitation is caused by the lack of standard instruments to review and promote patient-centeredness in patient-healthcare communication. Part of this issue is related to the broad nature of the patient-centeredness construct, which has led to the creation of different and often dissimilar instruments shaped by researchers’ conceptualizations and psychometrics. 25 Few instruments provide a comprehensive definition of this concept in cancer care within a single tool. 26 Reviewing the literature in Iran shows that this concept has never been studied in a research study: although conducting research on cancer is one of the research priorities, 27 no quantitative or qualitative study has been carried out and no instrument has been developed yet.

It is obvious that evaluating the abilities of nurses in oncology wards to establish patient-centered communication, and its consequences, requires the application of a reliable instrument based on the context and culture of the target group. 26 When a new instrument is designed, measuring and reporting its content validity are of fundamental importance. 8 Therefore, this study was conducted to design and examine the content validity of an instrument measuring patient-centered communication in oncology wards in northwest Iran.

Materials and methods

This methodological study is part of a larger study carried out through an exploratory mixed-methods design (qualitative-quantitative) to design and psychometrically evaluate an instrument measuring patient-centered communication in oncology wards in northwest Iran. Data in the qualitative phase of the study, using a qualitative content analysis approach, were collected through semi-structured in-depth interviews with 10 patients with cancer, three family members and seven oncology nurses in the Ali-Nasab and Shahid Ayatollah Qazi Tabatabai Hospitals of Tabriz. In the quantitative phase of the study, during a two-step process (design – judgment), the qualitative and quantitative viewpoints of 15 experts were collected. 3

Ethical considerations such as approval by the ethics committee of Tabriz University of Medical Sciences, permissions of the administrators of Ali-Nasab and Shahid Ayatollah Qazi Tabatabai Hospitals, anonymity, informed consent, the right to withdraw from the study, and recording permission were respected.

Stage 1: Instrument Design

Instrument design is performed through a three-step process, including determining the content domain, sampling from the content (item generation) and instrument construction. 11 , 14 The first step is determining the content domain of the construct that the instrument is made to measure. The content domain is the content area related to the variables being measured. 28 It can be identified through a literature review on the topic being measured, interviews with respondents, and focus groups. Through a precise definition of the attributes and characteristics of the desired construct, a clear image of its boundaries, dimensions, and components is obtained. Qualitative research methods can also be applied to determine the variables and concepts of the pertinent construct. 29 The qualitative data collected in interviews with respondents familiar with the concept help enrich and develop what has been identified about the concept, and are considered an invaluable resource for generating instrument items. 30 To determine the content domain in emotional and cognitive instruments, we can use literature review and a table of specifications, respectively. 3 In practice, a table of specifications reviews the alignment of a set of items (placed in rows) with the concepts forming the construct under study (placed in columns) through collecting quantitative and qualitative evidence from experts and analyzing the data. 5 Ridenour and Newman also introduced the application of a mixed method (deductive-inductive) for conceptualization at the step of content domain determination and item generation. 31 However, generating items requires the preliminary task of determining the content domain of a construct. 32 In addition, a useful approach consists of returning to the research questions and ensuring that the instrument items are reflective of and relevant to the research questions. 33

Instrument construction is the third step in instrument design in which the items are refined and organized in a suitable format and sequence so that the finalized items are collected in a usable form. 3

Stage 2: Judgment

This step entails confirmation by a specific number of experts, indicating that the instrument items and the entire instrument have content validity. For this purpose, an expert panel is appointed. Determining the number of experts has always been partly arbitrary. At least five people are recommended in order to have sufficient control over chance agreement. The maximum number of judges has not been determined; however, it is unlikely that more than 10 people are used. In any case, as the number of experts increases, the probability of chance agreement decreases. After appointing an expert panel, we can collect and analyze their quantitative and qualitative viewpoints on the relevancy or representativeness, clarity and comprehensiveness of the items, to ensure that the construct is operationally defined by these items and that the instrument has content validity. 3 , 7 , 8

Quantification of Content Validity

The content validity of an instrument can be determined using the viewpoints of a panel of experts. This panel consists of content experts and lay experts. Lay experts are potential research subjects, and content experts are professionals who have research experience or work in the field. 34 Using subjects from the target group as experts ensures that the population for whom the instrument is being developed is represented. 1

In the qualitative content validity method, content experts’ and the target group’s recommendations are adopted regarding grammar, the use of appropriate and correct words, the correct and proper order of words in items, and appropriate scoring. 35 In the quantitative content validity method, confidence is maintained in selecting the most important and correct content in an instrument, which is quantified by the content validity ratio (CVR). In this way, the experts are requested to specify whether an item is necessary for operationalizing a construct in a set of items or not. To this end, they are asked to score each item from 1 to 3 on a three-degree range of “not necessary”, “useful but not essential” and “essential”. The content validity ratio varies between -1 and 1; a higher score indicates greater agreement among panel members on the necessity of an item in the instrument. The formula for the content validity ratio is CVR = (N_e − N/2) / (N/2), in which N_e is the number of panelists indicating “essential” and N is the total number of panelists. The critical value of the content validity ratio is determined from the Lawshe table. For example, in our study with 15 panelists, an item is accepted at an acceptable level of significance if its CVR is larger than 0.49. 36
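
As a small illustration of the arithmetic, here is a hedged Python sketch of the CVR calculation described above, using hypothetical numbers (12 of 15 panelists rating an item as "essential"):

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (N_e - N/2) / (N/2)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Hypothetical item rated "essential" by 12 of 15 panelists
cvr = content_validity_ratio(12, 15)
print(round(cvr, 2))  # 0.6, above the 0.49 cut-off for 15 panelists, so the item is retained
```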

In reports of instrument development, the most widely used approach for content validity is the content validity index. 3 , 34 , 37 Panel members are asked to rate the instrument items in terms of clarity and relevancy to the construct under study, as per the theoretical definitions of the construct itself and its dimensions, on a 4-point ordinal scale (1 [not relevant], 2 [somewhat relevant], 3 [quite relevant], 4 [highly relevant]). 34 A table like the one shown below (Table 1) was added to the cover letter to guide the experts on the scoring method.

To obtain the content validity index for the relevancy and clarity of each item (I-CVI), the number of experts judging the item as relevant or clear (a rating of 3 or 4) was divided by the total number of content experts. For relevancy, the content validity index can be calculated both at the item level (I-CVI) and at the scale level (S-CVI). At the item level, the I-CVI is computed as the number of experts giving a rating of 3 or 4 to the relevancy of an item, divided by the total number of experts.

The I-CVI expresses the proportion of agreement on the relevancy of each item, which is between zero and one, 3 , 38 and the S-CVI is defined as “the proportion of total items judged content valid” 3 or “the proportion of items on an instrument that achieved a rating of 3 or 4 by the content experts”. 28

Instrument developers rarely report which method they have used to compute the scale-level index (S-CVI). 6 There are two methods for calculating it: one method requires universal agreement among experts (S-CVI/UA), while a less conservative method averages the item-level CVIs (S-CVI/Ave). To calculate them, the scale is first dichotomized by combining values 3 and 4 together and values 1 and 2 together, forming two categories of responses, “relevant” and “not relevant”, for each item. 3 , 34 Then, in the universal agreement approach, the number of items considered relevant by all the judges (i.e., the number of items with an I-CVI equal to 1) is divided by the total number of items. In the average approach, the sum of the I-CVIs is divided by the total number of items. 10 Table 2 provides data for a better understanding of the calculation of the I-CVI and S-CVI by both methods; the data were extracted from the judgments of our panel about the relevancy of the items in the trust-building dimension, a variable (subscale) in the construct of patient-centered communication. As the values obtained from the two methods may differ, instrument makers should state which method they used. 6 Davis proposes that researchers consider 80 percent agreement or higher among judges for new instruments. 34 Judgment on each item is made as follows: if the I-CVI is higher than 79 percent, the item is appropriate; if it is between 70 and 79 percent, it needs revision; if it is less than 70 percent, it is eliminated. 39
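
The following Python sketch, using hypothetical relevancy ratings from five experts on three items, illustrates the I-CVI calculation and the difference between the S-CVI/UA and S-CVI/Ave approaches described above:

```python
def i_cvi(ratings, n_experts):
    """Proportion of experts rating the item 3 or 4 on relevancy."""
    return sum(1 for r in ratings if r >= 3) / n_experts

def s_cvi(item_cvis, method="ave"):
    if method == "ua":
        # Universal agreement: share of items that every expert rated relevant (I-CVI = 1)
        return sum(1 for cvi in item_cvis if cvi == 1.0) / len(item_cvis)
    # Average approach: mean of the item-level CVIs
    return sum(item_cvis) / len(item_cvis)

# Hypothetical 4-point relevancy ratings from 5 experts for 3 items
ratings = [
    [4, 4, 3, 4, 4],  # item 1 -> I-CVI = 1.0
    [3, 4, 4, 4, 4],  # item 2 -> I-CVI = 1.0
    [2, 3, 4, 3, 2],  # item 3 -> I-CVI = 0.6
]
cvis = [i_cvi(item, 5) for item in ratings]
print(cvis)                          # [1.0, 1.0, 0.6]
print(round(s_cvi(cvis, "ua"), 2))   # 0.67 -> only 2 of 3 items reach universal agreement
print(round(s_cvi(cvis, "ave"), 2))  # 0.87 -> the less conservative average approach
```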

NOTE: Number of items considered relevant by all the panelists = 3; number of items = 9; S-CVI/Ave*** (average of I-CVIs) = 0.872; S-CVI/UA** = 3/9 = 0.333; number of experts = 14. * I-CVI: item-level content validity index; ** S-CVI/UA: scale-level content validity index, universal agreement approach; *** S-CVI/Ave: scale-level content validity index, average approach. Interpretation of I-CVIs: if the I-CVI is higher than 79 percent, the item is appropriate; if it is between 70 and 79 percent, it needs revision; if it is less than 70 percent, it is eliminated.

Although the content validity index is extensively used by researchers to estimate content validity, this index does not consider the possibility of inflated values due to chance agreement. Therefore, Wynd et al. propose using both the content validity index and a multi-rater kappa statistic in content validity studies because, unlike the CVI, kappa adjusts for chance agreement. Chance agreement is a concern when studying agreement indices among assessors, especially when the four-point ratings are collapsed into two classes, relevant and not relevant. 7 In other words, the kappa statistic is a consensus index of inter-rater agreement that adjusts for chance agreement 10 and is an important supplement to the CVI because kappa provides information about the degree of agreement beyond chance. 7 Nevertheless, the content validity index is mostly used by researchers because it is simple to calculate, easy to understand and provides information about each item, which can be used for the modification or deletion of instrument items. 6 , 10

To calculate the modified kappa statistic, the probability of chance agreement was first calculated for each item using the following formula:

P_C = [N! / (A! (N − A)!)] × 0.5^N

In this formula, N= number of experts in a panel and A= number of panelists who agree that the item is relevant.

After calculating the I-CVI for all instrument items, kappa was computed by entering the numerical values of the probability of chance agreement (P_C) and the content validity index of each item (I-CVI) into the following formula:

K = (I-CVI − P_C) / (1 − P_C)

The evaluation criteria for kappa are as follows: values above 0.74 are considered excellent, values between 0.60 and 0.74 good, and values between 0.40 and 0.59 fair. 40
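
Putting the two formulas above together, a minimal Python sketch (with a hypothetical panel of 14 experts, 12 of whom rate the item as relevant) might look like this:

```python
from math import comb

def modified_kappa(i_cvi, n_experts, n_agree):
    """Kappa adjusting an item's I-CVI for chance agreement."""
    p_c = comb(n_experts, n_agree) * 0.5 ** n_experts  # P_C = [N! / (A!(N-A)!)] * 0.5^N
    return (i_cvi - p_c) / (1 - p_c)

# Hypothetical item: 14 experts, 12 of whom rate it relevant (I-CVI = 12/14)
n, a = 14, 12
k = modified_kappa(a / n, n, a)
print(round(k, 2))  # above 0.74, so "excellent" by the criteria cited above
```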

Polit states that after controlling items by calculating the adjusted kappa, each item with an I-CVI equal to or higher than 0.78 would be considered excellent. Researchers should note that, as the number of experts on the panel increases, the probability of chance agreement diminishes and the values of the I-CVI and kappa converge. 10

Requesting panel members to evaluate the instrument in terms of comprehensiveness is the last step in measuring content validity. The panel members are asked to judge whether the instrument items and each of its dimensions form a complete and comprehensive sample of the content, as far as the theoretical definitions of the concepts and their dimensions are concerned, and whether any item needs to be eliminated or added. Based on the members’ judgments, the proportion of agreement is calculated for the comprehensiveness of each dimension and the entire instrument by dividing the number of experts who identified the instrument’s comprehensiveness as favorable by the total number of experts. 3 , 37

Determining face validity of an instrument

Face validity answers the question of whether an instrument apparently has validity for subjects, patients and/or other participants; in other words, whether the designed instrument is apparently related to the construct under study. Do participants agree with the items and their wording in an instrument intended to realize the research objectives? Face validity is related to the appearance and apparent attractiveness of an instrument, which may affect its acceptability to respondents. 11 In principle, face validity is not considered validity in the measurement sense; it does not consider what is being measured, but focuses on the appearance of the instrument. 9 To determine the face validity of an instrument, researchers use respondents’ and experts’ viewpoints. In the qualitative method, face-to-face interviews are carried out with some members of the target groups. The difficulty level of items, their suitability, the relationship between items and the main objective of the instrument, ambiguity and misinterpretation of items, and/or incomprehensibility of the meaning of words are the issues discussed in the interviews. 41

Although content experts play a vital role in content validity, instrument review by a sample of subjects drawn from the target population is another important component of content validation. These individuals are asked to review the instrument items because of their familiarity with the construct through direct personal experience. 37 They are also asked to identify the items they think are most important for them and to grade their importance on a 5-point Likert scale: very important (5), important (4), relatively important (3), slightly important (2), and unimportant (1). In the quantitative method, to calculate the item impact score, the percentage of patients who scored an item's importance as 4 or 5 (frequency) and the mean importance score of the item (importance) are first calculated; the item impact score of each instrument item is then calculated using the formula: Item Impact Score = Frequency × Importance

If the item impact of an item is equal to or greater than 1.5 (which corresponds to a mean frequency of 50% and an importance mean of 3 on the 5-point Likert scale), it is maintained in the instrument; otherwise it is eliminated. 42
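
A short Python sketch of the item impact score calculation, using hypothetical importance ratings from ten patients on the 5-point scale, might look like this:

```python
def item_impact_score(importance_ratings):
    """Impact score = frequency (share rating 4 or 5) x mean importance."""
    frequency = sum(1 for r in importance_ratings if r >= 4) / len(importance_ratings)
    mean_importance = sum(importance_ratings) / len(importance_ratings)
    return frequency * mean_importance

# Hypothetical importance ratings (1-5) from 10 patients for one item
ratings = [5, 4, 4, 3, 5, 4, 2, 5, 4, 3]
score = item_impact_score(ratings)
print(round(score, 2))  # kept in the instrument if >= 1.5, otherwise eliminated
```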

Results of stage1: Designing patient-centered communication measuring instrument

In the first stage of our research, which was performed through qualitative content analysis of semi-structured in-depth interviews with ten patients with cancer, three family members and seven oncology nurses, the results led to identifying the content domain within seven dimensions: trust building, intimacy or friendship, patient activation, problem solving, emotional support, informational support, and spiritual strengthening. Each of these content domains was defined theoretically by combining the qualitative study and the literature review. In the item generation step, 260 items were generated from these dimensions and combined with 32 items obtained from the literature and related instruments. The research group examined the items for overlap and duplication. Finally, 188 items remained for the operational definition of the construct of patient-centered communication, and the preliminary instrument was formed from these 188 items (the item pool) within seven dimensions.

Results of stage 2: Judgment of expert panel on validity of the patient-centered communication measuring instrument

In the second stage, after selecting fifteen content experts, including instrument development experts (four people), cancer research experts (four people), nurse-patient communication experts (three people) and four nurses experienced in cancer care, an expert panel was created to make quantitative and qualitative judgments on the instrument items. The panel members were asked three times to judge the content validity ratio, content validity index, and instrument comprehensiveness. In each round, they were also asked to judge the face validity of the instrument. In each round of correspondence, via e-mail or in person, a letter of request was presented, which included the study objectives, an account of the instrument, the scoring method, and the required instructions for responding. Theoretical definitions of the construct under study, its dimensions, and the items of each dimension were also included in the letter. If no reply was received to the reminder e-mail within a week, a telephone call was made or a meeting arranged.

In the first round of judgment, 108 of the 188 instrument items were eliminated. These items either had a content validity ratio lower than 0.49 (with 15 experts in our study, the critical value from the Lawshe table is 0.49) or were merged with the remaining items, based on the opinion of the content experts, through editing. Table 3 shows a sample of the instrument items and the CVR calculation method for them.

NOTE: * Number of experts who evaluated the item as essential. ** CVR (content validity ratio) = (N_e − N/2) / (N/2); with 15 persons on the expert panel (N = 15), items with a CVR larger than 0.49 remained in the instrument and the rest were eliminated.

The remaining items were modified according to the recommendations of the panel members in the first round of judgment. To determine the content validity index and modify the instrument, the panel members were then asked a second time to judge, by scoring from 1 to 4, the relevancy and clarity of the instrument items according to the Waltz and Bausell content validity index. 38

In the second round, the proportion of agreement among panel members on the relevancy and clarity of 80 remaining items of the first round of judgment was calculated.

To obtain the content validity index for each item, the number of experts judging the item as relevant was divided by the number of content experts (N = 14). (As one of the 15 panel members had not scored some items, the analyses were made with 14 judges.) The same procedure was carried out for the clarity of the instrument items. The agreement among the judges for the entire instrument was calculated only for relevancy, using the average and universal agreement approaches.

In this round, among the 80 instrument items, 4 items with a CVI score lower than 0.70 were eliminated. Eight items with a CVI between 0.70 and 0.79 were modified (modification of items was performed according to the recommendations of the panel members and research group forums). Two items were eliminated despite having favorable CVI scores. One was eliminated due to ethical issues, as some content experts believed that the item “I often think of death but I don’t speak about it with my nurses” might cause moral harm to a patient. For the other eliminated item, “Nurses know how to communicate with me”, some experts believed that its elimination would not harm the definition of the trust-building dimension. According to the experts’ suggestions, an item (“Nurses try to ensure I face no problem during care”) was added in this round. After modification, the instrument, containing 57 items, was sent to the panel members a third time to judge the relevancy, clarity and comprehensiveness of the items in each dimension and the need for deletion or addition of items. In this round, four items had a CVI lower than 0.70 and were eliminated.

The proportion of agreement among the experts was also calculated in this round in terms of comprehensiveness for each dimension of the construct underlying study. Table 4 shows the calculation of I-CVI, S-CVI and modified kappa for items in the instrument for 53 remaining items at the end of the third round of judgment. We also used panel members’ judgment on the clarity of items as well as their recommendations on the modification of items.

NOTE: * I-CVI: item-level content validity index. ** P_C (probability of chance occurrence) was computed using the formula P_C = [N! / (A! (N − A)!)] × 0.5^N, where N = number of experts and A = number of panelists who agree that the item is relevant; number of experts = 14. *** K (modified kappa) was computed using the formula K = (I-CVI − P_C) / (1 − P_C). Interpretation criteria for kappa, using the guidelines described in Cicchetti and Sparrow (1981): fair = K of 0.40 to 0.59; good = K of 0.60 to 0.74; excellent = K > 0.74.

Face validity results of patient-centered communication measuring instrument

A sample of 10 patients with cancer who had a long-term history of hospitalization in oncology wards (lay experts) was asked to judge the importance, simplicity and understandability of the items in an interview with one of the members of the research team. Based on their opinions, objective examples were added to some items to make them more understandable. For instance, the item “Nurses try not to cause any problem for me” was changed to “During care (e.g., preparation of an intravenous line), nurses try not to cause any problem for me”, and the item “Care decisions are made without paying attention to my needs” was changed to “Nurses didn’t ask my opinion about care (e.g., time of care or type of interventions)”. In addition, a quantitative analysis was performed by calculating the impact score of each item. Nine items had an item impact score of less than 1.5 and were eliminated from the final instrument before the preliminary test. Finally, at the end of the content validity and face validity process, our instrument was prepared with seven dimensions and 44 items for the next steps and the remaining psychometric testing.

The present paper demonstrates quantitative indices for the content validity of a new instrument and outlines them during the design and psychometric evaluation of a patient-centered communication measuring instrument. It should be noted that validation is a lengthy process; the content validity should be studied in the first step, and the following analyses should then be conducted: reliability evaluation (through internal consistency and test-retest), construct validity (through factor analysis) and criterion-related validity. 37

Some limitations of content validity studies should be noted. Experts’ feedback is subjective, so the study is subject to any bias that may exist among the experts. Furthermore, if the content domain is not well identified, this type of study does not necessarily identify content that might have been omitted from the instrument. However, experts are asked to suggest other items for the instrument, which may help minimize this limitation. 11

A content validity study is a systematic, subjective and two-stage process. In the first stage, instrument design is carried out; in the second stage, judgment on and quantification of the instrument items is performed, and content experts examine the correspondence between the theoretical and operational definitions. Such a process should lead the instrument development process in order to support instrument reliability and to prepare an instrument that is valid in terms of content for the preliminary test phase. Validation is a lengthy process, in the first step of which the content validity should be studied; the following analyses should then be conducted: reliability evaluation (through internal consistency and test-retest), construct validity (through factor analysis) and criterion-related validity. Meanwhile, we showed that although content validity is a subjective process, it is possible to objectify it.

Understanding content validity is important for clinician groups and researchers because they should realize if the instruments they use for their studies are suitable for the construct, population under study, and socio-cultural background in which the study is carried out, or there is a need for new or modified instruments.

Training on content validity study helps students, researchers, and clinical staffs better understand, use and criticize research instruments with a more accurate approach.

In general, the content validity study revealed that this instrument enjoys an appropriate level of content validity. The overall content validity index of the instrument using the conservative universal agreement approach was low; however, it can be advocated with respect to the high number of content experts, which makes consensus difficult, and the high value of the S-CVI with the average approach, which was equal to 0.93.

Acknowledgments

The researchers appreciate patients, nurses, managers, and administrators of Ali-Nasab and Shahid Ayatollah Qazi Tabatabaee hospitals. Approval to conduct this research with no. 5/74/474 was granted by the Hematology and Oncology Research Center affiliated to Tabriz University of Medical Sciences.

Ethical issues

None to be declared.

Conflict of interest

The authors declare no conflict of interest in this study.

Content Validity in Research: Definition & Examples

Charlotte Nickerson

Research Assistant at Harvard University

Undergraduate at Harvard University

Charlotte Nickerson is a student at Harvard University obsessed with the intersection of mental health, productivity, and design.

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

  • Content validity is a type of measurement validity that demonstrates how well a measure covers the construct it is meant to represent.
  • It is important for researchers to establish content validity in order to ensure that their study is measuring what it intends to measure.
  • There are several ways to establish content validity, including expert opinion, focus groups , and surveys.

What Is Content Validity?

Content validity is the degree to which the elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose.

This encompasses aspects such as the appropriateness of the items, tasks, or questions to the specific domain being measured and whether the assessment instrument covers a broad enough range of content to enable conclusions to be drawn about the targeted construct (Rossiter, 2008).

One example of an assessment with high content validity is the Iowa Test of Basic Skills (ITBS). The ITBS is a standardized test that has been used since 1935 to assess the academic achievement of students in grades 3-8.

The test covers a wide range of academic skills, including reading, math, language arts, and social studies. The items on the test are carefully developed and reviewed by a panel of experts to ensure that they are fair and representative of the skills being tested.

As a result, the ITBS has high content validity and is widely used by schools and districts to measure student achievement.

Meanwhile, most driving tests have low content validity.  The questions on the test are often not representative of the skills needed to drive safely. For example, many driving permit tests do not include questions about how to parallel park or how to change lanes.

Meanwhile, driving license tests often do not test drivers in non-ideal conditions, such as rain or snow. As a result, these tests do not provide an accurate measure of a person’s ability to drive safely.

The higher the content validity of an assessment, the more accurately it can measure what it is intended to measure — the target construct (Rossiter, 2008).

Why is content validity important in research?

Content validity is important in research as it provides confidence that an instrument is measuring what it is supposed to be measuring.

This is particularly relevant when developing new measures or adapting existing ones for use with different populations.

It also has implications for the interpretation of results, as findings can only be accurately applied to groups for which the content validity of the measure has been established.

Step-by-step guide: How to measure content validity?

Haynes et al. (1995) emphasized the importance of content validity and gave an overview of ways to assess it.

One of the first ways of measuring content validity was the Delphi method, which was developed at the RAND Corporation in the 1950s as a way of systematically generating technical forecasts.

The method involves a group of experts who make predictions about the future and then reach a consensus about those predictions. Today, the Delphi method is most commonly used in medicine.

In a content validity study using the Delphi method, a panel of experts is asked to rate the items on an assessment instrument on a scale. The expert panel also has the opportunity to add comments about the items.

After all ratings have been collected, the average item rating is calculated. In the second round, the experts receive summarized results of the first round and are able to make further comments and revise their first-round answers.

This back-and-forth continues until some homogeneity criterion — similarity between the results of researchers — is achieved (Koller et al., 2017).

Lawshe (1975) and Lynn (1986) created numerical methods to assess content validity. Both of these methods require the development of a content validity index (CVI). A content validity index is a statistical measure of the degree to which an assessment instrument covers the content domain of interest.

There are two steps in calculating a content validity index:

  • Determining the number of items that should be included in the assessment instrument;
  • Determining the percentage of items that actually are included in the assessment instrument.

The first step, determining the number of items that should be included in an assessment instrument, can be done using one of two approaches: item sampling or expert consensus.

Item sampling involves selecting a sample of items from a larger set of items that cover the content domain. The number of items in the sample is then used to estimate the total number of items needed to cover the content domain.

This approach has the advantage of being quick and easy, but it can be biased if the sample of items is not representative of the larger set (Koller et al., 2017).

The second approach, expert consensus, involves asking a group of experts how many items should be included in an assessment instrument to adequately cover the content domain. This approach has the advantage of being more objective, but it can be time-consuming and expensive.

Experts are able to assign these items to dimensions of the construct that they intend to measure and assign relevance values to decide whether an item is a strong measure of the construct.

Although various attempts to numerize the process of measuring content validity exist, there is no systematic procedure that could be used as a general guideline for the evaluation of content validity (Newman et al., 2013).

When is content validity used?

Educational Assessment

In the context of educational assessment, validity is the extent to which an assessment instrument accurately measures what it is intended to measure. Validity concerns anyone who is making inferences and decisions about a learner based on data.

This can have deep implications for students’ education and future. For instance, a test that poorly measures students’ abilities can lead to placement in a future course that is unsuitable for the student and, ultimately, to the student’s failure (Obilor, 2022).

There are a number of factors that specifically affect the validity of assessments given to students, such as (Obilor, 2018):

  • Unclear Direction: If directions do not clearly indicate to the respondent how to respond to the tool’s items, the validity of the tool is reduced.
  • Vocabulary: If the vocabulary of respondents is poor and they do not understand the items, the validity of the instrument is affected.
  • Poorly Constructed Test Items: If items are constructed in such a way that they have different meanings for different respondents, validity is affected.
  • Difficulty Level of Items: In an achievement test, too easy or too difficult test items would not discriminate among students, thereby lowering the validity of the test.
  • Influence of Extraneous Factors: Extraneous factors like the style of expression, legibility, mechanics of grammar (spelling, punctuation), handwriting, and length of the tool, amongst others, influence the validity of a tool.
  • Inappropriate Time Limit: In a speed test, if enough time limit is given, the result will be invalidated as a measure of speed. In a power test, an inappropriate time limit will lower the validity of the test.

Interviews

There are a few reasons why interviews may lack content validity. First, interviewers may ask different questions or place different emphases on certain topics across different candidates. This can make it difficult to compare candidates on a level playing field.

Second, interviewers may have their own personal biases that come into play when making judgments about candidates.

Finally, the interview format itself may be flawed. For example, many companies ask potential programmers to complete brain teasers — such as calculating the number of plumbers in Chicago or coding tasks that rely heavily on theoretical knowledge of data structures — even if this knowledge would be used rarely or never on the job.

Questionnaires

Questionnaires rely on the respondents’ ability to accurately recall information and report it honestly. Additionally, the way in which questions are worded can influence responses.

To increase content validity when designing a questionnaire, careful consideration must be given to the types of questions that will be asked.

Open-ended questions are typically less biased than closed-ended questions, but they can be more difficult to analyze.

It is also important to avoid leading or loaded questions that might influence respondents’ answers in a particular direction. The wording of questions should be clear and concise to avoid confusion (Koller et al., 2017).

Is content validity internal or external?

Most experts agree that content validity is primarily an internal issue. This means that the concepts and items included in a test should be based on a thorough analysis of the specific content area being measured.

The items should also be representative of the range of difficulty levels within that content area. External factors, such as the opinions of experts or the general public, can influence content validity, but they are not necessarily the primary determinant.

In some cases, such as when developing a test for licensure or certification, external stakeholders may have a strong say in what is included in the test (Koller et al., 2017).

How can content validity be improved?

There are a few ways to increase content validity. One is to create items that are more representative of the targeted construct. Another is to increase the number of items on the assessment so that it covers a greater range of content.

Finally, experts can review the items on the assessment to ensure that they are fair and representative of the skills being tested (Koller et al., 2017).

How do you test the content validity of a questionnaire?

There are a few ways to test the content validity of a questionnaire. One way is to ask experts in the field to review the questions and provide feedback on whether or not they believe the questions are relevant and cover all important topics.

Another way is to administer the questionnaire to a small group of people and then analyze the results to see if there are any patterns or themes emerging from the responses.

Finally, it is also possible to use statistical methods to test for content validity, although this approach is more complex and usually requires access to specialized software (Koller et al., 2017).
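One widely cited quantitative approach is Lawshe’s (1975) content validity ratio (CVR), which summarizes how many panel experts rate each item as essential. The short sketch below is only a minimal illustration in Python; the panel size, ratings, and retention threshold are hypothetical placeholders, and in practice the critical CVR value should be taken from published tables for the actual number of experts.

```python
# Minimal sketch: Lawshe's content validity ratio (CVR) for a small expert panel.
# Ratings are hypothetical; 1 = "essential", 0 = "useful but not essential" or "not necessary".

def content_validity_ratio(essential_votes, panel_size):
    """CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = panel_size / 2
    return (essential_votes - half) / half

# Hypothetical ratings from 8 experts for three questionnaire items.
ratings = {
    "item_1": [1, 1, 1, 1, 1, 1, 1, 0],
    "item_2": [1, 1, 1, 0, 0, 1, 0, 0],
    "item_3": [1, 1, 1, 1, 1, 1, 1, 1],
}

for item, votes in ratings.items():
    cvr = content_validity_ratio(sum(votes), len(votes))
    # A critical value from Lawshe's table would normally decide retention;
    # the 0.75 threshold below is only a placeholder for illustration.
    decision = "retain" if cvr >= 0.75 else "review or drop"
    print(f"{item}: CVR = {cvr:+.2f} -> {decision}")
```

Items whose CVR falls below the critical value for the given panel size are typically revised or dropped before the instrument is finalized.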

How can you tell if an instrument is content-valid?

There are a few ways to tell if an instrument is content-valid. One is to examine related forms of validity evidence.

Face validity is a measure of whether the items on the test appear, on their face, to measure what they claim to measure. This is highly subjective but convenient to assess.

Construct validity asks whether the items actually measure the underlying construct they are supposed to measure. Finally, criterion-related validity asks whether scores on the test predict performance on a relevant external criterion.

What is the difference between content and criterion validity?

Content validity is a measure of how well a test covers the content it is supposed to cover.

Criterion validity, meanwhile, is an index of how well a test correlates with an established standard of comparison or a criterion.

For example, if a measure of criminal behavior is criterion valid, it should be possible to use it to predict whether an individual will be arrested in the future for a criminal violation, is currently breaking the law, or has a previous criminal record (American Psychological Association, n.d.).

Are content validity and construct validity the same?

Content validity is not the same as construct validity.

Content validity is a method of assessing the degree to which a measure covers the range of content that it purports to measure.

In contrast, construct validity is a method of assessing the degree to which a measure reflects the underlying construct that it purports to measure.

It is important to note that content validity and construct validity do not necessarily go together; a measure may have content validity without also having construct validity.

However, content validity is a necessary but not sufficient condition for construct validity. That is, a measure cannot be construct valid if it does not first have content validity (Koller et al., 2017).

For example, an academic achievement test in math may have content validity if it contains questions from all areas of math a student is expected to have learned before the test, but it may not have construct validity if it does not somehow relate to tests of similar and different constructs.

How many experts are needed for content validity?

There is no definitive answer to this question as it depends on a number of factors, including the nature of the instrument being validated and the purpose of the validation exercise.

However, in general, a minimum of three experts should be used in order to ensure that the content validity of an instrument is adequately established (Koller et al., 2017).
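To see how a small panel’s ratings are typically summarized, the sketch below computes the item-level content validity index (I-CVI), often attributed to Lynn (1986): the proportion of experts rating an item as relevant (3 or 4 on a 4-point relevance scale). The three-expert panel and the ratings are hypothetical; with five or fewer experts, unanimous agreement (I-CVI = 1.00) is usually expected for an item to be retained.

```python
# Minimal sketch: item-level content validity index (I-CVI) from expert relevance ratings.
# Ratings use a 4-point relevance scale; 3 and 4 count as "relevant". All data are hypothetical.

from statistics import mean

def item_cvi(ratings):
    """Proportion of experts rating the item 3 or 4 on a 4-point relevance scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

# Hypothetical ratings from three experts for four draft items.
panel_ratings = {
    "item_A": [4, 4, 3],
    "item_B": [4, 2, 3],
    "item_C": [3, 3, 4],
    "item_D": [2, 2, 3],
}

icvis = {item: item_cvi(r) for item, r in panel_ratings.items()}
for item, icvi in icvis.items():
    print(f"{item}: I-CVI = {icvi:.2f}")

# A scale-level summary (S-CVI/Ave) is often reported as the mean of the I-CVIs.
print(f"S-CVI/Ave = {mean(icvis.values()):.2f}")
```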

References

American Psychological Association. (n.d.). Content validity. In APA Dictionary of Psychology.

Haynes, S. N., Richard, D., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238.

Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8, 126.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575.

Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research.

Newman, I., Lim, J., & Pineda, F. (2013). Content validity using a mixed methods approach: Its application and development through the use of a table of specifications methodology. Journal of Mixed Methods Research, 7(3), 243–260.

Obilor, E. I. (2018). Fundamentals of research methods and statistics in education and social sciences. Port Harcourt: SABCOS Printers & Publishers.

Obilor, E. I., & Miwari, G. U. (2022). Content validity in educational assessment.

Rossiter, J. R. (2008). Content validity of measures of abstract constructs in management and organizational research. British Journal of Management, 19(4), 380–388.



Longitudinal validation of cognitive reserve proxy measures: a cohort study in a rural Chinese community

  • Hao Chen
  • Jin Hu
  • Shiqi Gui
  • Qiushuo Li
  • Jing Wang
  • Xing Yang
  • Jingyuan Yang

Alzheimer's Research & Therapy, volume 16, Article number: 87 (2024)


While evidence supports cognitive reserve (CR) in preserving cognitive function, longitudinal validation of CR proxies, including later-life factors, remains scarce. This study aims to validate CR’s stability over time and its relation to cognitive function in rural Chinese older adults.

Within the project on the health status of rural older adults (HSRO), the survey included a baseline assessment (2019) and a follow-up assessment (2022). In total, 792 older adults (mean age: 70.23 years) were followed up. A confirmatory factor analysis (CFA) model was constructed using cognitive reserve proxies that included years of formal education, social support, hobbies, and exercise. We examined the longitudinal validity of the CR factor using confirmatory factor analyses and measurement invariance and explored the association of CR with cognition using Spearman’s correlation and Generalized Estimating Equations (GEE).

The results showed that the CFA structure of CR was stable over time (T0: χ²/df = 3.21/2, RMSEA = 0.02; T1: χ²/df = 7.47/2, RMSEA = 0.05) and that it satisfied both configural and metric invariance (Δχ²/df = 2.28/3, P = 0.52). In addition, CR had a stable positive relationship with cognitive function across time (T0: r = 0.54; T1: r = 0.49). Furthermore, longitudinal CR was associated with MMSE scores (β = 2.25; 95% CI: 2.01 to 2.49).

Conclusions

This study provided valuable evidence on the stability and validity of cognitive reserve proxy measures in rural Chinese older adults. Our findings suggested that cognitive reserve is associated with cognitive function over time and highlighted the importance of accumulating cognitive reserve in later life.

Introduction

As the world’s aging population continues to grow and dementia prevalence increases, the prevention and treatment of dementia have become a top priority for society worldwide [1, 2]. China, with its large aging population and high prevalence of dementia, faces an urgent need to address this issue [3, 4]. Although no treatment is available to slow or stop dementia, prevention of cognitive decline is an important strategy. Cognitive reserve (CR) is a concept emphasizing the capacity of lifestyle choices and life events throughout one’s life to positively influence and enhance cognitive processes, thereby bolstering efficiency and flexibility in the face of cognitive decline [5]. Accumulated evidence indicates that CR can enhance cognitive adaptability and reduce sensitivity to brain aging, pathology, or injury, delaying clinical symptoms [5, 6, 7].

Nevertheless, since CR cannot be directly measured, it is generally operationalized using proxies such as education, occupation, physical exercise, and social activities [8, 9]. Although numerous studies have shown an association between CR-related proxies and cognitive function, there is heterogeneity in the specific proxies used for CR assessment across different populations. These proxy factors may reflect the unique characteristics and contexts of the studied populations. For instance, studies have shown that education alone is associated with cognition in some populations, while other proxies such as leisure activities or occupation may not exhibit a significant relationship [10, 11]. This highlights the importance of considering population-specific factors when examining the relationship between CR-related proxies and cognitive function. Researchers found that higher childhood school performance and engagement in complex job environments during adulthood were associated with a reduced risk of dementia [12]. Another longitudinal study found that higher social support and engagement in leisure activities improve cognitive reserve in old age [13]. This underscores the importance of exploring CR-related proxies at different stages of life to understand their contributions to cognitive reserve [14]. The above studies imply that although education and occupation in early life are prerequisites for cognitive reserve in older adults, additional proxy indicators of cognitive reserve in later life may contribute to its enhancement, offering a new perspective for older adults facing declining cognition. However, it is crucial to exercise caution when interpreting changes in proxy measures, as cognitive reserve itself is not directly measurable; proxy measures serve as indicators but may not fully capture the true changes in cognitive reserve. Establishing longitudinal measures that track changes in proxy indicators of cognitive reserve over time and assessing their structural validity therefore remain areas requiring further development, and little progress has been made to date [14, 15, 16].

Measurement invariance techniques are often used in the field of psychology to check the stability of latent measures across time, groups, and ethnicities [17]. Measurement invariance is also a way to enhance the fairness and validity of neurocognitive ability tests; although the method is well established, it has not yet fully realized its potential in cognition research [18]. According to the existing literature, these techniques have not yet been applied in studies of the CR model. In addition, recent studies have shown that older adults with dementia have lower levels of education and lower levels of occupational complexity [19], and rural older adults have worse CR and cognitive function compared to their urban counterparts [20]. In China, many rural older adults have low levels of education and have only worked in agriculture during their early life. Therefore, validating the longitudinal effectiveness of cognitive reserve in later life is crucial to confirm its value in delaying cognitive decline in this population. To address this knowledge gap, this study aims to investigate cognitive reserve proxy measures in older adults within a rural Chinese community, validate the structural stability of these measures over time, and estimate their relationship with cognitive function.

Materials and methods

Study design and participants.

This is a cohort from the Guizhou rural older adults’ health study (HSRO) in China, a population-based prospective study conducted in Guizhou. The data were obtained using multistage cluster sampling; 12 villages were selected, and the baseline survey was conducted from July to August 2019. Participants were eligible if they were community-dwelling volunteers aged 60 years or older who had lived in the area for at least 6 months. The study employed a two-wave (T0-T1) longitudinal survey design. It included 1,654 older adults who were assessed for cognitive reserve-related proxy measures at baseline; in 2022 (T1), 792 individuals participated in the follow-up survey. The study was approved by the Ethics Committee of Guizhou Medical University, and all participants signed informed consent.

Measurement

Cognitive reserve

Cognitive reserve (CR) is a theoretical framework that aims to understand the protective factors contributing to cognitive abilities in individuals. In our study, we collected data on four proxies of cognitive reserve: years of education, social support, hobbies, and exercise. To analyze these variables, we employed confirmatory factor analysis, which allowed us to construct a latent variable model representing cognitive reserve. This approach helps us examine the relationship between these proxies and their collective influence on cognitive abilities. Education was measured by one item; subjects reported the total number of years at school. The Social Support Rating Scale (SSRS), developed by Xiao [21], was used to measure the amount of social support. It has three dimensions (subjective support, objective support, and support utilization) and 10 items; seven items were answered on a four-point Likert scale, while the remaining items were scored by counting the number of support sources. Participants were asked a series of questions regarding their engagement in various hobbies and activities, including housework, outdoor activities (e.g., fishing, hiking), gardening, reading books and newspapers, raising poultry or livestock, playing cards or mahjong, watching TV and listening to the radio, and participating in organized social activities (e.g., square dancing), as well as indicating if they had no hobbies or engaged in other hobbies not mentioned [22]. The questionnaire comprised ten items, including an item for indicating the absence of hobbies, and the number of hobbies was calculated from scores on the remaining items. The exercise component of the questionnaire was designed based on the common exercise durations of 30 and 60 minutes for the elderly population [23]. Participants were asked to indicate the amount of time they spent exercising each day using the following response options: (1) never, (2) 0–30 min, (3) 30–60 min, and (4) more than 60 min.

The Chinese version of the Mini-Mental State Examination (MMSE) scale was used to evaluate individuals’ cognition [ 24 , 25 ]. The test includes 11 items, and the scores can immediately reflect global cognition in clinical, research, and community settings. The scores range from 0 to 30. The changes in cognitive function observed during the follow-up period were categorized into two groups: one group with no reduction in cognitive function (Maintenance) and another group with a decline in cognitive function (Decline).

Smoking status was divided into three categories: current smoking (defined as a total of > 100 cigarettes smoked in the past year), ever smoking (including having quit smoking for > 6 months), and never smoking. Alcohol consumption was likewise divided into three categories: regular drinking (drinking on average ≥ 3–5 days per week in the past year), occasional drinking (drinking on average ≤ 1–2 days per week in the past year), and never drinking. Participants were asked which chronic diseases they had been diagnosed with, and the number of chronic conditions was counted. Checkboxes listed the chronic diseases included in the questionnaire, such as arthritis, hypertension, cardiovascular disease, stomach disease, cataracts, chronic lung disease, diabetes, asthma, reproductive disorders, and cancer. In addition, space was provided for participants to write down any other chronic diseases that were not listed.

Statistical analysis

Frequencies and medians (with interquartile range (IQR) or range) were used to describe demographic characteristics, and non-parametric tests were employed to analyze the data. The Wilcoxon signed-rank test was used for within-group comparisons of continuous variables with repeated measures, such as comparing baseline and follow-up data within the same group, while the Marginal Homogeneity (MH) test was used for longitudinal comparisons of categorical variables, examining differences in their distributions across time points.

To capture proxy factor data for cognitive reserve (CR) more accurately, we used continuous information as the preferred form. Confirmatory factor analysis (CFA) was conducted separately for the baseline and follow-up assessments to test model fit. Evaluation of the CR proxy factor structure used eight goodness-of-fit indicators: χ²/df, Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Normed Fit Index (NFI), Incremental Fit Index (IFI), Akaike information criterion (AIC), and Bayesian information criterion (BIC). The following cut-off criteria were used: (1) NFI > 0.90; (2) IFI > 0.90; (3) TLI > 0.90; (4) CFI > 0.90; (5) RMSEA < 0.05; and (6) χ²/df < 5 [26]. For measurement invariance, a longitudinal two-group CFA was performed, testing four increasingly stringent types of invariance: configural, metric, scalar, and strict. Configural invariance is satisfied when indicator variables load onto the same factors across groups. Metric invariance is satisfied, with adequate model fit, when factor loadings are held constant across groups. Scalar invariance is satisfied when factor loadings and intercepts are held constant across groups and model fit remains adequate. Strict invariance is satisfied when factor loadings, intercepts, and residuals are constrained to be equal across groups and model fit remains adequate [27].
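For readers who want to see how two of these indices are derived, the sketch below computes RMSEA and CFI from a model’s chi-square statistic, its degrees of freedom, the baseline (independence) model’s chi-square and degrees of freedom, and the sample size. The chi-square values are hypothetical placeholders rather than results from this study, and a real analysis would rely on dedicated SEM software such as the lavaan package the authors used.

```python
# Minimal sketch: RMSEA and CFI computed from chi-square statistics.
# The chi-square values below are hypothetical placeholders, not results from the HSRO cohort.

import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (one common formulation, using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

# Hypothetical model and baseline chi-square values for a 4-indicator, 1-factor CFA.
chi2_model, df_model = 3.2, 2
chi2_baseline, df_baseline = 250.0, 6
n_obs = 792

print(f"RMSEA = {rmsea(chi2_model, df_model, n_obs):.3f}")
print(f"CFI   = {cfi(chi2_model, df_model, chi2_baseline, df_baseline):.3f}")
```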

The factor scores for CR were obtained using the maximum likelihood method. Longitudinal change in cognitive function was calculated as ΔMMSE = MMSE(T1) − MMSE(T0), where MMSE(T1) is the follow-up Mini-Mental State Examination (MMSE) score and MMSE(T0) is the baseline score. Similarly, change in cognitive reserve was calculated as ΔCR = CR(T1) − CR(T0). Subjects with ΔMMSE greater than or equal to 0 were assigned to the maintenance group, and those with ΔMMSE less than 0 to the cognitive decline group. Likewise, ΔCR was categorized into an increased-CR group (positive ΔCR) and a decreased-CR group (negative ΔCR). Spearman’s correlation coefficients between CR scores and MMSE scores were calculated, and the Fisher Z method was used to test the significance of the difference between the longitudinal correlation coefficients. This test compares the difference between the two coefficients with the standard error of that difference to obtain a Z-score; if the Z-score exceeds a critical value, the difference between the two coefficients is considered statistically significant [28]. Generalized estimating equations were employed to analyze the longitudinal relationship and interaction effects between cognitive reserve and cognition. The statistical analyses were performed using SPSS (IBM, Armonk, NY, USA, version 22.0) and R (version 4.2.2; packages lavaan and semTools).
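A simplified sketch of these change scores, the wave-specific Spearman correlations, and a Fisher Z comparison is given below. The data are simulated placeholders, and the Z-test shown treats the two coefficients as independent, which is a simplification of the dependent-correlation test cited in the paper [28].

```python
# Minimal sketch: change scores, Spearman correlations, and a Fisher Z comparison.
# Data are simulated placeholders; the independent-sample Z-test below is a
# simplification of the dependent-correlation test used in the paper.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 792

# Simulated baseline (T0) and follow-up (T1) CR factor scores and MMSE scores.
cr_t0 = rng.normal(0, 1, n)
cr_t1 = cr_t0 * 0.6 + rng.normal(0, 0.8, n)
mmse_t0 = np.clip(24 + 3 * cr_t0 + rng.normal(0, 2, n), 0, 30)
mmse_t1 = np.clip(23 + 3 * cr_t1 + rng.normal(0, 2, n), 0, 30)

# Change scores as defined in the text.
delta_mmse = mmse_t1 - mmse_t0
delta_cr = cr_t1 - cr_t0
maintenance = delta_mmse >= 0  # True = cognitive maintenance, False = decline

# Spearman correlations between CR and MMSE at each wave.
r_t0, _ = stats.spearmanr(cr_t0, mmse_t0)
r_t1, _ = stats.spearmanr(cr_t1, mmse_t1)

# Fisher Z comparison of the two coefficients (independence assumed for simplicity).
z0, z1 = np.arctanh(r_t0), np.arctanh(r_t1)
se = np.sqrt(1 / (n - 3) + 1 / (n - 3))
z = (z0 - z1) / se
p = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"T0: r = {r_t0:.2f}, T1: r = {r_t1:.2f}, Z = {z:.2f}, p = {p:.3f}")
```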

A total of 792 study subjects entered the follow-up. The mean (SD) age at baseline assessment was 70.23 (5.87) years. There were 318 males and 474 females in the follow-up data; 96.7% of the subjects had worked only as farmers, and 81.2% had received less than six years of education (Table S1). Between baseline and follow-up, there were statistically significant differences in the SSRS scores, the number of hobbies, the time spent being physically active, and the MMSE scores (Table 1).

The CR latent variables constituted by the CFA are shown in Figure S1; the highest factor loadings were for hobbies (0.69) at baseline and for social support (0.47) at follow-up. There were excellent goodness-of-fit results at T0 (χ²/df: 3.21/2; RMSEA: 0.02; CFI: 0.99; TLI: 1.00; NFI: 0.99; IFI: 1.00) and T1 (χ²/df: 7.47/2; RMSEA: 0.05; CFI: 0.96; TLI: 0.87; NFI: 0.94; IFI: 0.96), as presented in Table S2. With added constraints (Table 2; Tables S3–S6), the model passed only configural and metric invariance (metric vs. configural model: Δχ² = 2.28; Δdf = 3; ΔRMSEA = -0.012; ΔCFI = 0.003; ΔTLI = 0.028).

CR model factor scores were significantly positively correlated with cognitive function, either at baseline or follow-up (Fig.  1 ). The cognitive maintenance group exhibited a higher positive ΔCR compared to the cognitive decline group (Figure S2 ). When the longitudinal changes in the correlation coefficient between MMSE and cognitive reserve (CR) were examined, no significant difference in the correlation coefficient was seen for either the increased or decreased CR groups (Figure S3 ). Similarly, in Fig.  2 (A, B), no significant longitudinal changes in correlation coefficients were identified in the cognitive decline group. However, in the cognitive maintenance group, a statistically significant difference in the longitudinal correlation coefficient between MMSE and CR was detected ( P  < 0.05). Further age stratification (referenced to baseline age) showed that the correlations between CR and MMSE scores over time were statistically different for subjects ages 60–69 ( N  = 156; T0: r  = 0.51; T1: r  = 0.35) and 70–79 ( N  = 157; T0: r  = 0.63; T1: r  = 0.48) in the cognitive maintenance group (Fig.  2 ). Generalized estimating equations revealed longitudinal associations between CR and cognitive functioning. Further analyses indicated that the relationships between CR and MMSE scores differed significantly across cognitive subgroups. Interactions were also observed with both sex and age (Table  3 ).
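As a rough illustration of the modelling step, the sketch below shows how a longitudinal association between CR and MMSE with repeated measures per subject might be fitted as a GEE using the Python statsmodels package. The simulated long-format data, variable names, covariates, and exchangeable working correlation are assumptions for illustration only; the original analysis was carried out in SPSS and R.

```python
# Minimal sketch: a GEE of MMSE on CR with repeated measures per subject.
# Data frame, column names, and covariates are assumptions for illustration;
# the original analysis was done in SPSS/R, not with this code.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subjects = 200

# Simulated long-format data: one row per subject per wave (T0, T1).
subjects = np.repeat(np.arange(n_subjects), 2)
wave = np.tile([0, 1], n_subjects)
cr = rng.normal(0, 1, n_subjects * 2)
age = np.repeat(rng.integers(60, 90, n_subjects), 2)
sex = np.repeat(rng.integers(0, 2, n_subjects), 2)
mmse = np.clip(26 + 2.2 * cr - 0.1 * (age - 70) + rng.normal(0, 2, n_subjects * 2), 0, 30)

df = pd.DataFrame({"id": subjects, "wave": wave, "cr": cr, "age": age, "sex": sex, "mmse": mmse})

# GEE with an exchangeable working correlation over each subject's repeated measures.
model = smf.gee("mmse ~ cr + wave + age + C(sex)",
                groups="id",
                data=df,
                family=sm.families.Gaussian(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```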

Figure 1. Correlation of CR with MMSE scores in the two waves of the study (T0 = baseline, T1 = follow-up).

Figure 2. Comparison of longitudinal correlation coefficients between CR and MMSE in different cognitive groups. Panels (A) and (B) show the longitudinal comparison of the Z-transformed correlation coefficients of the cognitive maintenance and decline groups for all subjects; (C) and (D) are longitudinal comparisons for the 60–69 age group; (E) and (F) for the 70–79 age group; and (G) and (H) for the ≥ 80 age group. T0 = baseline, T1 = follow-up.

This study assessed the longitudinal stability and validity of proxy indicators of cognitive reserve in a rural Chinese community. Building upon cognitive reserve (CR) theory, our study identified a set of CR proxies. The confirmatory factor analysis (CFA) model demonstrated good fit at two separate time points, and the longitudinal structure confirmed the configural and metric invariance of measurement. Subsequent analysis revealed robust positive correlations between the CR model’s factor scores and cognitive function. To our knowledge, this is the first study to use measurement invariance for the longitudinal validation of a CR assessment model.

In recent years, Kartschmit et al. [14] summarized the shortcomings of currently available CR assessment tools and concluded that it was necessary to extend the investigation to different populations because of their different experiences in terms of CR proxy parameters. People from various cultural and lifestyle backgrounds may draw on a diverse variety of proxy parameters to improve CR. This study was conducted on older adults in rural communities in China, most of whom have low education and have been farmers all their lives; for this population, it is plausible that certain exposures relatively late in life can also contribute to CR. Similarly, studies have shown that occupation is not associated with cognition in subjects with low levels of education in developing countries [11]. The present investigation found high levels of CFA fit at baseline and at follow-up. A Healthy Brain Project cohort found consistent results, demonstrating the stability of the longitudinal structure of the CFA in CR [16]. Moreover, this study applied measurement invariance, aiming to ensure reliable conclusions about real CR changes across time. According to measurement invariance conventions and reporting, this study accepted both configural invariance and metric invariance. However, while full invariance is preferred, it may not always be achieved. This partial invariance could be attributed to various factors, such as changes in the sample composition or modifications in the measures employed [29]. Similarly, some cognition-related studies have failed to meet the most stringent invariance steps, finding significant changes in intercepts and residuals over time [30]. These differences may be attributed to sample characteristics, and there may indeed be real differences across time in the CR model.

Consistent with previous longitudinal studies of CR [ 31 , 32 ], this study supported the theory of the CR model that CR-related proxies were positively associated with cognitive function at either baseline or follow-up. Furthermore, our findings indicated that older adults in the cognitive maintenance group demonstrated a higher ΔCR, suggesting that the long-term accumulation of cognitive reserve may contribute to the preservation of cognitive performance at a stable level. This aligns with previous research suggesting that the maintenance of cognitive function is associated with cognitive reserve [ 33 ]. In the cognitive maintenance group, our findings revealed a notable decrease in the correlation coefficients between CR and MMSE scores over time, including different age groups. This intriguing observation may be aligned with the notion put forth by Montine et al. [ 34 ], suggesting that cognitive reserve “consumption” is manifested in cognitive performance prior to the onset of cognitive decline. Thus, the observed decline in correlation coefficients may indicate the utilization or “consumption” of cognitive reserve resources in maintaining cognitive performance at a stable level. Conversely, in the cognitive decline group, we observed no significant changes in the correlation coefficients between CR and MMSE scores over time. This finding suggests that, in the context of age-related cognitive decline, cognitive networks may undergo complex and dynamic processes involving the recruitment of additional neural resources for compensation. This observation aligns with the hypothesis proposed by Cabeza et al. [ 35 ] and implies that maintenance and compensation mechanisms could potentially occur simultaneously. Notably, the GEE model results similarly demonstrated statistically significant differences in the associations between CR and MMSE across cognitive subgroups, with interactions observed for both age groups and sexes. However, further research is needed to fully elucidate the intricate dynamics of these processes and their impact on the association between cognitive reserve and cognitive function. It is important to acknowledge that the concepts surrounding cognitive reserve remain subjects of ongoing debate and warrant further investigation through longitudinal studies. Furthermore, our study uncovered stable correlation coefficients between CR and MMSE scores in both the groups with increased and decreased CR over time. These intriguing findings suggest that changes in CR accumulation over time may not significantly impact the association with cognitive function. In other words, our results do not support the assumption that a greater accumulation of cognitive reserve necessarily translates to a stronger correlation with cognitive abilities. However, in the CR increased group, the intercept difference in cognitive level at different times was large, compared to the CR decreased group. Whether this is consistent with the existing evidence finding a more rapid rate of exacerbated cognitive decline in subjects with higher reserves requires further follow-up [ 36 ].

In terms of factor loading in longitudinal CFA, social support and hobbies have better factor loading than physical activity across time. Consistent with a cross-sectional study by the China Health and Retirement Project, older adults participating in hobby groups have better cognitive performance [ 37 ]. The low factor loading of physical activity in the reserve model may be related to the fact that older people in rural China spend more than half of each day with sedentary behavior [ 38 ]. While longer daily physical exercise would be expected to have positive effects on cognitive resilience, the influence of sedentary behavior over a significant portion of the day could potentially attenuate these effects. The majority of our participants, although having a background in farming, are not currently engaged in active agricultural work. This demographic shift from active farming to less physically demanding daily activities may contribute to the observed sedentary lifestyle, which is consistent with the lower factor loading of physical activity in our cognitive reserve model. Social support’s higher factor loading compared to physical activity likely results from rural communities’ strong social bonds, providing a steadier and more significant boost to cognitive reserve than inconsistent physical activity in individuals moving away from labor-intensive work. However, these findings should be interpreted with caution, acknowledging the need for further research to unravel the complex interplay between sedentary behavior, cognitive reserve, and the context of rural Chinese older adults. Additionally, the possibility of reverse causation, where cognitive decline might lead to reduced physical and social activities, calls for more in-depth investigation in future studies to clarify these intricate relationships.

A noteworthy strength of our study is that the utilization of latent variables allows more current CR-related factors to be captured at each wave, reducing recall bias and helping ensure that CR measurements are valid across life stages. Nevertheless, some limitations of our study should be noted. Firstly, while the cognitive reserve (CR) proxies used in our study are commonly utilized indicators, they may not fully capture the CR in our specific population of rural older adults. This could potentially limit the generalizability of our findings to other populations or settings. Secondly, the small number of very old participants and the limited number of follow-up waves may have affected the statistical power of our analysis and hindered our ability to capture the dynamic changes in CR over time. It is important to acknowledge that a larger sample size and a longer follow-up period would provide a more robust assessment of the relationship between CR and cognitive outcomes. Thirdly, the measurement of cognitive function using tools like the Mini-Mental State Examination (MMSE) is subject to measurement errors and may have ceiling effects, particularly in populations with high baseline cognitive performance. This could limit our ability to detect subtle changes in cognitive performance and affect the accuracy of the observed correlations. Lastly, the absence of cognition-related biological indicators, such as neuroimaging or biomarkers, and the limited scope of cognitive status measures used in our study may have restricted our comprehensive assessment of cognitive reserve and its association with cognitive function.

In conclusion, this study provided confirmatory evidence of the longitudinal stability and validity of proxy indicators of cognitive reserve in low-educated rural older adults and indicated that cognitive reserve factors correlate with cognitive performance. Our results highlight the importance of proxy variables for late-life CR throughout the lifespan in preserving cognitive function. They play a crucial role in promoting healthy aging among rural Chinese older adults.

Data availability

The datasets that support the findings of this study are available on request from the corresponding author (Jingyuan Yang, e-mail: [email protected]). The data is not publicly available due to privacy or ethical restrictions.

Beard JR, Officer A, de Carvalho IA, Sadana R, Pot AM, Michel JP, Lloyd-Sherlock P, Epping-Jordan JE, Peeters G, Mahanani WR, Thiyagarajan JA, Chatterji S. The world report on ageing and health: a policy framework for healthy ageing. Lancet. 2016;387(10033):2145–54. https://doi.org/10.1016/S0140-6736(15)00516-4 .


Jia J, Zuo X, Jia XF, Chu C, Wu L, Zhou A, Wei C, Tang Y, Li D, Qin W, Song H, Ma Q, Li J, Sun Y, Min B, Xue S, Xu E, Yuan Q, Wang M, Huang X, Fan C, Liu J, Ren Y, Jia Q, Wang Q, Jiao L, Xing Y, Wu X, China C, Aging Study G. (2016) Diagnosis and treatment of dementia in neurology outpatient departments of general hospitals in China. Alzheimers Dement 12 (4):446–53. https://doi.org/10.1016/j.jalz.2015.06.1892 .

Nichols E, Szoeke CEI, Vollset SE, Abbasi N, Abd-Allah F, Abdela J, Aichour MTE, Akinyemi RO, Alahdab F, Asgedom SW, Awasthi A, Barker-Collo SL, Baune BT, Béjot Y, Belachew AB, Bennett DA, Biadgo B, Bijani A, Bin Sayeed MS, Brayne C, Carpenter DO, Carvalho F, Catalá-López F, Cerin E, Choi J-YJ, Dang AK, Degefa MG, Djalalinia S, Dubey M, Duken EE, Edvardsson D, Endres M, Eskandarieh S, Faro A, Farzadfar F, Fereshtehnejad S-M, Fernandes E, Filip I, Fischer F, Gebre AK, Geremew D, Ghasemi-Kasman M, Gnedovskaya EV, Gupta R, Hachinski V, Hagos TB, Hamidi S, Hankey GJ, Haro JM, Hay SI, Irvani SSN, Jha RP, Jonas JB, Kalani R, Karch A, Kasaeian A, Khader YS, Khalil IA, Khan EA, Khanna T, Khoja TAM, Khubchandani J, Kisa A, Kissimova-Skarbek K, Kivimäki M, Koyanagi A, Krohn KJ, Logroscino G, Lorkowski S, Majdan M, Malekzadeh R, März W, Massano J, Mengistu G, Meretoja A, Mohammadi M, Mohammadi-Khanaposhtani M, Mokdad AH, Mondello S, Moradi G, Nagel G, Naghavi M, Naik G, Nguyen LH, Nguyen TH, Nirayo YL, Nixon MR, Ofori-Asenso R, Ogbo FA, Olagunju AT, Owolabi MO, Panda-Jonas S, Passos VMA, Pereira DM, Pinilla-Monsalve GD, Piradov MA, Pond CD, Poustchi H, Qorbani M, Radfar A, Reiner RC, Robinson SR, Roshandel G, Rostami A, Russ TC, Sachdev PS, Safari H, Safiri S, Sahathevan R, Salimi Y, Satpathy M, Sawhney M, Saylan M, Sepanlou SG, Shafieesabet A, Shaikh MA, Sahraian MA, Shigematsu M, Shiri R, Shiue I, Silva JP, Smith M, Sobhani S, Stein DJ, Tabarés-Seisdedos R, Tovani-Palone MR, Tran BX, Tran TT, Tsegay AT, Ullah I, Venketasubramanian N, Vlassov V, Wang Y-P, Weiss J, Westerman R, Wijeratne T, Wyper GMA, Yano Y, Yimer EM, Yonemoto N, Yousefifard M, Zaidi Z, Zare Z, Vos T, Feigin VL, Murray CJL. Global, regional, and national burden of alzheimer’s disease and other dementias, 1990–2016. Lancet Neurol. 2019;18(1):88–106. https://doi.org/10.1016/S1474-4422(18)30403-4 . A systematic analysis for the global burden of disease study 2016.


Chen X, Giles J, Yao Y, Yip W, Meng Q, Berkman L, Chen H, Chen X, Feng J, Feng Z, Glinskaya E, Gong J, Hu P, Kan H, Lei X, Liu X, Steptoe A, Wang G, Wang H, Wang H, Wang X, Wang Y, Yang L, Zhang L, Zhang Q, Wu J, Wu Z, Strauss J, Smith J, Zhao Y. The path to healthy ageing in China: a peking university-lancet commission. Lancet. 2022;400(10367):1967–2006. https://doi.org/10.1016/S0140-6736(22)01546-X .


Stern Y. What is cognitive reserve? Theory and research application of the reserve concept. J Int Neuropsychol Soc. 2002;8(3):448–60.

Stern Y. Cognitive reserve in ageing and alzheimer’s disease. Lancet Neurol. 2012;11(11):1006–12. https://doi.org/10.1016/S1474-4422(12)70191-6 .

Mondini S, Madella I, Zangrossi A, Bigolin A, Tomasi C, Michieletto M, Villani D, Di Giovanni G, Mapelli D. Cognitive reserve in dementia: implications for cognitive training. Front Aging Neurosci. 2016;8:84. https://doi.org/10.3389/fnagi.2016.00084 .

Jones RN, Manly J, Glymour MM, Rentz DM, Jefferson AL, Stern Y. Conceptual and measurement challenges in research on cognitive reserve. J Int Neuropsychol Soc. 2011;17(4):593–601. https://doi.org/10.1017/S1355617710001748 .

Xu H, Yang R, Qi X, Dintica C, Song R, Bennett DA, Xu W. Association of lifespan cognitive reserve indicator with dementia risk in the presence of brain pathologies. JAMA Neurol. 2019;76(10):1184–91. https://doi.org/10.1001/jamaneurol.2019.2455 .

Ye Q, Zhu H, Chen H, Liu R, Huang L, Chen H, Cheng Y, Qin R, Shao P, Xu H, Ma J, Xu Y. Effects of cognitive reserve proxies on cognitive function and frontoparietal control network in subjects with white matter hyperintensities: a cross-sectional functional magnetic resonance imaging study. CNS Neurosci Ther. 2022;28(6):932–41. https://doi.org/10.1111/cns.13824 .

Suemoto CK, Bertola L, Grinberg LT, Leite REP, Rodriguez RD, Santana PH, Pasqualucci CA, Jacob-Filho W, Nitrini R. Education, but not occupation, is associated with cognitive impairment: the role of cognitive reserve in a sample from a low-to-middle-income country. Alzheimers Dement. 2022;18(11):2079–87. https://doi.org/10.1002/alz.12542 .

Dekhtyar S, Wang HX, Scott K, Goodman A, Koupil I, Herlitz A. A life-course study of cognitive reserve in dementia–from childhood to old age. Am J Geriatr Psychiatry. 2015;23(9):885–96. https://doi.org/10.1016/j.jagp.2015.02.002 .

Ihle A, Oris M, Baeriswyl M, Zuber S, Cullati S, Maurer J, Kliegel M. The longitudinal relation between social reserve and smaller subsequent decline in executive functioning in old age is mediated via cognitive reserve. Int Psychogeriatr. 2021;33(5):461–7. https://doi.org/10.1017/S1041610219001789 .

Kartschmit N, Mikolajczyk R, Schubert T, Lacruz ME. Measuring cognitive reserve (cr) - a systematic review of measurement properties of cr questionnaires for the adult population. PLoS ONE. 2019;14(8):e0219851. https://doi.org/10.1371/journal.pone.0219851 .


Nogueira J, Gerardo B, Santana I, Simoes MR, Freitas S. The assessment of cognitive reserve: a systematic review of the most used quantitative measurement methods of cognitive reserve for aging. Front Psychol. 2022;13:847186. https://doi.org/10.3389/fpsyg.2022.847186 .

Summers MJ, Thow ME, Ward DD, Saunders NL, Klekociuk SZ, Imlach AR, Summers JJ, Vickers JC. Validation of a dynamic measure of current cognitive reserve in a longitudinally assessed sample of healthy older adults: the tasmanian healthy brain project. Assessment. 2019;26(4):737–42. https://doi.org/10.1177/1073191116685806 .

Millsap RE. Statistical approaches to measurement invariance. Routledge; 2012.

Wicherts JM. The importance of measurement invariance in neurocognitive ability testing. Clin Neuropsychol. 2016;30(7):1006–16. https://doi.org/10.1080/13854046.2016.1205136 .

Darwish H, Farran N, Assaad S, Chaaya M. Cognitive reserve factors in a developing country: education and occupational attainment lower the risk of dementia in a sample of Lebanese older adults. Front Aging Neurosci. 2018;10:277. https://doi.org/10.3389/fnagi.2018.00277 .

Chen X, Xue B, Hu Y. Cognitive reserve over life course and 7-year trajectories of cognitive decline: results from China health and retirement longitudinal study. BMC Public Health. 2022;22(1):231. https://doi.org/10.1186/s12889-022-12671-6 .

Xiao S-Y. The theoretical basis and research application of social support rating scale. J Clin Psychiatry. 1994;4(2):98–100.


Zheng Z. Twenty years’ follow-up on elder people’s health and quality of life. China Popul Dev Stud. 2020;3(4):297–309. https://doi.org/10.1007/s42379-020-00045-7 .

Wang L, Li S, Wei L, Ren B, Zhao M. The effects of exercise interventions on mental health in Chinese older adults. J Environ Public Health. 2022;2022(7265718). https://doi.org/10.1155/2022/7265718 .

Katzman R, Zhang MY, Ouang Ya Q, Wang ZY, Liu WT, Yu E, Wong SC, Salmon DP, Grant I. A Chinese version of the mini-mental state examination; impact of illiteracy in a Shanghai dementia survey. J Clin Epidemiol. 1988;41(10):971–8. https://doi.org/10.1016/0895-4356(88)90034-0 .


Folstein MF, Folstein SE, McHugh PR. Mini-mental state. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189–98. https://doi.org/10.1016/0022-3956(75)90026-6 .

West SG, Taylor AB, Wu W. Model fit and model selection in structural equation modeling. Handb Struct Equation Model. 2012;1:209–31.

Hirschfeld G, Brachel R. (2014) Multiple-group confirmatory factor analysis in r–a tutorial in measurement invariance with continuous and ordinal indicators. Practical Assessment, Research and Evaluation 19:7.

Ramseyer GC. Testing the difference between dependent correlations using the fisher z. J Experimental Educ. 1979;47(4):307–10.

Putnick DL, Bornstein MH. Measurement invariance conventions and reporting: the state of the art and future directions for psychological research. Dev Rev. 2016;41:71–90. https://doi.org/10.1016/j.dr.2016.06.004 .

Bertola L, Bensenor IM, Gross AL, Caramelli P, Barreto SM, Moreno AB, Griep RH, Viana MC, Lotufo PA, Suemoto CK. Longitudinal measurement invariance of neuropsychological tests in a diverse sample from the elsa-brasil study. Braz J Psychiatry. 2021;43(3):254–61. https://doi.org/10.1590/1516-4446-2020-0978 .

Li X, Song R, Qi X, Xu H, Yang W, Kivipelto M, Bennett DA, Xu W. Influence of cognitive reserve on cognitive trajectories: role of brain pathologies. Neurology. 2021;97(17):e1695–706. https://doi.org/10.1212/WNL.0000000000012728 .

Peeters G, Kenny RA, Lawlor B. Late life education and cognitive function in older adults. Int J Geriatr Psychiatry. 2020;35(6):633–9. https://doi.org/10.1002/gps.5281 .

Anaturk M, Kaufmann T, Cole JH, Suri S, Griffanti L, Zsoldos E, Filippini N, Singh-Manoux A, Kivimaki M, Westlye LT, Ebmeier KP, de Lange AG. Prediction of brain age and cognitive age: quantifying brain and cognitive maintenance in aging. Hum Brain Mapp. 2021;42(6):1626–40. https://doi.org/10.1002/hbm.25316 .

Montine TJ, Cholerton BA, Corrada MM, Edland SD, Flanagan ME, Hemmy LS, Kawas CH, White LR. Concepts for brain aging: resistance, resilience, reserve, and compensation. Alzheimers Res Ther. 2019;11(1):22. https://doi.org/10.1186/s13195-019-0479-y .

Cabeza R, Albert M, Belleville S, Craik FIM, Duarte A, Grady CL, Lindenberger U, Nyberg L, Park DC, Reuter-Lorenz PA, Rugg MD, Steffener J, Rajah MN. Maintenance, reserve and compensation: the cognitive neuroscience of healthy ageing. Nat Rev Neurosci. 2018;19(11):701–10. https://doi.org/10.1038/s41583-018-0068-2 .

Lee DH, Seo SW, Roh JH, Oh M, Oh JS, Oh SJ, Kim JS, Jeong Y. Effects of cognitive reserve in alzheimer’s disease and cognitively unimpaired individuals. Front Aging Neurosci. 2021;13:784054. https://doi.org/10.3389/fnagi.2021.784054 .

Fu C, Li Z, Mao Z. Association between social activities and cognitive function among the elderly in China: a cross-sectional study. Int J Environ Res Public Health. 2018;15(2). https://doi.org/10.3390/ijerph15020231 .

Han X, Wang X, Wang C, Wang P, Han X, Zhao M, Han Q, Jiang Z, Mao M, Chen S, Welmer AK, Launer LJ, Wang Y, Du Y, Qiu C. Accelerometer-assessed sedentary behaviour among Chinese rural older adults: patterns and associations with physical function. J Sports Sci. 2022;40(17):1940–9. https://doi.org/10.1080/02640414.2022.2122321 .


Acknowledgements

The authors would like to acknowledge the efforts of the participants who voluntarily gave their time to participate in the study.

The study was supported by the National Natural Science Foundation of China (Grant No. 81860598).

Author information

Hao Chen and Jin Hu contributed equally to this work.

Authors and Affiliations

Department of Epidemiology and Health Statistics, School of Public Health, The Key Laboratory of Environmental Pollution Monitoring and Disease Control, Guizhou Medical University, Guiyang, China

Hao Chen, Jin Hu, Shiqi Gui, Qiushuo Li, Jing Wang & Jingyuan Yang

School of Medicine and Health Management, Guizhou Medical University, Guiyang, China

The Third People’s Hospital of Guizhou Province, Guiyang, China


Contributions

HC conceived and wrote the original draft. JH, SG, QL, XY and JW took responsibility for data collection. HC and JH conducted the statistical analysis. JY revised the paper. All authors contributed to the final version of the paper and have read and approved the final manuscript.

Corresponding author

Correspondence to Jingyuan Yang .

Ethics declarations

Ethics approval and consent to participate.

Written informed consent was obtained from each participant before any study procedure was initiated, and the collection of data on human subjects was approved by the medical ethics committee of Guizhou Medical University (approval No. 2018-092). All methods in this study were performed in accordance with the guidelines of the Declaration of Helsinki.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Chen, H., Hu, J., Gui, S. et al. Longitudinal validation of cognitive reserve proxy measures: a cohort study in a rural Chinese community. Alz Res Therapy 16 , 87 (2024). https://doi.org/10.1186/s13195-024-01451-6


Received : 03 January 2024

Accepted : 04 April 2024

Published : 23 April 2024

DOI : https://doi.org/10.1186/s13195-024-01451-6


Keywords: Confirmatory factor analyses; Measurement invariance



Bringing an investigator’s eye to complex social challenges


Anna Russo likes puzzles. They require patience, organization, and a view of the big picture. She brings an investigator’s eye to big institutional and societal challenges whose solutions can have wide-ranging, long-term impacts.

Russo’s path to MIT began with questions. She didn’t have the whole picture yet. “I had no idea what I wanted to do with my life,” says Russo, who is completing her PhD in economics in 2024. “I was good at math and science and thought I wanted to be a doctor.”

While completing her undergraduate studies at Yale University, where she double majored in economics and applied math, Russo discovered a passion for problem-solving, where she could apply an analytical lens to answering the kinds of thorny questions whose solutions could improve policy. “Empirical research is fun and exciting,” Russo says.

After Yale, Russo considered what to do next. She worked as a full-time research assistant with MIT economist Amy Finkelstein . Russo’s work with Finkelstein led her toward identifying, studying, and developing answers to complex questions. 

“My research combines ideas from two fields of economic inquiry — public finance and industrial organization — and applies them to questions about the design of environmental and health care policy,” Russo says. “I like the way economists think analytically about social problems.”

Narrowing her focus

Studying with and being advised by renowned economists as both an undergraduate and a doctoral student helped Russo narrow her research focus, fitting more pieces into the puzzle. “What drew me to MIT was its investment in its graduate students,” Russo says.

Economic research meant digging into policy questions, identifying market failures, and proposing solutions. Doctoral study allowed Russo to assemble data to rigorously follow each line of inquiry.

“Doctoral study means you get to write about something you’re really interested in,” Russo notes. This led her to study policy responses to climate change adaptation and mitigation. 

“In my first year, I worked on a project exploring the notion that floodplain regulation design doesn’t do a good job of incentivizing the right level of development in flood-prone areas,” she says. “How can economists help governments convince people to act in society’s best interest?”

It’s important to understand institutional details, Russo adds, which can help investigators identify and implement solutions. 

“Feedback, advice, and support from faculty were crucial as I grew as a researcher at MIT,” she says. Beyond her two main MIT advisors, Finkelstein and economist Nikhil Agarwal — educators she describes as “phenomenal, dedicated advisors and mentors” — Russo interacted regularly with faculty across the department. 

Russo later discovered another challenge she hoped to solve: inefficiencies in conservation and carbon offset programs. She set her sights on the United States Department of Agriculture’s Conservation Reserve Program because she believes it and programs like it can be improved. 

The CRP is a land conservation plan administered by USDA’s Farm Service Agency. In exchange for a yearly rental payment, farmers enrolled in the program agree to remove environmentally sensitive land from agricultural production and plant species that will improve environmental health and quality.

“I think we can tweak the program’s design to improve cost-effectiveness,” Russo says. “There’s a trove of data available.” The data include information like auction participants’ bids in response to well-specified auction rules, which Russo links to satellite data measuring land use outcomes. Understanding how landowners bid in CRP auctions can help identify and improve the program’s function. 

“We may be able to improve targeting and achieve more cost-effective conservation by adjusting the CRP’s scoring system,” Russo argues. Opportunities may exist to scale the incremental changes under study for other conservation programs and carbon offset markets more generally.  

Economics, Russo believes, can help us conceptualize problems and recommend effective alternative solutions.

The next puzzle

Russo wants to find her next challenge while continuing her research. She plans to continue her work as a junior fellow at the Harvard Society of Fellows, after which she’ll join the Harvard Department of Economics as an assistant professor. Russo also plans to continue helping other budding economists since she believes in the importance of supporting other students.   

Russo’s advisors are some of her biggest supporters. 

Finkelstein emphasizes Russo’s curiosity, enthusiasm, and energy as key drivers in her success. “Her genuine curiosity and interest in getting to the bottom of a problem with the data — with an econometric analysis, with a modeling issue — is the best antidote for [the stress that can be associated with research],” Finkelstein says. “It's a key ingredient in her ability to produce important and credible work.”

“She's also incredibly generous with her time and advice,” Finkelstein continues, “whether it's helping an undergraduate research assistant with her senior thesis, or helping an advisor such as myself navigate a data access process she's previously been through.”

“Instead of an advisor-advisee relationship, working with her on a thesis felt more like a collaboration between equals,” Agarwal adds. “[She] has the maturity and smarts to produce pathbreaking research.”

“Doctoral study is an opportunity for students to find their paths collaboratively,” Russo says. “If I can help someone else solve a small piece of their puzzle, that’s a huge positive. Research is a series of many, many small steps forward.” 

Identifying important causes for further investigation and study will always be important to Russo. “I also want to dig into some other market that’s not working well and figure out how to make it better,” she says. “Right now I’m really excited about understanding California wildfire mitigation.” 

Puzzles are made to be solved, after all.


News Release

Wednesday, April 24, 2024

Gene-based therapy restores cellular development and function in brain cells from people with Timothy syndrome

NIH-supported study shows potential treatment pathway for neurodevelopmental disorder.

In a proof-of-concept study, researchers demonstrated the effectiveness of a potential new therapy for Timothy syndrome, an often life-threatening and rare genetic disorder that affects a wide range of bodily systems, leading to severe cardiac, neurological, and psychiatric symptoms as well as physical differences such as webbed fingers and toes. The treatment restored typical cellular function in 3D structures created from cells of people with Timothy syndrome, known as organoids, which can mimic the function of cells in the body. These results could serve as the foundation for new treatment approaches for the disorder. The study, supported by the National Institutes of Health (NIH), appears in the journal Nature.

"Not only do these findings offer a potential road map to treat Timothy syndrome, but research into this condition also offers broader insights into other rare genetic conditions and mental disorders," said Joshua A. Gordon, M.D., Ph.D., director of the National Institute of Mental Health, part of NIH.

Sergiu Pasca, M.D., and colleagues at Stanford University, Stanford, California, collected cells from three people with Timothy syndrome and three people without the condition and examined a specific region of the CACNA1C gene that harbors a mutation causing Timothy syndrome. They then tested whether antisense oligonucleotides (ASOs), small pieces of genetic material that bind to gene products and promote the production of a protein not carrying the mutation, could restore the cellular deficits underlying the syndrome.

In the lab, researchers applied the ASOs to human brain tissue structures grown from human cells, known as organoids, and tissue structures formed through the integration of multiple cell types, known as assembloids. They also analyzed organoids transplanted into the brains of rats. All of the methods were created using cells from people with Timothy syndrome. Applying the ASOs restored normal functioning in the cells, and the therapy's effects were dose-dependent and lasted at least 90 days.

"Our study showed that we can correct cellular deficits associated with Timothy syndrome," said Dr. Pasca. "We are now actively working towards translating these findings into the clinic, bringing hope that one day we may have an effective treatment for this devastating neurodevelopmental disorder.

The genetic mutation that causes Timothy syndrome affects the exon 8A region of the CACNA1C gene. The gene contains instructions for controlling calcium channels—pores in the cell critical for cellular communication. The CACNA1C gene in humans also contains another region (exon 8) that controls calcium channels but is not impacted in Timothy syndrome type 1. The ASOs tested in this study decreased the use of the mutated exon 8A and increased reliance on the nonaffected exon 8, restoring normal calcium channel functioning.

Grants: MH115012, MH119319

About the National Institute of Mental Health (NIMH):  The mission of the NIMH is to transform the understanding and treatment of mental illnesses through basic and clinical research, paving the way for prevention, recovery and cure. For more information, visit the  NIMH website .

About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit www.nih.gov .

NIH…Turning Discovery Into Health®

COMMENTS

  1. Validity

    Internal Validity (Causal Inference): Example 1: In an experiment, a researcher manipulates the independent variable (e.g., a new drug) and controls for other variables to ensure that any observed effects on the dependent variable (e.g., symptom reduction) are indeed due to the manipulation.

  2. The 4 Types of Validity in Research

    The 4 Types of Validity in Research | Definitions & Examples. Published on September 6, 2019 by Fiona Middleton. Revised on June 22, 2023. Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid.

  3. How do we assess reliability and validity?

    Criterion-related validity focuses on the degree to which a measure correlates with some chosen criterion measure of the same construct (relations to other variables). There are two broad classes of this form of validity. Predictive validity: the test information is to be used to forecast future criterion performance. Example: use spelling test scores to predict reading test scores, the validity ... (A short correlation sketch illustrating this idea appears after this list.)

  4. Reliability vs. Validity in Research

    Reliability is about the consistency of a measure, and validity is about the accuracy of a measure. It's important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research. Failing to do so can lead to several types of research ...

  5. Validity in Research and Psychology: Types & Examples

    In this vein, there are many different types of validity and ways of thinking about it. Let's take a look at several of the more common types. Each kind is a line of evidence that can help support or refute a test's overall validity. In this post, learn about face, content, criterion, discriminant, concurrent, predictive, and construct ...

  6. Validity In Psychology Research: Types & Examples

    In psychology research, validity refers to the extent to which a test or measurement tool accurately measures what it's intended to measure. It ensures that the research findings are genuine and not due to extraneous factors. Validity can be categorized into different types, including construct validity (measuring the intended abstract trait), internal validity (ensuring causal conclusions ...

  7. Validity (statistics)

    Validity is the main extent to which a concept, conclusion, or measurement is well-founded and likely corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool (for example, a test in education) is the degree to which the tool measures what it claims to measure. Validity is based on the strength of a ...

  8. Construct Validity

    Construct Validity | Definition, Types, & Examples. Published on February 17, 2022 by Pritha Bhandari. Revised on June 22, 2023. Construct validity is about how well a test measures the concept it was designed to evaluate. It's crucial to establishing the overall validity of a method. Assessing construct validity is especially important when you're researching something that can't be ...

  9. Reliability vs Validity in Research

    Revised on 10 October 2022. Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method, technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure. It's important to consider reliability and validity when you are ...

  10. Validity

    Research validity in surveys relates to the extent to which the survey measures the right elements that need to be measured. In simple terms, validity refers to how well an instrument measures what it is intended to measure. Reliability alone is not enough; measures need to be reliable as well as valid. For example, if a weight measuring scale ...

  11. Reliability and Validity of Measurement

    The answer is that they conduct research using the measure to confirm that the scores make sense based on their understanding of the construct being measured. ... across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to. ... (A short internal-consistency calculation using Cronbach's alpha appears after this list.)

  12. Validity & Reliability In Research

    In simple terms, validity (also called "construct validity") is all about whether a research instrument accurately measures what it's supposed to measure. For example, let's say you have a set of Likert scales that are supposed to quantify someone's level of overall job satisfaction. If this set of scales focused purely on only one ...

  13. PDF VALIDITY OF QUANTITATIVE RESEARCH

    VALIDITY OF QUANTITATIVE RESEARCH. Recall "the basic aim of science is to explain natural phenomena. Such explanations are called theories" (Kerlinger, 1986, p. 8). Theories have varying degrees of truth. Validity is the best approximation to the truth or falsity of propositions (Cook & Campbell, 1979).

  14. Reliability and Validity

    Reliability refers to the consistency of the measurement. Reliability shows how trustworthy the score of the test is. If the collected data show the same results after being tested using various methods and sample groups, the information is reliable. Reliability is a precondition for validity, but a reliable method is not automatically valid. Example: If you weigh yourself on a ...

  15. Reliability and Validity in Research: Definitions, Examples

    Reliability is a measure of the stability or consistency of test scores. You can also think of it as the ability for a test or research findings to be repeatable. For example, a medical thermometer is a reliable tool that would measure the correct temperature each time it is used. In the same way, a reliable math test will accurately measure ...

  16. What Is Content Validity?

    Revised on June 22, 2023. Content validity evaluates how well an instrument (like a test) covers all relevant parts of the construct it aims to measure. Here, a construct is a theoretical concept, theme, or idea: in particular, one that cannot usually be measured directly. Content validity is one of the four types of measurement validity.

  17. (PDF) Importance of Reliability and Validity in Research

    Reliability is used in qualitative research and is the degree to which an assessment tool is free from errors, produces consistent results, and is a necessary component of validity (Haradhan ...

  18. Design and Implementation Content Validity Study: Development of an

    Introduction: In most studies, researchers study complex constructs for which valid and reliable instruments are needed.[1] Validity, which is defined as the ability of an instrument to measure the properties of the construct under study,[2] is a vital factor in selecting or applying an instrument. It is commonly assessed in three forms: content, construct, and criterion-related ... (A brief content validity index calculation appears after this list.)

  19. Content Validity in Research: Definition & Examples

    Content validity is a type of measurement validity that demonstrates how well a measure covers the construct it is meant to represent. It is important for researchers to establish content validity in order to ensure that their study is measuring what it intends to measure. There are several ways to establish content ...

  20. (PDF) Validity and Reliability in Quantitative Research

    The validity and reliability of the scales used in research are important factors that enable the research to yield sound results. For this reason, it is useful to understand how the reliability ...

  21. FORMULA SCORING AND VALIDITY

    A reduction in validity from .60 to .57 would result from abandoning formula scoring. By (3), this reduction in validity is the same as would occur if the test length were halved (K = 0.48). In this case, failure to use formula scoring produces a decrement in validity equivalent to throwing away one-half of the test items and one-half of the testing time, or ...

  22. Longitudinal validation of cognitive reserve proxy measures: a cohort

    Background: While evidence supports cognitive reserve (CR) in preserving cognitive function, longitudinal validation of CR proxies, including later-life factors, remains scarce. This study aims to validate CR's stability over time and its relation to cognitive function in rural Chinese older adults. Methods: Within the project on the health status of rural older adults (HSRO), the survey ...

  23. What Is Criterion Validity?

    Revised on June 22, 2023. Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity measures tests and criterion variables in the present, while predictive validity measures those in the future.

  24. Bringing an investigator's eye to complex social challenges

    "My research combines ideas from two fields of economic inquiry — public finance and industrial organization — and applies them to questions about the design of environmental and health care policy," Russo says. "I like the way economists think analytically about social problems." Narrowing her focus

  25. What Is Predictive Validity?

    Revised on June 22, 2023. Predictive validity refers to the ability of a test or other measurement to predict a future outcome. Here, an outcome can be a behavior, performance, or even disease that occurs at some point in the future. Example: Predictive validity. A pre-employment test has predictive validity when it can accurately identify the ...

  26. What Is Convergent Validity?

    Convergent validity refers to how closely a test is related to other tests that measure the same (or similar) constructs. Here, a construct is a behavior, attitude, or concept, particularly one that is not directly observable. Ideally, two tests measuring the same construct, such as stress, should have a moderate to high correlation.
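
The criterion-related, predictive, and convergent validity snippets above all come down to the same calculation: a correlation between two sets of scores. The sketch below uses invented scores and only Python's standard library; it is an illustration of the idea, not code from any of the cited sources.

    # Hypothetical data: a predictor (spelling test) and a later criterion
    # (reading test) for the same eight students. All numbers are invented.
    from statistics import correlation  # Pearson's r; requires Python 3.10+

    spelling_scores = [72, 85, 90, 65, 78, 88, 95, 70]   # predictor, time 1
    reading_scores  = [68, 82, 91, 60, 75, 85, 97, 66]   # criterion, time 2

    # Predictive (criterion-related) validity: correlate the predictor with a
    # criterion collected later. A high r is evidence of predictive validity.
    r_predictive = correlation(spelling_scores, reading_scores)
    print(f"Predictive validity coefficient: r = {r_predictive:.2f}")

    # Convergent validity uses the same arithmetic, but both measures target
    # the same construct at the same time (e.g., two stress questionnaires).
    stress_scale_a = [12, 20, 15, 30, 25, 18, 22, 27]
    stress_scale_b = [14, 22, 13, 28, 27, 17, 20, 29]
    r_convergent = correlation(stress_scale_a, stress_scale_b)
    print(f"Convergent validity coefficient: r = {r_convergent:.2f}")

In both cases the coefficient is evidence for validity, not proof of it; its meaning still depends on how well the criterion or comparison measure represents the construct.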
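
The internal-consistency reliability mentioned in the "Reliability and Validity of Measurement" snippet is commonly summarized with Cronbach's alpha. Below is a minimal, hypothetical calculation (standard-library Python; the questionnaire data are invented, not taken from any cited source).

    # Hypothetical data: 6 respondents answering a 4-item Likert questionnaire.
    from statistics import pvariance

    responses = [            # rows = respondents, columns = items
        [4, 5, 4, 4],
        [3, 3, 2, 3],
        [5, 4, 5, 5],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
        [3, 2, 3, 3],
    ]

    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # scores grouped per item
    item_variances = [pvariance(col) for col in items]
    total_variance = pvariance([sum(row) for row in responses])

    # Cronbach's alpha = k/(k-1) * (1 - sum of item variances / total variance)
    alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
    print(f"Cronbach's alpha: {alpha:.2f}")    # values near 1 = high internal consistency

A high alpha supports reliability only; as several snippets above stress, a reliable scale still has to be validated against the construct it is intended to measure.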
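
Content validity (snippets 16, 18, and 19) is often quantified with a content validity index (CVI): the proportion of expert reviewers who judge each item relevant to the construct. The sketch below uses invented expert ratings and standard-library Python; it illustrates the usual I-CVI and S-CVI definitions rather than the procedure of any cited study.

    # Hypothetical data: 5 experts rate each of 4 items for relevance (1-4 scale).
    ratings = {                 # item -> one rating per expert
        "item_1": [4, 4, 3, 4, 4],
        "item_2": [3, 4, 4, 3, 4],
        "item_3": [2, 3, 2, 3, 2],
        "item_4": [4, 3, 4, 4, 3],
    }

    # Item-level CVI (I-CVI): share of experts giving a relevance rating of 3 or 4.
    i_cvi = {item: sum(score >= 3 for score in scores) / len(scores)
             for item, scores in ratings.items()}

    # Scale-level CVI (S-CVI/Ave): mean of the item-level indices.
    s_cvi_ave = sum(i_cvi.values()) / len(i_cvi)

    for item, value in i_cvi.items():
        print(f"{item}: I-CVI = {value:.2f}")
    print(f"S-CVI/Ave = {s_cvi_ave:.2f}")

Items with a low I-CVI (here item_3) are candidates for revision or removal before the instrument is used.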