What Is Quantitative Research? | Definition, Uses & Methods

Published on June 12, 2020 by Pritha Bhandari. Revised on June 22, 2023.

Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations.

Quantitative research is the opposite of qualitative research, which involves collecting and analyzing non-numerical data (e.g., text, video, or audio).

Quantitative research is widely used in the natural and social sciences: biology, chemistry, psychology, economics, sociology, marketing, etc. Typical quantitative research questions include:

  • What is the demographic makeup of Singapore in 2020?
  • How has the average temperature changed globally over the last century?
  • Does environmental pollution affect the prevalence of honey bees?
  • Does working from home increase productivity for people with long commutes?

Table of contents

  • Quantitative research methods
  • Quantitative data analysis
  • Advantages of quantitative research
  • Disadvantages of quantitative research
  • Other interesting articles
  • Frequently asked questions about quantitative research

Quantitative research methods

You can use quantitative research methods for descriptive, correlational or experimental research.

  • In descriptive research, you simply seek an overall summary of your study variables.
  • In correlational research, you investigate relationships between your study variables.
  • In experimental research, you systematically examine whether there is a cause-and-effect relationship between variables.

Correlational and experimental research can both be used to formally test hypotheses, or predictions, using statistics. The results may be generalized to broader populations based on the sampling method used.

To collect quantitative data, you will often need to use operational definitions that translate abstract concepts (e.g., mood) into observable and quantifiable measures (e.g., self-ratings of feelings and energy levels).

Note that quantitative research is at risk for certain research biases, including information bias, omitted variable bias, sampling bias, and selection bias. Be aware of these potential biases as you collect and analyze your data so that they don't distort your results.


Quantitative data analysis

Once data is collected, you may need to process it before it can be analyzed. For example, survey and test data may need to be transformed from words to numbers. Then, you can use statistical analysis to answer your research questions.

Descriptive statistics will give you a summary of your data and include measures of averages and variability. You can also use graphs, scatter plots and frequency tables to visualize your data and check for any trends or outliers.

Using inferential statistics, you can make predictions or generalizations based on your data. You can test your hypothesis or use your sample data to estimate the population parameter.

For example, suppose you are comparing procrastination levels in two groups. First, you use descriptive statistics to get a summary of the data: you find the mean (average) and the mode (most frequent rating) of procrastination for each group, and plot the data to see if there are any outliers.

You can also assess the reliability and validity of your data collection methods to indicate how consistently and accurately your methods actually measured what you wanted them to.
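To make this workflow concrete, here is a minimal sketch in Python of the descriptive-then-inferential sequence described above. The procrastination ratings and the two groups are hypothetical, invented purely for illustration.

```python
# A minimal sketch of the workflow above, assuming hypothetical
# self-rated procrastination scores (1-10) for two groups.
from statistics import mean, mode
from scipy import stats

group_a = [4, 6, 5, 7, 5, 6, 5, 4, 6, 5]  # e.g., short commutes
group_b = [6, 8, 7, 7, 9, 6, 8, 7, 8, 7]  # e.g., long commutes

# Descriptive statistics: summarize each group's ratings
print("Group A: mean =", mean(group_a), "| mode =", mode(group_a))
print("Group B: mean =", mean(group_b), "| mode =", mode(group_b))

# Inferential statistics: an independent-samples t-test asks whether
# the difference in group means would likely hold in the population
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```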

Advantages of quantitative research

Quantitative research is often used to standardize data collection and generalize findings. Strengths of this approach include:

  • Replication

Repeating the study is possible because of standardized data collection protocols and tangible definitions of abstract concepts.

  • Direct comparisons of results

The study can be reproduced in other cultural settings, at other times, or with different groups of participants. Results can be compared statistically.

  • Large samples

Data from large samples can be processed and analyzed using reliable and consistent procedures through quantitative data analysis.

  • Hypothesis testing

Using formalized and established hypothesis testing procedures means that you have to carefully consider and report your research variables, predictions, data collection and testing methods before coming to a conclusion.

Disadvantages of quantitative research

Despite the benefits of quantitative research, it is sometimes inadequate in explaining complex research topics. Its limitations include:

  • Superficiality

Using precise and restrictive operational definitions may inadequately represent complex concepts. For example, the concept of mood may be represented with just a number in quantitative research, but explained with elaboration in qualitative research.

  • Narrow focus

Predetermined variables and measurement procedures can mean that you ignore other relevant observations.

  • Structural bias

Despite standardized procedures, structural biases can still affect quantitative research. Missing data, imprecise measurements, or inappropriate sampling methods are biases that can lead to the wrong conclusions.

  • Lack of context

Quantitative research often uses unnatural settings like laboratories or fails to consider historical and cultural contexts that may affect data collection and results.


Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Chi square goodness of fit test
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Frequently asked questions about quantitative research

What is the difference between quantitative and qualitative research?

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

What is data collection?

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

What is operationalization?

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data, it's important to consider how you will operationalize the variables that you want to measure.

What is the difference between reliability and validity?

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

What is hypothesis testing?

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bhandari, P. (2023, June 22). What Is Quantitative Research? | Definition, Uses & Methods. Scribbr. Retrieved April 2, 2024, from https://www.scribbr.com/methodology/quantitative-research/


Quantitative Research – Methods, Types and Analysis

What is Quantitative Research

Quantitative research is a type of research that collects and analyzes numerical data to test hypotheses and answer research questions. This research typically involves a large sample size and uses statistical analysis to make inferences about a population based on the data collected. It often involves the use of surveys, experiments, or other structured data collection methods to gather quantitative data.

Quantitative Research Methods

Common quantitative research methods include the following:

Descriptive Research Design

Descriptive research design is used to describe the characteristics of a population or phenomenon being studied. This research method is used to answer the questions of what, where, when, and how. Descriptive research designs use a variety of methods such as observation, case studies, and surveys to collect data. The data is then analyzed using statistical tools to identify patterns and relationships.

Correlational Research Design

Correlational research design is used to investigate the relationship between two or more variables. Researchers use correlational research to determine whether a relationship exists between variables and to what extent they are related. This research method involves collecting data from a sample and analyzing it using statistical tools such as correlation coefficients.

Quasi-experimental Research Design

Quasi-experimental research design is used to investigate cause-and-effect relationships between variables. This research method is similar to experimental research design, but it lacks full control over the independent variable. Researchers use quasi-experimental research designs when it is not feasible or ethical to manipulate the independent variable.

Experimental Research Design

Experimental research design is used to investigate cause-and-effect relationships between variables. This research method involves manipulating the independent variable and observing the effects on the dependent variable. Researchers use experimental research designs to test hypotheses and establish cause-and-effect relationships.

Survey Research

Survey research involves collecting data from a sample of individuals using a standardized questionnaire. This research method is used to gather information on attitudes, beliefs, and behaviors of individuals. Researchers use survey research to collect data quickly and efficiently from a large sample size. Survey research can be conducted through various methods such as online, phone, mail, or in-person interviews.

Quantitative Research Analysis Methods

Here are some commonly used quantitative research analysis methods:

Statistical Analysis

Statistical analysis is the most common quantitative research analysis method. It involves using statistical tools and techniques to analyze the numerical data collected during the research process. Statistical analysis can be used to identify patterns, trends, and relationships between variables, and to test hypotheses and theories.

Regression Analysis

Regression analysis is a statistical technique used to analyze the relationship between one dependent variable and one or more independent variables. Researchers use regression analysis to identify and quantify the impact of independent variables on the dependent variable.
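As a rough illustration, the sketch below fits a simple linear regression with scipy's linregress; the hours-studied and exam-score values are hypothetical.

```python
# A minimal sketch of simple linear regression with scipy; the
# hours-studied and exam-score values below are hypothetical.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_scores = [52, 55, 61, 64, 70, 72, 79, 83]

# Fit the line: exam_score = slope * hours_studied + intercept
result = stats.linregress(hours_studied, exam_scores)
print(f"slope     = {result.slope:.2f} points per extra hour")
print(f"intercept = {result.intercept:.2f}")
print(f"R-squared = {result.rvalue ** 2:.3f}")  # variance explained
print(f"p-value   = {result.pvalue:.4f}")       # significance of the slope
```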

Factor Analysis

Factor analysis is a statistical technique used to identify underlying factors that explain the correlations among a set of variables. Researchers use factor analysis to reduce a large number of variables to a smaller set of factors that capture the most important information.
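For illustration, here is a minimal exploratory factor analysis sketch using scikit-learn's FactorAnalysis. The "survey responses" are random stand-in data, so no meaningful factor structure should be expected; the sketch only shows the mechanics.

```python
# A minimal sketch of exploratory factor analysis with scikit-learn.
# The response matrix is random stand-in data, not real survey answers.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 6))  # 200 respondents, 6 survey items

# Try to summarize the 6 items with 2 underlying factors
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(responses)

# Loadings show how strongly each item relates to each factor
print(fa.components_.shape)  # (2 factors, 6 items)
print(fa.components_.round(2))
```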

Structural Equation Modeling

Structural equation modeling is a statistical technique used to test complex relationships between variables. It involves specifying a model that includes both observed and unobserved variables, and then using statistical methods to test the fit of the model to the data.

Time Series Analysis

Time series analysis is a statistical technique used to analyze data that is collected over time. It involves identifying patterns and trends in the data, as well as any seasonal or cyclical variations.
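The sketch below decomposes a synthetic monthly series into trend, seasonal, and residual components using statsmodels; the "sales" figures are generated, not real.

```python
# A minimal sketch of time series decomposition with statsmodels,
# using synthetic monthly sales built from a trend, a yearly cycle,
# and random noise.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

months = pd.date_range("2018-01-01", periods=60, freq="MS")
values = (np.linspace(100, 160, 60)                      # upward trend
          + 10 * np.sin(np.arange(60) * 2 * np.pi / 12)  # yearly cycle
          + np.random.default_rng(1).normal(0, 3, 60))   # noise
sales = pd.Series(values, index=months)

# Separate the series into trend, seasonal, and residual components
result = seasonal_decompose(sales, model="additive", period=12)
print(result.seasonal.head(12))  # the repeating 12-month pattern
print(result.trend.dropna().head())
```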

Multilevel Modeling

Multilevel modeling is a statistical technique used to analyze data that is nested within multiple levels. For example, researchers might use multilevel modeling to analyze data that is collected from individuals who are nested within groups, such as students nested within schools.
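Continuing the students-within-schools example, here is a minimal random-intercept mixed model in statsmodels; all of the data are simulated for illustration.

```python
# A minimal sketch of a random-intercept multilevel model with
# statsmodels: simulated students nested within simulated schools.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_schools, n_students = 20, 30
school = np.repeat(np.arange(n_schools), n_students)
school_effect = rng.normal(0, 5, n_schools)[school]  # school-level variation
hours = rng.uniform(0, 10, n_schools * n_students)
score = 50 + 3 * hours + school_effect + rng.normal(0, 8, len(hours))
df = pd.DataFrame({"score": score, "hours": hours, "school": school})

# Fixed effect of study hours, with a random intercept per school
model = smf.mixedlm("score ~ hours", df, groups=df["school"]).fit()
print(model.summary())
```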

Applications of Quantitative Research

Quantitative research has many applications across a wide range of fields. Here are some common examples:

  • Market Research: Quantitative research is used extensively in market research to understand consumer behavior, preferences, and trends. Researchers use surveys, experiments, and other quantitative methods to collect data that can inform marketing strategies, product development, and pricing decisions.
  • Health Research: Quantitative research is used in health research to study the effectiveness of medical treatments, identify risk factors for diseases, and track health outcomes over time. Researchers use statistical methods to analyze data from clinical trials, surveys, and other sources to inform medical practice and policy.
  • Social Science Research: Quantitative research is used in social science research to study human behavior, attitudes, and social structures. Researchers use surveys, experiments, and other quantitative methods to collect data that can inform social policies, educational programs, and community interventions.
  • Education Research: Quantitative research is used in education research to study the effectiveness of teaching methods, assess student learning outcomes, and identify factors that influence student success. Researchers use experimental and quasi-experimental designs, as well as surveys and other quantitative methods, to collect and analyze data.
  • Environmental Research: Quantitative research is used in environmental research to study the impact of human activities on the environment, assess the effectiveness of conservation strategies, and identify ways to reduce environmental risks. Researchers use statistical methods to analyze data from field studies, experiments, and other sources.

Characteristics of Quantitative Research

Here are some key characteristics of quantitative research:

  • Numerical data: Quantitative research involves collecting numerical data through standardized methods such as surveys, experiments, and observational studies. This data is analyzed using statistical methods to identify patterns and relationships.
  • Large sample size: Quantitative research often involves collecting data from a large sample of individuals or groups in order to increase the reliability and generalizability of the findings.
  • Objective approach: Quantitative research aims to be objective and impartial in its approach, focusing on the collection and analysis of data rather than personal beliefs, opinions, or experiences.
  • Control over variables: Quantitative research often involves manipulating variables to test hypotheses and establish cause-and-effect relationships. Researchers aim to control for extraneous variables that may impact the results.
  • Replicable: Quantitative research aims to be replicable, meaning that other researchers should be able to conduct similar studies and obtain similar results using the same methods.
  • Statistical analysis: Quantitative research involves using statistical tools and techniques to analyze the numerical data collected during the research process. Statistical analysis allows researchers to identify patterns, trends, and relationships between variables, and to test hypotheses and theories.
  • Generalizability: Quantitative research aims to produce findings that can be generalized to larger populations beyond the specific sample studied. This is achieved through the use of random sampling methods and statistical inference.

Examples of Quantitative Research

Here are some examples of quantitative research in different fields:

  • Market Research: A company conducts a survey of 1000 consumers to determine their brand awareness and preferences. The data is analyzed using statistical methods to identify trends and patterns that can inform marketing strategies.
  • Health Research: A researcher conducts a randomized controlled trial to test the effectiveness of a new drug for treating a particular medical condition. The study involves collecting data from a large sample of patients and analyzing the results using statistical methods.
  • Social Science Research: A sociologist conducts a survey of 500 people to study attitudes toward immigration in a particular country. The data is analyzed using statistical methods to identify factors that influence these attitudes.
  • Education Research: A researcher conducts an experiment to compare the effectiveness of two different teaching methods for improving student learning outcomes. The study involves randomly assigning students to different groups and collecting data on their performance on standardized tests.
  • Environmental Research: A team of researchers conducts a study to investigate the impact of climate change on the distribution and abundance of a particular species of plant or animal. The study involves collecting data on environmental factors and population sizes over time and analyzing the results using statistical methods.
  • Psychology: A researcher conducts a survey of 500 college students to investigate the relationship between social media use and mental health. The data is analyzed using statistical methods to identify correlations and potential causal relationships.
  • Political Science: A team of researchers conducts a study to investigate voter behavior during an election. They use survey methods to collect data on voting patterns, demographics, and political attitudes, and analyze the results using statistical methods.

How to Conduct Quantitative Research

Here is a general overview of how to conduct quantitative research:

  • Develop a research question: The first step in conducting quantitative research is to develop a clear and specific research question. This question should be based on a gap in existing knowledge, and should be answerable using quantitative methods.
  • Develop a research design: Once you have a research question, you will need to develop a research design. This involves deciding on the appropriate methods to collect data, such as surveys, experiments, or observational studies. You will also need to determine the appropriate sample size, data collection instruments, and data analysis techniques.
  • Collect data: The next step is to collect data. This may involve administering surveys or questionnaires, conducting experiments, or gathering data from existing sources. It is important to use standardized methods to ensure that the data is reliable and valid.
  • Analyze data: Once the data has been collected, it is time to analyze it. This involves using statistical methods to identify patterns, trends, and relationships between variables. Common statistical techniques include correlation analysis, regression analysis, and hypothesis testing.
  • Interpret results: After analyzing the data, you will need to interpret the results. This involves identifying the key findings, determining their significance, and drawing conclusions based on the data.
  • Communicate findings: Finally, you will need to communicate your findings. This may involve writing a research report, presenting at a conference, or publishing in a peer-reviewed journal. It is important to clearly communicate the research question, methods, results, and conclusions to ensure that others can understand and replicate your research.

When to use Quantitative Research

Here are some situations when quantitative research can be appropriate:

  • To test a hypothesis: Quantitative research is often used to test a hypothesis or a theory. It involves collecting numerical data and using statistical analysis to determine if the data supports or refutes the hypothesis.
  • To generalize findings: If you want to generalize the findings of your study to a larger population, quantitative research can be useful. This is because it allows you to collect numerical data from a representative sample of the population and use statistical analysis to make inferences about the population as a whole.
  • To measure relationships between variables: If you want to measure the relationship between two or more variables, such as the relationship between age and income, or between education level and job satisfaction, quantitative research can be useful. It allows you to collect numerical data on both variables and use statistical analysis to determine the strength and direction of the relationship.
  • To identify patterns or trends: Quantitative research can be useful for identifying patterns or trends in data. For example, you can use quantitative research to identify trends in consumer behavior or to identify patterns in stock market data.
  • To quantify attitudes or opinions: If you want to measure attitudes or opinions on a particular topic, quantitative research can be useful. It allows you to collect numerical data using surveys or questionnaires and analyze the data using statistical methods to determine the prevalence of certain attitudes or opinions.

Purpose of Quantitative Research

The purpose of quantitative research is to systematically investigate and measure the relationships between variables or phenomena using numerical data and statistical analysis. The main objectives of quantitative research include:

  • Description: To provide a detailed and accurate description of a particular phenomenon or population.
  • Explanation: To explain the reasons for the occurrence of a particular phenomenon, such as identifying the factors that influence a behavior or attitude.
  • Prediction: To predict future trends or behaviors based on past patterns and relationships between variables.
  • Control: To identify the best strategies for controlling or influencing a particular outcome or behavior.

Quantitative research is used in many different fields, including social sciences, business, engineering, and health sciences. It can be used to investigate a wide range of phenomena, from human behavior and attitudes to physical and biological processes. The purpose of quantitative research is to provide reliable and valid data that can be used to inform decision-making and improve understanding of the world around us.

Advantages of Quantitative Research

There are several advantages of quantitative research, including:

  • Objectivity: Quantitative research is based on objective data and statistical analysis, which reduces the potential for bias or subjectivity in the research process.
  • Reproducibility: Because quantitative research involves standardized methods and measurements, it is more likely to be reproducible and reliable.
  • Generalizability: Quantitative research allows for generalizations to be made about a population based on a representative sample, which can inform decision-making and policy development.
  • Precision: Quantitative research allows for precise measurement and analysis of data, which can provide a more accurate understanding of phenomena and relationships between variables.
  • Efficiency: Quantitative research can be conducted relatively quickly and efficiently, especially when compared to qualitative research, which may involve lengthy data collection and analysis.
  • Large sample sizes: Quantitative research can accommodate large sample sizes, which can increase the representativeness and generalizability of the results.

Limitations of Quantitative Research

There are several limitations of quantitative research, including:

  • Limited understanding of context: Quantitative research typically focuses on numerical data and statistical analysis, which may not provide a comprehensive understanding of the context or underlying factors that influence a phenomenon.
  • Simplification of complex phenomena: Quantitative research often involves simplifying complex phenomena into measurable variables, which may not capture the full complexity of the phenomenon being studied.
  • Potential for researcher bias: Although quantitative research aims to be objective, there is still the potential for researcher bias in areas such as sampling, data collection, and data analysis.
  • Limited ability to explore new ideas: Quantitative research is often based on pre-determined research questions and hypotheses, which may limit the ability to explore new ideas or unexpected findings.
  • Limited ability to capture subjective experiences: Quantitative research is typically focused on objective data and may not capture the subjective experiences of individuals or groups being studied.
  • Ethical concerns: Quantitative research may raise ethical concerns, such as invasion of privacy or the potential for harm to participants.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

You may also like

  • Questionnaire – Definition, Types, and Examples
  • Case Study – Methods, Examples and Guide
  • Observational Research – Methods and Guide
  • Qualitative Research Methods
  • Explanatory Research – Types, Methods, Guide
  • Survey Research – Types, Methods, Examples



Research Methods Overview: Quantitative, Qualitative, and More

  • Quantitative Research
  • Qualitative Research
  • Data Science Methods (Machine Learning, AI, Big Data)
  • Text Mining and Computational Text Analysis
  • Evidence Synthesis/Systematic Reviews
  • Get Data, Get Help!

About Research Methods

This guide provides an overview of research methods, how to choose and use them, and the support and resources available at UC Berkeley.

As Patten and Newhart note in the book Understanding Research Methods, "Research methods are the building blocks of the scientific enterprise. They are the 'how' for building systematic knowledge. The accumulation of knowledge through research is by its nature a collective endeavor. Each well-designed study provides evidence that may support, amend, refute, or deepen the understanding of existing knowledge... Decisions are important throughout the practice of research and are designed to help researchers collect evidence that includes the full spectrum of the phenomenon under study, to maintain logical rules, and to mitigate or account for possible sources of bias. In many ways, learning research methods is learning how to see and make these decisions."

The choice of methods varies by discipline, by the kind of phenomenon being studied and the data being used to study it, by the technology available, and more.  This guide is an introduction, but if you don't see what you need here, always contact your subject librarian, and/or take a look to see if there's a library research guide that will answer your question. 

Suggestions for changes and additions to this guide are welcome! 

START HERE: SAGE Research Methods

Without question, the most comprehensive resource available from the library is SAGE Research Methods. See the online guide to this one-stop collection; some helpful links are below:

  • SAGE Research Methods
  • Little Green Books (Quantitative Methods)
  • Little Blue Books (Qualitative Methods)
  • Dictionaries and Encyclopedias
  • Case studies of real research projects
  • Sample datasets for hands-on practice
  • Streaming video: see methods come to life
  • Methodspace: a community for researchers
  • SAGE Research Methods Course Mapping

Library Data Services at UC Berkeley

Library Data Services Program and Digital Scholarship Services

The LDSP offers a variety of services and tools. Check out its pages on each of the following topics: discovering data, managing data, collecting data, GIS data, text data mining, publishing data, digital scholarship, open science, and the Research Data Management Program.

Be sure also to check out the visual guide to where to seek assistance on campus with any research question you may have!

Library GIS Services

Other Data Services at Berkeley

  • D-Lab: Supports Berkeley faculty, staff, and graduate students with research in data-intensive social science, including a wide range of training and workshop offerings.
  • Dryad: A simple self-service tool for researchers to use in publishing their datasets. It provides tools for the effective publication of and access to research data.
  • Geospatial Innovation Facility (GIF): Provides leadership and training across a broad array of integrated mapping technologies on campus.
  • Research Data Management: A UC Berkeley guide and consulting service for research data management issues.

General Research Methods Resources

Here are some general resources for assistance:

  • Assistance from ICPSR (must create an account to access): Getting Help with Data, and Resources for Students
  • Wiley Stats Ref for background information on statistics topics
  • Survey Documentation and Analysis (SDA): a program for easy web-based analysis of survey data

Consultants

  • D-Lab/Data Science Discovery Consultants: Request help with your research project from peer consultants.
  • Research data management (RDM) consulting: Meet with RDM consultants before designing the data security, storage, and sharing aspects of your qualitative project.
  • Statistics Department Consulting Services: A service in which advanced graduate students, under faculty supervision, are available to consult during specified hours in the Fall and Spring semesters.

Related Resources

  • IRB/CPHS: Qualitative research projects with human subjects often require that you go through an ethics review.
  • OURS (Office of Undergraduate Research and Scholarships): Supports undergraduates who want to embark on research projects and assistantships. In particular, check out their "Getting Started in Research" workshops.
  • Sponsored Projects: Works with researchers applying for major external grants.
  • Last Updated: Apr 3, 2023
  • URL: https://guides.lib.berkeley.edu/researchmethods

Quantitative and Qualitative Research

What is Quantitative Research?

  • What is Qualitative Research?
  • Quantitative vs Qualitative
  • Step 1: Accessing CINAHL
  • Step 2: Create a Keyword Search
  • Step 3: Create a Subject Heading Search
  • Step 4: Repeat Steps 1-3 for Second Concept
  • Step 5: Repeat Steps 1-3 for Quantitative Terms
  • Step 6: Combining All Searches
  • Step 7: Adding Limiters
  • Step 8: Save Your Search!
  • What Kind of Article is This?
  • More Research Help

Quantitative methodology is the dominant research framework in the social sciences. It refers to a set of strategies, techniques and assumptions used to study psychological, social and economic processes through the exploration of numeric patterns . Quantitative research gathers a range of numeric data. Some of the numeric data is intrinsically quantitative (e.g. personal income), while in other cases the numeric structure is  imposed (e.g. ‘On a scale from 1 to 10, how depressed did you feel last week?’). The collection of quantitative information allows researchers to conduct simple to extremely sophisticated statistical analyses that aggregate the data (e.g. averages, percentages), show relationships among the data (e.g. ‘Students with lower grade point averages tend to score lower on a depression scale’) or compare across aggregated data (e.g. the USA has a higher gross domestic product than Spain). Quantitative research includes methodologies such as questionnaires, structured observations or experiments and stands in contrast to qualitative research. Qualitative research involves the collection and analysis of narratives and/or open-ended observations through methodologies such as interviews, focus groups or ethnographies.

Coghlan, D., & Brydon-Miller, M. (2014). The SAGE encyclopedia of action research (Vols. 1-2). London: SAGE Publications Ltd. doi:10.4135/9781446294406

What is the purpose of quantitative research?

The purpose of quantitative research is to generate knowledge and create understanding about the social world. Quantitative research is used by social scientists, including communication researchers, to observe phenomena or occurrences affecting individuals. Social scientists are concerned with the study of people. Quantitative research is a way to learn about a particular group of people, known as a sample population. Using scientific inquiry, quantitative research relies on data that are observed or measured to examine questions about the sample population.

Allen, M. (2017). The SAGE encyclopedia of communication research methods (Vols. 1-4). Thousand Oaks, CA: SAGE Publications, Inc. doi:10.4135/9781483381411

How do I know if the study is a quantitative design? What type of quantitative study is it?

Quantitative Research Designs: Descriptive non-experimental, Quasi-experimental or Experimental?

Studies do not always explicitly state what kind of research design is being used, so you will need to know how to decipher which design type is used.

  • Last Updated: Dec 8, 2023
  • URL: https://libguides.uta.edu/quantitative_and_qualitative_research


Grad Coach

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA) and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It's totally understandable – quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression. Suddenly we're all wishing we'd paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn't that hard, even for those of us who avoid numbers and math. In this post, we'll break quantitative analysis down into simple, bite-sized chunks so you can approach your research with confidence.


Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works
  • The two "branches" of quantitative analysis
  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.

This contrasts against qualitative data analysis, where the focus is on words, phrases and expressions that can't be reduced to numbers. If you're interested in learning about qualitative analysis, check out our separate post and video on the topic.

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups . For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables . For example, the relationship between weather temperature and voter turnout.
  • And third, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis , which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers, it's no surprise that it involves statistics. Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don't worry: you don't need to be a statistician or math wiz to pull off a good quantitative analysis. We'll break down all the technical mumbo jumbo as we go.


As I mentioned, quantitative analysis is powered by statistical analysis methods. There are two main "branches" of statistical methods that are used – descriptive statistics and inferential statistics. In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you're trying to figure out. In other words, depending on your research questions, aims and objectives. I'll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample.

First up, population. In statistics, the population is the entire group of people (or animals or organisations or whatever) that you're interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it's extremely unlikely that you're going to be able to interview or survey every single Tesla owner in the US. Realistically, you'll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample.

So, to recap – the population is the entire group of people you're interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake, whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample, while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…

With that out the way, let’s take a closer look at each of these branches in more detail.

Descriptive statistics vs inferential statistics

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample. Unlike inferential statistics (which we'll get to soon), descriptive statistics don't aim to make inferences or predictions about the entire population – they're purely interested in the details of your specific sample.

When you're writing up your analysis, descriptive statistics are the first set of stats you'll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions, they may be the only type of statistics you use. We'll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common descriptive statistics used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set makes up an odd number, then the median is the number right in the middle of the set. If the data set makes up an even number, then the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • Standard deviation – this measures how dispersed a range of numbers is around the average. In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness – as the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?

Feeling a bit confused? Let’s look at a practical example using a small data set.

Descriptive statistics example data

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode, there is no mode in this data set. This is because each number is present only once and so there cannot be a "most common number". If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation. The value of 10.6 indicates that there's quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, which is quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.
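As a rough sketch, the snippet below computes these same descriptive statistics in Python. The ten bodyweights are made-up stand-ins, so the outputs will not match the exact figures discussed above.

```python
# The ten bodyweights below are made-up stand-ins, so the outputs
# will not match the exact figures discussed above.
import numpy as np
from scipy import stats

weights = [55, 60, 64, 68, 71, 74, 78, 81, 86, 90]  # kilograms

print("mean:", np.mean(weights))
print("median:", np.median(weights))
print("std deviation:", np.std(weights, ddof=1))  # ddof=1 for a sample
print("skewness:", stats.skew(weights))

# Every weight occurs exactly once, so (as in the example above)
# there is no meaningful mode in this data set
values, counts = np.unique(weights, return_counts=True)
print("highest frequency of any value:", counts.max())
```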

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important, even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then ending up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Examples of descriptive statistics

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population. In other words, you'll use inferential statistics to make predictions about what you'd expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly) allow you to connect the dots and make predictions about what you expect to see in the real world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female, but your sample is 80% male, you can't make inferences about the population based on your sample, since it's not representative. This area of statistics is called sampling, but we won't go down that rabbit hole here (it's a deep one!) – we'll save that for another post.

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-tests. A t-test compares the means (the averages) of two groups of data to assess whether they're statistically significantly different. In other words, is the difference between the two group means large enough that it's unlikely to have arisen by chance?

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn't – to assess whether they are significantly different.

Kicking things up a level, we have ANOVA, which stands for "analysis of variance". This test is similar to a t-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups, not just two. So it's basically a t-test on steroids…

Next, we have correlation analysis. This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same? For example, if the average temperature goes up, do average ice cream sales increase too? We'd expect some sort of relationship between these two variables intuitively, but correlation analysis allows us to measure that relationship scientifically.

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further by modelling how one or more predictor variables relate to an outcome variable, which lets you make predictions. Keep in mind, though, that regression on its own doesn't prove cause and effect: just because two variables correlate, or one predicts the other, doesn't necessarily mean that one causes the other.
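To make these four techniques tangible in code, here is a compact scipy sketch; every number in it is invented for illustration.

```python
# All of the numbers below are invented for illustration.
from scipy import stats

treated = [128, 131, 125, 130, 127]    # blood pressure, new medication
untreated = [138, 135, 140, 136, 139]  # blood pressure, no medication
group_c = [132, 129, 134, 131, 133]    # a third group, for the ANOVA

# T-test: are the means of two groups significantly different?
print(stats.ttest_ind(treated, untreated))

# ANOVA: extends the comparison of means to three or more groups
print(stats.f_oneway(treated, untreated, group_c))

# Correlation: do temperature and ice cream sales move together?
temperature = [18, 21, 24, 27, 30, 33]
ice_creams = [110, 135, 155, 180, 210, 230]
print(stats.pearsonr(temperature, ice_creams))

# Regression: fit a line predicting sales from temperature
print(stats.linregress(temperature, ice_creams))
```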

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here's a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we'd expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the data points tend to cluster along a diagonal line from bottom left to top right.

[Scatter plot: sample correlation between weight and height]

As I mentioned, these are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods assume normally distributed data (these are the parametric tests), while others are designed for data that doesn't meet that assumption (non-parametric tests). And that's exactly why descriptive statistics are so important – they're the first step towards knowing which inferential techniques you can and can't use.

Remember that every statistical method has its own assumptions and limitations, so you need to be aware of these.
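As a sketch of what "checking assumptions first" can look like in practice (assuming Python/SciPy and made-up data), you might test for normality before choosing between a parametric and a non-parametric test:

```python
# Minimal sketch: pick a parametric or non-parametric test based on
# a normality check of each group.
from scipy import stats

group_a = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6]
group_b = [3.0, 2.9, 3.4, 3.1, 3.3, 2.8, 3.2]

_, p_a = stats.shapiro(group_a)  # Shapiro-Wilk normality test
_, p_b = stats.shapiro(group_b)

if p_a > 0.05 and p_b > 0.05:
    # Both groups look plausibly normal: use the parametric t-test.
    stat, p = stats.ttest_ind(group_a, group_b)
else:
    # Otherwise fall back to the non-parametric Mann-Whitney U test.
    stat, p = stats.mannwhitneyu(group_a, group_b)

print(f"statistic = {stat:.2f}, p = {p:.4f}")
```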

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors:

  • The type of quantitative data you have (specifically, the level of measurement and the shape of the data); and
  • Your research questions and hypotheses.

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn't support the data type you have, your results will be largely meaningless. So, make sure that you have a clear understanding of what types of data you've collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here.

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data. Specifically, does it follow a normal distribution (in other words, a bell-shaped curve centred on the mean), or is it heavily skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data, while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.
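For instance, a quick way to quantify shape (a sketch with SciPy; the income figures are invented) is the skewness statistic:

```python
# Minimal sketch: skewness as a numeric summary of the shape of the data.
from scipy import stats

incomes = [21, 23, 22, 25, 24, 26, 23, 90]  # one extreme value skews the data

# Roughly 0 for symmetrical data; positive for a right (high-end) skew.
print(f"skewness = {stats.skew(incomes):.2f}")
```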

Factor 2: Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you're just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need – for example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it's really important to get very clear about your research aims and research questions, as well as your hypotheses, before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about analysing number-based data (which includes categorical and numerical data) using various statistical techniques.
  • The two main branches of statistics are descriptive statistics and inferential statistics. Descriptives describe your sample, whereas inferentials make predictions about what you'll find in the population.
  • Common descriptive statistical methods include the mean (average), median, standard deviation and skewness.
  • Common inferential statistical methods include t-tests, ANOVA, correlation and regression analysis.
  • To choose the right statistical methods and techniques, you need to consider the type of data you're working with, as well as your research questions and hypotheses.


Quantitative Research

What is quantitative research?

Quantitative research is the methodology which researchers use to test theories about people’s attitudes and behaviors based on numerical and statistical evidence. Researchers sample a large number of users (e.g., through surveys) to indirectly obtain measurable, bias-free data about users in relevant situations.

“Quantification clarifies issues which qualitative analysis leaves fuzzy. It is more readily contestable and likely to be contested. It sharpens scholarly discussion, sparks off rival hypotheses, and contributes to the dynamics of the research process.” — Angus Maddison, Notable scholar of quantitative macro-economic history

See how quantitative research helps reveal cold, hard facts about users which you can interpret and use to improve your designs.

Use Quantitative Research to Find Mathematical Facts about Users

Quantitative research is a subset of user experience (UX) research. Unlike its softer, more individual-oriented “counterpart”, qualitative research, quantitative research means you collect statistical/numerical data to draw generalized conclusions about users’ attitudes and behaviors.

Quantitative research is often best done from early on in projects since it helps teams to optimally direct product development and avoid costly design mistakes later. As you typically get user data from a distance—i.e., without close physical contact with users—also applying qualitative research will help you investigate why users think and feel the ways they do. Indeed, in an iterative design process quantitative research helps you test the assumptions you and your design team develop from your qualitative research. Regardless of the method you use, with proper care you can gather objective and unbiased data – information which you can complement with qualitative approaches to build a fuller understanding of your target users. From there, you can work towards firmer conclusions and drive your design process towards a more realistic picture of how target users will ultimately receive your product.


Quantitative analysis helps you test your assumptions and establish clearer views of your users in their various contexts.

Quantitative Research Methods You Can Use to Guide Optimal Designs

There are many quantitative research methods, and they help uncover different types of information on users. Some methods, such as A/B testing, are typically done on finished products, while others such as surveys could be done throughout a project’s design process. Here are some of the most helpful methods:

A/B testing – You test two or more versions of your design on users to find the most effective. Each variation differs by just one feature and may or may not affect how users respond. A/B testing is especially valuable for testing assumptions you’ve drawn from qualitative research. The main caveats are scale – you’ll typically need to run it on thousands of users – and the added complexity of judging whether a difference is statistically significant.

Analytics – With tools such as Google Analytics, you measure metrics (e.g., page views, click-through rates) to build a picture (e.g., “How many users take how long to complete a task?”).

Desirability Studies – You measure an aspect of your product (e.g., aesthetic appeal) by typically showing it to participants and asking them to select from a menu of descriptive words. Their responses can reveal powerful insights (e.g., 78% associate the product/brand with “fashionable”).

Surveys and Questionnaires – When you ask for many users’ opinions, you will gain massive amounts of information. Keep in mind that you’ll have data about what users say they do, as opposed to insights into what they do . You can get more reliable results if you incentivize your participants well and use the right format.

Tree Testing – You remove the user interface so users must navigate the site and complete tasks using links alone. This helps you see if an issue is related to the user interface or information architecture.

Another powerful benefit of conducting quantitative research is that you can keep your stakeholders’ support with hard facts and statistics about your design’s performance—which can show what works well and what needs improvement—and prove a good return on investment. You can also produce reports to check statistics against different versions of your product and your competitors’ products.

Most quantitative research methods are relatively cheap. Since no single research method can help you answer all your questions, it’s vital to judge which method suits your project at the time/stage. Remember, it’s best to spend appropriately on a combination of quantitative and qualitative research from early on in development. Design improvements can be costly, and so you can estimate the value of implementing changes when you get the statistics to suggest that these changes will improve usability. Overall, you want to gather measurements objectively, where your personality, presence and theories won’t create bias.

Learn More about Quantitative Research

Take our User Research course to see how to get the most from quantitative research.

See how quantitative research methods fit into your design research landscape.

This insightful piece shows the value of pairing quantitative with qualitative research.

Find helpful tips on combining quantitative research methods in mixed methods research.

Questions related to Quantitative Research

Qualitative and quantitative research differ primarily in the data they produce. Quantitative research yields numerical data to test hypotheses and quantify patterns. It's precise and generalizable. Qualitative research, on the other hand, generates non-numerical data and explores meanings, interpretations, and deeper insights. Watch our video featuring Professor Alan Dix on different types of research methods.

This video elucidates the nuances and applications of both research types in the design field.

In quantitative research, determining a good sample size is crucial for the reliability of the results. William Hudson, CEO of Syntagm, emphasizes the importance of statistical significance with an example in our video. 

He illustrates that even with varying results between design choices, we need to discern whether the differences are statistically significant or products of chance. This ensures the validity of the results, allowing for more accurate interpretations. Statistical tools like chi-square tests can aid in analyzing the results effectively. To delve deeper into these concepts, take William Hudson’s Data-Driven Design: Quantitative UX Research Course . 
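As an illustration of the kind of chi-square check described here, this is a minimal sketch using SciPy on hypothetical A/B conversion counts (the counts and the 0.05 threshold are assumptions for the example, not figures from the video):

```python
# Minimal sketch: chi-square test of independence on hypothetical
# A/B test conversion counts.
from scipy.stats import chi2_contingency

#            converted, not converted
observed = [[120, 880],   # design A (1,000 visitors)
            [150, 850]]   # design B (1,000 visitors)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# p < 0.05 would suggest the difference in conversion rates is
# unlikely to be a product of chance.
```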

Quantitative research is crucial as it provides precise, numerical data that allows for high levels of statistical inference. Our video from William Hudson, CEO of Syntagm, highlights the importance of analytics in examining existing solutions. 

Quantitative methods, like analytics and A/B testing, are pivotal for identifying areas for improvement, understanding user behaviors, and optimizing user experiences based on solid, empirical evidence. This empirical nature ensures that the insights derived are reliable, allowing for practical improvements and innovations. Perhaps most importantly, numerical data is useful to secure stakeholder buy-in and defend design decisions and proposals. Explore this approach in our Data-Driven Design: Quantitative Research for UX Research course and learn from William Hudson’s detailed explanations of when and why to use analytics in the research process.

After establishing initial requirements, statistical data is crucial for informed decisions through quantitative research. William Hudson, CEO of Syntagm, sheds light on the role of quantitative research throughout a typical project lifecycle in this video:

 During the analysis and design phases, quantitative research helps validate user requirements and understand user behaviors. Surveys and analytics are standard tools, offering insights into user preferences and design efficacy. Quantitative research can also be used in early design testing, allowing for optimal design modifications based on user interactions and feedback, and it’s fundamental for A/B and multivariate testing once live solutions are available.

To write a compelling quantitative research question:

Create clear, concise, and unambiguous questions that address one aspect at a time.

Use common, short terms and provide explanations for unusual words.

Avoid leading, compound, and overlapping queries and ensure that questions are not vague or broad.

According to our video by William Hudson, CEO of Syntagm, quality and respondent understanding are vital in forming good questions. 

He emphasizes the importance of addressing specific aspects and avoiding intimidating and confusing elements, such as extensive question grids or ranking questions, to ensure participant engagement and accurate responses. For more insights, see the article Writing Good Questions for Surveys .

Survey research is typically quantitative, collecting numerical data and statistical analysis to make generalizable conclusions. However, it can also have qualitative elements, mainly when it includes open-ended questions, allowing for expressive responses. Our video featuring the CEO of Syntagm, William Hudson, provides in-depth insights into when and how to effectively utilize surveys in the product or service lifecycle, focusing on user satisfaction and potential improvements.

He emphasizes the importance of surveys in triangulating data to back up qualitative research findings, ensuring we have a complete understanding of the user's requirements and preferences.

Descriptive research focuses on describing the subject being studied and answers the what, where, when, and who of the research question. However, it doesn’t address the underlying reasons – the “why” behind the answers obtained from the research. We can use both qualitative and quantitative methods to conduct descriptive research. “Descriptive” refers not to the methods, but to the data gathered through the research (regardless of the methods used).

When we use quantitative research and gather numerical data, we can use statistical analysis to understand relationships between different variables. Here’s William Hudson, CEO of Syntagm with more on correlation and how we can apply tests such as Pearson’s r and Spearman Rank Coefficient to our data.

This helps interpret phenomena such as user experience by analyzing session lengths and conversion values, revealing whether variables like time spent on a page affect checkout values, for example.
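Here's a minimal sketch of both coefficients on invented session-length and checkout-value data (SciPy is an assumed tool choice; any statistics package offers equivalents):

```python
# Minimal sketch: Pearson's r and Spearman's rank correlation on
# hypothetical session lengths vs. checkout values.
from scipy import stats

session_minutes = [2, 5, 8, 3, 12, 7, 4, 10]
checkout_value = [15, 30, 42, 18, 60, 38, 22, 55]

r, p_r = stats.pearsonr(session_minutes, checkout_value)       # linear relationship
rho, p_rho = stats.spearmanr(session_minutes, checkout_value)  # monotonic (rank-based)

print(f"Pearson r = {r:.2f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.4f})")
```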

Random Sampling: Each individual in the population has an equitable opportunity to be chosen, which minimizes biases and simplifies analysis.

Systematic Sampling: Selecting every k-th item from a list after a random start. It's simpler and faster than random sampling when dealing with large populations.

Stratified Sampling: Segregate the population into subgroups or strata according to comparable characteristics. Then, samples are taken randomly from each stratum.

Cluster Sampling: Divide the population into clusters, randomly select some of those clusters, and then collect data from the members within them.

Multistage Sampling: Various sampling techniques are used at different stages to collect detailed information from diverse populations.

Convenience Sampling: The researcher selects the sample based on availability and willingness to participate, which means it may not represent the whole population.

Quota Sampling: Segment the population into subgroups, and samples are non-randomly selected to fulfill a predetermined quota from each subset.

These are just a few techniques, and choosing the right one depends on your research question, discipline, resource availability, and the level of accuracy required. In quantitative research, there isn't a one-size-fits-all sampling technique; choosing a method that aligns with your research goals and population is critical. However, a well-planned strategy is essential to avoid wasting resources and time, as highlighted in our video featuring William Hudson, CEO of Syntagm.

He emphasizes the importance of recruiting participants meticulously, ensuring their engagement and the quality of their responses. Accurate and thoughtful participant responses are crucial for obtaining reliable results. William also sheds light on dealing with failing participants and scrutinizing response quality to refine the outcomes.
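To make two of the techniques above concrete, here's a minimal pandas sketch contrasting simple random with stratified sampling; the user table, the segment split and the sample sizes are all hypothetical:

```python
# Minimal sketch: simple random vs. stratified sampling with pandas.
import pandas as pd

users = pd.DataFrame({
    "user_id": range(1, 1001),
    "segment": ["mobile"] * 700 + ["desktop"] * 300,
})

# Simple random sampling: every user has an equal chance of selection.
random_sample = users.sample(n=100, random_state=42)

# Stratified sampling: draw 10% from each segment so that the
# mobile/desktop proportions of the population are preserved.
stratified_sample = users.groupby("segment").sample(frac=0.10, random_state=42)

print(stratified_sample["segment"].value_counts())  # desktop 30, mobile 70
```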

The 4 types of quantitative research are Descriptive, Correlational, Causal-Comparative/Quasi-Experimental, and Experimental Research. Descriptive research aims to depict ‘what exists’ clearly and precisely. Correlational research examines relationships between variables. Causal-comparative research investigates the cause-effect relationship between variables. Experimental research explores causal relationships by manipulating independent variables. To gain deeper insights into quantitative research methods in UX, consider enrolling in our Data-Driven Design: Quantitative Research for UX course.

The strength of quantitative research is its ability to provide precise numerical data for analyzing target variables. This allows for generalized conclusions and predictions about future occurrences, proving invaluable in various fields, including user experience. William Hudson, CEO of Syntagm, discusses the role of surveys, analytics, and testing in providing objective insights in our video on quantitative research methods, highlighting the significance of structured methodologies in eliciting reliable results.

To master quantitative research methods, enroll in our comprehensive course, Data-Driven Design: Quantitative Research for UX . 

This course empowers you to leverage quantitative data to make informed design decisions, providing a deep dive into methods like surveys and analytics. Whether you’re a novice or a seasoned professional, this course at Interaction Design Foundation offers valuable insights and practical knowledge, ensuring you acquire the skills necessary to excel in user experience research. Explore our diverse topics to elevate your understanding of quantitative research methods.

Literature on Quantitative Research

Here’s the entire UX literature on Quantitative Research by the Interaction Design Foundation, collated in one place:

Learn more about Quantitative Research

Take a deep dive into Quantitative Research with our course User Research – Methods and Best Practices .

How do you plan to design a product or service that your users will love, if you don't know what they want in the first place? As a user experience designer, you shouldn't leave it to chance to design something outstanding; you should make the effort to understand your users and build on that knowledge from the outset. User research is the way to do this, and it can therefore be thought of as the largest part of user experience design.

In fact, user research is often the first step of a UX design process—after all, you cannot begin to design a product or service without first understanding what your users want! As you gain the skills required, and learn about the best practices in user research, you’ll get first-hand knowledge of your users and be able to design the optimal product—one that’s truly relevant for your users and, subsequently, outperforms your competitors’.

This course will give you insights into the most essential qualitative research methods around and will teach you how to put them into practice in your design work. You’ll also have the opportunity to embark on three practical projects where you can apply what you’ve learned to carry out user research in the real world. You’ll learn details about how to plan user research projects and fit them into your own work processes in a way that maximizes the impact your research can have on your designs. On top of that, you’ll gain practice with different methods that will help you analyze the results of your research and communicate your findings to your clients and stakeholders—workshops, user journeys and personas, just to name a few!

By the end of the course, you’ll have not only a Course Certificate but also three case studies to add to your portfolio. And remember, a portfolio with engaging case studies is invaluable if you are looking to break into a career in UX design or user research!

We believe you should learn from the best, so we’ve gathered a team of experts to help teach this course alongside our own course instructors. That means you’ll meet a new instructor in each of the lessons on research methods who is an expert in their field—we hope you enjoy what they have in store for you!

All open-source articles on Quantitative Research

  • Best Practices for Qualitative User Research
  • Card Sorting
  • Understand the User’s Perspective through Research for Mobile UX
  • 7 Simple Ways to Get Better Results From Ethnographic Research
  • Question Everything
  • Tree Testing
  • Adding Quality to Your Design Research with an SSQS Checklist
  • How to Fit Quantitative Research into the Project Lifecycle
  • Why and When to Use Surveys
  • Correlation in User Experience
  • First-Click Testing
  • What to Test
  • Rating Scales in UX Research: The Ultimate Guide



Qualitative vs Quantitative Research Methods & Data Analysis

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


What is the difference between quantitative and qualitative?

The main difference between quantitative and qualitative research is the type of data they collect and analyze.

Quantitative research collects numerical data and analyzes it using statistical methods. The aim is to produce objective, empirical data that can be measured and expressed in numerical terms. Quantitative research is often used to test hypotheses, identify patterns, and make predictions.

Qualitative research , on the other hand, collects non-numerical data such as words, images, and sounds. The focus is on exploring subjective experiences, opinions, and attitudes, often through observation and interviews.

Qualitative research aims to produce rich and detailed descriptions of the phenomenon being studied, and to uncover new insights and meanings.

Quantitative data is information about quantities, and is therefore expressed in numbers, while qualitative data is descriptive and concerns phenomena that can be observed but not measured, such as language.

What Is Qualitative Research?

Qualitative research is the process of collecting, analyzing, and interpreting non-numerical data, such as language. Qualitative research can be used to understand how an individual subjectively perceives and gives meaning to their social reality.

Qualitative data is non-numerical data, such as text, video, photographs, or audio recordings. This type of data can be collected using diary accounts or in-depth interviews and analyzed using grounded theory or thematic analysis.

Qualitative research is multimethod in focus, involving an interpretive, naturalistic approach to its subject matter. This means that qualitative researchers study things in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them. (Denzin & Lincoln, 1994, p. 2)

Interest in qualitative data came about as a result of some psychologists’ (e.g., Carl Rogers) dissatisfaction with the strictly scientific approach of psychologists such as the behaviorists (e.g., Skinner ).

Since psychologists study people, the traditional approach to science is not seen as an appropriate way of carrying out research since it fails to capture the totality of human experience and the essence of being human.  Exploring participants’ experiences is known as a phenomenological approach (re: Humanism ).

Qualitative research is primarily concerned with meaning, subjectivity, and lived experience. The goal is to understand the quality and texture of people’s experiences, how they make sense of them, and the implications for their lives.

Qualitative research aims to understand the social reality of individuals, groups, and cultures as nearly as possible as participants feel or live it. Thus, people and groups are studied in their natural setting.

Typical qualitative research questions ask, for example, what an experience feels like, how people talk about something, how they make sense of an experience, and how events unfold for people.

Research following a qualitative approach is exploratory and seeks to explain ‘how’ and ‘why’ a particular phenomenon, or behavior, operates as it does in a particular context. It can be used to generate hypotheses and theories from the data.

Qualitative Methods

There are different types of qualitative research methods, including diary accounts, in-depth interviews, documents, focus groups, case study research, and ethnography.

The results of qualitative methods provide a deep understanding of how people perceive their social realities and in consequence, how they act within the social world.

The researcher has several methods for collecting empirical materials, ranging from the interview to direct observation, to the analysis of artifacts, documents, and cultural records, to the use of visual materials or personal experience. (Denzin & Lincoln, 1994, p. 14)

Here are some examples of qualitative data:

Interview transcripts: Verbatim records of what participants said during an interview or focus group. They allow researchers to identify common themes and patterns, and draw conclusions based on the data. Interview transcripts can also be useful in providing direct quotes and examples to support research findings.

Observations: The researcher typically takes detailed notes on what they observe, including any contextual information, nonverbal cues, or other relevant details. The resulting observational data can be analyzed to gain insights into social phenomena, such as human behavior, social interactions, and cultural practices.

Unstructured interviews: These generate qualitative data through the use of open questions. This allows the respondent to talk in some depth, choosing their own words. This helps the researcher develop a real sense of a person’s understanding of a situation.

Diaries or journals: Written accounts of personal experiences or reflections.

Notice that qualitative data could be much more than just words or text. Photographs, videos, sound recordings, and so on, can be considered qualitative data. Visual data can be used to understand behaviors, environments, and social interactions.

Qualitative Data Analysis

Qualitative research is endlessly creative and interpretive. The researcher does not just leave the field with mountains of empirical data and then easily write up his or her findings.

Qualitative interpretations are constructed, and various techniques can be used to make sense of the data, such as content analysis, grounded theory (Glaser & Strauss, 1967), thematic analysis (Braun & Clarke, 2006), or discourse analysis.

For example, thematic analysis is a qualitative approach that involves identifying implicit or explicit ideas within the data. Themes will often emerge once the data has been coded.


Key Features

  • Events can be understood adequately only if they are seen in context. Therefore, a qualitative researcher immerses her/himself in the field, in natural surroundings. The contexts of inquiry are not contrived; they are natural. Nothing is predefined or taken for granted.
  • Qualitative researchers want those who are studied to speak for themselves, to provide their perspectives in words and other actions. Therefore, qualitative research is an interactive process in which the persons studied teach the researcher about their lives.
  • The qualitative researcher is an integral part of the data; without the active participation of the researcher, no data exists.
  • The study’s design evolves during the research and can be adjusted or changed as it progresses. For the qualitative researcher, there is no single reality. It is subjective and exists only in reference to the observer.
  • The theory is data-driven and emerges as part of the research process, evolving from the data as they are collected.

Limitations of Qualitative Research

  • Because of the time and costs involved, qualitative designs do not generally draw samples from large-scale data sets.
  • The problem of adequate validity or reliability is a major criticism. Because of the subjective nature of qualitative data and its origin in single contexts, it is difficult to apply conventional standards of reliability and validity. For example, because of the central role played by the researcher in the generation of data, it is not possible to replicate qualitative studies.
  • Also, contexts, situations, events, conditions, and interactions cannot be replicated to any extent, nor can generalizations be made to a wider context than the one studied with confidence.
  • The time required for data collection, analysis, and interpretation is lengthy. Analysis of qualitative data is difficult, and expert knowledge of the area is necessary to interpret qualitative data. Great care must be taken when doing so – for example, when looking for symptoms of mental illness.

Advantages of Qualitative Research

  • Because of close researcher involvement, the researcher gains an insider’s view of the field. This allows the researcher to find issues that are often missed (such as subtleties and complexities) by the scientific, more positivistic inquiries.
  • Qualitative descriptions can be important in suggesting possible relationships, causes, effects, and dynamic processes.
  • Qualitative analysis allows for ambiguities/contradictions in the data, which reflect social reality (Denscombe, 2010).
  • Qualitative research uses a descriptive, narrative style; this research might be of particular benefit to the practitioner as she or he could turn to qualitative reports to examine forms of knowledge that might otherwise be unavailable, thereby gaining new insight.

What Is Quantitative Research?

Quantitative research involves the process of objectively collecting and analyzing numerical data to describe, predict, or control variables of interest.

The goals of quantitative research are to test causal relationships between variables , make predictions, and generalize results to wider populations.

Quantitative researchers aim to establish general laws of behavior and phenomenon across different settings/contexts. Research is used to test a theory and ultimately support or reject it.

Quantitative Methods

Experiments typically yield quantitative data, as they are concerned with measuring things. However, other research methods, such as controlled observations and questionnaires, can produce both qualitative and quantitative information.

For example, a rating scale or closed questions on a questionnaire would generate quantitative data as these produce either numerical data or data that can be put into categories (e.g., “yes,” “no” answers).

Experimental methods limit the ways in which research participants can react to and express appropriate social behavior.

Findings are, therefore, likely to be context-bound and simply a reflection of the assumptions that the researcher brings to the investigation.

There are numerous examples of quantitative data in psychological research, including mental health. One example is the Experience in Close Relationships Scale (ECR), a self-report questionnaire widely used to assess adult attachment styles .

The ECR provides quantitative data that can be used to assess attachment styles and predict relationship outcomes.

Neuroimaging data : Neuroimaging techniques, such as MRI and fMRI, provide quantitative data on brain structure and function.

This data can be analyzed to identify brain regions involved in specific mental processes or disorders.

Another example is the Beck Depression Inventory (BDI), a self-report questionnaire widely used to assess the severity of depressive symptoms in individuals.

The BDI consists of 21 questions, each scored on a scale of 0 to 3, with higher scores indicating more severe depressive symptoms. 
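As a rough illustration of how such a scale turns answers into quantitative data, here's a minimal scoring sketch. The severity bands follow commonly cited BDI-II cut-offs, but cut-offs vary by version, so treat them as an assumption for illustration rather than clinical guidance:

```python
# Minimal sketch: scoring a BDI-style questionnaire (21 items, each 0-3,
# summed to a total between 0 and 63).
item_scores = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0, 2]
assert len(item_scores) == 21 and all(0 <= s <= 3 for s in item_scores)

total = sum(item_scores)

# Commonly cited BDI-II bands (assumed here; check the manual in practice).
if total <= 13:
    severity = "minimal"
elif total <= 19:
    severity = "mild"
elif total <= 28:
    severity = "moderate"
else:
    severity = "severe"

print(f"BDI total = {total} ({severity})")
```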

Quantitative Data Analysis

Statistics help us turn quantitative data into useful information to help with decision-making. We can use statistics to summarize our data, describing patterns, relationships, and connections. Statistics can be descriptive or inferential.

Descriptive statistics help us to summarize our data. In contrast, inferential statistics are used to identify statistically significant differences between groups of data (such as intervention and control groups in a randomized control study).

  • Quantitative researchers try to control extraneous variables by conducting their studies in the lab.
  • The research aims for objectivity (i.e., without bias) and is separated from the data.
  • The design of the study is determined before it begins.
  • For the quantitative researcher, the reality is objective, exists separately from the researcher, and can be seen by anyone.
  • Research is used to test a theory and ultimately support or reject it.

Limitations of Quantitative Research

  • Context: Quantitative experiments do not take place in natural settings. In addition, they do not allow participants to explain their choices or the meaning of the questions they may have for those participants (Carr, 1994).
  • Researcher expertise: Poor knowledge of the application of statistical analysis may negatively affect analysis and subsequent interpretation (Black, 1999).
  • Variability of data quantity: Large sample sizes are needed for more accurate analysis. Small-scale quantitative studies may be less reliable because of the low quantity of data (Denscombe, 2010). This also affects the ability to generalize study findings to wider populations.
  • Confirmation bias: The researcher might miss observing phenomena because of a focus on theory or hypothesis testing rather than on theory or hypothesis generation.

Advantages of Quantitative Research

  • Scientific objectivity: Quantitative data can be interpreted with statistical analysis, and since statistics are based on the principles of mathematics, the quantitative approach is viewed as scientifically objective and rational (Carr, 1994; Denscombe, 2010).
  • Useful for testing and validating already constructed theories.
  • Rapid analysis: Sophisticated software removes much of the need for prolonged data analysis, especially with large volumes of data involved (Antonius, 2003).
  • Replication: Quantitative data is based on measured values and can be checked by others because numerical data is less open to ambiguities of interpretation.
  • Hypotheses can also be tested because of statistical analysis (Antonius, 2003).

Antonius, R. (2003). Interpreting quantitative data with SPSS . Sage.

Black, T. R. (1999). Doing quantitative research in the social sciences: An integrated approach to research design, measurement and statistics . Sage.

Braun, V. & Clarke, V. (2006). Using thematic analysis in psychology . Qualitative Research in Psychology , 3, 77–101.

Carr, L. T. (1994). The strengths and weaknesses of quantitative and qualitative research : what method for nursing? Journal of advanced nursing, 20(4) , 716-721.

Denscombe, M. (2010). The Good Research Guide: for small-scale social research. McGraw Hill.

Denzin, N., & Lincoln. Y. (1994). Handbook of Qualitative Research. Thousand Oaks, CA, US: Sage Publications Inc.

Glaser, B. G., Strauss, A. L., & Strutzel, E. (1968). The discovery of grounded theory; strategies for qualitative research. Nursing research, 17(4) , 364.

Minichiello, V. (1990). In-Depth Interviewing: Researching People. Longman Cheshire.

Punch, K. (1998). Introduction to Social Research: Quantitative and Qualitative Approaches. London: Sage

Further Information

  • Designing qualitative research
  • Methods of data collection and analysis
  • Introduction to quantitative and qualitative research
  • Checklists for improving rigour in qualitative research: a case of the tail wagging the dog?
  • Qualitative research in health care: Analysing qualitative data
  • Qualitative data analysis: the framework approach
  • Using the framework method for the analysis of qualitative data in multi-disciplinary health research
  • Content Analysis
  • Grounded Theory
  • Thematic Analysis

Print Friendly, PDF & Email

Indian J Anaesth. 2016 Sep; 60(9).

Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

1 Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting the research findings. Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of sample size estimation, power analysis and statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

A variable is a characteristic that varies from one individual member of a population to another.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[ 3 ] [ Figure 1 ].

[Figure 1: Classification of variables]

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), the data are called dichotomous (or binary). The various causes of re-intubation in an intensive care unit – upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment – are examples of a categorical variable.

Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. For example, the system of centimetres is an example of a ratio scale. There is a true zero point and the value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[ 4 ] describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. They are valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1.

[Table 1: Example of descriptive and inferential statistics]

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. The mean may be influenced profoundly by extreme values. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. Such extreme values are called outliers. The formula for the mean is

$$\bar{x} = \frac{\sum x}{n}$$

where x = each observation and n = number of observations.

Median[ 6 ] is defined as the middle of a distribution in ranked data (with half of the variables in the sample above and half below the median value), while mode is the most frequently occurring value in a distribution.

Range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variables. If we rank the data and, after ranking, group the observations into percentiles, we get better information about the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe the 25th, 50th, 75th or any other percentile. The median is the 50th percentile. The interquartile range is the middle 50% of the observations about the median (25th-75th percentile).

Variance[ 7 ] is a measure of how spread out the distribution is. It gives an indication of how closely individual observations cluster about the mean value. The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where σ² is the population variance, X̄ is the population mean, Xᵢ is the i-th element from the population and N is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where s² is the sample variance, x̄ is the sample mean, xᵢ is the i-th element from the sample and n is the number of elements in the sample. The formula for the variance of a population has N as the denominator, whereas the sample variance uses ‘n − 1’. The expression ‘n − 1’ is known as the degrees of freedom and is one less than the number of observations: each observation is free to vary, except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of the variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where σ is the population SD, X̄ is the population mean, Xᵢ is the i-th element from the population and N is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where s is the sample SD, x̄ is the sample mean, xᵢ is the i-th element from the sample and n is the number of elements in the sample. An example of the calculation of variance and SD is illustrated in Table 2 .

[Table 2: Example of mean, variance, standard deviation]
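
For readers who want to reproduce such a calculation, here is a minimal sketch using Python's built-in statistics module; the observations are hypothetical.

```python
import statistics

data = [4, 6, 7, 7, 8, 10]               # hypothetical observations

mean = statistics.mean(data)             # sum of scores / number of scores -> 7
median = statistics.median(data)         # middle of the ranked data -> 7
mode = statistics.mode(data)             # most frequent value -> 7

sample_var = statistics.variance(data)   # 'n - 1' denominator -> 4.0
sample_sd = statistics.stdev(data)       # square root of the variance -> 2.0
pop_var = statistics.pvariance(data)     # population formula, 'N' denominator

print(mean, median, mode, sample_var, sample_sd, pop_var)
```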

Normal distribution or Gaussian distribution

Most biological variables cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is a symmetrical, bell-shaped curve. In a normal distribution, about 68% of the scores are within 1 SD of the mean, around 95% are within 2 SDs of the mean and about 99.7% are within 3 SDs of the mean [ Figure 2 ].

[Figure 2: Normal distribution curve]
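
These percentages can be checked against the standard normal cumulative distribution function; a quick sketch, assuming SciPy is installed.

```python
from scipy.stats import norm

# Probability of a score falling within k SDs of the mean of a normal distribution.
for k in (1, 2, 3):
    within = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {within:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```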

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right of the figure, leading to a longer left tail. In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left of the figure, leading to a longer right tail.

[Figure 3: Curves showing negatively skewed and positively skewed distributions]

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ (H0, read ‘H-naught’ or ‘H-null’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

The alternative hypothesis (H1 or Ha) denotes that a relationship (difference) between the variables is expected to be true.[ 9 ]

The P value (or the calculated probability) is the probability of the observed event occurring by chance if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [ Table 3 ].

[Table 3: P values with interpretation]

If the P value is less than the arbitrarily chosen value (known as α, or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding the alpha error, beta error and sample size calculation, and the factors influencing them, are dealt with in another section of this issue by Das S et al .[ 12 ]

[Table 4: Illustration for null hypothesis]

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

Two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

  • To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (the one-sample t -test). The formula for the one-sample t -test is:

$$t = \frac{\bar{X} - \mu}{SE}$$

where X̄ = sample mean, μ = population mean and SE = standard error of the mean.

  • To test if the population means estimated by two independent samples differ significantly (the unpaired t -test). The formula for the unpaired t -test is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

where X̄1 − X̄2 is the difference between the means of the two groups and SE denotes the standard error of the difference.

  • To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for the paired t -test is:

$$t = \frac{\bar{d}}{SE}$$

where d̄ is the mean difference and SE denotes the standard error of this difference.
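
In practice, all three forms are one-line calls in statistical software. Here is a minimal sketch in Python, assuming SciPy is installed; every sample value below is hypothetical.

```python
from scipy import stats

group_a = [5.1, 4.8, 5.5, 5.0, 4.9]   # hypothetical measurements, group A
group_b = [5.9, 6.1, 5.7, 6.0, 5.8]   # hypothetical measurements, group B
before  = [120, 118, 125, 130, 122]   # hypothetical pre-treatment values
after   = [115, 117, 120, 124, 119]   # hypothetical post-treatment values (same subjects)

t1, p1 = stats.ttest_1samp(group_a, popmean=5.0)  # one-sample t-test against a known mean
t2, p2 = stats.ttest_ind(group_a, group_b)        # unpaired (independent samples) t-test
t3, p3 = stats.ttest_rel(before, after)           # paired t-test (before vs. after)

print(p1, p2, p3)
```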

The group variances can be compared using the F -test. The F -test is the ratio of variances (var 1/var 2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.

A simplified formula for the F statistic is:

$$F = \frac{MS_b}{MS_w}$$

where MS_b is the mean squares between the groups and MS_w is the mean squares within groups.
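
A minimal one-way ANOVA sketch in Python, assuming SciPy is installed; the three groups below are hypothetical.

```python
from scipy import stats

group1 = [23, 25, 21, 24, 26]   # hypothetical data for three independent groups
group2 = [30, 28, 31, 27, 29]
group3 = [22, 24, 23, 25, 21]

# f_oneway returns the F statistic (MS_b / MS_w) and its P value.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f_stat, p_value)
```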

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measures ANOVA is used when all variables of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

[Table 5: Analogues of parametric and non-parametric tests]

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

This test examines the hypothesis about the median θ0 of a population. It tests the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked with a + sign. If the observed value is smaller than the reference value, it is marked with a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, there will be an equal number of + signs and − signs.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.

Wilcoxon's signed rank test

A major limitation of the sign test is that we lose the quantitative information of the given data and merely use the + or − signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration their relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

The Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group, and calculates the probability of xi being greater than yi: P(xi > yi). The null hypothesis states that P(xi > yi) = P(xi < yi) = 1/2, while the alternative hypothesis states that P(xi > yi) ≠ 1/2.

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.
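
The non-parametric tests described above are likewise one-line calls in SciPy. A hedged sketch, with all sample values hypothetical:

```python
from scipy import stats

x = [1.8, 2.1, 2.4, 1.9, 2.6, 2.2]   # hypothetical sample X
y = [2.9, 3.1, 2.7, 3.3, 2.8, 3.0]   # hypothetical sample Y
z = [2.0, 2.3, 2.5, 1.7, 2.4, 2.1]   # hypothetical sample Z (paired with X)

w, p_w = stats.wilcoxon(x, z)        # Wilcoxon signed rank test (paired samples)
u, p_u = stats.mannwhitneyu(x, y)    # Mann-Whitney test (two independent samples)
d, p_d = stats.ks_2samp(x, y)        # two-sample Kolmogorov-Smirnov test
h, p_h = stats.kruskal(x, y, z)      # Kruskal-Wallis test (three or more groups)

print(p_w, p_u, p_d, p_h)
```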

Jonckheere test

In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[ 14 ]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. It is an alternative to repeated measures ANOVA and is used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]

Tests to analyse the categorical data

The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated as the sum of the squared differences between the observed ( O ) and the expected ( E ) data (or the deviation, d ), divided by the expected data, using the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples. It is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
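
A brief sketch of two of these categorical tests in Python, assuming SciPy is installed; the 2 × 2 counts are hypothetical.

```python
from scipy.stats import chi2_contingency, fisher_exact

observed = [[30, 10],   # hypothetical counts: group 1, outcome present / absent
            [20, 25]]   # hypothetical counts: group 2, outcome present / absent

# chi2_contingency computes expected frequencies and the chi-square statistic;
# for a 2 x 2 table it applies the Yates continuity correction by default.
chi2, p, dof, expected = chi2_contingency(observed)

# Fisher's exact test computes an exact probability for a 2 x 2 table.
odds_ratio, p_exact = fisher_exact(observed)

print(p, p_exact)
```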

SOFTWARE AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are currently available. The commonly used systems are the Statistical Package for the Social Sciences (SPSS, from IBM Corporation), the Statistical Analysis System (SAS, developed by SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman of the R Core Team), Minitab (developed by Minitab Inc.), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G-Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs – an interactive site that calculates the power or sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.

It is important that a researcher knows the concepts of the basic statistical methods used to conduct a research study. This will help in conducting an appropriately well-designed study, leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, adequate knowledge of statistics and the appropriate use of statistical tests are important. Knowledge of the basic statistical methods will go a long way in improving research designs and producing quality medical research which can be utilised for formulating evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.


1 Introduction to Quantitative Analysis

Chris Bailey, PhD, CSCS, RSCC

Chapter Learning Objectives

  • Understand the justification for quantitative analysis
  • Learn how data and the scientific process can be used to inform decisions
  • Learn and differentiate between some of the commonly used terminology in quantitative analysis
  • Introduce the functions of quantitative analysis
  • Introduce some of the technology used in quantitative analysis

Why is quantitative analysis important?

Let’s begin by answering the “who cares” question. When will you ever use any of this? As we will soon demonstrate, you likely already are, but it may not be the most objective method. Quantitative data are objective in nature, which is a benefit when we are trying to make decisions based on data without the influence of anything else. Much of what we will learn in quantitative analysis enables us to become more objective so that our individual experiences, traditions, and biases [1] cannot influence our decisions.

No matter what career path you are on, you will need to be able to justify your actions and decisions with data. Whether you are a sport performance professional, personal trainer, or physical therapist, you are likely tracking progression and using the data to influence your future plans for athletes, clients, or patients. Data from the individuals you work with may justify your current plan, or they could illustrate an area that needs to be adjusted to meet certain goals.

If we are not collecting data, we have to rely on our short memories and subjective feelings. These can be biased whether we realize it or not. For example, as a physical therapist (PT), we want our rehabilitation plan to work, so we may only see and remember the positives and miss the negatives. If we had a set regimen of tests, we could look at the results in a more objective way that is less likely to be influenced by our own biases.


Let’s look at an example of how you might use analysis on a regular basis. In this scenario, your cell phone is outdated, has a cracked screen, and takes terrible photos compared to currently available options. What factors would you consider when thinking about your new phone purchase?

Here are some ways you might approach your decision:

  • Price
  • Brand loyalty
  • Read reviews
  • Watch YouTube video reviews
  • Check out your friend’s phone

First, and most often foremost, is price. What can you afford? You’ll need to research the different phones available and see which are in your price range.

What about the type of phone you currently have? Does that play a role? Many cell phone users like to stick to the same operating system they are used to. For example, if you currently have an iPhone, you are probably more likely to stick with an iPhone for your next purchase as opposed to switching to an Android device. This is referred to as brand loyalty.

The next step might be to read reviews or watch video reviews on YouTube.

Finally, maybe you are jealous of the phone your friend just got. So you’ll just get the same one or the slightly newer version. Of course, you may come up with other factors that play a role in your decision-making process.

Each of these is a way of collecting data to influence your decision, even if you don’t realize you are collecting data. The decision-making process is likely a multi-factor process, as we discussed. In kinesiology, we can answer questions in a similar way by creating methods of data collection that help us answer questions and make informed decisions.

A more kinesiology specific example

Let’s look at a more specific example in kinesiology: tracking physical activity…or lack thereof. What if we wanted to evaluate the physical inactivity of adults in the United States at the state level and examine if there are differences according to race or ethnicity? Fortunately, the United States Centers for Disease Control and Prevention (CDC) has compiled such data. According to the CDC, every state in the United States had more than 15% of adults considered physically inactive as of 2018 [2] .

Let’s break this down a little further, because this statistic is actually worse than it sounds, and the results differ depending on race/ethnicity. The CDC defines physical inactivity as not participating in any physical activity during the past month (via self-report and excluding work-related activities). The actual percentage of physically inactive adults ranged from 17.3% to 47.7% across all states. There were 22 states that had greater than 25% of their population classified as physically inactive. Interestingly, these results differ slightly when race or ethnicity is considered. This study classified its sample into 3 groups: Hispanic adults, non-Hispanic black adults, and non-Hispanic white adults. Of the 3, those that would be considered minorities in the United States had higher percentages of physical inactivity. Hispanic adults expressed physical inactivity of 30% or greater in 22 states plus Puerto Rico, and 23 states plus Washington D.C. expressed physical inactivity of 30% or higher in non-Hispanic black adults. If we compare that to non-Hispanic white adults, only 5 states plus Puerto Rico expressed physical inactivity of 30% or higher. [3]

In this example, we just used some data to answer a question about the prevalence of physical inactivity in the United States. But, we shouldn’t stop there. We should come up with some sort of practical application. A very simple one based on the data is that we should encourage more physical activity in the U.S. Said another way, we should discourage physical inactivity as the data suggests that there are many that are physically inactive. Looking a bit deeper at the results, we might suggest that health educators target their efforts in specific areas and populations since the results suggest that geographic and population disparities exist. This study did not evaluate why these disparities exist, but we should consider them in potential solutions.

While this may seem fairly straightforward, there are many other factors we need to consider in quantitative analysis. For example, do we know whether or not the data are valid and reliable? Do you know the difference between validity and reliability ? It’s okay if you don’t. As we will see later, many people confuse these two on a regular basis. What issues do you see with the data collection? Many may take issue with the data being acquired via self-report. We will discuss surveys/questionnaires later in this book, but they are a great way to reach a very wide and large sample of a population. Obviously, more objective methods (e.g., an accelerometer or pedometer) would be better, but when we have a very large sample, potential error is less of a concern since a greater proportion of the population is being measured.

Using Data and the Scientific Process to Inform

As we have just seen, data collected on a specific topic is used as information to help us understand more about that topic. This is a part of the scientific process of acquiring knowledge, sometimes referred to as the scientific method, which you’ve likely heard of before. While the scientific method was popularized in the 20th century, its development is often credited to Aristotle. [4] [5]

[Figure 1.1: Steps of the scientific method]

While the number of steps and their naming may differ depending on the source, they are often similar to Figure 1.1 shown above. First, one might wonder about a specific question based on an observation. Consider an example where Elise, an athletic trainer with a professional baseball team, notices that injuries and treatment times are highest each year during Spring Training [6] . Anecdotally, she observes that several of the injured players did not follow the off-season training program. She wonders if the sudden increase in workload plays a role. In this example, she is at the first step we described above.

Moving forward, she should examine previously published relevant research. In doing so, she notices there are quite a few studies in this area. Many specifically look at the ratio of recent (acute) workloads to accumulated (chronic) workloads, and some have found higher risks of injury associated with higher levels of these ratios. [7] [8]

Now that she has enough information, she can finalize a hypothesis . Elise hypothesizes that elevated ratios will increase the risk of injuries, but that the increased risk may differ from previous research because that research wasn’t done on baseball players.

Now she is on to the experiment stage, and she must design a way to test her hypothesis. She utilizes a smartphone application that helps athletes and coaches track their workloads during the off-season and during Spring Training. She also uses the team’s injury data to see if those that incurred injuries during Spring Training had higher acute:chronic workload ratios compared to those that did not get injured. Once Spring Training is over, she can analyze the results. She finds that there is no statistical difference in acute:chronic workload ratios between the injured and non-injured groups.

Moving to the next stage, she must draw conclusions based on the results found. The results did not support Elise’s hypothesis, so she cannot say that a sudden increase in workload increases the risk of injury. But as she is contemplating this, she realizes that she did not take different injury types into consideration. Her sample included all athletes that were injured during Spring Training, which includes ligament (for example, ulnar collateral ligament sprain), muscular (for example, hamstring strain), and tendon (for example, patellar tendon strain) injuries. She now recognizes that injury type may play a role in the relationship between workload accumulation and injury risk.

Now it’s time to report the results. This step may take different forms depending on your occupation. In Elise’s case, this may be a written report or a presentation to the team’s staff and front office executives. This could also be formally written up as a research paper and submitted for publication.

Hopefully you noticed that this step is also followed by an arrow that leads back to the first step. The scientific process is a cycle and we often finish the last step with more questions, which lead right back into more research. This was also the case with Elise’s example. She can now repeat the study and examine if injury type is important to her previous research question.

This text will focus on working with and analyzing data, but many of the other stages are dependent on this data. Also, the data analysis stage is dependent upon those that happened before it. Can you spot the data used in the example above? It primarily used workloads and injury status. If the data we need to answer a question aren’t available, we must find ways to collect them, and that is what Elise did in the example above. There may be other times where the data are already available, but they aren’t recorded in the same source (table or spreadsheet), which means they need to be combined. Many times, the data are not recorded in an immediately usable format, so we may need to reorganize them (often referred to as data wrangling). Once we have the data in a usable format, we can then move on to analysis. Overwhelmingly, this text will focus on the analysis stage and all of the different techniques that can be used when appropriate. But how the other stages are influenced by the analysis stage, and how they influence it, will also be addressed.

Terminology in Quantitative Analysis

There are many terms that are frequently used in statistical and quantitative analysis that are often confused and used interchangeably, but should not be. We have already used many of these, so now is a good time to begin defining some of our frequently used terms so that we avoid confusion. Of course, we will encounter more important terminology later on and will define it when we get there.

  • Population : includes every single member of a specific group. If we were to measure the body mass index (BMI) of all of the U.S. population, we would need to collect both the height and body mass of roughly 332.4 million people [9] .
  • Parameter : the variable of interest measured in the population. In the example above using the BMI of the entire U.S. population, the BMI would be a parameter.
  • Sample : a subset of the population that should generally be representative of that population; samples are often used when collecting data on the entire population is unrealistic. If we were to measure the BMI of only a sample of the U.S. population, we might randomly sample only 1% of the U.S. population (≈ 3.3 million people).
  • Statistic : the variable of interest measured in the sample. In the example above using the BMI in a sample of the U.S. population, the BMI would be a statistic.
  • Validity : how well a measurement measures what it is supposed to. Suppose a new device is created to evaluate your heart rate variability (HRV) via the camera sensors on your smart phone. To make sure it is actually measuring accurately, we might compare the new data to a well-known and accepted standard way to measure HRV.
  • Reliability : the consistency of data, which includes various types: test-retest (across time), interrater (between raters), intrarater (within rater), and internal consistency (across items). In order to evaluate the between-trial reliability of the newly created HRV device described above, we might collect data at 2 or 3 different times throughout the early morning to see how similar they are (or aren’t).
  • Anecdotal : evidence that is collected by personal experiences and not in a systematic manner. Most often considered of lower value in scientific occupations and may lead to error.
  • Empirical : evidence that is collected and documented by systematic experimentation.
  • Hypothesis : a research and scientific-based guess to answer a specific question or phenomenon.
  • Evaluation : a statement about quality that is generally decided upon after comparing other observations. For example, we might compare the jump performance results of one athlete to other athletes to say that he or she is a superior performer. Or we could use these results in a rehab setting to determine if our patient is progressing in their rehabilitation as they should, compared to data previous patients have produced at the same stage of recovery.
  • Measurement : quantification of a specific quality being assessed. For example, measuring vertical jump performance likely results in a measure of jump height or vertical power.
  • Test (instrument) : a tool used to measure a specific quality. Following the example above, we could use a jump and reach device, a switch mat, or a force plate to measure vertical jumping performance. Not all measurements in kinesiology are physical in nature, so these instruments may take other forms.
  • Formative evaluation : Pretest, mid-test, or any evaluation prior to the final evaluation that helps to track changes in the quantity being measured.
  • Summative evaluation : Final evaluation that helps to demonstrate achievement.

[Figure: Time series plot depicting change in strength asymmetry of the knee at various stages of the rehabilitation process]

Examine the data plot above that shows a measurement of strength asymmetry as a percentage for an athlete returning from an ACL knee injury. Positive values indicate a left-side asymmetry and negative values indicate a right-side asymmetry. Can you guess which side was injured based on the initial data? This athlete had a right knee injury. Initially the athlete was roughly 17% stronger on the left side, which should have given you a clue to answer the previous question. Based on what we discussed in the previous 2 terms, would you say this data was created by formative or summative assessments?

Criterion-referencing : compares a performance to a specific preset requirement.

For example, passing a strength and conditioning, personal trainer, or other fitness-related certification exam. Generally, these exams require test-takers to achieve a minimum score that represents mastery over the content. Some may even require that test-takers achieve a specific score in several areas, not just the overall score. Either way, there may be a set score that represents the “criterion” necessary for certification, such as 70% or better. Other criterion-referenced evaluation examples include: the athletic training Board of Certification (BOC) exam, a CPR exam, or a U.S. driver’s learner’s permit exam (this may vary by state).

Norm-referencing : compares performance(s) to the sample that the performer tested with or with a similar population.

Examples of norm-referenced standards include the interpretation of SAT, ACT, GRE, and IQ test scores. All of these may express results relative to those who take the exam. For example, a score of 100 on the IQ (intelligence quotient) test represents the average score based on a normal distribution. We’ll learn about the normal distribution later, but this means that roughly 68% of test-takers will score between 85 and 115. This is because all scores are transformed to make the current average score equal 100 with a standard deviation of 15 [10] . This means that a test-taker’s score might change based on the intelligence of the others who take the exam in a similar time period. This also means that comparing the IQ of someone who took the exam today to someone who took the test 10 or more years ago is meaningless, as a score of 135 may show that you are in the 99th percentile of your current time period. Furthermore, IQs have been shown to rise substantially with time [11] . So, you could argue that an IQ of 100 as tested in 2020 is superior to an IQ of 100 in 2000.

Functions of Quantitative Analysis

Overall, it is required of us as professionals (or future professionals) in the field of kinesiology to make informed decisions, which often means using quantitative data. We can break this down further into several functions of quantitative analysis. Morrow and colleagues (2016) [12] recognize the following functions of quantitative analysis in Human Performance:

  • Classification: Professionals may be able to group athletes, patients, or students following an evaluation of their abilities, which may help facilitate development. For example, an initial assessment may help a youth softball coach group athletes based on skill level and experience.
  • Prediction: The ability to predict future events may be the “Holy Grail” of many fields of research and business, but it requires large amounts of data that are often hard to come by (especially in sport performance). A very common example of this is the effort and money spent on predicting injury in sports. Intuitively, the notion makes sense. If we can predict an injury, we should be able to prevent it. Currently, much of this research lies in the area of training loads and the rate at which an athlete increases them. [13] [14] [15]
  • Achievement: Many coaches and trainers set goals for their athletes and clients. Many physical therapists set goals for their patients. Many individuals set goals for themselves. Without doing this and measuring a specific quality, there will be no knowledge of improvement or progress.
  • Motivation: For many, scores on a specific test may provide motivation to perform better. This may be because they did not perform as well as they thought they should, they performed well and want to set another personal record, or they may be competing with other participants. As another example, consider a situation where you have been running a 5k every other week, but don’t know your time when you finish. Would you train harder if you did? What if you knew your overall placement amongst those who ran?
  • Program evaluation: Similar to achievement, programs themselves should be evaluated. Imagine you are a strength coach and you want to demonstrate that you are doing a great job developing your athletes. If your team is very successful on the field, court, or pitch, this may not be much more difficult than pointing to your win-loss record. But what if you are working with a team that is very young and not yet performing to its full potential? This is precisely where demonstrating improvement in key areas that are related to competition performance could demonstrate your value to those that pay your salary.

Technology in Quantitative Analysis

Data storage and analysis.

There are many different types of technology that will be beneficial in analysis and several will be introduced in this text. Microsoft Excel and JASP will primarily be used here due to their availability and price tag (often $0), but there are many other software programs and technologies that may be useful in your future careers. Depending on the specific type of work you are doing, some programs may be better than others. Or, more than likely, you may end up using a combination of resources. Each resource has its own advantages and disadvantages. This text will make an effort to highlight those along with potential solutions for any issues.

As mentioned previously, attributes such as availability and cost are quite important for many when selecting statistical analysis software. Historically, SPSS from IBM has been the most widely used software, but that is changing. SPSS can do quite a lot, but carries a large price tag for those not affiliated with a university where they can get affordable access. Free and open-source resources such as R are increasing in usage in published research, as is Python in quantitative-based job requirements. Meanwhile, programs such as SPSS are declining in usage and in desirability among potential employers. [17] [18] [19] There are many that still prefer the SPSS “point and click” style over learning coding syntax, so it will likely stick around. Many learn to use SPSS during their time as a student at a university that provides access. Once they graduate, however, they are confronted with the fact that they will need to pay for SPSS, which can be expensive (≅ $1,200/year as of 2021 [20] ). This pushes more users to options such as Excel or a coding-based solution like R and Python. JASP , a relatively new and free product with a user interface similar to SPSS, which many may prefer, has recently become available. For many of the reasons above, this text will focus on the usage of Excel and JASP. Each technique described in this text will include solutions in both programs, [21] so readers can follow the path they find most useful in their specific situations. Solution tutorials for Excel will be shown in green/teal boxes, while solutions in JASP will be shown in purple boxes (examples below).

Example MS Excel Solution Tutorial

All solutions in Excel will be in this color scheme and will have the word “Excel” somewhere in the title.

Example JASP Solution Tutorial

Data collection.

Along with data storage and analysis software, we might also use technology in the data collection process. Take a look at the image below. Here we see an example of data collection happening in a boxing training session. Notice that the coach is viewing near real-time data on his tablet. How is this occurring? It’s not magic. In fact, many of you probably use this technological process daily. If you have a smart watch that is connected to your phone, it is continuously sending data via Bluetooth throughout the day. The same process is happening in the picture below. Each of the punching bags is instrumented with an accelerometer, which measures the acceleration of the bag after it is hit, and is connected to the tablet via Bluetooth. This data is often automatically saved to a cloud storage account so it can be retrieved later. Many of our data collection instruments are now equipped with some form of telemetry (WiFi or Bluetooth) that can send the collected data directly to a storage site. Can you think of another example besides your smart watch and the one in the picture?

[Image: A boxing coach holds a tablet displaying near real-time boxing data while several students hit instrumented punching bags]

Specifically concerning the field of kinesiology, the usage of technology and the digitization of data have solved quite a few issues from the past. Previously, data had to be manually tabulated by hand and then transcribed into a computer for analysis. This could result in many typing errors that could negatively impact our results. Now, much of our data collection involves equipment that automatically collects the digital data for us and often saves it in the cloud. Many patient and athlete management systems utilize these methods to track progress and performance.

Actually, we could go back a couple of decades before this, when much of the analysis was also done by hand. Thankfully, we won’t have to worry about that. We can now utilize computers and software to run the analysis for us and we rarely have to recall any formulas.

Beyond directly collecting data, computers and technology can be used to collect data in other ways. Public data can be taken from websites and other sources digitally through a process known as “web scraping.” This can be done in MS Excel, but is more often done with coding languages such as R or Python that can more precisely pull and then reformat the data into a usable format. There are also many freely available and open databases that we can use for research purposes. Many sports organizations and leagues produce these. Many data and sport scientists are trained to retrieve and analyze these types of data on a regular basis.
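
As a concrete illustration of pulling a public table into a usable format, here is a hedged sketch using pandas; the URL is hypothetical, and pd.read_html requires the page to contain an HTML table (and an HTML parser such as lxml to be installed).

```python
import pandas as pd

url = "https://example.com/league-stats"   # hypothetical page containing an HTML table

tables = pd.read_html(url)   # returns a list of DataFrames, one per table found
stats_table = tables[0]      # keep the first table on the page
print(stats_table.head())
```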

Data Tables and Spreadsheets

While data tables and spreadsheets are terms that are often used interchangeably, they are not the same thing. A data table is simply a way to organize data into rows and columns, where each intersection of a row and column is a cell; this may also be referred to as a data set. Many who use MS Excel, Google Sheets, or Apple’s Numbers may refer to this as a spreadsheet, but this is technically incorrect, as spreadsheets also allow for formatting and manipulation of the data in each cell. A simple spreadsheet can be used as a data table or it may include a data table.

Spreadsheet software incorporates many of the analysis processes into the same spot, which can be a benefit depending on the complexity of your analyses. If you want to go further than some of the more basic analyses, you may not be able to complete the job with products such as MS Excel. This creates a potential issue for those who have stored their data in the standard .xlsx or .xls formats in MS Excel, as many other programs cannot import the data. Fortunately, MS Excel provides many options for saving your files with different extensions that are more usable in other programs. Currently, the most common among these is the .csv file extension, which stands for comma-separated values. If you were to open this file in a text editor, you would literally see a list of all the data with each cell separated by a comma. Unfortunately, the .csv format will not save any of the equations one might use to manipulate data, any plots, or any formatting. So it is a good idea to save the data tables created in Excel as a .csv file, but also to save any analysis files in the standard format (.xlsx).
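
A minimal sketch of that round trip in Python with pandas; the file names are hypothetical, and reading .xlsx files requires an engine such as openpyxl.

```python
import pandas as pd

df = pd.read_excel("jump_data.xlsx")     # read the original spreadsheet
df.to_csv("jump_data.csv", index=False)  # plain-text copy: values only, no formulas or plots
df2 = pd.read_csv("jump_data.csv")       # virtually any analysis program can read this back
```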

Data Table Organization

No matter what software you use to store your data, it is always a good idea to standardize the organization. While you may like a specific format at one point in time, it’s important to remember that it needs to make sense to everyone who views it, and other programs may not recognize it if it’s not organized in a traditional manner. That being said, there are some best practices for organizing our data. Within a data table or dataset, we have 3 main pieces: variables, observations, and values.

  • Variables are a specific attribute being measured. These are generally set up as columns.
  • Observations are all measures on specific entities (for example, a name or date). These are generally set up as rows.
  • Values are the intersections of our variables and observations. You would consider this an individual cell in a spreadsheet. Each value is one specific measure of an attribute for a specific date or individual.

Consider the table below in Figure 1.4 that depicts some objective and subjective data on exercise intensity collected at exhaustion in a graded treadmill test. Notice that each column is a variable. So we have 3 variables: the subject ID, % HRmax, and RPE. We also have several observations shown as rows. Each subject has 1 ID number, 1 % HRmax value, and 1 RPE value. Speaking of values, a specific value for a given variable and observation can be found at their intersection. For example, if we want to know subject 314159’s RPE value, we must find where they intersect. The observation is shaded in red, the variable is shaded in blue, and the value (intersection) of 17 is shaded in purple for emphasis.

[Figure 1.4: A table demonstrating best practices of data organization]
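
The same layout can be reproduced in code. A small pandas sketch of the table above; subject 314159’s RPE of 17 comes from the example, while the remaining values are hypothetical.

```python
import pandas as pd

# Each column is a variable; each row is an observation.
data = pd.DataFrame({
    "subject_id": [314159, 271828, 161803],   # hypothetical subject IDs (314159 from the text)
    "pct_hrmax":  [98.5, 96.2, 99.1],         # hypothetical % HRmax values
    "rpe":        [17, 15, 19],               # RPE; 17 matches the example above
})

# A value sits at the intersection of an observation (row) and a variable (column).
value = data.loc[data["subject_id"] == 314159, "rpe"].iloc[0]
print(value)  # 17
```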

An Important Caveat for MS Excel/Spreadsheet Users

Consider a sample of 200 university students that were enrolled in a study measuring resting heart rates during finals week. How many rows should there be? If all 200 were tested once, we should have 200 rows. One caveat to that is if you are working in MS Excel or a similar spreadsheet application, the first row is often used to name your variables. So, row 1 wouldn’t contain any data yet. This would mean you would technically have 201 rows if you had 200 observations and your first row of data would be row 2. For other programs, variable names may be included separately and the type of data will also need to be selected. Data types will be discussed in the next chapter.

When logging data for use in an analysis program, it can be perfectly straightforward for many variables like weight or height (in cm). You just type in the value. But what about gender or class? Can you just type those in as words? Most often you can’t. Many analysis programs do not know how to deal with strings or words, so you might code those values as numbers. For example, a value of 1 might refer to freshman, 2 might refer to sophomore, and so on. This will be discussed further later on, when segmenting data into groups is desired.
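
A hedged sketch of this kind of coding in pandas, using the freshman/sophomore numbering from the example above; the sample rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"class": ["freshman", "sophomore", "junior", "freshman"]})

# Map each class label to the numeric code an analysis program can use.
codes = {"freshman": 1, "sophomore": 2, "junior": 3, "senior": 4}
df["class_code"] = df["class"].map(codes)

print(df)
```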

Enabling the Data Analysis Toolpak in MS Excel

Excel can handle many of the same analyses that other statistical programs can, although it’s not always as easy as in the other programs. But it is much more available than those programs, so there are tradeoffs. In order to run many of these types of analyses, you will need to enable the “ Data Analysis Toolpak ,” as it is not automatically available. Please refer to the Microsoft support page in order to do this, which has step-by-step instructions for PCs and Macs.

Enable the Data Analysis Toolpak for MS Excel

Installing JASP

If you choose to utilize a true statistical analysis software, JASP is a good option. It is free and has easy solutions for nearly all types of analyses. JASP can be installed on PC, Mac, and Linux operating systems.

Download and Install JASP

  • Bias means that we lean more towards a specific notion, and it is often thought of in a negative light. From a statistical perspective, the motivation for why we think a certain way does not matter. It can be negative or positive. All that matters is that our biases could result in beliefs that are not consistent with what the data actually tell us. For example, we might think very highly of a specific person we are testing and therefore give them a slightly better score than if we did not know that person at all. This type of bias may not be considered negative in motivation, but it is negative in that we are potentially misleading ourselves and others. Whether or not we like to admit it, we all have biases, and relying on quantitative data to justify our decisions may help us to avoid them or avoid making decisions because of them. ↵
  • 2020. Adult Physical Inactivity Prevalence Maps by Race/Ethnicity . https://www.cdc.gov/physicalactivity/data/inactivity-prevalence-maps/index.html ↵
  • If you would like to take a more granular look at this data, please visit https://www.cdc.gov/physicalactivity/data/inactivity-prevalence-maps/index.html . ↵
  • Riccardo Pozzo (2004) The impact of Aristotelianism on modern philosophy. CUA Press. p. 41. ↵
  • https://en.wikipedia.org/wiki/Scientific_method ↵
  • https://en.wikipedia.org/wiki/Spring_training ↵
  • Bowen L, Gross AS, Gimpel M, Bruce-Low S, Li FX. Spikes in acute:chronic workload ratio (ACWR) associated with a 5-7 times greater injury rate in English Premier League football players: a comprehensive 3-year study. Br J Sports Med. 2020 Jun;54(12):731-738. doi: 10.1136/bjsports-2018-099422. Epub 2019 Feb 21. PMID: 30792258; PMCID: PMC7285788. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7285788/ ↵
  • Bowen L, Gross AS, Gimpel M, Li FX. Accumulated workloads and the acute:chronic workload ratio relate to injury risk in elite youth football players. Br J Sports Med. 2017 Mar;51(5):452-459. doi: 10.1136/bjsports-2015-095820. Epub 2016 Jul 22. PMID: 27450360; PMCID: PMC5460663. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5460663/ ↵
  • Current US Population as checked in 2021. https://www.census.gov/popclock/ ↵
  • https://en.wikipedia.org/wiki/Intelligence_quotient#Precursors_to_IQ_testing ↵
  • Flynn Effect. https://en.wikipedia.org/wiki/Flynn_effect ↵
  • Morrow, J., Mood, D., Disch, J., and Kang, M. 2016. Measurement and Evaluation in Human Performance. Human Kinetics. Champaign, IL. ↵
  • Gabbett TJ. The training-injury prevention paradox: should athletes be training smarter and harder? Br J Sports Med. 2016 Mar;50(5):273-80. doi: 10.1136/bjsports-2015-095788. Epub 2016 Jan 12. PMID: 26758673; PMCID: PMC4789704. ↵
  • Bourdon PC, Cardinale M, Murray A, Gastin P, Kellmann M, Varley MC, Gabbett TJ, Coutts AJ, Burgess DJ, Gregson W, Cable NT. Monitoring Athlete Training Loads: Consensus Statement. Int J Sports Physiol Perform. 2017 Apr;12(Suppl 2):S2161-S2170. doi: 10.1123/IJSPP.2017-0208. PMID: 28463642. ↵
  • Eckard TG, Padua DA, Hearn DW, Pexa BS, Frank BS. The Relationship Between Training Load and Injury in Athletes: A Systematic Review. Sports Med. 2018 Aug;48(8):1929-1961. doi: 10.1007/s40279-018-0951-z. Erratum in: Sports Med. 2020 Jun;50(6):1223. PMID: 29943231. ↵
  • Morrow et al. (2016) also include Diagnosis as a function of quantitative analysis, but that is not included here as most professionals in human performance and kinesiology do not possess the authority to diagnose. They may be asked to perform a test and those results may help diagnose an issue, but diagnosis is generally reserved for those practicing medicine. ↵
  • http://r4stats.com/2014/08/20/r-passes-spss-in-scholarly-use-stata-growing-rapidly/ ↵
  • http://r4stats.com/2019/04/01/scholarly-datasci-popularity-2019/ ↵
  • https://lindeloev.net/spss-is-dying/ ↵
  • https://www.ibm.com/products/spss-statistics/pricing ↵
  • When possible. There are some instances when MS Excel does not have the capability to run the same analyses as JASP. ↵
  • Wickham, H. (2014). Tidy Data. Journal of Statistical Software. https://www.jstatsoft.org/article/view/v059i10/ ↵


Quantitative Analysis in Exercise and Sport Science by Chris Bailey, PhD, CSCS, RSCC is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.



Quantitative Analysis: Definition, Importance + Types

Quantitative analysis (QA) is a way to determine how people act using mathematical and statistical models, measurements, and research.

Quantitative data is what we talk about when we ask questions like “How many?” or “How often?” or “How much?” Mathematical approaches can be utilized to check and evaluate the accuracy of this information in a time-efficient manner. Quantitative analysis analyzes and interprets quantitative data using numerical and statistical methods. This analysis seeks to identify data patterns, trends, and linkages to inform decisions and predictions.

Quantitative data analysis uses statistics and math to solve problems in business, finance, and risk management. It is an important technique that helps financial analysts, scientists, and researchers understand challenging ideas and issues.

This blog discusses quantitative analysis, its types, and techniques used in business.

What is quantitative analysis?

Quantitative analysis is a way to figure out how good an investment or asset is by using numbers and statistics. It involves using mathematical and statistical models to look at data and make decisions about investments, business operations, or other complex systems. 

Quantitative analysis aims to make better informed and more logical decisions by using data and objective analysis instead of relying on subjective judgment or intuition.

Quantitative analysts, also called “quants,” use tools and methods like statistical analysis, econometric modeling, machine learning, and computer programming to analyze large amounts of data.

They could use this analysis to help them make financial decisions, like which trading strategies to use, how to handle risks, and how to divide up their assets. Quantitative analysis can be used in many fields, such as finance, economics, marketing, and political science.

Quantitative analysis can be used to answer some of the following types of questions:

  • How does the performance of an investment or portfolio compare to a benchmark or to other investments?
  • How does the price of a certain asset or security change when the market or other factors change?
  • How likely is it that a certain event will happen, and how might it affect how well an investment or business operation does?



Importance of quantitative analysis

Quantitative analysis plays a crucial role in various fields because it provides objective, numerical insights and supports informed decision-making. Here are some key reasons why quantitative analysis is important:

  • Objective Decision-Making

Quantitative analysis relies on data and mathematical/statistical methods, which help minimize subjectivity and bias in decision-making. This objectivity is particularly valuable when dealing with complex issues that require evidence-based conclusions.

  • Data-driven Insights

It allows researchers and analysts to extract meaningful insights from large datasets. Patterns, trends, relationships, and anomalies that might not be apparent through qualitative methods can be uncovered using quantitative research techniques. 

  • Comparison and Benchmarking

Quantitative analysis deals with the comparison of different variables, scenarios, or strategies in a systematic and measurable way. This aids in identifying the most effective or efficient approach among options.

  • Risk Assessment and Management

It assesses and quantifies risks in various contexts, from financial markets to engineering projects. This helps understand the potential impact of different risk factors and make informed decisions to mitigate them.

  • Predictive Modeling

Many quantitative techniques, such as regression analysis and time series analysis, are used to build predictive models. These models help forecast future outcomes, allowing businesses and organizations to plan ahead and make proactive decisions.

  • Resource Allocation

Quantitative analysis assists in optimizing the allocation of resources, whether it’s allocating budgets, manpower, or time. Organizations can make efficient use of their resources by understanding the relationships between variables.

  • Performance Evaluation

In fields like business and finance, evaluating the performance of investments, projects, or products is crucial. Quantitative analysis provides a structured way to assess whether goals and targets are being met.

  • Evidence in Research

Quantitative analysis provides empirical evidence to support or refute hypotheses in scientific research. It helps establish causation and correlation relationships by analyzing data objectively.

  • Quality Control and Assurance

Quantitative methods are often used to monitor and control quality in manufacturing and production processes. Statistical process control helps detect deviations from expected norms and ensures consistent product quality (a minimal sketch appears after this list).

  • Policy Formulation

Quantitative models inform policy decisions by providing data-driven insights into the potential impact of different policy options. This is essential in areas such as economics, public health, and social sciences.

  • Market Research and Consumer Behavior

In marketing, quantitative analysis helps understand consumer behavior, preferences, and trends. It assists businesses in tailoring their products and marketing strategies to target audiences effectively.

  • Validation and Verification

In engineering and computer science fields, quantitative analysis is used to validate and verify designs, simulations, and software systems. This ensures that products and systems meet predefined specifications and standards.
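To make the statistical process control idea mentioned above concrete, here is a minimal Python sketch of a Shewhart-style rule: estimate control limits from an in-control baseline run, then flag any new measurement outside mean ± 3 sigma. The fill-weight numbers are entirely made up for illustration.

```python
import numpy as np

# Control limits estimated from a hypothetical in-control baseline run.
baseline = np.array([500.2, 499.8, 500.1, 500.4, 499.7, 500.0, 499.9, 500.3])
mean, sigma = baseline.mean(), baseline.std(ddof=1)
upper, lower = mean + 3 * sigma, mean - 3 * sigma

# New measurements coming off the line; the last one should be flagged.
new_samples = [500.1, 499.8, 503.9]
for i, weight in enumerate(new_samples):
    status = "OK" if lower <= weight <= upper else "OUT OF CONTROL"
    print(f"sample {i}: {weight} -> {status}")
```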


Types of quantitative analysis

Although quantitative analysis always involves numbers, it can take several different forms. Common types include:

1. Regression Analysis

Regression analysis is a common technique used by statisticians, economists, business owners, and other professionals. It uses statistical equations to predict or estimate the effect of one variable on another.

For example, it can reveal how interest rates impact customers’ asset investment decisions. Establishing the impact of education and work experience on employees’ annual salaries is another essential use of regression analysis.

Business owners can use regression analysis to ascertain how advertising costs affect revenue. This method allows a business owner to determine whether there is a positive or negative correlation between two factors.
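As a rough illustration, here is a minimal Python sketch of such a regression using statsmodels; the advertising-spend and revenue figures are invented for demonstration only.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical monthly figures: ad spend and revenue (in $ thousands).
ad_spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
revenue = np.array([80.0, 95.0, 130.0, 140.0, 170.0])

X = sm.add_constant(ad_spend)      # add the intercept term
model = sm.OLS(revenue, X).fit()   # ordinary least squares fit

print(model.params)     # [intercept, slope]; a positive slope suggests a positive correlation
print(model.rsquared)   # share of revenue variance explained by ad spend
```

A positive, statistically significant slope would suggest that higher advertising spend is associated with higher revenue.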

2. Linear programming

Most businesses occasionally have a scarcity of resources, including office space, equipment for production, and manpower. Company managers must devise strategies to deploy resources wisely in such circumstances.

Linear programming is a quantitative approach that specifies how to arrive at such an optimal solution. It can also be used to determine how a business may maximize profits and cut costs under a set of constraints, such as limited labor.
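For illustration, here is a minimal sketch of such an optimization with scipy's linprog; the products, profits, and resource limits are hypothetical. Because linprog minimizes, the profit coefficients are negated to maximize.

```python
from scipy.optimize import linprog

# Profit per unit for hypothetical products A and B (negated for maximization).
c = [-40, -30]

# Resource constraints: 2A + 1B <= 100 labor hours; 1A + 2B <= 80 machine hours.
A_ub = [[2, 1],
        [1, 2]]
b_ub = [100, 80]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)     # optimal number of units of A and B
print(-res.fun)  # maximum achievable profit
```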

3. Data Mining

Data mining combines statistical techniques with computer programming knowledge. As the variety and size of available data sets increase, data mining’s popularity also rises.

Large volumes of data are analyzed with mining techniques to uncover hidden patterns or relationships.
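One widely used mining technique is clustering. Below is a minimal sketch with scikit-learn's KMeans on synthetic customer data; the features and the two-segment structure are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two hypothetical features per customer: annual spend ($) and visits per year.
customers = np.vstack([
    rng.normal([200.0, 2.0], [20.0, 1.0], size=(50, 2)),   # occasional buyers
    rng.normal([900.0, 12.0], [50.0, 2.0], size=(50, 2)),  # frequent, high-spend buyers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.cluster_centers_)  # approximate centers of the two discovered segments
```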


What is the difference between quantitative analysis and qualitative analysis?

The choice between quantitative and qualitative analysis depends on the research question, objectives, available data, and desired insights. The key difference: quantitative analysis works with measurable, numerical data, while qualitative analysis works with non-numerical data such as opinions, attitudes, and experiences.

Quantitative analysis uses math and statistics to measure how well a business is doing. Before these techniques became widespread, many company directors made decisions based on experience and gut feeling.

Business owners can now use quantitative analysis techniques to predict trends, decide how to use resources, and manage projects. Quantitative analysis types and methods are also used to evaluate investments. This way, organizations can determine which assets to invest in and when to do it.

The QuestionPro Research Suite is a set of tools that supports quantitative analysis by providing a platform for running and analyzing surveys. It lets you create and distribute surveys to collect data from large numbers of respondents, and it offers a range of tools for analyzing data and generating reports.

If you want to see a demo or find out more, you can get a free trial from QuestionPro.



Introduction to Empirical Analysis and Quantitative Methods

This course is an introduction to the methods employed in empirical political science research. We will cover basic topics in research design, statistics, and formal modeling, considering many examples along the way. The two primary goals of the course are: (1) to provide students with analytic tools that will help them to understand how political scientists do empirical research, and (2) to improve students' ability to pose and answer research questions on their own. There are no prerequisites.

Note: Course description is from Fall 2013

Fall 2024: Lectures will be delivered synchronously, online only. All discussion sections will be in person.


Quantitative Analysis: A Simple Overview

Gordon Scott has been an active investor and technical analyst for 20+ years. He is a Chartered Market Technician (CMT).


Quantitative analysis (also known as quant analysis or QA) in finance is an approach that emphasizes mathematical and statistical analysis to help determine the value of a financial asset, such as a stock or option. Quantitative trading analysts (also known as "quants") use a variety of data to develop trading algorithms and computer models, including historical investment and stock market data.

The information generated by these computer models helps investors analyze investment opportunities and develop what they believe will be a successful trading strategy. Typically, this trading strategy will include very specific information about entry and exit points, the expected risk of the trade, and the expected return.

The ultimate goal of financial quantitative analysis is to use quantifiable statistics and metrics to assist investors in making profitable investment decisions. In this article, we review the history of quantitative investing, compare it to qualitative analysis, and provide an example of a quant-based strategy in action.

Key Takeaways

  • Quantitative analysis emerged from the rise of the computer era, which made it easier than ever before to analyze huge amounts of data in short amounts of time.
  • Quantitative trading analysts (quants) identify trading patterns, build models to assess those patterns, and make predictions about the price and direction of securities.
  • Once the models are built and the information is gathered, quants use the data to set up automated trades of securities.
  • Quantitative analysis is different from qualitative analysis, which looks at non-statistical aspects of a company to make predictions.
  • Quantitative analysis can be used to mitigate risk by identifying which investments provide the best level of return relative to an investor's preferred level of risk.

Origins of Quant Investing

Nobel Prize-winning economist Harry Markowitz is generally credited with beginning the quantitative investment movement when he published “Portfolio Selection” in the Journal of Finance in March 1952. Markowitz introduced modern portfolio theory (MPT), which showed investors how to construct a diversified portfolio of assets capable of maximizing returns for various risk levels. Markowitz used math to quantify diversification and is cited as an early adopter of the concept that mathematical models could be applied to investing.

Robert Merton, a pioneer in modern financial theory, won a Nobel Prize for his research into mathematical methods for pricing derivatives. The work of Markowitz and Merton laid the foundation for the quantitative (quant) approach to investing.

Quantitative vs. Qualitative Analysis

Unlike traditional qualitative investment analysts, quants don’t visit companies, meet the management teams, or research the products the firms sell to identify a competitive edge. They often don’t know or care about the qualitative aspects of the companies they invest in or the products or services these companies provide. Instead, they rely purely on math to make investment decisions.

Quants, who frequently have a scientific background and a degree in statistics or math, will use their knowledge of computers and programming languages to build customized trading systems that automate the trading process. The inputs to their programs might range from key financial ratios (such as the price-to-earnings ratio) to more complex calculations, such as discounted cash flow (DCF) valuations.

Hedge fund managers embraced the methodology. Advances in computing technology further advanced the field, allowing complex algorithms to be calculated in the blink of an eye and enabling automated trading strategies. The field flourished during the dotcom boom and bust.

Quant strategies stumbled in the Great Recession as they failed to account for the impact mortgage-backed securities had on the market and economy as a whole. However, quant strategies remain in use today and have gained notable attention for their role in high-frequency trading (HFT), which relies on math to make trading decisions.

Quantitative investing is also widely practiced both as a stand-alone discipline and in conjunction with traditional qualitative analysis for both return enhancement and risk mitigation.

Quantitative analysts don't look at who manages a company, what its balance sheet looks like, what products it makes, or any other qualitative factor. They focus entirely on the numbers and choose the investment that, mathematically speaking, offers the best return for the lowest level of risk.

Data Used in Quantitative Analysis

The rise of the computer era made it possible to crunch enormous volumes of data in extraordinarily short periods of time. This has led to increasingly complex quantitative trading strategies, as traders seek to identify consistent patterns, model those patterns, and use them to predict price movements in securities.

Quants implement their strategies using publicly available data. The identification of patterns enables them to set up automatic triggers to buy or sell securities.

For example, a trading strategy based on trading volume patterns may have identified a correlation between trading volume and prices. So if the trading volume on a particular stock rises when the stock’s price hits $25 per share and drops when the price hits $30, a quant might set up an automatic buy at $25.50 and an automatic sell at $29.50.
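A minimal sketch of such an automatic trigger in Python; the thresholds come from the hypothetical example above, and nothing here is a real trading system.

```python
def order_for(price: float) -> str:
    """Signal for the hypothetical $25.50 buy / $29.50 sell trigger rule."""
    if price <= 25.50:
        return "BUY"
    if price >= 29.50:
        return "SELL"
    return "HOLD"

for price in [24.80, 26.10, 29.75]:
    print(price, order_for(price))
```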

Similar strategies can be based on earnings, earnings forecasts , earnings surprises, and a host of other factors. In each case, pure quant traders don’t care about the company’s sales prospects, management team, product quality, or any other aspect of its business. They are placing their orders to buy and sell based strictly on the numbers accounted for in the patterns they have identified.

Quantitative analysis can be used to identify patterns that may lend themselves to profitable security trades, but that isn’t its only value. While making money is a goal every investor can understand, quantitative analysis can also be used to reduce risk.

The pursuit of so-called “risk-adjusted returns” involves comparing risk measures such as alpha, beta, r-squared, standard deviation, and the Sharpe ratio to identify the investment that will deliver the highest level of return for the given level of risk. The idea is that investors should take no more risk than is necessary to achieve their targeted level of return.

So if the data reveals that two investments are likely to generate similar returns, but that one will be significantly more volatile in terms of up and down price swings, quants (and common sense) would recommend the less risky investment.
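The comparison can be made concrete with the Sharpe ratio. Below is a minimal sketch on synthetic monthly returns; the return distributions and the risk-free rate are assumptions, chosen so both series have similar means but different volatility.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.006, 0.02, 120)  # ~0.6% mean monthly return, low volatility
b = rng.normal(0.006, 0.06, 120)  # similar mean return, much higher volatility
risk_free = 0.002                 # hypothetical monthly risk-free rate

def sharpe(returns: np.ndarray, rf: float) -> float:
    """Mean excess return per unit of volatility."""
    excess = returns - rf
    return excess.mean() / excess.std(ddof=1)

print(sharpe(a, risk_free))  # higher: more return per unit of risk
print(sharpe(b, risk_free))  # lower, despite a similar mean return
```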

Risk-parity portfolios are an example of quant-based strategies in action. The basic concept involves making asset allocation decisions based on market volatility. When volatility declines, the level of risk-taking in the portfolio goes up; when volatility increases, it goes down.

Example of Quantitative Analysis

To make the concept a little more concrete, consider a portfolio that divides its assets between cash and an S&P 500 index fund. Using the Chicago Board Options Exchange Volatility Index (VIX) as a proxy for stock market volatility, when volatility rises, our hypothetical portfolio would shift its assets toward cash.

When volatility declines, our portfolio would shift assets to the S&P 500 index fund. Models can be significantly more complex than the one we reference here, perhaps including stocks, bonds, commodities, currencies, and other investments, but the concept remains the same.
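A minimal sketch of that volatility-driven shift; the VIX threshold and the two allocations are illustrative assumptions, not a tested rule.

```python
def allocation(vix_level: float, threshold: float = 25.0) -> dict:
    """Shift toward cash when volatility is high, toward equities when low."""
    if vix_level >= threshold:
        return {"sp500_fund": 0.3, "cash": 0.7}
    return {"sp500_fund": 0.8, "cash": 0.2}

print(allocation(32.0))  # elevated VIX -> mostly cash
print(allocation(14.0))  # calm market -> mostly the index fund
```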

Pros and Cons of Quant Trading

Like any trading strategy, quantitative analysis offers both advantages and disadvantages.

Advantages

  • Unemotional: In quant trading, the patterns and numbers are all that matter. It is an effective buy-sell discipline, as it can be executed consistently, unhindered by the emotion that is often associated with financial decisions.
  • Cost-effective: Firms that rely on quant strategies don't need to hire large teams of analysts and portfolio managers or travel to assess potential investments. They use computers to analyze the data and execute the trades.

Disadvantages

  • Vulnerable to manipulated data: Quant analysis involves culling through vast amounts of data, and choosing the right data is by no means guaranteed. Trading patterns that appear to suggest certain outcomes may work perfectly until they don’t, and even when a pattern appears to work, validating it can be a challenge.
  • Qualitative factors matter: Inflection points, such as the stock market downturn of 2008-09, can be tough on these strategies, as patterns can change suddenly. Humans can see a scandal or management change as it develops, while a purely mathematical approach cannot necessarily do so.
  • Widely used: A strategy becomes less effective as more investors employ it; patterns that once worked fade as more traders try to profit from them.

What Is Quant Finance?

Quant finance, short for quantitative finance, is the use of large datasets and mathematical models to analyze patterns in financial markets. Traders use it to make predictions about how markets will behave and then buy or sell securities based on those predictions.

What Is a Quant?

Quants or quant traders are traders who use quantitative analysis to analyze financial markets and make trading decisions.

What Is the Difference Between Quantitative Analysis and Qualitative Analysis?

Quantitative analysis uses statistical models to make predictions or reach conclusions based solely on things that can be measured. Qualitative analysis makes predictions using subjective, non-numerical data, such as opinions, attitudes, or experiences.

Many investment strategies use a blend of both quantitative and qualitative strategies. They use quant strategies to identify potential investments and then use qualitative analysis to take their research efforts to the next level in identifying the final investment.

They may also use qualitative insight to select investments and quant data for risk management . While both quantitative and qualitative investment strategies have their proponents and their critics, the strategies do not need to be mutually exclusive.

Cowles Foundation for Research in Economics at Yale University. " Portfolio Selection, Efficient Diversification of Investments ."

CFA Institute Research Foundation. " Robert C. Merton and the Science of Finance ," Page 1.


Internal factors promoting research collaboration problems: an input-process-output analysis

  • Published: 02 April 2024


  • Malte Hückstädt, ORCID: orcid.org/0000-0002-0185-4230 (1)
  • Luca M. Leisten (2)

Research collaborations are crucial for scientific progress, but their success is often compromised by internal collaboration problems. While previous work is often small-scale and largely based on case studies and qualitative work, we present a large-scale, quantitative, and representative study to investigate important drivers behind research collaboration problems in various disciplines. Based on an input-process-output framework and with a focus on research clusters, we investigated the occurrence of four crucial research collaboration problems: fairness, commitment, difference, and cohesion problems. Based on a sample of 5,306 researchers, we identified several input and process variables that could reduce collaboration problems in research collaborations, including gender heterogeneity, conflict mediation by a cluster’s spokesperson, the synthesis of results, and the collaborative development of common goals. We discuss that these problems are often rooted in the science system itself and provide important guidelines and implications for stakeholders, funding bodies, and involved researchers on how to reduce collaboration problems in research collaborations.


This selection was made for two reasons: Our study focuses on the origin of research collaboration problems that arise at the cluster level of RC (Fig. 2). Additionally considering collaboration problems occurring within sub-projects would have (1) made our survey overly complex, and (2) made it impossible to generate a representative sample of all scientific staff, doctoral students, and postdocs of the DFG research clusters without disproportionately high effort.

DFG RC differ from normal (author-)teams in several respects: They are highly institutionalised and have formalised memberships, goals and purposes. RC of the DFG are furthermore organised in a project form, are financed by third-party funding, often have long and fixed terms, fixed organisational structures and are designed to promote and enable extensive research in specific areas (for a more detailed overview see: Defila et al., 2008 ; Torka, 2012 ).

For an overview and more detailed description of the variables used in our four analytical models, see Table A3 in the Appendix.

Translated by the authors.

The n = 5,306 PIs and spokespersons are clustered into n = 948 research collaborations and are thus not statistically independent of each other.

To reduce complexity, only those hypotheses that are based on a significant effect are reported. Details on insignificant effects and the corresponding (unconfirmed) hypotheses as well as indirect effects of the four structural equation models can be found in Fig.  3 and Table A1 in the Appendix.

In the course of the model specification, we have assumed that the coherent use of terms across sub-projects is correlated with the use of common theories (Defila et al., 2006 ).

In the course of the model specification, following Loibl ( 2005 ) and Defila et al. ( 2006 ), we assumed that conflicts arising in the context of content-related decisions are correlated with resource conflicts.

Abramo, G., D’Angelo, C. A., & Murgia, G. (2013). Gender differences in research collaboration. Journal of Informetrics, 7 (4), 811–822. https://doi.org/10.1016/j.joi.2013.07.002


Anderson, N., Brodeck, F. C., & West, M. A. (2000). The team climate inventory: Manual and validation of the German version - WOP Working Paper No. 2000/2. Heidelberg: Hogrefe

Bagshaw, D., Lepp, M., & Zorn, C. R. (2007). International research collaboration: Building teams and managing conflicts. Conflict Resolution Quarterly, 24 (4), 433–446. https://doi.org/10.1002/crq.183

Barrick, M. R., Stewart, G. L., Neubert, M. J., & Mount, M. K. (1998). Relating member ability and personality to work-team processes and team effectiveness. Journal of Applied Psychology, 83 (3), 377–391. https://doi.org/10.1037/0021-9010.83.3.377

Baurmann, M., & Vowe, G. (2014). Governing the research club: Wie lassen sich Kooperationsprobleme in Forschungsverbünden lösen? Forschung Politik - Strategie - Management, 2, 73–84.


Beal, D. J., Cohen, R. R., Burke, M. J., & McLendon, C. L. (2003). Cohesion and performance in groups: A meta-analytic clarification of construct relations. Journal of Applied Psychology, 88 (6), 989–1004. https://doi.org/10.1037/0021-9010.88.6.989

Becher, T., & Trowler, P. (2001). Academic tribes and territories: Intellectual enquiry and the culture of disciplines (2nd ed.). Open University Press.

Blanckenburg, C., Birgit, B., Hans-Liudger, D., & Heiner, L. (2005). Leitfaden für interdisziplinäre Forschergruppen: Projekte initiieren - Zusammenarbeit gestalten . Edited by Hans-Liudger Dienel and Susanne Schön. Stuttgart: Steiner

Bozeman, B., & Gaughan, M. (2011). How do men and women differ in research collaborations? An analysis of the collaborative motives and strategies of academic researchers. Research Policy, 40 (10), 1393–1402. https://doi.org/10.1016/j.respol.2011.07.002

Bozeman, B., Gaughan, M., Youtie, J., Slade, C. P., & Rimes, H. (2016). Research collaboration experiences, good and bad: dispatches from the front lines. Science and Public Policy, 43 (2), 226–244. https://doi.org/10.1093/scipol/scv035

Bozeman, B., & Youtie, J. L. (2017). The strength in numbers: The new science of team science . Princeton University Press.


Brown, R., Werbeloff, L., & Raven, R. (2019). Interdisciplinary research and impact. Global Challenges, 3 (4), 1970041. https://doi.org/10.1002/gch2.201970041

Chompalov, I., Genuth, J., & Shrum, W. (2002). The organization of scientific collaborations. Research Policy, 31 (5), 749–767. https://doi.org/10.1016/S0048-7333(01)00145-7

Corley, E. A., Craig Boardman, P., & Bozeman, B. (2006). Design and the management of multi-institutional research collaborations: Theoretical implications from two case studies. Research Policy, 35 (7), 975–993. https://doi.org/10.1016/j.respol.2006.05.003

Defila, R., Di Antonietta, G., & Michael, S. (2006). Forschungsverbundmanagement: Handbuch für die Gestaltung inter- und transdisziplinärer Projekte . Hochschulverlag.

Defila, R., Di Antonietta, G., & Michael, S. (2008). Management von Forschungsverbünden: Möglichkeiten der Professionalisierung und Unterstützung . Wiley-VCH.

Derry, S. J., Gernsbacher, M. A., & Schunn, C. D. (2005). Interdisciplinary collaboration: An emerging cognitive science . Lawrence Erlbaum.

Edelenbos, J., Bressers, N., & Vandenbussche, L. (2017). Evolution of interdisciplinary collaboration: What are stimulating conditions? Science and Public Policy, 44 (4), 451–463. https://doi.org/10.1093/scipol/scw035

Frost-Arnold, K. (2013). Moral trust & scientific collaboration. Studies in History and Philosophy of Science Part A, 44 (3), 301–310. https://doi.org/10.1016/j.shpsa.2013.04.002

German Research Foundation. (2010). Guideline research centres. https://www.dfg.de/formulare/67_10e/67_10e.pdf

German Research Foundation. (2015). Guideline priority programmes. https://www.dfg.de/formulare/50_05/50_05_en.pdf

German Research Foundation. (2019). Guideline clusters of excellence. https://www.dfg.de/en/research_funding/programmes/excellence_initiative/clusters_excellence/

German Research Foundation. (2020a). Excellence strategy. https://www.dfg.de/en/research_funding/excellence_strategy/index.html

German Research Foundation. (2020b). Facts and figures. https://www.dfg.de/en/dfg_profile/facts_figures/index.html

German Research Foundation. (2020c). Guideline collaborative research centres. https://www.dfg.de/formulare/50_06/50_06_en.pdf

German Research Foundation. (2021a). 2021 in Numbers. https://www.dfg.de/en/dfg_profile/facts_figures/statistics/dfg_in_numbers/index.html

German Research Foundation. (2021b). GEPRIS. https://gepris.dfg.de/gepris/OCTOPUS

German Research Foundation. (2021c). Guideline research units. https://www.dfg.de/formulare/50_04/50_04_en.pdf

German Research Foundation. (2022). Subject areas of the German research foundation. https://www.dfg.de/en/dfg_profile/statutory_bodies/review_boards/subject_areas/index.jsp

Hall, K. L., Vogel, A., Huang, G., Serrano, K., Rice, E., Tsakraklides, S., & Fiore, S. (2018). The science of team science: A review of the empirical evidence and research gaps on collaboration in science. American Psychologist, 73 , 532–548. https://doi.org/10.1037/amp0000319

Hendren, C. O., & Sharon Tsai-Hsuan, K. (2019). The interdisciplinary executive scientist: Connecting scientific ideas, resources and people. In K. L. Hall, A. L. Vogel, & R. T. Croyle (Eds.), Strategies for team science success: Handbook of evidence-based principles for cross-disciplinary science and practical lessons learned from health researchers (pp. 363–373). Springer.


Huang, S., Chen, J., Mei, L., & Mo, W. (2019). The effect of heterogeneity and leadership on innovation performance: evidence from university research teams in China. Sustainability, 11 (16), 4441. https://doi.org/10.3390/su11164441

Hückstädt, M. (2022). Coopetition between frenemies: Interrelations and effects of seven collaboration problems in research clusters. Scientometrics, 127, 5191–5224. https://doi.org/10.1007/s11192-022-04472-w

Hückstädt, M. (2023). Ten reasons why research collaborations succeed—a random forest approach. Scientometrics, 128 (3), 1923–1950. https://doi.org/10.1007/s11192-022-04629-7

Hückstädt, M., Jungbauer-Gans, M., & Kleimann, B. (2023). Quantitative partial survey of the project DEKiF. Dataset . https://doi.org/10.21249/DZHW:decquant:1.0.0

Hülsheger, U., Anderson, N., & Salgado, J. (2009). Team-level predictors of innovation at work: A comprehensive meta-analysis spanning three decades of research. The Journal of Applied Psychology, 94 , 1128–1145. https://doi.org/10.1037/a0015978

Jehn, K. A., & Shah, P. P. (1997). Interpersonal relationships and task performance: An examination of mediation processes in friendship and acquaintance groups. Journal of Personality and Social Psychology, 72 (4), 775–790. https://doi.org/10.1037/0022-3514.72.4.775

John, M. (2019). Management interdisziplinärer Forschungsverbünde: Institutionelle Bedingungen nachhaltiger Kooperation in der Medizin . Springer Gabler.

Joshi, A. (2014). By whom and when is women’s expertise recognized? The interactive effects of gender and education in science and engineering teams. Administrative Science Quarterly, 59 (2), 202–239. https://doi.org/10.1177/0001839214528331

Kerr, N. L. (1983). Motivation losses in small groups: A social dilemma analysis. Journal of Personality and Social Psychology, 45 (4), 819–828. https://doi.org/10.1037/0022-3514.45.4.819

Kleimann, B., Annett, D., Sebastian, N., Nick, W., & Winde, M. (2019). Kooperationsgovernance—Herausforderungen bei der Organisation und Gestaltung kooperativer Wissenschaft. Diskussionspapier 1. Stifterverband für die Deutsche Wissenschaft e.V.; Future Lab Diskussionspapier 1

Klein, J. T. (2005). Interdisciplinary teamwork: The dynamics of collaboration and integration. In S. J. Derry, C. D. Schunn, & M. A. Gernsbacher (Eds.), Interdisciplinary collaboration: An emerging cognitive science (pp. 23–50). Lawrence Erlbaum.

Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). The Guilford Press.

König, B., Diehl, K., Tscherning, K., & Helming, K. (2013). A framework for structuring interdisciplinary research management. Research Policy, 42 (1), 261–272. https://doi.org/10.1016/j.respol.2012.05.006

Kozlowski, S., & Bell, B. S. (2019). Evidence-based principles and strategies for optimizing team functioning and performance in science teams. In K. L. Hall, A. L. Vogel, & R. T. Croyle (Eds.), Strategies for team science success: Handbook of evidence-based principles for cross-disciplinary science and practical lessons learned from health researchers (pp. 269–293). Springer.

Kuhlmann, S., Ulrich, S., & Thomas, H. (2003). Governance der Kooperation heterogener Partner im deutschen Forschungs-und Innovationssystem—Fraunhofer ISI Institute Systems and Innovation Research. https://www.isi.fraunhofer.de/content/dam/isi/dokumente/cci/innovation-systems-policy-analysis/2003/discussionpaper_01_2003.pdf

Loibl, M. C. (2005). Spannungen in Forschungsteams: Hintergründe und Methoden zum konstruktiven Abbau von Konflikten in inter- und transdisziplinären Projekten . Carl-Auer-Systeme.

Lumley, T. (2010). Complex surveys: A guide to analysis using R . John Wiley.

Lungeanu, A., Huang, Y., & Contractor, N. S. (2014). Understanding the assembly of interdisciplinary teams and its impact on performance. Journal of Informetrics, 8 (1), 59–70. https://doi.org/10.1016/j.joi.2013.10.006

Mayrose, I., & Freilich, S. (2015). The interplay between scientific overlap and cooperation and the resulting gain in co-authorship interactions. PLoS ONE, 10 (9), e0137856. https://doi.org/10.1371/journal.pone.0137856

McGrath, J. E. (1964). Social psychology: A brief introduction . Holt, Rinehart and Winston.

Meißner, F., Weinmann, C., & Vowe, G. (2022). Understanding and addressing problems in research collaboration: A qualitative interview study from a self-governance perspective. Frontiers in Research Metrics and Analytics . https://doi.org/10.3389/frma.2021.778176

Munzert, S., Rubba, C., Meißner, P., & Nyhuis, D. (2014). Automated data collection with R: A practical guide to web scraping and text mining . John Wiley & Sons, Ltd.

Muthén, L. K., & Bengt, M. (2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA. https://www.statmodel.com/download/usersguide/MplusUserGuideVer_8.pdf

Nurius, P., & Kemp, S. (2019). Individual-level competencies for team collaboration with cross-disciplinary researchers and stakeholders. In K. L. Hall, A. L. Vogel, & R. T. Croyle (Eds.), Strategies for team science success (pp. 171–187). Springer.

Olechnicka, A., Ploszaj, A., & Celinska-Janowicz, D. (2019). The geography of scientific collaboration . Routledge.

Rutting, L., Post, G., de Roo, M., Blad, S., & de Greef, L. (2016). An introduction to interdisciplinary research: Theory and practice . Amsterdam University Press.

Salazar, M. R., Widmer, K., Doiron, K., & Lant, T. K. (2019). Leader integrative capabilities: A catalyst for effective interdisciplinary teams. In K. L. Hall, A. L. Vogel, & R. T. Croyle (Eds.), Strategies for team science success: Handbook of evidence-based principles for cross-disciplinary science and practical lessons learned from health researchers (pp. 313–328). Springer.

Shrum, W., Genuth, J., & Chompalov, I. (2007). Structures of scientific collaboration . MIT Press.

Simon, D. (2019). Handbook on science and public policy . Edward Elgar Publishing.

Steinheider, B., Bayerl, P. S., Menold, N., & Bromme, R. (2009). Entwicklung und Validierung einer Skala zur Erfassung von Wissensintegrationsproblemen in interdisziplinären Projektteams (WIP). Zeitschrift Für Arbeits- Und Organisationspsychologie a&o, 53 (3), 121–130. https://doi.org/10.1026/0932-4089.53.3.121

Stokols, D., Fuqua, J., Gress, J., Harvey, R., Phillips, K., Baezconde-Garbanati, L., Unger, J., et al. (2003). Evaluating transdisciplinary science. Nicotine & Tobacco Research, 5 (1), 21–39. https://doi.org/10.1080/14622200310001625555

Stokols, D., Misra, S., Moser, R. P., Hall, K. L., & Taylor, B. K. (2008). The ecology of team science. American Journal of Preventive Medicine, 35 (2), 96–115. https://doi.org/10.1016/j.amepre.2008.05.003

Sweeney, J. W. (1974). Altruism, the free rider problem and group size. Theory and Decision, 4 (3), 259–275. https://doi.org/10.1007/BF00136649

Thomson, A. M., & Perry, J. L. (2006). Collaboration processes: Inside the black box. Public Administration Review, 66 (s1), 20–32. https://doi.org/10.1111/j.1540-6210.2006.00663.x

Torka, M. (2012). Neue arbeitsweisen: Projekte und vernetzungen. In S. Maasen, M. Kaiser, M. Reinhart, & B. Sutter (Eds.), Handbuch wissenschaftssoziologie (pp. 329–340). Springer.

Twyman, M., & Contractor, N. (2019). Team assembly. In K. L. Hall, A. L. Vogel, & R. T. Croyle (Eds.), Strategies for team science success: Handbook of evidence-based principles for cross-disciplinary science and practical lessons learned from health researchers (pp. 217–240). Springer.

Weisberg, H. F. (2009). The total survey error approach: A guide to the new science of survey research . University of Chicago Press.

West, M. A. (2002). Sparkling fountains or stagnant ponds: An integrative model of creativity and innovation implementation in work groups: Creativity and innovation implementation. Applied Psychology, 51 (3), 355–387. https://doi.org/10.1111/1464-0597.00951


Acknowledgements

The authors would like to thank Bernd Kleimann and Judith Block for their valuable comments on earlier versions of this manuscript.

This research was supported by the German Federal Ministry of Education and Research [Grant Number M527800].

Author information

Authors and Affiliations

German Centre for Higher Education Research and Science Studies, Lange Laube 12, 30159, Hanover, Germany

Malte Hückstädt

ETH Zürich, Stampfenbachstrasse 69, 8092, Zürich, Switzerland

Luca M. Leisten


Corresponding author

Correspondence to Malte Hückstädt .

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

See Figs. A1, A2, A3 and Tables A1, A2, A3, A4.

Figure 4: Relative frequencies of the disciplinary affiliation of the PIs and spokespersons in the population and the sample.

Figure 5: Relative frequency of researchers' gender and roles, RC status and their funding line in the population and in the sample.

Figure 6: Frequencies of the proportions of female PIs in research clusters according to subject areas of the German Research Foundation.


About this article

Hückstädt, M., Leisten, L.M. Internal factors promoting research collaboration problems: an input-process-output analysis. Scientometrics (2024). https://doi.org/10.1007/s11192-024-04957-w

Download citation

Received : 11 January 2023

Accepted : 25 January 2024

Published : 02 April 2024

DOI : https://doi.org/10.1007/s11192-024-04957-w


  • Team science
  • Research collaboration
  • Collaboration problems
  • Collaboration success

  • Open access
  • Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

  • Michiel Schreurs, ORCID: orcid.org/0000-0002-9449-5619 (1, 2, 3)
  • Supinya Piampongsant (1, 2, 3)
  • Miguel Roncoroni, ORCID: orcid.org/0000-0001-7461-1427 (1, 2, 3)
  • Lloyd Cool, ORCID: orcid.org/0000-0001-9936-3124 (1, 2, 3, 4)
  • Beatriz Herrera-Malaver, ORCID: orcid.org/0000-0002-5096-9974 (1, 2, 3)
  • Christophe Vanderaa, ORCID: orcid.org/0000-0001-7443-5427 (4)
  • Florian A. Theßeling (1, 2, 3)
  • Łukasz Kreft, ORCID: orcid.org/0000-0001-7620-4657 (5)
  • Alexander Botzki, ORCID: orcid.org/0000-0001-6691-4233 (5)
  • Philippe Malcorps (6)
  • Luk Daenen (6)
  • Tom Wenseleers, ORCID: orcid.org/0000-0002-1434-861X (4)
  • Kevin J. Verstrepen, ORCID: orcid.org/0000-0002-3077-6219 (1, 2, 3)

Nature Communications, volume 15, Article number: 2368 (2024)


  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .
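To give a feel for the modeling approach, here is a generic sketch (not the authors' actual pipeline) that trains a gradient-boosting regressor on synthetic data shaped like the study's design: 250 "beers" with 200 chemical features each, predicting an appreciation score. All data and the target function are invented.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((250, 200))                                  # 250 beers x 200 chemical features
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.1, 250)  # synthetic appreciation score

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

print(model.score(X_test, y_test))                       # R^2 on held-out beers
print(np.argsort(model.feature_importances_)[::-1][:5])  # most influential features
```

In the actual study, dissecting such models (for example, via feature importances) is what points to specific compounds as drivers of flavor and appreciation.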

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids and total ester, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel, Supplementary Data  1 and 2 , and Supplementary Fig.  S2 . For the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 ). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol, and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate, conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol), correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .

Figure 1: Spearman rank correlations. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), others (black)) and sensory aspect (aroma, taste, palate, and overall appreciation). For the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source; for more details, see the main text and Methods section. Chemical data can be found in Supplementary Data 1, correlations between all chemical compounds are depicted in Supplementary Fig. S2, and correlation values can be found in Supplementary Data 2. See Supplementary Data 4 for sensory panel assessments and Supplementary Data 5 for correlation values between all sensory descriptors.
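Computing such a correlation matrix is straightforward; here is a minimal pandas sketch with invented concentrations for three of the compounds discussed above.

```python
import pandas as pd

# Hypothetical concentrations for five beers (arbitrary units).
df = pd.DataFrame({
    "citronellol":     [1.2, 0.8, 2.5, 0.3, 1.9],
    "alpha_terpineol": [0.9, 0.7, 2.1, 0.4, 1.5],
    "iso_alpha_acids": [30.0, 45.0, 12.0, 50.0, 20.0],
})

# Pairwise Spearman rank correlations, as used for Fig. 1.
print(df.corr(method="spearman").round(2))
```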

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).
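The consistency check can be reproduced in miniature with a one-way ANOVA; the scores below are invented panel ratings of one attribute for the same beer across three sessions. A p-value above 0.05 would indicate no detectable session effect.

```python
from scipy.stats import f_oneway

session_1 = [3.0, 3.5, 4.0, 3.0]  # panel scores, first session
session_2 = [3.5, 3.0, 4.0, 3.5]  # same beer, repeated session
session_3 = [3.0, 3.5, 3.5, 4.0]

stat, p_value = f_oneway(session_1, session_2, session_3)
print(f"F = {stat:.2f}, p = {p_value:.3f}")
```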

Aroma and taste perception reported by the trained panel are often linked (Fig. 1, bottom left panel and Supplementary Data 4 and 5), with high correlations between hop aroma and taste (Spearman’s rho = 0.83). Bitter taste correlates with hop aroma and taste in general (Spearman’s rho = 0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho = 0.75). Barnyard flavor, most often associated with sour beers, correlates strongly with stale hops (Spearman’s rho = 0.97), which are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho = 0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho = 0.79), and overall appreciation correlates with multiple aspects that describe beer mouthfeel (alcohol and carbonation; Spearman’s rho = 0.32 and 0.39), as well as with hop and ester aroma intensity (Spearman’s rho = 0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig. S4). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among Scotch ales, stouts/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract), appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig. S3), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aromas or tastes, as evaluated by the tasting panel (Fig. 2, Supplementary Fig. S5, Supplementary Data 6). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho = 0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho = 0.82/0.62 and 0.72/0.57, respectively), and darker color from roasted malts is a good indicator of malt perception (Spearman’s rho = 0.54).

Figure 2

Heatmap colors indicate Spearman’s rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), others (black)). See Supplementary Data 6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds, and suggests the need for models richer than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data from our trained tasting panel, we collected over 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate and overall quality, as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores in these online consumer reviews (rho = 0.49, Supplementary Fig. S6), but not in our trained tasting panel (rho = 0.19). This suggests that prices affect consumer appreciation, an effect that has been reported for wine 63 , while blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig. 3, rho = 0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences between the two categories of tasters, including differences in appreciation. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with those obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

Figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from the review texts (Supplementary Data 7). Processing review texts from the RateBeer database yielded results comparable to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig. 3). This is in line with expectations, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, more specific aromas like esters, coriander or diacetyl are underrepresented in the online reviews, resulting in weaker correlations and underscoring the importance of a trained tasting panel and standardized tasting sheets with explicit attributes to be scored when evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute for taste panel data for these sensory aspects.

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can capture both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), and partial least squares regression (PLSR)); 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)); 1 support vector regression model (SVR); and 1 artificial neural network (ANN) model.
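
For illustration, this model set could be instantiated with the ‘scikit-learn’ and ‘xgboost’ packages named in the Methods as sketched below; hyperparameters are library defaults rather than the study’s tuned values, and the interaction terms used for LR and Lasso (e.g., via PolynomialFeatures) are omitted for brevity.

```python
# Illustrative instantiation of the ten regressor families described above.
# Hyperparameters are defaults, not the study's tuned values.
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

models = {
    "LR": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "PLSR": PLSRegression(n_components=10),
    "ABR": AdaBoostRegressor(),
    "ET": ExtraTreesRegressor(),
    "GBR": GradientBoostingRegressor(),
    "RF": RandomForestRegressor(),
    "XGBR": XGBRegressor(),
    "SVR": SVR(),
    "ANN": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000),
}
```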

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on the training set, its performance was evaluated by its ability to predict the test set, quantified as the coefficient of determination (R2) of multi-output models (see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance generally agreed. Performance of the different models varied (Table 1). Notably, all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out by the large number of public reviews on RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R2 values, due to severe overfitting (training set R2 = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially when interaction terms further amplify the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, outcompeting multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, outcompeting the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
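
The split-and-score protocol can be sketched as follows, on synthetic stand-in data (the real study used 231 chemical features, with beer styles as the stratification labels):

```python
# Sketch of the train/test protocol: stratified split by style, fit, score R^2.
# Data here are synthetic stand-ins, not the study's measurements.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 20))               # stand-in chemical features
y = 0.5 * X[:, 0] + rng.normal(size=250)     # stand-in sensory target
styles = rng.integers(0, 5, size=250)        # stand-in beer-style labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=styles, random_state=42)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("test R2:", round(r2_score(y_te, model.predict(X_te)), 2))
```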

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table S4). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R2 = 0.67 versus R2 = 0.09) (Supplementary Tables S3 and S4). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR also shows intermediate performance, mostly due to weak predictions of specific attributes that lower its overall performance (Supplementary Table S4).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product with low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized either to be the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig. 4A). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model’s predictions of consumer appreciation (Fig. 4B). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
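
As a sketch of these two readouts, the snippet below reproduces impurity-based importance and SHAP values for a GBR fitted on synthetic stand-in data; the ‘shap’ package is the one cited in the text, but the data and feature count are illustrative.

```python
# Sketch: impurity-based importance (MDI) and SHAP values for a fitted GBR.
# Synthetic data; in the study these would be the measured chemical parameters.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 20))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=250)
model = GradientBoostingRegressor().fit(X, y)

mdi = model.feature_importances_             # mean decrease in impurity
print("top MDI features:", np.argsort(mdi)[::-1][:5])

explainer = shap.TreeExplainer(model)        # per-sample SHAP contributions
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X, max_display=15)
```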

Figure 4

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest-ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing lower values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the model’s prediction. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman rho correlation coefficient and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors, or are even commonly regarded as negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig. 4C). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would likely have been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig.  S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig.  S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.
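
For reference, scikit-learn can generate both kinds of plots; the sketch below uses synthetic stand-in data with a built-in interaction, and the feature indices are purely illustrative.

```python
# Sketch: one-way and two-way partial dependence for a fitted GBR model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 6))
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(size=250)  # includes an interaction
model = GradientBoostingRegressor().fit(X, y)

# two single-feature plots plus one pairwise-interaction plot
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1, (1, 2)])
```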

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).

Next, we investigated whether combining the RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer-only model (R2 = 0.67), both without and with a dataset identifier (R2 = 0.26 and 0.42, respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig. S9), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, as in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performance and reliability. In addition, it seems reasonable to assume that the two datasets are fundamentally different, since the panel dataset was obtained through blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R2 = 0.66 with style information vs. R2 = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and, with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9, Supplementary Tables S5 and S6). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, and the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.

Model validation

To test whether our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers are the best-represented style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data 1).

In the first set of experiments, we adjusted the concentrations of the compounds that constituted the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate), together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th-percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A). Compared to controls, the spiked beers showed significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

Figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials, and ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups.

A limited set of studies have previously tried, with varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding datasets that can train models to help close the gap between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research has gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This can be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily captured by linear model architectures. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best-performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al., who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g., bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensory attributes (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that, amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity- and SHAP-based methods, as we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho = 0.77, 0.72 and 0.68), while ethyl phenyl acetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho = 0.77 and 0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single-parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ considerably, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to the inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Although our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that capture external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current models’ ability to accurately predict very poorly appreciated products. Finally, while the models could readily be applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and the corresponding diversity in chemical composition and aroma (Supplementary Fig. S1).

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate the CO2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurement by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, and Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of a 2-heptanol (Sigma-Aldrich, H3003) internal standard solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FPD. N2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (flow rate, 35 cm/s; injection volume, 1000 µL; injection mode, split; Combi PAL autosampler, CTC Analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min, then raised to 80 °C at 5 °C/min, followed by a second ramp of 4 °C/min to 200 °C, held for 3 min, and a final ramp of 4 °C/min to 230 °C, held for 1 min. Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table S7).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low-polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, with a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The oven temperature was first held at 30 °C for 3 min, then raised to 80 °C at 7 °C/min, followed by a second ramp of 2 °C/min until 125 °C and a final ramp of 8 °C/min to a final temperature of 270 °C.

The mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script, as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table S8). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g), in combination with the NIST2017, FFNSC3 and Adams4 libraries, was used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correction for retention time shifts between samples run on different days (based on the alkane ladders), compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least squares analysis, after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Of the 284 target compounds analyzed, 167 were visually judged to have reliable elution profiles and were used for the final analysis.
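
As a minimal illustration of the final normalization step (the actual pipeline is an in-house R script), dividing each compound’s peak area by the internal-standard area could look as follows; the file and column names are hypothetical.

```python
# Sketch of internal-standard normalization of integrated peak areas.
# Column names are hypothetical; the study's pipeline is an in-house R script.
import pandas as pd

peaks = pd.read_csv("peak_areas.csv")  # rows: samples, columns: compounds
normalized = peaks.div(peaks["4-fluorobenzaldehyde"], axis=0)
```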

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific TM Gallery TM Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table  S7 and Supplementary Table  S9 .

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . 30 volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attribute intensity. The scoring sheet is included as Supplementary Data 3. Sensory assessments took place between 10 a.m. and 12 p.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples in different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table S8).
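
A compact sketch of the per-taster standardization, here in Python rather than the R used in the study; the long-format column names are assumptions for illustration.

```python
# Sketch: mean-center and scale scores within each taster, yielding the
# z-scores described above (long-format column names are assumed).
import pandas as pd

panel = pd.read_csv("panel_scores.csv")  # columns: taster, beer, attribute, score
panel["z"] = panel.groupby("taster")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0))
mean_z = panel.groupby(["beer", "attribute"])["z"].mean().unstack()
```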

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table S8) was used to collect 232,288 online reviews (mean = 922, min = 6, max = 5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms (for example, ‘floral’ and ‘flower’) was created, and such terms were collapsed into a single term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.
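
The TF-IDF step can be sketched with scikit-learn by treating the concatenated taste and aroma sentences of each beer as one document; the toy inputs below stand in for the extracted review sentences.

```python
# Sketch: TF-IDF enrichment scores for sensorial terms per beer.
# The documents below are toy stand-ins for the extracted review sentences.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = {
    "beer_a": "citrus hoppy bitter tropical",
    "beer_b": "sour lactic funky barnyard",
}
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs.values())   # rows: beers, columns: terms
print(vec.get_feature_names_out())
```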

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten models were trained: three linear regression-based models (linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso), and partial least squares regression (PLSR)); five decision tree models (AdaBoost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF), and XGBoost regressor (XGBR)); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R2) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
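
As an illustration of the tuning protocol, the sketch below wires a GBR into a five-fold cross-validated grid search scored by R2; the parameter grid and synthetic data are illustrative, not the study’s actual search space.

```python
# Sketch: five-fold cross-validated grid search with R^2 scoring for a GBR.
# The parameter grid is illustrative, not the study's actual search space.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(175, 20))                 # stand-in training set
y = 0.5 * X[:, 0] + rng.normal(size=175)

search = GridSearchCV(
    GradientBoostingRegressor(),
    {"n_estimators": [100, 500], "learning_rate": [0.01, 0.1], "max_depth": [2, 3]},
    cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```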

Model dissection

GBR was found to outperform the other methods, resulting in models with the highest average R2 values in both the trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74 , 75 .

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92 ) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
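
For reference, this test amounts to the following; the counts are illustrative, not the study’s results.

```python
# Sketch: two-sided binomial test on paired preference counts.
# The counts are illustrative, not the study's actual results.
from scipy.stats import binomtest

n_tasters, n_prefer_spiked = 20, 15
result = binomtest(n_prefer_spiked, n=n_tasters, p=0.5, alternative="two-sided")
print(round(result.pvalue, 4))
```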

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer scores data are under restricted access; they are not publicly available as they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, M. & Verstrepen, K. J. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. C. Flavor chemistry of beer. Part II: Flavor and threshold of 239 aroma volatiles. Master Brew. Assoc. Am. Tech. Q. 12 (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. Part I: Flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q. 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-alcoholic beer production – an overview. Pol. J. Chem. Technol. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. “A spoonful of sugar helps the medicine go down”: Bitter masking by sucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Ares, G. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. 39 , 272–281 (2019).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcohol-mediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Civille, G. V. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship, L.C. acknowledges financial support from KU Leuven (C16/17/006), F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen .

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Supplementary Information
  • Peer Review File
  • Description of Additional Supplementary Files
  • Supplementary Data 1–7
  • Reporting Summary
  • Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received: 30 October 2023

Accepted: 21 February 2024

Published: 26 March 2024

DOI: https://doi.org/10.1038/s41467-024-46346-0
