Quantitative Data Analysis: A Comprehensive Guide

By: Ofem Eteng Published: May 18, 2022


A healthcare giant successfully introduces the most effective drug dosage through rigorous statistical modeling, saving countless lives. A marketing team predicts consumer trends with uncanny accuracy, tailoring campaigns for maximum impact.

These trends and dosages are not just any numbers but are a result of meticulous quantitative data analysis. Quantitative data analysis offers a robust framework for understanding complex phenomena, evaluating hypotheses, and predicting future outcomes.

In this blog, we’ll walk through the concept of quantitative data analysis, the steps required, its advantages, and the methods and techniques that are used in this analysis. Read on!

What is Quantitative Data Analysis?

Quantitative data analysis is a systematic process of examining, interpreting, and drawing meaningful conclusions from numerical data. It involves the application of statistical methods, mathematical models, and computational techniques to understand patterns, relationships, and trends within datasets.

Quantitative data analysis methods typically work with algorithms, mathematical analysis tools, and software to gain insights from the data, answering questions such as how many, how often, and how much. Data for quantitative data analysis is usually collected from close-ended surveys, questionnaires, polls, etc. The data can also be obtained from sales figures, email click-through rates, number of website visitors, and percentage revenue increase. 

Quantitative Data Analysis vs Qualitative Data Analysis

When we talk about data, we tend to think of patterns, relationships, and connections within datasets; in short, of analyzing the data. Broadly, there are two types of data analysis: Quantitative Data Analysis and Qualitative Data Analysis.

Quantitative data analysis revolves around numerical data and statistics, and suits things that can be counted or measured. In contrast, qualitative data analysis deals with descriptive and subjective information, for things that can be observed but not measured.

Let us differentiate between Quantitative Data Analysis and Qualitative Data Analysis for a better understanding.

Data Preparation Steps for Quantitative Data Analysis

Quantitative data has to be gathered and cleaned before it can be analyzed. Below are the steps to prepare data for quantitative analysis:

  • Step 1: Data Collection

Before beginning the analysis process, you need data. Data can be collected through rigorous quantitative research, which includes methods such as interviews, focus groups, surveys, and questionnaires.

  • Step 2: Data Cleaning

Once the data is collected, begin the data cleaning process by scanning through the entire data for duplicates, errors, and omissions. Keep a close eye for outliers (data points that are significantly different from the majority of the dataset) because they can skew your analysis results if they are not removed.

This data-cleaning process ensures data accuracy, consistency and relevancy before analysis.
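The cleaning pass described in Step 2 can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the `raw` records and the `clean` helper are hypothetical, and the 1.5 x IQR rule used to flag outliers is one common convention among several.

```python
from statistics import quantiles

# Hypothetical survey records: (respondent_id, monthly_spend). None marks an omission.
raw = [(1, 42.0), (2, 38.5), (2, 38.5), (3, None), (4, 40.0),
       (5, 41.5), (6, 39.0), (7, 400.0), (8, 43.0)]

def clean(records):
    """Drop duplicate and incomplete records, then flag outliers with the 1.5 x IQR rule."""
    seen, kept = set(), []
    for rec in records:
        if rec in seen or rec[1] is None:
            continue  # skip exact duplicates and rows with a missing value
        seen.add(rec)
        kept.append(rec)

    values = [value for _, value in kept]
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the remaining values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [rec for rec in kept if not low <= rec[1] <= high]
    return kept, outliers

kept, outliers = clean(raw)
print(len(kept), outliers)
```

The sketch only flags outliers instead of deleting them. Whether a flagged value is a data-entry error or a genuine extreme observation is a judgment call that depends on the dataset.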

  • Step 3: Data Analysis and Interpretation

Now that you have collected and cleaned your data, it is time to carry out the quantitative analysis. There are two broad methods of quantitative data analysis, which are discussed in the next section.

However, if you have data from multiple sources, collecting and cleaning it can be a cumbersome task. This is where Hevo Data steps in. With Hevo, extracting, transforming, and loading data from source to destination becomes a seamless task, eliminating the need for manual coding. This not only saves valuable time but also enhances the overall efficiency of data analysis and visualization, empowering users to derive insights quickly and with precision.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Now that you are familiar with what quantitative data analysis is and how to prepare your data for analysis, the focus will shift to the purpose of this article, which is to describe the methods and techniques of quantitative data analysis.

Methods and Techniques of Quantitative Data Analysis

Broadly, quantitative data analysis employs two methods to extract meaningful insights from datasets. The first is descriptive statistics, which summarizes the essential features of a dataset using measures such as the mean, median, and standard deviation.

The second, inferential statistics, extrapolates from a sample dataset to make broader inferences and predictions about an entire population, using techniques such as hypothesis testing and regression analysis.

An in-depth explanation of both the methods is provided below:

  • Descriptive Statistics
  • Inferential Statistics

1) Descriptive Statistics

Descriptive statistics, as the name implies, are used to describe a dataset. They help you understand the details of your data by summarizing it and finding patterns within the specific data sample. They provide absolute numbers obtained from a sample but do not necessarily explain the rationale behind those numbers, and they are mostly used for analyzing single variables. The methods used in descriptive statistics include:

  • Mean: This calculates the numerical average of a set of values.
  • Median: This is used to get the midpoint of a set of values when the numbers are arranged in numerical order.
  • Mode: This is used to find the most commonly occurring value in a dataset.
  • Percentage: This is used to express how a value or group of respondents within the data relates to a larger group of respondents.
  • Frequency: This indicates the number of times a value is found.
  • Range: This is the spread between the highest and lowest values in a dataset.
  • Standard Deviation: This is used to indicate how dispersed a range of numbers is, meaning, it shows how close all the numbers are to the mean.
  • Skewness: It indicates how symmetrical a range of numbers is, showing if they cluster into a smooth bell curve shape in the middle of the graph or if they skew towards the left or right.
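To make these measures concrete, here is a small sketch that computes each of them with Python's standard library. The `data` values and the `describe` helper are hypothetical. The standard deviation is the population version, and skewness is computed with the simple moment-based formula; statistical packages often apply a sample-size correction, so their figures can differ slightly.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

# Hypothetical sample: number of support tickets received per day.
data = [2, 4, 4, 5, 7, 9, 4, 5]

def describe(values):
    """Return the descriptive statistics listed above for a list of numbers."""
    n = len(values)
    avg = mean(values)
    sd = pstdev(values)  # population standard deviation
    # Moment-based skewness: average cubed deviation divided by sd cubed.
    skew = sum((x - avg) ** 3 for x in values) / n / sd ** 3
    freq = Counter(values)  # how many times each value occurs
    return {
        "mean": avg,
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "std_dev": sd,
        "skewness": skew,
        "frequency": dict(freq),
        "percentage": {value: 100 * count / n for value, count in freq.items()},
    }

summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A positive skewness here reflects the few larger values (7 and 9) that pull the mean above the median.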

2) Inferential Statistics

In quantitative analysis, the expectation is to turn raw numbers into meaningful insight. Descriptive statistics explain the details of a specific dataset using numbers, but they do not explain the reasons behind those numbers; hence the need for further analysis using inferential statistics.

Inferential statistics aim to make predictions or highlight possible outcomes from the analyzed data obtained from descriptive statistics. They are used to generalize results and make predictions between groups, show relationships that exist between multiple variables, and are used for hypothesis testing that predicts changes or differences.

There are various statistical analysis methods used within inferential statistics; a few are discussed below.

  • Cross Tabulations: Cross tabulation or crosstab is used to show the relationship that exists between two variables and is often used to compare results by demographic groups. It uses a basic tabular form to draw inferences between different data sets and contains data that is mutually exclusive or has some connection with each other. Crosstabs help understand the nuances of a dataset and factors that may influence a data point.
  • Regression Analysis: Regression analysis estimates the relationship between a set of variables. It shows the correlation between a dependent variable (the variable or outcome you want to measure or predict) and any number of independent variables (factors that may impact the dependent variable). Therefore, the purpose of the regression analysis is to estimate how one or more variables might affect a dependent variable to identify trends and patterns to make predictions and forecast possible future trends. There are many types of regression analysis, and the model you choose will be determined by the type of data you have for the dependent variable. The types of regression analysis include linear regression, non-linear regression, binary logistic regression, etc.
  • Monte Carlo Simulation: Monte Carlo simulation, also known as the Monte Carlo method, is a computerized technique of generating models of possible outcomes and showing their probability distributions. It considers a range of possible outcomes and then tries to calculate how likely each outcome will occur. Data analysts use it to perform advanced risk analyses to help forecast future events and make decisions accordingly.
  • Analysis of Variance (ANOVA): This is used to test the extent to which two or more groups differ from each other. It compares the mean of various groups and allows the analysis of multiple groups.
  • Factor Analysis: A large number of variables can be reduced into a smaller number of factors using the factor analysis technique. It works on the principle that multiple separate observable variables correlate with each other because they are all associated with an underlying construct. It helps in reducing datasets with many variables into a smaller, more manageable set of factors.
  • Cohort Analysis: Cohort analysis can be defined as a subset of behavioral analytics that operates from data taken from a given dataset. Rather than looking at all users as one unit, cohort analysis breaks down data into related groups for analysis, where these groups or cohorts usually have common characteristics or similarities within a defined period.
  • MaxDiff Analysis: This is a quantitative data analysis method that is used to gauge customers’ preferences for purchase and what parameters rank higher than the others in the process. 
  • Cluster Analysis: Cluster analysis is a technique used to identify structures within a dataset. Cluster analysis aims to be able to sort different data points into groups that are internally similar and externally different; that is, data points within a cluster will look like each other and different from data points in other clusters.
  • Time Series Analysis: This is a statistical analytic technique used to identify trends and cycles over time. It is simply the measurement of the same variables at different times, like weekly and monthly email sign-ups, to uncover trends, seasonality, and cyclic patterns. By doing this, the data analyst can forecast how variables of interest may fluctuate in the future. 
  • SWOT analysis: This is a quantitative data analysis method that assigns numerical values to indicate strengths, weaknesses, opportunities, and threats of an organization, product, or service to show a clearer picture of competition to foster better business strategies.
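Of the techniques above, Monte Carlo simulation is perhaps the easiest to illustrate in code. The sketch below estimates how likely a project is to overrun a deadline. Everything in it is an assumption for illustration: the three task estimates, the 30-day deadline, and the choice of a triangular distribution for task durations.

```python
import random

def simulate_project(trials=10_000, deadline=30.0, seed=0):
    """Estimate the chance that total project time exceeds a deadline."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    # Hypothetical tasks: (optimistic, most likely, pessimistic) durations in days.
    tasks = [(5, 8, 14), (7, 10, 18), (4, 6, 11)]
    overruns = 0
    total_days = 0.0
    for _ in range(trials):
        # Draw one possible duration per task and add them up.
        total = sum(rng.triangular(low, high, likely) for low, likely, high in tasks)
        total_days += total
        if total > deadline:
            overruns += 1
    return overruns / trials, total_days / trials

overrun_probability, average_duration = simulate_project()
print(f"P(overrun) ~ {overrun_probability:.2%}, average ~ {average_duration:.1f} days")
```

Increasing `trials` narrows the uncertainty of the estimate, at the cost of a longer run.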

How to Choose the Right Method for your Analysis?

Choosing between descriptive statistics and inferential statistics can often be confusing. You should consider the following factors before choosing the right method for your quantitative data analysis:

1. Type of Data

The first consideration in data analysis is understanding the type of data you have. Different statistical methods have specific requirements based on these data types, and using the wrong method can render results meaningless. The choice of statistical method should align with the nature and distribution of your data to ensure meaningful and accurate analysis.

2. Your Research Questions

When deciding on statistical methods, it’s crucial to align them with your specific research questions and hypotheses. The nature of your questions will influence whether descriptive statistics alone, which reveal sample attributes, are sufficient or if you need both descriptive and inferential statistics to understand group differences or relationships between variables and make population inferences.

Pros and Cons of Quantitative Data Analysis

Pros

1. Objectivity and Generalizability:

  • Quantitative data analysis offers objective, numerical measurements, minimizing bias and personal interpretation.
  • Results can often be generalized to larger populations, making them applicable to broader contexts.

Example: A study using quantitative data analysis to measure student test scores can objectively compare performance across different schools and demographics, leading to generalizable insights about educational strategies.

2. Precision and Efficiency:

  • Statistical methods provide precise numerical results, allowing for accurate comparisons and predictions.
  • Large datasets can be analyzed efficiently with the help of computer software, saving time and resources.

Example: A marketing team can use quantitative data analysis to precisely track click-through rates and conversion rates on different ad campaigns, quickly identifying the most effective strategies for maximizing customer engagement.

3. Identification of Patterns and Relationships:

  • Statistical techniques reveal hidden patterns and relationships between variables that might not be apparent through observation alone.
  • This can lead to new insights and understanding of complex phenomena.

Example: A medical researcher can use quantitative analysis to pinpoint correlations between lifestyle factors and disease risk, aiding in the development of prevention strategies.

Cons

1. Limited Scope:

  • Quantitative analysis focuses on quantifiable aspects of a phenomenon, potentially overlooking important qualitative nuances, such as emotions, motivations, or cultural contexts.

Example: A survey measuring customer satisfaction with numerical ratings might miss key insights about the underlying reasons for their satisfaction or dissatisfaction, which could be better captured through open-ended feedback.

2. Oversimplification:

  • Reducing complex phenomena to numerical data can lead to oversimplification and a loss of richness in understanding.

Example: Analyzing employee productivity solely through quantitative metrics like hours worked or tasks completed might not account for factors like creativity, collaboration, or problem-solving skills, which are crucial for overall performance.

3. Potential for Misinterpretation:

  • Statistical results can be misinterpreted if not analyzed carefully and with appropriate expertise.
  • The choice of statistical methods and assumptions can significantly influence results.

This blog discusses the steps, methods, and techniques of quantitative data analysis. It also gives insights into the methods of data collection, the type of data one should work with, and the pros and cons of such analysis.


Ofem Eteng

Ofem is a freelance writer specializing in data-related topics, with expertise in translating complex concepts and a focus on data science, analytics, and emerging technologies.

Grad Coach

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA)  and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard, even for those of us who avoid numbers and math. In this post, we’ll break quantitative analysis down into simple, bite-sized chunks so you can approach your research with confidence.

Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works
  • The two “branches” of quantitative analysis
  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.
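Here’s a quick sketch of what that “conversion” might look like in Python. The `responses` list and the `encode` helper are made up for illustration, and the codes are assigned in order of first appearance purely as a convention.

```python
# Hypothetical survey responses to a "native language" question.
responses = ["English", "French", "English", "Spanish", "French"]

def encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for value in values:
        codes.setdefault(value, len(codes) + 1)  # new categories get the next number
    return [codes[value] for value in values], codes

encoded, codebook = encode(responses)
print(encoded)   # the responses as numbers
print(codebook)  # which number stands for which category
```

Keep in mind that these numbers are just labels. French being 2 doesn’t make it “twice” English, so calculating an average of the codes wouldn’t mean anything.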

This contrasts against qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here.

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups. For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables. For example, the relationship between weather temperature and voter turnout.
  • And thirdly, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis, which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers, it’s no surprise that it involves statistics. Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math whiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.

As I mentioned, quantitative analysis is powered by statistical analysis methods. There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics. In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives. I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample.

First up, population. In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample.

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake, whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample, while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…
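If you’d like to see the slice-versus-cake idea in action, here’s a rough Python sketch. The population is simulated (50,000 made-up annual mileage figures), so treat the numbers as purely illustrative.

```python
import random
from statistics import mean

rng = random.Random(42)  # seeded so the example is repeatable

# Hypothetical population: annual mileage for 50,000 car owners.
population = [rng.gauss(12_000, 3_000) for _ in range(50_000)]

# The sample: 500 owners picked at random, i.e. the slice of cake we can actually study.
sample = rng.sample(population, 500)

population_mean = mean(population)  # usually unknown in real research
sample_mean = mean(sample)          # what we can actually measure

print(f"Population mean: {population_mean:,.0f}")
print(f"Sample mean:     {sample_mean:,.0f}")
```

The two means land close together because the sample was drawn at random. A sample that isn’t representative of the population wouldn’t come with that guarantee.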

With that out the way, let’s take a closer look at each of these branches in more detail.

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample. Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample.

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions, they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common statistics used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set contains an odd number of values, then the median is the number right in the middle of the set. If the data set contains an even number of values, then the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • Standard deviation – this indicates how dispersed a range of numbers is, in other words, how close all the numbers are to the mean (the average). In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness – as the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?

Feeling a bit confused? Let’s look at a practical example using a small data set.

[Figure: descriptive statistics example data (data set on the left, descriptive statistics on the right)]

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode, there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation. A value of 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, which is quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important, even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population. In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly), allow you to connect the dots and make predictions about what you expect to see in the real world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female, but your sample is 80% male, you can’t make reliable inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post.

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-Tests. T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, is the gap between the two means bigger than you’d expect from chance alone?

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.
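To give you a feel for what’s happening under the hood, here’s a rough sketch of the blood pressure comparison in Python. The readings are invented, and the code uses Welch’s version of the t-test (which doesn’t assume the two groups have equal variance) as one reasonable choice. It stops at the t statistic. To turn that into a p-value you’d normally lean on a stats package, for example SciPy’s `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error of each group's mean.
    se_a, se_b = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df

# Hypothetical systolic blood pressure readings (mmHg).
medicated = [118, 122, 125, 119, 121, 117, 124, 120]
control = [130, 128, 135, 127, 132, 129, 131, 134]

t_stat, dof = welch_t(medicated, control)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

The further the t statistic sits from zero, the harder it is to put the difference between the two group means down to chance.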

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups, not just two groups. So it’s basically a t-test on steroids…
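Here’s a minimal sketch of the calculation behind a one-way ANOVA, using three made-up groups of test scores. It only computes the F statistic. As with the t-test, you’d typically rely on a stats package to get the p-value.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return the F statistic for a one-way ANOVA across two or more groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how spread out the values are around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical test scores for three teaching methods.
f_stat = one_way_anova_f([70, 72, 68], [80, 82, 78], [90, 92, 88])
print(f"F = {f_stat:.1f}")
```

A large F value means the group means differ a lot compared with the variation inside each group.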

Next, we have correlation analysis. This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same. For example, if the average temperature goes up, do average ice cream sales increase too? We’d expect some sort of relationship between these two variables intuitively, but correlation analysis allows us to measure that relationship scientifically.
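As a quick illustration, here’s how you might measure that relationship with Pearson’s correlation coefficient, which is one common way (not the only one) to quantify a correlation. The temperature and sales figures are invented.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient: -1 (perfect inverse) to +1 (perfect positive)."""
    mx, my = mean(xs), mean(ys)
    # How much the two variables move together.
    co_movement = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical daily temperature (°C) and ice cream sales (units).
temperature = [14, 17, 20, 23, 26, 29, 32]
sales = [110, 135, 160, 170, 210, 240, 250]

r = pearson_r(temperature, sales)
print(f"r = {r:.3f}")
```

A value close to +1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there’s little linear relationship.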

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further by modelling how much one variable changes when another changes, which lets you make predictions. Keep in mind that regression on its own doesn’t prove cause and effect: two variables might move together simply because of another force acting on both. Just because two variables correlate doesn’t necessarily mean that one causes the other.
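For a taste of what regression looks like in practice, here’s a bare-bones simple linear regression (ordinary least squares with one predictor). The yoga and body weight figures are invented, and real-world analyses usually involve more variables and checks on the model’s assumptions.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: weekly hours of yoga and body weight (kg).
hours = [0, 1, 2, 3, 4, 5]
weight = [82, 80, 79, 77, 76, 74]

slope, intercept = fit_line(hours, weight)
predicted = intercept + slope * 6  # predicted weight at 6 hours of yoga per week
print(f"slope = {slope:.2f} kg per hour, intercept = {intercept:.2f} kg")
print(f"predicted weight at 6 hours: {predicted:.1f} kg")
```

The slope tells you how much body weight is expected to change for each extra hour of yoga in this data set. It describes an association in the sample, not proof that yoga causes the change.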

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

[Figure: sample correlation, a scatter plot of weight against height]

As I mentioned, these are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods only work with normally distributed (parametric) data, while other methods are designed specifically for non-parametric data. And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors:

  • The type of quantitative data you have (specifically, level of measurement and the shape of the data). And,
  • Your research questions and hypotheses

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal (unordered categories), ordinal (ordered categories), interval (equal spacing but no true zero) and ratio (equal spacing with a true zero).

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless. So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types.

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data. Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.
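If you’d like to check the shape numerically, here’s a rough sketch that computes a simple moment-based skewness value. The helper name `sample_skewness` and both datasets are made up for illustration, and statistical packages often apply small-sample corrections that this version leaves out.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness: about 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    # Second and third central moments (dividing by n, no small-sample correction)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical datasets, invented for illustration
symmetric = [2, 4, 6, 8, 10]
right_skewed = [1, 1, 2, 2, 3, 20]   # one large value stretches the right tail

print(f"symmetric:    {sample_skewness(symmetric):.2f}")
print(f"right-skewed: {sample_skewness(right_skewed):.2f}")
```

A clearly positive or negative value is a hint that techniques designed for symmetrical data may not be a good fit.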

Factor 2 – Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well as your hypotheses – before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about analysing number-based data (which includes categorical and numerical data) using various statistical techniques.
  • The two main branches of statistics are descriptive statistics and inferential statistics. Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common descriptive statistical methods include mean (average), median, standard deviation and skewness.
  • Common inferential statistical methods include t-tests, ANOVA, correlation and regression analysis.
  • To choose the right statistical methods and techniques, you need to consider the type of data you’re working with, as well as your research questions and hypotheses.




The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Other interesting articles

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design, you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design, you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design, you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design, you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design, you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design, one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).

First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design

In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.


In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
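As a small illustration of probability sampling, the sketch below draws a simple random sample, one common probability method in which every member has an equal chance of selection. The sampling frame of 1,000 ID numbers and the sample size of 50 are hypothetical.

```python
import random

# Hypothetical sampling frame: ID numbers for 1,000 population members
population_ids = list(range(1, 1001))

# A fixed seed makes the draw reproducible, which helps when documenting methods
rng = random.Random(42)

# Draw 50 members without replacement; each ID is equally likely to be chosen
sample_ids = rng.sample(population_ids, k=50)

print(f"{len(sample_ids)} participants selected")
```

In practice the hard part is usually building a complete sampling frame, not the random draw itself.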

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias, like sampling bias, and helps ensure that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be at risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section.

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)

Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is necessary.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size: a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
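To show how these components combine, here is a minimal sketch of one common textbook approximation: the per-group sample size for comparing the means of two independent groups with a two-sided test. It assumes a standardized effect size (Cohen’s d), which already folds in the standard deviation, and it uses the normal distribution instead of the t distribution, so it can come out slightly lower than dedicated calculators. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided, two-group comparison."""
    # Critical value for the significance level (two-sided)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A medium standardized effect (d = 0.5) at alpha = 5% and power = 80%
print(n_per_group(0.5))   # 63 per group under this approximation
```

Smaller expected effects or higher power push the required sample size up quickly, which is why the expected effect size matters so much at the planning stage.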

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables.
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot.
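The first two of these can be sketched in a few lines of Python. The responses below are hypothetical answers to a 1–5 agreement item, and the row of `#` characters is only a stand-in for a proper bar chart.

```python
from collections import Counter

# Hypothetical responses to a 1-5 agreement item
responses = [4, 5, 3, 4, 2, 4, 5, 1, 3, 4]

freq = Counter(responses)   # maps each response value to its count
total = len(responses)

# One row per response value: count, share of all responses, and a text bar
for value in sorted(freq):
    count = freq[value]
    print(f"{value}: {count:2d}  ({count / total:.0%})  {'#' * count}")
```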

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode: the most popular response or value in the data set.
  • Median: the value in the exact middle of the data set when ordered from low to high.
  • Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
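As a quick illustration, Python’s built-in `statistics` module can compute all three measures. The scores are hypothetical and deliberately include one extreme value to show how the mean and median can diverge.

```python
import statistics

# Hypothetical test scores with one extreme value at the high end
scores = [55, 60, 60, 65, 70, 75, 140]

mean = statistics.mean(scores)       # pulled upward by the extreme value
median = statistics.median(scores)   # middle value of the ordered scores
mode = statistics.mode(scores)       # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

Here the single high score lifts the mean well above the median, which is one reason the median is often preferred for skewed data.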

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range: the highest value minus the lowest value of the data set.
  • Interquartile range: the range of the middle half of the data set.
  • Standard deviation: the average distance between each value in your data set and the mean.
  • Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
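The four measures can be computed with the standard library as sketched below. The values are hypothetical, `stdev` and `variance` are the sample versions (dividing by n − 1), and the quartiles use the module’s default method, so other software may report a slightly different interquartile range.

```python
import statistics

# Hypothetical data set
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)

# quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

sd = statistics.stdev(values)        # sample standard deviation
var = statistics.variance(values)    # sample variance (the square of sd)

print(f"range = {data_range}, IQR = {iqr}, SD = {sd:.2f}, variance = {var:.2f}")
```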

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)

After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate: a value that represents your best guess of the exact parameter.
  • An interval estimate: a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
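Here is a minimal sketch of that calculation for a sample mean: the point estimate plus or minus the z score times the standard error. The data are simulated (hypothetical GPA values), the helper name `mean_confidence_interval` is an assumption, and for small samples a t critical value is commonly used in place of z.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) bounds for the mean using a z critical value."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z score that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical sample: 100 simulated GPA values (seeded for reproducibility)
rng = random.Random(0)
gpas = [rng.gauss(3.0, 0.4) for _ in range(100)]

low, high = mean_confidence_interval(gpas)
print(f"point estimate: {statistics.fmean(gpas):.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, while a larger sample narrows it.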

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining results at least as extreme as yours if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.
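For the simple case, the slope and intercept can be estimated with ordinary least squares, as in the sketch below. The predictor and outcome values (hours studied and test scores) are hypothetical, and `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (predictor) and test score (outcome)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 72]

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.2f} * hours")
```

The slope estimates how much the outcome changes, on average, for each one-unit increase in the predictor.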

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.
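To make the ANOVA case concrete, the sketch below computes the F statistic for a one-way ANOVA by comparing the variation between group means with the variation within groups. The three groups are hypothetical, `one_way_anova_f` is an illustrative name, and turning F into a p value requires the F distribution, which is left to statistical software.

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum(sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical outcome scores for three treatment groups
f_value, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```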

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test.
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
  • If you expect a difference between groups in a specific direction, use a one-tailed test.
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test.

The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
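The test statistic for this check can be sketched as follows, using the standard formula t = r × √((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The values of r and n in the example are hypothetical, and the p value would then be looked up from the t distribution with statistical software.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    # Larger samples make the same correlation more convincing (larger |t|)
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: r = 0.5 observed in a sample of 27 pairs
t_value = correlation_t_statistic(0.5, 27)
print(f"t = {t_value:.2f} with {27 - 2} degrees of freedom")
```

Because n sits under the square root, even a weak correlation can reach statistical significance in a large enough sample.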

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
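The significance test of a correlation coefficient can be sketched as follows. It assumes the standard conversion t = r·√((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The data are hypothetical and are not the parental income and GPA values from the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_correlation(r, n):
    """t statistic for testing whether r differs from zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
t = t_for_correlation(r, len(x))
print(f"r = {r:.2f}, t = {t:.2f}, df = {len(x) - 2}")
```

The p value is then read from a t distribution with n − 2 degrees of freedom, for example from a t table or a statistics package. The Python standard library does not provide that distribution, so the lookup is left out of this sketch.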


The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study) You compare your p value of 0.0028 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study) You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper.

Example: Effect size (experimental study) With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study) To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
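For illustration, one common way to compute Cohen’s d divides the difference between two group means by their pooled standard deviation. This is a sketch under that assumption. The article does not say which variant produced the 0.72 above, and the groups below are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores for a treatment group and a control group
treatment = [6, 8, 10]
control = [4, 6, 8]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

By Cohen’s conventional benchmarks, d values around 0.2, 0.5 and 0.8 are read as small, medium and large effects.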

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power. However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.


  • Indian J Anaesth
  • v.60(9); 2016 Sep

Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

1 Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

A variable is a characteristic that varies from one individual member of a population to another.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[ 3 ] [ Figure 1 ].

Figure 1: Classification of variables

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), the data are called dichotomous (or binary). The various causes of re-intubation in an intensive care unit (upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment) are examples of categorical variables.

Ordinal variables have a clear ordering between the categories. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. For example, the system of centimetres is a ratio scale: there is a true zero point, and the value of 0 cm means a complete absence of length. A thyromental distance of 6 cm in an adult is therefore twice that of a child in whom it is 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[ 4 ] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. They are valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1 .

Table 1: Example of descriptive and inferential statistics

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. The mean may be influenced profoundly by extreme values. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. The extreme values are called outliers. The formula for the mean is

$$\bar{x} = \frac{\sum x}{n}$$

where x = each observation and n = number of observations.

Median[ 6 ] is defined as the middle of a distribution in ranked data (with half of the values in the sample above and half below the median value), while mode is the most frequently occurring value in a distribution.

Range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variable. If we rank the data and then group the observations into percentiles, we get better information about the pattern of spread of the variable. In percentiles, we rank the observations into 100 equal parts. We can then describe the 25th, 50th, 75th or any other percentile. The median is the 50th percentile. The interquartile range covers the middle 50% of the observations about the median (25th-75th percentile).

Variance[ 7 ] is a measure of how spread out the distribution is. It gives an indication of how closely the individual observations cluster about the mean value. The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where $\sigma^2$ is the population variance, $\bar{X}$ is the population mean, $X_i$ is the i-th element from the population and N is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the i-th element from the sample and n is the number of elements in the sample. The formula for the variance of a population has N as the denominator, whereas the sample formula uses n − 1. The expression n − 1 is known as the degrees of freedom and is one less than the number of observations in the sample: once the sample mean is fixed, each observation is free to vary except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where $\sigma$ is the population SD, $\bar{X}$ is the population mean, $X_i$ is the i-th element from the population and N is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where s is the sample SD, $\bar{x}$ is the sample mean, $x_i$ is the i-th element from the sample and n is the number of elements in the sample. An example for calculation of variance and SD is illustrated in Table 2 .

Table 2: Example of mean, variance, standard deviation
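The measures described above can be computed with Python's standard `statistics` module, as in the sketch below. The ICU length-of-stay values are hypothetical (they are not the data of Table 2) and include one outlier to show how strongly it pulls the mean compared with the median.

```python
import statistics

# Hypothetical ICU stays in days; 150 is an outlier
stay_days = [3, 4, 4, 5, 6, 7, 150]

mean = statistics.mean(stay_days)
median = statistics.median(stay_days)
mode = statistics.mode(stay_days)

# Quartiles (25th, 50th and 75th percentiles) and interquartile range
q1, q2, q3 = statistics.quantiles(stay_days, n=4)
iqr = q3 - q1

pop_var = statistics.pvariance(stay_days)  # denominator N
samp_var = statistics.variance(stay_days)  # denominator n - 1
samp_sd = statistics.stdev(stay_days)      # square root of the sample variance

print(f"mean={mean:.2f} median={median} mode={mode} IQR={iqr}")
print(f"population variance={pop_var:.2f} sample variance={samp_var:.2f} SD={samp_sd:.2f}")
```

The sample variance is always larger than the population variance computed from the same numbers, because it divides the same sum of squares by n − 1 instead of N.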

Normal distribution or Gaussian distribution

Most biological variables cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is symmetrical and bell-shaped. In a normal distribution curve, about 68% of the scores are within 1 SD of the mean, around 95% are within 2 SDs and about 99.7% are within 3 SDs [ Figure 2 ].

Figure 2: Normal distribution curve
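The proportions of a normal distribution lying within 1, 2 and 3 SDs of the mean can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and is meant only as an illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1
z = NormalDist(mu=0, sigma=1)

# Proportion of values within k SDs of the mean, for k = 1, 2, 3
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
```

The three proportions come out at roughly 0.68, 0.95 and 0.997.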

Skewed distribution

It is a distribution with an asymmetry of the values about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right of the figure, leading to a longer left tail. In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left of the figure, leading to a longer right tail.

Figure 3: Curves showing negatively skewed and positively skewed distribution

Inferential statistics

In inferential statistics, data from a sample are analysed to make inferences about the larger population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ (H0, ‘H-naught’ or ‘H-null’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

The alternative hypothesis (H1 or Ha) denotes that a relationship (difference) between the variables is expected to be true.[ 9 ]

The P value (or the calculated probability) is the probability of obtaining a result at least as extreme as the one observed by chance alone if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [ Table 3 ].

Table 3: P values with interpretation

If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding alpha error, beta error and sample size calculation and factors influencing them are dealt with in another section of this issue by Das S et al .[ 12 ]

Table 4: Illustration for null hypothesis

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

The two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

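  • To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (the one-sample t -test). The formula is:

$$t = \frac{\bar{X} - u}{SE}$$

where $\bar{X}$ = sample mean, u = population mean and SE = standard error of mean

  • To test if the population means estimated by two independent samples differ significantly (the unpaired t -test). The formula is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

where $\bar{X}_1 - \bar{X}_2$ is the difference between the means of the two groups and SE denotes the standard error of the difference.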

  • To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for paired t -test is:

$$t = \frac{\bar{d}}{SE}$$

where $\bar{d}$ is the mean difference and SE denotes the standard error of this difference.
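The t statistics described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the unpaired version uses the pooled-variance (equal variances) standard error, the function names are hypothetical, and the data are made up.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / standard error of the mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - population_mean) / se


def unpaired_t(group_1, group_2):
    """t for two independent samples, assuming equal variances (pooled SE)."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * statistics.variance(group_1)
                  + (n2 - 1) * statistics.variance(group_2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / se


def paired_t(before, after):
    """t for paired data: a one-sample test of the differences against 0."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)


# Hypothetical measurements
t_one = one_sample_t([4, 6, 8], population_mean=5)
t_unpaired = unpaired_t([6, 8, 10], [4, 6, 8])
t_paired = paired_t(before=[10, 12, 14], after=[12, 13, 17])

print(f"one-sample t = {t_one:.3f}")
print(f"unpaired t   = {t_unpaired:.3f}")
print(f"paired t     = {t_paired:.3f}")
```

Each t value is then compared with the t distribution for the relevant degrees of freedom (n − 1 for the one-sample and paired tests, n1 + n2 − 2 for the pooled unpaired test) to obtain a P value.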

The group variances can be compared using the F -test. The F statistic is the ratio of the two variances (var 1/var 2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group variability (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.

A simplified formula for the F statistic is:

$$F = \frac{MS_b}{MS_w}$$

where $MS_b$ is the mean squares between the groups and $MS_w$ is the mean squares within groups.
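As a worked sketch of the F statistic for a one-way ANOVA, the code below computes the between-group and within-group mean squares directly. The three groups are hypothetical and the function name is an assumption made for illustration.

```python
import statistics


def one_way_anova_f(groups):
    """Return F = MS_between / MS_within for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical measurements from three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with df = ({len(groups) - 1}, {sum(map(len, groups)) - len(groups)})")
```

A large F means the group means differ by more than the random variation within groups would suggest.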

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measures ANOVA is used when all members of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

Table 5: Analogue of parametric and non-parametric tests

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

This test examines the hypothesis about the median θ0 of a population. It tests the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked with a + sign. If the observed value is smaller than the reference value, it is marked with a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, the numbers of + signs and − signs are expected to be approximately equal.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.
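The sign test procedure above can be sketched as follows. The exact two-sided P value is computed from a Binomial(n, 0.5) distribution, which is the usual way to evaluate the sign counts but is an assumption here, since the text does not spell it out. The data are hypothetical.

```python
from math import comb


def sign_test(sample, theta0):
    """Return (number of + signs, number of - signs, two-sided exact P value)."""
    plus = sum(1 for x in sample if x > theta0)
    minus = sum(1 for x in sample if x < theta0)
    n = plus + minus  # values equal to theta0 are dropped
    k = min(plus, minus)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return plus, minus, p_value


# Hypothetical observations tested against a reference median of 10
sample = [12, 15, 11, 18, 20, 14, 10, 17, 19, 16]
plus, minus, p_value = sign_test(sample, theta0=10)
print(f"+ signs: {plus}, - signs: {minus}, P = {p_value:.4f}")
```

Only the signs enter the calculation, so the result would be the same if the observed values were replaced by any others on the same side of θ0.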

Wilcoxon's signed rank test

A major limitation of the sign test is that we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P (xi > yi). The null hypothesis states that P (xi > yi) = P (xi < yi) =1/2 while the alternative hypothesis states that P (xi > yi) ≠1/2.
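The pairwise comparison behind the Mann–Whitney test can be sketched by counting, over all (xi, yi) pairs, how often xi exceeds yi. Counting ties as one half is a common convention and an assumption here. The data are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U, estimated P(xi > yi)) from all pairwise comparisons."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5  # ties count as half a win
    return u, u / (len(x) * len(y))


# Hypothetical observations from two independent groups
x_group = [5, 7, 9]
y_group = [1, 6, 8]

u_stat, prob_x_greater = mann_whitney_u(x_group, y_group)
print(f"U = {u_stat}, estimated P(xi > yi) = {prob_x_greater:.3f}")
```

Under the null hypothesis the estimated probability should be close to 1/2. Statistical packages convert U into a P value using exact tables or a normal approximation.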

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.
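The KS statistic described above, the maximum absolute difference between the two empirical cumulative curves, can be computed directly. This sketch returns only the distance, not the P value, and the samples are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical samples
d_stat = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
print(f"D = {d_stat}, D for identical samples = {d_same}")
```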

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.
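A minimal sketch of the ranking and rank-sum steps is shown below. It assumes the standard form of the statistic, H = 12 / (N(N + 1)) · Σ(R_j² / n_j) − 3(N + 1), gives tied values their average rank, and omits the tie correction factor that statistical packages usually apply. The groups are hypothetical.

```python
def average_ranks(values):
    """Rank values in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


# Hypothetical independent samples
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.2f}")
```

For reasonably large groups, H is compared with a chi-square distribution with (number of groups − 1) degrees of freedom.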

Jonckheere test

In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[ 14 ]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. The Friedman test is an alternative for repeated measures ANOVAs which is used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]

Tests to analyse the categorical data

Chi-square test, Fisher's exact test and McNemar's test are used to analyse the categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from that of the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated by the sum of the squared difference between observed ( O ) and the expected ( E ) data (or the deviation, d ) divided by the expected data by the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability.

McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples. It is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal.

The Mantel-Haenszel Chi-square test is a multivariate test as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affects the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
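The Chi-square calculation can be illustrated for a contingency table as follows. Expected counts are derived from the row and column totals under the null hypothesis of no association, no Yates correction is applied, and the counts are hypothetical.

```python
def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over all cells of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical 2 x 2 table: rows are groups, columns are outcome yes/no
observed_counts = [[10, 20],
                   [20, 10]]

chi2 = chi_square_statistic(observed_counts)
df = (len(observed_counts) - 1) * (len(observed_counts[0]) - 1)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The statistic is compared with a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom to obtain the P value.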

SOFTWARE AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are currently available. The commonly used software systems are Statistical Package for the Social Sciences (SPSS – manufactured by IBM corporation), Statistical Analysis System (SAS – developed by SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman from R core team), Minitab (developed by Minitab Inc), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G-Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs – an interactive site that calculates power or sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.

It is important that a researcher knows the concepts of the basic statistical methods used for conduct of a research study. This will help to conduct an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge about the basic statistical methods will go a long way in improving the research designs and producing quality medical research which can be utilised for formulating the evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.

The ultimate guide to quantitative data analysis

Numbers help us make sense of the world. We collect quantitative data on our speed and distance as we drive, the number of hours we spend on our cell phones, and how much we save at the grocery store.

Our businesses run on numbers, too. We spend hours poring over key performance indicators (KPIs) like lead-to-client conversions, net profit margins, and bounce and churn rates.

But all of this quantitative data can feel overwhelming and confusing. Lists and spreadsheets of numbers don’t tell you much on their own—you have to conduct quantitative data analysis to understand them and make informed decisions.

This guide explains what quantitative data analysis is and why it’s important, and gives you a four-step process to conduct a quantitative data analysis, so you know exactly what’s happening in your business and what your users need.

Collect quantitative customer data with Hotjar

Use Hotjar’s tools to gather the customer insights you need to make quantitative data analysis a breeze.

What is quantitative data analysis? 

Quantitative data analysis is the process of analyzing and interpreting numerical data. It helps you make sense of information by identifying patterns, trends, and relationships between variables through mathematical calculations and statistical tests. 

With quantitative data analysis, you turn spreadsheets of individual data points into meaningful insights to drive informed decisions. Columns of numbers from an experiment or survey transform into useful insights—like which marketing campaign asset your average customer prefers or which website factors are most closely connected to your bounce rate. 

Without analytics, data is just noise. Analyzing data helps you make decisions which are informed and free from bias.

What quantitative data analysis is not

But as powerful as quantitative data analysis is, it’s not without its limitations. It only gives you the what, not the why. For example, it can tell you how many website visitors or conversions you have on an average day, but it can’t tell you why users visited your site or made a purchase.

For the why behind user behavior, you need qualitative data analysis, a process for making sense of qualitative research like open-ended survey responses, interview clips, or behavioral observations. By analyzing non-numerical data, you gain useful contextual insights to shape your strategy, product, and messaging.

Quantitative data analysis vs. qualitative data analysis 

Let’s take an even deeper dive into the differences between quantitative data analysis and qualitative data analysis to explore what they do and when you need them.

The bottom line: quantitative data analysis and qualitative data analysis are complementary processes. They work hand-in-hand to tell you what’s happening in your business and why.  

💡 Pro tip: easily toggle between quantitative and qualitative data analysis with Hotjar Funnels.

The Funnels tool helps you visualize quantitative metrics like drop-off and conversion rates in your sales or conversion funnel to understand when and where users leave your website. You can break down your data even further to compare conversion performance by user segment.

Spot a potential issue? A single click takes you to relevant session recordings, where you see user behaviors like mouse movements, scrolls, and clicks. With this qualitative data to provide context, you'll better understand what you need to optimize to streamline the user experience (UX) and increase conversions.

Hotjar Funnels lets you quickly explore the story behind the quantitative data

4 benefits of quantitative data analysis

There’s a reason product, web design, and marketing teams take time to analyze metrics: the process pays off big time. 

Four major benefits of quantitative data analysis include:

1. Make confident decisions 

With quantitative data analysis, you know you’ve got data-driven insights to back up your decisions. For example, if you launch a concept testing survey to gauge user reactions to a new logo design, and 92% of users rate it ‘very good’, you'll feel certain when you give the designer the green light.

Since you’re relying less on intuition and more on facts, you reduce the risks of making the wrong decision. (You’ll also find it way easier to get buy-in from team members and stakeholders for your next proposed project. 🙌)

2. Reduce costs

By crunching the numbers, you can spot opportunities to reduce spend. For example, if an ad campaign has lower-than-average click-through rates, you might decide to cut your losses and invest your budget elsewhere.

Or, by analyzing ecommerce metrics, like website traffic by source, you may find you’re getting very little return on investment from a certain social media channel and scale back spending in that area.

3. Personalize the user experience

Quantitative data analysis helps you map the customer journey, so you get a better sense of customers’ demographics, what page elements they interact with on your site, and where they drop off or convert.

These insights let you better personalize your website, product, or communication, so you can segment ads, emails, and website content for specific user personas or target groups.

4. Improve user satisfaction and delight

Quantitative data analysis lets you see where your website or product is doing well and where it falls short for your users. For example, you might see stellar results from KPIs like time on page, but conversion rates for that page are low.

These quantitative insights encourage you to dive deeper into qualitative data to see why that’s happening—looking for moments of confusion or frustration on session recordings, for example—so you can make adjustments and optimize your conversions by improving customer satisfaction and delight.

💡Pro tip: use Net Promoter Score® (NPS) surveys to capture quantifiable customer satisfaction data that’s easy for you to analyze and interpret. 

With an NPS tool like Hotjar, you can create an on-page survey to ask users how likely they are to recommend you to others on a scale from 0 to 10. (And for added context, you can ask follow-up questions about why customers selected the rating they did—rich qualitative data is always a bonus!)
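
As a rough sketch of how such 0 to 10 ratings become a single score, the Python snippet below uses the standard NPS convention, which is an assumption here because the guide does not spell it out: ratings of 9 or 10 count as promoters, 0 to 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. The function name and the responses are hypothetical.

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6); passives (7-8) only
    count toward the total number of responses."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses on the 0-10 scale
responses = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8]
nps = net_promoter_score(responses)
print(nps)
```

The result ranges from -100 (all detractors) to 100 (all promoters), so it can be tracked over time as one number.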

Hotjar graphs your quantitative NPS data to show changes over time

4 steps to effective quantitative data analysis 

Quantitative data analysis sounds way more intimidating than it actually is. Here’s how to make sense of your company’s numbers in just four steps:

1. Collect data

Before you can actually start the analysis process, you need data to analyze. This involves conducting quantitative research and collecting numerical data from various sources, including: 

  • Interviews or focus groups
  • Website analytics
  • Observations, from tools like heatmaps or session recordings
  • Questionnaires, like surveys or on-page feedback widgets

Just ensure the questions you ask in your surveys are close-ended questions—providing respondents with select choices to choose from instead of open-ended questions that allow for free responses.

Hotjar’s pricing plans survey template provides close-ended questions

2. Clean data

Once you’ve collected your data, it’s time to clean it up. Look through your results to find errors, duplicates, and omissions. Keep an eye out for outliers, too. Outliers are data points that differ significantly from the rest of the set—and they can skew your results if you don’t remove them.

By taking the time to clean your data set, you ensure your data is accurate, consistent, and relevant before it’s time to analyze. 
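
The cleaning pass described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order values are hypothetical, duplicates are defined as repeated ids, and outliers are flagged with the common 1.5 × IQR rule, which is one of several possible conventions.

```python
from statistics import quantiles

# Hypothetical (id, order_value) rows exported from a survey or analytics tool
raw = [(1, 42.0), (2, 38.5), (2, 38.5), (3, None), (4, 45.0), (5, 40.0),
       (6, 41.5), (7, 43.0), (8, 39.0), (9, 44.0), (10, 950.0)]

# 1. Drop duplicates (repeated id) and omissions (missing value)
seen, rows = set(), []
for row_id, value in raw:
    if row_id in seen or value is None:
        continue
    seen.add(row_id)
    rows.append((row_id, value))

# 2. Flag outliers: anything more than 1.5 * IQR beyond the quartiles
q1, _, q3 = quantiles([value for _, value in rows], n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in rows if not low <= r[1] <= high]
clean = [r for r in rows if low <= r[1] <= high]

print(len(clean), [r[0] for r in outliers])
```

Keeping the flagged rows in a separate list, rather than deleting them silently, makes it possible to check whether an extreme value is an entry error or a genuine observation.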

3. Analyze and interpret data

At this point, your data’s all cleaned up and ready for the main event. This step involves crunching the numbers to find patterns and trends via mathematical and statistical methods. 

Two main branches of quantitative data analysis exist: 

Descriptive analysis: methods to summarize or describe attributes of your data set. For example, you may calculate key stats like distribution and frequency, or mean, median, and mode.

Inferential analysis: methods that let you draw conclusions from statistics, like analyzing the relationship between variables or making predictions. These methods include t-tests, cross-tabulation, and factor analysis. (For more detailed explanations and how-tos, head to our guide on quantitative data analysis methods.)

Then, interpret your data to determine the best course of action. What does the data suggest you do? For example, if your analysis shows a strong correlation between email open rate and time sent, you may explore optimal send times for each user segment.
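
To illustrate what a strong correlation looks like numerically, here is a minimal Python sketch that computes the Pearson correlation coefficient between send hour and open rate. The function name and the data are hypothetical, and Pearson's r only captures linear relationships.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical campaign data: hour of day sent vs. share of emails opened
send_hour = [6, 8, 10, 12, 14, 16, 18]
open_rate = [0.31, 0.29, 0.26, 0.24, 0.21, 0.20, 0.17]

r = pearson_r(send_hour, open_rate)
print(round(r, 3))
```

A value close to -1, as in this made-up data, would suggest that later send times go with lower open rates; it would not by itself show that send time causes the difference.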

4. Visualize and share data

Once you’ve analyzed and interpreted your data, create easy-to-read, engaging data visualizations—like charts, graphs, and tables—to present your results to team members and stakeholders. Data visualizations highlight similarities and differences between data sets and show the relationships between variables.

Software can do this part for you. For example, the Hotjar Dashboard shows all of your key metrics in one place—and automatically creates bar graphs to show how your top pages’ performance compares. And with just one click, you can navigate to the Trends tool to analyze product metrics for different segments on a single chart. 

Hotjar Trends lets you compare metrics across segments

Discover rich user insights with quantitative data analysis

Conducting quantitative data analysis takes a little bit of time and know-how, but it’s much more manageable than you might think. 

By choosing the right methods and following clear steps, you gain insights into product performance and customer experience, and you’ll be well on your way to making better decisions and creating more customer satisfaction and loyalty.

FAQs about quantitative data analysis

What is quantitative data analysis?

Quantitative data analysis is the process of making sense of numerical data through mathematical calculations and statistical tests. It helps you identify patterns, relationships, and trends to make better decisions.

How is quantitative data analysis different from qualitative data analysis?

Quantitative and qualitative data analysis are both essential processes for making sense of quantitative and qualitative research.

Quantitative data analysis helps you summarize and interpret numerical results from close-ended questions to understand what is happening. Qualitative data analysis helps you summarize and interpret non-numerical results, like opinions or behavior, to understand why the numbers look like they do.

If you want to make strong data-driven decisions, you need both.

What are some benefits of quantitative data analysis?

Quantitative data analysis turns numbers into rich insights. Some benefits of this process include: 

  • Making more confident decisions
  • Identifying ways to cut costs
  • Personalizing the user experience
  • Improving customer satisfaction

What methods can I use to analyze quantitative data?

Quantitative data analysis has two branches: descriptive statistics and inferential statistics. 

Descriptive statistics provide a snapshot of the data’s features by calculating measures like mean, median, and mode. 

Inferential statistics, as the name implies, involve making inferences about what the data means. Dozens of methods exist for this branch of quantitative data analysis, but three commonly used techniques are:

  • T-tests
  • Cross tabulation
  • Factor analysis

Quantitative Analysis Guide: Which Statistical Software to Use?

NYU Data Services, NYU Libraries & Information Technology

Statistical Software Comparison

Software Access

SPSS

  • The first version of SPSS was developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull and released in 1968 as the Statistical Package for the Social Sciences.
  • In July 2009, IBM acquired SPSS.
  • Social sciences
  • Health sciences

Data Format and Compatibility

  • .sav file to save data
  • Optional syntax files (.sps)
  • Easily export .sav file from Qualtrics
  • Import Excel files (.xls, .xlsx), Text files (.csv, .txt, .dat), SAS (.sas7bdat), Stata (.dta)
  • Export Excel files (.xls, .xlsx), Text files (.csv, .dat), SAS (.sas7bdat), Stata (.dta)
  • SPSS Chart Types
  • Chart Builder: Drag and drop graphics
  • Easy and intuitive user interface; menus and dialog boxes
  • Similar feel to Excel
  • SEMs through SPSS Amos
  • Easily exclude data and handle missing data

Limitations

  • Absence of robust methods (e.g., Least Absolute Deviation Regression, Quantile Regression, ...)
  • Unable to perform complex many-to-many merges

Sample Data

JMP

  • Developed by SAS
  • Created in the 1980s by John Sall to take advantage of the graphical user interface introduced by Macintosh
  • Originally stood for 'John's Macintosh Program'
  • Five products: JMP, JMP Pro, JMP Clinical, JMP Genomics, JMP Graph Builder App
  • Engineering: Six Sigma, Quality Control, Scientific Research, Design of Experiments
  • Healthcare/Pharmaceutical
  • .jmp file to save data
  • Optional syntax files (.jsl)
  • Import Excel files (.xls, .xlsx), Text files (.csv, .txt, .dat), SAS (.sas7bdat), Stata (.dta), SPSS (.sav)
  • Export Excel files (.xls, .xlsx), Text files (.csv, .dat), SAS (.sas7bdat)
  • Gallery of JMP Graphs
  • Drag and Drop Graph Editor will try to guess what chart is correct for your data
  • Dynamic interface can be used to zoom and change view
  • Ability to lasso outliers on a graph and regraph without the outliers
  • Interactive Graphics
  • Scripting Language (JSL)
  • SAS, R and MATLAB can be executed using JSL
  • Interface for using R from within JMP, and an add-in for Excel
  • Great interface for easily managing output
  • Graphs and data tables are dynamically linked
  • Great set of online resources!
  • Absence of some robust methods (regression: 2SLS, LAD, Quantile)

Stata

  • Stata was first released in January 1985 as a regression and data management package with 44 commands, written by Bill Gould and Sean Becketti.
  • The name Stata is a syllabic abbreviation of the words statistics and data.
  • The graphical user interface (menus and dialog boxes) was released in 2003.
  • Political Science
  • Public Health
  • Data Science
  • Who uses Stata?

Data Format and Compatibility

  • .dta file to save dataset
  • .do syntax file, where commands can be written and saved
  • Import Excel files (.xls, .xlsx), Text files (.txt, .csv, .dat), SAS (.XPT), Other (.XML), and various ODBC data sources
  • Export Excel files (.xls, .xlsx), Text files (.txt, .csv, .dat), SAS (.XPT), Other (.XML), and various ODBC data sources
  • Newer versions of Stata can read datasets, commands, graphs, etc., from older versions, and in doing so, reproduce results
  • Older versions of Stata cannot read newer versions of Stata datasets, but newer versions can save in the format of older versions
  • Stata Graph Gallery
  • UCLA - Stata Graph Gallery
  • Syntax mainly used, but menus are an option as well
  • Some user written programs are available to install
  • Offers matrix programming in Mata
  • Works well with panel, survey, and time-series data
  • Data management
  • Can only hold one dataset in memory at a time
  • The specific Stata package (Stata/IC, Stata/SE, and Stata/MP) limits the size of usable datasets. One may have to sacrifice the number of variables for the number of observations, or vice versa, depending on the package.
  • Overall, graphs have limited flexibility. Stata schemes, however, provide some flexibility in changing the style of the graphs.
  • Sample Syntax

* First enter the data manually;
input str10 sex test1 test2
    "Male" 86 83
    "Male" 93 79
    "Male" 85 81
    "Male" 83 80
    "Male" 91 76
    "Female" 94 79
    "Female" 91 94
    "Female" 83 84
    "Female" 96 81
    "Female" 95 75
end

* Next run a paired t-test;
ttest test1 == test2

* Create a scatterplot;
twoway (scatter test2 test1 if sex == "Male") (scatter test2 test1 if sex == "Female"), legend(lab(1 "Male") lab(2 "Female"))

SAS

  • The development of SAS (Statistical Analysis System) was begun in 1966 by Anthony Barr of North Carolina State University, who was later joined by James Goodnight.
  • The National Institutes of Health funded this project with a goal of analyzing agricultural data to improve crop yields.
  • The first release of SAS was in 1972. In 2012, SAS held 36.2% of the market, making it the largest market-share holder in 'advanced analytics.'
  • Financial Services
  • Manufacturing
  • Health and Life Sciences
  • Available for Windows only
  • Import Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), SPSS (.sav), Stata (.dta), JMP (.jmp), Other (.xml)
  • Export Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), SPSS (.sav), Stata (.dta), JMP (.jmp), Other (.xml)
  • SAS Graphics Samples Output Gallery
  • Can be cumbersome at times to create perfect graphics with syntax
  • ODS Graphics Designer provides a more interactive interface
  • BASE SAS contains the data management facility, programming language, data analysis and reporting tools
  • SAS Libraries collect the SAS datasets you create
  • A multitude of additional components are available to complement Base SAS, including SAS/GRAPH, SAS/PH (Clinical Trial Analysis), SAS/ETS (Econometrics and Time Series), SAS/Insight (Data Mining), etc.
  • SAS Certification exams
  • Handles extremely large datasets
  • Predominantly used for data management and statistical procedures
  • SAS has two main types of code: DATA steps and PROC steps
  • With one procedure, test results, post estimation and plots can be produced
  • Size of datasets analyzed is only limited by the machine

Limitations 

  • Graphics can be cumbersome to manipulate
  • Since SAS is proprietary software, there may be an extensive lag time for the implementation of new methods
  • Documentation and books tend to be very technical and not necessarily new user friendly

* First enter the data manually;
data example;
  input sex $ test1 test2;
  datalines;
M 86 83
M 93 79
M 85 81
M 83 80
M 91 76
F 94 79
F 91 94
F 83 84
F 96 81
F 95 75
;
run;

* Next run a paired t-test;
proc ttest data = example;
  paired test1*test2;
run;

* Create a scatterplot;
proc sgplot data = example;
  scatter y = test1 x = test2 / group = sex;
run;

R

  • R first appeared in 1993 and was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
  • R is an implementation of the S programming language which was developed at Bell Labs.
  • It is named partly after its first authors and partly as a play on the name of S.
  • R is currently developed by the R Development Core Team. 
  • RStudio, an integrated development environment (IDE) was first released in 2011.
  • Companies Using R
  • Finance and Economics
  • Bioinformatics
  • Import Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), SPSS (.sav), Stata (.dta), SAS(.sas7bdat), Other (.xml, .json)
  • Export Excel files (.xlsx), Text files (.txt, .csv), SPSS (.sav), Stata (.dta), Other (.json)
  • ggplot2 package, grammar of graphics
  • Graphs available through ggplot2
  • The R Graph Gallery
  • Network analysis (igraph)
  • Flexible esthetics and options
  • Interactive graphics with Shiny
  • Many available packages to create field specific graphics
  • R is free and open source
  • Over 6000 user-contributed packages available through CRAN
  • Large online community
  • Network Analysis, Text Analysis, Data Mining, Web Scraping
  • Interacts with other software such as Python, Bioconductor, WinBUGS, JAGS, etc.
  • Scope of functions, flexible, versatile, etc.

Limitations

  • Large online help community but no 'formal' tech support
  • Have to have a good understanding of different data types before real ease of use begins
  • Many user written packages may be hard to sift through

# Manually enter the data into a dataframe
dataset <- data.frame(sex = c("Male", "Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female", "Female"),
                      test1 = c(86, 93, 85, 83, 91, 94, 91, 83, 96, 95),
                      test2 = c(83, 79, 81, 80, 76, 79, 94, 84, 81, 75))

# Now we will run a paired t-test
t.test(dataset$test1, dataset$test2, paired = TRUE)

# Last let's simply plot these two test variables
# factor() is needed for the colour lookup when sex is stored as character (the default since R 4.0)
plot(dataset$test1, dataset$test2, col = c("red", "blue")[factor(dataset$sex)])
legend("topright", fill = c("blue", "red"), c("Male", "Female"))

# Making the same graph using ggplot2
install.packages('ggplot2')
library(ggplot2)
mygraph <- ggplot(data = dataset, aes(x = test1, y = test2, color = sex))
mygraph + geom_point(size = 5) + ggtitle('Test1 versus Test2 Scores')

MATLAB

  • Cleve Moler of the University of New Mexico began development in the late 1970s.
  • Together with Jack Little, he cofounded MathWorks and released MATLAB (matrix laboratory) in 1984.
  • Education (linear algebra and numerical analysis)
  • Popular among scientists involved in image processing
  • Engineering
  • .m Syntax file
  • Import Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), Other (.xml, .json)
  • Export Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), Other (.xml, .json)
  • MATLAB Plot Gallery
  • Customizable but not point-and-click visualization
  • Optimized for data analysis, matrix manipulation in particular
  • Basic unit is a matrix
  • Vectorized operations are quick
  • Diverse set of available toolboxes (apps) [Statistics, Optimization, Image Processing, Signal Processing, Parallel Computing etc..]
  • Large online community (MATLAB Exchange)
  • Image processing
  • Vast number of pre-defined functions and implemented algorithms
  • Lacks implementation of some advanced statistical methods
  • Integrates easily with some languages such as C, but not others, such as Python
  • Limited GIS capabilities

sex = {'Male','Male','Male','Male','Male','Female','Female','Female','Female','Female'};
t1 = [86,93,85,83,91,94,91,83,96,95];
t2 = [83,79,81,80,76,79,94,84,81,75];

% paired t-test
[h,p,ci,stats] = ttest(t1,t2)

% independent samples t-test
sex = categorical(sex);
[h,p,ci,stats] = ttest2(t1(sex=='Male'),t1(sex=='Female'))

% scatterplots: all points, then points coloured by sex
plot(t1,t2,'o')
g = sex=='Male';
plot(t1(g),t2(g),'bx'); hold on; plot(t1(~g),t2(~g),'ro')

Software Features and Capabilities

*The primary interface is bolded in the case of multiple interface types available.

Learning Curve

Cartoon representation of learning difficulty of various quantitative software

  • Last Updated: Jan 22, 2024 2:07 PM
  • URL: https://guides.nyu.edu/quant

101 Guide to Quantitative Data Analysis [Methods + Techniques]

March 8, 2023

Quantitative data analysis comes with the challenge of analyzing large datasets consisting of numeric variables and statistics. Researchers often get overwhelmed by various techniques, methods, and data sources. 

At the same time, the importance of data collection and analysis drastically increases. It helps improve current products/services, identify the potential for a new product, understand target market psychology, and plan upcoming campaigns. 

We have compiled this in-depth guide to ensure you get over the complexities of quantitative data analysis. Let’s begin with the basic meaning and importance of quantitative data analysis. 

Quantitative data analysis meaning 

Quantitative data analysis evaluates quantifiable and structured data to obtain simplified results. Analysts aim to interpret and draw conclusions from numeric variables and statistics. The entire analytical process works on algorithms and data analytics software, helping gain valuable insights. Continuous values are broken into parts for easy understanding using various tools and software. Such data is extracted through surveys and questionnaires. 

However, data analytics software also helps extract quantitative data through email campaigns, websites, and social media. 

Qualitative Vs. Quantitative research: major differences 

Qualitative research aims at extracting valuable insights through non-numerical data like the psychology of customers. The research aims at obtaining solid results and confirming assumptions on general ideas. Furthermore, the collected data presentation remains descriptive instead of numerical-centric.

On the other hand, quantitative research focuses on numbers and statistics to identify gaps in current marketing and operational methods. It successfully answers questions like how many leads are converted in a specific email campaign. 

Collecting quantitative data includes surveys, polls, questionnaires, etc. It remains efficient in identifying trends and patterns in the collected data. However, the obtained results aren’t always accurate as there are chances of numerical errors. 

Note: both quantitative and qualitative data can be obtained with surveys. However, qualitative data collection focuses on asking open-ended questions.

In contrast, quantitative research focuses on close-ended ones. 

4-Step Process of Quantitative data analysis 

Now that we understand the meaning of quantitative data analysis, let’s proceed with four simple steps for conducting it. 

Step 1: Identifying your goals and objectives.

Start by analyzing current business problems and the ones you plan to address with your analysis. 

For example, your customer churn rate increased significantly in the last month.

Do you want to identify the reasons behind it? Being clear with the objective helps in collecting and analyzing relevant data. 

Step 2: Data collection 

Now you are clear with the issue you plan to address. Let’s identify and collect data from all relevant sources. 

For example, conduct a survey of multiple-choice questions (MCQs) covering the possible reasons behind the increase in the churn rate. Identify all relevant data sources and collect data for further analysis.

Step 3: Data cleaning 

As discussed earlier, quantitative data isn’t always accurate, as there is always a chance of errors. Because of this, the data goes through several stages of cleaning before analysis.

Firstly, analysts start with data validation to identify if the data was collected based on defined procedures. Secondly, large datasets require a lot of editing to identify errors like empty fields or wrongly inserted digits. 

Remember that the collected data consists of many duplications, unwanted data points, a lack of structure, and major gaps you must eliminate. Lastly, the collected data is presented in structured formats like tables for easy analysis. 
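
A minimal Python sketch of the validation and editing stages might look like the following. The survey rows, the field names, and the assumption of a 1 to 5 rating scale are all hypothetical; real validation rules depend on how the data was meant to be collected.

```python
# Hypothetical raw survey export: every value arrives as text
responses = [
    {"respondent": "r1", "age": "34", "rating": "4"},
    {"respondent": "r2", "age": "",   "rating": "5"},   # empty field
    {"respondent": "r3", "age": "29", "rating": "55"},  # wrongly inserted digit
    {"respondent": "r4", "age": "41", "rating": "2"},
]

def validate(row):
    """Return a list of problems found in one survey row."""
    problems = []
    for field in ("age", "rating"):
        if not row[field].strip():
            problems.append(f"{field} is empty")
        elif not row[field].isdigit():
            problems.append(f"{field} is not a number")
    # Assumed rule: ratings were collected on a 1-5 scale
    if row["rating"].isdigit() and not 1 <= int(row["rating"]) <= 5:
        problems.append("rating outside the 1-5 scale")
    return problems

report = {row["respondent"]: validate(row) for row in responses}
valid_rows = [row for row in responses if not report[row["respondent"]]]

# Present the rows that passed validation as a simple table
print(f"{'respondent':<12}{'age':>5}{'rating':>8}")
for row in valid_rows:
    print(f"{row['respondent']:<12}{row['age']:>5}{row['rating']:>8}")
```

The `report` dictionary keeps a record of why each row was rejected, which helps when deciding whether to correct a value at the source or exclude the row.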

Step 4: Data Analysis and interpretation 

Now, you are equipped with fairly accurate data sets required for analysis. Using tools and software, data analysts interpret the collected data to draw valuable conclusions. Many techniques are used for quantitative data analysis, including time-series analysis, regression analysis, etc. However, applying the techniques correctly plays a greater role than the type of technique used.

What are the methods for Quantitative research data collection? 

Now that we know the meaning of quantitative data collection, let’s look at some methods of collecting it:

Surveys: close-ended questions

Surveys remain one of the most common methods of quantitative research data collection. These surveys include super-specific questions where respondents answer yes/no or multiple-choice questions. Most companies are going with rating questions or checklist-type survey questions. 

Conducting interviews 

Interviews remain another commonly used method for quantitative data collection. The interviews remain structured with specific questions. Telephone interviews were generally preferred until the introduction of video interviews using Skype, Zoom, etc. Some researchers also go with face-to-face interviews to collect quality data. 

Analytical tools 

Manually collecting and analyzing large datasets is inconvenient. Many different analytical tools are available to collect, analyze, interpret, and visualize a large amount of data. For instance, tools like GrowthNirvana remain effective in efficient marketing analytics and provide relevant data without delays. All valuable insights required for business growth are easily extracted to make quicker decisions.

Document review: analyzing primary data

Researchers often analyze the available primary data like public records. The findings support the results generated from other quantitative data collection methods. 

Methods and techniques of quantitative data analysis 

The two common methods used for quantitative data analysis are descriptive and inferential statistics. Analysts use both methods to generate valuable insights. Here’s how: 

Descriptive statistics 

This method describes a dataset and provides an initial idea to help researchers identify potential trends or patterns. It generally focuses on analyzing single variables and explains the details of specific datasets. There are two ways of analyzing data using the descriptive statistics technique: numerical and graphical representation. Let’s start with the numerical method of quantitative data analysis.

  • Numerical Method 

The numerical method organizes and evaluates data arithmetically to obtain simpler answers to complex problems. Data is most easily described using measures of central tendency and dispersion.

The measures of central tendency like mean, median, and mode help identify the collected data’s central position. On the flip side, the measures of dispersion, like range, standard deviation, variance, etc., help understand the extent of data distribution concerning the central point. 
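
These measures are all available in Python's standard `statistics` module. The sketch below computes them for a small hypothetical sample; note that `variance` and `stdev` here are the sample versions, which divide by n - 1.

```python
import statistics as st

# Hypothetical sample: number of support tickets per day
tickets = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Measures of central tendency
mean = st.mean(tickets)
median = st.median(tickets)
mode = st.mode(tickets)

# Measures of dispersion
data_range = max(tickets) - min(tickets)
variance = st.variance(tickets)   # sample variance (divides by n - 1)
std_dev = st.stdev(tickets)       # sample standard deviation

print(mean, median, mode, data_range, round(variance, 2), round(std_dev, 2))
```

Comparing the mean with the median is a quick check for skew: when the two differ noticeably, a few extreme values are probably pulling the mean away from the center.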

  • Graphical method 

The graphical method provides a better understanding of data through visual representation. Evaluating large data sets becomes easier if presented using a bar chart, pie chart, histogram, boxplot, etc. 

Inferential statistics 

Descriptive analytics alone isn’t enough to draw valuable conclusions from the collected data. It only summarizes the dataset at hand, which is why inferential statistics matters. 

Inferential statistics uses a sample to draw conclusions and make predictions about the larger population the sample came from. It also helps establish relationships between different variables to make relevant predictions. The technique remains suitable for large datasets. 

Because evaluating an entire large dataset is often impractical, samples are taken to represent the whole. The descriptive summaries of those samples then serve as the basis for conclusions about the full set. 

Let’s now focus on some commonly used methods in inferential statistics: 

  • Regression analysis 

The method models the relationship between a dependent variable and one or more independent variables. It assesses the strength of that relationship and predicts future values to devise enhanced strategies. The most commonly used regression models are simple linear and multiple linear regression. 
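To make simple linear regression concrete, here is a minimal least-squares sketch in plain Python. The `fit_line` helper and the ad-spend and sales figures are hypothetical and exist only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly ad spend (independent) vs. sales (dependent).
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 60 + intercept  # prediction for a spend of 60
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_sales:.2f}")
```

Multiple linear regression follows the same idea with more than one independent variable and is usually done with a statistics library rather than by hand.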

  • Cross tabulations 

Cross tabulation, or the contingency table method, is one of the most used methods in market research. It arranges two or more variables in rows and columns so they can be analyzed together. The main goal of cross tabs is to show how a dependent variable changes across different subgroups. 
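A small contingency table can be built with nothing more than a counter. In the sketch below, the survey responses (age group and whether the respondent would buy again) and the `crosstab` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical survey responses: (age group, would buy again).
responses = [
    ("18-34", "yes"), ("18-34", "yes"), ("18-34", "no"),
    ("35-54", "yes"), ("35-54", "no"), ("35-54", "no"),
    ("55+", "no"),
]


def crosstab(pairs):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}
    return rows, cols, table


rows, cols, table = crosstab(responses)
print("group".ljust(8) + "".join(col.rjust(5) for col in cols))
for row in rows:
    print(row.ljust(8) + "".join(str(table[row][col]).rjust(5) for col in cols))
```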

  • Monte Carlo method 

The method uses repeated random sampling to simulate the range of possible outcomes of a specific scenario and how likely each one is. Reviewing that range helps in anticipating risks before taking action. Therefore, forecasting future risks based on changing scenarios improves decision-making. 
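One common way to weigh many possible outcomes is to simulate them with random draws. The sketch below is a toy example only: the demand distribution, margin range, fixed cost, and the `simulate_profit` helper are all assumptions made up for illustration.

```python
import random


def simulate_profit(n_trials=10_000, seed=42):
    """Simulate profit under uncertain demand and margin (hypothetical model)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    fixed_cost = 4000
    profits = []
    for _ in range(n_trials):
        demand = max(rng.gauss(1000, 200), 0)  # units sold, never negative
        unit_margin = rng.uniform(4.0, 6.0)    # profit per unit
        profits.append(demand * unit_margin - fixed_cost)
    return profits


profits = simulate_profit()
mean_profit = sum(profits) / len(profits)
loss_probability = sum(p < 0 for p in profits) / len(profits)
print(f"average profit: {mean_profit:.0f}")
print(f"estimated chance of a loss: {loss_probability:.1%}")
```

The share of simulated trials that end in a loss serves as a rough estimate of that risk, and it is only as good as the assumed input distributions.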

  • SWOT analysis 

A SWOT analysis identifies an organization’s strengths, weaknesses, opportunities, and threats. It takes into consideration internal and external factors to make better business plans. Companies often conduct a SWOT analysis to improve their products and services or while initiating a new project. 

  • Time Series analysis 

Time series or trend analysis evaluates data recorded at regular intervals over time. Instead of taking random samples, observations are recorded in sequence at set time points. Companies often use time series analysis for forecasting demand and supply. 
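A moving average is one of the simplest ways to expose a trend in data recorded over time. The sketch below assumes a made-up series of monthly demand figures, and the `moving_average` helper is hypothetical.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` over `window` points."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly demand figures.
monthly_demand = [100, 120, 90, 130, 150, 140, 170]
trend = moving_average(monthly_demand, window=3)
print(trend)
```

Smoothing like this shows the direction of the series; an actual forecast would need a model built on top of it.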

What are examples of Quantitative data?

Let’s now look at some examples of quantitative data: 

  • Total number of app downloads in a month 
  • The total number of people who loved a newly introduced product feature 
  • Number of users converted with a marketing campaign 
  • Total number of website conversions in six months 
  • Number of customers residing in a specific location 

What does a data analyst do?

A data analyst collects and interprets data to answer key questions like the potential of new product development, changes in customer purchase behavior, gaps in current marketing campaigns, etc. Many data analysts conduct exploratory analysis to identify trends and patterns during the data cleaning process. 

The job also includes communicating the findings to team members to create better strategies. One can thrive in this position with the right technical and leadership skills. In short, a data analyst’s two key roles are collecting data and using it for business growth. 

What is the typical process that a data analyst follows?

The process followed by a data analyst includes the following steps: 

  • Setting objectives 

Understanding the goals behind collecting data helps target the right areas for data collection. Next, identify the type of data you need to collect to conduct a specific analysis. 

  • Collecting data 

Once the goals and data requirements are clear, focus on collecting data from the identified sources using different methods of data collection. Some of the top sources include surveys, polls, interviews, etc. 

  • Data cleaning 

Data collection typically yields a large amount of raw data that requires cleaning before analysis. This step removes duplicate records and identifies omissions and other numerical errors. 
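The cleaning step can be sketched in a few lines of Python. The record layout, the field names, and the `clean_records` helper below are hypothetical; real pipelines usually rely on a data-frame library and handle many more cases.

```python
def clean_records(records, required_fields):
    """Drop exact duplicates and flag records with missing required fields."""
    seen = set()
    cleaned, flagged = [], []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # exact duplicate, skip it
        seen.add(key)
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            flagged.append((record, missing))  # omission to review or fix
        else:
            cleaned.append(record)
    return cleaned, flagged


# Hypothetical survey export with one duplicate and one omission.
raw = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Denver"},
    {"id": 3, "age": 41, "city": "Boston"},
]
cleaned, flagged = clean_records(raw, required_fields=["id", "age", "city"])
print(len(cleaned), "clean records;", len(flagged), "flagged for review")
```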

  • Data analytics and interpretation 

Now, data analysts focus on analyzing the collected data using various tools and software. Based on the analysis, they draw relevant conclusions to make the most of their findings. 

  • Data visualization 

Data analysts must also convey the acquired information to relevant team members. Data visualization refers to the graphical representation of collected data for easy understanding. The process also helps analysts identify hidden insights for detailed reporting. 

What techniques and tools do data analysts use?

Data analysts use various techniques, including regression analysis, the Monte Carlo method, factor analysis, cohort analysis, etc. The right blend of techniques to suit specific situations helps achieve the required results. 

The tools used for data analysis reduce the manual burden of analysts and improve overall decision-making. There are varied categories of tools in data analysis, including business intelligence, ETL tools, automation tools, data mining, data visualization tools, etc. 

Some popular choices for data analysis include Google Analytics, Growth Nirvana (marketing analytics), Improvado, Datapine, etc. 

What are the skills required to become a data analyst?

A data analyst requires the following skills to thrive in the field: 

  • Complete knowledge of Python programming
  • Mathematical and statistical understanding 
  • Data decluttering, organizing, and analyzing 
  • SQL knowledge
  • Problem-solving 
  • Logical reasoning and critical thinking 
  • Sharp communication skills
  • Collaboration 

What are some of the best data analytics courses?

Let’s now look at some of the best data analytics courses in 2022 to help you gain all relevant skills: 

  • Detailed learning: Data Analyst Nanodegree program by Udacity  

A 4-month program that helps people develop advanced programming skills to handle complex data-related issues. It covers everything from data exploration and analysis to visualization. 


  • Best data analytics course for beginners: Become A Data Analyst by LinkedIn. 

The course consists of beginner-friendly lessons suitable for people with no prior understanding of data analysis. Industry experts teach all sessions. Furthermore, you can easily complete the course within the free 30-day LinkedIn Learning period. 


  • Bite-sized learning: Data Analyst with R by Datacamp

The entire learning experience breaks down into multiple courses to help you keep up the pace. Industry experts curate about 19 different courses with a duration of 4 hours for every course. Furthermore, it also helps students gain practical exposure by working with real-life datasets. 


What does the future hold for data analytics?

Data analytics remains a constantly evolving area that will become increasingly important for businesses in the future. Extracting real-time insights will help enhance business operations for continuous growth. Furthermore, the growing number of business analytics tools makes it easier for businesses to analyze data and draw conclusions without complex coding knowledge. 

Key Takeaways 

  • Analysts must use quantitative (number-oriented) and qualitative (non-numeric) data to devise and modify business strategies. 
  • Surveys are used for obtaining both quantitative and qualitative research data. However, quantitative surveys include only close-ended questions. 
  • Data cleaning remains one of the most crucial steps of data analysis. It ensures the collected data doesn’t contain duplications, omissions, unwanted data points, etc. 
  • The top skills possessed by data analysts include Python programming, statistical knowledge, data decluttering, SQL knowledge, collaboration, and communication skills. 
  • The top two data analytics techniques, descriptive and inferential statistics, are complementary.


Research Method


Quantitative Data – Types, Methods and Examples


Definition:

Quantitative data refers to numerical data that can be measured or counted. This type of data is often used in scientific research and is typically collected through methods such as surveys and experiments, then examined using statistical analysis.

Quantitative Data Types

There are two main types of quantitative data: discrete and continuous.

  • Discrete data: Discrete data refers to numerical values that can only take on specific, distinct values. This type of data is typically represented as whole numbers and cannot be broken down into smaller units. Examples of discrete data include the number of students in a class, the number of cars in a parking lot, and the number of children in a family.
  • Continuous data: Continuous data refers to numerical values that can take on any value within a certain range or interval. This type of data is typically represented as decimal or fractional values and can be broken down into smaller units. Examples of continuous data include measurements of height, weight, temperature, and time.

Quantitative Data Collection Methods

There are several common methods for collecting quantitative data. Some of these methods include:

  • Surveys: Surveys involve asking a set of standardized questions to a large number of people. Surveys can be conducted in person, over the phone, via email or online, and can be used to collect data on a wide range of topics.
  • Experiments: Experiments involve manipulating one or more variables and observing the effects on a specific outcome. Experiments can be conducted in a controlled laboratory setting or in the real world.
  • Observational studies: Observational studies involve observing and collecting data on a specific phenomenon without intervening or manipulating any variables. Observational studies can be conducted in a natural setting or in a laboratory.
  • Secondary data analysis: Secondary data analysis involves using existing data that was collected for a different purpose to answer a new research question. This method can be cost-effective and efficient, but it is important to ensure that the data is appropriate for the research question being studied.
  • Physiological measures: Physiological measures involve collecting data on biological or physiological processes, such as heart rate, blood pressure, or brain activity.
  • Computerized tracking: Computerized tracking involves collecting data automatically from electronic sources, such as social media, online purchases, or website analytics.

Quantitative Data Analysis Methods

There are several methods for analyzing quantitative data, including:

  • Descriptive statistics: Descriptive statistics are used to summarize and describe the basic features of the data, such as the mean, median, mode, standard deviation, and range.
  • Inferential statistics: Inferential statistics are used to make generalizations about a population based on a sample of data. These methods include hypothesis testing, confidence intervals, and regression analysis.
  • Data visualization: Data visualization involves creating charts, graphs, and other visual representations of the data to help identify patterns and trends. Common types of data visualization include histograms, scatterplots, and bar charts.
  • Time series analysis: Time series analysis involves analyzing data that is collected over time to identify patterns and trends in the data.
  • Multivariate analysis: Multivariate analysis involves analyzing data with multiple variables to identify relationships between the variables.
  • Factor analysis: Factor analysis involves identifying underlying factors or dimensions that explain the variation in the data.
  • Cluster analysis: Cluster analysis involves identifying groups or clusters of observations that are similar to each other based on multiple variables.
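As one example of the inferential methods listed above, the sketch below estimates a confidence interval for a population mean from a sample. It relies on a normal approximation, which is an assumption that works best for larger samples; the `mean_confidence_interval` helper and the test scores are made up for illustration.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z value that leaves (1 - confidence) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical sample of test scores.
scores = [72, 85, 78, 90, 66, 81, 77, 88, 93, 70, 84, 79]
low, high = mean_confidence_interval(scores)
print(f"95% interval for the mean: {low:.1f} to {high:.1f}")
```

For small samples, a t distribution would normally replace the normal one, which widens the interval somewhat.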

Quantitative Data Formats

Quantitative data can be represented in different formats, depending on the nature of the data and the purpose of the analysis. Here are some common formats:

  • Tables: Tables are a common way to present quantitative data, particularly when the data involves multiple variables. Tables can be used to show the frequency or percentage of data in different categories or to display summary statistics.
  • Charts and graphs: Charts and graphs are useful for visualizing quantitative data and can be used to highlight patterns and trends in the data. Some common types of charts and graphs include line charts, bar charts, scatterplots, and pie charts.
  • Databases: Quantitative data can be stored in databases, which allow for easy sorting, filtering, and analysis of large amounts of data.
  • Spreadsheets: Spreadsheets can be used to organize and analyze quantitative data, particularly when the data is relatively small in size. Spreadsheets allow for calculations and data manipulation, as well as the creation of charts and graphs.
  • Statistical software: Statistical software, such as SPSS, R, and SAS, can be used to analyze quantitative data. These programs allow for more advanced statistical analyses and data modeling, as well as the creation of charts and graphs.

Quantitative Data Gathering Guide

Here is a basic guide for gathering quantitative data:

  • Define the research question: The first step in gathering quantitative data is to clearly define the research question. This will help determine the type of data to be collected, the sample size, and the methods of data analysis.
  • Choose the data collection method: Select the appropriate method for collecting data based on the research question and available resources. This could include surveys, experiments, observational studies, or other methods.
  • Determine the sample size: Determine the appropriate sample size for the research question. This will depend on the level of precision needed and the variability of the population being studied.
  • Develop the data collection instrument: Develop a questionnaire or survey instrument that will be used to collect the data. The instrument should be designed to gather the specific information needed to answer the research question.
  • Pilot test the data collection instrument: Before collecting data from the entire sample, pilot test the instrument on a small group to identify any potential problems or issues.
  • Collect the data: Collect the data from the selected sample using the chosen data collection method.
  • Clean and organize the data: Organize the data into a format that can be easily analyzed. This may involve checking for missing data, outliers, or errors.
  • Analyze the data: Analyze the data using appropriate statistical methods. This may involve descriptive statistics, inferential statistics, or other types of analysis.
  • Interpret the results: Interpret the results of the analysis in the context of the research question. Identify any patterns, trends, or relationships in the data and draw conclusions based on the findings.
  • Communicate the findings: Communicate the findings of the analysis in a clear and concise manner, using appropriate tables, graphs, and other visual aids as necessary. The results should be presented in a way that is accessible to the intended audience.
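The sample-size step in the guide above can be illustrated with the standard formula for estimating a proportion, n = z² · p · (1 − p) / e². The sketch assumes simple random sampling from a large population; the function name and default values are choices made for this example.

```python
import math


def sample_size_for_proportion(margin_of_error, z=1.96, expected_proportion=0.5):
    """Minimum sample size to estimate a proportion within a margin of error.

    z=1.96 corresponds to roughly 95% confidence; expected_proportion=0.5 is
    the most conservative choice because it maximizes p * (1 - p).
    """
    n = (z ** 2) * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    return math.ceil(n)  # round up: a fraction of a respondent is not possible


print(sample_size_for_proportion(0.05))  # margin of error of 5 percentage points
print(sample_size_for_proportion(0.03))  # tighter margin needs a larger sample
```

This mirrors the point made in the guide: the required sample grows as the desired precision increases and as the variability of the population (here, p close to 0.5) increases.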

Examples of Quantitative Data

Here are some examples of quantitative data:

  • Height of a person (measured in inches or centimeters)
  • Weight of a person (measured in pounds or kilograms)
  • Temperature (measured in Fahrenheit or Celsius)
  • Age of a person (measured in years)
  • Number of cars sold in a month
  • Amount of rainfall in a specific area (measured in inches or millimeters)
  • Number of hours worked in a week
  • GPA (grade point average) of a student
  • Sales figures for a product
  • Time taken to complete a task
  • Distance traveled (measured in miles or kilometers)
  • Speed of an object (measured in miles per hour or kilometers per hour)
  • Number of people attending an event
  • Price of a product (measured in dollars or other currency)
  • Blood pressure (measured in millimeters of mercury)
  • Amount of sugar in a food item (measured in grams)
  • Test scores (measured on a numerical scale)
  • Number of website visitors per day
  • Stock prices (measured in dollars)
  • Crime rates (measured by the number of crimes per 100,000 people)

Applications of Quantitative Data

Quantitative data has a wide range of applications across various fields, including:

  • Scientific research: Quantitative data is used extensively in scientific research to test hypotheses and draw conclusions. For example, in biology, researchers might use quantitative data to measure the growth rate of cells or the effectiveness of a drug treatment.
  • Business and economics: Quantitative data is used to analyze business and economic trends, forecast future performance, and make data-driven decisions. For example, a company might use quantitative data to analyze sales figures and customer demographics to determine which products are most popular among which segments of their customer base.
  • Education: Quantitative data is used in education to measure student performance, evaluate teaching methods, and identify areas where improvement is needed. For example, a teacher might use quantitative data to track the progress of their students over the course of a semester and adjust their teaching methods accordingly.
  • Public policy: Quantitative data is used in public policy to evaluate the effectiveness of policies and programs, identify areas where improvement is needed, and develop evidence-based solutions. For example, a government agency might use quantitative data to evaluate the impact of a social welfare program on poverty rates.
  • Healthcare: Quantitative data is used in healthcare to evaluate the effectiveness of medical treatments, track the spread of diseases, and identify risk factors for various health conditions. For example, a doctor might use quantitative data to monitor the blood pressure levels of their patients over time and adjust their treatment plan accordingly.

Purpose of Quantitative Data

The purpose of quantitative data is to provide a numerical representation of a phenomenon or observation. Quantitative data is used to measure and describe the characteristics of a population or sample, and to test hypotheses and draw conclusions based on statistical analysis. Some of the key purposes of quantitative data include:

  • Measuring and describing: Quantitative data is used to measure and describe the characteristics of a population or sample, such as age, income, or education level. This allows researchers to better understand the population they are studying.
  • Testing hypotheses: Quantitative data is often used to test hypotheses and theories by collecting numerical data and analyzing it using statistical methods. This can help researchers determine whether there is a statistically significant relationship between variables or whether there is support for a particular theory.
  • Making predictions: Quantitative data can be used to make predictions about future events or trends based on past data. This is often done through statistical modeling or time series analysis.
  • Evaluating programs and policies: Quantitative data is often used to evaluate the effectiveness of programs and policies. This can help policymakers and program managers identify areas where improvements can be made and make evidence-based decisions about future programs and policies.

When to use Quantitative Data

Quantitative data is appropriate to use when you want to collect and analyze numerical data that can be measured and analyzed using statistical methods. Here are some situations where quantitative data is typically used:

  • When you want to measure a characteristic or behavior: If you want to measure something like the height or weight of a population or the number of people who smoke, you would use quantitative data to collect this information.
  • When you want to compare groups: If you want to compare two or more groups, such as comparing the effectiveness of two different medical treatments, you would use quantitative data to collect and analyze the data.
  • When you want to test a hypothesis: If you have a hypothesis or theory that you want to test, you would use quantitative data to collect data that can be analyzed statistically to determine whether your hypothesis is supported by the data.
  • When you want to make predictions: If you want to make predictions about future trends or events, such as predicting sales for a new product, you would use quantitative data to collect and analyze data from past trends to make your prediction.
  • When you want to evaluate a program or policy: If you want to evaluate the effectiveness of a program or policy, you would use quantitative data to collect data about the program or policy and analyze it statistically to determine whether it has had the intended effect.
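For the group-comparison case mentioned above, a common starting point is a two-sample t statistic. The sketch below computes Welch's version, which does not assume equal variances. The measurements and the `welch_t` helper are hypothetical, and the code stops at the statistic itself: turning it into a p-value needs a t distribution, which Python's standard library does not provide.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    var_a = statistics.variance(group_a)  # sample variances (n - 1)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical outcome scores under two treatments.
treatment_a = [5.1, 4.9, 5.6, 5.8, 6.0]
treatment_b = [4.2, 4.8, 4.5, 5.0, 4.4]

t_stat = welch_t(treatment_a, treatment_b)
print(f"Welch t statistic: {t_stat:.2f}")
```

A larger absolute value of the statistic points to a bigger difference between the groups relative to the noise in the data.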

Characteristics of Quantitative Data

Quantitative data is characterized by several key features, including:

  • Numerical values: Quantitative data consists of numerical values that can be measured and counted. These values are often expressed in terms of units, such as dollars, centimeters, or kilograms.
  • Continuous or discrete: Quantitative data can be either continuous or discrete. Continuous data can take on any value within a certain range, while discrete data can only take on certain values.
  • Objective: Quantitative data is intended to be objective, meaning that it should not depend on personal biases or opinions. It is based on empirical evidence that can be measured and analyzed using statistical methods.
  • Large sample size: Quantitative data is often collected from a large sample in order to increase statistical power and make the results more likely to be representative of the population being studied.
  • Statistical analysis: Quantitative data is typically analyzed using statistical methods to determine patterns, relationships, and other characteristics of the data. This allows researchers to make more objective conclusions based on empirical evidence.
  • Precision: Quantitative data is often very precise, with measurements taken to multiple decimal places or significant figures. This precision allows for more accurate analysis and interpretation of the data.

Advantages of Quantitative Data

Some advantages of quantitative data are:

  • Objectivity: Quantitative data is usually objective because it is based on measurable and observable variables. This means that different people who collect the same data will generally get the same results.
  • Precision: Quantitative data provides precise measurements of variables. This means that it is easier to make comparisons and draw conclusions from quantitative data.
  • Replicability: Since quantitative data is based on objective measurements, it is often easier to replicate research studies using the same or similar data.
  • Generalizability: Quantitative data allows researchers to generalize findings to a larger population. This is because quantitative data is often collected using random sampling methods, which help to ensure that the data is representative of the population being studied.
  • Statistical analysis: Quantitative data can be analyzed using statistical methods, which allows researchers to test hypotheses and draw conclusions about the relationships between variables.
  • Efficiency: Quantitative data can often be collected quickly and efficiently using surveys or other standardized instruments, which makes it a cost-effective way to gather large amounts of data.

Limitations of Quantitative Data

Some limitations of quantitative data are as follows:

  • Limited context: Quantitative data often does not provide information about the context in which the data was collected. This can make it difficult to understand the meaning behind the numbers.
  • Limited depth: Quantitative data is often limited to predetermined variables and questions, which may not capture the complexity of the phenomenon being studied.
  • Difficulty in capturing qualitative aspects: Quantitative data is unable to capture the subjective experiences and qualitative aspects of human behavior, such as emotions, attitudes, and motivations.
  • Possibility of bias: The collection and interpretation of quantitative data can be influenced by biases, such as sampling bias, measurement bias, or researcher bias.
  • Simplification of complex phenomena: Quantitative data may oversimplify complex phenomena by reducing them to numerical measurements and statistical analyses.
  • Lack of flexibility: Quantitative data collection methods may not allow for changes or adaptations in the research process, which can limit the ability to respond to unexpected findings or new insights.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


A Fresh Way to Do Statistics


JASP is an open-source project supported by the University of Amsterdam.

JASP has an intuitive interface that was designed with the user in mind.

JASP offers standard analysis procedures in both their classical and Bayesian form.


Main Features

Your choice.

  • Frequentist analyses
  • Bayesian analyses

User-friendly Interface

  • Dynamic update of all results
  • Spreadsheet layout and an intuitive drag-and-drop interface
  • Progressive disclosure for increased understanding
  • Annotated output for communicating your results

Developed for publishing analyses

  • Integrated with The Open Science Framework (OSF)
  • Support for APA format (copy graphs and tables directly into Word)


Mission Statement

Our main goal is to help statistical practitioners reach maximally informative conclusions with a minimum of fuss. This is why we have developed JASP, a free cross-platform software program with a state-of-the-art graphical user interface.


Your First Steps Using JASP

Getting started.

The introductory video on the left should give you a good idea of how JASP works. You can consult our Getting Started Guide for more information.

How to Use JASP

Take a look at our How to Use JASP page for in-depth explanations of the different features in JASP.

Past Sponsors


University of Bern

Department of Psychology

www.psy.unibe.ch

APS Fund for Teaching and Public Understanding of Psychological Science

www.psychologicalscience.org

European Research Council

www.erc.europa.eu

Nederlandse Organisatie voor Wetenschappelijk Onderzoek

Center for Open Science

Scientific Advisory Board

  • Prof. James O. Berger, Duke University
  • Prof. Jon Forster, University of Southampton
  • Prof. Merlise A. Clyde, Duke University
  • Prof. Ioannis Ntzoufras, Athens University of Economics and Business
  • Prof. Jeffrey N. Rouder, University of California, Irvine
  • Prof. Zoltan Dienes, University of Sussex
  • Prof. Andy Field, University of Sussex
  • Prof. Han L. J. van der Maas, University of Amsterdam
  • Prof. Erin Buchanan, Missouri State University
  • Prof. Casper Albers, University of Groningen
  • Dr. Henrik Singmann, University College London
  • Dr. Felix Schönbrodt, LMU Munich


Data Analysis Techniques for Quantitative Study

  • First Online: 27 October 2022


  • Md. Mahsin

This chapter describes the types of data analysis techniques in quantitative research and sampling strategies suitable for quantitative studies, particularly probability sampling, to produce credible and trustworthy explanations of a phenomenon. Initially, it briefly describes the measurement levels of variables. It then provides some statistical analysis techniques for quantitative study with examples using tables and graphs, making it easier for the readers to understand the data presentation techniques in quantitative research. In summary, it will be a beneficial resource for those interested in using quantitative design for their data analysis.

  • Social research
  • Quantitative research
  • Sample size
  • Probability sampling
  • Data measurement
  • Data analysis


It is called the “Pearson correlation coefficient” in honour of Karl Pearson, a British mathematician who developed the method.


Author information

Authors and affiliations.

Department of Mathematics and Statistics, University of Calgary, 2500 University Drive NW, Calgary, Canada

Corresponding author

Correspondence to Md. Mahsin .

Editor information

Editors and affiliations.

Centre for Family and Child Studies, Research Institute of Humanities and Social Sciences, University of Sharjah, Sharjah, United Arab Emirates

M. Rezaul Islam

Department of Development Studies, University of Dhaka, Dhaka, Bangladesh

Niaz Ahmed Khan

Department of Social Work, School of Humanities, University of Johannesburg, Johannesburg, South Africa

Rajendra Baikady

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Mahsin, M. (2022). Data Analysis Techniques for Quantitative Study. In: Islam, M.R., Khan, N.A., Baikady, R. (eds) Principles of Social Research Methodology. Springer, Singapore. https://doi.org/10.1007/978-981-19-5441-2_16

DOI: https://doi.org/10.1007/978-981-19-5441-2_16

Published: 27 October 2022

Publisher Name: Springer, Singapore

Print ISBN: 978-981-19-5219-7

Online ISBN: 978-981-19-5441-2

eBook Packages: Social Sciences

  • Open access
  • Published: 09 April 2024

An easy to use tool for the analysis of subcellular mRNA transcript colocalisation in smFISH data

  • Calum Bentley-Abbot 1 , 2 , 3 ,
  • Rhiannon Heslop 1 , 2 ,
  • Chiara Pirillo 4 ,
  • Praveena Chandrasegaran 1 , 2 ,
  • Gail McConnell 5 ,
  • Ed Roberts 4 ,
  • Edward Hutchinson 3 &
  • Annette MacLeod 1 , 2  

Scientific Reports volume 14, Article number: 8348 (2024)

  • Cellular imaging
  • Infectious diseases
  • Transcriptomics

Single molecule fluorescence in situ hybridisation (smFISH) has become a valuable tool to investigate the mRNA expression of single cells. However, it requires a considerable amount of programming expertise to use currently available open-source analytical software packages to extract and analyse quantitative data about transcript expression. Here, we present FISHtoFigure, a new software tool developed specifically for the analysis of mRNA abundance and co-expression in QuPath-quantified, multi-labelled smFISH data. FISHtoFigure facilitates the automated spatial analysis of transcripts of interest, allowing users to analyse populations of cells positive for specific combinations of mRNA targets without the need for computational image analysis expertise. As a proof of concept and to demonstrate the capabilities of this new research tool, we have validated FISHtoFigure in multiple biological systems. We used FISHtoFigure to identify an upregulation in the expression of Cd4 by T-cells in the spleens of mice infected with influenza A virus, before analysing more complex data showing crosstalk between microglia and regulatory B-cells in the brains of mice infected with Trypanosoma brucei brucei . These analyses demonstrate the ease of analysing cell expression profiles using FISHtoFigure and the value of this new tool in the field of smFISH data analysis.

Introduction

Single molecule fluorescence in situ hybridisation (smFISH) technologies such as RNAScope enable the visualisation of single mRNA molecules within single cells. mRNA transcripts are detected by fluorescence microscopy, with each transcript appearing as a single ‘transcriptional spot’ 1 . Quantification of these signals enables the analysis of transcriptional activity at the single cell level within the spatial context of tissues 2 . However, the large microscopy datasets produced by smFISH experiments currently require custom code in order to conduct in-depth transcriptomic analyses. QuPath is a purpose-built platform for the analysis of large images such as those acquired during smFISH experiments, and is recommended by ACDBio-Techne, the developer of the RNAScope platform ( https://acdbio.com/qupath-rna-ish-analysis ), for image analysis 3 . QuPath has specific in-built tools for cell segmentation and fluorescent spot detection, which can be used to quantify transcriptional spots. Furthermore, the software incorporates a batch processing feature which facilitates automated analysis of data from multiple images 3 . Following quantification, QuPath can plot quantified data, such as transcripts per cell, as a histogram 3 . However, users wishing to conduct more complex analyses, such as differential expression analysis or co-expression analysis, must develop custom pipelines to parse raw QuPath output data, thus restricting such analysis to users with extensive programming experience.

Here, we present FISHtoFigure, a standalone, open-source software tool for the in-depth analysis of transcript abundance in QuPath-quantified smFISH data by users with all levels of programming experience. FISHtoFigure can concatenate the batch processed data from QuPath, enabling the analysis of large, multi-image datasets. Notably, FISHtoFigure allows users to conduct transcript abundance analysis for cells with specific, multi-transcript expression profiles. Additionally, FISHtoFigure enables users to conduct differential expression analysis between datasets, facilitating the targeted study of differential expression in specific cell types and populations. Thus, FISHtoFigure provides a means for all users to examine mRNA expression of multiple transcripts without the need for custom analysis pipelines.

Here, we demonstrate the use of FISHtoFigure in two biological scenarios. First, we used FISHtoFigure to analyse T-cell and B-cell populations in the spleens of influenza A virus (IAV) infected mice, hereafter referred to as the spleen dataset. Second, we demonstrate the capabilities of FISHtoFigure for the analysis of high-plex smFISH data collected from highly ramified, non-round cell types, using a dataset obtained in a recent experiment by our group investigating microglia in the brains of Trypanosoma brucei brucei infected mice, hereafter referred to as the brain dataset 4 .

Materials and methods

Specifications and data handling

FISHtoFigure is a Python-based analytical software tool designed to quantify cell expression profiles within smFISH data. Expression profile analysis is conducted in FISHtoFigure using the Pandas library 5. A two-branched strategy is used to isolate cellular and subcellular data into two new datasets. Graphical outputs are generated using a combination of the Matplotlib and Seaborn Python libraries 6,7. In addition to the graphical outputs, data from FISHtoFigure analysis are stored in CSV format for downstream statistical analysis. The statistical tests in this paper were performed with GraphPad PRISM; non-parametric tests were selected because the data were not normally distributed.

Animal work and sample collection

All spleen samples were collected from 9-week-old male C57BL/6 mice. One mouse was intranasally infected with IAV A/Puerto Rico/8/34 (PR8; H1N1). The infected mouse was weighed to monitor disease progression as per the ethical regulations of animal licence P72BA642F. It was culled 6 days post infection via intraperitoneal injection of pentobarbital and the whole spleen was harvested immediately. The spleen from an uninfected male C57BL/6 mouse, culled by the same method, was harvested to act as a naïve control.

All brain samples were collected from 6 to 8-week-old female C57BL/6 mice. Two mice were infected with T. b. brucei Antat 1.1E 4. Mice were culled 45 days post infection via rapid decapitation following isoflurane anaesthesia and whole brains were harvested immediately. Mice were monitored for disease severity using the following clinical scoring method: score (0), normal, healthy, and explorative mouse; score (1), slow, sluggish, or displaying a starry coat; score (2), reduced coordination of hind limbs and/or altered gait; score (3), flaccid paralysis of one hind limb. Mice with clinical scores higher than (3) were humanely killed as per the ethical regulations of animal licence PC8C3B25C. Whole brains from two uninfected female mice, culled by the same method, were harvested to act as naïve controls.

Mice were bred and housed at the Beatson Cancer Research Institute (spleen samples) and the University of Glasgow Centre for Virus Research (brain samples). All animal work was carried out in line with EU Directive 2010/63/EU and the Animals (Scientific Procedures) Act 1986, under project licences P72BA642F (spleen samples) and PC8C3B25C (brain samples), and was approved by the University of Glasgow Animal Welfare and Ethics Review Board. This work was carried out in accordance with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines, and the reporting in this study fulfils the ARRIVE recommendations.

All samples were fixed in 4% paraformaldehyde (PFA) at room temperature for 24 h and embedded in paraffin. From paraffin blocks, sections were cut on a microtome (Thermo Scientific) and mounted on glass slides for histology.

RNAScope data collection

Commercial RNAScope control slides containing mouse NIH 3T3 cells (Advanced Cell Diagnostics, US) were used as a positive control sample for RNAScope for all samples. RNAScope was used to visualise Cd79a and Cd4 transcripts in the spleens of naïve and IAV infected mice, and Cd79a, Cx3cr1, Il10 and Il10ra transcripts in the brains of naïve and T. b. brucei infected mice. Fresh probe mixes containing the RNAScope probes were prepared for each experiment (Table 1). A single probe per channel (C1–C4) was included in each experiment. RNAScope 4-plex positive controls (for Polr2a, Ppib, and Ubc) and negative controls (for the Bacillus subtilis bacterial Dapb gene) were also included (probe details are listed at https://acdbio.com/control-slides-and-control-probes-rnascope). Slides were imaged by confocal microscopy (Zeiss LSM 880, 63× objective for the spleen samples; Zeiss LSM 710, 63× objective for the brain samples) within 72 h of staining.

QuPath image analysis

Once the slides were imaged, QuPath 0.3.1 software was used to quantify the number of transcripts for each target probe 3. Negative control images were generated by probing spleen and brain tissue sections with the RNAScope 3-plex negative control probes. Fluorescence measurements for each detection channel in the negative control images were subtracted from final experimental images to determine background fluorescence thresholds. Subtracting background fluorescence in this way ensures that all detected fluorescent spots represent true signal from RNA transcripts. Using in-built QuPath annotation tools, one large region of interest (ROI) was specified on each image such that the whole image was encompassed in a single annotation. The “Cell Detection” function was used to determine the number and position of cells in each ROI based on the DAPI nuclear stain (under the assumption that one nucleus represented one cell), and the “Subcellular Detection” function was used to calculate the number of transcripts for each target. The accuracy of QuPath’s automated annotation features has been confirmed by comparison to annotation by expert human analysts 3. QuPath output data were then used as input data for FISHtoFigure. The analysis workflow was scripted to enable batch processing of all images within each dataset.

Results

We designed FISHtoFigure to facilitate the conversion of QuPath-quantified image data into transcript abundance analytics. We designed a simple graphical user interface and packaged the FISHtoFigure software as a standalone executable program, enabling analysis to be conducted with no interaction with the raw data or underlying Python code. Below we outline the steps involved in analysing smFISH data using FISHtoFigure, along with examples of analysis outcomes.

Step 1: Data harvesting and validation of quantified smFISH data

First, cellular boundaries and mRNA transcripts were identified using QuPath. QuPath output data were then processed using FISHtoFigure to produce differential transcript abundance analytics for different cell types or expression profiles 3. An overview of the FISHtoFigure pipeline is given in Fig. 1a and an example of a typical image for processing is given in Fig. 1bi.

Figure 1

FISHtoFigure pipeline. (a) (i) An smFISH image from the spleen dataset captured via confocal microscopy (Zeiss LSM 880). (ii) QuPath’s “Cell Detection” function was used to identify cell boundaries (shown in red). Cell nuclei identification is based on fluorescence above background in the channel associated with the nuclear stain (DAPI). (iii) An overview of FISHtoFigure processing of QuPath output data to generate transcript abundance outputs. (b) (i) An smFISH image from the brain dataset (scale bar = 20 µm), captured by confocal microscopy (Zeiss LSM 710) and (ii) processed using FISHtoFigure’s “Plot transcript distribution” function, where points represent cells and are sized based on the number of transcripts being expressed by that cell. (iii) An overlay of the captured smFISH image with the plot produced by FISHtoFigure demonstrates the accuracy of the pipeline.

As experiments usually require numerous individual images, we created a dedicated pre-processing tool to concatenate individual QuPath-quantified image datasets into a single file comprising data from any number of smFISH images, which can then be analysed by the main FISHtoFigure program. Due to the volume of information captured during imaging, the resulting quantified files are large and include metrics not relevant for transcript expression analysis (e.g. morphometric data such as cell area and nucleus and cytoplasm morphology). The desired information, i.e. the number of transcripts per cell and fluorescent intensity data, comprises only a small portion of the quantified data; it is extracted by FISHtoFigure from QuPath-quantified smFISH data files and assigned to the cells from which it originates. Metrics are then calculated for each cell, i.e. the number of transcripts and total fluorescent intensities for each mRNA target. In addition to transcriptome information, cell location information is extracted in the form of the cell centroid (based on nuclear staining identified using the “Cell Detection” function in QuPath). Further information on the flags FISHtoFigure uses to harvest data is provided in the “FISHtoFigure v1.0.1 User Guide” document in the tool’s GitHub repository, which is recommended reading for developers wishing to further develop the FISHtoFigure tool. The extracted data are then processed by FISHtoFigure using the “Plot Transcript Distribution” feature, which produces a scatter plot of points representing cell centroids, with points sized by number of mRNA transcripts within the cell and coloured by gene (Fig. 1bii). This allows users to visualise quantified data in a format analogous to the original smFISH image (Fig. 1bi) and, by overlaying this visualised data with the original smFISH image, directly validate the accuracy of data extraction by FISHtoFigure (Fig. 1biii).
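The concatenation and per-cell aggregation described above can be pictured with a small sketch. This is illustrative only and is not FISHtoFigure's code: it uses plain Python rather than Pandas, and the record fields (`kind`, `cell`, `target`, `intensity`) and image names are hypothetical stand-ins, not QuPath's export headers.

```python
from collections import defaultdict

def concatenate(image_tables):
    """Merge per-image row lists, tagging each row with its source image."""
    merged = []
    for image_name, rows in image_tables.items():
        for row in rows:
            merged.append({**row, "image": image_name})
    return merged

def per_cell_metrics(rows):
    """Count spots and sum spot intensity per (image, cell, target)."""
    counts = defaultdict(int)
    intensity = defaultdict(float)
    for row in rows:
        if row["kind"] != "spot":  # skip cell rows and any other objects
            continue
        key = (row["image"], row["cell"], row["target"])
        counts[key] += 1
        intensity[key] += row["intensity"]
    return dict(counts), dict(intensity)

# Hypothetical records standing in for two quantified images.
image_tables = {
    "img_01": [
        {"kind": "cell", "cell": "c1"},
        {"kind": "spot", "cell": "c1", "target": "Cd4", "intensity": 10.0},
        {"kind": "spot", "cell": "c1", "target": "Cd4", "intensity": 12.5},
    ],
    "img_02": [
        {"kind": "cell", "cell": "c1"},
        {"kind": "spot", "cell": "c1", "target": "Cd79a", "intensity": 8.0},
    ],
}
counts, intensity = per_cell_metrics(concatenate(image_tables))
```

Keying on the image name as well as the cell identifier keeps cells from different images apart after concatenation, which matters if identifiers repeat between images.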

Step 2: Differential target abundance analysis from smFISH data using the FISHtoFigure package

Following data extraction and assignment of transcript information to cells, differential transcript abundance analysis can be conducted using FISHtoFigure’s “Transcript abundance analysis” feature.

Using our spleen dataset, we investigated T-cell and B-cell populations in the spleens of mice, either uninfected or 6 days after infection with influenza A virus. These cells are highly abundant in spleen tissue and have a classically “round” cellular morphology. Their morphology enabled easy identification of cell boundaries in QuPath, and thus generated a straightforward dataset for software validation. Spleen sections from naïve and infected mice were stained using DAPI to identify cell nuclei and probed for Cd4 and Cd79a mRNA transcripts, enabling us to identify helper T-cells and B-cells, respectively 8,9. This analysis revealed a statistically significant upregulation of Cd4 expression within the T-cell population during infection (p < 0.01, Mann–Whitney test; Fig. 2a), while no statistically significant difference in Cd79a expression was observed. In addition to graphical outputs, FISHtoFigure analysis is saved in CSV format, meaning further downstream analysis can be performed using a wide variety of platforms (R, Microsoft Excel, etc.). Here, statistical analysis was performed on the FISHtoFigure output data using GraphPad PRISM.
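For readers unfamiliar with the Mann–Whitney test, the sketch below computes only its U statistic by direct pairwise comparison. It is not the procedure GraphPad PRISM runs, it does not produce a p-value, and the per-cell counts are invented for illustration rather than taken from the study.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; ties count as one half."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

# Invented transcript counts per cell, for illustration only.
naive_counts = [1, 2, 2, 3]
infected_counts = [2, 4, 5, 6]

u_infected = mann_whitney_u(infected_counts, naive_counts)
u_naive = mann_whitney_u(naive_counts, infected_counts)
```

The two U values always sum to the product of the sample sizes. A U close to that product for one group indicates that its values tend to exceed those of the other group.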

Figure 2

Analysis of spleen samples from naïve and influenza A virus infected mice. (a) FISHtoFigure quantification of Cd4 expression within Cd4+ cells in the naïve (n = 1228) and infected (n = 1486) spleen datasets, significantly upregulated during infection (Mann–Whitney test). Bars represent mean values across all cells in the dataset; each dot represents a Cd4+ cell in the dataset. (b) Total number of cells co-expressing Cd79a and Cd4, with threshold set to 1 or 2 transcripts. (c) (i) An smFISH image from a naïve spleen (scale bar = 20 µm), captured by confocal microscopy (Zeiss LSM 880). (ii) A zoomed view of the region shown in the red square in (i) shows a B-cell (red arrow) and T-cell (blue arrow) in close proximity. (iii) Cell boundaries identified using QuPath. (iv) FISHtoFigure’s “Plot Transcript Distribution” feature with a threshold of 1 transcript per cell, (v) with a threshold of 2 transcripts per cell. Setting a threshold of 2 transcripts per target per cell results in the B-cell being correctly categorised (red arrow). Note the removal of the ambiguous Cd79a+ Cd4+ cell (black arrow in (iv)), whose transcript counts for one of the two targets fall below the threshold.

We expanded the analysis capabilities of FISHtoFigure by adding the “Multi-target transcript abundance” feature, enabling the identification and quantification of cell types with multiplex transcriptomic profiles. This feature can be used to identify cells expressing any combination of mRNA transcripts. Here, we present an overview of the capabilities of this feature of FISHtoFigure and example analysis on the brain dataset. Comprehensive information on the underlying code and processing is available in the “FISHtoFigure v1.0.1 User Guide” document in the tool’s GitHub repository. We first used this feature to validate the cell type quantification of our pipeline. Cd4 and Cd79a are well established markers for helper T-cells and B-cells, respectively 8,9. Spleen resident B-cells do not express Cd4, and T-cells do not express Cd79a. Therefore, we used the double-positive Cd4+ Cd79a+ cell population as a metric for mis-categorisation of cells by FISHtoFigure. The naïve dataset comprised a total of 1229 cells, of which 273 contained transcripts of Cd4 or Cd79a. A total of 18 cells were labelled as Cd4+ Cd79a+, representing approximately 1.5% of all cells and 6.6% of transcript-expressing cells (Fig. 2b). The infected dataset comprised a total of 1487 cells, of which 882 contained transcripts from either marker. The infected dataset showed a higher presumed mis-categorisation rate, with 171 cells (11.5% of all cells and 19.4% of transcript-expressing cells) labelled as Cd4+ Cd79a+ (Fig. 2b).
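The presumed mis-categorisation rates quoted above follow directly from the reported cell counts. The short calculation below reproduces them; the counts are those stated in the text, while the helper name `rate` and the dictionary layout are just illustrative choices.

```python
def rate(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole

# Counts reported in the text for the spleen dataset (threshold of 1 transcript).
naive = {"cells": 1229, "expressing": 273, "double_positive": 18}
infected = {"cells": 1487, "expressing": 882, "double_positive": 171}

for name, d in (("naive", naive), ("infected", infected)):
    of_all = round(rate(d["double_positive"], d["cells"]), 1)
    of_expressing = round(rate(d["double_positive"], d["expressing"]), 1)
    print(name, of_all, of_expressing)
# naive 1.5 6.6
# infected 11.5 19.4
```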

Upon closer inspection of the quantified data, many of the apparently Cd4+ Cd79a+ cells contained a majority of transcripts from one gene, suggesting that mis-categorisation typically resulted from a small number of transcripts from the other gene. This could be plausibly explained if incorrect boundary approximations caused a small proportion of transcripts to be mis-allocated between cells lying very close together. For example, a B-cell in close proximity to a T-cell might appear to contain a single Cd4 transcript due to cell boundary approximation (Fig. 2c). In such cases, it is reasonable to assume the cell identity based on the majority transcript. To address this, we introduced a thresholding feature so that users can define the minimum number of transcripts from each mRNA target required for cells to be included in analysis. By setting this threshold at 2 transcripts from each mRNA, the population of Cd4+ Cd79a+ cells was eliminated in the naïve dataset and substantially reduced (67 cells, representing 4.5% of all cells and 7.5% of transcript-expressing cells) in the infected dataset (Fig. 2b). This was consistent with the model that Cd4+ Cd79a+ cells were artefacts, and showed that thresholding allowed this source of error to be controlled.
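A minimal sketch of the thresholding idea, under stated assumptions: the function names and the per-cell dictionary of transcript counts are hypothetical and are not taken from the FISHtoFigure source. A cell counts as positive for a target only if it holds at least the threshold number of transcripts for that target.

```python
def positive_targets(transcript_counts, threshold):
    """Targets with at least `threshold` transcripts in one cell."""
    return {t for t, n in transcript_counts.items() if n >= threshold}

def is_double_positive(transcript_counts, target_a, target_b, threshold):
    """True if the cell passes the threshold for both targets."""
    return {target_a, target_b} <= positive_targets(transcript_counts, threshold)

# Hypothetical B-cell that picked up one stray Cd4 spot from a neighbouring T-cell.
b_cell = {"Cd79a": 6, "Cd4": 1}

at_threshold_1 = is_double_positive(b_cell, "Cd4", "Cd79a", threshold=1)
at_threshold_2 = is_double_positive(b_cell, "Cd4", "Cd79a", threshold=2)
```

With a threshold of 1 this cell is labelled double-positive; with a threshold of 2 the single stray Cd4 transcript no longer counts and the cell is treated as Cd79a+ only.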

Having demonstrated that FISHtoFigure can quantify cell types based on mRNA expression profiles, we progressed to a more challenging system containing cells with less regular boundaries. To do this we examined sections of mouse brains, which contain highly ramified cell types, using data from a study exploring the interactions between regulatory B-cells (Bregs) and microglia during infection with T. b. brucei 4. The brain dataset comprised 17 images captured from brain sections of infected mice and 9 captured from uninfected (naïve) controls. Brain sections were stained using DAPI and probed for Cd79a (a B-cell marker), Cx3cr1 (a microglia marker), Il10 (an anti-inflammatory cytokine hypothesised to be involved in Breg–microglia interactions), and Il10ra (the receptor for Il10) 9,10. These images were quantified in QuPath and concatenated into two datasets comprising naïve control data and infected data.

Cx3cr1 is a well-established microglia marker 10. B-cells do not express Cx3cr1 and microglia do not express Cd79a. As with the spleen dataset, we quantified presumptively mis-categorised Cd79a+ Cx3cr1+ cells in order to examine to what extent the thresholding function could improve cell type quantification in data containing ramified cells. The naïve dataset contained 1631 cells, 914 of which contained transcripts; 30 cells were labelled Cd79a+ Cx3cr1+ double-positive (1.8% of all cells, 3.3% of transcript-expressing cells). The infected dataset contained 3907 cells, of which 3332 contained transcripts; 392 were labelled as Cd79a+ Cx3cr1+ double-positive (10% of all cells, 11.7% of transcript-expressing cells; Fig. 3a). Applying a threshold of 2 transcripts per mRNA per cell reduced the number of Cd79a+ Cx3cr1+ double-positive cells to 4 in the naïve dataset (0.2% of all cells, 0.4% of transcript-expressing cells), and 76 in the infected dataset (1.9% of all cells, 2.3% of transcript-expressing cells; Fig. 3a). This demonstrated that applying thresholds for transcript abundance could allow accurate allocation of transcripts to cells even for cells with complex and irregular boundaries.

Figure 3

Examples of FISHtoFigure outputs from analysis of the brain dataset. (a) Total number of cells expressing both Cd79a and Cx3cr1 with threshold of either 1 or 2 transcripts per mRNA per cell. (b) Fluorescence intensities for each marker in all cells which express a transcript from at least one marker, where each point represents a cell, showing that Cd79a and Il10 expression are significantly upregulated during infection (Mann–Whitney test). Box limits are defined by the interquartile range (IQR) with whiskers extending to the lowest/highest data point still within 1.5 IQR of the lower/upper quartile. This figure re-plots data originally collected in Ref. 4. (c) Percentage of cells expressing both Cd79a and Il10 (Bregs) and percentage expressing both Cx3cr1 and Il10ra (microglia) in smFISH images from the naïve and infected datasets, showing that both are significantly upregulated during infection (Mann–Whitney test). Bars represent mean values across all images in the dataset; each dot represents a single image from the dataset. Percentages were taken from each image in the naïve (n = 9) and infected (n = 17) datasets individually and statistical analysis was performed in GraphPad PRISM; a non-parametric test was selected due to lack of normality.

Finally, as a demonstration of the application of FISHtoFigure in an experimental workflow, we re-analysed data that we had collected as part of a study of Breg-microglia crosstalk in the brains of mice infected with T. b. brucei 4. Briefly, single cell and spatial transcriptomic analyses of infected mice revealed an upregulation of the anti-inflammatory cytokine Il10, along with Breg and microglia associated transcripts, in the brains of T. b. brucei infected mice. We tested the hypothesis that during infection Il10 expression governed crosstalk between Bregs and microglia in the brain, using smFISH and FISHtoFigure to investigate the localisation of transcripts. FISHtoFigure’s “Transcript abundance analysis” function revealed a statistically significant upregulation in Cd79a and Il10 expression in infected specimens compared to naïve controls, in agreement with results from single cell transcriptomics 4. Graphical outputs in the format produced by FISHtoFigure are presented in Fig. 3b (p < 0.01, Mann–Whitney test; data from Ref. 4). We then used a variety of analyses to validate that this crosstalk was driven by two specific cell types (Il10+ Bregs and Il10ra+ microglia), including visualising these cell types using smFISH. Here, we expand on this analysis by using FISHtoFigure to directly quantify the abundance of two different double-positive cell types in infected and naïve mice. We used FISHtoFigure’s “Multi-target transcript abundance” feature to analyse Cd79a+ Il10+ Breg populations and Cx3cr1+ Il10ra+ microglia populations in naïve and infected specimens. This analysis confirmed that during infection there was an upregulation of both Cd79a+ Il10+ Bregs (Fig. 3c; p < 0.02, Mann–Whitney test) and Cx3cr1+ Il10ra+ microglia (Fig. 3c; p < 0.01, Mann–Whitney test). In the context of the current paper, this demonstrates that FISHtoFigure can accurately quantify the abundance of specific cell types, including those with irregular boundaries, using multiplex expression profiles. Taken together, these findings demonstrate the value of FISHtoFigure in an experimental workflow.

Discussion

FISHtoFigure automates the extraction and processing of transcriptomic data from QuPath-quantified smFISH data, allowing users to analyse specific transcript expression profiles in datasets that would otherwise be very difficult to parse.

Our tool is capable of analysing smFISH data with any number of mRNA targets and quantifying cell types and expression profiles with high accuracy. Furthermore, the graphical user interface allows users to specify a positivity threshold for transcript abundance analysis (i.e., the number of transcripts required for a cell to be marked as positive, and by extension, be included in analysis), allowing users to directly control the sensitivity of the FISHtoFigure platform individually for each set of analyses.

Current analysis packages for smFISH data are largely focused on quantification and labelling of transcripts and only offer limited downstream transcript abundance analysis options, which require programming experience to implement. For example, FISH-quant provides a means to detect transcripts in smFISH data and assign individual transcripts to cells and subcellular compartments 11. FISH-quant offers downstream analysis options for mRNA expression, but this analysis is largely focused on the intracellular distributions of transcripts rather than the quantification of cells that express multiple mRNA targets. Another smFISH analysis tool, dotdotdot, outputs quantified cell and transcript data in a format interpretable using R or Python. However, programming experience is required to implement downstream analysis 12.

FISHtoFigure facilitates custom differential transcript and cell type abundance analyses without the need for custom code. By providing multi-transcript analysis tools in an intuitive package, FISHtoFigure significantly broadens the accessibility of smFISH analysis.

Comparison of FISHtoFigure’s spatial distribution plots with the confocal microscopy images from which they were derived demonstrates high levels of concordance between raw and quantified data (Fig. 1b). We demonstrate that FISHtoFigure can accurately determine cell profiles in different biological systems, and that the in-built thresholding feature can substantially reduce mis-categorisation caused by the close proximity of different cell types (Fig. 2). We advise users to adjust this threshold to obtain the best results from different experimental designs. In order to determine the most appropriate value for this threshold for each experiment presented here, combinations of transcripts which were not expected to be representative of any cell type in each dataset were analysed. This allowed us to adjust the false discovery rate: the threshold was adjusted until the number of cells expressing these “impossible” combinations of transcripts was below a predetermined percentage of the total number of cells in the dataset. We suggest that users conduct similar analyses of their own data in order to determine an appropriate threshold value for each experiment. In the spleen dataset, applying a threshold of 2 transcripts per mRNA target per cell completely removed all mis-categorised cells in the naïve dataset and reduced mis-categorisation by > 60% in the infected dataset (Fig. 2b). In the brain dataset, applying a threshold of 2 transcripts per mRNA target per cell reduced mis-categorisation of cell types by > 80% in both the naïve and infected datasets (Fig. 3a).
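The threshold selection procedure described above can be sketched as a simple search. This is an illustration under stated assumptions, not the authors' code: the function names, the `max_threshold` cap, the example cell counts and the 15% target are all hypothetical.

```python
def impossible_rate(cells, combo, threshold):
    """Fraction of all cells positive for every target in `combo`."""
    hits = sum(all(c.get(t, 0) >= threshold for t in combo) for c in cells)
    return hits / len(cells)

def choose_threshold(cells, combo, max_rate, max_threshold=10):
    """Smallest threshold whose impossible-combination rate is below max_rate."""
    for threshold in range(1, max_threshold + 1):
        if impossible_rate(cells, combo, threshold) < max_rate:
            return threshold
    return None  # no threshold in the searched range met the target

# Hypothetical per-cell transcript counts; Cd4 + Cd79a is the "impossible" pair.
cells = [
    {"Cd4": 5}, {"Cd79a": 4}, {"Cd4": 3, "Cd79a": 1}, {"Cd4": 1, "Cd79a": 7},
    {"Cd4": 2, "Cd79a": 2}, {"Cd4": 6}, {"Cd79a": 3}, {},
    {"Cd4": 4}, {"Cd79a": 9},
]
chosen = choose_threshold(cells, ("Cd4", "Cd79a"), max_rate=0.15)
```

In this toy dataset three of ten cells are double-positive at a threshold of 1 and one of ten at a threshold of 2, so the search stops at 2 for a 15% target. A stricter target would push the threshold higher, at the cost of excluding more genuinely low-expressing cells.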

In the brain dataset, FISHtoFigure enabled rapid analysis of smFISH data which would otherwise require considerable time investment and programming experience. FISHtoFigure analysis reveals a statistically significant (p < 0.01, Mann–Whitney test) upregulation in expression of Cd79a and Il10 during infection (Fig. 3b). The ability to analyse and plot cellular information for specific cell types with multiplex transcriptional profiles allowed us to identify the upregulation of Cd79a + Il10 + Bregs and Cx3cr1 + Il10ra + microglia in infected specimens compared with controls, a difference which would otherwise require custom code to assess (Fig. 3c).
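For readers who want to run a comparable test themselves, a Mann–Whitney comparison of two groups might look like the sketch below. It assumes SciPy is installed and uses `scipy.stats.mannwhitneyu`; the counts are invented for illustration and are not the data behind Fig. 3.

```python
from scipy.stats import mannwhitneyu

# Hypothetical transcript counts per sample (invented values, not the paper's data).
naive = [3, 5, 2, 4, 6, 1]
infected = [12, 9, 15, 11, 8, 14]

# Two-sided test: is one group shifted relative to the other?
result = mannwhitneyu(naive, infected, alternative="two-sided")
print(f"U = {result.statistic}, p = {result.pvalue:.4f}")
```

Because the test is rank-based, it makes no assumption that the counts are normally distributed, which suits small, skewed count data.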

Considerations and limitations

The labelling of transcripts is the first step of our quantification pipeline and poor sensitivity or specificity at this step will have a compounding effect on the accuracy of cell type analysis performed by FISHtoFigure. Many smFISH methods are available, and in this study we used the RNAScope assay, a well-established and widely-used method 1. The RNAScope assay incorporates various features to ensure accurate and reproducible labelling of transcripts, such as a pair-wise probe design in which probes will not fluoresce unless adjacent probes also bind, significantly reducing the signal from non-specific binding 1. Additionally, the RNAScope assay includes a set of positive control probes specific to the species the tissue samples are taken from. These probes target common housekeeping genes present in all cell types in the sample. Because the abundance levels of these target genes are well characterised (and by extension fluorophores will bind to these targets at known levels), processing samples with these probes prior to the final experiment provides users with a means to check that fluorophores used to visualise target probes are binding correctly and are of approximately equal brightness. Finally, the assay includes a set of negative control probes which target genes that are not expressed by any cell in the sample, and therefore any fluorescence observed in images of samples processed with these probes can be treated as background signal (generally arising from fluorophore remaining in the tissue after the wash steps). The maximum fluorescence for each fluorophore in these images can then be used to set minimum detection thresholds in experimental data (i.e. any fluorescent signal below this threshold is treated as background and removed). Using this approach, we can be confident that quantified fluorescent spots in experimental data represent true signal from transcripts.
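The negative-control thresholding step can be illustrated with a short sketch. The channel names and intensity values below are hypothetical, and in practice this step is performed during image quantification rather than in FISHtoFigure itself.

```python
# Hypothetical spot intensities measured in negative-control images, per fluorophore.
negative_control = {
    "channel_1": [12, 30, 18, 25],
    "channel_2": [8, 15, 11, 9],
}

# The maximum background intensity per channel becomes the minimum detection threshold.
detection_threshold = {
    channel: max(values) for channel, values in negative_control.items()
}

# Hypothetical spots detected in an experimental image: (channel, intensity).
spots = [("channel_1", 28), ("channel_1", 140), ("channel_2", 16), ("channel_2", 12)]

# Anything below the channel's threshold is treated as background and dropped.
kept = [(ch, value) for ch, value in spots if value >= detection_threshold[ch]]
print(detection_threshold)
print(kept)
```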

The variety of smFISH methods available means that users may wish to analyse different formats of input data. FISHtoFigure was designed for the analysis of QuPath output files, but we have intentionally built FISHtoFigure as a modular tool, separating the data harvesting step (in which data is pulled from the QuPath output) from the analysis steps. As a result, the data harvesting section can be adjusted easily, without interfering with any downstream analysis steps. At present, modifying FISHtoFigure to work with data quantified using another platform (e.g. CellProfiler) simply requires changes to the specific flags which the program uses to identify information in the quantified data file. Information on how to make these changes to the tool is available in the “FISHtoFigure v1.0.1 User Guide” document in the FISHtoFigure GitHub repository.

Regarding identifying cell boundaries, QuPath has the capacity to quantify cell boundaries based on nuclear staining, or via a fluorescent membrane marker. Here, cell nuclei were identified via fluorescent DAPI staining, and cell boundaries were approximated by applying a set radius based on tissue cell type composition to each identified nucleus using the “Cell Detection” function in QuPath. Though we demonstrate that this can allow the accurate quantification of cells, even for cell types with irregular boundaries, further improvements in the determination of cell boundaries, and by extension cell expression profiles, could likely be achieved through adjustments in sample preparation. Though the threshold function included within FISHtoFigure can be used to eliminate the majority of cell type misclassification events, the use of a membrane marker would further improve cell type quantification. We advise the inclusion of a membrane marker if users find that they cannot sufficiently eliminate misclassification events using the inbuilt thresholding function.

The problem of balancing accessibility for non-specialist users and analytical scope is an important consideration in the development of software tools. Here, we present FISHtoFigure, an analytical platform for QuPath-quantified smFISH data capable of analysing specific cell types and multiplex transcriptomic profiles and of generating a variety of differential transcript abundance analytics for cells expressing a user-specified combination of mRNA transcripts. In the interest of accessibility for users with all levels of computational image analysis experience, we have created a simple graphical user interface and packaged FISHtoFigure as an executable program, thus allowing transcript expression analysis without interaction with raw quantified image data or custom analysis scripts. FISHtoFigure can therefore expand the in-house analysis capabilities of many research groups investigating transcriptomics via smFISH.

Data availability

All code involved in the production of the FISHtoFigure package and all analysis presented here is available on GitHub: https://github.com/Calum-Bentley-Abbot/FISHtoFigure.git . Data are available under the terms of the MIT open access licence ( https://opensource.org/license/mit/ ).

Wang, F. et al. RNAscope: A novel in situ RNA analysis platform for formalin-fixed, paraffin-embedded tissues. J. Mol. Diagn. JMD 14 , 22–29 (2012).

Marx, V. Method of the year: Spatially resolved transcriptomics. Nat. Methods 18 , 9–14 (2021).

Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7 , 16878 (2017).

Quintana, J. F. et al. Single cell and spatial transcriptomic analyses reveal microglia-plasma cell crosstalk in the brain during Trypanosoma brucei infection. Nat. Commun. 13 , 5752 (2022).

Reback, J. et al. pandas-dev/pandas: Pandas 1.4.3. (2022) https://doi.org/10.5281/ZENODO.3509134 .

Caswell, T. A. et al. matplotlib/matplotlib: REL: v3.4.3. (2021) https://doi.org/10.5281/ZENODO.5194481 .

Waskom, M. seaborn: Statistical data visualization. J. Open Source Softw. 6 , 3021 (2021).

Luckheeram, R. V., Zhou, R., Verma, A. D. & Xia, B. CD4+T cells: Differentiation and functions. Clin. Dev. Immunol. 2012 , 925135 (2012).

Mason, D. Y. et al. CD79a: A novel marker for B-cell neoplasms in routinely processed tissue samples. Blood 86 , 1453–1459 (1995).

Wolf, Y., Yona, S., Kim, K.-W. & Jung, S. Microglia, seen from the CX3CR1 angle. Front. Cell. Neurosci. 7 , 26 (2013).

Imbert, A. et al. FISH-quant v2: A scalable and modular tool for smFISH image analysis. RNA 28 , 786–795 (2022).

Maynard, K. R. et al. dotdotdot: An automated approach to quantify multiplex single molecule fluorescent in situ hybridization (smFISH) images in complex tissues. Nucleic Acids Res. 48 , e66 (2020).

Acknowledgements

We thank Dr. Juan F. Quintana for generating and allowing us to use the mouse brain dataset analysed in this paper. The work to produce this dataset was funded by a Sir Henry Wellcome postdoctoral fellowship (221640/Z/20/Z to J.F.Q.) and a Wellcome Trust ISSF Catalyst grant awarded to J.F.Q. (204820/Z/16/Z to JFQ). We also thank Colin Loney for his assistance in the acquisition of the images forming the spleen dataset used during validation. We also thank Ruaridh Wilson for providing feedback from the perspective of a computational scientist during the writing of this manuscript.

CBA is funded by a Wellcome Trust Four-Year PhD Studentship in Basic Science [226861/Z/23/Z]. RH is funded by a Wellcome Trust Four-Year Studentship in Basic Science [227095/Z/23/Z]. CP and ER are funded by Cancer Research UK [A_BICR_1920_Roberts] awarded to ER. AML is funded by a Wellcome Trust Senior Research fellowship [209511/Z/17/Z]. PC is funded by a Wellcome Trust Senior Research fellowship [209511/Z/17/Z] awarded to AML. GM is supported by the UK Medical Research Council [MR/K015583/1], Biotechnology & Biological Sciences Research Council [BB/P02565X/1, BBT011602], and the Leverhulme Trust. E.H. is funded by a Transition Support Award from the UK Medical Research Council [MR/V035789/1].

Author information

Authors and affiliations.

Wellcome Centre for Integrative Parasitology (WCIP), University of Glasgow, Glasgow, UK

Calum Bentley-Abbot, Rhiannon Heslop, Praveena Chandrasegaran & Annette MacLeod

School of Biodiversity, One Health, Veterinary Medicine (SBOHVM), College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK

MRC-University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, UK

Calum Bentley-Abbot & Edward Hutchinson

Beatson Institute for Cancer Research, Glasgow, UK

Chiara Pirillo & Ed Roberts

Department of Physics, University of Strathclyde, Glasgow, UK

Gail McConnell

Contributions

C.B.A. was responsible for the conceptualisation, experimentation and software produced in the study, as well as the writing and editing of the manuscript. R.H. was responsible for conceptualisation and experimentation. P.C., C.P. and E.R. were responsible for experimentation. E.R., E.H., G.M., and A.M.L. were responsible for the supervision of this work and for the writing and editing of the manuscript. All authors contributed to the editing of the final manuscript draft for publication.

Corresponding author

Correspondence to Calum Bentley-Abbot .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Bentley-Abbot, C., Heslop, R., Pirillo, C. et al. An easy to use tool for the analysis of subcellular mRNA transcript colocalisation in smFISH data. Sci Rep 14 , 8348 (2024). https://doi.org/10.1038/s41598-024-58641-3

Received : 14 September 2023

Accepted : 01 April 2024

Published : 09 April 2024

DOI : https://doi.org/10.1038/s41598-024-58641-3

The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organisations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organise and summarise the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarise your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Frequently asked questions about statistics

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).

Example: Experimental research design

First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design

In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalise your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
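The difference between the two approaches can be shown with a small sketch. The population list and sample sizes below are invented for illustration; `random.Random(42)` is seeded only so the example is repeatable.

```python
import random

# Hypothetical sampling frame: an ID for every member of the population.
population = [f"student_{i:03d}" for i in range(500)]

# Probability sampling: a simple random sample, where every member
# has the same chance of being selected.
rng = random.Random(42)
probability_sample = rng.sample(population, k=50)

# Non-probability (convenience) sampling: take whoever is easiest to reach,
# here simply the first 50 people on the list.
convenience_sample = population[:50]

print(probability_sample[:5])
print(convenience_sample[:5])
```

If the order of the list is related to the outcome (for example, if it is sorted by enrolment date), the convenience sample will inherit that bias, while the random sample will not.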

In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and makes it more likely that your sample is typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalising your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)

Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, at least 30 units per subgroup are necessary.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardised indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
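To show how these components combine, here is a minimal sketch of one common textbook formula: the normal-approximation sample size for comparing the means of two independent groups with a two-sided test. The function name is hypothetical, and the population standard deviation is folded into the standardised effect size (Cohen's d). Online calculators may use slightly different formulas.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two groups, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group(0.5))  # medium expected effect
print(n_per_group(0.8))  # large expected effect
```

Smaller expected effects, stricter significance levels, and higher power all increase the required sample size.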

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organising data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualising the relationship between two variables using a scatter plot .

By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
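As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The scores are invented, with one deliberately extreme value to show how skew affects the mean.

```python
from statistics import mean, median, multimode

# Hypothetical test scores; 98 is an outlier that skews the distribution.
scores = [62, 65, 65, 70, 71, 74, 98]

print("mode:", multimode(scores))      # most frequent value(s), returned as a list
print("median:", median(scores))       # middle value when sorted
print("mean:", round(mean(scores), 2)) # sum of values divided by their number
```

Here the single high score pulls the mean above the median, which is why the median is often preferred for skewed data.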

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
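The four measures of variability can be computed in the same way. This sketch uses invented scores and the sample (n − 1) versions of standard deviation and variance. Note that quartile conventions differ between tools, so the interquartile range may vary slightly from one package to another.

```python
from statistics import quantiles, stdev, variance

# Hypothetical test scores; 98 is an outlier.
scores = [62, 65, 65, 70, 71, 74, 98]

data_range = max(scores) - min(scores)  # highest value minus lowest value
q1, _, q3 = quantiles(scores, n=4)      # quartile cut points (default 'exclusive' method)
iqr = q3 - q1                           # spread of the middle half of the data
sd = stdev(scores)                      # sample standard deviation
var = variance(scores)                  # sample variance (the square of sd)

print(data_range, iqr, round(sd, 2), round(var, 2))
```

The outlier inflates the range and the standard deviation far more than it affects the interquartile range.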

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)

After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
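A minimal sketch of this calculation, using invented scores and a hypothetical helper name, is shown below. It follows the z-based approach described above. For small samples, a t-based interval is usually more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_confidence_interval(sample, confidence=0.95):
    """Point estimate plus or minus the z score times the standard error."""
    point_estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point_estimate - z * standard_error, point_estimate + z * standard_error


# Hypothetical test scores.
sample = [68, 72, 75, 70, 66, 74, 71, 69, 73, 77]
low, high = z_confidence_interval(sample)
print(f"mean = {mean(sample):.1f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Raising the confidence level (for example, to 99%) widens the interval, because a larger z score is needed to cover more of the distribution.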

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the probability of obtaining results at least as extreme as yours if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
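The significance test for a correlation coefficient uses the formula t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below applies it with an assumed r of 0.12 (a value chosen for illustration) and the sample of 653 students mentioned earlier. Because the sample is large, the one-tailed p value is approximated with the standard normal distribution rather than the exact t distribution.

```python
from math import sqrt
from statistics import NormalDist


def correlation_t(r, n):
    """t statistic for testing whether a correlation coefficient differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r, n = 0.12, 653  # r is an assumed, illustrative value
t_value = correlation_t(r, n)

# One-tailed p value; the normal approximation is only reasonable for large n.
p_approx = 1 - NormalDist().cdf(t_value)
print(f"t = {t_value:.2f}, approximate one-tailed p = {p_approx:.4f}")
```

The same r in a much smaller sample would give a much smaller t value, which is why a weak correlation can still be statistically significant when the sample is large.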

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
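A dependent-samples t test works on the difference between each participant's two scores. The sketch below uses invented pretest and posttest scores for ten participants, so its t value will not match the example above. Instead of computing an exact p value, it compares t with the standard one-tailed critical value for 9 degrees of freedom at the 0.05 level (1.833).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores from the same ten participants before and after the exercise.
pretest = [68, 72, 75, 70, 66, 74, 71, 69, 73, 77]
posttest = [71, 74, 75, 73, 70, 75, 74, 70, 77, 78]

# The paired test is a one-sample test on the per-participant differences.
diffs = [after - before for before, after in zip(pretest, posttest)]
t_value = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
df = len(diffs) - 1

T_CRITICAL = 1.833  # one-tailed, alpha = 0.05, df = 9 (from a standard t table)
significant = t_value > T_CRITICAL
print(f"t({df}) = {t_value:.2f}, significant: {significant}")
```

Statistical packages report an exact p value instead of relying on a critical-value table, but the underlying t statistic is the same.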

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
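
The significance test of a correlation coefficient is commonly based on the statistic t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below applies that formula. The function name and the input values are hypothetical and are not taken from the parental income example, so the output is not expected to match the t value above.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Return (t, df) for testing whether Pearson's r differs from zero."""
    df = n - 2
    # Larger samples and stronger correlations both push t away from zero.
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df


# Hypothetical values: r = 0.5 observed in a sample of 27 pairs.
t, df = correlation_t_statistic(0.5, 27)
print(f"t = {t:.2f}, df = {df}")
```

The same r gives a larger t value in a larger sample, which is how the test takes sample size into account.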

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study) You compare your p value of 0.0028 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study) You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that a finding has important real-life applications or clinical outcomes.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper.

Example: Effect size (experimental study) With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study) To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
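
One common way to compute Cohen’s d for two groups is the difference in means divided by the pooled standard deviation. The sketch below assumes that definition. The scores, the function names, and the cut-offs in `label_d` (Cohen’s conventional 0.2, 0.5, and 0.8 benchmarks) are illustrative rules of thumb, not values from the study above.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def label_d(d):
    """Map |d| to Cohen's conventional labels (rough guidelines only)."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


# Hypothetical test scores for a treatment and a control group.
treatment = [78, 82, 85, 88, 92]
control = [75, 80, 83, 85, 87]
d = cohens_d(treatment, control)
print(f"d = {d:.2f} ({label_d(d)})")
```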

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power. However, there’s a trade-off between the two errors, so a fine balance is necessary.
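
A small simulation can show how the significance level relates to the Type I error rate. The sketch below repeatedly samples from a population where the null hypothesis is true and counts how often a two-tailed z test rejects it anyway. The function name, sample size, number of trials, and seed are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Estimate how often a two-tailed z test rejects a true null hypothesis."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null is true here: data come from a population with mean 0 and SD 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / sqrt(n))
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate at alpha = 0.05: {rate:.3f}")
```

The estimated rate should land close to the chosen alpha. Lowering alpha reduces false rejections but, for a fixed sample size, also makes true effects harder to detect, which is the trade-off described above.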

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than leading to a conclusion about whether or not to reject the null hypothesis.
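
In the simplest case of two fully specified (point) hypotheses, a Bayes factor reduces to a ratio of likelihoods. The sketch below assumes that simplified setting with binomial data. The counts, the two hypothesised success probabilities, and the function names are hypothetical. In practice the alternative is usually given a prior distribution and the likelihood is averaged over it, which this sketch does not do.

```python
from math import comb


def binomial_likelihood(k, n, p):
    """Probability of k successes in n trials when the success probability is p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def bayes_factor_10(k, n, p_null, p_alt):
    """Evidence for the alternative relative to the null (point hypotheses only)."""
    return binomial_likelihood(k, n, p_alt) / binomial_likelihood(k, n, p_null)


# Hypothetical data: 14 successes in 20 trials; null p = 0.5, alternative p = 0.7.
bf10 = bayes_factor_10(k=14, n=20, p_null=0.5, p_alt=0.7)
print(f"BF10 = {bf10:.2f}")
```

A value above 1 means the data are more likely under the alternative than under the null, and a value below 1 means the reverse.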

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts, and meanings, use qualitative methods.
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Statistical analysis is the main method for analyzing quantitative research data. It uses probabilities and models to test predictions about a population from sample data.


