QuestionPro

Unit of Analysis: Definition, Types & Examples

A unit of analysis is the people or things whose qualities will be measured: the main entity a researcher examines, and the object about which the researcher hopes to have something to say at the end of the analysis. It is an essential part of a research project and usually its primary emphasis.

In this blog, we will cover:

  • The definition of “unit of analysis”
  • Types of units of analysis

What is a unit of analysis?

A unit of analysis is what you want to discuss after your research, probably what you would regard as the primary emphasis of your study.

It is the main topic or object on which the researcher plans to comment, and the research question plays a significant role in determining it. Put simply, the unit of analysis is the “who” or “what” that the researcher is interested in investigating.

In his book “Man, the State, and War” (first published in 1959), Kenneth Waltz analyzes the causes of war at three distinct levels: the individual, the state, and the international system.

Understanding the reasoning behind the unit of analysis is vital: the likelihood of fruitful research increases when the rationale is clear. Common examples include an individual, a group, an organization, a nation, or a social phenomenon.


In business research, there are almost unlimited types of possible analytical units. Even though the most typical unit of analysis is the individual, many research questions can be answered more precisely by looking at other types of units. Let’s find out.

Individual Level

The most prevalent unit of analysis in business research is the individual. These are the primary analytical units. The researcher may be interested in looking into:

  • Employee actions
  • Perceptions
  • Attitudes or opinions

Employees may come from wealthy or low-income families, as well as from rural or metropolitan areas.

A researcher might investigate whether personnel from rural areas are more likely to arrive on time than those from urban areas. Additionally, the researcher can check whether rural workers from poorer families arrive on time more often than rural workers from wealthy families.

Each time, the individual (employee) serving as the analytical unit is discussed and explained. Employee analysis as a unit of analysis can shed light on issues in business, including customer and human resource behavior.

For example, employee work satisfaction and consumer purchasing patterns impact business, making research into these topics vital.

Psychologists typically concentrate on studying individuals, and such research can significantly aid a firm’s success: individuals’ knowledge and experiences reveal vital information. This is why individuals are so heavily utilized in business research.
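The rural-versus-urban punctuality comparison described above can be sketched as a tiny individual-level analysis. This is a minimal illustration, not an actual methodology: every record and figure below is invented.

```python
# Individual-level analysis: each record is one employee (the unit of
# analysis), and we compare on-time arrival rates across subgroups.
# All records are invented for illustration.
employees = [
    {"area": "rural", "income": "low",  "on_time": True},
    {"area": "rural", "income": "high", "on_time": True},
    {"area": "urban", "income": "low",  "on_time": False},
    {"area": "urban", "income": "high", "on_time": True},
    {"area": "rural", "income": "low",  "on_time": True},
    {"area": "urban", "income": "low",  "on_time": False},
]

def on_time_rate(records):
    """Share of employees in `records` who arrive on time."""
    return sum(r["on_time"] for r in records) / len(records)

rural = [e for e in employees if e["area"] == "rural"]
urban = [e for e in employees if e["area"] == "urban"]

print(f"rural on-time rate: {on_time_rate(rural):.2f}")  # 1.00
print(f"urban on-time rate: {on_time_rate(urban):.2f}")  # 0.33
```

The same filtering step can be repeated for income level or any other attribute of the individual, because the individual remains the unit of analysis throughout.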

Aggregates Level

Individuals are not always the focus of social science research. By combining the responses of individuals, social scientists frequently describe and explain social interactions, communities, and groupings. They also study collectives of individuals, including communities, groups, and countries.

Aggregate levels can be divided into two types: Groups (groups with an ad hoc structure) and Organizations (groups with a formal organization).

Groups of people make up the following levels of the unit of analysis. A group is defined as two or more individuals interacting, having common traits, and feeling connected to one another. 

Many definitions also emphasize interdependence or objective resemblance (Turner, 1982; Platow, Grace, & Smithson, 2011) and identification as group members (Reicher, 1982).

Societies and gangs are therefore examples of groups; according to Webster’s Online Dictionary (2012), groups can resemble clubs but be far less formal.

Siblings, identical twins, family, and small group functioning are examples of studies with many units of analysis.

In such circumstances, a whole group might be compared to another. Families, gender-specific groups, friends, Facebook groups, and work departments can all be groups.

By analyzing groups, researchers can learn how they form and how age, experience, class, and gender affect them. When aggregated, an individual’s data describes the group to which they belong.
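The aggregation step just described can be sketched in code: responses are collected from individuals, then summarized so that the group, not the person, becomes the unit of analysis. Department names and scores below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Individuals are the units of observation; departments become the
# units of analysis once individual scores are aggregated.
responses = [  # (department, individual satisfaction score) — invented
    ("sales",       7), ("sales",       9), ("sales",       8),
    ("engineering", 6), ("engineering", 7),
    ("support",     9), ("support",    10),
]

by_group = defaultdict(list)
for group, score in responses:
    by_group[group].append(score)

# One summary value per group: the group is now what we describe.
group_summary = {group: mean(scores) for group, scores in by_group.items()}
print(group_summary)
```

Any group-level comparison (by age band, class, gender, and so on) follows the same pattern: group the individual records, then compute a summary per group.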


Sociologists study groups, as do economists. Businesspeople form teams to complete projects, and they continually research groups and group behavior.

Organizations

The next level of the unit of analysis is the organization: a formally constituted group of people. Organizations include businesses, religious groups, military units, colleges, academic departments, supermarkets, business associations, and so on.

Social organization includes features such as sexual composition, leadership styles, organizational structure, and communication systems (Susan & Wheelan, 2005; Chapais & Berman, 2004). Lim, Putnam, and Robert (2010) count well-known social organizations and religious institutions among them.

Moody, White, and Douglas (2003) observe that social organizations are hierarchical, while Hasmath, Hildebrandt, and Hsu (2016) note that social organizations can take different forms; for example, they can be created by institutions such as schools or governments.

Sociology, economics, political science, psychology, management, and organizational communication (Douma & Schreuder, 2013) are some social science fields that study organizations.

Organizations are different from groups in that they are more formal and have better organization. A researcher might want to study a company to generalize its results to the whole population of companies.

One way to characterize an organization is by its number of employees, net annual revenue, net assets, number of projects, and so on. A researcher might want to know whether big companies hire more or fewer women than small companies.

Organization researchers might be interested in how companies like Reliance, Amazon, and HCL affect our social and economic lives. People who work in business often study business organizations.

Social Level

The social level has two types:

Social Artifacts Level

At this level, things are studied alongside humans. Social artifacts are human-made objects from diverse communities: items, representations, assemblages, institutions, knowledge, and conceptual frameworks used to convey, interpret, or achieve a goal (IGI Global, 2017).

Cultural artifacts are anything humans generate that reveals their culture (Watts, 1981).

Social artifacts include books, newspapers, advertising, websites, technical devices, films, photographs, paintings, clothes, poems, jokes, students’ late excuses, scientific breakthroughs, furniture, machines, structures, and so on. The list is endless.

Humans create social artifacts in the course of social behavior. Just as individuals or groups imply a population in business research, each social artifact implies a class of similar objects.

Business books, magazines, articles, and case studies are goods of the same class. In a research study, a business magazine might be characterized by its number of articles, publication frequency, price, content, and editor.

A population of related magazines might then be evaluated for description and explanation. Marx W. Wartofsky (1979) classified artifacts as primary artifacts used in production (like a camera), secondary artifacts connected to primary artifacts (like a camera’s user manual), and tertiary artifacts representing secondary artifacts (like a sculpture of a camera user manual).

An artifact’s scientific study reveals its creators and users. The artifacts researcher may be interested in advertising, marketing, distribution, buying, etc.

Social Interaction Level

Social interactions are also social artifacts. Examples include:

  • Eye contact with a coworker
  • Buying something in a store
  • Friendship decisions
  • Road accidents
  • Airline hijackings
  • Professional counseling
  • WhatsApp messaging

A researcher might study young employees’ smartphone addictions. Some addictions may involve social media, while others involve online games and movies that inhibit connection.

Here, smartphone addiction is examined as a societal phenomenon, while the units of observation are probably individuals (employees).

Anthropologists typically study social artifacts. They may be interested in the social order. A researcher who examines social interactions may be interested in how broader societal structures and factors impact daily behavior, festivals, and weddings.


Even though there is no perfect way to do research, it is generally agreed that researchers should try to find a unit of analysis that keeps the context needed to make sense of the data.

Researchers should consider the details of their research when deciding on the unit of analysis. 

They should keep in mind that consistent use of these units throughout the analysis process (from coding to developing categories and themes to interpreting the data) is essential to gaining insight from qualitative data and protecting the reliability of the results.



© Dovetail Research Pty. Ltd.

Unit of analysis: definition, types, examples, and more

Last updated

16 April 2023

Reviewed by

Cathy Heath


  • What is a unit of analysis?

A unit of analysis is an object of study within a research project. It is the smallest unit a researcher can use to identify and describe a phenomenon—the 'what' or 'who' the researcher wants to study. 

For example, suppose a consultancy firm is hired to train the sales team in a solar company that is struggling to meet its targets. To evaluate their performance after the training, the unit of analysis would be the sales team—it's the main focus of the study. 

Different methods, such as surveys, interviews, or sales data analysis, can be used to evaluate the sales team’s performance and determine the effectiveness of the training.

  • Units of observation vs. units of analysis

A unit of observation refers to the actual items or units being measured or collected during the research. In contrast, a unit of analysis is the entity that a researcher can comment on or make conclusions about at the end of the study.

In the example of the solar company sales team, the unit of observation would be the individual sales transactions or deals made by the sales team members. In contrast, the unit of analysis would be the sales team as a whole.

The firm may observe and collect data on individual sales transactions, but the ultimate conclusion would be based on the sales team's overall performance, as this is the entity that the firm is hired to improve.

In some studies, the unit of observation may be the same as the unit of analysis, but researchers need to define both clearly to themselves and their audiences.
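The solar-company example above can be sketched in a few lines: individual transactions are the units of observation, while the conclusion is drawn about the team as a whole, the unit of analysis. All names, deal values, and the target figure are invented.

```python
# Units of observation: individual sales transactions (invented data).
transactions = [  # (salesperson, deal value)
    ("Ana",   1200), ("Ana",   800),
    ("Ben",   1500),
    ("Chloe",  950), ("Chloe", 1100),
]

TEAM_TARGET = 5000  # hypothetical post-training team target

# Unit of analysis: the team as a whole, judged on its aggregate.
team_total = sum(value for _, value in transactions)
verdict = "target met" if team_total >= TEAM_TARGET else "target missed"
print(team_total, verdict)  # 5550 target met
```

Note that each row is observed per transaction, yet no conclusion is drawn about Ana, Ben, or Chloe individually; only the team-level aggregate is evaluated against the target.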

  • Unit of analysis types

Below are the main types of units of analysis:

Individuals – These are the smallest levels of analysis.

Groups – These are people who interact with each other.

Artifacts – These are material objects created by humans that a researcher can study using empirical methods.

Geographical units – These are smaller than a nation and range from a province to a neighborhood.

Social interactions – These are formal or informal interactions between society members.

  • Importance of selecting the correct unit of analysis in research

Selecting the correct unit of analysis helps reveal more about the subject you are studying and how to continue with the research. It also helps determine the information you should use in the study. For instance, if a researcher has a large sample, the unit of analysis will help decide whether to focus on the whole population or a subset of it.

  • Examples of a unit of analysis

Here are examples of a unit of analysis:

Individuals – A person, an animal, etc.

Groups – Gangs, roommates, etc. 

Artifacts – Phones, photos, books, etc.  

Geographical units – Provinces, counties, states, or specific areas such as neighborhoods, city blocks, or townships

Social interaction – Friendships, romantic relationships, etc.

  • Factors to consider when selecting a unit of analysis

The main things to consider when choosing a unit of analysis are:

Research questions and hypotheses

Research questions can be descriptive if the study seeks to describe what exists or what is going on.

They can be relational if the study examines the relationship between variables, or causal if the research aims to determine whether one or more variables affect or cause one or more outcome variables.

Your study's research question and hypothesis should guide you in choosing the correct unit of analysis.

Data availability and quality

Consider the nature of the data collected and the time spent observing each participant or studying their behavior. You should also consider the scale used to measure variables.

Some studies involve measuring every variable on a one-to-one scale, while others use variables with discrete values. All these influence the selection of a unit of analysis.

Feasibility and practicality

Look at your study and think about the unit of analysis that would be feasible and practical.

Theoretical framework and research design

The theoretical framework is crucial in research as it introduces and describes the theory explaining why the problem under research exists. As a structure that supports the theory of a study, it is a critical consideration when choosing the unit of analysis. Moreover, consider the overall strategy for collecting responses to your research questions.

  • Common mistakes when choosing a unit of analysis

Below are common errors that occur when selecting a unit of analysis:

Reductionism

This error occurs when a researcher uses data from a lower-level unit of analysis to make claims about a higher-level unit of analysis. This includes using individual-level data to make claims about groups.

For example, a researcher might credit the US civil rights movement to Rosa Parks’s refusal to give up her bus seat in Montgomery in 1955. However, claiming that Rosa Parks started the movement would be reductionist: many other factors lay behind the rise and success of the movement, including the Supreme Court’s historic decision to desegregate schools, protests over legalized racial segregation, and the formation of groups such as the Student Nonviolent Coordinating Committee (SNCC). In short, the movement is attributable to various political, social, and economic factors.

Ecological fallacy

This mistake occurs when researchers use data from a higher-level unit of analysis to make claims about a lower-level unit of analysis. It usually occurs when only group-level data is collected, but the researcher makes claims about individuals.

For instance, let's say a study seeks to understand whether addictions to electronic gadgets are more common in certain universities than others.

The researcher obtains data on the percentage of gadget-addicted students from different universities around the country. Looking at the data, the researcher notes that universities with engineering programs have more cases of gadget addiction than campuses without such programs.

Concluding that engineering students are more likely to become addicted to their electronic gadgets would be inappropriate. The data available is only about gadget addiction rates by universities; thus, one can only make conclusions about institutions, not individual students at those universities.

Making claims about students while the data available is about the university puts the researcher at risk of committing an ecological fallacy.
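The gadget-addiction example can be made concrete with fabricated individual-level data: a university's overall 30% addiction rate is perfectly compatible with its engineering students having no addictions at all, which is exactly why the group-level figure cannot support a claim about individuals.

```python
# Fabricated individual records at one university: (major, addicted).
students = [
    ("engineering", False), ("engineering", False),
    ("engineering", False), ("engineering", False),
    ("arts", True), ("arts", True), ("arts", True),
    ("arts", False), ("arts", False), ("arts", False),
]

def addiction_rate(records):
    """Share of students in `records` flagged as addicted."""
    return sum(addicted for _, addicted in records) / len(records)

overall = addiction_rate(students)
eng = addiction_rate([s for s in students if s[0] == "engineering"])
arts = addiction_rate([s for s in students if s[0] == "arts"])

# The university-level rate is 30%, yet no engineering student here
# is addicted: inferring individual behavior from the group rate fails.
print(overall, eng, arts)  # 0.3 0.0 0.5
```

Only when the individual-level breakdown is available can claims about students (rather than universities) be made safely.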

  • The lowdown

A unit of analysis is what you would consider the primary emphasis of your study. It is what you want to discuss after your study. Researchers should determine a unit of analysis that keeps the context required to make sense of the data. They should also keep the unit of analysis in mind throughout the analysis process to protect the reliability of the results.

What is the most common unit of analysis?

The individual is the most prevalent unit of analysis.

Can the unit of analysis and the unit of observation be one?

In some situations, the unit of analysis and the unit of observation are the same. For instance, suppose a tutor is hired to improve the oral French proficiency of a struggling student. A few months later, the tutor evaluates the student’s proficiency based on what they have taught over that period. In this case, the student is both the unit of analysis and the unit of observation.


The Unit of Analysis Explained


  • By DiscoverPhDs
  • October 3, 2020

Unit of Analysis

The unit of analysis refers to the main parameter that you’re investigating in your research project or study. Examples of the different types of unit of analysis that may be used in a project include:

  • Individual people
  • Groups of people
  • Objects such as photographs, newspapers and books
  • Geographical units based on parameters such as cities or counties
  • Social parameters such as births, deaths, divorces

The unit of analysis is named as such because the unit type is determined based on the actual data analysis that you perform in your project or study.

For example, if your research is based around data on exam grades for students at two different universities, then the unit of analysis is the data for the individual student due to each student having an exam score associated with them.

Conversely if your study is based on comparing noise level data between two different lecture halls full of students, then your unit of analysis here is the collective group of students in each hall rather than any data associated with an individual student.

In the same research study involving the same students, you may perform different types of analysis and this will be reflected by having different units of analysis. In the example of student exam scores, if you’re comparing individual exam grades then the unit of analysis is the individual student.

On the other hand, if you’re comparing the average exam grade between two universities, then the unit of analysis is now the group of students as you’re comparing the average of the group rather than individual exam grades.
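A short sketch of the exam-grade scenario shows how the same dataset supports two different units of analysis; the universities, students, and scores below are all invented.

```python
from statistics import mean

grades = [  # (university, student, exam score) — invented data
    ("Uni X", "s1", 72), ("Uni X", "s2", 85), ("Uni X", "s3", 64),
    ("Uni Y", "s4", 90), ("Uni Y", "s5", 78),
]

# Unit of analysis = the individual student: compare single scores.
top = max(grades, key=lambda g: g[2])
print("top student:", top)  # ('Uni Y', 's4', 90)

# Unit of analysis = the group: compare each university's average.
averages = {
    uni: mean(score for u, _, score in grades if u == uni)
    for uni in {"Uni X", "Uni Y"}
}
print("university averages:", averages)
```

The observations never change; only the level at which they are compared does, which is precisely the hierarchy of units discussed here.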

These different levels of hierarchies of units of analysis can become complex with multiple levels. In fact, its complexity has led to a new field of statistical analysis that’s commonly known as hierarchical modelling.

As a researcher, you need to be clear on what your specific research question is. Based on this, you can define each datum, observation, or other variable and how they make up your dataset.

A clear research question will help you identify your units of analysis and the appropriate sample size needed to obtain a meaningful result (and whether this calls for a random sample, a particular sampling unit, or something else).

In developing your research method, you need to consider whether you’ll need repeated observations of each measurement. You also need to consider whether you’re working with qualitative data and qualitative research, or with quantitative content analysis.

The unit of analysis of your study is specifically the ‘who’ or ‘what’ that you’re analysing – for example, are you analysing the individual student, the group of students, or even the whole university? You may have to consider a different unit of analysis depending on the concept you’re examining, even when working with the same observational data set.


  • Unit of Analysis: Definition, Types & Examples

Olayemi Jemimah Aransiola

Introduction

A unit of analysis is the smallest level of analysis for a research project. It’s important to choose the right unit of analysis because it helps you make more accurate conclusions about your data.

What Is a Unit of Analysis?

A unit of analysis is the smallest element in a data set that can be used to identify and describe a phenomenon or the smallest unit that can be used to gather data about a subject. The unit of analysis will determine how you will define your variables, which are the things that you measure in your data. 

If you want to understand why people buy a particular product, you should choose a unit of analysis that focuses on buying behavior. This means choosing a unit of analysis that is relevant to your research topic and question.

For example, if you want to study the needs of soldiers in a war zone, you will need to choose an appropriate unit of analysis for this study: soldiers or the war zone. In this case, choosing the right unit of analysis would be important because it could help you decide if your research design is appropriate for this particular subject and situation.

Why is Choosing the Right Unit of Analysis Important?

The unit of analysis is important because it helps you understand what you are trying to find out about your subject, and it also helps you to make decisions about how to proceed with your research.

Choosing the right unit of analysis is also important because it determines what information you’re going to use in your research. If you have a small sample, then you’ll have to choose whether or not to focus on the entire population or just a subset of it. 

If you have a large sample, then you’ll be able to find out more about specific groups within your population.

Unit of Analysis vs Unit of Observation

Unit of analysis is a term used to refer to a particular part of a data set that can be analyzed. For example, in the case of a survey, the unit of analysis is an individual: the person who was selected to take part in the survey. 

In the social sciences, the unit of analysis refers to the individuals or groups that have been studied. It is closely related to, but distinct from, the unit of observation.

Unit of observation refers to a specific person or group in the study being observed by the researcher. An example would be a particular town, census tract, state, or other geographical location being studied by researchers conducting research on crime rates in that area.

Unit of analysis refers to the individual or group being studied by the researcher. An example would be an entire town being analyzed for crime rates over time.

Types of “Unit of Analysis”

The unit of analysis is a way to understand and study a phenomenon. The main types of unit of analysis are individuals, groups, artifacts (books, photos, newspapers), geographical units (towns, census tracts, states), and social interactions.

  • Individuals are the smallest level of analysis. An individual may be a person or an animal. A group is a collection of individuals who interact with each other, such as a family, roommates, or students attending college together. 
  • An artifact is anything that can be studied using empirical methods—including books and photos but also physical objects like knives or phones. 
  • A geographical unit is smaller than an entire country but larger than a single individual; it can range from a state or province down to a city block or neighborhood. 
  • Social interactions include dyadic relations (such as friendships or romantic relationships) as well as events like divorces and arrests.

Examples of Each Type of Unit of Analysis

  • Individuals are the smallest unit of analysis. An individual is a single person or animal—for example, one employee or one customer.
  • Artifacts are the next largest units of analysis. An artifact is something produced by human beings and is not alive. For example, a child’s toy is an artifact. Artifacts can include any material object that was produced by human activity and which has meaning to someone. Artifacts can be tangible or intangible and may be produced intentionally or accidentally.
  • Geographical units are large geographic areas such as states, counties, provinces, etc. Geographical units may also refer to specific locations within these areas such as cities or townships. 
  • Social interaction refers to interactions between members of society (e.g., family members interacting with each other). Social interaction includes both formal interactions (such as attending school) and informal interactions (such as talking on the phone).

How Does a Social Scientist Choose a Unit of Analysis?

Social scientists choose a unit of analysis based on the purpose of their research, their research question, and the type of data they have. For example, if they are trying to understand the relationship between a person’s personality and their behavior, they would choose to study personality traits.

For example, if a researcher wanted to study the effects of legalizing marijuana on crime rates, they might use administrative data from police departments. However, if they wanted to study how culture influences crime rates, they might use survey data from smaller groups of people, such as individuals living in different areas or countries.

Factors to Consider When Choosing a Unit of Analysis

The unit of analysis is the object or person that you are studying, and it determines what kind of data you are collecting and how you will analyze it.

Factors to consider when choosing a unit of analysis include:

  • What is your purpose for studying this topic? Is it for a research paper or an article? If so, which type of paper do you want to write?
  • What is the most appropriate unit for your study? If you are studying a specific event or period of time, this may be obvious. But if your focus is broader, such as all social sciences or all human development, determine how broad your scope should be before beginning the research process so that you know where to start.
  • How do other people define their units? This can be helpful when trying to understand what other people mean when they use certain terms like “social science” or “human development” because they may define those terms differently than what you would expect them to.
  • The nature of the data collected. Is it quantitative or qualitative? If it’s qualitative, what kind of data is collected? How much time was spent observing each participant/examining their behavior?
  • The scale used to measure variables. Is every variable measured on a continuous scale (like height or income), or do some variables only take on discrete values (like yes/no questions)?

Finally, remember that your data is made up of more than just one unit: a dataset contains many units, and each of those units has its own characteristics that you need to think about when you analyze it.

Olayemi Jemimah Aransiola, Formplus


Chapter 4: Measurement and Units of Analysis

4.4 Units of Analysis and Units of Observation

Another point to consider when designing a research project, and which might differ slightly in qualitative and quantitative studies, has to do with units of analysis and units of observation. These two items concern what you, the researcher, actually observe in the course of your data collection and what you hope to be able to say about those observations. Table 4.1 provides a summary of the differences between units of analysis and observation.

Unit of Analysis

A unit of analysis is the entity that you wish to be able to say something about at the end of your study, probably what you would consider to be the main focus of your study.

Unit of Observation

A unit of observation is the item (or items) that you actually observe, measure, or collect in the course of trying to learn something about your unit of analysis. In a given study, the unit of observation might be the same as the unit of analysis, but that is not always the case. Further, units of analysis are not required to be the same as units of observation. What is required, however, is for researchers to be clear about how they define their units of analysis and observation, both to themselves and to their audiences. More specifically, your unit of analysis will be determined by your research question. Your unit of observation, on the other hand, is determined largely by the method of data collection that you use to answer that research question.

To demonstrate these differences, let us look at the topic of students’ addictions to their cell phones. We will consider first how different kinds of research questions about this topic will yield different units of analysis. Then we will think about how those questions might be answered and with what kinds of data. This leads us to a variety of units of observation.

If I were to ask, “Which students are most likely to be addicted to their cell phones?” our unit of analysis would be the individual. We might mail a survey to students on a university or college campus, with the aim to classify individuals according to their membership in certain social classes and, in turn, to see how membership in those classes correlates with addiction to cell phones. For example, we might find that students studying media, males, and students with high socioeconomic status are all more likely than other students to become addicted to their cell phones. Alternatively, we could ask, “How do students’ cell phone addictions differ and how are they similar?” In this case, we could conduct observations of addicted students and record when, where, why, and how they use their cell phones. In both cases, one using a survey and the other using observations, data are collected from individual students. Thus, the unit of observation in both examples is the individual. But the units of analysis differ in the two studies. In the first one, our aim is to describe the characteristics of individuals. We may then make generalizations about the populations to which these individuals belong, but our unit of analysis is still the individual. In the second study, we will observe individuals in order to describe some social phenomenon, in this case, types of cell phone addictions. Consequently, our unit of analysis would be the social phenomenon.

Another common unit of analysis in sociological inquiry is groups. Groups, of course, vary in size, and almost no group is too small or too large to be of interest to sociologists. Families, friendship groups, and street gangs make up some of the more common micro-level groups examined by sociologists. Employees in an organization, professionals in a particular domain (e.g., chefs, lawyers, sociologists), and members of clubs (e.g., Girl Guides, Rotary, Red Hat Society) are all meso-level groups that sociologists might study. Finally, at the macro level, sociologists sometimes examine citizens of entire nations or residents of different continents or other regions.

A study of student addictions to their cell phones at the group level might consider whether certain types of social clubs have more or fewer cell phone-addicted members than other sorts of clubs. Perhaps we would find that clubs that emphasize physical fitness, such as the rugby club and the scuba club, have fewer cell phone-addicted members than clubs that emphasize cerebral activity, such as the chess club and the sociology club. Our unit of analysis in this example is groups. If we had instead asked whether people who join cerebral clubs are more likely to be cell phone-addicted than those who join social clubs, then our unit of analysis would have been individuals. In either case, however, our unit of observation would be individuals.

Organizations are yet another potential unit of analysis that social scientists might wish to say something about. Organizations include entities like corporations, colleges and universities, and even night clubs. At the organization level, a study of students’ cell phone addictions might ask, “How do different colleges address the problem of cell phone addiction?” In this case, our interest lies not in the experience of individual students but instead in the campus-to-campus differences in confronting cell phone addictions. A researcher conducting a study of this type might examine schools’ written policies and procedures, so his unit of observation would be documents. However, because he ultimately wishes to describe differences across campuses, the college would be his unit of analysis.

Social phenomena are also a potential unit of analysis. Many sociologists study a variety of social interactions and social problems that fall under this category. Examples include social problems like murder or rape; interactions such as counselling sessions, Facebook chatting, or wrestling; and other social phenomena such as voting and even cell phone use or misuse. A researcher interested in students’ cell phone addictions could ask, “What are the various types of cell phone addictions that exist among students?” Perhaps the researcher will discover that some addictions are primarily centred on social media such as chat rooms, Facebook, or texting, while other addictions centre on single-player games that discourage interaction with others. The resultant typology of cell phone addictions would tell us something about the social phenomenon (unit of analysis) being studied. As in several of the preceding examples, however, the unit of observation would likely be individual people.

Finally, a number of social scientists examine policies and principles, the last type of unit of analysis we will consider here. Studies that analyze policies and principles typically rely on documents as the unit of observation. Perhaps a researcher has been hired by a college to help it write an effective policy against cell phone use in the classroom. In this case, the researcher might gather all previously written policies from campuses all over the country, and compare policies at campuses where the use of cell phones in classroom is low to policies at campuses where the use of cell phones in the classroom is high.

In sum, there are many potential units of analysis that a sociologist might examine, but some of the most common units include the following:

  • Individuals
  • Groups
  • Organizations
  • Social phenomena
  • Policies and principles

Table 4.1 Units of analysis and units of observation: A hypothetical study of students’ addictions to cell phones.

Research Methods for the Social Sciences: An Introduction Copyright © 2020 by Valerie Sheppard is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Research Methods Knowledge Base

Unit of Analysis



One of the most important ideas in a research project is the unit of analysis . The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:

  • individuals
  • artifacts (books, photos, newspapers)
  • geographical units (town, census tract, state)
  • social interactions (dyadic relations, divorces, arrests)

Why is it called the ‘unit of analysis’ and not something else (like, the unit of sampling)? Because it is the analysis you do in your study that determines what the unit is . For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a whole and not for each individual student. For different analyses in the same study you may have different units of analysis. If you decide to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In this case, since the data that goes into the analysis is the average itself (and not the individuals’ scores) the unit of analysis is actually the group. Even though you had data at the student level, you use aggregates in the analysis. In many areas of social research these hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes referred to as hierarchical modeling . This is true in education, for instance, where we often compare classroom performance but collected achievement data at the individual student level.
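The classroom example above can be sketched in a few lines of Python. The scores and classroom labels below are hypothetical illustrations, not data from any real study; the point is only to show how the same student-level observations feed two different units of analysis:

```python
from statistics import mean

# Hypothetical achievement scores for children in two classrooms
scores = {
    "classroom_A": [72, 85, 90],
    "classroom_B": [65, 70, 75],
}

# Unit of analysis = the individual child:
# every score enters the analysis separately
all_scores = [s for room in scores.values() for s in room]

# Unit of analysis = the group (classroom):
# the data entering the analysis is one aggregate value per class
class_means = {room: mean(s) for room, s in scores.items()}

print(all_scores)
print(class_means)
```

Both analyses start from the same individual observations; what changes is whether the raw scores or the classroom aggregates are the values actually being compared.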



6. Sampling

6.1. Units of Analysis


Learning Objectives

  • Describe units of analysis.
  • Discuss how we can study the same topic using different units of analysis.

Before you can decide on a sampling strategy, you must define the unit of analysis of your scientific study. The unit of analysis refers to the person, collective, or object that you are focusing on and want to learn about through your research. As depicted in Figure 6.1 , your unit of analysis would be the type of entity (say, an individual) you’re interested in. Your sample would be a group of such entities (say, a group of individuals you survey)—which, collectively, stand in for the population you wish to study.

Figure 6.1. A population of individuals, with six selected to form the sample.

Typical units of analysis include individuals, groups, organizations, and countries. For instance, if we are interested in studying people’s shopping behavior, their learning outcomes, or their attitudes toward new technologies, then the unit of analysis is likely to be the individual . If we want to study characteristics of street gangs or teamwork in organizations, then the unit of analysis is probably the group . If our research is directed at understanding differences in national cultures, then our unit of analysis could be the country . In the latter two examples, even though specific individuals—the group or country’s leaders—may have a greater say over what these groups or countries do, for the sake of analysis, researchers typically think of those decisions as reflecting a collective decision rather than any one individual’s decision.

Even inanimate objects can serve as units of analysis. For instance, if we wish to study how two or more individuals engage with each other during social interactions, the unit of analysis might be each conversation , and not the individual speakers. If we wanted to track how depictions of people of color have changed in popular culture over time, we could focus on a film or television show as a unit of analysis.

Our choice of a particular unit of analysis will depend on our research question. For instance, if we wish to study why certain neighborhoods have high crime rates, then our unit of analysis becomes the neighborhood —not crimes or criminals committing such crimes—because the object of our inquiry is the neighborhood and not the people living in it. If, however, we wish to compare the prevalence of different types of crimes—homicide versus robbery versus assault, for example—across neighborhoods, our unit of analysis could very well be the crime . If we wish to study why criminals engage in illegal activities, then the unit of analysis becomes the individual (i.e., the criminal).

Now let’s consider a completely different kind of sociological study. If we want to examine why some business innovations are more successful than others, then our unit of analysis is an innovation —such as the invention of a new method for charging phones. If, however, we wish to study how some tech companies develop innovative products more consistently than others, then the unit of analysis is the organization . As you can see, two related research questions within the same study may have entirely different units of analysis.

Determining the appropriate unit of analysis is important because it influences what type of data you should collect for your study and whom you collect it from. If your unit of analysis is the organization, then you usually will want to collect organizational-level data—that is, data that has to do with the organization, such as its size, personnel structure, or revenues. Data may come from a variety of sources, such as financial records or surveys of directors or executives, who are presumed to be representing their organization when they answer your survey questions. Meanwhile, if your unit of analysis is a website, you will want to collect data about different sites, such as how one kind of site compares to others in terms of traffic. We could use the term “site-level” data—just like we’d use the term “individual-level” data when individuals are the unit of analysis. We could also talk about “lower” and “higher” levels of analysis—with individual-level data existing on a lower level than group-level data, which may, in turn, be on a lower level than national data (see the discussion of micro , meso , and macro levels of analysis in Chapter 3: The Role of Theory in Research ). It is important to note that “higher” does not imply “better” in this case. We’re just talking about whether we’re looking at smaller or larger groupings of data.

Frequently, the unit of analysis is what we observe in our research—the source of our data—but that is not always the case. In fact, sometimes we want to make a distinction between units of analysis and units of observation . The unit of analysis is what we really want to study, but sometimes we have to get at it indirectly, by observing something else. For example, surveys often ask questions about families to understand their family structure, income, and various aspects of their well-being, but they need to get information about the family through individuals—specifically, the respondent who is answering survey questions on behalf of the family. In this case, the unit of analysis for the survey’s family-related questions would be the family, but the unit of observation would be the individual. Likewise, in our earlier examples, we talked about studying organizations and websites as our units of analysis, but doing so might involve talking to individuals—the directors of those organizations, or the users of those websites, respectively.

Analyzing multiple types of units of observation can give us a fuller picture of our unit of analysis. For example, if you are conducting research about what makes particular social media apps more addictive than others, then examining differences between the apps in terms of their functionality ( app as the unit of observation) would tell you one thing, but surveying individuals about their usage of apps ( user as the unit of observation) would clarify other aspects of that question. Furthermore, it is often a good idea to collect data from a lower level of analysis and sum up, or aggregate , that data, converting it into higher-level data. This can give you a bigger-picture perspective on your unit of analysis. For instance, to study teamwork in organizations, you can survey individuals in different teams and measure how much conflict or cohesion they perceive on their teams. You can then average their individual scores to create a “team-level” score on those particular ratings. Note, however, that issues can arise when we move in the opposite direction—from a higher to a lower level of analysis (see the sidebar Deeper Dive: Ecological Fallacies ).
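The teamwork aggregation described above can be sketched as follows; the team names and ratings are invented for illustration:

```python
from statistics import mean

# Unit of observation = the individual: each member's perceived
# cohesion rating on a 1-5 scale (hypothetical values)
ratings = {
    "team_red":  [4, 5, 3, 4],
    "team_blue": [2, 3, 2, 3],
}

# Aggregate up to the unit of analysis (the team) by averaging
# the individual members' ratings into a team-level score
team_scores = {team: mean(r) for team, r in ratings.items()}
print(team_scores)
```

The survey responses are collected at the individual level, but the values compared in the analysis are the team-level averages.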

Ultimately, the unit of analysis will help you determine both the population you are interested in and the sample that you will study to arrive at any conclusions about that population. So you need to choose it wisely. For example, let’s say you’re interested in the average pay of chief executive officers (CEOs) at companies across the nation. The unit of analysis would be the CEO, and the population would be all individuals in the country who work as company CEOs. But the unit of analysis would be different for a very similar research question: the average amount that U.S. companies pay their CEOs. In this case, the unit of analysis is actually the company because you are interested in how much companies pay their CEOs—not how much individuals are paid as CEOs. The difference is subtle, but the main point is that your unit of analysis is linked to whatever population you actually want to say something about—in this example, either individual CEOs, or companies that have CEOs.

Deeper Dive: Ecological Fallacies


A mismatch between the unit of analysis and the unit of observation can create issues for researchers. Let’s say you want to compare the residents of different states (your unit of analysis is the individual ), but you only have access to state-level data (your unit of observation is the state ). This is a problem because you generally do not want to be making claims about a lower level of analysis based only on aggregated data at a higher level—in this example, drawing conclusions about individuals based on the states where they reside. For instance, the fact that the population of a state is, on average, wealthier than the rest of the country does not mean that residents of that state are more likely to be rich than the average American. It may be that a small contingent of superrich people have pulled up the average wealth of the state, but its many other residents actually tend to be poorer than the average American. (As you might know from your statistics classes, in this situation, mean wealth—the group’s average—differs dramatically from median wealth—how much money the person smack in the middle of the income distribution has.) This logical error—making claims about the nature of individuals based on data from the groups they belong to—is called an ecological fallacy .
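The mean-versus-median point can be shown numerically. The wealth figures below are hypothetical: nine modest households plus one superrich outlier are enough to pull the state's average well above what the typical resident holds:

```python
from statistics import mean, median

# Hypothetical household wealth (in $1,000s) for ten residents of a state
wealth = [40, 45, 50, 55, 60, 60, 65, 70, 75, 5000]

# The single outlier dominates the mean...
print(mean(wealth))

# ...while the median still describes the resident in the middle
print(median(wealth))
```

Concluding from the high state-level mean that any given resident is likely to be rich would be exactly the ecological fallacy described above.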

Émile Durkheim’s classic study of suicide is often mentioned as an example of an ecological fallacy. One of the pioneers of the field of sociology, Durkheim argued in his 1897 book Suicide that societies in which individuals struggled to feel they belonged—that is, populations with low levels of social integration—would experience more suicide. Ideally, the unit of analysis for such a study would be the individual. Specifically, we would want to study individuals and the factors that contributed to their deaths by suicide. But Durkheim did not have individual-level data. Instead, he had higher-level data about the number of suicides in each country. To test his theory that social integration safeguarded individuals against suicide, Durkheim compared countries that were mostly Protestant to those that were mostly Catholic. The idea was that Protestantism was a more individualistic and unstructured faith than Catholicism, and so the two varieties of religious belief could stand in for less and more social integration, respectively.

Durkheim’s analysis concluded that suicide was indeed higher in Protestant-majority countries. The problem was that his data only allowed him to say that Protestant countries were more likely to have higher suicide rates—not that Protestant individuals were more likely to commit suicide. To conclude the latter would have been an ecological fallacy, and yet that was the question that Durkheim truly wanted to answer. To his credit, Durkheim also tested his theory by studying suicide rates across localities within countries—another level of analysis (Selvin 1958). (Replicating your analysis across different types of data is a good way to check the robustness of your findings, as we will discuss in later chapters.) Durkheim found the same pattern of higher levels of Protestant belief correlating with higher suicide rates within counties, giving further credence to his theory. Although flawed, Durkheim’s analysis made creative use of the data that was available to him at the time, and his work continues to inspire researchers, including those studying the growing rates of suicide among less educated Americans since 2000 (Case and Deaton 2020).

Key Takeaways

  • A unit of analysis is a member of the larger group you wish to be able to say something about at the end of your study. A unit of observation is a member of the population that you actually observe.
  • When researchers confuse their units of analysis and observation, they may commit an ecological fallacy—that is, when we make possibly inaccurate claims about the nature of individuals based on data from the groups they belong to.

The Craft of Sociological Research by Victor Tan Chen; Gabriela León-Pérez; Julie Honnold; and Volkan Aytar is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Unravelling the “Unit of Analysis”: A Comprehensive Guide to the 5 Key Aspects

Maha Lakshmi

Exploring the essential aspects of the “Unit of Analysis” in research

Introduction

Every research project starts with questions: What are we studying? How do we measure and categorize it? When faced with such conundrums, a pivotal component of research that experts consistently rely on is the Unit of Analysis. This essential building block defines the main entity being analyzed in a study, be it individuals, groups, institutions, or social interactions. Comprehending the Unit of Analysis is crucial, as it establishes the foundation for subsequent stages of the research process.

Unit of Analysis

The Unit of Analysis is a pivotal concept in the realm of research and data collection. In layman’s terms, it refers to the primary entity or subject under observation or study in any research endeavor. It is the ‘what’ and the ‘who’ that we are studying and analyzing: for example, in a study of a student’s academic performance, the student becomes the Unit of Analysis. Understanding and correctly identifying this unit is essential, as it impacts the subsequent phases of research, from data collection to result interpretation.

Types of Units

A variety of entities can function as the Unit of Analysis, and it is important to recognize the main types.

1. Individuals: People are often the most studied entities.

2. Groups: This could range from families and friend groups to companies.

3. Artefacts: Physical entities like books, photos, or tools.

4. Geographical Units: Regions, cities, or towns.

5. Social Interactions: Tweets, Facebook likes, or any form of social media interaction.

Importance in Research

The significance of correctly identifying the Unit of Analysis cannot be overstated. It’s akin to knowing the ingredients before baking a cake. It:

Provides clarity regarding data collection methods.

Helps in identifying relevant statistical techniques.

Determines the scope of generalisations.

Avoids the pitfalls of the ecological fallacy and reductionism.

Common Mistakes

There’s no beating around the bush here; even seasoned researchers can occasionally trip up:

Ecological Fallacy: Incorrectly deducing individual behaviour from group data.

Reductionism: Oversimplifying a complex process by ignoring certain variables.

Unit of Analysis vs. Unit of Observation

While they appear similar, they have their differences. The Unit of Analysis is about what is studied, and Unit of Observation is the source of the data. For example, while researching the impact of workplace culture on employee morale, the company might be the unit of analysis, but the individual employees providing data are the units of observation.

Tips for Selection

Finding the right Unit of Analysis is sometimes a formidable task, so:

Start by clearly defining the research question.

Determine the level at which you wish to generalise results.

Keep in mind the availability of data.

Role in Different Fields

The role of Unit of Analysis varies across different disciplines:

In Sociology

Here, researchers often study social groups, institutions, and structures. They delve into topics like group dynamics, societal norms, and institutions’ influence on individuals.

In Economics

Economists can analyse a gamut of entities, from individual consumers or businesses to entire countries. They might study spending habits, company growth, or global trade patterns.

In Environmental Studies

Research in this area of study centers around particular ecosystems, species, or geographical locations. They can examine the effects of pollution on organisms in the water or the influence of urban development on the quality of air.

In Literature

Literary critics might analyse a particular genre, an author’s body of work, or even individual books or poems. They would study themes, narrative techniques, or cultural contexts.

In Political Science

This might involve studying political parties, government policies, or public opinion. Research could revolve around election patterns, policy impacts, or citizens’ political behaviour.

Applications in Modern Technology

The digital era has significantly expanded the boundaries of the Unit of Analysis.

In Digital Marketing

In the realm of digital engagements, marketers assess a wide range of online interactions, such as clicks, views, thumbs-ups, shares and even the timing and duration of the engagement.

In Machine Learning

Datasets might comprise individual data points, clusters, or even entire databases. Analysts need to be spot-on with their units to train models effectively.

In E-commerce

From user reviews and product ratings to sales data, the e-commerce realm offers a myriad of Units of Analysis.

In Cybersecurity

Security experts examine possible threats and cyber attacks, as well as the attributes of potential attackers.

  • What is the primary purpose of the Unit of Analysis in research?

It helps in specifying the focus of the study, ensuring clarity in data collection, and accuracy in result interpretation.

  • How is the Unit of Analysis different from the Unit of Observation?

The former is what you study, and the latter is where you get your data from.

  • Can a research study have multiple Units of Analysis?

Absolutely! A study can analyse multiple entities simultaneously, provided the research design supports it.

  • Why is it crucial to correctly identify the Unit of Analysis?

Mistakes can lead to ecological fallacies or oversimplification, jeopardizing the study’s validity.

  • How has the digital age influenced the concept of the Unit of Analysis?

It has expanded the scope, introducing new units like clicks, views, and digital interactions.

  • Are there specific fields where the Unit of Analysis plays a more pivotal role?

Its importance is ubiquitous, but its nature might vary from fields like sociology to machine learning.

The Unit of Analysis is undeniably the cornerstone of any research study. From laying the groundwork to influencing data interpretation, it’s an element that demands attention, understanding, and precision. As the world of research evolves in this digital era, it becomes crucial for researchers to adjust and innovate, thus guaranteeing that the Unit of analysis integrates seamlessly with research goals. It’s a concept that, despite its intricacies, can truly elevate the quality of any research endeavor.

External Links/ Sources:

Unit of analysis

UNIT OF ANALYSIS AND UNIT OF OBSERVATION

The Unit of Analysis Explained

10 Miraculous Benefits of Cluster Sampling: A Comprehensive Guide

Get the funds you need with a signature loan, leave a reply cancel reply.


7.3 Unit of analysis and unit of observation

Learning Objectives

  • Define units of analysis and units of observation, and describe the two common errors people make when they confuse the two

Another point to consider when designing a research project, and which might differ slightly in qualitative and quantitative studies, has to do with units of analysis and units of observation. These two items concern what you, the researcher, actually observe in the course of your data collection and what you hope to be able to say about those observations. A unit of analysis is the entity that you wish to be able to say something about at the end of your study, probably what you’d consider to be the main focus of your study. A unit of observation is the item (or items) that you actually observe, measure, or collect in the course of trying to learn something about your unit of analysis.

In a given study, the unit of observation might be the same as the unit of analysis, but that is not always the case. For example, a study on electronic gadget addiction may interview undergraduate students (our unit of observation) for the purpose of saying something about undergraduate students (our unit of analysis) and their gadget addiction. Perhaps, if we were investigating gadget addiction in elementary school children (our unit of analysis), we might collect observations from teachers and parents (our units of observation) because younger children may not report their behavior accurately. In this case and many others, units of analysis are not the same as units of observation. What is required, however, is for researchers to be clear about how they define their units of analysis and observation, both to themselves and to their audiences.

[Image: a young boy peering through binoculars in a desert]

More specifically, your unit of analysis will be determined by your research question. Your unit of observation, on the other hand, is determined largely by the method of data collection that you use to answer that research question. We’ll take a closer look at methods of data collection later on in the textbook. For now, let’s consider again a study addressing students’ addictions to electronic gadgets. We’ll consider first how different kinds of research questions about this topic will yield different units of analysis. Then, we’ll think about how those questions might be answered and with what kinds of data. This leads us to a variety of units of observation.

If we were to explore which students are most likely to be addicted to their electronic gadgets, our unit of analysis would be individual students. We might mail a survey to students on campus, and our aim would be to classify individuals according to their membership in certain social groups in order to see how membership in those classes correlated with gadget addiction. For example, we might find that majors in new media, men, and students with high socioeconomic status are all more likely than other students to become addicted to their electronic gadgets. Another possibility would be to explore how students’ gadget addictions differ and how they are similar. In this case, we could conduct observations of addicted students and record when, where, why, and how they use their gadgets. In both cases, one using a survey and the other using observations, data are collected from individual students. Thus, the unit of observation in both examples is the individual.

Another common unit of analysis in social science inquiry is groups. Groups of course vary in size, and almost no group is too small or too large to be of interest to social scientists. Families, friendship groups, and group therapy participants are some common examples of micro-level groups examined by social scientists. Employees in an organization, professionals in a particular domain (e.g., chefs, lawyers, social workers), and members of clubs (e.g., Girl Scouts, Rotary, Red Hat Society) are all meso-level groups that social scientists might study. Finally, at the macro-level, social scientists sometimes examine citizens of entire nations or residents of different continents or other regions.

A study of student addictions to their electronic gadgets at the group level might consider whether certain types of social clubs have more or fewer gadget-addicted members than other sorts of clubs. Perhaps we would find that clubs that emphasize physical fitness, such as the rugby club and the scuba club, have fewer gadget-addicted members than clubs that emphasize cerebral activity, such as the chess club and the women’s studies club. Our unit of analysis in this example is groups because groups are what we hope to say something about. If we had instead asked whether individuals who join cerebral clubs are more likely to be gadget-addicted than those who join social clubs, then our unit of analysis would have been individuals. In either case, however, our unit of observation would be individuals.

Organizations are yet another potential unit of analysis that social scientists might wish to say something about. Organizations include entities like corporations, colleges and universities, and even nightclubs. At the organization level, a study of students’ electronic gadget addictions might explore how different colleges address the problem of electronic gadget addiction. In this case, our interest lies not in the experience of individual students but instead in the campus-to-campus differences in confronting gadget addictions. A researcher conducting a study of this type might examine schools’ written policies and procedures, so her unit of observation would be documents. However, because she ultimately wishes to describe differences across campuses, the college would be her unit of analysis.

In sum, there are many potential units of analysis that a social worker might examine, but some of the most common units include the following:

  • Individuals
  • Groups
  • Organizations

One common error people make when it comes to both causality and units of analysis is something called the ecological fallacy . This occurs when claims about one lower-level unit of analysis are made based on data from some higher-level unit of analysis. In many cases, this occurs when claims are made about individuals, but only group-level data have been gathered. For example, we might want to understand whether electronic gadget addictions are more common on certain campuses than on others. Perhaps different campuses around the country have provided us with their campus percentage of gadget-addicted students, and we learn from these data that electronic gadget addictions are more common on campuses that have business programs than on campuses without them. We then conclude that business students are more likely than non-business students to become addicted to their electronic gadgets. However, this would be an inappropriate conclusion to draw. Because we only have addiction rates by campus, we can only draw conclusions about campuses, not about the individual students on those campuses. Perhaps the social work majors on the business campuses are the ones that caused the addiction rates on those campuses to be so high. The point is we simply don’t know because we only have campus-level data. By drawing conclusions about students when our data are about campuses, we run the risk of committing the ecological fallacy.
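The campus example can be made concrete with a small, entirely hypothetical dataset (the campuses, majors, and counts below are invented for illustration). Group-level rates can point one way while individual-level rates point the other, which is exactly why campus-level data cannot support claims about students:

```python
# Hypothetical student-level records for two campuses: Campus X has a
# business program, Campus Y does not. Each record is
# (campus, major_type, is_addicted).
students = (
    [("X", "business", True)] * 8 + [("X", "business", False)] * 32
    + [("X", "other", True)] * 22 + [("X", "other", False)] * 38
    + [("Y", "other", True)] * 20 + [("Y", "other", False)] * 80
)

def addiction_rate(rows):
    """Share of records flagged as addicted."""
    return sum(1 for _, _, addicted in rows if addicted) / len(rows)

# Campus-level data: the campus WITH a business program has the higher rate.
rate_x = addiction_rate([s for s in students if s[0] == "X"])  # 30/100 = 0.30
rate_y = addiction_rate([s for s in students if s[0] == "Y"])  # 20/100 = 0.20

# Student-level data: business majors are actually the LESS addicted group.
# Concluding "business students are more addicted" from the campus rates
# alone would commit the ecological fallacy.
rate_business = addiction_rate([s for s in students if s[1] == "business"])  # 8/40 = 0.20
rate_other = addiction_rate([s for s in students if s[1] == "other"])        # 42/160 = 0.2625
```

In this toy data it is the non-business students on the business campus who drive that campus's high rate, mirroring the social-work-majors possibility raised above.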

On the other hand, another mistake to be aware of is reductionism. Reductionism occurs when claims about some higher-level unit of analysis are made based on data from some lower-level unit of analysis. In this case, claims about groups or macro-level phenomena are made based on individual-level data. An example of reductionism can be seen in some descriptions of the civil rights movement. On occasion, people have proclaimed that Rosa Parks started the civil rights movement in the United States by refusing to give up her seat to a white person while on a city bus in Montgomery, Alabama, in December 1955. Although it is true that Parks played an invaluable role in the movement, and that her act of civil disobedience gave others courage to stand up against racist policies, beliefs, and actions, to credit Parks with starting the movement is reductionist. Surely the confluence of many factors, from fights over legalized racial segregation to the Supreme Court’s historic decision to desegregate schools in 1954 to the creation of groups such as the Student Nonviolent Coordinating Committee (to name just a few), contributed to the rise and success of the American civil rights movement. In other words, the movement is attributable to many factors—some social, others political and others economic. Did Parks play a role? Of course she did—and a very important one at that. But did she cause the movement? To say yes would be reductionist.

It would be a mistake to conclude from the preceding discussion that researchers should avoid making any claims whatsoever about data or about relationships between levels of analysis. While it is important to be attentive to the possibility for error in causal reasoning about different levels of analysis, this warning should not prevent you from drawing well-reasoned analytic conclusions from your data. The point is to be cautious and conscientious in making conclusions between levels of analysis. Errors in analysis come from a lack of rigor and from deviations from the scientific method.

Key Takeaways

  • A unit of analysis is the item you wish to be able to say something about at the end of your study while a unit of observation is the item that you actually observe.
  • When researchers confuse their units of analysis and observation, they may be prone to committing either the ecological fallacy or reductionism.
  • Ecological fallacy- claims about one lower-level unit of analysis are made based on data from some higher-level unit of analysis
  • Reductionism- when claims about some higher-level unit of analysis are made based on data at some lower-level unit of analysis
  • Unit of analysis- entity that a researcher wants to say something about at the end of her study
  • Unit of observation- the item that a researcher actually observes, measures, or collects in the course of trying to learn something about her unit of analysis

Image attributions

Binoculars by nightowl CC-0

Scientific Inquiry in Social Work Copyright © 2018 by Matthew DeCarlo is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Choosing the Right Unit of Analysis for Your Research Project

Table of Contents

  • Understanding the Unit of Analysis in Research
  • Factors to Consider When Selecting the Right Unit of Analysis
  • Common Mistakes to Avoid

A research project is like setting out on a voyage through uncharted territory; the unit of analysis is your compass, guiding every decision from methodology to interpretation.

It’s the beating heart of your data collection and the lens through which you view your findings. With deep-seated experience in research methodologies , our expertise recognizes that choosing an appropriate unit of analysis not only anchors your study but illuminates paths towards meaningful conclusions.

The right choice empowers researchers to extract patterns, answer pivotal questions, and offer insights into complex phenomena. But tread carefully—selecting an ill-suited unit can distort results or obscure significant relationships within data.

Remember this: A well-chosen unit of analysis acts as a beacon for accuracy and relevance throughout your scholarly inquiry. Continue reading to unlock the strategies for selecting this cornerstone of research design with precision—your project’s success depends on it.

Engage with us as we delve deeper into this critical aspect of research mastery.

Key Takeaways

  • Your research questions and hypotheses drive the choice of your unit of analysis, shaping how you collect and interpret data.
  • Avoid common mistakes like reductionism , which oversimplifies complex issues, and the ecological fallacy , where group-level findings are wrongly applied to individuals.
  • Consider the availability and quality of data when selecting your unit of analysis to ensure your research is feasible and conclusions are valid.
  • Differentiate between units of analysis (what you’re analyzing) and units of observation (what or who you’re observing) for clarity in your study.
  • Ensure that your chosen unit aligns with both the theoretical framework and practical considerations such as time and resources.

The unit of analysis in research refers to the level at which data is collected and analyzed. It is essential for researchers to understand the different types of units of analysis, as well as their significance in shaping the research process and outcomes.

Definition and Importance

With resonio, the unit of analysis you choose lays the groundwork for your market research focus. Whether it’s individuals, organizations, or specific events, resonio’s platform facilitates targeted data collection and analysis to address your unique research questions. Our tool simplifies this selection process, ensuring that you can efficiently zero in on the most relevant unit for insightful and actionable results.

This crucial component serves as a navigational aid for your market research. The market research tool guides you not only in data collection but also in selecting the most effective sampling methods and approaches to hypothesis testing, helping you gather robust and reliable data and keeping your research both effective and straightforward.

Choosing the right unit of analysis is crucial, as it defines your research’s direction. resonio makes this easier, ensuring your choice aligns with your theoretical approach and data collection methods, thereby enhancing the validity and reliability of your results.

Additionally, resonio aids in steering clear of errors like reductionism and the ecological fallacy, ensuring your conclusions match the data’s level of analysis.

Difference between Unit of Analysis and Unit of Observation

Understanding the difference between the unit of analysis and the unit of observation is key. Let us clarify this distinction: the unit of analysis is what you’ll ultimately analyze, while the unit of observation is what you observe or measure during the study.

For example, in educational research conducted with resonio, classrooms might be the unit of analysis, while the individual students whose test scores you collect are the units of observation.

This distinction is essential as it clarifies the specific aspect under scrutiny and what will yield measurable data. It also emphasizes that researchers must carefully consider both elements to ensure their alignment with research questions and objectives .

Types of Units of Analysis: Individual, Aggregates, and Social

Choosing the right unit of analysis for a research project is critical. The types of units of analysis include individual, aggregates, and social.

  • Individual: This type focuses on analyzing the attributes and characteristics of individual units, such as people or specific objects.
  • Aggregates: Aggregates involve analyzing groups or collections of individual units, such as neighborhoods, organizations, or communities.
  • Social: Social units of analysis emphasize analyzing broader social entities, such as cultures, societies, or institutions.

When selecting the right unit of analysis for a research project, researchers must consider various factors such as their research questions and hypotheses , data availability and quality, feasibility and practicality, as well as the theoretical framework and research design .

Each of these factors plays a crucial role in determining the most appropriate unit of analysis for the study.

Research Questions and Hypotheses

The research questions and hypotheses play a crucial role in determining the appropriate unit of analysis for a research project. They guide the researcher in identifying what exactly needs to be studied and analyzed, thereby influencing the selection of the most relevant unit of analysis.

The alignment between the research questions/hypotheses and the unit of analysis is essential to ensure that the study’s focus meets its intended objectives. Furthermore, clear research questions and hypotheses help define specific parameters for data collection and analysis, directly impacting which unit of analysis will best serve the study’s purpose.

It’s important to carefully consider how each research question or hypothesis relates to different potential units of analysis , as this connection will shape not only what you are studying but also how you will study it .

Data Availability and Quality

When considering the unit of analysis for a research project, researchers must take into account the availability and quality of data. The chosen unit of analysis should align with the available data sources to ensure that meaningful and accurate conclusions can be drawn.

Researchers need to evaluate whether the necessary data at the chosen level of analysis is accessible and reliable. Ensuring high-quality data will contribute to the validity and reliability of the study , enabling researchers to make sound interpretations and draw robust conclusions from their findings.

Choosing a unit of analysis without considering data availability and quality may lead to limitations in conducting thorough analysis or drawing valid conclusions. It is crucial for researchers to assess both factors before finalizing their selection, as it directly impacts the feasibility, accuracy, and rigor of their research project.

Feasibility and Practicality

When considering the feasibility and practicality of a unit of analysis for a research project, it is essential to assess the availability and quality of data related to the chosen unit.

Researchers should also evaluate whether the selected unit aligns with their theoretical framework and research design. The practical aspects such as time, resources, and potential challenges associated with analyzing the chosen unit must be thoroughly considered before finalizing the decision.

Moreover, it is crucial to ensure that the selected unit of analysis is feasible within the scope of the research questions and hypotheses. Additionally, researchers need to determine if the chosen unit can be effectively studied based on existing literature and sampling techniques utilized in similar studies.

By carefully evaluating these factors, researchers can make informed decisions regarding which unit of analysis will best suit their research goals.

Theoretical Framework and Research Design

The theoretical framework and research design establish the structure for a study based on existing theories and concepts. It guides the selection of the unit of analysis by providing a foundation for understanding how variables interact and influence one another.

Theoretical frameworks help to shape research questions , hypotheses, and data collection methods, ensuring that the chosen unit of analysis aligns with the study’s objectives. Research design serves as a blueprint outlining the procedures and techniques used to gather and analyze data, allowing researchers to make informed decisions regarding their unit of analysis while considering feasibility, practicality, and data availability .

Researchers often make the mistake of reductionism, where they oversimplify complex phenomena by focusing on one aspect. Another common mistake is the ecological fallacy, where conclusions about individual behavior are made based on group-level data.

Reductionism

Reductionism occurs when a researcher oversimplifies a complex phenomenon by analyzing it at too basic a level. This can lead to the loss of important nuances and details critical for understanding the broader context.

For instance, studying individual test scores without considering external factors like teaching quality or student motivation is reductionist. By focusing solely on one aspect, researchers miss out on comprehensive insights that may impact their findings.

In research projects, reductionism limits the depth of analysis and may result in skewed conclusions that don’t accurately reflect the real-world complexities. It’s essential for researchers to avoid reductionism by carefully selecting an appropriate unit of analysis that allows for a holistic understanding of the phenomenon under study.

Ecological Fallacy

The ecological fallacy involves making conclusions about individuals based on group-level data . This occurs when researchers mistakenly assume that relationships observed at the aggregate level also apply to individuals within that group.

For example, if a study finds a correlation between high levels of education and income at the city level, it doesn’t mean the same relationship applies to every individual within that city.

This fallacy can lead to erroneous generalizations and inaccurate assumptions about individuals based on broader trends. It is crucial for researchers to be mindful of this potential pitfall when selecting their unit of analysis, ensuring that their findings accurately represent the specific characteristics and behaviors of the individuals or entities under investigation.

Selecting the appropriate unit of analysis is critical for a research project’s success, shaping its focus and scope. Researchers must carefully align the chosen unit with their study objectives to ensure relevance.

The impact of this choice on findings and conclusions cannot be overstated. Choosing the unit of analysis correctly can considerably influence the direction and outcomes of a research undertaking.

Robert Koch



Columbia University Libraries

Data & Statistics for Journalists: Unit of Analysis


Unit of Analysis


The  unit of analysis  is the entity that you're analyzing. It's called this because it's your analysis (what you want to examine) that determines what this unit is, rather than the data itself.

For instance, let's say that you have a dataset with 40 students, divided between two classrooms of 20 students each, and a test score for each student. You can analyze this data in several ways:

  • Individual unit of analysis: Compare the test scores of each student to the other students. (You're analyzing students, individuals.)
  • Group unit of analysis:  Compare the average test score of the two classrooms. (You're analyzing the classrooms, comparing two groups of individuals.)
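The two analyses above correspond to two different computations over the same records. A minimal sketch, using a tiny invented version of that dataset (classroom names and scores are hypothetical):

```python
from statistics import mean

# Hypothetical test scores: two classrooms of 20 students each.
scores = {
    "classroom_1": [78, 85, 90, 72, 88] * 4,
    "classroom_2": [65, 88, 79, 94, 70] * 4,
}

# Individual unit of analysis: each student is compared with every other
# student, regardless of classroom.
all_scores = [s for room_scores in scores.values() for s in room_scores]
best_individual = max(all_scores)

# Group unit of analysis: the classroom is the entity being compared,
# so each classroom is collapsed to a single summary number.
class_averages = {room: mean(vals) for room, vals in scores.items()}
```

Note that the group analysis throws away within-classroom variation: the top individual score may well sit in the classroom with the lower average.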

Knowing your unit of analysis is helpful, because it helps you determine what kind of data you need. The other piece of this puzzle is whether you need  macrodata  (aggregated data) or  microdata.

Microdata & Macrodata

So what is the difference between  macrodata  (aggregated data) and  microdata ?

  • MICRODATA Contains a record for every individual (e.g., person, company, etc.) in the survey/study. Source for US Census microdata:  IPUMS
  • MACRODATA  (Aggregated Data) Higher-level data compiled from smaller (individual) units of data. For example, Census data in Social Explorer  has been aggregated to preserve the confidentiality of individual respondents. Source for US Census macrodata: Social Explorer  
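The relationship between the two can be sketched as a simple aggregation step: macrodata is what you get when microdata records are summarized per group. The records and field names below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical microdata: one record per individual respondent.
microdata = [
    {"person_id": 1, "city": "Springfield", "income": 42_000},
    {"person_id": 2, "city": "Springfield", "income": 58_000},
    {"person_id": 3, "city": "Shelbyville", "income": 51_000},
    {"person_id": 4, "city": "Shelbyville", "income": 47_000},
]

# Aggregate to macrodata: one summary row per city. Individual identities
# disappear, which is how aggregation preserves respondent confidentiality.
groups = defaultdict(list)
for record in microdata:
    groups[record["city"]].append(record["income"])

macrodata = {
    city: {"respondents": len(incomes), "mean_income": sum(incomes) / len(incomes)}
    for city, incomes in groups.items()
}
```

Going the other direction is impossible: the macrodata alone cannot recover any individual's income, which is also why individual-level claims cannot safely be drawn from it.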
  • Last Updated: Nov 20, 2023 6:16 PM
  • URL: https://guides.library.columbia.edu/journalism-data

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Social Sci LibreTexts

7.3: Unit of analysis and unit of observation

  • Last updated
  • Save as PDF
  • Page ID 25640

  • Matthew DeCarlo
  • Radford University via Open Social Work Education

Learning Objectives

  • Define units of analysis and units of observation, and describe the two common errors people make when they confuse the two

Another point to consider when designing a research project, and which might differ slightly in qualitative and quantitative studies, has to do with units of analysis and units of observation. These two items concern what you, the researcher, actually observe in the course of your data collection and what you hope to be able to say about those observations. A unit of analysis is the entity that you wish to be able to say something about at the end of your study, probably what you’d consider to be the main focus of your study. A unit of observation is the item (or items) that you actually observe, measure, or collect in the course of trying to learn something about your unit of analysis.

In a given study, the unit of observation might be the same as the unit of analysis, but that is not always the case. For example, a study on electronic gadget addiction may interview undergraduate students (our unit of observation) for the purpose of saying something about undergraduate students (our unit of analysis) and their gadget addiction. Perhaps, if we were investigating gadget addiction in elementary school children (our unit of analysis), we might collect observations from teachers and parents (our units of observation) because younger children may not report their behavior accurately. In this case and many others, units of analysis are not the same as units of observation. What is required, however, is for researchers to be clear about how they define their units of analysis and observation, both to themselves and to their audiences.

51.jpg

More specifically, your unit of analysis will be determined by your research question. Your unit of observation, on the other hand, is determined largely by the method of data collection that you use to answer that research question. We’ll take a closer look at methods of data collection later on in the textbook. For now, let’s consider again a study addressing students’ addictions to electronic gadgets. We’ll consider first how different kinds of research questions about this topic will yield different units of analysis. Then, we’ll think about how those questions might be answered and with what kinds of data. This leads us to a variety of units of observation.

If we were to explore which students are most likely to be addicted to their electronic gadgets, our unit of analysis would be individual students. We might mail a survey to students on campus, and our aim would be to classify individuals according to their membership in certain social groups in order to see how membership in those classes correlated with gadget addiction. For example, we might find that majors in new media, men, and students with high socioeconomic status are all more likely than other students to become addicted to their electronic gadgets. Another possibility would be to explore how students’ gadget addictions differ and how are they similar. In this case, we could conduct observations of addicted students and record when, where, why, and how they use their gadgets. In both cases, one using a survey and the other using observations, data are collected from individual students. Thus, the unit of observation in both examples is the individual.

Another common unit of analysis in social science inquiry is groups. Groups of course vary in size, and almost no group is too small or too large to be of interest to social scientists. Families, friendship groups, and group therapy participants are some common examples of micro-level groups examined by social scientists. Employees in an organization, professionals in a particular domain (e.g., chefs, lawyers, social workers), and members of clubs (e.g., Girl Scouts, Rotary, Red Hat Society) are all meso-level groups that social scientists might study. Finally, at the macro-level, social scientists sometimes examine citizens of entire nations or residents of different continents or other regions.

A study of student addictions to their electronic gadgets at the group level might consider whether certain types of social clubs have more or fewer gadget-addicted members than other sorts of clubs. Perhaps we would find that clubs that emphasize physical fitness, such as the rugby club and the scuba club, have fewer gadget-addicted members than clubs that emphasize cerebral activity, such as the chess club and the women’s studies club. Our unit of analysis in this example is groups because groups are what we hope to say something about. If we had instead asked whether individuals who join cerebral clubs are more likely to be gadget-addicted than those who join social clubs, then our unit of analysis would have been individuals. In either case, however, our unit of observation would be individuals.

Organizations are yet another potential unit of analysis that social scientists might wish to say something about. Organizations include entities like corporations, colleges and universities, and even nightclubs. At the organization level, a study of students’ electronic gadget addictions might explore how different colleges address the problem of electronic gadget addiction. In this case, our interest lies not in the experience of individual students but instead in the campus-to-campus differences in confronting gadget addictions. A researcher conducting a study of this type might examine schools’ written policies and procedures, so her unit of observation would be documents. However, because she ultimately wishes to describe differences across campuses, the college would be her unit of analysis.

In sum, there are many potential units of analysis that a social worker might examine, but some of the most common units include the following:

  • Individuals
  • Groups
  • Organizations

One common error people make when it comes to both causality and units of analysis is something called the ecological fallacy. This occurs when claims about one lower-level unit of analysis are made based on data from some higher-level unit of analysis. In many cases, this occurs when claims are made about individuals, but only group-level data have been gathered. For example, we might want to understand whether electronic gadget addictions are more common on certain campuses than on others. Perhaps different campuses around the country have provided us with their campus percentage of gadget-addicted students, and we learn from these data that electronic gadget addictions are more common on campuses that have business programs than on campuses without them. We then conclude that business students are more likely than non-business students to become addicted to their electronic gadgets. However, this would be an inappropriate conclusion to draw. Because we only have addiction rates by campus, we can only draw conclusions about campuses, not about the individual students on those campuses. Perhaps the social work majors on the business campuses are the ones that caused the addiction rates on those campuses to be so high. The point is, we simply don’t know, because we only have campus-level data. By drawing conclusions about students when our data are about campuses, we run the risk of committing the ecological fallacy.
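The campus example can be made concrete with a small numerical sketch. The counts below are invented purely for illustration: the campus with a business program shows the higher overall addiction rate, yet its business majors are the least addicted group.

```python
# Invented counts for two hypothetical campuses. Each entry records
# (number of students, number gadget-addicted) for two sets of majors.
campuses = {
    "A": {"business": (100, 10), "other": (400, 190)},  # has a business program
    "B": {"business": (0, 0), "other": (500, 100)},     # no business program
}

def campus_rate(campus):
    """Addiction rate computed from campus-level totals only."""
    n = campus["business"][0] + campus["other"][0]
    addicted = campus["business"][1] + campus["other"][1]
    return addicted / n

# Campus-level data: the business-program campus has the higher rate.
print(campus_rate(campuses["A"]))  # 0.4
print(campus_rate(campuses["B"]))  # 0.2

# Hidden individual-level breakdown on campus A: business majors are
# addicted at 10%, other majors at 47.5%. Concluding from the campus
# rates that business students drive the difference would be the
# ecological fallacy.
print(100 * campuses["A"]["business"][1] / campuses["A"]["business"][0])  # 10.0
```

The campus-level comparison and the individual-level comparison point in opposite directions, which is exactly why group-level data license only group-level claims.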

On the other hand, another mistake to be aware of is reductionism. Reductionism occurs when claims about some higher-level unit of analysis are made based on data from some lower-level unit of analysis. In this case, claims about groups or macro-level phenomena are made based on individual-level data. An example of reductionism can be seen in some descriptions of the civil rights movement. On occasion, people have proclaimed that Rosa Parks started the civil rights movement in the United States by refusing to give up her seat to a white person while on a city bus in Montgomery, Alabama, in December 1955. Although it is true that Parks played an invaluable role in the movement, and that her act of civil disobedience gave others courage to stand up against racist policies, beliefs, and actions, to credit Parks with starting the movement is reductionist. Surely the confluence of many factors, from fights over legalized racial segregation to the Supreme Court’s historic decision to desegregate schools in 1954 to the creation of groups such as the Student Nonviolent Coordinating Committee (to name just a few), contributed to the rise and success of the American civil rights movement. In other words, the movement is attributable to many factors—some social, others political and others economic. Did Parks play a role? Of course she did—and a very important one at that. But did she cause the movement? To say yes would be reductionist.

It would be a mistake to conclude from the preceding discussion that researchers should avoid making any claims whatsoever about data or about relationships between levels of analysis. While it is important to be attentive to the possibility for error in causal reasoning about different levels of analysis, this warning should not prevent you from drawing well-reasoned analytic conclusions from your data. The point is to be cautious and conscientious in making conclusions between levels of analysis. Errors in analysis come from a lack of rigor and deviation from the scientific method.

Key Takeaways

  • A unit of analysis is the item you wish to be able to say something about at the end of your study, while a unit of observation is the item that you actually observe.
  • When researchers confuse their units of analysis and observation, they may be prone to committing either the ecological fallacy or reductionism.
  • Ecological fallacy: claims about a lower-level unit of analysis made based on data from a higher-level unit of analysis.
  • Reductionism: claims about a higher-level unit of analysis made based on data from a lower-level unit of analysis.
  • Unit of analysis: the entity that a researcher wants to say something about at the end of her study.
  • Unit of observation: the item that a researcher actually observes, measures, or collects in the course of trying to learn something about her unit of analysis.



Avidnote

What is a unit of analysis?


The unit of analysis is an important concept whether you are conducting quantitative or qualitative research. It is closely related to another concept: the unit of observation. Though the two are often used interchangeably (and can actually mean the same thing in some studies), they are not exactly the same conceptually.

This paper takes a closer look at what a unit of analysis is.

Unit of analysis explained

A unit of analysis is the main subject or entity the researcher intends to comment on in the study. It is determined mainly by the research question. Simply put, the unit of analysis is the ‘who’ or ‘what’ that the researcher is interested in analyzing: for instance, an individual, a group, an organization, a country, or a social phenomenon.

Unit of observation explained

A unit of observation is any item from which data can be collected and measured. The unit of observation determines the data collection and measurement techniques to be used. Just like a unit of analysis, an individual, group, country, social phenomenon, etc. can also be a unit of observation.

The examples below show how different research questions lead to different units of analysis, and how different units of observation arise from the types of data used to answer those questions.

Consider the question “Which nation has the best chance of winning the forthcoming senior world cup?” Here, the unit of analysis is a country. Answering this question may require sampling the opinions of soccer aficionados or experts. Hence, a survey can be conducted to aggregate the views of experts (e.g., coaches, players, analysts, reporters, administrators) from all over the world.

The objectives of the survey can include finding out whether variables like continent of origin, venue of the tournament, climatic conditions, quality of players, level of preparation, and administrative efficiency play any role in determining the champion. The survey’s findings may indicate that the quality of players, the level of preparation, and the efficiency of a country’s soccer administrators are the most important determinants of winning the trophy.

Suppose an alternative question is asked, say “What are the differences and similarities in the ways countries prepare for the senior world cup?” One way to answer this question (assuming it is a world cup season) is to closely observe the preparation programmes of participating countries, including camping and physical training activities.

It can be deduced from the above examples that the unit of analysis is different in each case. In the first question, the country is the unit of analysis, while in the second, a social phenomenon (the preparation programme) is the unit of analysis. In both examples, the unit of observation is the same: countries.

As noted in the definitions above, groups can also constitute a unit of analysis. In the question about which country is likely to win the senior world cup, for example, responses could instead be elicited through a group survey of several soccer clubs. In this case, the unit of analysis is a group [say, a professional football club].

For organizations, take the senior world cup example above. Suppose a researcher poses the question “Are the levels of funding provided by soccer associations enough for them to challenge for the world cup?” Note that the main concern here is the soccer administrators, not the teams of players. To determine whether national teams’ funding is adequate, the researcher might need to source and study various documents. This means that documents are the unit of observation in this scenario. If he decides to make country-by-country comparisons of national team funding, then his unit of analysis will be the countries investigated.

Rules, policies, and principles are yet another form of unit of analysis. Policy research, for example, will most likely involve analyzing several documents. Consider a soccer association that employs a lawyer to help draft a code of conduct for players [unit of analysis] preparing for the world cup in a closed camp. To come up with an acceptable code of conduct, the lawyer may decide to study all of the association’s past code of conduct documents [unit of observation], including how the rules in those codes were observed or violated and what penalties applied to the various violations of camp rules.

Unit of analysis and unit of observation as one

It has been suggested above that both concepts can be one and the same in some situations. For instance, a tutor can be hired to improve the spoken English proficiency of a student struggling in that area. After a couple of months, the tutor decides to evaluate the proficiency level of the student based on what has been taught thus far. In this example, the student is both the unit of analysis and the unit of observation.

As noted from the discussion above, both the unit of analysis and the unit of observation are research concepts. These units can be individuals, groups, countries, organizations, social phenomena, etc. Though the two concepts can be the same in some studies, differences exist between them in others. Because of this potential for confusion, the researcher should be as clear as possible when explaining the similarities or differences between the two concepts.



eLife

Unit of analysis issues in laboratory-based research

Nick R Parsons

1 Warwick Medical School, University of Warwick, Coventry, United Kingdom

M Dawn Teare

2 Sheffield School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom

Alice J Sitch

3 Public Health Building, University of Birmingham, Birmingham, United Kingdom

Many studies in the biomedical research literature report analyses that fail to recognise important data dependencies from multilevel or complex experimental designs. Statistical inferences resulting from such analyses are unlikely to be valid and are often potentially highly misleading. Failure to recognise this as a problem is often referred to in the statistical literature as a unit of analysis (UoA) issue. Here, by analysing two example datasets in a simulation study, we demonstrate the impact of UoA issues on study efficiency and estimation bias, and highlight where errors in analysis can occur. We also provide code (written in R) as a resource to help researchers undertake their own statistical analyses.

Introduction

Defining the experimental unit is a key step in the design of any experiment. The experimental unit is the smallest object or material that can be randomly and independently assigned to a particular treatment or intervention in an experiment (Mead et al., 2012). The experimental unit (e.g. a tissue sample, individual animal or study participant) is the object a scientist wants to make inferences about in the wider population, based on a sample in the experiment. In the simplest possible experimental setting where each experimental unit provides a single outcome or observation, and only in this setting, the experimental unit is the same as both the unit of observation (i.e. the unit described by the observed outcomes) and the unit of analysis (UoA) (i.e. that which is analysed). This will not always be the case, so care must be taken, both when planning and reporting research, to clearly define the experimental unit, what data are being analysed and how these relate to the aims of the study.

In laboratory-based research in the biomedical sciences it is almost always the case that multiple observations or measurements are made for each experimental unit. These multiple observations, which could be simple replicate measurements from a single sample or observations from multiple sub-samples taken from a single sample, allow the variability of the measure and the stability of the experimental setting to be assessed. They can also improve the overall statistical power of a research study. However, multiple or repeat observations taken from the same experimental unit tend to be more similar than observations taken from different experimental units, irrespective of the treatments applied or when no treatments are applied. Therefore data within experimental units are likely to be dependent (correlated), whereas data from different experimental units are generally assumed to be independent, all other things being equal (i.e. after removing the direct and indirect effects of the experimental interventions and setting).
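This within-unit dependence is easy to demonstrate by simulation. The sketch below (written in Python for brevity; the code accompanying the paper is in R) generates replicate observations that share a latent unit effect and then estimates the intraclass correlation with the usual one-way ANOVA estimator; with a true ICC of 0.8, the estimate lands close to that value.

```python
import random
import statistics

random.seed(1)

# Simulate 50 experimental units with 4 replicate observations each.
# Replicates share a latent unit effect, so observations from the same
# unit are more alike than observations from different units.
sigma_unit, sigma_error = 2.0, 1.0   # true ICC = 4 / (4 + 1) = 0.8
n_units, m = 50, 4
units = []
for _ in range(n_units):
    u = random.gauss(0, sigma_unit)                              # shared unit effect
    units.append([u + random.gauss(0, sigma_error) for _ in range(m)])

# One-way ANOVA estimator of the intraclass correlation (ICC).
unit_means = [statistics.mean(obs) for obs in units]
grand_mean = statistics.mean(unit_means)
ms_between = m * sum((um - grand_mean) ** 2 for um in unit_means) / (n_units - 1)
ms_within = (sum(sum((x - statistics.mean(obs)) ** 2 for x in obs) for obs in units)
             / (n_units * (m - 1)))
icc = (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)
print(round(icc, 2))  # close to the true value of 0.8
```

An analysis that treated all 200 values as independent would behave as if it had far more information than it really does.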

The majority of widely reported statistical methods (e.g. t-tests, analyses of variance, generalized linear models, chi-squared tests) assume independence between all observations in an analysis, possibly after conditioning on other observed data variables. If the UoA is the same as the experimental unit (i.e. a single observation or summary measure is available for each unit) then the independence assumption is likely to be met. However, many studies reported in the biomedical research literature using multilevel designs, often also referred to as mixed-effects, nested or hierarchical designs (Gelman and Hill, 2007), or more complex structured designs, fail to recognise that the independence assumption is unlikely to hold, and thus the reported analyses are also unlikely to be valid. Statistical inferences made from such analyses are often highly misleading.

UoA issues, as they are termed in the statistical literature (Altman and Bland, 1997), are not limited to biomedical laboratory studies, and are recognised as a major cause of concern more generally for reported analyses in bioscience and medicine (Aarts et al., 2014; Altman and Bland, 1997; Bunce et al., 2014; Fleming et al., 2013; Lazic, 2010; Calhoun et al., 2008; Divine et al., 1992), and also feed into widely acknowledged concerns about the lack of reproducibility and repeatability of much biomedical research (Academy of Medical Sciences, 2017; Bustin and Nolan, 2016; Ioannidis et al., 2014; McNutt, 2014).

The RIPOSTE (Reducing IrreProducibility in labOratory STudiEs) framework was established to support dialogue between scientists and statisticians and thereby improve the design, conduct and analysis of laboratory studies in the biomedical sciences, reducing irreproducibility (Masca et al., 2015). The aim of this manuscript, which evolved directly from a number of recommendations made by the RIPOSTE framework, is to help laboratory scientists identify potential UoA issues, to understand the problems an incorrect analysis may cause and to provide practical guidance on how to undertake a valid analysis using the open source R statistical software (R Core Team, 2016; Ihaka and Gentleman, 1996). A simple introduction to the basics of R is available from Venables et al., 2017, and sources of information on implementation of statistical methods in the biosciences are widely available (see, for example, Aho, 2014).

A simulation study is undertaken in order to quantify the losses in efficiency and inflation of the false positive rate that an incorrect analysis may cause (Appendix 1). The principles of experimental design are briefly discussed, with some general guidance on implementation and good practice (Appendix 2), and two example datasets are introduced as a means to highlight a number of key issues that are widely misunderstood within the biomedical science literature. Code in the R programming language is provided both as a template for those wishing to undertake similar analyses and in order that all results here can be replicated (Appendix 3); the script is available at Parsons, 2017. In addition, a formal mathematical presentation of the most common analysis error in this setting is also provided (Appendix 4).

Methods and materials

A fundamental aspect of the design of all experimental studies is a clear identification of the experimental unit. By definition, this is the smallest object or material that can be randomly and independently assigned to a particular treatment or intervention in the experiment (Mead et al., 2012). The experimental unit is usually the unit of statistical analysis and should provide information on the study outcomes independent of the other experimental units. Here the term outcome refers to a quantity or characteristic measured or observed for an individual unit in an experiment; most experiments will have many outcomes (e.g. expression of multiple genes, or multiple assays) for each unit. The term multiple outcomes refers to such situations, but is not the same as repeated outcomes (or more often repeated measures), which refers to measuring the same outcome at multiple time-points. Experimental designs are generally improved by increasing the number of (independent) experimental units, rather than increasing the number of observations within each unit beyond what is required to measure within-unit variation with reasonable precision. If only a single observation of a laboratory test is obtained for each subject, data can be analysed using conventional statistical methods, provided all the usual cautions and necessary assumptions are met. However, if there are multiple observations of a laboratory test for each subject (e.g. due to multiple testing, duplicated analyses of samples or other laboratory processes), then the analysis must properly take account of this.

If all observations are treated equally in an analysis, ignoring the dependency in the data that arises from multiple observations from each sample, this leads to inflation of the false positive (type I error) rate and incorrect (often highly inflated) estimates of statistical power, resulting in invalid statistical inference (see Appendix 1). Errors due to incorrect identification of the experimental unit were identified as an issue of concern in clinical medicine more than 20 years ago, and continue to be so (Altman and Bland, 1997). The majority of such UoA issues involve multiple counting of measurements from individual subjects (experimental units); these issues have particular traction in, for instance, orthopaedics, ophthalmics and dentistry, where they typically result from measurements on the right and left hips, knees or eyes of a study participant, or a series of measurements on many teeth from the same person.

The drive to improve standards of reporting, and thereby the design and analysis, of randomized clinical trials, which resulted in the widely known CONSORT guidelines (CONSORT GROUP (Consolidated Standards of Reporting Trials) et al., 2001), has now expanded to cover many related areas of biomedical research activity. For instance, work by Kilkenny et al. (2009) highlighted poor standards of reporting of experiments using animals, and made specific mention of the poor reporting of the number of experimental units; this work led directly to the ARRIVE guidelines (Animal Research: Reporting of In Vivo Experiments; Kilkenny et al., 2010), which explicitly require authors to report the study experimental unit when describing the design. The recent Academy of Medical Sciences symposium on the reproducibility and reliability of biomedical research (Academy of Medical Sciences, 2017) specifically highlighted poor experimental design and inappropriate analysis as key problem areas, and highlighted the need for additional resources such as the NC3Rs (National Centre for the Replacement, Reduction and Refinement of Animals in Research) free online experimental design assistant (NC3Rs, 2017).

The experimental unit should always be identified and taken into account when designing a research study. If a study is assessing the effect of an intervention delivered to groups rather than individuals then the design must address the issue of clustering; this is common in many health studies where a number of subjects may receive an intervention in a group setting or in animal experiments where a group of animals in a controlled environment may be regarded as a cluster. This is also the case if a study is designed to take repeated measurements from individual subjects or units, from a source sample or replicate analyses of a sample itself. Individuals in a study may also be subject to inherent clustering (e.g. family membership) which needs to be identified and accounted for.

As a prelude to discussion of analysis issues, it is important to distinguish between a number of widely reported and distinct types of data resulting from a variety of experimental designs. The word subject is used here loosely to mean the subject under study in an experiment and need not necessarily be an individual person, participant or animal.

  • (i) Individual subjects: In many studies the UoA will naturally be an individual subject, and be synonymous with the experimental unit. A single measurement is available for each subject, and inferences from studies comprising groups of subjects apply to the wider population to which the individual subject belongs. For example, a blood sample is collected from n patients (experimental units) and a haemoglobin assay is undertaken for each sample. Statistical analysis compares haemoglobin levels between groups of patients, where the variability between samples is used to assess the significance of differences in means between groups of patients.
  • (ii) Groups of subjects: Measurements are available for subjects. However, rather than being an individual subject, the experimental unit could be a group of subjects that are exposed to a treatment or intervention. In this case, inferences from analyses of variation between experimental units apply to the groups, but not necessarily to individual subjects within the groups. For example, suppose n × m actively growing maize plants are planted together at high density in groups of size n in m controlled growing environments (growth rooms) of varying size and conditions (e.g. light and temperature). Chlorophyll fluorescence is used to measure stress for individual plants after two weeks of growth. Due to the expected strong competition between plants, inferences about the effects of the environmental interventions on growth are made at the room level only. Alternatively, in a different experiment the same plants are divided between growth rooms, kept spatially separated in notionally exactly equivalent conditions, after being previously given one of two different high strength foliar fertiliser treatments. Changes in plant height (from baseline) are used to assess the effect of the foliar interventions on individual plants. Although the intention was to keep growth rooms as similar as possible, inevitably room-effects meant that outcomes for individual plants tended to be more similar if they came from the same room than if they came from different rooms. In this setting the plant is the experimental unit, but account needs to be made for the room-effects in the analysis.
  • (iii) Multiple measurements from a single source sample: In laboratory studies, the experimental unit is often a sample from a subject or animal, which is perhaps treated and multiple measurements taken. Statistical inferences from analyses of data from such samples should apply to the individual tissue (source) from which the sample was taken, as this is the experimental unit. For example, consider the haemoglobin example in (i): if the assay is repeated m times for each of the n blood samples, then there would be n × m data values available for analysis. The analysis should take account of the fact that the replicate measurements made for each sample tell us nothing useful about the variability between samples, which are the experimental units.
  • (iv) Multiple sub-samples from a single sample: Often a single sample from an experimental unit is sub-divided, and the results of assays or tests of these sub-samples yield data that provide an assessment of the variability between sub-samples. It is important to note that this is not the same as taking multiple samples from an experimental unit. The variability between experimental units is not the same as, and must be distinguished from, variability within an experimental unit, and this must be reflected in the analysis of data from such studies. For example, n samples of cancerous tissue (experimental unit) are each divided into m sub-samples and lymph node assays made for each. The variability between the m sub-samples, for each of the n experimental units, is not necessarily the same as the variability that might have been evident if more than one tissue sample had been taken from each experimental unit. This could be due to real differences, as the multiple samples are from different sources, or batch-effects due to how the samples are processed or treated before testing.
  • (v) Repeated measures: One of the most important types of experimental design is the so-called repeated-measures design, in which measurements are taken on the same experimental unit at a number of time-points (e.g. on the same animal or tissue sample after treatment, on more than one occasion). These multiple measurements in time are generally assumed to be correlated, and are regarded as repeat measurements from an experimental unit, not separate experimental units. The likely autocorrelation between temporally related measurements from the experimental units should be reflected in the analysis of such studies. For example, height measurements for the n × m plants in (ii) could have been made at each of t occasions. The t height measurements are a useful means of assessing temporal changes for individual plants (the experimental unit), such as the rate of increase (e.g. per day). However, due to the likely strong correlations, increasing the number of assessment occasions will generally add much less information to the analysis than would be obtained by increasing the number of experimental units.

Clearly many of these distinct design types can be combined to create more complex settings; e.g. plants might be housed together in batches that cause responses from the plants in the same batch to be correlated (batch-effects), and samples taken from the plants, divided into sub-samples, and processed at two different testing centres, possibly resulting in additional centre-effects. For such complex designs it is advisable to seek expert statistical advice; however, the focus in the sections discussing analysis is mainly on cases (ii), (iii) and (iv). Case (i) is handled adequately by conventional statistical analysis, and although case (v) is important, it is too large a topic to discuss in great depth here (see, e.g., Diggle et al., 2013, for a wide-ranging discussion of longitudinal data analysis). More general design issues are discussed in Appendix 2.

Sample size

Power analysis provides a formal statistical assessment of sample size requirements for many common experimental designs; power here is the probability (usually expressed as a percentage) that the chosen test correctly rejects the study null hypothesis, and is usually set at either 80% or 90%. Many simple analytic expressions exist for calculating sample sizes for common types of design, particularly in clinical settings where methods are well developed and widely used (Chow et al., 2008). Power increases with the square root of the sample size n, so power is gained by increasing n, but at a diminishing rate. Power is also inversely related to the variance of the outcome σ², so choosing a better or more stable outcome, assay or test procedure will increase power.

For the simplest design with a normally distributed outcome, comparing two groups of n subjects (e.g. as in Design case (i)), the sample size per group is given by n = 2σ² × (z_α/2 + z_β)² / d², where d is the difference we wish to detect, z_β represents the upper 100×β standard normal centile, 1 − β is the power and α the significance level; for the standard significance level of 5% and power of 90%, (z_α/2 + z_β)² = (1.96 + 1.28)² ≈ 10.5.
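Evaluating this expression is straightforward; the sketch below (Python standard library only, rather than the R used by the paper) reproduces the constants quoted in the text.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, d, alpha=0.05, power=0.90):
    """n = 2 * sigma^2 * (z_{alpha/2} + z_beta)^2 / d^2, rounded up."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # 1.96 for a 5% significance level
    z_b = z.inv_cdf(power)           # 1.28 for 90% power
    return math.ceil(2 * sigma**2 * (z_a + z_b) ** 2 / d**2)

# (1.96 + 1.28)^2 ≈ 10.5, so detecting a difference of half a standard
# deviation (d = 0.5 * sigma) needs about 2 * 10.5 / 0.25 ≈ 85 per group.
print(sample_size_per_group(sigma=1.0, d=0.5))  # 85
```

Halving the detectable difference d quadruples the required sample size, which is the practical face of the square-root relationship noted above.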

Where there are clusters of subjects (e.g. as in Design case (ii)), the correlation between observations within clusters will have an impact on the sample size (Hemming et al., 2011). The conventional sample size expression needs to be inflated by a variance inflation factor (VIF), also called a design effect, given by VIF = 1 + (m − 1) × ICC, where there are m observations in each cluster (e.g. a batch) and ICC is the intraclass (within-cluster) correlation coefficient that quantifies the strength of association between subjects within a cluster. The ICC can either be estimated from pilot data or from previous studies in the same area (see examples), or otherwise a value must be assumed. For small cluster sizes (m < 5) and intraclass correlations (ICC < 0.01), the sample size typically needs to be inflated by less than 10% (see Table 1). However, for larger values of both m and ICC, sample sizes may need to be doubled, trebled or more to achieve the required power.
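The design effect is simple to tabulate. The sketch below takes an unadjusted requirement of 85 subjects per group as an illustrative starting point (an assumption, not a value from the paper) and inflates it for a few cluster sizes and ICCs.

```python
import math

def design_effect(m, icc):
    """Variance inflation factor: VIF = 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc

# Inflating an (illustrative) unadjusted requirement of 85 per group:
n = 85
for m, icc in [(5, 0.01), (10, 0.05), (20, 0.10)]:
    print(m, icc, math.ceil(n * design_effect(m, icc)))
# m = 5,  ICC = 0.01 -> 89   (small clusters, tiny ICC: under 10% inflation)
# m = 10, ICC = 0.05 -> 124  (moderate clustering: ~45% more subjects)
# m = 20, ICC = 0.10 -> 247  (large clusters: the sample size nearly triples)
```

Even a modest within-cluster correlation becomes expensive as the cluster size grows, which is why adding independent experimental units usually beats adding observations within units.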

For more complex settings, often the only realistic option for sample size estimation is simulation. Raw data values are created from an assumed distribution (e.g. a multivariate normal distribution with known means and covariances) using a random number generator, and the planned analysis is performed on these data. This process can be repeated many (usually thousands of) times and the design characteristics (e.g. power and type I error rate) calculated for various sample sizes. This has typically been a task that requires expert statistical input, but increasingly code is available in R to make this much easier (Green and MacLeod, 2016; Johnson et al., 2015). Many application-area-dependent rules of thumb exist for selecting a sample size, the most general being the resource equation approach of Mead et al. (2012), which suggests that approximately 15 degrees of freedom are required to estimate the error variance at each level of an analysis.

Incorrect analysis of data that have known or expected dependencies leads to inflation of the false positive rate (type I error rate) and invalid estimates of statistical power, leading to incorrect statistical inference; a simulation study (Appendix 1) shows how various design characteristics can affect the properties of a hypothetical study. Focussing on linear statistical modelling ( McCullagh and Nelder, 1998 ), which is by far the most widely used methodology for analysis when reporting research in the biomedical sciences, there are generally two distinct approaches to analysis when there are known UoA issues ( Altman and Bland, 1997 ).

Subject-based analysis

The simplest approach to analysis is to use a single observation for each subject. This could be achieved by selecting a single representative observation, or more usually by calculating a summary measure for each subject. The summary measure is often the mean value, but could be, for instance, the area under a response curve or the gradient (rate) measure from a linear model. Given that this results in a single observation for each subject, analysis can proceed using the summary measure data in the conventional way, using a generalized linear model (GLM; McCullagh and Nelder, 1998 ) assuming independence between all observations.

A GLM relates a (link function) transformed response variable to a linear combination of explanatory variables via a number of model parameters that are estimated from the observed data. The explanatory variables are so-called fixed-effects that represent the (systematic) observed data that are used to model the response variable. The lack of model fit is called the residual or error , and represents unstructured deviations from the model predictions that are beyond control. The subject-based approach is valid but has the disadvantage that not all of the available data are used in the definitive analysis, resulting in some loss of efficiency. Care must be taken when choosing a single measure for each subject: the selection must not introduce bias, any summary measure generated must be meaningful, and where appropriate the analysis should be weighted to account for the precision with which the summary measure is estimated.
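The summary-measure step itself is trivial to implement; a hypothetical sketch (Python; the subject identifiers and values below are invented for illustration):

```python
import statistics

def subject_means(data):
    """Collapse repeated measurements to a single summary value per subject,
    so a conventional independent-observations analysis can follow."""
    return {subject: statistics.mean(values) for subject, values in data.items()}

# hypothetical repeated measurements for three subjects
data = {"s1": [2.1, 2.3, 2.2], "s2": [1.8, 1.7], "s3": [2.9, 3.0, 2.8]}
print(subject_means(data))  # one mean per subject
```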

Mixed-effect analysis

A better approach than the subject-based analysis is a mixed-effect analysis ( Galwey, 2014 ; Pinheiro and Bates, 2000 ). A (generalized) linear mixed-effects model (GLME) is an extension of the conventional GLM in which structure is added to the error term, leaving the systematic fixed terms unchanged, by adding so-called random-effect terms that partition the error term into a set of structured (often nested) terms. In the simplest possible setting ( Bouwmeester et al., 2013 ), the error term is replaced by a subject-error term to model the variation between subjects and a within-subject error term to model the within-subject variation. This partition of the error into multiple strata allows, for instance, the correct variability (the subject-error term) to be used to compare groups of subjects. Random-effects are often thought of as terms that are not of direct inferential interest (in contrast to the fixed-effects) but need to be properly accounted for in the model; e.g. a random selection of subjects or centres in a clinical trial, shelves in an incubator that form a temperature gradient, or repeat assays from a tissue sample.

The algorithms used to estimate the model terms for a GLME, and details of how to model complex error structures, will not be discussed further; more details can be found in, for instance, Pinheiro and Bates, 2000 . Mixed-effects models can be fitted in most statistical software packages, but the focus here is on the R open source statistical software ( R Core Team, 2016 ). Detailed examples of implementation and code are provided in Appendix 3, and a script is available at Parsons, 2017 to reproduce all the analyses shown here using the R packages nlme ( Pinheiro et al., 2016 ) and lme4 ( Bates et al., 2015 ).

In order to better appreciate the importance of UoA issues, to understand how these issues arise and to show statistically how analyses should be implemented, two example datasets from real experiments are described and analysed in some detail. The aims of the experiments are clearly not of direct importance, but the logic, process and conduct of the analyses are intended to be sufficiently general in nature so as to elucidate many key problematic issues.

Example 1: Adjuvant radiotherapy and lymph node size in colorectal cancer

Six subjects diagnosed with colorectal cancer, after confirmatory magnetic resonance imaging, underwent neoadjuvant therapy comprising a short course of radiotherapy (RT) over one week prior to resection surgery. These subjects were compared with six additional cancer subjects, of similar age and disease severity, who did not receive the adjuvant therapy. The aim of the study was to assess whether the therapy reduced lymph node size in the resection specimen (i.e. the sample removed during surgery). The resection specimen for each subject was divided into two sub-samples after collection, and each was fixed in formalin for 48-72 hr. These sub-samples were processed and analysed on two occasions, by different members of the laboratory team. The samples were sliced at 5 mm intervals and images captured and analysed in an automated process that identified lymph node material, which was measured by a specialist pathologist to give a measure of individual lymph node size (i.e. diameter), based on assumed sphericity. Three slices per sub-sample were collected for each subject. Table 2 shows the measured lymph node sizes in mm for each sample.

Naive analysis

The simplest analysis, and the one that may appear to be correct if no information on the design or data structure shown in Table 2 were known, is a t-test that compares the mean lymph node size between the RT groups. This shows that there is reasonable evidence to support a statistically significant difference in mean lymph node size between those subjects who received RT (Short RT) and those who did not (None); mean in group None = 2.403 mm and in group RT Short = 2.120 mm, difference in means = 0.283 mm (95% CI; 0.057 to 0.508), with a t-statistic = 2.501 on 70 degrees of freedom, and a p-value = 0.015. The conclusion from this analysis is that lymph node sizes were statistically significantly smaller in the group that had received adjuvant RT. Why should the veracity of this result be questioned?

The assumptions made when undertaking any statistical analysis must be considered carefully. The t-statistic is calculated as the absolute value of the difference between the group means, divided by the pooled standard error of the difference (sed) between the group means. This latter quantity is given by sed = s × √(1/n1 + 1/n2), where n1 and n2 are the sample sizes in the two groups and s² is the pooled variance given by s² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2), where s1² and s2² are the variances within each group. The important thing to realize here is that the variances within each of the RT groups are calculated by simply taking the totality of data for all six subjects in each group, across all sample types and slices. One of the key assumptions of the t-test is that of independence . Specifically, this requires the lymph node sizes to be all independent of each other; i.e. the observed size for one particular node is not systematically related to the other lymph node size data used for the statistical test. What is meant by related to in this context?

It seems highly likely that the lymph node sizes for repeat slices for any particular sample for a subject are more similar than size measurements from other subjects. Similarly, it might be expected that lymph node sizes for the two samples for each subject are more similar than lymph nodes size measurements from other subjects. If the possibility that this is important is ignored, and a t-test is undertaken, then the variability measured between samples and between slices within samples is being used to assess differences between subjects. If the assumption of independence is not valid, then by ignoring this, claims for statistical significance may be being made that are not supported by the data (See Appendix 4 for a mathematical description of the naive analysis ).

Given that the lymph node size measurements within samples and subjects are likely to be more similar to each other than to data from other subjects, how should the analysis be conducted? Visual inspection of the data can often reveal patterns that are not apparent from tabular summaries; Figure 1 shows a strip plot of the data from Table 2 .

Figure 1. Strip plot of the lymph node size data from Table 2 .

It is clear, from a visual inspection alone of Figure 1 , that data from repeat slices within samples are more similar (clustered together) than data from the repeat samples within each subject, and also that data from the multiple samples and slices for each subject are generally clustered together; data from a single subject are usually very different from those of other subjects, irrespective of the RT grouping. One, albeit crude, solution to such issues is to calculate a summary measure for each of the experimental units at the level at which the analysis is made, and use these measures for further analysis. The motivation for doing this is that it is usually reasonable to assume that experimental units (subjects) are independent of one another, so if a t-test is undertaken on summary measures from each of the twelve subjects it is also reasonable to assume that the necessary assumption of independence is true.

Using the mean lymph node size for each subject as the summary measure (subjects 1 to 12; 1.85, 2.78, 1.79, 2.24, 3.15, 2.60, 2.42, 1.57, 1.82, 2.26, 2.02, and 2.62 mm), a t-test shows that there is no evidence to support a statistically significant difference in mean lymph node size between those subjects who received RT (Short RT) and those who did not (None); mean in group None = 2.403 mm and in group RT Short = 2.120 mm, difference in means = 0.283 mm (95% CI; -0.321 to 0.886), with a t-statistic = 1.043 on 10 degrees of freedom, and a p-value = 0.322. Note that the group means are the same but now the t-statistic is based on 10 degrees of freedom, rather than the 70 of the naive analysis, and the confidence interval is considerably wider than that estimated for the naive analysis. The conclusion from this analysis is that there is no evidence to support a difference in lymph node size between groups. Why is the result of this t-test so different from the previous naive analysis?
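This t-test can be reproduced from the listed subject means; the sketch below (Python; the accompanying analysis scripts are in R) assumes, from the ordering, that subjects 1-6 form the None group and subjects 7-12 the Short RT group, and the final digit of the t-statistic differs slightly from the reported 1.043 because the subject means above are rounded.

```python
import statistics

none = [1.85, 2.78, 1.79, 2.24, 3.15, 2.60]    # subjects 1-6 (None; assumed)
short = [2.42, 1.57, 1.82, 2.26, 2.02, 2.62]   # subjects 7-12 (Short RT; assumed)

n1, n2 = len(none), len(short)
diff = statistics.mean(none) - statistics.mean(short)
# pooled variance and standard error of the difference
s2 = ((n1 - 1) * statistics.variance(none) +
      (n2 - 1) * statistics.variance(short)) / (n1 + n2 - 2)
sed = (s2 * (1 / n1 + 1 / n2)) ** 0.5
t = diff / sed
print(round(diff, 3), round(t, 2))  # 0.283 1.04, on n1 + n2 - 2 = 10 df
```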

In the naive analysis, the variability between measurements within the main experimental units (subjects) and the variability between experimental units were together used to assess the difference between experimental units. In the analysis in this section, the variability between experimental units alone has been used to assess the effect of the intervention applied to the experimental units. The multiple measurements within each experimental unit improve the precision of the estimate of the unit mean, but provide no information on the variability between units, which is what matters in assessing interventions applied to the experimental units. This analysis is clearly an improvement on the naive analysis, but it uses only summary measures for each experimental unit rather than the full data; it tells us nothing about the relative importance of the variability between subjects, between samples and between slices, and it does not allow us to assess the importance of these design factors to the conclusions of the analysis.

Linear mixed-effects analysis

To correctly explain and model the lymph node data a linear mixed-effects model must be used. The experimental design used in the lymph node study provides the information needed to construct the random-effects for the mixed-effects model. Here there are multiple levels within the design that are naturally nested within each other; samples are nested within subjects, and slices are nested within samples. Fitting such a mixed-effects model gives the following estimate for the intervention effect (RT treatment groups); difference in means = 0.283 mm (95% CI; -0.321 to 0.886), with a p-value = 0.322 (t-statistic = 1.043 on 10 degrees of freedom). For a balanced design, intervention effect estimates for the mixed-effects model are equivalent to those from the subject-based analysis. A balanced design is one where there are equal numbers of observations for all possible combinations of design factor levels; in this example there are the same number of slices within samples and samples within subjects.

The mixed-effects model allows the variability within the data to be examined explicitly. Output from model fitting also provides estimates of the standard deviations of the random effects for each level of the design; these are, for subjects, σ_P = 0.436 (95% CI; 0.262 to 0.727), samples σ_S = 0.236 (95% CI; 0.151 to 0.362) and residuals (slices) σ_ε = 0.122 (95% CI; 0.100 to 0.149). Squaring to get variances indicates that the variability in lymph node size between subjects was three and a half times the variability between samples, and nearly thirteen times the variability between repeat slices within samples. The intraclass correlation coefficient measures the strength of association between units within the same group; for subjects ICC_P = 0.733, where ICC_P = σ_P² / (σ_P² + σ_S² + σ_ε²). This large value, which represents the correlation between two randomly selected observations on the same subject, shows why the independence assumption required for the naive analysis is wrong (i.e. independence implies that ICC = 0). This demonstrates clearly why pooling variability without careful thought about the sampling strategy and design of an experiment is unwise, and likely to lead to erroneous conclusions.
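The ICC arithmetic can be checked directly from the reported standard deviations (a Python sketch; the small discrepancy from the reported 0.733 reflects rounding of the standard deviations):

```python
def icc(sd_level, *sd_rest):
    """Intraclass correlation: variance at one level of the design
    divided by the total variance across all levels."""
    total = sd_level**2 + sum(s**2 for s in sd_rest)
    return sd_level**2 / total

# reported SDs: subjects 0.436, samples 0.236, slices (residual) 0.122
print(round(icc(0.436, 0.236, 0.122), 2))  # 0.73
```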

Various competing models for the random effects can be compared using likelihood ratio tests (LRT). For instance, in this example suppose that the two samples collected for the same subject had been arbitrarily labelled as sample 1 and sample 2 , and in practice there was no real difference in the methods used to process or capture images of nodes from the two samples. In such a setting, a more appropriate random-effects model may be to have a subject effect only and ignore the effects of samples within subjects. Constructing such a model and comparing it to the more complex model gives a LRT = 39.92 and p-value < 0.001, providing strong support in favour of the full multilevel model. Diagnostic analyses can be undertaken after fitting a mixed-effects model, in an analogous manner to linear models ( Fox et al., 2011 ).

Figure 2 shows boxplots of residuals for each subject and a quantile-quantile plot to assess Normality of the residuals. Inspection of the residual plots for the lymph node size data shows that assumptions of approximate Normality are reasonable; e.g. the quantile-quantile plot of the residuals from the model fit falls (approximately) along a straight line when plotted against theoretical residuals from a Normal distribution. If the residuals fail to be so well behaved and deviate in a number of well-understood ways, for instance if variances are non-equal or vary with the outcome (heterogeneity), then transforming the data prior to linear mixed-effects analysis can improve the situation ( Mangiafico, 2017 ). However, in general, if the Normality assumption is not sustainable, data are better analysed using generalized linear mixed-effects models ( Pinheiro and Bates, 2000 ; Galwey, 2014 ), which better account for the distributional properties of the data.

Figure 2. Boxplots of the model residuals for each subject ( a ) and quantile-quantile (Q–Q) plot of the model residuals ( ∘ ) on the horizontal axis against theoretical residuals from a Normal distribution on the vertical axis ( b ).

Unbalanced data analysis

Intervention effect estimates for the mixed-effects and subject-based analyses presented here are equivalent, due to the balanced nature of the design. Every subject has complete data for all samples and slices. By calculating means for each subject, averaging occurs across the same mix of samples and slices, so irrespective of the effects of these factors on the analysis, the means will be directly comparable and estimated with equivalent precision. Whilst balance is a desirable property of any experimental design, it is often unrealistic and impractical to obtain data structured in this way; for instance, in this example samples may be contaminated or damaged during processing, or insufficient material may be available for all three slices.

Repeating the above mixed-effects analysis after randomly removing 50% of the data (see Table 2 ) gives an estimated difference in lymph node size between groups of 0.263 mm (95% CI; -0.397 to 0.922), with a p-value = 0.391, and estimates of the standard deviations of the random effects for each level of the design of σ_P = 0.421 (95% CI; 0.224 to 0.794), σ_S = 0.279 (95% CI; 0.160 to 0.489) and σ_ε = 0.124 (95% CI; 0.088 to 0.174). Perhaps surprisingly, given that only half the data from the previous analysis are being used, these are very similar to the estimates from the complete data. However, in the unbalanced setting the subject-based analysis is no longer valid, as it ignores the variation in sample sizes between subjects; the estimated difference in lymph node size between groups is 0.199 mm (95% CI; -0.474 to 0.872) for the subject-based analysis.

Example 2: Lymph node counts after random sampling

The most extreme example of non-normal data is binary responses, which generally result from yes/no or presence/absence type outcomes. Extending the lymph node example, in a parallel study, rather than measure the sizes of selected nodes or conduct a time-consuming count of all nodes, a random sampling strategy was used to select regions of interest (RoI), in which five nodes were randomly selected and compared to a 2 mm reference standard ( ≥ 2mm; yes or no). This could be done rapidly by a non-specialist. Five samples were processed for each of twelve subjects, in an equivalent design to the lymph node size study; data are shown in Table 3 .

Non-normal data analysis

For some subjects there was insufficient tissue for five samples, resulting in an unbalanced design. The odds of an event (i.e. observing or not observing a lymph node with diameter  ≥ 2mm) is the ratio of the probabilities of the two possible states of the binary event, and the odds ratio is the ratio of the odds in the two groups of subjects (e.g. those receiving either None or Short RT). A naive analysis of these data suggests an estimate of the odds ratio of (43/82)/(79/46) = 0.31, for RT Short versus None groups; 43 lymph nodes with maximum diameters  ≥ 2mm from 125 in the RT Short group versus 79 from 125 in the None group. Being in the RT Short group results in lower odds of lymph nodes with diameters  ≥ 2mm. This is the result one would obtain by conventional logistic regression analysis; odds-ratio 0.31 (95% CI; 0.18 to 0.51; p-value < 0.001), providing very strong evidence that lymph node diameters were lower in the RT Short group.
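The naive odds ratio, and the standard error of its logarithm used later for comparison with the mixed-effects fit, follow from the standard 2x2-table formulas; a Python sketch:

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table, (a/b) / (c/d), together with the
    usual standard error of the log odds ratio."""
    or_est = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_est, se_log_or

# RT Short: 43 of 125 nodes >= 2mm (82 below); None: 79 of 125 (46 below)
or_est, se = odds_ratio(43, 82, 79, 46)
print(round(or_est, 2), round(se, 3))  # 0.31 0.264
```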

In logistic regression analysis the estimated regression coefficients are interpreted as log odds-ratios, which can be transformed to odds ratios using the exponential function ( Hosmer et al., 2013 ). However, one should be instinctively cautious about this result, as it is clear from Table 3 that variation within subjects is much less than between subjects; i.e. some subjects have low counts across all samples and others have high counts across all samples. The above analysis ignores this fact and pools variation between samples and between subjects to test for differences between two groups of subjects. This is clearly not a good idea.

Fitting a GLME model with a subject random effect gives an estimated odds-ratio for the Short RT group of 0.26 (95% CI; 0.09 to 0.78; p-value = 0.016). The predicted probability of detecting a lymph node with a diameter  ≥ 2mm was 0.65 for the None RT group and 0.33 for the Short RT group. The overall conclusions of the study have not changed; however, the level of significance associated with the result is massively overstated in the simple logistic regression, due to the much smaller estimate of the standard error of the log odds-ratio (0.264 for logistic regression versus 0.564 for the mixed-effects logistic regression). Failing to properly account for the difference in variability between measurements made on the same subject relative to the variability in measurements between subjects results in overoptimistic conclusions.

The examples, simulations and code provided highlight the importance of correctly identifying the UoA in a study, and show the impact on the study inferences of selecting an inappropriate analysis. The simulation study (Appendix 1) shows that the false positive rate can be extremely high and efficiency very low if analyses are undertaken that do not respect well known statistical principles. The examples reported are typical of studies in the biomedical sciences and together with the code provide a resource for scientists who may wish to undertake such analyses (Appendix 3). Although clearly discussion with a statistician, at the earliest possible stage in a study, should always be strongly encouraged, in practice this may not be possible if statisticians are not an integral part of the research team. The RIPOSTE framework ( Masca et al., 2015 ) called for the prospective registration ( Altman, 2014 ) and publication of study protocols for laboratory studies, which we believe if implemented would go a long way towards addressing many of the issues discussed here by causing increased scrutiny at all stages of an experimental study.

The examples, design and analysis methods presented here have deliberately used terminology such as experimental unit , subject and sample to make the arguments more comprehensible, particularly for non-statisticians, who often find these topics conceptually much easier to understand using such language. This may have contributed to the widespread belief amongst many laboratory scientists that these issues are important only in human experimentation, where, for instance, the subject is a participant in a clinical trial and the idea that subjects provide data that are independent of one another, but correlated within a subject, seems perfectly natural. However, although such language is used here, it is important to emphasise that the issues discussed apply to all experimental studies, and are arguably likely to be more, not less, important for laboratory studies than for human studies. The lack of appreciation of the importance of UoA issues in laboratory science may be due to the misconception that the within-subject associations observed for human subjects arise mainly from the subjective nature of the measures used in clinical trials on human subjects; e.g. patient-reported outcomes. Contrasting these with the more objective (hard) measures that dominate in much biomedical laboratory-based science leads many to assume that these issues are not important when analysing data and reporting studies in their own research area.

Mixed-effects models are now routinely used in the medical and social sciences (where they are often known as multilevel models), for instance to allow for the clustering of patient data within a recruiting centre in a clinical trial, or to model the association in outcomes of students within schools and classrooms ( Brown and Prescott, 2015 ; Snijders and Bosker, 2012 ). Mixed-effects models originated from the work of the pioneering statistician and geneticist R. A. Fisher ( Fisher, 1919 ), whose classic texts on experimental design led to their extensive and very early use in agricultural field experimentation ( Mead et al., 2012 ). However, the use of mixed-effects models in the biological sciences has not spread from the field to the laboratory.

Mixed-effects models are not used as widely in biomedical laboratory studies as in many other scientific disciplines, which is a concern, as given the nature of the experimental work reported one would expect these models to be as widely used and reported as they are elsewhere. This is most likely simply a matter of lack of knowledge and convention: if colleagues or peers do not routinely use these methods, then why should I? By highlighting the issue and providing some guidance, the hope is that this article may address the first of these issues. Journals and other interest groups (e.g. funding bodies and learned societies) also have a part to play, particularly in ensuring that work is reviewed by experienced and properly qualified statisticians at all stages from application to publication ( Masca et al., 2015 ).

Acknowledgements

This work is supported by the NIHR Statistics Group ( https://statistics-group.nihr.ac.uk/ ). NIHR had no role in the design and conduct of the study, or the decision to submit the work for publication.

Biographies

Nick R Parsons Warwick Medical School, University of Warwick, Coventry, United Kingdom

M Dawn Teare Sheffield School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom

Alice J Sitch Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom

Simulation study: Demonstrating UoA issues

Consider a small hypothetical study that aims to compare outcomes from subjects randomly allocated to two contrasting treatment options, A and B. Samples were collected from subjects and detailed laboratory work undertaken to provide 24 outcome measurements for each of the two groups. For treatment group A, a measurement was obtained from 24 individual subjects; measurements for group A are known to be uncorrelated, i.e. independent of one another. However, for treatment group B no such information was available. How would the sampling strategy for group B impact on the analysis undertaken and how could it affect the interpretation of the results of the analysis?

Consider the following possibilities: (i) the sampling strategy used for treatment group B was the same as treatment group A (i.e. 24 independent samples); (ii) in group B two measurements were available from each of 12 subjects; (iii) four measurements were available from each of six subjects; (iv) six measurements were available from each of four subjects; (v) eight measurements were available from each of three subjects; and (vi) 12 measurements were available from each of two subjects.

Experience from previous studies suggests that the measurements made on the same individual subjects are likely to be positively correlated; i.e. if one measurement is large then the others will also be large, or conversely if one measurement is small others will also be small.

Assume, for ease of illustration, that the measurements were Normally distributed and of equal variance in each treatment group, and that analyses were made using an independent samples t-test at the 5% level. One key characteristic that is important here is the false positive rate (type I error rate); i.e. the probability of incorrectly rejecting the null hypothesis. Here the null hypothesis is that the means of treatment groups A and B are the same. Figure 1(a) shows the type I error rates, based on 100000 simulations, for comparison of groups A and B, where the null hypothesis is known to be true, for scenarios (i) - (vi) and within-subject correlations ρ = 0, ρ = 0.2, ρ = 0.5 and ρ = 0.8. If data within subjects are uncorrelated ( ρ = 0), then the type I error rate is maintained at the required 5% level over all scenarios (i) to (vi); similarly, in scenario (i), where there are 24 single samples in group B, there is only a single measurement for each subject, so within-subject correlation plays no role and the type I error rate is controlled at the 5% level. Otherwise, as the number of subjects gets smaller (greater clustering) and the correlation within subjects gets larger, the type I error rate increases rapidly. In the extreme scenario where there are data from only 2 subjects, with a high correlation ( ρ = 0.8), the null hypothesis is incorrectly rejected approximately 45% of the time.

If grouped data are naively analysed, ignoring likely strong associations between measurements within the same group, it is very likely that incorrect inferences are made about differences between treatment groups.
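This inflation is easy to reproduce; the sketch below (Python, with the two-sided 5% critical value for 46 degrees of freedom hardcoded as 2.013, and a smaller number of simulations than the 100000 used above) induces within-subject correlation ρ through a shared subject effect and runs the naive pooled t-test.

```python
import random
import statistics

def type1_rate(k, m, rho, n_sims=2000, t_crit=2.013, seed=7):
    """Fraction of naive t-tests that reject a true null hypothesis when
    group B has k subjects with m correlated measurements each
    (within-subject correlation rho); group A has 24 independent values.
    t_crit is the two-sided 5% critical value for 24 + k*m - 2 = 46 df."""
    random.seed(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [random.gauss(0, 1) for _ in range(24)]
        b = []
        for _ in range(k):
            u = random.gauss(0, rho ** 0.5)  # shared subject effect
            b += [u + random.gauss(0, (1 - rho) ** 0.5) for _ in range(m)]
        s2 = (23 * statistics.variance(a) +
              (k * m - 1) * statistics.variance(b)) / (24 + k * m - 2)
        t = (statistics.mean(a) - statistics.mean(b)) / (s2 * (1 / 24 + 1 / (k * m))) ** 0.5
        if abs(t) > t_crit:
            rejections += 1
    return rejections / n_sims

print(type1_rate(k=2, m=12, rho=0.8))   # far above the nominal 5%
print(type1_rate(k=24, m=1, rho=0.0))   # close to the nominal 5%
```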

If the true grouping structure in B were known, then how might this be properly accounted for in the analysis? One simple option to improve on the naive analysis of assumed independence is to randomly select a single value from each subject; this will control the type I error rate at the required level across all scenarios and correlations ( Figure 1b ), but will provide rather inefficient estimates of the treatment difference between groups ( Figure 1c ).

An alternative simple strategy is to calculate the within-subject means; this provides an unduly conservative (type I error rate  ≤ 5%) test ( Figure 1b ), as the true variability in the data is typically underestimated by using the subject means. However, the analysis based on subject means rather than randomly selected values provides more efficient estimates of the treatment difference between groups ( Figure 1c ), with the efficiency depending on the within-subject correlation; as the correlation within subjects increases, the value of calculating a mean, in preference to selecting a single value for each subject, diminishes markedly.

Appendix 1—figure 1.


Type I error rates for the naive analysis under scenarios (i) – (vi) and within-subject correlations ρ = 0, ρ = 0.2, ρ = 0.5 and ρ = 0.8 ( a ). The type I error rate can be controlled to the required level by randomly selecting a single measurement for each subject, ρ = 0 (black circle), ρ = 0.2 (red circle), ρ = 0.5 (blue circle) and ρ = 0.8 (green circle), or made conservative ( ≤ 5%) by taking the mean of the measurements for each subject, ρ = 0 (black open circle), ρ = 0.2 (red open circle), ρ = 0.5 (blue open circle) and ρ = 0.8 (green open circle) ( b ). The relative efficiency of treatment effect estimates declines as the number of clusters becomes smaller and is always higher for the mean than for the randomly selected single measurement strategy ( c ). The scenarios (i) – (vi) are as described in the text.

Some fundamental principles of experimental design

Appendix 2—figure 1.


Consider a putative study ( Figure 1 ), where n samples ( experimental units ) of material are available for experimentation. Interventions (A and B) are assigned to the experimental units and sub–samples collected for processing and incubation prior to final testing 48 hours later. The scientist undertaking the study has control over the sampling strategy and the design; e.g. how to allocate samples to A and B, whether to divide samples and how to split material between incubators and the testing procedures used for data collection. What are the key issues that they need to consider before proceeding to do the study?

  • If possible, always randomly assign interventions to experimental units. Randomization ensures, on average, that there is balance for unknown confounders between interventions
  • A confounder is a variable that is associated with both a response and explanatory variable, and consequently causes a spurious association between them. For example, if all samples for intervention A were stored in incubator 1 and all samples for B were stored in incubator 2, and the incubators were found to be operating at different temperatures, then are the observed effects on the outcome due to the interventions or the differences in temperature between incubators? We do not know, as the effects of the interventions and temperature (incubators) are fully confounded
  • If there are known confounding factors, it is always a good idea to modify the design to take account of these; e.g. by blocking
  • Blocking involves dividing experimental units into homogenous subgroups (at the start of the experiment) and allocating (randomizing) interventions to experimental units within blocks so that the numbers are balanced; e.g. interventions A and B are split equally between incubators.
  • Blocking a design to protect against any suspected (or unsuspected) effects on the outcomes caused by processing, storage or assessment procedures is always a good idea; e.g. if more than one individual performs assays, or more than one instrument is used then split interventions so as to obtain balance.
  • In general, it is always better to increase the number of samples (experimental units) than the number of sub–samples. Study power is directly driven by the number of experimental units n .
  • Increasing the number of sub-samples m helps to improve the precision of estimation of the sample effect and allows assay error to be assessed, but has only an indirect effect on study power. Usually there is little benefit to be gained by making m much greater than five.
  • If there are two interventions, then it is always best to divide experimental units equally between interventions. If the aim of an experiment is to compare multiple interventions to a standard or control intervention, then it is better to allocate more experimental units to the standard arm of the study. For example, if a third standard arm (S) were added to the study, in addition to A and B, then it would be better (optimal) to allocate samples in the ratio 2:1:1 to interventions S:A:B.
  • All other things being equal, a better design is obtained if the variances of the explanatory variables are increased, as this is likely to provide a larger effect on the study outcomes. For example, suppose A and B were doses of a drug and a higher dose of the drug resulted in a larger value of the primary study outcome. If the doses for A and B were set at the extremes of the normal range, then the effect on the primary outcome is likely to be much larger than if the doses were only marginally different.
  • If a number of design factors are used then try to make sure that they are independent (uncorrelated). For example, the current design has a single design factor comprising two doses of a drug (A and B). If a second design factor were added, e.g. intravenous (C) or oral delivery (D), then crossing the factors such that the experimental samples are split (evenly) between the four combinations A.C, A.D, B.C and B.D provides the optimal design. The factors are independent; using the terminology of experimental design, they are orthogonal .
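The principles of randomization and blocking above can be sketched in a few lines of R. This is an illustrative sketch only (the sample size, incubator names and allocation scheme are invented for the example, not taken from the study): interventions A and B are randomized to samples within incubator blocks so that each incubator holds equal numbers of each intervention.

```r
# Sketch: randomly allocate n = 16 samples to interventions A and B,
# blocking on incubator so that each incubator holds equal numbers of A and B
set.seed(3)
n <- 16
incubator <- rep(c("Inc1", "Inc2"), each = n / 2)
allocation <- unlist(lapply(split(seq_len(n), incubator),
                            function(i) sample(rep(c("A", "B"), length(i) / 2))))
design <- data.frame(sample = seq_len(n), incubator, intervention = allocation)
table(design$incubator, design$intervention)  # balanced: 4 of each per incubator
```

Randomizing within blocks, rather than over all 16 samples at once, guarantees the balance between incubators that protects against confounding of intervention with incubator temperature.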

R code for examples

R is an open source statistical software package and programming language ( R Core Team, 2016 ; Ihaka and Gentleman, 1996 ) that is used extensively by statisticians across all areas of scientific research and beyond. The core capabilities of R can be further extended by user developed code packages for very specific methods or specialized tasks; many thousands of such packages exist and can be easily installed by the user from The Comprehensive R Archive Network (CRAN) ( CRAN, 2017 ) during an R session. Many excellent introductions to the basics of R are available online and from CRAN ( Venables et al., 2017 ), so here the focus is on usage for fitting the models described in the main text with notes on syntax and coding restricted to implementation of these only. A script is available at Parsons, 2017 to replicate all the analyses reproduced here.

The first dataset considered here is that for the adjuvant radiotherapy and lymph node size in colorectal cancer example. For small studies such as this, data can be entered manually into an R script file, by assigning individual observed data variables to a number of named vectors, using the <- operator, and combining together into a data frame (data.frame function), which is the simplest R object for storing a series of data fields which are associated together.

The factors define the design of the experiment, and are built using the rep function that allows structures to be replicated in a concise manner. The first 6 rows of the data frame LymphNode can be examined using the head function.
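A minimal sketch of this construction is below. The data values are illustrative only, not the study data (the authors' actual script is available at the Parsons, 2017 GitHub link); the sketch assumes six subjects, three per treatment group, with four lymph nodes measured per subject.

```r
# Illustrative data: 6 subjects (3 per treatment), 4 lymph nodes per subject
set.seed(1)
LNsize <- c(rnorm(12, mean = 6, sd = 2),   # control subjects
            rnorm(12, mean = 4, sd = 2))   # radiotherapy subjects
Subject <- factor(rep(1:6, each = 4))
RadioTherapy <- factor(rep(c("No", "Yes"), each = 12))
LymphNode <- data.frame(Subject, RadioTherapy, LNsize)
head(LymphNode)   # first 6 rows of the data frame
```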

This is the standard rectangular form that will be familiar to those who use other statistical software packages or spreadsheets for data storage. More generally, data can be read (imported) into R from a wide range of data formats; for instance, if data were laid out as above in a spreadsheet programme it could be saved in comma separated format (csv) (e.g. data.csv) and read into R using the following code LymphNode <- read.csv("data.csv"). Naive analysis of data LymphNode would be implemented using the t.test function
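A sketch of the naive analysis, assuming the LymphNode data frame described above; this test wrongly treats every lymph node as an independent observation.

```r
# Naive two-sample t-test, ignoring the grouping of nodes within subjects
t.test(LNsize ~ RadioTherapy, data = LymphNode, var.equal = TRUE)
```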

This is equivalent to fitting a linear regression model using the R linear model function lm, other than a change in the direction of the differencing of the group means. The R formula notation y ~ x symbolically expresses the model specification linking the response variable y to explanatory variable x; here the response variable is lymph node size LNsize and the explanatory variable is the radiotherapy treatment RadioTherapy. A full report of the fitted model object mod can be seen using the summary(mod) function. For brevity, the full output is not shown here, but rather individual functions are used to display particular aspects of the fit; e.g. for coefficients coef(mod), confidence intervals confint(mod) and an analysis of variance table anova(mod).
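The equivalent linear model fit, again assuming the LymphNode data frame from above:

```r
# Naive analysis as a linear model
mod <- lm(LNsize ~ RadioTherapy, data = LymphNode)
coef(mod)       # estimated treatment difference
confint(mod)    # confidence intervals
anova(mod)      # analysis of variance table
```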

The analysis by subject proceeds by first calculating lymph node size means for each subject, LNsize.means, using the tapply and mean functions, prior to fitting the linear model, including the new RT.means factor. There is now no need to specify a data frame using the data argument to lm, as the response and explanatory variables are newly created objects themselves, so R can find them without having to look within a data frame, as was the case for the previous model.
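A sketch of the subject-level analysis, assuming the illustrative LymphNode data frame with three subjects per treatment group:

```r
# Average the lymph node sizes within each subject first
LNsize.means <- tapply(LymphNode$LNsize, LymphNode$Subject, mean)
RT.means <- factor(rep(c("No", "Yes"), each = 3))  # one treatment label per subject
mod.subj <- lm(LNsize.means ~ RT.means)
summary(mod.subj)
```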

The linear mixed-effects package nlme must be installed before proceeding to model fitting. The model syntax for fitting these models is similar to standard linear models in most respects, with the addition of a random argument to describe the structure of the data. Full details of how to specify the model can be found in standard texts such as ( Pinheiro and Bates, 2000 ). Confidence intervals of fixed and random effects are provided using the intervals command.
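A sketch of the mixed-effects fit with a random intercept for each subject, assuming the LymphNode data frame as above:

```r
library(nlme)
# Random intercept per subject captures the within-subject correlation
mod.lme <- lme(LNsize ~ RadioTherapy, random = ~ 1 | Subject, data = LymphNode)
summary(mod.lme)
intervals(mod.lme)   # CIs for fixed effects and variance components
```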

Competing models can be compared using likelihood ratio tests.
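For example, the treatment effect can be tested by comparing nested models fitted by maximum likelihood (the model names here are illustrative):

```r
# Likelihood ratio test: models must be fitted by ML, not the default REML
mod.full <- lme(LNsize ~ RadioTherapy, random = ~ 1 | Subject,
                data = LymphNode, method = "ML")
mod.null <- lme(LNsize ~ 1, random = ~ 1 | Subject,
                data = LymphNode, method = "ML")
anova(mod.null, mod.full)
```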

Model fit can be explored using a range of diagnostic plots. For instance, standardized residuals versus fitted values by subject,

observed versus fitted values by subject,

box-plots of residuals by subject,

and quantile-quantile plots.
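The four diagnostic displays listed above can be produced with the plot method for lme objects, assuming the fitted model is stored as mod.lme (a hypothetical name):

```r
# Standardized residuals versus fitted values, by subject
plot(mod.lme, resid(., type = "p") ~ fitted(.) | Subject)
# Observed versus fitted values, by subject
plot(mod.lme, LNsize ~ fitted(.) | Subject)
# Box-plots of residuals by subject
plot(mod.lme, Subject ~ resid(.))
# Quantile-quantile plot of the residuals
qqnorm(mod.lme, ~ resid(.))
```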

For the sake of exposition, creating an unbalanced dataset from the original LymphNode data is achieved by randomly removing some data values and re-fitting the mixed-effects model.
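One way to sketch this (the number of rows removed is arbitrary, chosen only for illustration):

```r
# Randomly remove 6 of the 24 rows to create an unbalanced dataset,
# then re-fit; lme handles the unequal group sizes directly
set.seed(2)
drop <- sample(nrow(LymphNode), size = 6)
LymphNode.unbal <- LymphNode[-drop, ]
mod.unbal <- lme(LNsize ~ RadioTherapy, random = ~ 1 | Subject,
                 data = LymphNode.unbal)
summary(mod.unbal)
```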

A subject-based analysis ignores the differences in precision of estimation of means between subjects.

The second dataset considered here is grouped binary data from the lymph node count example; NA indicates a missing value. For model fitting the non-missing data can be found using the subset and complete.cases functions.

Fitting a conventional logistic regression model to the data provides a naive analysis, with estimated coefficients that are log odds-ratios. The glm command indicates that a generalized linear model is fitted, with distributional properties identified using the family argument, which for binary data is canonically the binomial distribution with logit link function.
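A sketch of this analysis follows. The data frame and its values are illustrative inventions (one row per lymph node, with a binary indicator of node positivity and NA marking missing assessments), not the study data:

```r
# Illustrative grouped binary data: 6 subjects, 4 nodes each
NodeCount <- data.frame(
  Subject      = factor(rep(1:6, each = 4)),
  RadioTherapy = factor(rep(c("No", "Yes"), each = 12)),
  Positive     = c(1, 1, 0, 1,  0, 1, 1, NA,  1, 0, 1, 1,
                   0, 0, 1, 0,  1, 0, 0, 0,   NA, 0, 1, 0))
# Keep only the non-missing rows
NodeCount.cc <- subset(NodeCount, complete.cases(NodeCount))

# Naive logistic regression, ignoring the grouping of nodes within subjects;
# estimated coefficients are log odds-ratios
mod.glm <- glm(Positive ~ RadioTherapy, family = binomial(link = "logit"),
               data = NodeCount.cc)
summary(mod.glm)
```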

Fitting linear mixed-effects models for non-normal data requires the lme4 package. Model set-up and syntax for lme4 is similar to nlme; for details of implementation for lme4 see ( Bates et al., 2015 ) and the vignettes provided with the package.
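A sketch of the mixed-effects logistic regression, assuming the illustrative NodeCount.cc data from above:

```r
library(lme4)
# Random intercept per subject accounts for correlation between
# nodes from the same subject
mod.glmer <- glmer(Positive ~ RadioTherapy + (1 | Subject),
                   family = binomial, data = NodeCount.cc)
summary(mod.glmer)
```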

Predictions for the fitted model can be obtained for new data using the predict function, here with no random effects included.

The standard errors of the radiotherapy effects for the conventional logistic regression and mixed-effects model are obtained from the variance-covariance matrices of the fitted model parameters using the vcov function.
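These two steps can be sketched as follows, assuming the conventional and mixed-effects fits are stored as mod.glm and mod.glmer (hypothetical names):

```r
# Population-level predicted probabilities for new data (random effects excluded)
newdat <- data.frame(RadioTherapy = factor(c("No", "Yes")))
predict(mod.glm, newdata = newdat, type = "response")
predict(mod.glmer, newdata = newdat, type = "response", re.form = NA)

# Standard errors of the coefficients, from the variance-covariance matrices
sqrt(diag(vcov(mod.glm)))
sqrt(diag(as.matrix(vcov(mod.glmer))))
```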

Mathematical description of the naive analysis

The standard method of analysis for simple designed experiments is analysis of variance (ANOVA), which uses variability about mean values to assess significance, under an assumed approximate Normal distribution. Focussing on samples as experimental units, it is decided to collect m replicate measurements of an outcome y on each of T × N samples, divided into T equally sized treatment groups. Indexing outcomes as $y_{ijt}$, where $i = 1, \ldots, N$, $j = 1, \ldots, m$ and $t = 1, \ldots, T$, the total sums-of-squares (deviations around the mean), which summarises overall data variability, is

$$\mathrm{SS}_{Total} = \sum_{i=1}^{N} \sum_{j=1}^{m} \sum_{t=1}^{T} \left( y_{ijt} - \bar{y}_{\cdot\cdot\cdot} \right)^2,$$

where the overall (grand) mean is $\bar{y}_{\cdot\cdot\cdot} = \frac{1}{TNm} \sum_i \sum_j \sum_t y_{ijt}$. The treatment sums-of-squares (SS) is that part of the variation due to the interventions and is given by

$$\mathrm{SS}_{Treat} = Nm \sum_{t=1}^{T} \left( \bar{y}_{\cdot\cdot t} - \bar{y}_{\cdot\cdot\cdot} \right)^2,$$

where the treatment means are given by $\bar{y}_{\cdot\cdot t} = \frac{1}{Nm} \sum_i \sum_j y_{ijt}$. The residual or error SS is given by

$$\mathrm{SS}_{Error} = \sum_{i=1}^{N} \sum_{j=1}^{m} \sum_{t=1}^{T} \left( y_{ijt} - \bar{y}_{\cdot\cdot t} \right)^2,$$

and is such that $\mathrm{SS}_{Total} = \mathrm{SS}_{Treat} + \mathrm{SS}_{Error}$. This error SS can be partitioned into that between samples,

$$\mathrm{SS}_{Error.Samples} = m \sum_{i=1}^{N} \sum_{t=1}^{T} \left( \bar{y}_{i \cdot t} - \bar{y}_{\cdot\cdot t} \right)^2,$$

and that within samples,

$$\mathrm{SS}_{Error.Within} = \sum_{i=1}^{N} \sum_{j=1}^{m} \sum_{t=1}^{T} \left( y_{ijt} - \bar{y}_{i \cdot t} \right)^2,$$

where the sample means are given by $\bar{y}_{i \cdot t} = \frac{1}{m} \sum_j y_{ijt}$ and $\mathrm{SS}_{Error} = \mathrm{SS}_{Error.Samples} + \mathrm{SS}_{Error.Within}$. In a naive analysis, ignoring the sampling structure, significance between treatments is incorrectly assessed using an F-test of the ratio of the treatment mean-square $\mathrm{MS}_{Treat} = \mathrm{SS}_{Treat}/(T-1)$ to the error mean-square $\mathrm{MS}_{Error} = \mathrm{SS}_{Error}/T(Nm-1)$ on $T-1$ and $T(Nm-1)$ degrees of freedom. However, the correct analysis is that which uses an F-test of the ratio of the treatment mean-square $\mathrm{MS}_{Treat}$ to the between-samples error mean-square $\mathrm{MS}_{Error.Samples} = \mathrm{SS}_{Error.Samples}/T(N-1)$ on $T-1$ and $T(N-1)$ degrees of freedom.

This analysis uses the variability between samples only to assess the significance of the treatment effects. The naive analysis pools variability between and within samples and uses this to assess the treatment effects. The naive analysis is generally the default analysis obtained in the majority of statistics software, such as R, if the error structure is not specifically stated in the call to analysis of variance.
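The contrast between the two analyses can be sketched in R with the aov function, assuming the illustrative LymphNode data frame described earlier; the Error term declares the correct error stratum.

```r
# Naive ANOVA (wrong): all lymph nodes treated as independent observations
summary(aov(LNsize ~ RadioTherapy, data = LymphNode))

# Correct ANOVA: treatment tested against the between-subject error stratum
summary(aov(LNsize ~ RadioTherapy + Error(Subject), data = LymphNode))
```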

Funding Statement

The authors declare that there was no funding for this work.

Author contributions

Conceptualization, Writing—original draft, Writing—review and editing, Analysis and interpretation of data.

Competing interests

No competing interests declared.

  • Aarts E, Verhage M, Veenvliet JV, Dolan CV, van der Sluis S. A solution to dependency: using multilevel analysis to accommodate nested data. Nature Neuroscience. 2014; 17 :491–496. doi: 10.1038/nn.3648. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Academy of Medical Sciences Reproducibility and reliability of biomedical research. [6 December 2017]; 2017 https://acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research
  • Aho KA. Foundational and Applied Statistics for Biologists Using R. Boca Raton, Florida: CRC Press; 2014. [ Google Scholar ]
  • Altman DG, Bland JM. Statistics notes. Units of analysis. BMJ. 1997; 314 :1874. doi: 10.1136/bmj.314.7098.1874. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gøtzsche PC, Lang T, CONSORT GROUP (Consolidated Standards of Reporting Trials) The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Annals of Internal Medicine. 2001; 134 :663–694. doi: 10.7326/0003-4819-134-8-200104170-00012. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Altman DG. The time has come to register diagnostic and prognostic research. Clinical Chemistry. 2014; 60 :580–582. doi: 10.1373/clinchem.2013.220335. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software. 2015; 67 :1–48. doi: 10.18637/jss.v067.i01. [ CrossRef ] [ Google Scholar ]
  • Bouwmeester W, Twisk JW, Kappen TH, van Klei WA, Moons KG, Vergouwe Y. Prediction models for clustered data: comparison of a random intercept and standard regression model. BMC Medical Research Methodology. 2013; 13 :10. doi: 10.1186/1471-2288-13-19. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Brown H, Prescott R. Applied Mixed Models in Medicine. Chichester: Wiley; 2015. [ Google Scholar ]
  • Bunce C, Patel KV, Xing W, Freemantle N, Doré CJ, Ophthalmic Statistics Group Ophthalmic statistics note 1: unit of analysis. British Journal of Ophthalmology. 2014; 98 :408–412. doi: 10.1136/bjophthalmol-2013-304587. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bustin SA, Nolan T. Improving the reliability of peer-reviewed publications: we are all in it together. Biomolecular Detection and Quantification. 2016; 7 :A1–A5. doi: 10.1016/j.bdq.2015.11.002. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • CRAN The Comprehensive R Archive Network. 2017 https://cran.r-project.org/
  • Calhoun AW, Guyatt GH, Cabana MD, Lu D, Turner DA, Valentine S, Randolph AG. Addressing the unit of analysis in medical care studies: a systematic review. Medical Care. 2008; 46 :635–643. doi: 10.1097/MLR.0b013e3181649412. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chow S, Shao J, Wang H. Sample Size Calculations in Clinical Research. Boca Raton: Chapman and Hall; 2008. [ Google Scholar ]
  • Diggle PK, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. Oxford: Oxford University Press; 2013. [ Google Scholar ]
  • Divine GW, Brown JT, Frazier LM. The unit of analysis error in studies about physicians' patient care behavior. Journal of General Internal Medicine. 1992; 7 :623–629. doi: 10.1007/BF02599201. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fisher RA. XV. The correlation between relatives on the supposition of mendelian inheritance. Transactions of the Royal Society of Edinburgh. 1919; 52 :399–433. doi: 10.1017/S0080456800012163. [ CrossRef ] [ Google Scholar ]
  • Fleming PS, Koletsi D, Polychronopoulou A, Eliades T, Pandis N. Are clustering effects accounted for in statistical analysis in leading dental specialty journals? Journal of Dentistry. 2013; 41 :265–270. doi: 10.1016/j.jdent.2012.11.012. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fox J, Weisberg S, Fox J. An R Companion to Applied Regression. Thousand Oaks: SAGE Publications; 2011. [ Google Scholar ]
  • Galwey N. Introduction to Mixed Modelling: Beyond Regression and Analysis of Variance. Chichester: Wiley; 2014. [ Google Scholar ]
  • Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2007. [ Google Scholar ]
  • Green P, MacLeod CJ. SIMR : an R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution. 2016; 7 :493–498. doi: 10.1111/2041-210X.12504. [ CrossRef ] [ Google Scholar ]
  • Hemming K, Girling AJ, Sitch AJ, Marsh J, Lilford RJ. Sample size calculations for cluster randomised controlled trials with a fixed number of clusters. BMC Medical Research Methodology. 2011; 11 :102. doi: 10.1186/1471-2288-11-102. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. Hoboken: Wiley; 2013. [ Google Scholar ]
  • Ihaka R, Gentleman R. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996; 5 :299–314. [ Google Scholar ]
  • Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing value and reducing waste in research design, conduct, and analysis. The Lancet. 2014; 383 :166–175. doi: 10.1016/S0140-6736(13)62227-8. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Johnson PC, Barry SJ, Ferguson HM, Müller P. Power analysis for generalized linear mixed models in ecology and evolution. Methods in Ecology and Evolution. 2015; 6 :133–142. doi: 10.1111/2041-210X.12306. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kilkenny C, Parsons N, Kadyszewski E, Festing MF, Cuthill IC, Fry D, Hutton J, Altman DG. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One. 2009; 4 :e7824. doi: 10.1371/journal.pone.0007824. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biology. 2010; 8 :e1000412. doi: 10.1371/journal.pbio.1000412. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lazic SE. The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience. 2010; 11 :5. doi: 10.1186/1471-2202-11-5. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mangiafico SS. Summary and analysis of extension program evaluation in R: transforming data. [6 December 2017]; 2017 http://rcompanion.org/handbook/I_12.html
  • Masca NGD, Hensor EMA, Cornelius VR, Buffa FM, Marriott HM, Eales JM, Messenger MP, Anderson AE, Boot C, Bunce C, Goldin RD, Harris J, Hinchliffe RF, Junaid H, Kingston S, Martin-Ruiz C, Nelson CP, Peacock J, Seed PT, Shinkins B, Staples KJ, Toombs J, Wright AKA, Teare MD. RIPOSTE: a framework for improving the design and analysis of laboratory-based research. eLife. 2015; 4 :e05519. doi: 10.7554/eLife.05519. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • McCullagh P, Nelder JA. Generalized Linear Models. Boca Raton: Chapman and Hall; 1998. [ Google Scholar ]
  • McNutt M. Journals unite for reproducibility. Science. 2014; 346 :679. doi: 10.1126/science.aaa1724. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mead R, Gilmour SG, Mead A. Statistical Principles for the Design of Experiments. Cambridge: Cambridge University Press; 2012. [ Google Scholar ]
  • NC3Rs EDA: experimental design assistant. [6 December 2017]; 2017 https://eda.nc3rs.org.uk
  • Parsons NR. R code for unit of analysis manuscript. 357fe1f GitHub. 2017 https://github.com/AstroHerring/UoAManuscript
  • Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. New York: Springer; 2000. [ Google Scholar ]
  • Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-127. 2016
  • R Core Team R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2016. https://www.R-project.org [ Google Scholar ]
  • Snijders TAB, Bosker RJ. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Los Angeles: Sage; 2012. [ Google Scholar ]
  • Venables WN, Smith DM, R Core Team An introduction to R. Version 3.4.1. [6 December 2017]; 2017 https://cran.r-project.org/doc/manuals/R-intro.pdf
  • eLife. 2018; 7: e32486.

Decision letter

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Unit of analysis issues continue to be a cause of concern in reporting of laboratory-based research" for consideration by eLife . Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Mark Jit as the Reviewing Editor and Peter Rodgers as the eLife Features Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jenny Barrett (Reviewer #2); Chris Jones (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The reviewers and editors were in agreement on the value of the concept and approach of the manuscript. There were a large number of issues that we felt needed to be addressed, but we do not believe that any of them will take a long time to complete.

The tutorial describes issues related to non-independence in data from laboratory and other experiments and further show how they may be overcome, both in a simple way (using subject-level averages) and a more comprehensive way (using mixed models). This is a common problem, and the paper does a good job of both explaining it and giving researchers the tools to deal with it. Its utility is greatly enhanced by very clear detailed illustrative examples and R code to carry out the analyses discussed.

The current title indicates that the paper is going to show that "Unit of analysis issues continue to be a cause of concern in reporting of laboratory-based research", but that is not what the paper does. Rather, the paper provides guidelines on how to understand the concept of "Unit of analysis" and analyse experiments appropriately. The title should be changed to reflect this.

Essential revisions:

Currently the article contains no guidance on sample size calculation for either the "simple" analyses or the more complex analyses. Nor does it contain any guidance on minimal sample size for the modelling methods suggested. Some comments on sample size and power would be valuable as these are issues that are often neglected by lab scientists. It would also be useful for anyone considering more complex analyses to have an idea of the minimum sample size that can realistically be used to fit the models.

Subsection “Design”. Different designs. Please include some examples of experiments for each situation, as this would make it easier for lab scientists to recognise their type of sample in this list. The example of groups of subjects seems to refer to situations where interest is in the group itself. A common situation instead is where interest is on the effect of treatment on an individual (the experimental unit), but the individuals happen to be grouped (correlated), and it could be useful to clarify this distinction. For example, in laboratory studies the samples may have been analysed in different batches.

Appendix 2 in its current form may not be very helpful or informative to the majority of readers. It does not really explain how to choose among alternative designs, and the equations are likely to be forbidding to non-statisticians. While there are no space limitations in eLife , it should be rewritten to focus on the design issues: when should you get more measurements per subject, vs. more subjects? What good are such within-subject replicates (e.g. small improvements in precision, but particularly to be able to measure assay error). It would also benefit from a box summarising what it is showing in a couple of simple sentences, so people that can't get through the equations can at least understand the point it is making.

The code in Appendix 3 is very helpful, but it is difficult to read in its present form. We recommend publishing it in text form using indentation, colours, and explanatory text interspersed with the sections of code to explain it. Ideally, it should be written as a tutorial (with portions of text and code interspersed).
It would also be good to show how the data for each of the examples is structured within a database – i.e. with variables representing the individual, clustering, groupings etc. Lab scientists are generally less familiar with how data is entered/stored in databases/stats software, and they may be familiar with GraphPad Prism, which accepts data in very different formats to the standard format required for the analyses presented in this paper. Appendix 3 could be expanded to include the data frames next to the R code (at the start of each example).

Author response

Title: The current title indicates that the paper is going to show that "Unit of analysis issues continue to be a cause of concern in reporting of laboratory-based research", but that is not what the paper does. Rather, the paper provides guidelines on how to understand the concept of "Unit of analysis" and analyse experiments appropriately. The title should be changed to reflect this.

Title changed to “Unit of analysis issues in laboratory based research: a review of concepts and guidance on study design and reporting”.

Essential revisions: Currently the article contains no guidance on sample size calculation for either the "simple" analyses or the more complex analyses. Nor does it contain any guidance on minimal sample size for the modelling methods suggested. Some comments on sample size and power would be valuable as these are issues that are often neglected by lab scientists. It would also be useful for anyone considering more complex analyses to have an idea of the minimum sample size that can realistically be used to fit the models.

A new subsection has been added, after the ‘Analysis’ subsection, that discusses sample size estimation from initially a very simple design, to more complex GLMMs via simulation.

Simple examples have been added to the design types in the subsection “Design”. The ‘Groups of subjects’ example has been expanded to cover the kind of ‘batch-effects’ identified by the reviewer.

Appendix 2 in its current form may not be very helpful or informative to the majority of readers. It does not really explain how to choose among alternative designs, and the equations are likely to be forbidding to non-statisticians. While there are no space limitations in eLife, it should be rewritten to focus on the design issues: when should you get more measurements per subject, vs. more subjects? What good are such within-subject replicates (e.g. small improvements in precision, but particularly to be able to measure assay error). It would also benefit from a box summarising what it is showing in a couple of simple sentences, so people that can't get through the equations can at least understand the point it is making.

Appendix 2 has been modified to discuss fundamental design issues for a putative example experiment. It now focuses more on design issues, and uses less mathematical language that should be more accessible to readers of eLife . The mathematical details of the (incorrect) naïve analysis have been moved to a separate new appendix (Appendix 4).

Appendix 3 (R code for examples) has been completely revised and re-written along the lines suggested here. It is now written in the style of a tutorial with code indented and coloured to distinguish it from the main text. R output is also now provided to help those wishing to check exactly what would be produced if the code were pasted directly into R.

We agree that the data entry in the previous example R code was not realistic. Appendix 3 now explicitly shows the format of the data in R. A note is also added to explain how data would normally be entered using the read statement that will import data into R from standard spreadsheets or databases.

Realtor.com Economic Research

  • Data library

2024 Housing Market Forecast and Predictions: Housing Affordability Finally Begins to Turnaround

Danielle Hale

As we look ahead to 2024 , we see a mix of continuity and change in both the housing market and economy. Against a backdrop of modest economic growth, slightly higher unemployment, and easing inflation longer term interest rates including mortgage rates begin a slow retreat. The shift from climbing to falling mortgage rates improves housing affordability, but saps some of the urgency home shoppers had previously sensed. Less frenzied housing demand and plenty of rental home options keep home sales relatively stable at low levels in 2024, helping home prices to adjust slightly lower even as the number of for-sale homes continues to dwindle. 

Realtor.com ® 2024 Forecast for Key Housing Indicators

what is unit of analysis in research

Home Prices Dip, Improving Affordability

Home prices grew at a double-digit annual clip for the better part of two years spanning the second half of 2020 through 2022, a notable burst following a growing streak that spanned back to 2012. As mortgage rates climbed, home price growth flatlined, actually declining on an annual basis in early 2023 before an early-year dip in mortgage rates spurred enough buyer demand to reignite competition for still-limited inventory. Home prices began to climb again, and while they did not reach a new monthly peak, on average for the year we expect that the 2023 median home price will slightly exceed the 2022 annual median.

Nevertheless, even during the brief period when prices eased, using a mortgage to buy a home remained expensive. Since May 2022, purchasing the typical for-sale home listing at the prevailing rate for a 30-year fixed-rate mortgage with a 20% down payment meant forking over a quarter or more of the typical household paycheck. In fact, in October 2023, it required 39% of the typical household income and this share is expected to average 36.7% for the full calendar year in 2023. This figure has typically ranged around 21%, so it is well above historical average. We expect that the return to pricing in line with financing costs will begin in 2024, and home prices, mortgage rates, and income growth will each contribute to the improvement. Home prices are expected to ease slightly, dropping less than 2% for the year on average. Combined with lower mortgage rates and income growth this will improve the home purchase mortgage payment share relative to median income to an average 34.9% in 2024, with the share slipping under 30% by the end of the year.

what is unit of analysis in research

Home Sales Barely Budge Above 2023’s Likely Record Low

After soaring during the pandemic, existing home sales were weighed down in the latter half of 2022 as mortgage rates took off, climbing from just over 3% at the start of the year to a peak of more than 7% in the fourth quarter. The reprieve in mortgage rates in early 2023, when they dipped to around 6%, brought some life to home sales, but the renewed climb of mortgage rates has again exerted significant pressure on home sales that is exacerbated by the fact that a greater than usual number of households bought homes over the past few years, and despite stories of pandemic purchase regret , for the most part, these homeowners continue to be happy in their homes. 

This is consistent with what visitors to Realtor.com report when asked why they are not planning to sell their homes. The number one reason homeowners aren’t trying to sell is that they just don’t need to; concern about losing an existing low-rate mortgage is the top financial concern cited. Our current projection is for 2023 home sales to tally just over 4 million, a dip of 19% over the 2022 5 million total. 

existing_sales_yearly

With many of the same forces at play heading into 2024, the housing chill will continue, with sales expected to remain essentially unchanged at just over 4 million. Although mortgage rates are expected to ease throughout the course of the year, the continuation of high costs will mean that existing homeowners will have a very high threshold for deciding to move, with many likely choosing to stay in place.  Moves of necessity–for job changes, family situation changes, and downsizing to a more affordable market–are likely to drive home sales in 2024. 

what is unit of analysis in research

Shoppers Find Even Fewer Existing Homes For Sale

Even before the pandemic, housing inventory was on a long, slow downward trajectory. Insufficient building meant that the supply of houses did not keep up with household formation and left little slack in the housing market. Both homeowner and rental vacancy remain below historic averages . In contrast with the existing home market, which remains sluggish, builders have been catching up, with construction remaining near pre-pandemic highs for single-family and hitting record levels for multi-family . 

Despite this, the lack of excess capacity in housing has been painfully obvious in the for-sale market. The number of existing homes on the market has dwindled, and with sales activity expected to continue at a relatively low pace, the number of unsold homes on the market is also expected to remain low. Although mortgage rates are expected to begin to ease, they are projected to exceed 6.5% for the calendar year. This means that the lock-in effect, in which the gap between market mortgage rates and the rates existing homeowners enjoy on their outstanding mortgages discourages them from selling, will remain a factor. Roughly two-thirds of outstanding mortgages have a rate under 4%, and more than 90% have a rate under 6%.

Rental Supply Outpaces Demand to Drive Mild Further Decline in Rents

After almost a full year of double-digit rent growth between mid-2021 and mid-2022, the rental market has finally cooled, as evidenced by the year-over-year declines that began in May 2023. In 2024, we expect the rental market to closely resemble the dynamics of 2023, as the tug of war between supply and demand results in a mild annual decline of 0.2% in the median asking rent.

New multi-family supply will continue to be a key element shaping the 2024 rental market. In the third quarter of 2023, the annual pace of newly completed multi-family homes stood at 385,000 units. Although absorption rates remained elevated in the second quarter, especially at lower price points, the rental vacancy rate ticked up to 6.6% in the third quarter. This uptick suggests that recent supply has outpaced demand, but context is important: after recent gains, the rental vacancy rate is on par with its level right before the onset of the pandemic in early 2020 and still below its 7.2% average over the 2013 to 2019 period. Looking ahead, the strong construction pipeline, which hit a record high for units under construction this summer, is expected to continue fueling rental supply growth in 2024, pushing rental vacancy back toward its long-run average.

While the surge in new multi-family supply gives renters options, the sheer number of renters will minimize the potential price impact. The median asking rent in 2024 is expected to drop only slightly below its 2023 level. Renting is expected to remain a more budget-friendly option than buying in the vast majority of markets, even though home prices and mortgage rates are both expected to dip, pulling the purchase market down slightly from record unaffordability.

Young adult renters who lack the benefit of historically high home equity to tap into for a home purchase will continue to find the housing market challenging. Specifically, as many Millennials move past the typical first-time home buying age and more of Gen Z approaches those years, the current housing landscape is likely to keep these households in the rental market for a longer period as they work to save up the growing down payment needed to buy a first home. This trend is expected to sustain robust demand for rental properties. Consequently, we anticipate that rental markets favored by young adults, a list that includes a mix of affordable areas and tech-heavy job markets in the South, Midwest, and West, will be the rental markets to watch in 2024.

Key Wildcards:

  • Wildcard 1: Mortgage Rates. With both mortgage rates and home prices expected to turn the corner in 2024, record-high unaffordability will become a thing of the past, though as noted above, the return to normal won't be accomplished within the year. This prediction hinges on the expectation that inflation will continue to subside, enabling the recent declines in longer-term interest rates to continue. If inflation instead sees a surprise resurgence, this aspect of the forecast would change, and home sales could slip lower instead of steadying.
  • Wildcard 2: Geopolitics. In our forecast for 2023, we cited the risk that geopolitical instability poses to trade and energy costs as something to watch. In addition to Russia's ongoing war in Ukraine, instability in the Middle East has had a catastrophic human toll, and both conflicts have the potential to affect the economic outlook in ways that cannot be fully anticipated.
  • Wildcard 3: Domestic Politics and the 2024 Elections. In 2020, amid the upheaval of pandemic-era adaptations, many Americans were on the move. We noted that Realtor.com traffic patterns indicated that home shoppers in traditionally 'blue' or Democratic areas tended to look for homes in markets where voters have more typically voted 'red' or Republican. While consumers also reported preferring to live in locations where their political views align with the majority, few reported wanting to move for this reason alone.

Housing Perspectives:

What will the market be like for homebuyers, especially first-time homebuyers?

First-time homebuyers will continue to face a challenging housing market in 2024, but there are some green shoots. The record-high share of income required to purchase the median-priced home is expected to begin to decline as mortgage rates ease, home prices soften, and incomes grow. For 2023 as a whole, we expect the monthly cost of financing the typical for-sale home to average more than $2,240, a nearly 20% increase over the mortgage payment in 2022 and roughly double the typical payment for buyers in 2020. That amounts to nearly 37% of the typical household income. In 2024, as modest price declines take hold and mortgage rates dip, the typical purchase cost is expected to slip just under $2,200, which would amount to nearly 35% of income. While far higher than the historical average, this is a significant first step in a buyer-friendly direction.
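
As a rough sanity check, the payment shares above can be inverted to see what household income they imply. The sketch below (in Python) takes only the payment and share-of-income figures from the text; the resulting income levels are implied by that arithmetic, not separately sourced figures:

```python
# Back-of-the-envelope check of the payment-to-income figures above.

def implied_annual_income(monthly_payment: float, share_of_income: float) -> float:
    """Annual income implied when a monthly payment consumes a given share of income."""
    return monthly_payment * 12 / share_of_income

# 2023: $2,240/month at ~37% of income; 2024: $2,200/month at ~35% of income.
print(round(implied_annual_income(2240, 0.37)))  # → 72649
print(round(implied_annual_income(2200, 0.35)))  # → 75429
```

Both figures land in the low-to-mid $70,000s, which is why a roughly $40 drop in the monthly payment translates into about two percentage points of income share.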

How can homebuyers prepare? 

Homebuyers can prepare for this year's housing market by getting financially ready. Buyers can use a home affordability calculator, like this one at Realtor.com, to translate their income and savings into a home price range, and they can pressure-test the results with a mortgage calculator by considering different down payment, price, and loan scenarios to see how their monthly costs would be affected. Working with a lender can help potential buyers explore loan products such as FHA or VA loans that may offer lower mortgage interest rates or more flexible credit criteria.
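
For readers who want to experiment beyond an online tool, the standard fixed-rate amortization formula that underlies most mortgage calculators is easy to sketch. The home price, rate, and down payment shares below are hypothetical examples for illustration, not forecasts:

```python
def monthly_payment(price: float, down_payment: float,
                    annual_rate: float, years: int = 30) -> float:
    """Fixed-rate mortgage payment: principal * r / (1 - (1 + r)^-n),
    where r is the monthly rate and n the number of monthly payments."""
    principal = price - down_payment
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

# Example: a hypothetical $400,000 home at 6.5%, comparing down payments.
for pct in (0.10, 0.20):
    pmt = monthly_payment(400_000, 400_000 * pct, 0.065)
    print(f"{pct:.0%} down: ${pmt:,.0f}/mo")  # 10% down comes to about $2,275/mo
```

Rerunning the loop with different rates or prices shows how sensitive the monthly cost is to each input, which is exactly the kind of scenario testing described above.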

Although prices are anticipated to fall in 2024, housing costs remain high, and a down payment can be a big obstacle for buyers. Recent research shows that the typical down payment on a home reached a record high of $30,000. To make it easier to put together a down payment, shoppers can access information about down payment assistance options at Realtor.com/fairhousing and in the monthly payment section of home listing pages. Furthermore, home shoppers can explore loan products geared toward helping families access homeownership by enabling down payments as low as 3.5% in the case of FHA loans and 0% in the case of VA loans.

What will the market be like for home sellers?

Home sellers are likely to face more competition from builders than from other sellers in 2024. Builders are continuing to maintain supply and to adapt to market conditions: they are increasingly focused on lower-priced homes and are willing to make price adjustments when needed. As a result, potential sellers will want to consider the landscape for new-construction housing in their markets, and any implications for pricing and marketing, before listing their home for sale.

What will the market be like for renters?

In 2024, renting is expected to continue to be a more cost-effective option than buying in the short term even though we anticipate the advantage for renting to diminish as home prices and mortgage rates decline. 

However, for those considering the pursuit of long-term equity through homeownership, it's essential not only to stay alert to market trends but also to carefully consider how long they intend to stay in their next home. When home prices rise rapidly, as they did during the pandemic, the higher cost of purchasing a home may break even with the cost of renting in as little as 3 years. Generally, it takes longer to reach the breakeven point, typically 5 to 7 years. Importantly, when home prices are falling and rents are also declining, as is expected in 2024, it can take longer still to recoup the higher costs of buying. Individuals using Realtor.com's Rent vs. Buy Calculator can evaluate the costs and benefits of renting versus buying over time and see how many years current market trends suggest it will take before buying is the better financial decision. This tool provides insights tailored to a household's specific rent-versus-buy decision and empowers consumers to consider not only the optimal choice for the current month but also how the trade-offs evolve over several years.
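
The breakeven logic described above can be illustrated with a toy model: buying carries large upfront costs but builds equity, while renting has a lower entry cost, so at some point the cumulative cost of owning (net of equity) falls below cumulative rent. Every input here (appreciation, rent growth, ownership and selling costs) is a hypothetical assumption for illustration, not Realtor.com's calculator methodology:

```python
def breakeven_year(price: float, monthly_rent: float, rate: float = 0.065,
                   down: float = 0.10, closing: float = 0.03,
                   appreciation: float = 0.03, rent_growth: float = 0.03,
                   own_cost: float = 0.02, sell_cost: float = 0.06,
                   years: int = 30):
    """First year when cumulative owning cost, net of equity after selling
    costs, falls below cumulative rent paid. All defaults are illustrative."""
    r, n = rate / 12, years * 12
    balance = price * (1 - down)
    payment = balance * r / (1 - (1 + r) ** -n)   # fixed monthly payment
    home_value = price
    buy_cost = price * (down + closing)           # upfront outlay
    rent_cost, rent = 0.0, monthly_rent
    for year in range(1, years + 1):
        for _ in range(12):
            interest = balance * r
            balance -= payment - interest          # amortize principal
            buy_cost += payment
            rent_cost += rent
        buy_cost += home_value * own_cost          # taxes, insurance, upkeep
        home_value *= 1 + appreciation
        rent *= 1 + rent_growth
        equity = home_value * (1 - sell_cost) - balance
        if buy_cost - equity < rent_cost:          # owning now cheaper on net
            return year
    return None

# Hypothetical example: $400,000 home vs. $2,000/month rent.
print(breakeven_year(400_000, 2_000))
```

With these illustrative inputs the breakeven lands in the 5-to-7-year range mentioned above; lowering assumed appreciation or raising selling costs pushes it out further, which mirrors the article's point about falling prices lengthening the breakeven horizon.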

Local Market Predictions:

All real estate is local and while the national trends are instructive, what matters most is what’s expected in your local market. 

Sign up for updates

Join our mailing list to receive the latest data and research.
