
Data Collection | Definition, Methods & Examples

Published on June 5, 2020 by Pritha Bhandari. Revised on June 21, 2023.

Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem.

While methods and aims may differ between fields, the overall process of data collection remains largely the same. Before you begin collecting data, you need to consider:

  • The aim of the research
  • The type of data that you will collect
  • The methods and procedures you will use to collect, store, and process the data

To collect high-quality data that is relevant to your purposes, follow these four steps.

Table of contents

  • Step 1: Define the aim of your research
  • Step 2: Choose your data collection method
  • Step 3: Plan your data collection procedures
  • Step 4: Collect the data
  • Other interesting articles
  • Frequently asked questions about data collection

Step 1: Define the aim of your research

Before you start the process of data collection, you need to identify exactly what you want to achieve. You can start by writing a problem statement: what is the practical or scientific issue that you want to address, and why does it matter?

Next, formulate one or more research questions that precisely define what you want to find out. Depending on your research questions, you might need to collect quantitative or qualitative data:

  • Quantitative data is expressed in numbers and graphs and is analyzed through statistical methods.
  • Qualitative data is expressed in words and analyzed through interpretations and categorizations.

If your aim is to test a hypothesis, measure something precisely, or gain large-scale statistical insights, collect quantitative data. If your aim is to explore ideas, understand experiences, or gain detailed insights into a specific context, collect qualitative data. If you have several aims, you can use a mixed methods approach that collects both types of data.

For example, suppose you are researching employee perceptions of management at your organization:

  • Your first aim is to assess whether there are significant differences in perceptions of managers across different departments and office locations.
  • Your second aim is to gather meaningful feedback from employees to explore new ideas for how managers can improve.


Step 2: Choose your data collection method

Based on the data you want to collect, decide which method is best suited for your research.

  • Experimental research is primarily a quantitative method.
  • Interviews, focus groups, and ethnographies are qualitative methods.
  • Surveys, observations, archival research, and secondary data collection can be quantitative or qualitative methods.

Carefully consider what method you will use to gather data that helps you directly answer your research questions.

Step 3: Plan your data collection procedures

When you know which method(s) you are using, you need to plan exactly how you will implement them. What procedures will you follow to make accurate observations or measurements of the variables you are interested in?

For instance, if you’re conducting surveys or interviews, decide what form the questions will take; if you’re conducting an experiment, make decisions about your experimental design (e.g., determine inclusion and exclusion criteria).

Operationalization

Sometimes your variables can be measured directly: for example, you can collect data on the average age of employees simply by asking for dates of birth. However, often you’ll be interested in collecting data on more abstract concepts or variables that can’t be directly observed.

Operationalization means turning abstract conceptual ideas into measurable observations. When planning how you will collect data, you need to translate the conceptual definition of what you want to study into the operational definition of what you will actually measure.

For example, to operationalize the concept of leadership quality:

  • You ask managers to rate their own leadership skills on 5-point scales assessing the ability to delegate, decisiveness, and dependability.
  • You ask their direct employees to provide anonymous feedback on the managers regarding the same topics.

You may need to develop a sampling plan to obtain data systematically. This involves defining a population, the group you want to draw conclusions about, and a sample, the group you will actually collect data from.

Your sampling method will determine how you recruit participants or obtain measurements for your study. To decide on a sampling method you will need to consider factors like the required sample size, accessibility of the sample, and timeframe of the data collection.
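
The following minimal sketch illustrates the distinction between a population and a sample by drawing a simple random sample from a sampling frame; the employee list and sample size are hypothetical, and a real study would justify the sample size (e.g., with a power analysis):

```python
import random

# Hypothetical sampling frame: the population you want to draw
# conclusions about (here, 500 employees).
population = [f"employee_{i}" for i in range(1, 501)]

# The sample: the group you will actually collect data from.
# Fixing the seed makes the draw reproducible for documentation.
random.seed(42)
sample = random.sample(population, k=50)  # simple random sample of 50

print(f"Population size: {len(population)}, sample size: {len(sample)}")
```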

Standardizing procedures

If multiple researchers are involved, write a detailed manual to standardize data collection procedures in your study.

This means laying out specific step-by-step instructions so that everyone in your research team collects data in a consistent way – for example, by conducting experiments under the same conditions and using objective criteria to record and categorize observations. This helps you avoid common research biases like omitted variable bias or information bias .

This helps ensure the reliability of your data, and you can also use it to replicate the study in the future.

Creating a data management plan

Before beginning data collection, you should also decide how you will organize and store your data.

  • If you are collecting data from people, you will likely need to anonymize and safeguard the data to prevent leaks of sensitive information (e.g. names or identity numbers); a sketch of one anonymization approach follows this list.
  • If you are collecting data via interviews or pencil-and-paper formats, you will need to perform transcriptions or data entry in systematic ways to minimize distortion.
  • You can prevent loss of data by having an organization system that is routinely backed up.
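
As a sketch of the anonymization point flagged above (one illustrative technique, not a complete data-protection scheme), direct identifiers can be replaced with keyed pseudonyms so that raw names never appear in the analysis files; the field names and key here are hypothetical:

```python
import hashlib
import hmac

# Hypothetical secret key; store it separately from the data, since
# anyone holding it could re-derive pseudonyms from known names.
SECRET_KEY = b"replace-with-a-long-random-secret"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier (e.g., a name) with a stable pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:12]

record = {"name": "Jane Doe", "department": "Sales", "rating": 4}
anonymized = {**record, "name": pseudonymize(record["name"])}
print(anonymized)  # name is now a pseudonym; other fields are untouched
```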

Step 4: Collect the data

Finally, you can implement your chosen methods to measure or observe the variables you are interested in.

For example, closed-ended survey questions might ask participants to rate their manager’s leadership skills on scales from 1–5. The data produced is numerical and can be statistically analyzed for averages and patterns.
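
As a minimal sketch of how such closed-ended ratings might be analyzed (the response values are invented for illustration):

```python
from statistics import mean, stdev

# Hypothetical 1-5 ratings of one manager's ability to delegate,
# collected from ten employees via a closed-ended survey question.
ratings = [4, 3, 5, 4, 2, 4, 3, 5, 4, 4]

print(f"Mean rating: {mean(ratings):.2f}")      # central tendency: 3.80
print(f"Std. deviation: {stdev(ratings):.2f}")  # spread of responses
```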

To ensure that high-quality data is recorded in a systematic way, here are some best practices:

  • Record all relevant information as and when you obtain data. For example, note down whether or how lab equipment is recalibrated during an experimental study.
  • Double-check manual data entry for errors.
  • If you collect quantitative data, you can assess the reliability and validity to get an indication of your data quality.
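
For the last point above, one common indicator of reliability, specifically the internal consistency of related survey items, is Cronbach’s alpha: alpha = (k / (k - 1)) * (1 - (sum of item variances) / (variance of total scores)), where k is the number of items. Below is a minimal sketch with invented ratings; it is an illustration, not a full psychometric analysis:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Internal-consistency reliability for k survey items on the same scale.

    `items` is a list of k lists, each holding one item's scores
    across all respondents.
    """
    k = len(items)
    item_variances = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_variances / pvariance(totals))

# Hypothetical data: three leadership items (delegation, decisiveness,
# dependability), each rated 1-5 by the same six respondents.
delegation    = [4, 3, 5, 4, 2, 4]
decisiveness  = [4, 2, 5, 5, 2, 3]
dependability = [5, 3, 4, 4, 1, 4]

print(f"alpha = {cronbach_alpha([delegation, decisiveness, dependability]):.2f}")  # ~0.90
```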


Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic

Frequently asked questions about data collection

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
  • You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods )

However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data, it’s important to consider how you will operationalize the variables that you want to measure.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.



Chapter 10. Introduction to Data Collection Techniques

Introduction

Now that we have discussed various aspects of qualitative research, we can begin to collect data. This chapter serves as a bridge between the first half and second half of this textbook (and perhaps your course) by introducing techniques of data collection. You’ve already been introduced to some of this because qualitative research is often characterized by the form of data collection; for example, an ethnographic study is one that employs primarily observational data collection for the purpose of documenting and presenting a particular culture or ethnos. Thus, some of this chapter will operate as a review of material already covered, but we will be approaching it from the data-collection side rather than the tradition-of-inquiry side we explored in chapters 2 and 4.

Revisiting Approaches

There are four primary techniques of data collection used in qualitative research: interviews, focus groups, observations, and document review. [1] There are other available techniques, such as visual analysis (e.g., photo elicitation) and biography (e.g., autoethnography) that are sometimes used independently or supplementarily to one of the main forms. Not to confuse you unduly, but these various data collection techniques are employed differently by different qualitative research traditions so that sometimes the technique and the tradition become inextricably entwined. This is largely the case with observations and ethnography. The ethnographic tradition is fundamentally based on observational techniques. At the same time, traditions other than ethnography also employ observational techniques, so it is worthwhile thinking of “tradition” and “technique” separately (see figure 10.1).

Figure 10.1. Data Collection Techniques

Each of these data collection techniques will be the subject of its own chapter in the second half of this textbook. This chapter serves as an orienting overview and as the bridge between the conceptual/design portion of qualitative research and the actual practice of conducting qualitative research.

Overview of the Four Primary Approaches

Interviews are at the heart of qualitative research. Returning to epistemological foundations, it is during the interview that the researcher truly opens herself to hearing what others have to say, encouraging her interview subjects to reflect deeply on the meanings and values they hold. Interviews are used in almost every qualitative tradition but are particularly salient in phenomenological studies, studies seeking to understand the meaning of people’s lived experiences.

Focus groups can be seen as a type of interview, one in which a group of persons (ideally between five and twelve) is asked a series of questions focused on a particular topic or subject. They are sometimes used as the primary form of data collection, especially outside academic research. For example, businesses often employ focus groups to determine if a particular product is likely to sell. Among qualitative researchers, it is often used in conjunction with any other primary data collection technique as a form of “triangulation,” or a way of increasing the reliability of the study by getting at the object of study from multiple directions. [2] Some traditions, such as feminist approaches, also see the focus group as an important “consciousness-raising” tool.

If interviews are at the heart of qualitative research, observations are its lifeblood. Researchers who are more interested in the practices and behaviors of people than what they think or who are trying to understand the parameters of an organizational culture rely on observations as their primary form of data collection. The notes they make “in the field” (either during observations or afterward) form the “data” that will be analyzed. Ethnographers, those seeking to describe a particular ethnos, or culture, believe that observations are more reliable guides to that culture than what people have to say about it. Observations are thus the primary form of data collection for ethnographers, albeit often supplemented with in-depth interviews.

Some would say that these three—interviews, focus groups, and observations—are really the foundational techniques of data collection. They are far and away the three techniques most frequently used separately, in conjunction with one another, and even sometimes in mixed methods qualitative/quantitative studies. Document review, either as a form of content analysis or separately, however, is an important addition to the qualitative researcher’s toolkit and should not be overlooked (figure 10.1). Although it is rare for a qualitative researcher to make document review their primary or sole form of data collection, including documents in the research design can help expand the reach and the reliability of a study. Document review can take many forms, from historical and archival research, in which the researcher pieces together a narrative of the past by finding and analyzing a variety of “documents” and records (including photographs and physical artifacts), to analyses of contemporary media content, as in the case of compiling and coding blog posts or other online commentaries, and content analysis that identifies and describes communicative aspects of media or documents.


In addition to these four major techniques, there are a host of emerging and incidental data collection techniques, from photo elicitation or photo voice, in which respondents are asked to comment upon a photograph or image (particularly useful as a supplement to interviews when the respondents are hesitant or unable to answer direct questions), to autoethnographies, in which the researcher uses his own position and life to increase our understanding about a phenomenon and its historical and social context.

Taken together, these techniques provide a wide range of practices and tools with which to discover the world. They are particularly suited to addressing the questions that qualitative researchers ask—questions about how things happen and why people act the way they do, given particular social contexts and shared meanings about the world (chapter 4).

Triangulation and Mixed Methods

Because the researcher plays such a large and nonneutral role in qualitative research, one that requires constant reflexivity and awareness (chapter 6), there is a constant need to reassure her audience that the results she finds are reliable. Quantitative researchers can point to any number of measures of statistical significance to reassure their audiences, but qualitative researchers do not have math to hide behind. And she will also want to reassure herself that what she is hearing in her interviews or observing in the field is a true reflection of what is going on (or as “true” as possible, given the problem that the world is as large and varied as the elephant; see chapter 3). For those reasons, it is common for researchers to employ more than one data collection technique or to include multiple and comparative populations, settings, and samples in the research design (chapter 2). A single set of interviews or initial comparison of focus groups might be conceived as a “pilot study” from which to launch the actual study. Undergraduate students working on a research project might be advised to think about their projects in this way as well. You are simply not going to have enough time or resources as an undergraduate to construct and complete a successful qualitative research project, but you may be able to tackle a pilot study. Graduate students also need to think about the amount of time and resources they have for completing a full study. Master’s-level students, or students who have one year or less in which to complete a program, should probably consider their study as an initial exploratory pilot. PhD candidates might have the time and resources to devote to the type of triangulated, multifaceted research design called for by the research question.

We call the use of multiple qualitative methods of data collection and the inclusion of multiple and comparative populations and settings “triangulation.” Using different data collection methods allows us to check the consistency of our findings. For example, a study of the vaccine hesitant might include a set of interviews with vaccine-hesitant people and a focus group of the same and a content analysis of online comments about a vaccine mandate. By employing all three methods, we can be more confident of our interpretations from the interviews alone (especially if we are hearing the same thing throughout; if we are not, then this is a good sign that we need to push a little further to find out what is really going on). [3] Methodological triangulation is an important tool for increasing the reliability of our findings and the overall success of our research.

Methodological triangulation should not be confused with mixed methods techniques, which refer instead to the combining of qualitative and quantitative research methods. Mixed methods studies can increase reliability, but that is not their primary purpose. Mixed methods address multiple research questions, both the “how many” and “why” kind, or the causal and explanatory kind. Mixed methods will be discussed in more detail in chapter 15.

Let us return to the three examples of qualitative research described in chapter 1: Cory Abramson’s study of aging (The End Game), Jennifer Pierce’s study of lawyers and discrimination (Racing for Innocence), and my own study of liberal arts college students (Amplified Advantage). Each of these studies uses triangulation.

Abramson’s book is primarily based on three years of observations in four distinct neighborhoods. He chose the neighborhoods in such a way as to maximize his ability to make comparisons: two were primarily middle class and two were primarily poor; further, within each set, one was predominantly White, while the other was either racially diverse or primarily African American. In each neighborhood, he was present in senior centers, doctors’ offices, public transportation, and other public spots where the elderly congregated. [4] The observations are the core of the book, and they are richly written and described in very moving passages. But it wasn’t enough for him to watch the seniors. He also engaged with them in casual conversation. That, too, is part of fieldwork. He sometimes even helped them make it to the doctor’s office or get around town. Going beyond these interactions, he also interviewed sixty seniors, an equal number from each of the four neighborhoods. It was in the interviews that he could ask more detailed questions about their lives, what they thought about aging, what it meant to them to be considered old, and what their hopes and frustrations were. He could see that those living in the poor neighborhoods had a more difficult time accessing care and resources than those living in the more affluent neighborhoods, but he couldn’t know how the seniors understood these difficulties without interviewing them. Both forms of data collection supported each other and helped make the study richer and more insightful. Interviews alone would have failed to demonstrate the very real differences he observed (and that some seniors would not even have known about). This is the value of methodological triangulation.

Pierce’s book relies on two separate forms of data collection—interviews with lawyers at a firm that has experienced a history of racial discrimination and content analyses of news stories and popular films that screened during the same years of the alleged racial discrimination. I’ve used this book when teaching methods and have often found students struggle with understanding why these two forms of data collection were used. I think this is because we don’t teach students to appreciate or recognize “popular films” as a legitimate form of data. But what Pierce does is interesting and insightful in the best tradition of qualitative research. Here is a description of the content analyses from a review of her book:

In the chapter on the news media, Professor Pierce uses content analysis to argue that the media not only helped shape the meaning of affirmative action, but also helped create white males as a class of victims. The overall narrative that emerged from these media accounts was one of white male innocence and victimization. She also maintains that this narrative was used to support “neoconservative and neoliberal political agendas” (p. 21). The focus of these articles tended to be that affirmative action hurt white working-class and middle-class men particularly during the recession in the 1980s (despite statistical evidence that people of color were hurt far more than white males by the recession). In these stories fairness and innocence were seen in purely individual terms. Although there were stories that supported affirmative action and developed a broader understanding of fairness, the total number of stories slanted against affirmative action from 1990 to 1999. During that time period negative stories always outnumbered those supporting the policy, usually by a ratio of 3:1 or 3:2. Headlines, the presentation of polling data, and an emphasis in stories on racial division, Pierce argues, reinforced the story of white male victimization. Interestingly, the news media did very few stories on gender and affirmative action. The chapter on the film industry from 1989 to 1999 reinforces Pierce’s argument and adds another layer to her interpretation of affirmative action during this time period. She sampled almost 60 Hollywood films with receipts ranging from four million to 184 million dollars. In this chapter she argues that the dominant theme of these films was racial progress and the redemption of white Americans from past racism. These movies usually portrayed white, elite, and male experiences. People of color were background figures who supported the protagonist and “anointed” him as a savior (p. 45). Over the course of the film the protagonists move from “innocence to consciousness” concerning racism. The antagonists in these films most often were racist working-class white men. A Time to Kill , Mississippi Burning , Amistad , Ghosts of Mississippi , The Long Walk Home , To Kill a Mockingbird , and Dances with Wolves receive particular analysis in this chapter, and her examination of them leads Pierce to conclude that they infused a myth of racial progress into America’s cultural memory. White experiences of race are the focus and contemporary forms of racism are underplayed or omitted. Further, these films stereotype both working-class and elite white males, and underscore the neoliberal emphasis on individualism. ( Hrezo 2012 )

With that context in place, Pierce then turned to interviews with attorneys. She finds that White male attorneys often misremembered facts about the period in which the law firm was accused of racial discrimination and that they often portrayed their firms as having made substantial racial progress. This was in contrast to many of the lawyers of color and female lawyers who remembered the history differently and who saw continuing examples of racial (and gender) discrimination at the law firm. In most of the interviews, people talked about individuals, not structure (and these are attorneys, who really should know better!). By including both content analyses and interviews in her study, Pierce is better able to situate the attorney narratives and explain the larger context for the shared meanings of individual innocence and racial progress. Had this been a study only of films during this period, we would not know how actual people who lived during this period understood the decisions they made; had we had only the interviews, we would have missed the historical context and seen a lot of these interviewees as, well, not very nice people at all. Together, we have a study that is original, inventive, and insightful.

My own study of how class background affects the experiences and outcomes of students at small liberal arts colleges relies on mixed methods and triangulation. At the core of the book is an original survey of college students across the US. From analyses of this survey, I can present findings on “how many” questions and descriptive statistics comparing students of different social class backgrounds. For example, I know and can demonstrate that working-class college students are less likely to go to graduate school after college than upper-class college students are. I can even give you some estimates of the class gap. But what I can’t tell you from the survey is exactly why this is so or how it came to be so . For that, I employ interviews, focus groups, document reviews, and observations. Basically, I threw the kitchen sink at the “problem” of class reproduction and higher education (i.e., Does college reduce class inequalities or make them worse?). A review of historical documents provides a picture of the place of the small liberal arts college in the broader social and historical context. Who had access to these colleges and for what purpose have always been in contest, with some groups attempting to exclude others from opportunities for advancement. What it means to choose a small liberal arts college in the early twenty-first century is thus different for those whose parents are college professors, for those whose parents have a great deal of money, and for those who are the first in their family to attend college. I was able to get at these different understandings through interviews and focus groups and to further delineate the culture of these colleges by careful observation (and my own participation in them, as both former student and current professor). Putting together individual meanings, student dispositions, organizational culture, and historical context allowed me to present a story of how exactly colleges can both help advance first-generation, low-income, working-class college students and simultaneously amplify the preexisting advantages of their peers. Mixed methods addressed multiple research questions, while triangulation allowed for this deeper, more complex story to emerge.

In the next few chapters, we will explore each of the primary data collection techniques in much more detail. As we do so, think about how these techniques may be productively joined for more reliable and deeper studies of the social world.

Advanced Reading: Triangulation

Denzin ( 1978 ) identified four basic types of triangulation: data, investigator, theory, and methodological. Properly speaking, if we use the Denzin typology, the use of multiple methods of data collection and analysis to strengthen one’s study is really a form of methodological triangulation. It may be helpful to understand how this differs from the other types.

Data triangulation occurs when the researcher uses a variety of sources in a single study. Perhaps they are interviewing multiple samples of college students. Obviously, this overlaps with sample selection (see chapter 5). It is helpful for the researcher to understand that these multiple data sources add strength and reliability to the study. After all, it is not just “these students here” but also “those students over there” that are experiencing this phenomenon in a particular way.

Investigator triangulation occurs when different researchers or evaluators are part of the research team. Intercoding reliability is a form of investigator triangulation (or at least a way of leveraging the power of multiple researchers to raise the reliability of the study).

Theory triangulation is the use of multiple perspectives to interpret a single set of data, as in the case of competing theoretical paradigms (e.g., a human capital approach vs. a Bourdieusian multiple capital approach).

Methodological triangulation , as explained in this chapter, is the use of multiple methods to study a single phenomenon, issue, or problem.

Further Readings

Carter, Nancy, Denise Bryant-Lukosius, Alba DiCenso, Jennifer Blythe, and Alan J. Neville. 2014. “The Use of Triangulation in Qualitative Research.” Oncology Nursing Forum 41(5):545–547. Discusses the four types of triangulation identified by Denzin, with an example of the use of focus groups and in-depth individual interviews.

Mathison, Sandra. 1988. “Why Triangulate?” Educational Researcher 17(2):13–17. Presents three particular ways of assessing validity through the use of triangulated data collection: convergence, inconsistency, and contradiction.

Tracy, Sarah J. 2010. “Qualitative Quality: Eight ‘Big-Tent’ Criteria for Excellent Qualitative Research.” Qualitative Inquiry 16(10):837–851. Focuses on triangulation as a criterion for conducting valid qualitative research.

  1. Marshall and Rossman (2016) state this slightly differently. They list four primary methods for gathering information: (1) participating in the setting, (2) observing directly, (3) interviewing in depth, and (4) analyzing documents and material culture (141). An astute reader will note that I have collapsed participation into observation and that I have distinguished focus groups from interviews. I suspect that this distinction marks me as more of an interview-based researcher, while Marshall and Rossman prioritize ethnographic approaches. The main point of this footnote is to show you, the reader, that there is no single agreed-upon number of approaches to collecting qualitative data.
  2. See “Advanced Reading: Triangulation” at the end of this chapter.
  3. We can also think about triangulating the sources, as when we include comparison groups in our sample (e.g., if we include those receiving vaccines, we might find out a bit more about where the real differences lie between them and the vaccine hesitant); triangulating the analysts (building a research team so that your interpretations can be checked against those of others on the team); and even triangulating the theoretical perspective (as when we “try on,” say, different conceptualizations of social capital in our analyses).

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


Data Collection Methods: Sources & Examples


Data is a collection of facts, figures, objects, symbols, and events gathered from different sources. Organizations collect data using various data collection methods to make better decisions. Without data, it would be difficult for organizations to make appropriate decisions, so data is collected from different audiences at various points in time.

For example, an organization must collect data on product demand, customer preferences, and competitors before launching a new product. If data is not collected beforehand, the organization’s newly launched product may fail for many reasons, such as low demand and an inability to meet customer needs.

Although data is a valuable asset for every organization, it does not serve any purpose until it is analyzed or processed to achieve the desired results.

What are Data Collection Methods?

Data collection methods are techniques and procedures for gathering information for research purposes. They can range from simple self-reported surveys to more complex experiments and can involve either quantitative or qualitative approaches.

Some common data collection methods include surveys, interviews, observations, focus groups, experiments, and secondary data analysis . The data collected through these methods can then be analyzed and used to support or refute research hypotheses and draw conclusions about the study’s subject matter.


Understanding Data Collection Methods

Data collection methods encompass a variety of techniques and tools for gathering both quantitative and qualitative data. These methods are integral to the data collection process, ensuring accurate and comprehensive data acquisition. 

Quantitative data collection methods involve systematic approaches to collecting numerical data, such as surveys, polls, and statistical analysis, aimed at quantifying phenomena and trends.

Conversely, qualitative data collection methods focus on capturing non-numerical information, such as interviews, focus groups, and observations, to delve deeper into understanding attitudes, behaviors, and motivations. 

Employing a combination of quantitative and qualitative data collection techniques can enrich an organization’s datasets and yield comprehensive insights into complex phenomena.

Effective utilization of accurate data collection tools and techniques enhances the accuracy and reliability of collected data, facilitating informed decision-making and strategic planning.

Importance of Data Collection Methods

Data collection methods play a crucial role in the research process, as they determine the quality and accuracy of the data collected. Here are some of the major reasons they matter.

  • Quality and Accuracy: The choice of data collection method directly impacts the quality and accuracy of the data obtained. Properly designed methods help ensure that the data collected is relevant to the research questions and free from errors.
  • Relevance, Validity, and Reliability: Effective data collection methods help ensure that the data collected is relevant to the research objectives, valid (measuring what it intends to measure), and reliable (consistent and reproducible).
  • Bias Reduction and Representativeness: Carefully chosen data collection methods can help minimize biases inherent in the research process, such as sampling bias or response bias. They also aid in achieving a representative sample, enhancing the findings’ generalizability.
  • Informed Decision Making: Accurate and reliable data collected through appropriate methods provide a solid foundation for making informed decisions based on research findings. This is crucial for both academic research and practical applications in various fields.
  • Achievement of Research Objectives: Data collection methods should align with the research objectives to ensure that the collected data effectively addresses the research questions or hypotheses. Properly collected data facilitates the attainment of these objectives.
  • Support for Validity and Reliability: Validity and reliability are essential aspects of research quality. The choice of data collection methods can either enhance or detract from the validity and reliability of research findings. Therefore, selecting appropriate methods is critical for ensuring the credibility of the research.

The importance of data collection methods cannot be overstated, as they play a key role in the research study’s overall success and internal validity.


Types of Data Collection Methods

The choice of data collection method depends on the research question being addressed, the type of data needed, and the resources and time available. Data collection methods can be categorized into primary and secondary methods.

1. Primary Data Collection Methods

Primary data is collected first-hand and has not been used in past research. The data gathered by primary data collection methods is highly accurate and specific to the research’s motive.

Primary data collection methods can be divided into two categories: quantitative methods and qualitative methods.

Quantitative Methods:

Quantitative techniques for market research and demand forecasting usually use statistical tools. In these techniques, demand is forecasted based on historical data. These methods of primary data collection are generally used to make long-term forecasts. Statistical analysis methods are highly reliable as subjectivity is minimal.


  • Time Series Analysis: A time series refers to a sequential order of values of a variable, known as a trend, at equal time intervals. Using patterns, an organization can predict the demand for its products and services over a projected time period. 
  • Smoothing Techniques: Smoothing techniques can be used in cases where the time series lacks significant trends. They eliminate random variation from the historical demand, helping identify patterns and demand levels to estimate future demand. The most common methods used in smoothing demand forecasting are the simple moving average and weighted moving average methods; a sketch of both appears after this list.
  • Barometric Method: Also known as the leading indicators approach, researchers use this method to speculate future trends based on current developments. When past events are considered to predict future events, they act as leading indicators.
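
Here is a minimal sketch of the two smoothing techniques named in the list above, using an invented monthly demand series; a real forecast would tune the window and weights against historical accuracy:

```python
# Hypothetical monthly demand for a product (units sold).
demand = [120, 132, 101, 134, 190, 170, 160, 178]

def simple_moving_average(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def weighted_moving_average(series, weights=(0.2, 0.3, 0.5)):
    """Like the simple moving average, but recent observations count more.

    Weights are applied oldest-to-newest and should sum to 1.
    """
    recent = series[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent))

print(f"Simple moving average forecast:   {simple_moving_average(demand):.1f}")    # 169.3
print(f"Weighted moving average forecast: {weighted_moving_average(demand):.1f}")  # 171.0
```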

Qualitative Methods:

Qualitative data collection methods are especially useful when historical data is unavailable or when numbers or mathematical calculations are unnecessary.

Qualitative research is closely associated with words, sounds, feelings, emotions, colors, and non-quantifiable elements. These techniques are based on experience, judgment, intuition, conjecture, emotion, etc.

Quantitative methods do not provide the motive behind participants’ responses, often don’t reach underrepresented populations, and require long periods of time to collect the data. Hence, it is best to combine quantitative methods with qualitative methods.

1. Surveys: Surveys collect data from the target audience and gather insights into their preferences, opinions, choices, and feedback related to their products and services. Most survey software offers a wide range of question types.

You can also use a ready-made survey template to save time and effort. Online surveys can be customized to match the business’s brand by changing the theme, logo, etc. They can be distributed through several channels, such as email, website, offline app, QR code, social media, etc. 

You can select the channel based on your audience’s type and source. Once the data is collected, survey software can generate various reports and run analytics algorithms to discover hidden insights. 

A survey dashboard can give you statistics related to response rate, completion rate, demographics-based filters, export and sharing options, etc. Integrating survey builders with third-party apps can maximize the effort spent on online real-time data collection . 

Practical business intelligence relies on the synergy between analytics and reporting, where analytics uncovers valuable insights and reporting communicates these findings to stakeholders.

2. Polls: Polls comprise a single question, which may be single-choice or multiple-choice. They are useful when you need to get a quick pulse of the audience’s sentiments. Because they are short, it is easier to get responses from people.

Like surveys, online polls can be embedded into various platforms. Once the respondents answer the question, they can also be shown how they compare to others’ responses.

3. Interviews: In face-to-face interviews, the interviewer asks a series of questions to the interviewee in person and notes down responses. If it is not feasible to meet the person, the interviewer can go for a telephone interview. 

This form of data collection is suitable for only a few respondents. It is too time-consuming and tedious to repeat the same process if there are many participants.

methods of data collection on research

4. Delphi Technique: In the Delphi method, market experts are provided with the estimates and assumptions of other industry experts’ forecasts. Experts may reconsider and revise their estimates and assumptions based on this information. The consensus of all experts on demand forecasts constitutes the final demand forecast.

5. Focus Groups: In a focus group, a small group of around 8–10 people discusses common areas of the research problem. Each individual provides his or her insights on the issue concerned.

A moderator regulates the discussion among the group members. At the end of the discussion, the group reaches a consensus.

6. Questionnaire: A questionnaire is a printed set of open-ended or closed-ended questions that respondents answer based on their knowledge and experience with the issue. A questionnaire is often the instrument used within a survey, but a questionnaire on its own does not necessarily constitute a survey.

2. Secondary Data Collection Methods

Secondary data is data that has already been used or published in the past. The researcher can obtain this data from sources both internal and external to the organization.

Internal sources of secondary data:

  • Organization’s health and safety records
  • Mission and vision statements
  • Financial statements
  • Sales reports
  • CRM software
  • Executive summaries

External sources of secondary data:

  • Government reports
  • Press releases
  • Business journals

Secondary data collection methods can also involve quantitative and qualitative techniques. Secondary data is easily available, and it is less time-consuming and less expensive to obtain than primary data. However, the authenticity of the data gathered cannot always be verified using these methods.

Regardless of the data collection method of your choice, there must be direct communication with decision-makers so that they understand and commit to acting according to the results.

For this reason, we must pay special attention to the analysis and presentation of the information obtained. Remember that the data must be useful and actionable, so the choice of data collection method has much to do with that.

How QuestionPro Can Help in Data Collection Methods

QuestionPro is a comprehensive online survey software platform that can greatly assist in various data collection methods. Here’s how it can help:

  • Survey Creation: QuestionPro offers a user-friendly interface for creating surveys with various question types, including multiple-choice, open-ended, Likert scale, and more. Researchers can customize surveys to fit their specific research needs and objectives.
  • Diverse Distribution Channels: The platform provides multiple channels for distributing surveys, including email, web links, social media, and embedding surveys on websites. This enables researchers to reach a wide audience and collect data efficiently.
  • Panel Management: QuestionPro offers panel management features, allowing researchers to create and manage panels of respondents for targeted data collection. This is particularly useful for longitudinal studies or when targeting specific demographics.
  • Data Analysis Tools: The platform includes robust data analysis tools that enable researchers to analyze survey responses in real time. Researchers can generate customizable reports, visualize data through charts and graphs, and identify trends and patterns within the data.
  • Data Security and Compliance: QuestionPro prioritizes data security and compliance with regulations such as GDPR and HIPAA. The platform offers features such as SSL encryption, data masking, and secure data storage to ensure the confidentiality and integrity of collected data.
  • Mobile Compatibility: With the increasing use of mobile devices, QuestionPro ensures that surveys are mobile-responsive, allowing respondents to participate in surveys conveniently from their smartphones or tablets.
  • Integration Capabilities: QuestionPro integrates with various third-party tools and platforms, including CRMs, email marketing software, and analytics tools. This allows researchers to streamline their data collection processes and incorporate survey data into their existing workflows.
  • Customization and Branding: Researchers can customize surveys with their branding elements, such as logos, colors, and themes, enhancing the professional appearance of surveys and increasing respondent engagement.

The conclusion you obtain from your investigation will set the course of the company’s decision-making, so present your report clearly, and list the steps you followed to obtain those results.

Make sure that whoever will take the corresponding actions understands the importance of the information collected and that it gives them the solutions they expect.

QuestionPro offers a comprehensive suite of features and tools that can significantly streamline the data collection process, from survey creation to analysis, while ensuring data security and compliance. Remember that at QuestionPro, we can help you collect data easily and efficiently. Request a demo and learn about all the tools we have for you.



Research-Methodology

Data Collection Methods

Data collection is a process of gathering information from all relevant sources to find answers to the research problem, test the hypothesis (if you are following a deductive approach), and evaluate the outcomes. Data collection methods can be divided into two categories: secondary methods of data collection and primary methods of data collection.

Secondary Data Collection Methods

Secondary data is a type of data that has already been published in books, newspapers, magazines, journals, online portals, etc. There is an abundance of data available in these sources about almost any research area in business studies. Therefore, applying an appropriate set of criteria to select the secondary data to be used in a study plays an important role in increasing the levels of research validity and reliability.

These criteria include, but are not limited to, the date of publication, the credentials of the author, the reliability of the source, the quality of the discussion, the depth of analysis, and the extent of the text’s contribution to the development of the research area. Secondary data collection is discussed in greater depth in the Literature Review chapter.

Secondary data collection methods offer a range of advantages, such as saving time, effort, and expense. However, they have a major disadvantage: secondary research does not contribute to the expansion of the literature by producing fresh (new) data.

Primary Data Collection Methods

Primary data is data that did not exist before your study; it comprises the unique findings of your research. Primary data collection and analysis typically require more time and effort than secondary research. Primary data collection methods can be divided into two groups: quantitative and qualitative.

Quantitative data collection methods are based on mathematical calculations in various formats. Methods of quantitative data collection and analysis include questionnaires with closed-ended questions, methods of correlation and regression, mean, mode, and median, and others.
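
As a quick sketch of the calculations named above, Python’s standard library covers most of them; the questionnaire scores below are invented for illustration:

```python
from statistics import mean, median, mode, correlation, linear_regression

# Hypothetical closed-ended questionnaire scores (1-5) from eight respondents.
satisfaction = [4, 5, 3, 4, 2, 5, 4, 3]
loyalty      = [5, 5, 3, 4, 1, 4, 4, 2]

print(mean(satisfaction))    # 3.75
print(median(satisfaction))  # 4.0
print(mode(satisfaction))    # 4 (most frequent score)

# Pearson correlation between the two variables (Python 3.10+).
print(round(correlation(satisfaction, loyalty), 2))

# Simple linear regression: loyalty ~ slope * satisfaction + intercept (Python 3.11+).
slope, intercept = linear_regression(satisfaction, loyalty)
print(round(slope, 2), round(intercept, 2))
```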

Quantitative methods are cheaper to apply, and they can be applied within a shorter duration of time compared to qualitative methods. Moreover, due to the high level of standardisation of quantitative methods, it is easy to make comparisons of findings.

Qualitative research methods, on the contrary, do not involve numbers or mathematical calculations. Qualitative research is closely associated with words, sounds, feelings, emotions, colours, and other elements that are non-quantifiable.

Qualitative studies aim to ensure a greater depth of understanding, and qualitative data collection methods include interviews, questionnaires with open-ended questions, focus groups, observation, games or role-playing, case studies, etc.

Your choice between quantitative or qualitative methods of data collection depends on the area of your research and the nature of research aims and objectives.

My e-book, The Ultimate Guide to Writing a Dissertation in Business Studies: A Step-by-Step Assistance, offers practical guidance to complete a dissertation with minimum or no stress. The e-book covers all stages of writing a dissertation, starting from the selection of the research area to submitting the completed version of the work within the deadline.

John Dudovskiy



Data Collection in Educational Research

by James H. McMillan and Laura P. Gogia. Last reviewed: 05 August 2020. Last modified: 30 June 2014. DOI: 10.1093/obo/9780199756810-0087

Introduction

Data collection methods in educational research are used to gather information that is then analyzed and interpreted. As such, data collection is a very important step in conducting research and can influence results significantly. Once the research question and sources of data are identified, appropriate methods of data collection are determined. Data collection includes a broad range of more specific techniques. Historically, much of the data collection performed in educational research depended on methods developed for studies in the field of psychology, a discipline which took what is termed a “quantitative” approach. This involves using instruments, scales, tests, and structured observation and interviewing. By the mid- to late twentieth century, other disciplines such as anthropology and sociology began to influence educational researchers. Forms of data collection broadened to include what is now called “qualitative” methods, with an emphasis on narratives, participant perspectives, and less structured observation and interviewing. As contemporary educational researchers also draw from fields such as business, political science, and medicine, data collection in education has become a multidisciplinary phenomenon. Because data collection is such a broad topic, general overviews that attempt to cover all or most techniques tend to offer introductory treatments. Few texts, however, provide comprehensive coverage of every data collection technique. Instead, some cover techniques appropriate for either quantitative or qualitative research approaches. Even more focus on one or two data collection methods within those two research contexts. Consequently, after presenting general overviews, this entry is categorized by data collection techniques appropriate for quantitative and qualitative research. These sections, in turn, are subdivided into the major types of quantitative and qualitative data collection techniques. While there are some data collection techniques specific to mixed method research design, which implies a combination of qualitative and quantitative research methodologies, these specific procedures are not emphasized in the present article; readers are referred to the Oxford Bibliographies article Mixed Methods Research by Nancy Leech for a comprehensive treatment of mixed method data collection techniques. To locate sources for this article, extensive searches were performed using general-use Internet search engines and educational, psychological, and social science research databases. These searches included keywords around data collection and research methods, as well as specific data collection techniques such as surveys, tests, focus groups, and observation. Frequently cited texts and articles, most recent editions at the time, and sources specific to educational research were given priority. Once these sources were identified, their suggested readings and reference lists were mined for other potential sources. Works or scholars found in multiple reference lists were investigated. When applicable, book reviews in peer-reviewed journals were located and taken into account when curating sources. Sources that demonstrated a high level of impact or offered unique coverage of the topic were included.

General educational research overviews typically include several chapters on data collection, organized into qualitative and quantitative approaches. As a rule, they are updated frequently so that they offer timely discussions of methodological trends. Most of them are introductory in nature, written for student researchers. Because of the influence of psychology and other social sciences on the development of data collection in educational research, representative works from psychology (Trochim 2006) and the general social sciences (Robson 2011) are included. Available online, Trochim 2006 is a reader-friendly introduction that provides succinct explanations of most quantitative and qualitative approaches. Olsen 2012 is helpful in showing how data collection techniques used in other disciplines have implications for educational studies. Specific to education, Gall, et al. 2007 is a frequently cited text that covers most educational data collection techniques, although it tends to emphasize more traditional quantitative approaches. Johnson and Christensen 2014 offers a more balanced treatment meant for novice researchers and educational research consumers. Cohen, et al. 2011 also provides a balanced approach, but from a British perspective. Fielding, et al. 2008 offers practical advice on recently developed forms of online data collection, with special attention given to the ethical ramifications of Internet-based data collection. Finally, Arthur, et al. 2012 is unique in this section in that it is an edited work offering short overviews of data collection techniques authored by contemporary leading experts.

Arthur, James, Michael Waring, Robert Coe, and Larry Hedges, eds. 2012. Research methods and methodologies in education . London: SAGE.

A diverse edited text discussing trends in study designs, data collection, and data analysis. It includes twelve chapters devoted to different forms of data collection, written by authors who have recently published extensively on the topic. Annotated bibliographies found at the end of each chapter provide guidance for further reading.

Cohen, Louis, Lawrence Manion, and Keith Morrison. 2011. Research methods in education . 7th ed. London: Routledge.

This long-running, bestselling, comprehensive source offers practical advice with clear theoretical foundations. The newest edition has undergone significant revision. Specific to data collection, revisions include new chapters devoted to data collection via the Internet and visual media. Slides highlighting main points are available on a supplementary website.

Fielding, Nigel, Raymond Lee, and Grant Blank. 2008. The SAGE handbook of online research methods . Thousand Oaks, CA: SAGE.

This extensive handbook presents chapters on Internet research design and data collection written by leading scholars in the field. It discusses using the Internet as an archival resource and a research tool, focusing on the most recent trends in multidisciplinary Internet research.

Gall, Meredith, Joyce Gall, and Walter Borg. 2007. Educational research: An introduction . 8th ed. White Plains, NY: Pearson.

A long-standing, well-respected, nuts-and-bolts perspective on data collection meant to prepare students for conducting original research. Although it tends to emphasize quantitative research methodologies, it has a uniquely rich chapter on historical document analysis.

Johnson, Burke, and Larry Christensen. 2014. Educational research: Quantitative, qualitative, and mixed approaches . 5th ed. Thousand Oaks, CA: SAGE.

A comprehensive introductory text for the consumer and the would-be researcher, with extensive lists of additional resources for gathering all types of data. It discusses quantitative and qualitative research methodologies and data collection evenly but provides extended coverage of questionnaire construction.

Olsen, Wendy. 2012. Data collection: Key debates and methods in social research . London: SAGE.

This recently published toolkit of quantitative, qualitative, and mixed method approaches to data collection provides a more contemporary introduction for both students and research professionals. It offers a helpful overview of data collection as an integral part of research in several different fields of study.

Robson, Colin. 2011. Real world research: A resource for users of social research methods in applied settings . West Sussex, UK: Wiley.

This introductory text is intended for researchers across the social sciences. A separate section of the book places an applied, integrated emphasis on contemporary quantitative and qualitative data collection techniques, including individual and focus group observations, surveys, unstructured and structured interviewing, and tests.

Trochim, William. 2006. Research methods knowledge base .

A free online hypertext textbook on applied social research methods. Data collection techniques associated with qualitative and quantitative research are covered comprehensively. Foundational information appropriate for undergraduates and early graduate students is presented through a series of easy-to-navigate and intuitively ordered webpages. A print edition, co-written with James Donnelly, is available for purchase (Atomic Dog/Cengage Learning, 2008).


Data Collection: Key Debates and Methods in Social Research

  • By: Wendy Olsen
  • Publisher: SAGE Publications Ltd
  • Publication year: 2012
  • Online pub date: December 22, 2014
  • Discipline: Anthropology
  • Methods: Survey research, Qualitative data collection, Data collection
  • DOI: https://doi.org/10.4135/9781473914230
  • Keywords: attitudes, debates, knowledge, law, population, software, surveying
  • Print ISBN: 9781847872562
  • Online ISBN: 9781473914230


This innovative book for students and researchers alike gives an indispensable introduction to key issues and practical methods needed for data collection.

It uses clear definitions, relevant interdisciplinary examples from around the world and up-to-date suggestions for further reading to demonstrate how to gather and use qualitative, quantitative, and mixed data sets.

The book is divided into seven distinct parts, encouraging researchers to combine methods of data collection:

  • Data Collection: An Introduction to Research Practices
  • Collecting Qualitative Data
  • Observation and Informed Methods
  • Experimental and Systematic Data Collection
  • Survey Methods for Data Collection
  • The Case-Study Method of Data Collection
  • Concluding Suggestions for Data-Collection Concepts

A stimulating, practical guide which can be read as individual concepts from the methods toolkit, or as a whole, this text is an important resource for students and research professionals.

Front Matter

  • About the Author
  • Introduction
  • Chapter 1.1 | Research and Data Collection
  • Chapter 1.2 | Findings
  • Chapter 1.3 | Data
  • Chapter 1.4 | Causes
  • Chapter 1.5 | Sampling
  • Chapter 2.1 | Interviews
  • Chapter 2.2 | Transcripts
  • Chapter 2.3 | Coding
  • Chapter 2.4 | Meaning
  • Chapter 2.5 | Interpretation
  • Chapter 2.6 | Observer Bias
  • Chapter 2.7 | Representations
  • Chapter 2.8 | Focus Groups
  • Chapter 2.9 | Document Analysis
  • Chapter 2.10 | Accuracy
  • Chapter 2.11 | Ethical Clearance
  • Chapter 3.1 | Participation
  • Chapter 3.2 | Praxis
  • Chapter 3.3 | Action Research
  • Chapter 3.4 | Observation Methods
  • Chapter 3.5 | Online Data Collection
  • Chapter 4.1 | Questionnaire Design
  • Chapter 4.2 | Handling Treatment Data
  • Chapter 4.3 | The Ethics of Volunteers
  • Chapter 4.4 | Market-Research Techniques
  • Chapter 4.5 | Creating Systematic Case-Study Data
  • Chapter 5.1 | Operationalisation
  • Chapter 5.2 | Measurement
  • Chapter 5.3 | Causality
  • Chapter 5.4 | Data Cleaning
  • Chapter 5.5 | Data Extraction
  • Chapter 5.6 | Outliers
  • Chapter 5.7 | Subsetting of Data
  • Chapter 5.8 | Survey Weights
  • Chapter 6.1 | Case-Study Research
  • Chapter 6.2 | Comparative Research
  • Chapter 6.3 | Configurations
  • Chapter 6.4 | Contingency
  • Chapter 6.5 | Causal Mechanisms
  • Chapter 7.1 | Facts
  • Chapter 7.2 | Reality
  • Chapter 7.3 | Retroduction

Back Matter


What is Data Collection? Definition, Types, Tools, and Techniques

Data collection is the process of gathering and analyzing accurate data from various sources to answer research problems, identify trends and probabilities, and evaluate possible outcomes. Knowledge is power, information is knowledge, and data is information in digitized form, at least as defined in IT. Hence, data is power. But before you can leverage that data into a successful strategy for your organization or business, you need to gather it. That’s your first step.

So, to help you get the process started, we shine a spotlight on data collection. What exactly is it? Believe it or not, it’s more than just doing a Google search! Furthermore, what are the different types of data collection? And what kinds of data collection tools and data collection techniques exist?

If you want to get up to speed on the data collection process, you’ve come to the right place.


What Is Data Collection?

Data collection is the process of collecting and evaluating information or data from multiple sources to find answers to research problems, answer questions, evaluate outcomes, and forecast trends and probabilities. It is an essential phase in all types of research, analysis, and decision-making, including that done in the social sciences, business, and healthcare.

Accurate data collection is necessary to make informed business decisions, ensure quality assurance, and maintain research integrity.

During data collection, researchers must identify the data types, the sources of data, and the methods being used. We will soon see that there are many different data collection methods. Research, commercial, and government fields all rely heavily on data collection.

Before an analyst begins collecting data, they must answer three questions first:

  • What’s the goal or purpose of this research?
  • What kinds of data are they planning on gathering?
  • What methods and procedures will be used to collect, store, and process the information?

Additionally, we can break up data into qualitative and quantitative types. Qualitative data covers descriptions such as color, size, quality, and appearance. Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll numbers, percentages, etc.
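To make the distinction concrete for readers who work in code, here is a minimal sketch (with hypothetical survey fields that are not from this article) of how the two types of data are typically handled:

```python
# A small, hypothetical survey table: numeric columns are quantitative,
# categorical descriptions are qualitative.
import pandas as pd

responses = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "age": [34, 29, 41],                               # quantitative
    "satisfaction_score": [4, 5, 3],                   # quantitative (ordinal)
    "favorite_feature": ["speed", "design", "speed"],  # qualitative
})

# Quantitative data supports statistical summaries...
print(responses["satisfaction_score"].mean())          # 4.0

# ...while qualitative data is categorized and interpreted.
print(responses["favorite_feature"].value_counts())
```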

Why Do We Need Data Collection?

Before a judge makes a ruling in a court case or a general creates a plan of attack, they must have as many relevant facts as possible. The best courses of action come from informed decisions, and information and data are synonymous.

The concept of data collection isn’t a new one, as we’ll see later, but the world has changed. There is far more data available today, and it exists in forms that were unheard of a century ago. The data collection process has had to change and grow with the times, keeping pace with technology.

Whether you’re in the world of academia, trying to conduct research, or part of the commercial sector, thinking of how to promote a new product, you need data collection to help you make better choices.

What Are the Different Data Collection Methods?

Now that you know what data collection is and why we need it, let’s take a look at the different methods of data collection. While the phrase “data collection” may sound all high-tech and digital, it doesn’t necessarily entail things like computers, big data, and the internet. Data collection could mean a telephone survey, a mail-in comment card, or even some guy with a clipboard asking passersby some questions. But let’s see if we can sort the different data collection methods into a semblance of organized categories.

Primary and secondary methods of data collection are two approaches used to gather information for research or analysis purposes. Let's explore each data collection method in detail:

1. Primary Data Collection:

Primary data collection involves the collection of original data directly from the source or through direct interaction with the respondents. This method allows researchers to obtain firsthand information specifically tailored to their research objectives. There are various techniques for primary data collection, including:

a. Surveys and Questionnaires: Researchers design structured questionnaires or surveys to collect data from individuals or groups. These can be conducted through face-to-face interviews, telephone calls, mail, or online platforms.

b. Interviews: Interviews involve direct interaction between the researcher and the respondent. They can be conducted in person, over the phone, or through video conferencing. Interviews can be structured (with predefined questions), semi-structured (allowing flexibility), or unstructured (more conversational).

c. Observations: Researchers observe and record behaviors, actions, or events in their natural setting. This method is useful for gathering data on human behavior, interactions, or phenomena without direct intervention.

d. Experiments: Experimental studies involve the manipulation of variables to observe their impact on the outcome. Researchers control the conditions and collect data to draw conclusions about cause-and-effect relationships.

e. Focus Groups: Focus groups bring together a small group of individuals who discuss specific topics in a moderated setting. This method helps in understanding opinions, perceptions, and experiences shared by the participants.

2. Secondary Data Collection:

Secondary data collection involves using existing data collected by someone else for a purpose different from the original intent. Researchers analyze and interpret this data to extract relevant information. Secondary data can be obtained from various sources, including:

a. Published Sources: Researchers refer to books, academic journals, magazines, newspapers, government reports, and other published materials that contain relevant data.

b. Online Databases: Numerous online databases provide access to a wide range of secondary data, such as research articles, statistical information, economic data, and social surveys.

c. Government and Institutional Records: Government agencies, research institutions, and organizations often maintain databases or records that can be used for research purposes.

d. Publicly Available Data: Data shared by individuals, organizations, or communities on public platforms, websites, or social media can be accessed and utilized for research.

e. Past Research Studies: Previous research studies and their findings can serve as valuable secondary data sources. Researchers can review and analyze the data to gain insights or build upon existing knowledge.
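As a rough illustration of how secondary data collection often looks in practice, the sketch below loads an existing dataset instead of gathering new observations. The CSV content, column names, and age range are hypothetical stand-ins, not taken from any of the sources above:

```python
# Reusing existing data (secondary collection) rather than collecting anew.
import io
import pandas as pd

# Hypothetical stand-in for, say, a government statistics export;
# in practice this might be pd.read_csv("some_published_export.csv").
existing_export = io.StringIO(
    "region,age,income\n"
    "North,23,41000\n"
    "South,67,38000\n"
    "North,19,22000\n"
)
census = pd.read_csv(existing_export)

# Inspect what the original collectors recorded before reusing it.
print(census.describe())

# Repurpose the data for a new question: isolate the subgroup we study.
young_adults = census[census["age"].between(18, 25)]
print(young_adults)
```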

Data Collection Tools

Now that we’ve explained the various techniques, let’s narrow our focus even further by looking at some specific tools. For example, we mentioned interviews as a technique, but we can further break that down into different interview types (or “tools”).

Word Association

The researcher gives the respondent a set of words and asks them what comes to mind when they hear each word.

Sentence Completion

Researchers use sentence completion to understand what kind of ideas the respondent has. This tool involves giving an incomplete sentence and seeing how the interviewee finishes it.

Role-Playing

Respondents are presented with an imaginary situation and asked how they would act or react if it was real.

In-Person Surveys

The researcher asks questions in person.

Online/Web Surveys

These surveys are easy to accomplish, but some users may be unwilling to answer truthfully, if at all.

Mobile Surveys

These surveys take advantage of the increasing proliferation of mobile technology. Mobile collection surveys rely on mobile devices like tablets or smartphones to conduct surveys via SMS or mobile apps.

Phone Surveys

No researcher can call thousands of people at once, so they need a third party to handle the chore. However, many people have call screening and won’t answer.

Observation

Sometimes, the simplest method is the best. Researchers who make direct observations collect data quickly and easily, with little intrusion or third-party bias. Naturally, it’s only effective in small-scale situations.

The Importance of Ensuring Accurate and Appropriate Data Collection

Accurate data collection is crucial to preserving the integrity of research, regardless of the field of study or whether the data is quantitative or qualitative. Errors are less likely to occur when the right data-gathering tools are used, whether they are brand new, updated versions, or already established.

The effects of incorrectly collected data include the following:

  • Erroneous conclusions that squander resources
  • Decisions that compromise public policy
  • Inability to answer research questions accurately
  • Harm to human or animal participants
  • Misleading other researchers into pursuing fruitless avenues of investigation
  • Findings that cannot be replicated or validated

Although the degree of impact from flawed data collection may vary by discipline and the type of investigation, there is the potential for disproportionate harm when flawed study findings are used to support public policy recommendations.

Let us now look at the various issues that we might face while maintaining the integrity of data collection.

Issues Related to Maintaining the Integrity of Data Collection

The main rationale for maintaining data integrity is to support the detection of errors in the data collection process, whether they are made deliberately (falsification) or unintentionally (systematic or random errors).

Quality assurance and quality control are two strategies that help protect data integrity and guarantee the scientific validity of study results.

Each strategy is used at a different stage of the research timeline:

  • Quality assurance: activities that take place before data collection begins
  • Quality control: tasks performed both during and after data collection

Let us explore each of them in more detail now.

Quality Assurance

Since quality assurance precedes data collection, its primary goal is “prevention” (i.e., forestalling problems with data collection). Prevention is the best way to protect the accuracy of data collection. The clearest example of this proactive step is the uniformity of protocol established in a thorough and exhaustive procedures manual for data collection.

When manuals are written poorly, the likelihood of failing to spot issues and mistakes early in the research effort increases. These shortcomings can show up in several ways:

  • Failure to specify exactly how, and on what subjects, staff members should be trained or retrained in data collection
  • An incomplete list of the items to be collected
  • No system in place to track modifications to procedures as the investigation proceeds
  • A vague description of the data collection instruments rather than detailed, step-by-step instructions for administering tests
  • Uncertainty about when, how, and by whom the data will be reviewed
  • Unclear guidelines for using, adjusting, and calibrating the data collection equipment

Now, let us look at how to ensure quality control.


Quality Control

Although quality control actions (detection/monitoring and intervention) take place both during and after data collection, the specifics should be meticulously documented in the procedures manual. A clearly defined communication structure is a prerequisite for establishing monitoring systems. Once data collection problems are discovered, there should be no ambiguity about the flow of information between the principal investigators and staff members. A poorly designed communication system encourages lax oversight and reduces opportunities for error detection.

Detection or monitoring can take the form of direct staff observation during site visits or conference calls, or frequent and routine reviews of data reports to spot inconsistencies, out-of-range values, or invalid codes. Site visits might not be appropriate for all disciplines. Still, without routine auditing of records, whether qualitative or quantitative, it will be challenging for investigators to confirm that data collection is taking place in accordance with the methods defined in the manual. Additionally, quality control identifies the appropriate responses, or “actions,” to correct flawed data collection practices and minimize recurrences.
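One way such routine report reviews can be automated is sketched below. The field names, valid codes, and acceptable ranges are hypothetical; in a real study they would come from the procedures manual:

```python
# A minimal quality-control pass over incoming records, flagging the
# inconsistencies, out-of-range values, and invalid codes described above.
import pandas as pd

data = pd.DataFrame({
    "respondent_id": [101, 102, 102, 104],       # 102 appears twice
    "age": [34, 142, 29, 51],                    # 142 is out of range
    "response_code": ["Y", "N", "X", "DK"],      # "X" is not a defined code
})
VALID_CODES = {"Y", "N", "DK"}                   # as defined in the manual

checks = {
    "out-of-range age": data[(data["age"] < 18) | (data["age"] > 99)],
    "invalid code": data[~data["response_code"].isin(VALID_CODES)],
    "duplicate id": data[data["respondent_id"].duplicated(keep=False)],
}

for name, rows in checks.items():
    if not rows.empty:
        print(f"QC alert ({name}): {len(rows)} row(s) need review")
```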

Problems with data collection, for instance, that call for immediate action include:

  • Fraud or misbehavior
  • Systematic mistakes, procedure violations 
  • Individual data items with errors
  • Issues with certain staff members or a site's performance 

In the social and behavioral sciences, where primary data collection involves human subjects, researchers are trained to include one or more secondary measures that can be used to verify the quality of the information obtained from participants.

For instance, a researcher conducting a survey might be interested in learning more about the prevalence of risky behaviors among young adults, as well as the social conditions that influence the likelihood and frequency of those behaviors. Let us now explore the common challenges of data collection.

What Are Common Challenges in Data Collection?

Several challenges commonly arise while collecting data. Let us explore a few of them to understand them better and learn how to avoid them.

Data Quality Issues

Poor data quality is the main threat to the broad and successful application of machine learning. If you want to make technologies like machine learning work for you, data quality must be your top priority. Let’s talk about some of the most prevalent data quality problems and how to fix them.

Inconsistent Data

When working with various data sources, it’s conceivable that the same information will differ between sources. The differences could be in formats, units, or occasionally spellings. Inconsistent data can also be introduced during company mergers or relocations. If inconsistencies are not continually resolved, they tend to accumulate and reduce the value of the data. Organizations that focus heavily on data consistency do so because they want only reliable data supporting their analytics.
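The sketch below shows one way such inconsistencies might be reconciled in code. The column names, spelling variants, and unit conventions are all hypothetical:

```python
# Normalizing inconsistent spellings and mixed units across merged sources.
import pandas as pd

df = pd.DataFrame({
    "country": ["USA", "U.S.A.", "United States", "UK"],
    "height": [70.0, 178.0, 69.0, 180.0],   # mixed units: inches and cm
    "height_unit": ["in", "cm", "in", "cm"],
})

# Standardize spellings with an explicit mapping.
df["country"] = df["country"].replace({"USA": "United States",
                                       "U.S.A.": "United States"})

# Convert everything to a single unit (centimeters).
df.loc[df["height_unit"] == "in", "height"] *= 2.54
df["height_unit"] = "cm"
print(df)
```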

Data Downtime

Data is the driving force behind the decisions and operations of data-driven businesses. However, there may be brief periods when their data is unreliable or not ready. Customer complaints and subpar analytical outcomes are only two of the ways this data unavailability can significantly impact businesses. A data engineer spends about 80% of their time updating, maintaining, and guaranteeing the integrity of the data pipeline. The lengthy operational lead time from data capture to insight means there is a high marginal cost to asking the next business question.

Schema modifications and migration problems are just two examples of the causes of data downtime. Data pipelines can be difficult to manage due to their size and complexity. Data downtime must be continuously monitored and reduced through automation.

Ambiguous Data

Even with thorough oversight, some errors can still occur in massive databases or data lakes. The issue becomes more overwhelming when data is streaming in at high speed. Spelling mistakes can go unnoticed, formatting difficulties can occur, and column headings might be misleading. Such ambiguous data can cause a number of problems for reporting and analytics.


Duplicate Data

Streaming data, local databases, and cloud data lakes are just a few of the data sources modern enterprises must contend with. They might also have application and system silos. These sources are likely to duplicate and overlap each other quite a bit. For instance, duplicate contact information has a substantial impact on customer experience: marketing campaigns suffer if some prospects are ignored while others are engaged repeatedly. The presence of duplicate data increases the likelihood of skewed analytical outcomes, and it can also result in ML models trained on biased data.
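A minimal sketch of duplicate detection follows; the contact records are hypothetical. Note that normalizing values first matters, since exact matching misses near-duplicates:

```python
# Detecting and removing duplicate contact records.
import pandas as pd

contacts = pd.DataFrame({
    "name":  ["Ana Diaz", "Ana Diaz", "Ben Okafor"],
    "email": ["ana@example.com", "ANA@EXAMPLE.COM", "ben@example.com"],
})

# Normalize first, or near-duplicates slip past exact matching.
contacts["email"] = contacts["email"].str.lower().str.strip()

# Inspect duplicates before dropping them.
print(contacts[contacts.duplicated(subset="email", keep=False)])

deduped = contacts.drop_duplicates(subset="email", keep="first")
print(deduped)
```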

Too Much Data

While we emphasize data-driven analytics and its advantages, too much data is itself a data quality problem. There is a risk of getting lost in an abundance of data when searching for information pertinent to your analytical efforts. Data scientists, data analysts, and business users devote 80% of their time to finding and organizing the appropriate data. As data volume increases, other data quality problems become more serious, particularly with streaming data and large files or databases.

Inaccurate Data

For highly regulated industries like healthcare, data accuracy is crucial. Recent experience has made it more important than ever to improve data quality for COVID-19 and future pandemics. Inaccurate information does not give you a true picture of the situation and cannot be used to plan the best course of action. Personalized customer experiences and marketing strategies underperform if your customer data is inaccurate.

Data inaccuracies can be attributed to a number of things, including data degradation, human error, and data drift. Worldwide, data decays at a rate of about 3% per month, which is quite concerning. Data integrity can be compromised as data is transferred between systems, and data quality can deteriorate over time.

Hidden Data

The majority of businesses use only a portion of their data, with the remainder sometimes lost in data silos or discarded in data graveyards. For instance, the customer service team might not receive client data from sales, missing an opportunity to build more precise and comprehensive customer profiles. Hidden data means missed opportunities to develop novel products, enhance services, and streamline processes.

Finding Relevant Data

Finding relevant data is not so easy. There are several factors we need to consider when trying to find relevant data, including:

  • Relevant domain
  • Relevant demographics
  • Relevant time period

Data that is not relevant to our study on any of these factors is effectively obsolete, and we cannot proceed with its analysis. This could lead to incomplete research or analysis, repeatedly re-collecting data, or shutting down the study.

Deciding the Data to Collect

Determining what data to collect is one of the most important decisions in the process and should be made early. We must choose the subjects the data will cover, the sources we will use to gather it, and the quantity of information we will require. Our answers to these questions will depend on our aims, or what we expect to achieve using our data. As an illustration, we may choose to gather information on the categories of articles that website visitors between the ages of 20 and 50 most frequently access. We can also decide to compile data on the typical age of all the clients who made a purchase from our business over the previous month.

Not addressing this could lead to duplicated work, the collection of irrelevant data, or the ruin of the study as a whole.

Dealing With Big Data

Big data refers to exceedingly massive data sets with more intricate and diversified structures. These traits typically create greater challenges in storing and analyzing the data and in applying additional methods of extracting results. In particular, big data refers to data sets so large or complex that conventional data processing tools are insufficient: the overwhelming amount of data, both unstructured and structured, that a business faces on a daily basis.

The amount of data produced by healthcare applications, the internet, social networking sites, sensor networks, and many other industries is growing rapidly as a result of recent technological advancements. Big data refers to the vast volume of data created from numerous sources in a variety of formats at extremely fast rates. Dealing with this kind of data is one of the many challenges of data collection and is a crucial step toward collecting effective data.

Low Response and Other Research Issues

Poor design and low response rates have been shown to be two recurring issues with data collection, particularly in health surveys that use questionnaires. These issues can leave a study with an insufficient or inadequate supply of data. Creating an incentivized data collection program can be beneficial in such cases, as it encourages more responses.

Now, let us look at the key steps in the data collection process.

What Are the Key Steps in the Data Collection Process?

There are five key steps in the data collection process. They are explained briefly below.

1. Decide What Data You Want to Gather

The first thing that we need to do is decide what information we want to gather. We must choose the subjects the data will cover, the sources we will use to gather it, and the quantity of information that we would require. For instance, we may choose to gather information on the categories of products that an average e-commerce website visitor between the ages of 30 and 45 most frequently searches for. 

2. Establish a Deadline for Data Collection

The process of creating a strategy for data collection can now begin. We should set a deadline for our data collection at the outset of the planning phase. We might want to collect some forms of data continuously; for instance, we might set up a method for tracking transactional data and website visitor statistics over the long term. However, if we are tracking data for a particular campaign, we will track it over a defined time frame. In these situations, we will have a schedule for when we will begin and finish gathering data.

3. Select a Data Collection Approach

At this stage, we will select the data collection technique that will serve as the foundation of our data gathering strategy. To choose the best collection method, we must consider the type of information we wish to gather, the time frame over which we will collect it, and the other factors we have decided on.

4. Gather Information

Once our plan is complete, we can put our data collection plan into action and begin gathering data. We can store and organize our data in our data management platform (DMP). We need to be careful to follow our plan and keep an eye on its progress. Especially if we are collecting data regularly, it may be helpful to set up a schedule for checking in on how our data gathering is going. As circumstances change and we learn new details, we may need to amend our plan.

5. Examine the Information and Apply Your Findings

After we have gathered all of our information, it’s time to examine our data and organize our findings. The analysis stage is essential because it transforms raw data into insightful knowledge that can be applied to improve our marketing strategies, products, and business decisions. The analytics tools included in our DMP can assist with this phase. Once we have discovered the patterns and insights in our data, we can put those findings to use to improve our business.
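As a loose illustration of steps 4 and 5, the sketch below gathers a few records, persists them, and runs a first summary. The records, file name, and analysis are hypothetical stand-ins for a real pipeline:

```python
# Step 4: gather information and keep it in one organized store.
import pandas as pd

batch = [
    {"visitor_age": 35, "category": "electronics"},
    {"visitor_age": 42, "category": "books"},
    {"visitor_age": 31, "category": "electronics"},
]
df = pd.DataFrame(batch)
df.to_csv("collected_data.csv", index=False)   # persist the raw collection

# Step 5: examine the information and apply the findings.
by_category = df.groupby("category")["visitor_age"].agg(["count", "mean"])
print(by_category)   # which categories draw which age groups
```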

Let us now look at some data collection considerations and best practices that one might follow.

Data Collection Considerations and Best Practices

We must plan carefully before spending time and money traveling to the field to gather data. Effective data collection strategies can help us collect richer, more accurate data while saving time and resources.

Below, we discuss some of the best practices to follow for the best results:

1. Take Into Account the Price of Each Extra Data Point

Once we have decided on the data we want to gather, we need to make sure to take the expense of doing so into account. Each additional data point or survey question adds costs for our surveyors and respondents.

2. Plan How to Gather Each Data Piece

Freely accessible data is scarce. Sometimes the data exists, but we may not have access to it. For instance, we cannot openly view another person’s medical records without a compelling reason. Several types of information can also be challenging to measure.

Consider how time-consuming and difficult it will be to gather each piece of information when deciding what data to acquire.

3. Think About Your Choices for Data Collecting Using Mobile Devices

Mobile-based data collection can be divided into three categories:

  • IVRS (interactive voice response systems): call the respondents and ask them pre-recorded questions.
  • SMS data collection: sends a text message to the respondent, who can then answer questions by text on their phone.
  • Field surveyors: thanks to smartphone apps, surveyors can enter data directly into an interactive questionnaire while speaking to each respondent.

We need to make sure to select the appropriate tool for our survey and respondents, because each one has its own advantages and disadvantages.

4. Carefully Consider the Data You Need to Gather

It's all too easy to get information about anything and everything, but it's crucial to only gather the information that we require. 

It is helpful to consider these 3 questions:

  • What details will be helpful?
  • What details are available?
  • What specific details do you require?

5. Remember to Consider Identifiers

Identifiers, or details describing the context and source of a survey response, are just as crucial as the information about the subject or program that we are actually researching.

In general, adding more identifiers will enable us to pinpoint our program's successes and failures with greater accuracy, but moderation is the key.

6. Data Collection Through Mobile Devices Is the Way to Go

Although collecting data on paper is still common, modern data collection relies heavily on mobile devices. They enable us to gather many different types of data quickly, accurately, and at relatively low cost. With the boom in low-cost Android devices available today, there are few reasons not to choose mobile-based data collection.


FAQs

1. What is data collection with example?

Data collection is the process of collecting and analyzing information on relevant variables in a predetermined, methodical way so that one can respond to specific research questions, test hypotheses, and assess results. Data collection can be either qualitative or quantitative. Example: A company collects customer feedback through online surveys and social media monitoring to improve their products and services.

2. What are the primary data collection methods?

As is well known, gathering primary data is costly and time intensive. The main techniques for gathering primary data are observation, interviews, questionnaires, schedules, and surveys.

3. What are data collection tools?

The term “data collection tools” refers to the instruments or devices used to gather data, such as a paper questionnaire or a computer-assisted interviewing system. Case studies, checklists, interviews, observation (in some cases), surveys, and questionnaires are all tools used to gather data.

4. What’s the difference between quantitative and qualitative methods?

While qualitative research focuses on words and meanings, quantitative research deals with figures and statistics. You can systematically measure variables and test hypotheses using quantitative methods. You can delve deeper into ideas and experiences using qualitative methodologies.

5. What are quantitative data collection methods?

While there are numerous ways to gather quantitative information, the methods indicated above (probability sampling, interviews, questionnaires, observation, and document review) are the most typical and frequently employed, whether collecting information offline or online.

6. What is mixed methods research?

Research that includes both qualitative and quantitative techniques is known as mixed methods research. For deeper insights, mixed methods research combines rich qualitative data with useful statistics.

7. What are the benefits of collecting data?

Collecting data offers several benefits, including:

  • Knowledge and Insight
  • Evidence-Based Decision Making
  • Problem Identification and Solution
  • Validation and Evaluation
  • Identifying Trends and Predictions
  • Support for Research and Development
  • Policy Development
  • Quality Improvement
  • Personalization and Targeting
  • Knowledge Sharing and Collaboration

8. What’s the difference between reliability and validity?

Reliability is about consistency and stability, while validity is about accuracy and appropriateness. Reliability focuses on the consistency of results, while validity focuses on whether the results are actually measuring what they are intended to measure. Both reliability and validity are crucial considerations in research to ensure the trustworthiness and meaningfulness of the collected data and measurements.
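Reliability, unlike validity, can often be checked numerically. As one common example, the sketch below computes Cronbach’s alpha (a standard internal-consistency estimate) on hypothetical 5-point scale items; it is an illustration, not a substitute for a full reliability analysis:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total score)).
import pandas as pd

items = pd.DataFrame({     # hypothetical responses to three scale items
    "q1": [4, 5, 3, 4, 5],
    "q2": [4, 4, 3, 5, 5],
    "q3": [5, 5, 2, 4, 4],
})

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")    # ~0.81 here; 0.7+ is often deemed acceptable
```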

Choose the Right Data Science Program

Are you thinking about pursuing a career in the field of data science? Simplilearn's Data Science courses are designed to provide you with the necessary skills and expertise to excel in this rapidly changing field. Here's a detailed comparison for your reference:

  • Data Scientist Master's Program (University: Simplilearn; Geo: all geos; duration: 11 months; coding experience required: basic; skills: 10+ skills including data structures, data manipulation, NumPy, Scikit-Learn, Tableau, and more; additional benefits: applied learning via capstone and 25+ data science projects; cost: $$)
  • Post Graduate Program in Data Science (University: Purdue; Geo: all geos; duration: 11 months; coding experience required: basic; skills: 8+ skills including exploratory data analysis, descriptive statistics, inferential statistics, and more; additional benefits: Purdue Alumni Association membership, free six-month IIMJobs Pro membership, resume-building assistance; cost: $$$$)
  • Post Graduate Program in Data Science (University: Caltech; Geo: not applicable in the US; duration: 11 months; coding experience required: none; skills: 8+ skills including supervised and unsupervised learning, deep learning, data visualization, and more; additional benefits: up to 14 CEU credits, Caltech CTME Circle membership; cost: $$$$)

Are You Interested in a Career in Data Science?

We live in the Data Age, and if you want a career that fully takes advantage of this, you should consider a career in data science. Simplilearn offers a Caltech Post Graduate Program in Data Science that will train you in everything you need to know to secure the perfect position. This Data Science PG program is ideal for all working professionals, covering job-critical topics like R, Python programming, machine learning algorithms, NLP concepts, and data visualization with Tableau in great detail. This is all provided via our interactive learning model with live sessions by global practitioners, practical labs, and industry projects.


Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, automatically generate references for free.

  • Knowledge Base
  • Methodology
  • Data Collection Methods | Step-by-Step Guide & Examples

Data Collection Methods | Step-by-Step Guide & Examples

Published on 4 May 2022 by Pritha Bhandari .

Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental, or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem .

While methods and aims may differ between fields, the overall process of data collection remains largely the same. Before you begin collecting data, you need to consider:

  • The  aim of the research
  • The type of data that you will collect
  • The methods and procedures you will use to collect, store, and process the data

To collect high-quality data that is relevant to your purposes, follow these four steps.

Table of contents

Step 1: define the aim of your research, step 2: choose your data collection method, step 3: plan your data collection procedures, step 4: collect the data, frequently asked questions about data collection.

Before you start the process of data collection, you need to identify exactly what you want to achieve. You can start by writing a problem statement : what is the practical or scientific issue that you want to address, and why does it matter?

Next, formulate one or more research questions that precisely define what you want to find out. Depending on your research questions, you might need to collect quantitative or qualitative data :

  • Quantitative data is expressed in numbers and graphs and is analysed through statistical methods .
  • Qualitative data is expressed in words and analysed through interpretations and categorisations.

If your aim is to test a hypothesis , measure something precisely, or gain large-scale statistical insights, collect quantitative data. If your aim is to explore ideas, understand experiences, or gain detailed insights into a specific context, collect qualitative data.

If you have several aims, you can use a mixed methods approach that collects both types of data.

  • Your first aim is to assess whether there are significant differences in perceptions of managers across different departments and office locations.
  • Your second aim is to gather meaningful feedback from employees to explore new ideas for how managers can improve.


Step 2: Choose your data collection method

Based on the data you want to collect, decide which method is best suited for your research.

  • Experimental research is primarily a quantitative method.
  • Interviews , focus groups , and ethnographies are qualitative methods.
  • Surveys , observations, archival research, and secondary data collection can be quantitative or qualitative methods.

Carefully consider what method you will use to gather data that helps you directly answer your research questions.

Step 3: Plan your data collection procedures

When you know which method(s) you are using, you need to plan exactly how you will implement them. What procedures will you follow to make accurate observations or measurements of the variables you are interested in?

For instance, if you’re conducting surveys or interviews, decide what form the questions will take; if you’re conducting an experiment, make decisions about your experimental design .

Operationalisation

Sometimes your variables can be measured directly: for example, you can collect data on the average age of employees simply by asking for dates of birth. However, often you’ll be interested in collecting data on more abstract concepts or variables that can’t be directly observed.

Operationalisation means turning abstract conceptual ideas into measurable observations. When planning how you will collect data, you need to translate the conceptual definition of what you want to study into the operational definition of what you will actually measure.

  • You ask managers to rate their own leadership skills on 5-point scales assessing the ability to delegate, decisiveness, and dependability.
  • You ask their direct employees to provide anonymous feedback on the managers regarding the same topics.
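To make the idea concrete, here is a minimal sketch in Python of how such an operational definition might be computed; the item names and ratings are hypothetical, not taken from any particular survey tool.

```python
# Minimal sketch: turning 5-point scale ratings into one operational
# "leadership" score. Item names and values are hypothetical.
ratings = {
    "delegation": 4,
    "decisiveness": 3,
    "dependability": 5,
}

# Operational definition: the leadership score is the mean of the
# item ratings, on the same 1-5 scale.
leadership_score = sum(ratings.values()) / len(ratings)
print(f"Leadership score: {leadership_score:.2f}")  # 4.00
```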

You may need to develop a sampling plan to obtain data systematically. This involves defining a population , the group you want to draw conclusions about, and a sample, the group you will actually collect data from.

Your sampling method will determine how you recruit participants or obtain measurements for your study. To decide on a sampling method you will need to consider factors like the required sample size, accessibility of the sample, and time frame of the data collection.
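As a rough illustration of one common sampling method, the sketch below draws a simple random sample from a sampling frame in Python; the frame and sample size are hypothetical.

```python
import random

# Minimal sketch: drawing a simple random sample from a sampling
# frame. The frame (500 employees) and sample size are hypothetical.
population_frame = [f"employee_{i}" for i in range(1, 501)]
sample_size = 50

random.seed(42)  # fixed seed so the draw can be reproduced
sample = random.sample(population_frame, sample_size)
print(len(sample), sample[:3])
```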

Standardising procedures

If multiple researchers are involved, write a detailed manual to standardise data collection procedures in your study.

This means laying out specific step-by-step instructions so that everyone in your research team collects data in a consistent way – for example, by conducting experiments under the same conditions and using objective criteria to record and categorise observations.

This helps ensure the reliability of your data, and you can also use it to replicate the study in the future.

Creating a data management plan

Before beginning data collection, you should also decide how you will organise and store your data.

  • If you are collecting data from people, you will likely need to anonymise and safeguard the data to prevent leaks of sensitive information (e.g. names or identity numbers); a minimal sketch of one such safeguard follows this list.
  • If you are collecting data via interviews or pencil-and-paper formats, you will need to perform transcriptions or data entry in systematic ways to minimise distortion.
  • You can prevent loss of data by having an organisation system that is routinely backed up.
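For the anonymisation point above, one widely used safeguard is to replace direct identifiers with salted hashes (strictly speaking, pseudonymisation). The Python sketch below is a minimal illustration, not a complete data protection solution; the salt and field names are hypothetical.

```python
import hashlib

# Minimal sketch of pseudonymisation: replacing a direct identifier
# with a salted hash. The salt is a hypothetical placeholder and must
# be stored securely, separately from the data.
SALT = b"store-this-secret-separately"

def pseudonymise(identifier: str) -> str:
    """Return a stable, hard-to-reverse code for an identifier."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()[:12]

record = {"name": "Jane Doe", "rating": 4}
record["participant_id"] = pseudonymise(record.pop("name"))
print(record)  # the name is gone; only the pseudonymous id remains
```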

Step 4: Collect the data

Finally, you can implement your chosen methods to measure or observe the variables you are interested in.

The closed-ended questions ask participants to rate their manager’s leadership skills on scales from 1 to 5. The data produced is numerical and can be statistically analysed for averages and patterns.
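A minimal Python sketch of that kind of analysis, assuming the ratings have already been collected; the departments and scores are made up for illustration.

```python
from statistics import mean

# Minimal sketch: average 1-5 ratings per department.
# Departments and scores are hypothetical.
responses = [
    ("Sales", 4), ("Sales", 5), ("Sales", 3),
    ("Engineering", 2), ("Engineering", 3), ("Engineering", 3),
]

by_department = {}
for department, score in responses:
    by_department.setdefault(department, []).append(score)

for department, scores in sorted(by_department.items()):
    print(department, round(mean(scores), 2))
# Engineering 2.67
# Sales 4.0
```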

To ensure that high-quality data is recorded in a systematic way, here are some best practices:

  • Record all relevant information as and when you obtain data. For example, note down whether or how lab equipment is recalibrated during an experimental study.
  • Double-check manual data entry for errors.
  • If you collect quantitative data, you can assess the reliability and validity to get an indication of your data quality.
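One common way to check the reliability mentioned in the last point is Cronbach's alpha, which measures the internal consistency of a multi-item scale. The sketch below computes it with Python's standard library; the response matrix is hypothetical.

```python
from statistics import variance

# Minimal sketch: Cronbach's alpha for a three-item scale.
# Each row is one respondent's ratings; the numbers are hypothetical.
items = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
]

k = len(items[0])  # number of items in the scale
item_variances = [variance(column) for column in zip(*items)]
total_variance = variance([sum(row) for row in items])

alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")  # values near 1 suggest consistency
```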

Frequently asked questions about data collection

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organisations.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g., understanding the needs of your consumers or user testing your website).
  • You can control and standardise the process for high reliability and validity (e.g., choosing appropriate measurements and sampling methods ).

However, there are also some drawbacks: data collection can be time-consuming, labour-intensive, and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the  consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity   refers to the  accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research , you also have to consider the internal and external validity of your experiment.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

Operationalisation means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioural avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalise the variables that you want to measure.

Cite this Scribbr article


Bhandari, P. (2022, May 04). Data Collection Methods | Step-by-Step Guide & Examples. Scribbr. Retrieved 25 March 2024, from https://www.scribbr.co.uk/research-methods/data-collection-guide/


7 Data Collection Methods & Tools For Research

By busayo.longe

The underlying need for data collection is to capture quality evidence that answers the questions that have been posed. Through data collection, businesses and management can derive quality information that is a prerequisite for making informed decisions.

To improve the quality of that information, data must be collected in a way that lets you draw inferences and make informed decisions about what is factual.

By the end of this article, you will understand why picking the best data collection method is necessary for achieving your set objective.

Sign up on Formplus Builder to create your preferred online surveys or questionnaire for data collection. You don’t need to be tech-savvy! Start creating quality questionnaires with Formplus.

What is Data Collection?

Data collection is a methodical process of gathering and analyzing specific information to proffer solutions to relevant questions and evaluate the results. It focuses on finding out all there is to a particular subject matter. Data is collected to be further subjected to hypothesis testing which seeks to explain a phenomenon.

Hypothesis testing eliminates assumptions by grounding propositions in reason and evidence.


For collectors of data, there is a range of outcomes for which the data is collected. But the key purpose for which data is collected is to put a researcher in a vantage position to make predictions about future probabilities and trends.

The core forms in which data can be collected are primary and secondary data. While the former is collected by a researcher through first-hand sources, the latter is collected by an individual other than the user. 

Types of Data Collection 

Before broaching the various types of data collection, it is pertinent to note that data collection itself falls under two broad categories: primary data collection and secondary data collection.

Primary Data Collection

Primary data collection is, by definition, the gathering of raw data at the source, that is, original data collected by a researcher for a specific research purpose. It can be further divided into two segments: qualitative and quantitative data collection methods.

  • Qualitative Research Method 

Qualitative data collection methods do not involve numbers or mathematical calculations. Rather, they are based on non-quantifiable elements such as feelings, opinions, and emotions. An example of such a method is an open-ended questionnaire.


  • Quantitative Method

Quantitative data are presented in numbers and require mathematical calculation to interpret. An example would be a questionnaire with closed-ended questions that produces figures which can be analyzed mathematically, using methods such as correlation, regression, mean, median, and mode.
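As a rough illustration, the calculations named above can all be done with Python's standard statistics module; the scores below are hypothetical questionnaire data.

```python
from statistics import mean, median, mode, correlation

# Minimal sketch of the calculations named above, on hypothetical
# questionnaire data. statistics.correlation needs Python 3.10+.
satisfaction = [3, 4, 4, 5, 2, 4, 3, 5]
usage_hours = [2, 3, 3, 5, 1, 4, 2, 5]

print("mean:", mean(satisfaction))      # 3.75
print("median:", median(satisfaction))  # 4.0
print("mode:", mode(satisfaction))      # 4
print("correlation:", round(correlation(usage_hours, satisfaction), 2))
```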


Read Also: 15 Reasons to Choose Quantitative over Qualitative Research

Secondary Data Collection

Secondary data collection, on the other hand, refers to the gathering of second-hand data collected by an individual who is not the original user. It is the process of collecting data that already exists, whether from published books, journals, or online portals. It is much less expensive and easier to collect than primary data.

Your choice between Primary data collection and secondary data collection depends on the nature, scope, and area of your research as well as its aims and objectives. 

Importance of Data Collection

There are several underlying reasons for collecting data, especially for a researcher. Here are a few of them:

  • Integrity of the Research

A key reason for collecting data, be it through quantitative or qualitative methods is to ensure that the integrity of the research question is indeed maintained.

  • Reduce the likelihood of errors

The correct use of appropriate data collection methods reduces the likelihood of errors in the results.

  • Decision Making

To minimize the risk of errors in decision-making, it is important that accurate data is collected so that the researcher doesn’t make uninformed decisions. 

  • Save Cost and Time

Data collection saves the researcher time and funds that would otherwise be misspent without a deeper understanding of the topic or subject matter.

  • To support a need for a new idea, change, and/or innovation

To prove the need for a change in the norm or the introduction of new information that will be widely accepted, it is important to collect data as evidence to support these claims.

What is a Data Collection Tool?

Data collection tools refer to the devices or instruments used to collect data, such as a paper questionnaire or a computer-assisted interviewing system. Case studies, checklists, interviews, observation, and surveys or questionnaires are all tools used to collect data.

It is important to decide on the tools for data collection because research is carried out in different ways and for different purposes. The objective behind data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the posed questions.


The Formplus online data collection tool is perfect for gathering primary data, i.e. raw data collected from the source. You can easily gather data with at least three data collection methods with our online and offline data-gathering tool: online questionnaires, focus groups, and reporting.

In our previous articles, we’ve explained why quantitative research methods are more effective than qualitative methods . However, with the Formplus data collection tool, you can gather all types of primary data for academic, opinion or product research.

Top Data Collection Methods and Tools for Academic, Opinion, or Product Research

The following are the top 7 data collection methods for Academic, Opinion-based, or product research. Also discussed in detail are the nature, pros, and cons of each one. At the end of this segment, you will be best informed about which method best suits your research. 

  • INTERVIEWS

An interview is a face-to-face conversation between two individuals with the sole purpose of collecting relevant information to satisfy a research purpose. Interviews are of different types, namely structured, semi-structured, and unstructured, with each having a slight variation from the others.

Use this interview consent form template to let an interviewee give you consent to use data obtained from your interviews for investigative research purposes.

  • Structured Interviews – Simply put, a structured interview is a verbally administered questionnaire. In terms of depth, it is surface level and is usually completed within a short period. It is highly recommendable for speed and efficiency, but it lacks depth.
  • Semi-structured Interviews – In this method, there are several key questions which cover the scope of the areas to be explored. It allows a little more leeway for the researcher to explore the subject matter.
  • Unstructured Interviews – This is an in-depth interview that allows the researcher to collect a wide range of information with a purpose. An advantage of this method is the freedom it gives a researcher to combine structure with flexibility, even though it is more time-consuming.

Pros of interviews:
  • In-depth information
  • Freedom and flexibility
  • Accurate data

Cons of interviews:
  • Time-consuming
  • Expensive to collect

What are The Best Data Collection Tools for Interviews? 

For collecting data through interviews, here are a few tools you can use to easily collect data.

  • Audio Recorder

An audio recorder is used for recording sound on disc, tape, or film. Audio information can meet the needs of a wide range of people, as well as provide alternatives to print data collection tools.

  • Digital Camera

An advantage of a digital camera is that the images it captures can be transmitted to a monitor screen when the need arises.

  • Camcorder

A camcorder is used for collecting data through interviews. It provides a combination of both an audio recorder and a video camera. The data provided is qualitative in nature and allows the respondents to answer questions exhaustively. If you need to collect sensitive information during an interview, however, a camcorder might not work for you, as you would need to maintain your subject's privacy.

Want to conduct an interview for qualitative data research or a special report? Use this online interview consent form template to allow the interviewee to give their consent before you use the interview data for research or report. With premium features like e-signature, upload fields, form security, etc., Formplus Builder is the perfect tool to create your preferred online consent forms without coding experience. 

  • QUESTIONNAIRES

This is the process of collecting data through an instrument consisting of a series of questions and prompts to receive a response from the individuals it is administered to. Questionnaires are designed to collect data from a group. 

For clarity, it is important to note that a questionnaire isn’t a survey, rather it forms a part of it. A survey is a process of data gathering involving a variety of data collection methods, including a questionnaire.

On a questionnaire, there are three kinds of questions used: fixed-alternative, scale, and open-ended, with each question tailored to the nature and scope of the research.
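A minimal sketch of how those three question kinds might be represented as plain data structures in Python; the field names and questions are hypothetical, not the schema of any particular survey tool.

```python
# Minimal sketch: the three question kinds as plain data structures.
# Field names and questions are hypothetical.
questionnaire = [
    {"kind": "fixed-alternative",
     "text": "Do you exercise regularly?",
     "options": ["Yes", "No"]},
    {"kind": "scale",
     "text": "Rate your overall satisfaction.",
     "scale": (1, 5)},
    {"kind": "open-ended",
     "text": "What could we do better?"},
]

for question in questionnaire:
    print(question["kind"], "->", question["text"])
```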

Pros of questionnaires:
  • Can be administered in large numbers and is cost-effective.
  • It can be used to compare and contrast previous research to measure change.
  • Easy to visualize and analyze.
  • Questionnaires offer actionable data.
  • Respondent identity is protected.
  • Questionnaires can cover all areas of a topic.
  • Relatively inexpensive.

Cons of questionnaires:
  • Answers may be dishonest or the respondents lose interest midway.
  • Questionnaires can’t produce qualitative data.
  • Questions might be left unanswered.
  • Respondents may have a hidden agenda.
  • Not all questions can be analyzed easily.

What are the Best Data Collection Tools for Questionnaires? 

  • Formplus Online Questionnaire

Formplus lets you create powerful online questionnaires to collect the information you need. Use the Formplus online questionnaire form template to get actionable trends and measurable responses. Conduct research, optimize knowledge of your brand, or simply get to know an audience with this form template. The template is fast, free, and fully customizable.

  • Paper Questionnaire

A paper questionnaire is a data collection tool consisting of a series of questions and/or prompts for the purpose of gathering information from respondents. Mostly designed for statistical analysis of the responses, they can also be used as a form of data collection.

  • REPORTING

By definition, data reporting is the process of gathering and submitting data to be further subjected to analysis. The key aspect of data reporting is reporting accurate data, because inaccurate data reporting leads to uninformed decision-making.

Pros of reporting:
  • Informed decision-making.
  • Easily accessible.

Cons of reporting:
  • Self-reported answers may be exaggerated.
  • The results may be affected by bias.
  • Respondents may be too shy to give out all the details.
  • Inaccurate reports will lead to uninformed decisions.

What are the Best Data Collection Tools for Reporting?

Reporting tools enable you to extract and present data in charts, tables, and other visualizations so users can find useful information. You could source data for reporting from Non-Governmental Organizations (NGO) reports, newspapers, website articles, and hospital records.

  • NGO Reports

An NGO report contains an in-depth and comprehensive account of the activities carried out by the NGO, covering areas such as business and human rights. The information contained in these reports is research-specific and forms an acceptable academic basis for collecting data. NGOs often focus on development projects organized to promote particular causes.

  • Newspapers

Newspaper data are relatively easy to collect and are sometimes the only continuously available source of event data. Even though there is a problem of bias in newspaper data, it is still a valid tool for collecting data for reporting.

  • Website Articles

Gathering and using data contained in website articles is another tool for data collection. Collecting data from web articles is a quicker and less expensive data collection method. Two major disadvantages of this method are the biases inherent in the data collection process and possible security and confidentiality concerns.

  • Hospital Care records

Health care involves a diverse set of public and private data collection systems, including health surveys, administrative enrollment and billing records, and medical records, used by various entities, including hospitals, CHCs, physicians, and health plans. The data provided is clear, unbiased, and accurate, but it must be obtained through legal means, as medical data is subject to the strictest regulations.

  • EXISTING DATA

This involves posing new investigative questions to data that was originally gathered for other purposes. It adds measurement to a study or research project. An example would be sourcing data from an archive.

Pros of existing data:
  • Accuracy is very high.
  • Easily accessible information.

Cons of existing data:
  • Problems with evaluation.
  • Difficulty in understanding.

What are the Best Data Collection Tools for Existing Data?

The concept of Existing data means that data is collected from existing sources to investigate research questions other than those for which the data were originally gathered. Tools to collect existing data include: 

  • Research Journals – Unlike newspapers and magazines, research journals are intended for an academic or technical audience, not general readers. A journal is a scholarly publication containing articles written by researchers, professors, and other experts.
  • Surveys – A survey is a data collection tool for gathering information from a sample population, with the intention of generalizing the results to a larger population. Surveys have a variety of purposes and can be carried out in many ways depending on the objectives to be achieved.

  • OBSERVATION

This is a data collection method by which information on a phenomenon is gathered through observation. The nature of the observation could be accomplished either as a complete observer, an observer as a participant, a participant as an observer, or as a complete participant. This method is a key base for formulating a hypothesis.

Pros of observation:
  • Easy to administer.
  • Results are highly accurate.
  • It is a universally accepted practice.
  • It diffuses the unwillingness of respondents to provide reports.
  • It is appropriate for certain situations.

Cons of observation:
  • Some phenomena aren’t open to observation.
  • It cannot always be relied upon.
  • Bias may arise.
  • It is expensive to administer.
  • Its validity cannot be predicted accurately.

What are the Best Data Collection Tools for Observation?

Observation involves the active acquisition of information from a primary source. Observation can also involve the perception and recording of data via the use of scientific instruments. The best tools for Observation are:

  • Checklists – Checklists state specific criteria that allow users to gather information and make judgments about what they should know in relation to the outcomes. They offer systematic ways of collecting data about specific behaviors, knowledge, and skills.
  • Direct observation – This is an observational study method of collecting evaluative information. The evaluator watches the subject in his or her usual environment without altering that environment.

FOCUS GROUPS

Unlike quantitative research, which involves numerical data, this data collection method focuses on qualitative research. It falls under the primary category of data based on the feelings and opinions of the respondents. This research involves asking open-ended questions to a group of individuals, usually ranging from 6 to 10 people, to provide feedback.

Pros of focus groups:
  • Information obtained is usually very detailed.
  • Cost-effective when compared to one-on-one interviews.
  • It delivers results with speed and efficiency.

Cons of focus groups:
  • It may lack depth in covering the nitty-gritty of a subject matter.
  • Bias might still be evident.
  • It requires interviewer training.
  • The researcher has very little control over the outcome.
  • A few vocal voices can drown out the rest.
  • Difficulty in assembling an all-inclusive group.

What are the Best Data Collection Tools for Focus Groups?

A focus group is a data collection method that is tightly facilitated and structured around a set of questions. The purpose of the meeting is to extract detailed responses to these questions from the participants. The best tools for tackling focus groups are:

  • Two-Way – One group watches another group answer the questions posed by the moderator. After listening to what the other group has to offer, the group that listens is able to facilitate more discussion and could potentially draw different conclusions .
  • Dueling-Moderator – There are two moderators who play the devil’s advocate. The main positive of the dueling-moderator focus group is to facilitate new ideas by introducing new ways of thinking and varying viewpoints.

  • COMBINATION RESEARCH

This method of data collection encompasses the use of innovative methods to enhance participation in both individuals and groups. Also under the primary category, it is a combination of Interviews and Focus Groups while collecting qualitative data . This method is key when addressing sensitive subjects. 

Pros of combination research:
  • Encourages participants to give responses.
  • It stimulates a deeper connection between participants.
  • The relative anonymity of respondents increases participation.
  • It improves the richness of the data collected.

Cons of combination research:
  • It costs the most out of all the top 7.
  • It’s the most time-consuming.

What are the Best Data Collection Tools for Combination Research? 

The Combination Research method involves two or more data collection methods, for instance, interviews as well as questionnaires or a combination of semi-structured telephone interviews and focus groups. The best tools for combination research are: 

  • Online Survey – The two tools combined here are online interviews and questionnaires. This is a questionnaire that the target audience can complete over the Internet. It is timely, effective, and efficient, especially since the data to be collected is quantitative in nature.
  • Dual-Moderator – The two tools combined here are focus groups and structured questionnaires. The structured questionnaires give a direction as to where the research is headed while two moderators take charge of the proceedings. Whilst one ensures the focus group session progresses smoothly, the other makes sure that the topics in question are all covered. Dual-moderator focus groups typically result in a more productive session and essentially lead to an optimum collection of data.

Why Formplus is the Best Data Collection Tool

  • Vast Options for Form Customization 

With Formplus, you can create your own unique survey form. With options to change themes, fonts, font colors, layout, width, and more, you can create an attractive survey form. The builder gives you as many features as possible to choose from, and you do not need to be a graphic designer to create a form.

  • Extensive Analytics

Form Analytics, a feature in Formplus, helps you view the number of respondents, unique visits, total visits, abandonment rate, and average time spent before submission. This tool eliminates the need for manual calculation of the received data and responses, as well as the conversion rate for your poll.

  • Embed Survey Form on Your Website

Copy the link to your form and embed it as an iframe which will automatically load as your website loads, or as a popup that opens once the respondent clicks on the link. Embed the link on your Twitter page to give instant access to your followers.


  • Geolocation Support

The geolocation feature on Formplus lets you ascertain where individual responses are coming from. It uses Google Maps to pinpoint the longitude and latitude of the respondent as accurately as possible, along with the responses.

  • Multi-Select feature

This feature helps to conserve horizontal space as it allows you to put multiple options in one field. This translates to including more information on the survey form. 

Read Also: 10 Reasons to Use Formplus for Online Data Collection

How to Use Formplus to Collect Online Data in 8 Simple Steps

1. Register or sign up on Formplus Builder: Start creating your preferred questionnaire or survey by signing up with either your Google, Facebook, or email account.


Formplus gives you a free plan with basic features you can use to collect online data. Pricing plans with more extensive features start at $20 monthly, with reasonable discounts for education and non-profit organizations.

2. Input your survey title and use the form builder choice options to start creating your surveys. 

Use the choice option fields like single select, multiple select, checkbox, radio, and image choices to create your preferred multi-choice surveys online.


3. Do you want customers to rate any of your products or service delivery?

Use the rating field to allow survey respondents to rate your products or services. This is an ideal quantitative research method of collecting data.


4. Beautify your online questionnaire with Formplus customization features.


  • Change the theme color
  • Add your brand’s logo and image to the forms
  • Change the form width and layout
  • Edit the submission button if you want
  • Change text font color and sizes
  • Already have custom CSS to beautify your questionnaire? Just copy and paste it into the CSS option.

5. Edit your survey questionnaire settings for your specific needs

Choose where to store your files and responses. Select a submission deadline, choose a timezone, limit respondents’ responses, enable Captcha to prevent spam, and collect location data of customers.


Set an introductory message to respondents before they begin the survey, toggle the “start button”, set a message to display after final submission, or redirect respondents to another page when they submit their questionnaires.

Change the email notification settings and initiate an autoresponder message to all your survey questionnaire respondents. You can also transfer your forms to other users, who can become form administrators.

6. Share links to your survey questionnaire page with customers.

There’s an option to copy and share the link as a popup or embed code. The data collection tool automatically creates a QR code for the survey questionnaire, which you can download and share as appropriate.


Congratulations if you’ve made it to this stage. You can start sharing the link to your survey questionnaire with your customers.

7. View your Responses to the Survey Questionnaire

Toggle the presentation of your summary between the available options: single responses, a table, or cards.


8. Allow Formplus Analytics to interpret your Survey Questionnaire Data


With online form builder analytics, a business can determine:

  • The number of times the survey questionnaire was filled
  • The number of customers reached
  • Abandonment Rate: The rate at which customers exit the form without submitting it.
  • Conversion Rate: The percentage of customers who completed the online form
  • Average time spent per visit
  • Location of customers/respondents.
  • The type of device used by the customer to complete the survey questionnaire.

7 Tips to Create the Best Surveys for Data Collection

  • Define the goal of your survey – Once the goal of your survey is outlined, it will aid in deciding which questions are the top priority. A clear, attainable goal would, for example, mirror a clear reason as to why something is happening, e.g. “The goal of this survey is to understand why employees are leaving an establishment.”
  • Use close-ended, clearly defined questions – Avoid open-ended questions and ensure you’re not suggesting your preferred answer to the respondent. If possible, offer a range of answers with choice options and ratings.
  • Survey outlook should be attractive and inviting – An attractive-looking survey encourages a higher number of recipients to respond to it. Check out Formplus Builder for colorful options to integrate into your survey design. You could use images and videos to keep participants glued to their screens.
  • Assure respondents about the safety of their data – You want your respondents to feel safe while disclosing details of their personal information to you. It’s your duty to inform the respondents that the data they provide is confidential and only collected for the purpose of research.
  • Ensure your survey can be completed in record time – Ideally, in a typical survey, users should be able to respond in 100 seconds. It is pertinent to note that the respondents are doing you a favor. Don’t stress them; be brief and get straight to the point.
  • Do a trial survey – Preview your survey before sending it out to the intended respondents. Make a trial version which you’ll send to a few individuals. Based on their responses, you can draw inferences and decide whether or not your survey is ready for the big time.
  • Attach a reward upon completion for users – Give your respondents something to look forward to at the end of the survey. Think of it as a penny for their troubles. It could well be the encouragement they need not to abandon the survey midway.

Try out Formplus today . You can start making your own surveys with the Formplus online survey builder. By applying these tips, you will definitely get the most out of your online surveys.

Top Survey Templates For Data Collection 

  • Customer Satisfaction Survey Template 

With this template, you can collect data to measure customer satisfaction across key areas, such as the purchase experience and the level of service received. It also gives insight into which products the customer enjoyed, how often they buy such a product, and whether or not the customer is likely to recommend the product to a friend or acquaintance.

  • Demographic Survey Template

With this template, you would be able to measure, with accuracy, the ratio of male to female, age range, and the number of unemployed persons in a particular country as well as obtain their personal details such as names and addresses.

Respondents are also able to state their religious and political views about the country under review.

  • Feedback Form Template

The online feedback form template captures the details of a product and/or service used, identifying the product or service and documenting how long the customer has used it.

The overall satisfaction is measured as well as the delivery of the services. The likelihood that the customer also recommends said product is also measured.

  • Online Questionnaire Template

The online questionnaire template collects the respondent’s personal details and educational qualifications, gathering information to be used for academic research.

Respondents can also provide their gender, race, and field of study as well as present living conditions as prerequisite data for the research study.

  • Student Data Sheet Form Template 

The template is a data sheet containing all the relevant information about a student. The student’s name, home address, guardian’s name, record of attendance, and performance in school are all represented on this template. This is a perfect data collection method to deploy for a school or an educational organization.

Also included is a record for interaction with others as well as a space for a short comment on the overall performance and attitude of the student. 

  • Interview Consent Form Template

This online interview consent form template allows the interviewee to sign off their consent to use the interview data for research or report to journalists. With premium features like short text fields, upload, e-signature, etc., Formplus Builder is the perfect tool to create your preferred online consent forms without coding experience.

What is the Best Data Collection Method for Qualitative Data?

Answer: Combination Research

For gathering qualitative data, which generally relies on the feelings, opinions, and beliefs of respondents, the best data collection method a researcher can employ is combination research.

The reason combination research is the best fit is that it encompasses the attributes of interviews and focus groups. It is also useful when gathering data that is sensitive in nature. It can be described as an all-purpose qualitative data collection method.

Above all, combination research improves the richness of data collected when compared with other data collection methods for qualitative data.


What is the Best Data Collection Method for Quantitative Research Data?

Answer: Questionnaire

The best data collection method a researcher can employ for gathering quantitative data, that is, data that can be represented in numbers and figures and deduced mathematically, is the questionnaire.

These can be administered to a large number of respondents while saving costs. For quantitative data that may be bulky or voluminous in nature, the use of a Questionnaire makes such data easy to visualize and analyze.

Another key advantage of the Questionnaire is that it can be used to compare and contrast previous research work done to measure changes.

Technology-Enabled Data Collection Methods

There are now many diverse data collection methods available because technology has revolutionized the way data is collected, providing efficient and innovative methods that anyone, especially researchers and organizations, can use. Below are some technology-enabled data collection methods:

  • Online Surveys: Online surveys have gained popularity due to their ease of use and wide reach. You can distribute them through email or social media, or embed them on websites. Online surveys allow quick data collection, automated data capture, and real-time analysis, and they offer features like skip logic, validation checks, and multimedia integration (see the sketch after this list).
  • Mobile Surveys: With the widespread use of smartphones, mobile surveys’ popularity is also on the rise. Mobile surveys leverage the capabilities of mobile devices, and this allows respondents to participate at their convenience. This includes multimedia elements, location-based information, and real-time feedback. Mobile surveys are the best for capturing in-the-moment experiences or opinions.
  • Social Media Listening: Social media platforms are a good source of unstructured data that you can analyze to gain insights into customer sentiment and trends. Social media listening involves monitoring and analyzing social media conversations, mentions, and hashtags to understand public opinion, identify emerging topics, and assess brand reputation.
  • Wearable Devices and Sensors: You can embed wearable devices, such as fitness trackers or smartwatches, and sensors in everyday objects to capture continuous data on various physiological and environmental variables. This data can provide you with insights into health behaviors, activity patterns, sleep quality, and environmental conditions, among others.
  • Big Data Analytics: Big data analytics leverages large volumes of structured and unstructured data from various sources, such as transaction records, social media, and internet browsing. Advanced analytics techniques, like machine learning and natural language processing, can extract meaningful insights and patterns from this data, enabling organizations to make data-driven decisions.
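As an example of the skip logic mentioned in the first bullet, the Python sketch below routes respondents to different questions depending on their answers; the question IDs and routing rules are hypothetical.

```python
# Minimal sketch of skip logic: the next question depends on the
# previous answer. Question ids and routing rules are hypothetical.
questions = {
    "q1": "Do you use our mobile app?",            # Yes/No
    "q2": "What do you like most about the app?",  # only if q1 == "Yes"
    "q3": "How likely are you to recommend us?",
}

def next_question(current_id: str, answer: str):
    """Return the id of the next question, or None at the end."""
    if current_id == "q1":
        return "q2" if answer == "Yes" else "q3"  # non-users skip q2
    if current_id == "q2":
        return "q3"
    return None

print(next_question("q1", "No"))   # q3 (q2 is skipped)
print(next_question("q1", "Yes"))  # q2
```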
Read Also: How Technology is Revolutionizing Data Collection

Faulty Data Collection Practices – Common Mistakes & Sources of Error

While technology-enabled data collection methods offer numerous advantages, there are some pitfalls and sources of error that you should be aware of. Here are some common mistakes and sources of error in data collection:

  • Population Specification Error: Population specification error occurs when the target population is not clearly defined or misidentified. This error leads to a mismatch between the research objectives and the actual population being studied, resulting in biased or inaccurate findings.
  • Sample Frame Error: Sample frame error occurs when the sampling frame, the list or source from which the sample is drawn, does not adequately represent the target population. This error can introduce selection bias and affect the generalizability of the findings.
  • Selection Error: Selection error occurs when the process of selecting participants or units for the study introduces bias. It can happen due to nonrandom sampling methods, inadequate sampling techniques, or self-selection bias. Selection error compromises the representativeness of the sample and affects the validity of the results.
  • Nonresponse Error: Nonresponse error occurs when selected participants choose not to participate or fail to respond to the data collection effort. Nonresponse bias can result in an unrepresentative sample if those who choose not to respond differ systematically from those who do respond. Efforts should be made to mitigate nonresponse and encourage participation to minimize this error.
  • Measurement Error: Measurement error arises from inaccuracies or inconsistencies in the measurement process. It can happen due to poorly designed survey instruments, ambiguous questions, respondent bias, or errors in data entry or coding. Measurement errors can lead to distorted or unreliable data, affecting the validity and reliability of the findings.

In order to mitigate these errors and ensure high-quality data collection, you should carefully plan your data collection procedures and validate your measurement tools. You should also use appropriate sampling techniques, employ randomization where possible, and minimize nonresponse through effective communication and incentives. Conduct regular checks and implement validation processes and data cleaning procedures to identify and rectify errors during data analysis.
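To make the validation and cleaning step concrete, here is a minimal Python sketch of two such checks, a range check and a completeness check; the records and field names are hypothetical.

```python
# Minimal sketch of two data-quality checks: a range check on a 1-5
# rating and a completeness check. Records are hypothetical.
records = [
    {"id": 1, "rating": 4, "age": 29},
    {"id": 2, "rating": 9, "age": 41},    # rating out of range
    {"id": 3, "rating": 3, "age": None},  # missing value
]

def validate(record):
    """Return a list of problems found in one record."""
    problems = []
    rating = record.get("rating")
    if rating is None or not 1 <= rating <= 5:
        problems.append("rating out of range or missing")
    if any(value is None for value in record.values()):
        problems.append("missing value")
    return problems

for record in records:
    issues = validate(record)
    if issues:
        print(record["id"], issues)
# 2 ['rating out of range or missing']
# 3 ['missing value']
```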

Best Practices for Data Collection

  • Clearly Define Objectives: Clearly define the research objectives and questions to guide the data collection process. This helps ensure that the collected data aligns with the research goals and provides relevant insights.
  • Plan Ahead: Develop a detailed data collection plan that includes the timeline, resources needed, and specific procedures to follow. This helps maintain consistency and efficiency throughout the data collection process.
  • Choose the Right Method: Select data collection methods that are appropriate for the research objectives and target population. Consider factors such as feasibility, cost-effectiveness, and the ability to capture the required data accurately.
  • Pilot Test : Before full-scale data collection, conduct a pilot test to identify any issues with the data collection instruments or procedures. This allows for refinement and improvement before data collection with the actual sample.
  • Train Data Collectors: If data collection involves human interaction, ensure that data collectors are properly trained on the data collection protocols, instruments, and ethical considerations. Consistent training helps minimize errors and maintain data quality.
  • Maintain Consistency: Follow standardized procedures throughout the data collection process to ensure consistency across data collectors and time. This includes using consistent measurement scales, instructions, and data recording methods.
  • Minimize Bias: Be aware of potential sources of bias in data collection and take steps to minimize their impact. Use randomization techniques, employ diverse data collectors, and implement strategies to mitigate response biases.
  • Ensure Data Quality: Implement quality control measures to ensure the accuracy, completeness, and reliability of the collected data. Conduct regular checks for data entry errors, inconsistencies, and missing values.
  • Maintain Data Confidentiality: Protect the privacy and confidentiality of participants’ data by implementing appropriate security measures. Ensure compliance with data protection regulations and obtain informed consent from participants.
  • Document the Process: Keep detailed documentation of the data collection process, including any deviations from the original plan, challenges encountered, and decisions made. This documentation facilitates transparency, replicability, and future analysis.

FAQs about Data Collection

  • What are secondary sources of data collection? Secondary sources of data collection are defined as the data that has been previously gathered and is available for your use as a researcher. These sources can include published research papers, government reports, statistical databases, and other existing datasets.
  • What are the primary sources of data collection? Primary sources of data collection involve collecting data directly from the original source also known as the firsthand sources. You can do this through surveys, interviews, observations, experiments, or other direct interactions with individuals or subjects of study.
  • How many types of data are there? There are two main types of data: qualitative and quantitative. Qualitative data is non-numeric and it includes information in the form of words, images, or descriptions. Quantitative data, on the other hand, is numeric and you can measure and analyze it statistically.


Published: 22 March 2008

Methods of data collection in qualitative research: interviews and focus groups

P. Gill, K. Stewart, E. Treasure & B. Chadwick

British Dental Journal volume 204, pages 291–295 (2008)


Key points
  • Interviews and focus groups are the most common methods of data collection used in qualitative healthcare research.
  • Interviews can be used to explore the views, experiences, beliefs and motivations of individual participants.
  • Focus groups use group dynamics to generate qualitative data.


This paper explores the most common methods of data collection used in qualitative research: interviews and focus groups. The paper examines each method in detail, focusing on how they work in practice, when their use is appropriate and what they can offer dentistry. Examples of empirical studies that have used interviews or focus groups are also provided.



Introduction

Having explored the nature and purpose of qualitative research in the previous paper, this paper explores methods of data collection used in qualitative research. There are a variety of methods of data collection in qualitative research, including observations, textual or visual analysis (eg from books or videos) and interviews (individual or group). 1 However, the most common methods used, particularly in healthcare research, are interviews and focus groups. 2 , 3

The purpose of this paper is to explore these two methods in more detail, in particular how they work in practice, the purpose of each, when their use is appropriate and what they can offer dental research.

Qualitative research interviews

There are three fundamental types of research interviews: structured, semi-structured and unstructured. Structured interviews are, essentially, verbally administered questionnaires, in which a list of predetermined questions is asked, with little or no variation and with no scope for follow-up questions to responses that warrant further elaboration. Consequently, they are relatively quick and easy to administer and may be of particular use if clarification of certain questions is required or if there are likely to be literacy or numeracy problems with the respondents. However, by their very nature, they only allow for limited participant responses and are, therefore, of little use if 'depth' is required.

Conversely, unstructured interviews do not reflect any preconceived theories or ideas and are performed with little or no organisation. 4 Such an interview may simply start with an opening question such as 'Can you tell me about your experience of visiting the dentist?' and will then progress based, primarily, upon the initial response. Unstructured interviews are usually very time-consuming (often lasting several hours) and can be difficult to manage, and to participate in, as the lack of predetermined interview questions provides little guidance on what to talk about (which many participants find confusing and unhelpful). Their use is, therefore, generally only considered where significant 'depth' is required, or where virtually nothing is known about the subject area (or a different perspective of a known subject area is required).

Semi-structured interviews consist of several key questions that help to define the areas to be explored, but also allow the interviewer or interviewee to diverge in order to pursue an idea or response in more detail. 2 This interview format is used most frequently in healthcare, as it provides participants with some guidance on what to talk about, which many find helpful. The flexibility of this approach, particularly compared to structured interviews, also allows for the discovery or elaboration of information that is important to participants but may not have previously been thought of as pertinent by the research team.

For example, in a recent dental public health study, 5 school children in Cardiff, UK were interviewed about their food choices and preferences. A key finding that emerged from semi-structured interviews, which was not previously thought to be as highly influential as the data subsequently confirmed, was the significance of peer-pressure in influencing children's food choices and preferences. This finding was also established primarily through follow-up questioning (eg probing interesting responses with follow-up questions, such as 'Can you tell me a bit more about that?') and, therefore, may not have emerged in the same way, if at all, if asked as a predetermined question.

The purpose of research interviews

The purpose of the research interview is to explore the views, experiences, beliefs and/or motivations of individuals on specific matters (eg factors that influence their attendance at the dentist). Qualitative methods, such as interviews, are believed to provide a 'deeper' understanding of social phenomena than would be obtained from purely quantitative methods, such as questionnaires. 1 Interviews are, therefore, most appropriate where little is already known about the study phenomenon or where detailed insights are required from individual participants. They are also particularly appropriate for exploring sensitive topics, where participants may not want to talk about such issues in a group environment.

Examples of dental studies that have collected data using interviews are 'Examining the psychosocial process involved in regular dental attendance' 6 and 'Exploring factors governing dentists' treatment philosophies'. 7 Gibson et al . 6 provided an improved understanding of factors that influenced people's regular attendance with their dentist. The study by Kay and Blinkhorn 7 provided a detailed insight into factors that influenced GDPs' decision making in relation to treatment choices. The study found that dentists' clinical decisions about treatments were not necessarily related to pathology or treatment options, as was perhaps initially thought, but also involved discussions with patients, patients' values and dentists' feelings of self-esteem and conscience.

There are many similarities between clinical encounters and research interviews, in that both employ similar interpersonal skills, such as questioning, conversing and listening. However, there are also some fundamental differences between the two, such as the purpose of the encounter, reasons for participating, roles of the people involved and how the interview is conducted and recorded. 8

The primary purpose of clinical encounters is for the dentist to ask the patient questions in order to acquire sufficient information to inform decision making and treatment options. However, the constraints of most consultations are such that any open-ended questioning needs to be brought to a conclusion within a fairly short time. 2 In contrast, the fundamental purpose of the research interview is to listen attentively to what respondents have to say, in order to acquire more knowledge about the study topic. 9 Unlike the clinical encounter, it is not to intentionally offer any form of help or advice, which many researchers have neither the training nor the time for. Research interviewing therefore requires a different approach and a different range of skills.

The interview

When designing an interview schedule it is imperative to ask questions that are likely to yield as much information about the study phenomenon as possible and also be able to address the aims and objectives of the research. In a qualitative interview, good questions should be open-ended (ie, require more than a yes/no answer), neutral, sensitive and understandable. 2 It is usually best to start with questions that participants can answer easily and then proceed to more difficult or sensitive topics. 2 This can help put respondents at ease, build up confidence and rapport and often generates rich data that subsequently develops the interview further.

As in any research, it is often wise to first pilot the interview schedule on several respondents prior to data collection proper. 8 This allows the research team to establish if the schedule is clear, understandable and capable of answering the research questions, and if, therefore, any changes to the interview schedule are required.

The length of interviews varies depending on the topic, researcher and participant. However, on average, healthcare interviews last 20-60 minutes. Interviews can be performed on a one-off or, if change over time is of interest, repeated basis, 4 for example exploring the psychosocial impact of oral trauma on participants and their subsequent experiences of cosmetic dental surgery.

Developing the interview

Before an interview takes place, respondents should be informed about the study details and given assurance about ethical principles, such as anonymity and confidentiality. 2 This gives respondents some idea of what to expect from the interview, increases the likelihood of honesty and is also a fundamental aspect of the informed consent process.

Wherever possible, interviews should be conducted in areas free from distractions and at times and locations that are most suitable for participants. For many this may be at their own home in the evenings. Whilst researchers may have less control over the home environment, familiarity may help the respondent to relax and result in a more productive interview. 9 Establishing rapport with participants prior to the interview is also important, as this can have a positive effect on the subsequent development of the interview.

When conducting the actual interview it is prudent for the interviewer to familiarise themselves with the interview schedule, so that the process appears more natural and less rehearsed. However, to ensure that the interview is as productive as possible, researchers must possess a repertoire of skills and techniques to ensure that comprehensive and representative data are collected during the interview. 10 One of the most important skills is the ability to listen attentively to what is being said, so that participants are able to recount their experiences as fully as possible, without unnecessary interruptions.

Other important skills include adopting open and emotionally neutral body language, nodding, smiling, looking interested and making encouraging noises (eg, 'Mmmm') during the interview. 2 The strategic use of silence, if used appropriately, can also be highly effective at getting respondents to contemplate their responses, talk more, elaborate or clarify particular issues. Other techniques that can be used to develop the interview further include reflecting on remarks made by participants (eg, 'Pain?') and probing remarks ('When you said you were afraid of going to the dentist what did you mean?'). 9 Where appropriate, it is also wise to seek clarification from respondents if it is unclear what they mean. The use of 'leading' or 'loaded' questions that may unduly influence responses should always be avoided (eg, ask 'How do you find the waiting room at the dentist's?' rather than 'So you think dental surgery waiting rooms are frightening?').

At the end of the interview it is important to thank participants for their time and ask them if there is anything they would like to add. This gives respondents an opportunity to deal with issues that they have thought about, or think are important but have not been dealt with by the interviewer. 9 This can often lead to the discovery of new, unanticipated information. Respondents should also be debriefed about the study after the interview has finished.

All interviews should be tape recorded and transcribed verbatim afterwards, as this protects against bias and provides a permanent record of what was and was not said. 8 It is often also helpful to make 'field notes' during and immediately after each interview about observations, thoughts and ideas, as these can help in the data analysis process. 4 , 8

Focus groups

Focus groups share many common features with less structured interviews, but there is more to them than merely collecting similar data from many participants at once. A focus group is a group discussion on a particular topic organised for research purposes. This discussion is guided, monitored and recorded by a researcher (sometimes called a moderator or facilitator). 11 , 12

Focus groups were first used as a research method in market research, originating in the 1940s in the work of the Bureau of Applied Social Research at Columbia University. Eventually, the success of focus groups as a marketing tool in the private sector resulted in their use in public sector marketing, such as the assessment of the impact of health education campaigns. 13 However, focus group techniques, as used in the public and private sectors, have diverged over time. Therefore, in this paper, we seek to describe focus groups as they are used in academic research.

When focus groups are used

Focus groups are used for generating information on collective views, and the meanings that lie behind those views. They are also useful in generating a rich understanding of participants' experiences and beliefs. 12 Suggested criteria for using focus groups include: 13

As a standalone method, for research relating to group norms, meanings and processes

In a multi-method design, to explore a topic or collect group language or narratives to be used in later stages

To clarify, extend, qualify or challenge data collected through other methods

To feedback results to research participants.

Morgan 12 suggests that focus groups should be avoided according to the following criteria:

If listening to participants' views generates expectations for the outcome of the research that cannot be fulfilled

If participants are uneasy with each other, and will therefore not discuss their feelings and opinions openly

If the topic of interest to the researcher is not a topic the participants can or wish to discuss

If statistical data are required: focus groups give depth and insight, but cannot produce useful numerical results.

Conducting focus groups: group composition and size

The composition of a focus group needs great care to get the best quality of discussion. There is no 'best' solution to group composition, and group mix will always impact on the data, according to factors such as the mix of ages, sexes and social and professional statuses of the participants. What is important is that the researcher gives due consideration to the impact of group mix (eg, how the group may interact with each other) before the focus group proceeds. 14

Interaction is key to a successful focus group. Sometimes a pre-existing group interacts best for research purposes, and sometimes a group of strangers does. Pre-existing groups may be easier to recruit, have shared experiences and enjoy a comfort and familiarity which facilitates discussion or the ability to challenge each other comfortably. In health settings, pre-existing groups can overcome issues relating to disclosure of potentially stigmatising status which people may find uncomfortable in stranger groups (conversely, there may be situations where disclosure is more comfortable in stranger groups). In other research projects it may be decided that stranger groups will be able to speak more freely without fear of repercussion, and that challenges to other participants may be more direct and probing, leading to richer data. 13

Group size is an important consideration in focus group research. Stewart and Shamdasani 14 suggest that it is better to slightly over-recruit for a focus group and potentially manage a slightly larger group, than under-recruit and risk having to cancel the session or having an unsatisfactory discussion. They advise that each group will probably have two non-attenders. The optimum size for a focus group is six to eight participants (excluding researchers), but focus groups can work successfully with as few as three and as many as 14 participants. Small groups risk limited discussion occurring, while large groups can be chaotic, hard to manage for the moderator and frustrating for participants who feel they get insufficient opportunities to speak. 13
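As a simple illustration of this recruitment arithmetic, the sketch below — a hypothetical helper, not taken from the source — computes how many invitations to issue for a target group size, assuming the two expected non-attenders suggested above.

```python
def invitations_needed(target_size: int, expected_no_shows: int = 2) -> int:
    """Invitations to issue so that, after the expected no-shows,
    the group still reaches the target size."""
    MIN_SIZE, MAX_SIZE = 3, 14  # workable focus group range cited above
    if not MIN_SIZE <= target_size <= MAX_SIZE:
        raise ValueError(f"target size should be between {MIN_SIZE} and {MAX_SIZE}")
    return target_size + expected_no_shows

# Aiming for the optimum six to eight participants:
for target in (6, 7, 8):
    print(f"target {target} -> invite {invitations_needed(target)}")
```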

Preparing an interview schedule

Like research interviews, the interview schedule for focus groups is often no more structured than a loose schedule of topics to be discussed. However, in preparing an interview schedule for focus groups, Stewart and Shamdasani 14 suggest two general principles:

Questions should move from general to more specific questions

Question order should be relative to importance of issues in the research agenda.

There can, however, be some conflict between these two principles, and trade-offs are often needed, although discussions will often take on a life of their own, which will influence or determine the order in which issues are covered. Usually, fewer than a dozen predetermined questions are needed and, as with research interviews, the researcher will also probe and expand on issues according to the discussion.

Moderating a focus group looks easy when done well, but requires a complex set of skills, which are related to the following principles: 15

Believe that participants have valuable views and the ability to contribute, and respond to them actively, positively and respectfully. Such an approach is not simply a courtesy, but will encourage fruitful discussions

Moderating without participating: a moderator must guide a discussion rather than join in with it. Expressing one's own views tends to give participants cues as to what to say (introducing bias), rather than the confidence to be open and honest about their own views

Be prepared for views that may be unpalatably critical of a topic which may be important to you

It is important to recognise that researchers' individual characteristics mean that no one person will always be suitable to moderate any kind of group. Sometimes the characteristics that suit a moderator for one group will inhibit discussion in another

Be yourself. If the moderator is comfortable and natural, participants will feel relaxed.

The moderator should facilitate group discussion, keeping it focussed without leading it. They should also be able to prevent the discussion being dominated by one member (for example, by emphasising at the outset the importance of hearing a range of views), ensure that all participants have ample opportunity to contribute, allow differences of opinions to be discussed fairly and, if required, encourage reticent participants. 13

Other relevant factors

The venue for a focus group is important and should, ideally, be accessible, comfortable, private, quiet and free from distractions. 13 However, while a central location, such as the participants' workplace or school, may encourage attendance, the venue may affect participants' behaviour. For example, in a school setting, pupils may behave like pupils, and in clinical settings, participants may be affected by any anxieties that affect them when they attend in a patient role.

Focus groups are usually recorded, often observed (by a researcher other than the moderator, whose role is to observe the interaction of the group to enhance analysis) and sometimes videotaped. At the start of a focus group, a moderator should acknowledge the presence of the audio recording equipment, assure participants of confidentiality and give people the opportunity to withdraw if they are uncomfortable with being taped. 14

A good quality multi-directional external microphone is recommended for the recording of focus groups, as internal microphones are rarely good enough to cope with the variation in volume of different speakers. 13 If observers are present, they should be introduced to participants as someone who is just there to observe, and sit away from the discussion. 14 Videotaping will require more than one camera to capture the whole group, as well as additional operational personnel in the room. It is therefore very obtrusive, which can affect the spontaneity of the group, and it does not usually yield enough additional information beyond what an observer can capture to make it worthwhile. 15

The systematic analysis of focus group transcripts is crucial. However, the transcription of focus groups is more complex and time consuming than in one-to-one interviews: each hour of audio can take up to eight hours to transcribe and generate approximately 100 pages of text. Recordings should be transcribed verbatim, and speakers should be identified in a way that makes it possible to follow the contributions of each individual. Sometimes observational notes also need to be described in the transcripts in order for them to make sense.
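To make this workload concrete, here is a minimal sketch (in Python, using the per-hour figures quoted above as assumptions) of the transcription time and page count a study might budget for.

```python
HOURS_TO_TRANSCRIBE_PER_AUDIO_HOUR = 8   # upper estimate cited above
PAGES_PER_AUDIO_HOUR = 100               # approximate figure cited above

def transcription_budget(n_groups: int, avg_audio_hours: float) -> tuple[float, float]:
    """Return (transcriber hours, pages of text) for a set of focus groups."""
    audio_hours = n_groups * avg_audio_hours
    return (audio_hours * HOURS_TO_TRANSCRIBE_PER_AUDIO_HOUR,
            audio_hours * PAGES_PER_AUDIO_HOUR)

# For example, six focus groups of roughly 1.5 hours each:
hours, pages = transcription_budget(6, 1.5)
print(f"~{hours:.0f} transcriber hours, ~{pages:.0f} pages of transcript")
```

Even a modest study can therefore generate several hundred pages of transcript, which is worth budgeting for before data collection begins.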

The analysis of qualitative data is explored in the final paper of this series. However, it is important to note that the analysis of focus group data differs from that of other qualitative data because of its interactive nature, and this needs to be taken into consideration during analysis. The context provided by other speakers is essential to the understanding of individual contributions. 13 For example, in a group situation, participants will often challenge each other and justify their remarks because of the group setting, in a way that perhaps they would not in a one-to-one interview. The analysis of focus group data must therefore take account of the group dynamics that have generated remarks.

Focus groups in dental research

Focus groups are used increasingly in dental research, on a diverse range of topics, 16 illuminating a number of areas relating to patients, dental services and the dental profession. Addressing a special-needs population that is difficult to access and sample through quantitative measures, Robinson et al. 17 used focus groups to investigate the oral health-related attitudes of drug users, exploring the priorities, understandings and barriers to care they encounter. Newton et al. 18 used focus groups to explore barriers to services among minority ethnic groups, highlighting for the first time differences between minority ethnic groups. Demonstrating the use of the method with professional groups as subjects in dental research, Gussy et al. 19 explored the barriers to, and possible strategies for, developing a shared approach to the prevention of caries among pre-schoolers. This mixed-methods study was very important as the qualitative element was able to explain why the clinical trial failed, and this understanding may help researchers improve the quantitative aspect of future studies, as well as making a valuable academic contribution in its own right.

Interviews and focus groups remain the most common methods of data collection in qualitative research, and are now being used with increasing frequency in dental research, particularly to access areas not amenable to quantitative methods and/or where depth, insight and understanding of particular phenomena are required. The examples of dental studies that have employed these methods also help to demonstrate the range of research contexts to which interview and focus group research can make a useful contribution. The continued employment of these methods can further strengthen many areas of dentally related work.

References

1. Silverman D. Doing qualitative research. London: Sage Publications, 2000.

2. Britten N. Qualitative interviews in healthcare. In Pope C, Mays N (eds) Qualitative research in health care. 2nd ed. pp 11–19. London: BMJ Books, 1999.

3. Legard R, Keegan J, Ward K. In-depth interviews. In Ritchie J, Lewis J (eds) Qualitative research practice: a guide for social science students and researchers. pp 139–169. London: Sage Publications, 2003.

4. May K M. Interview techniques in qualitative research: concerns and challenges. In Morse J M (ed) Qualitative nursing research. pp 187–201. Newbury Park: Sage Publications, 1991.

5. Stewart K, Gill P, Treasure E, Chadwick B. Understanding about food among 6-11 year olds in South Wales. Food Culture Society 2006; 9: 317–333.

6. Gibson B, Drennan J, Hanna S, Freeman R. An exploratory qualitative study examining the social and psychological processes involved in regular dental attendance. J Public Health Dent 2000; 60: 5–11.

7. Kay E J, Blinkhorn A S. A qualitative investigation of factors governing dentists' treatment philosophies. Br Dent J 1996; 180: 171–176.

8. Pontin D. Interviews. In Cormack D F S (ed) The research process in nursing. 4th ed. pp 289–298. Oxford: Blackwell Science, 2000.

9. Kvale S. Interviews. Thousand Oaks: Sage Publications, 1996.

10. Hammersley M, Atkinson P. Ethnography: principles in practice. 2nd ed. London: Routledge, 1995.

11. Kitzinger J. The methodology of focus groups: the importance of interaction between research participants. Sociol Health Illn 1994; 16: 103–121.

12. Morgan D L. The focus group guide book. London: Sage Publications, 1998.

13. Bloor M, Frankland J, Thomas M, Robson K. Focus groups in social research. London: Sage Publications, 2001.

14. Stewart D W, Shamdasani P M. Focus groups. Theory and practice. London: Sage Publications, 1990.

15. Krueger R A. Moderating focus groups. London: Sage Publications, 1998.

16. Chestnutt I G, Robson K F. Focus groups – what are they? Dent Update 2002; 28: 189–192.

17. Robinson P G, Acquah S, Gibson B. Drug users: oral health related attitudes and behaviours. Br Dent J 2005; 198: 219–224.

18. Newton J T, Thorogood N, Bhavnani V, Pitt J, Gibbons D E, Gelbier S. Barriers to the use of dental services by individuals from minority ethnic communities living in the United Kingdom: findings from focus groups. Primary Dent Care 2001; 8: 157–161.

19. Gussy M G, Waters E, Kilpatrick M. A qualitative study exploring barriers to a model of shared care for pre-school children's oral health. Br Dent J 2006; 201: 165–170.

Author information

Authors and affiliations

P. Gill, Senior Research Fellow, Faculty of Health, Sport and Science, University of Glamorgan, Pontypridd, CF37 1DL

K. Stewart, Research Fellow, Academic Unit of Primary Care, University of Bristol, Bristol, BS8 2AA

E. Treasure, Dean and Professor of Dental Public Health, School of Dentistry, Dental Health and Biological Sciences, Cardiff University, Heath Park, Cardiff, CF14 4XY

B. Chadwick, Professor of Paediatric Dentistry, School of Dentistry, Dental Health and Biological Sciences, Cardiff University, Heath Park, Cardiff, CF14 4XY

Corresponding author

Correspondence to P. Gill.


About this article

Cite this article

Gill, P., Stewart, K., Treasure, E. et al. Methods of data collection in qualitative research: interviews and focus groups. Br Dent J 204, 291–295 (2008). https://doi.org/10.1038/bdj.2008.192




Research on fine collection and interpretation methods of discontinuities on high-steep rock slopes based on UAV multi-angle nap-of-the-object photogrammetry

  • Original Paper
  • Published: 26 March 2024
  • Volume 83, article number 142 (2024)

Shengyuan Song 1, Mingyu Zhao 1, Wen Zhang 1, Fengyan Wang 2, Jianping Chen 1 & Yongchao Li 1


The rapid and accurate acquisition of discontinuity parameters of rock masses is of paramount significance for the stability assessment of rock slopes. However, the complex and hazardous terrain of high-steep rock slopes poses challenges to conventional survey methods. Given this, this study proposes an unmanned aerial vehicle (UAV) multi-angle nap-of-the-object photogrammetry technique. The method comprehensively considers the terrain development characteristics of high-steep rock slopes and the orientation characteristics of dominant discontinuity sets, and, with trained pilots, achieves high-quality image acquisition at millimeter resolution. It thereby overcomes issues such as texture distortion, shadow obstruction, and low resolution commonly found in conventional UAV 3D models. Subsequently, a novel non-contact method is presented for obtaining discontinuity parameters, including orientation, trace length, spacing, aperture, and roughness, from the 3D real-scene model of the slope. The feasibility and reliability of this method are verified by collecting 1728 discontinuities along a 1300-m length of rock slope at a railway construction site in southeastern Tibet. A comparison with manually measured results indicates average differences of 2° for dip and dip direction; 1.3 cm and 9 mm for trace length and aperture, respectively; and 1.89 for JRC. The study also reveals that the effective resolution of the 3D model is approximately 1 to 2 times the theoretical resolution of a UAV image, providing crucial guidance for obtaining high-quality images on high-steep rock slopes. The method proposed in this study holds significant practical value in the stability assessment and disaster prevention of rock slopes.
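The paper's full interpretation pipeline is not reproduced here, but the core step of recovering a discontinuity's orientation (dip direction and dip) from a patch of the 3D model can be illustrated with a least-squares plane fit. The Python/NumPy sketch below is not the authors' code: it assumes an [east, north, up] coordinate convention and synthetic data, and simply converts the fitted plane normal into orientation angles.

```python
import numpy as np

def plane_orientation(points: np.ndarray) -> tuple[float, float]:
    """Fit a plane to an (N, 3) array of [east, north, up] coordinates
    and return (dip_direction_deg, dip_deg)."""
    centered = points - points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:                     # make the normal point upward
        normal = -normal
    dip = np.degrees(np.arccos(np.clip(normal[2], -1.0, 1.0)))
    dip_direction = np.degrees(np.arctan2(normal[0], normal[1])) % 360.0
    return dip_direction, dip

# Synthetic check: a plane dipping 45 degrees toward the east (090),
# sampled with a little measurement noise.
rng = np.random.default_rng(0)
east, north = rng.uniform(-1, 1, (2, 200))
up = -east * np.tan(np.radians(45)) + rng.normal(0, 0.01, 200)
print(plane_orientation(np.column_stack([east, north, up])))  # ~ (90.0, 45.0)
```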



Data availability

Most of the data generated during this study are included in this article, and other datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (grant numbers 42177139, 41941017, and 42077242), the Natural Science Foundation Project of Jilin Province (grant number 20230101088JC), and the Scientific Research Project of the Education Department of Jilin Province (grant number JJKH20231182KJ).

Author information

Authors and affiliations

College of Construction Engineering, Jilin University, Changchun, 130026, China

Shengyuan Song, Mingyu Zhao, Wen Zhang, Jianping Chen & Yongchao Li

College of Geo-Exploration Science and Technology, Jilin University, Changchun, 130026, China

Fengyan Wang


Contributions

Shengyuan Song: writing (original draft), visualization, funding acquisition, and supervision. Mingyu Zhao: software, visualization, investigation, and writing (review and editing). Wen Zhang: conceptualization, methodology, and supervision. Fengyan Wang: project administration, investigation, and data curation. Chen Cao: software, validation, and writing (review and editing). Jianping Chen: formal analysis and writing (review and editing). Yongchao Li: investigation and data curation.

Corresponding author

Correspondence to Jianping Chen .

Ethics declarations

Competing interests

The authors declare no competing interests.


About this article

Song, S., Zhao, M., Zhang, W. et al. Research on fine collection and interpretation methods of discontinuities on high-steep rock slopes based on UAV multi-angle nap-of-the-object photogrammetry. Bull Eng Geol Environ 83, 142 (2024). https://doi.org/10.1007/s10064-024-03646-5

Received: 18 July 2023

Accepted: 18 March 2024

Published: 26 March 2024


Keywords

  • High-steep rock slope
  • Discontinuity
  • Parameter interpretation
  • UAV oblique photogrammetry
  • UAV multi-angle nap-of-the-object photogrammetry

Homelessness: challenges and opportunities in the "new normal"

Mental Health and Social Inclusion

ISSN: 2042-8308

Article publication date: 29 March 2024

Purpose

This paper – the final paper of a series of three – aims to discuss the implications of the findings from a service user needs assessment of people experiencing homelessness in the Northwest of England. It expands on the previous paper by offering a more detailed analysis and discussion of the identified key themes and issues. The service user needs assessment was completed as part of a review of local service provision in the Northwest of England against the backdrop of the COVID-19 pandemic.

Design/methodology/approach

Semi-structured questionnaires were administered and used by health-care professionals to collect data from individuals accessing the Homeless and Vulnerable Adults Service (HVAS) in Bolton. The questionnaires included a section exploring Adverse Childhood Experiences. Data from 100 completed questionnaires were analysed to better understand the needs of those accessing the HVAS.

Findings

Multiple deprivations, including extensive health and social care needs, were identified within the cohort. Meeting these complex needs was challenging for both service users and service providers. This paper will explore key themes identified by the needs assessment and draw upon further comments from those who participated in the data-gathering process. The paper discusses the practicalities of responding to the complex needs of those with lived experience of homelessness. It highlights how a coordinated partnership approach, using an integrated service delivery model, can be both cost-effective and responsive to the needs of those often on the margins of our society.

Research limitations/implications

Data collection during the COVID-19 pandemic presented a number of challenges. The collection period had to be extended whilst patient care was prioritised. Quantitative methods were used; however, this limited the opportunity for service user involvement and feedback. Future research could use qualitative methods to redress this balance and adopt a more inclusive approach.

Practical implications

This study illustrates that the needs of the homeless population are broad and varied. Although the population themselves have developed different responses to their situations, their needs can only be fully met by a co-ordinated, multi-agency, partnership response. An integrated service model can help identify, understand, and meet the needs of the whole population and individuals within it to improve healthcare for a vulnerable population.

Social implications

This study highlighted new and important findings around the resilience of the homeless population and the significance of building protective factors to help combat the combined burden of social isolation and physical and mental health problems.

Originality/value

The discussion provides an opportunity to reflect on established views in relation to the nature and scope of homelessness. The paper describes a contemporary approach to tackling current issues faced by those experiencing homelessness in the context of the COVID-19 pandemic. Recommendations for service improvement include highlighting established good practice, such as embedding a more inclusive, participatory approach.

Keywords

  • Homelessness
  • Social exclusion
  • Health inequalities
  • Mental health
  • Partnerships

Acknowledgements

The authors received no financial support for the research, authorship and/or publication of this article. The authors wish to acknowledge the contributions made by those with lived experience who completed the survey. Recognition and thanks are also given to those involved in the delivery of services that seek to improve the lives of those who are homeless.

Woods, A. , Lace, R. , Dickinson, J. and Hughes, B. (2024), "Homelessness: challenges and opportunities in the “new normal”", Mental Health and Social Inclusion , Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/MHSI-02-2024-0032


Copyright © 2024, Emerald Publishing Limited



Published on 29.3.2024 in Vol 26 (2024)

Usability of Health Care Price Transparency Data in the United States: Mixed Methods Study


Original Paper

  • Negar Maleki 1, PhD
  • Balaji Padmanabhan 2, PhD
  • Kaushik Dutta 1, PhD

1 School of Information Systems and Management, Muma College of Business, University of South Florida, Tampa, FL, United States

2 Decision, Operations & Information Technologies Department, Robert H. Smith School of Business, University of Maryland, College Park, MD, United States

Corresponding Author:

Negar Maleki, PhD

School of Information Systems and Management

Muma College of Business

University of South Florida

4202 E Fowler Avenue

Tampa, FL, 33620

United States

Phone: 1 8139742011

Email: [email protected]

Background: Increasing health care expenditure in the United States has put policy makers under enormous pressure to find ways to curtail costs. Starting January 1, 2021, hospitals operating in the United States were mandated to publish transparent, accessible pricing information about their items and services online, both within comprehensive machine-readable files on their websites and in a consumer-friendly format.

Objective: The aims of this study are to analyze the available files on hospitals’ websites, answering the question—is price transparency (PT) information as provided usable for patients or for machines?—and to provide a solution.

Methods: We analyzed the machine-readable files published on the websites of 39 main hospitals in Florida. We created an Excel (Microsoft) file that included those 39 hospitals along with the 4 most popular services—Current Procedural Terminology (CPT) codes 45380, 29827, and 70553 and Diagnosis-Related Group (DRG) code 807—for the 4 most popular commercial carriers (Health Maintenance Organization [HMO] or Preferred Provider Organization [PPO] plans)—Aetna, Florida Blue, Cigna, and UnitedHealthcare. We conducted an A/B test using 67 MTurkers (randomly selected from US residents), investigating the level of awareness of PT legislation and the usability of the available files. We also suggested format standardization, such as master field names derived through schema integration, to make machine-readable files consistent and usable for machines.
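As one illustration of the format standardization suggested above, the sketch below maps hospital-specific column headings onto a set of master field names. All column names here are hypothetical examples invented for the sketch, not field names taken from actual hospital files.

```python
import pandas as pd

# Hypothetical master schema and per-hospital synonyms (illustrative only).
MASTER_FIELDS = {
    "code":            ["cpt_code", "CPT/DRG", "procedure code"],
    "description":     ["service_description", "Item Description"],
    "payer":           ["payer_name", "Insurance Carrier"],
    "negotiated_rate": ["payer_negotiated_rate", "Negotiated Amount"],
}

def to_master_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Rename a hospital's columns to master field names, where recognized."""
    rename = {syn: master
              for master, synonyms in MASTER_FIELDS.items()
              for syn in synonyms if syn in df.columns}
    return df.rename(columns=rename)[list(MASTER_FIELDS)]

hospital_a = pd.DataFrame({
    "cpt_code": ["45380"],
    "service_description": ["Colonoscopy with biopsy"],
    "payer_name": ["Aetna"],
    "payer_negotiated_rate": [1530.00],
})
print(to_master_schema(hospital_a))
```

Once every hospital's file is normalized this way, files can be concatenated and queried uniformly, which is what makes downstream out-of-pocket estimation tools feasible.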

Results: The poor usability and inconsistent formats of the current PT information yielded no evidence of its usefulness for patients or of its quality for machines, indicating that the information meets neither the consumer-friendly nor the machine-readable requirements of the legislation. Responses to the first part of the experiment (PT awareness) made it evident that participants were largely unaware of the PT legislation, although they believe it is important to know the price of a service before receiving it. In the second part of the experiment (human usability of PT information), the average number of correct responses differed between the 2 groups: the treatment group (mean 2.76, SD 0.58) found more correct answers than the control group (mean 1.23, SD 1.30; t65=6.46; P<.001; d=1.52).
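For readers who want to check the arithmetic, the reported comparison can be approximately reproduced from the summary statistics alone. The sketch below assumes a roughly even 34/33 split of the 67 participants, which the abstract does not report, so the computed t statistic differs slightly from the reported 6.46, while Cohen's d matches.

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the abstract; the 34/33 split is an assumption.
m_t, sd_t, n_t = 2.76, 0.58, 34   # treatment group (summary table)
m_c, sd_c, n_c = 1.23, 1.30, 33   # control group (hospitals' files)

t_stat, p_value = ttest_ind_from_stats(m_t, sd_t, n_t, m_c, sd_c, n_c)

# Cohen's d from the pooled standard deviation.
sd_pooled = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
d = (m_t - m_c) / sd_pooled

print(f"t({n_t + n_c - 2}) = {t_stat:.2f}, p = {p_value:.2g}, d = {d:.2f}")
```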

Conclusions: Consistent machine-readable files across all health systems facilitate the development of tools for estimating customer out-of-pocket costs, aligning with the PT rule’s main objective—providing patients with valuable information and reducing health care expenditures.

Introduction

From 1970 to 2020, on a per capita basis, health care expenditures in the United States increased sharply from US $353 per person to US $12,531 per person; in constant 2020 dollars, the increase was from US $1875 to US $12,531 [1]. The significant rise in health care expenses has put policy makers under enormous pressure to find ways to contain these expenditures. Price transparency (PT) in health care is 1 generally proposed strategy for addressing these problems [2] and has been debated for years [3]. Some economists believe that PT in health care will cut health care prices in the same way it has in other industries, while others argue that, owing to the specific characteristics of the health care market, PT would not ameliorate rising health care costs. Price elasticity also does not typically apply in health care: if a problem becomes severe, people will seek treatment regardless of cost, and individuals often learn of their health care costs only after receiving treatment [4]. Complex billing processes, hidden insurer-provider contracts, the sheer quantity of third-party payers, and substantial quality differences in health care delivery are other unique aspects of health care that complicate the situation considerably.

The Centers for Medicare & Medicaid Services (CMS) mandated hospitals to post negotiated rates, including payer-specific negotiated costs, for 300 "shoppable services" beginning in January 2021. The list must include 70 CMS-specified services and an additional 230 services each hospital considers relevant to its patient population. Hospitals must include each third-party payer and their payer-specific fee when negotiating multiple rates for the same care. The data must be displayed simply, be easily accessible (without requiring personal information from the patient), and be saved in a machine-readable format [5]. These efforts aim to facilitate informed patient decision-making, reduce out-of-pocket spending, and decrease health care expenditures. Former Secretary of Health and Human Services Alex Azar expressed a vision of hospital PT when declaring the new legislation "a patient-centered system that puts you in control and provides the affordability you need, the options and control you want, and the quality you deserve. Providing patients with clear, accessible information about the price of their care is a vital piece of delivering on that vision" [6].

Despite the legislation, it is not clear whether people actually engage with PT tools. For example, New Hampshire's HealthCost website, established in 2007, provides negotiated prices and out-of-pocket costs for 42 commonly used services: after asking whether the patient is insured, who their insurer is, and their zip code, it lists out-of-pocket costs in descending order. Mehrotra et al [7] examined this website over 3 years to understand how often and why the tool was used. Their analysis suggested that, despite growing interest in PT, only approximately 1% of the state's population used the tool. Low PT tool usage was also seen in other studies [8-10], which found that 3% to 12% of individuals who were offered such a tool used it during study periods of at least 12 months. Thus, offering PT tools does not in itself lead to decreased total spending, since few people who have access to them use them to browse for lower-cost services [7,11].

In a recent paper, researchers addressed 1 possible reason for low engagement—lack of awareness. They implemented an extensive targeted online advertising campaign using Google Advertisements to increase awareness and assessed whether it increased New Hampshire’s PT website use. Their findings suggested that although lack of awareness is a possible reason for the low impact of PT tools in health care spending, structural factors might affect the use of health care information [ 12 ]. Individuals may not be able to exactly determine their out-of-pocket expenses from the information provided.

Surprisingly, there is little research on the awareness and usability of PT information since the current PT legislation went into effect. A recent study [13] highlighted the nonusability of existing machine-readable files for employers, policy makers, researchers, or consumers, and this paper adds to this literature by answering the question—is PT information as provided usable for patients or machines? Clearly, if it is of value to patients, it can be useful; the reason for also taking the perspective of machines was to examine whether this information might be useful to third-party programs that can extract information from the provided data (perhaps to subsequently help patients through other ways of presenting this information). We address this question through a combination of user experiments and data schema analysis. While recent papers have also argued that PT data have deficiencies [13,14], ours is the first to combine user experiments with analysis of data schemas from several hospitals in Florida to make a combined claim about value for patients and machines. We hope this can add to the discourse on PT and what needs to be done to extract value for patients and the health care system as a whole.

Impact of PT Tools

The impact of PT tools on consumers and health care facilities has been investigated in the literature. Some studies showed that consumers with access to PT tools are more likely to reduce forgone needed services over time; moreover, consumers who use the tools tend to find the lowest service prices [8,15-17]. A few studies investigated the impact of PT tools on the selection of health care facilities, illustrating that some consumers change health care facilities in pursuit of lower prices, while others prefer to stay with expensive ones even though they are aware of facilities that offer lower prices [9,18]. Finally, some research studied the impact of PT tools on cost and showed that some consumers experienced no effect, while others saw decreases in average expenses [8,17,18]. The impact of PT tools on health care facilities, however, is inconclusive, with different studies reaching different conclusions: some stated that PT tools decrease the prices of imaging and laboratory services, while others found that although public charge disclosure lowers health care facility charges, final prices remained unchanged [17,18].

Legislation Related Works

In one study, researchers considered 20 leading US hospitals, assessing the chargemasters they provided to understand to what extent patients can obtain information from websites to determine their out-of-pocket costs [19]. Their findings showed that although all hospitals provided chargemasters on their websites, they rarely offered transparent information, making it hard for patients to determine out-of-pocket costs. Their analysis used advanced diagnostic imaging services to assess hospitals' chargemasters, since these are among the most common services people look for. Mehrotra et al [7] also noted that the most common searches were for outpatient visits, magnetic resonance imaging (MRI), and emergency department visits. To this end, we used "MRI scan of the brain before and after contrast" as one of the shoppable services in our analysis. Another study examined imaging services in children's hospitals (n=89), restricting the analysis to hospitals (n=35) that met PT requirements—published chargemaster rates, discounted cash prices, and payer-negotiated prices in a machine-readable file, and published costs for 300 common shoppable medical services in a consumer-friendly format. Their study revealed that, in addition to a broad range of imaging service charges, most hospitals failed to meet the machine-readable file requirement [20].

Arvisais-Anhalt et al [21] identified 11 hospitals with available chargemasters in Dallas County to compare the prices of a wide range of available services. They observed significant variations for a laboratory test (partial thromboplastin time), a medication (a 5 mg tablet of amlodipine), and a procedure (circumcision). Reddy et al [22] focused on New York State, assessing the accessibility and usability of hospitals' chargemasters from patients' viewpoint. They found that 189 of 202 hospitals had a locatable chargemaster on their home page. However, only 37 hospitals' chargemasters contained Current Procedural Terminology (CPT) codes, which makes those without CPT codes unusable, because many different descriptions exist for the same procedure; for example, an elective heart procedure had 34 entries. We add to this considerable literature by examining a subset of Florida hospitals.

In a competitive market, higher-quality goods and services command higher prices [23]. Based on this, Patel et al [24] examined the relationship between Diagnosis-Related Group (DRG) chargemaster prices and quality measures. Although prior research found no convincing evidence that hospitals with greater costs also delivered better care [25], they discovered 2 important quality indicators that were positively and substantially linked to standard charges—mortality rate and readmission rate—both of which are quality characteristics in line with economic theory. Moreover, Patel et al [24] studied the price variation of one of the most commonly performed services (vaginal delivery) as a DRG code, which motivated us to select "Vaginal delivery without sterilization or D&C without CC/MCC" as another shoppable service in our analysis.

Ethical Considerations

All data used in this study, including the secondary data set obtained from hospitals’ websites and the data collected during the user experiment, underwent a thorough anonymization process. The study was conducted under protocols approved by the University of South Florida institutional review board (STUDY004145: “Effect of price transparency regulation (PTR) on the public decisions”) under HRP-502b(7) Social Behavioral Survey Consent. This approval encompassed the use of publicly available anonymized secondary data from hospitals’ websites, as well as a user experiment aimed at assessing awareness of the PT rule and the usability of hospitals’ files. No individual-specific data were collected during the experiment, which solely focused on capturing subjects’ awareness and opinions regarding the PT rule and associated files. At the onset of the experiment, participants were provided with a downloadable consent form and were allowed to withdraw their participation at any time. Survey participants were offered a US $2 reward, and their involvement was entirely anonymous.

Data Collection

According to CMS, “Starting January 1, 2021, each hospital operating in the United States will be required to provide clear, accessible pricing information online about the items and services they provide in two ways: 1- As a comprehensive machine-readable file with all items and services. 2- In a display of shoppable services in a consumer-friendly format.” Because the files on hospitals’ websites are supposed to be consumer-friendly, the question arises of whether these files actually work for users. Because they are also supposed to be machine-readable, a parallel question arises of whether they actually work for machines. Below, we address each question in turn.

Value for Users: User Experiments

When a public announcement is disseminated, its efficacy relies on ensuring widespread awareness and facilitating practical use during times of necessity. Previous research on PT announcements has highlighted the challenges faced by patients in accurately estimating out-of-pocket expenses. However, a fundamental inquiry arises—are individuals adequately informed about the availability of tools that enable them to estimate their out-of-pocket costs for desired services? To address this, we conducted a survey to assess public awareness of PT legislation. The survey encompassed a range of yes or no and multiple-choice questions aimed at gauging participants’ familiarity with the PT rule in health care and their entitlement to obtain cost information prior to receiving a service. Additionally, we inquired about participants’ knowledge of resources for accessing pricing information and whether they were aware of the PT rule. Furthermore, we incorporated follow-up questions to ensure that the survey responses were not provided arbitrarily, thereby securing reliable and meaningful outcomes.

Moreover, given the previously established evidence of subpar usability of the currently available files, we streamlined the existing files and developed a user-friendly, comprehensive document for an A/B test. This test evaluates which file better helps participants accurately estimate their out-of-pocket costs. In collaboration with Florida Blue experts, during biweekly meetings held throughout the process outlined in this paper, the authors determined the optimal design for a summary table that presents prices in a more user-friendly format and enhances participant comprehension; this table was used during the A/B testing. Participants were randomly assigned either to the hospitals’ files or to a meticulously constructed summary table, created manually in Excel, that prominently displays cost information. (Note that all files, including the hospitals’ files and our Excel file, were made available in the same format [Excel] on a cloud-based platform to eliminate any disparities in access, ensuring equitable ease of finding, downloading, and opening the files; accessing the hospitals’ files directly typically requires significant effort.) The experiment presented 3 distinct health-related scenarios and instructed participants to locate the price for the requested service. Participants were then asked to provide the hospital name, service price, insurer name, and insurance plan. We also sought feedback on the perceived difficulty of finding the requested service and on their priorities when selecting hospitals [ 26 ], followed by Likert scale questions assessing the provided file’s efficacy in facilitating price retrieval.
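To make the assignment mechanism concrete, the sketch below shows a minimal random split of participants into control and treatment groups, analogous to what the Qualtrics XM platform did for us; the participant IDs and the `assign_groups` helper are illustrative, not part of the study’s actual tooling.

```python
import random

def assign_groups(participant_ids, seed=42):
    """Randomly split participants into control (hospital files)
    and treatment (summary Excel) groups of near-equal size."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": ids[:half], "treatment": ids[half:]}

groups = assign_groups(range(1, 68))  # 67 participants, as in the study
print(len(groups["control"]), len(groups["treatment"]))
```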

The experiments investigated the following questions: (1) Are individuals aware of the PT legislation? and (2) Is the information provided usable for patients? To evaluate the usability of the files found on websites, we selected 2 prevalent services identified in the existing literature and 2 further services recommended as high-demand by Florida Blue experts ( Table 1 ). Furthermore, meticulous care was taken to ensure that the control and treatment groups encountered identical circumstances, allowing a systematic examination of disparities attributable solely to variations in data representation.

a DRG: Diagnosis-Related Group.

b D&C: dilation and curettage.

c CC/MCC: complication or comorbidity/major complication or comorbidity.

d CPT: Current Procedural Terminology.

e MRI: magnetic resonance imaging.

Participants

A total of 67 adults (30 female; mean age 41.43, SD 12.39 years) were recruited on the Amazon Mechanical Turk platform, with no selection criteria other than being located in the United States.

We focused on 75 main hospitals in the state of Florida (we use “main hospital” to distinguish a hospital from smaller clinics or specialized medical centers within the same health system). When we searched their websites for PT (machine-readable) files, only 89% (67/75) of hospitals provided them. According to the PT legislation, these files were supposed to contain information about 300 shoppable services; however, only 58% (39/67) of hospitals included information such as insurer prices in their files. Therefore, the rest of the analysis includes only the 39 hospitals whose machine-readable files contained the required information. We created an Excel file covering those 39 hospitals and the 4 services mentioned in the literature—CPT 45380, 29827, and 70553, and DRG 807 ( Table 1 )—for 4 popular commercial carriers (Health Maintenance Organization [HMO] or Preferred Provider Organization [PPO] plans) suggested by Florida Blue experts: Aetna, Florida Blue, Cigna, and UnitedHealthcare.
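As an illustration of how such a comparison table can be assembled, the following pandas sketch enumerates every (hospital, service, carrier) combination; the hospital names and the `negotiated_price` placeholder are hypothetical, since the study’s table was populated manually from each hospital’s file.

```python
import pandas as pd
from itertools import product

# Hypothetical inputs: the real study compiled these manually from 39 hospital files.
hospitals = ["Hospital A", "Hospital B"]          # ... up to 39 hospitals
services  = ["CPT 45380", "CPT 29827", "CPT 70553", "DRG 807"]
carriers  = ["Aetna", "Florida Blue", "Cigna", "UnitedHealthcare"]

# One row per (hospital, service, carrier); the price would be filled in
# from each hospital's machine-readable file.
rows = [{"hospital": h, "service": s, "carrier": c, "negotiated_price": None}
        for h, s, c in product(hospitals, services, carriers)]
summary = pd.DataFrame(rows)
summary.to_excel("pt_summary.xlsx", index=False)  # requires openpyxl
```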

Participants were recruited for the pilot and randomly assigned by the Qualtrics XM platform to answer multiple-choice questions and fill in blanks based on the given scenarios. Participants first responded to questions about their awareness of PT and were then randomly divided into 2 groups to answer questions about the usability of hospital-provided PT information. One group was given links to hospitals’ websites (control group), while the other was given an Excel file containing the same information as the files on hospitals’ websites but designed to allow easier comparison of prices across hospitals ( Multimedia Appendix 1 ). Participants worked through 3 scenarios that asked them to find a procedure’s price based on their hospital and insurer selection, allowing us to compare hospital-provided information with the Excel summary. We provide examples of hospitals’ files and our Excel file in Multimedia Appendix 1 and the survey experiment questions in Multimedia Appendix 2 .

Value for Machines: Schema Integration—Machine-Readable Files Representation

Through meticulous investigation of machine-readable files from 39 hospitals, we found that these files vary in format (eg, CSV or JSON), posing a challenge for machines to manage the data they contain. Another significant obstacle is the lack of uniformity in data representation across files, which makes them unsuitable for machine use without a cohesive system capable of processing them collectively. Our analysis revealed that hospitals within a single health system exhibit consistent data representation, although service prices may differ (we include both identical and differing chargemaster prices in our study), while substantial disparities in data representation exist between hospitals affiliated with different health systems.

Moving forward, we will use the terms “data representation” and “schema” interchangeably, with “schema” denoting its database management context. In this context, a schema serves as a blueprint outlining the structure, organization, and relationships of data within a database system. It encompasses key details such as tables, fields, data types, and constraints that define the stored data. To systematically illustrate schema differences among hospitals associated with different health systems, we adopted the methodology outlined in reference [ 27 ] for schema integration, which offers a valid approach for comparing distinct data representations. The concept of schema integration encompasses four common categories: (1) identical: hospitals within the same health system adhere to this concept as their representations are identical; (2) equivalent: while hospitals in health system “A” may present different representations from those in health system “B,” they possess interchangeable columns; (3) compatible: in cases where hospitals across different health systems are neither identical nor equivalent, the modeling constructs, designer perception, and integrity constraints do not contradict one another; and (4) incompatible: in situations where hospitals within different health systems demonstrate contradictory representations, distinct columns exist for each health system due to specification incoherence.
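A rough sense of how these four categories can be operationalized is given below; this is a simplified, name-based heuristic under the assumption that a synonym dictionary for equivalent field names is available, not the full constraint-aware methodology of [ 27 ].

```python
def classify_schemas(cols_a, cols_b, synonyms):
    """Classify two hospital schemas using the four schema-integration
    categories (identical, equivalent, compatible, incompatible),
    via a simplified name-based heuristic."""
    a, b = set(cols_a), set(cols_b)
    if a == b:
        return "identical"
    # Map each column to a canonical (master) name where a synonym is known.
    canon = lambda cols: {synonyms.get(c, c) for c in cols}
    if canon(a) == canon(b):
        return "equivalent"    # different names, interchangeable content
    if canon(a) & canon(b):
        return "compatible"    # overlapping, non-contradictory constructs
    return "incompatible"

synonyms = {"payer": "insurer_name", "plan": "insurer_plan"}  # illustrative
print(classify_schemas(["code", "payer"], ["code", "insurer_name"], synonyms))
# -> "equivalent"
```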

Our analysis focused on health systems in Florida that encompassed a minimum of 4 main hospitals, using the most up-to-date data available on their respective websites. Within this scope, we identified 8 health systems with at least 4 main hospitals, of which 88% (7/8) had published machine-readable files on their websites. Consequently, our analysis included 65% (36/55) of the hospitals that had machine-readable files available on their websites. To facilitate further investigation by interested researchers, we have made the analyzed data accessible on a cloud-based platform. During our analysis, we extracted the schema of each health system by closely scrutinizing the hospitals associated with it, capturing key details such as tables, fields, and data types. We then compiled a comprehensive master field name table with consistent data types and field names, making it easier for machines to retrieve information. We elaborate on the master field name table in the Results section.
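The schema extraction itself can be mechanized along the following lines; `extract_schema` is a hypothetical helper and the file paths are placeholders, since we performed this step by manual inspection.

```python
import pandas as pd
from pathlib import Path

def extract_schema(path):
    """Read a hospital machine-readable file (CSV or JSON) and return
    its schema as {field_name: dtype}, mirroring the manual extraction
    described above."""
    p = Path(path)
    df = pd.read_csv(p) if p.suffix == ".csv" else pd.read_json(p)
    return {col: str(dtype) for col, dtype in df.dtypes.items()}

# Hypothetical usage: compare schemas across two health systems.
# schema_a = extract_schema("system_a_hospital1.csv")
# schema_b = extract_schema("system_b_hospital1.json")
# shared   = set(schema_a) & set(schema_b)
```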

Value for Users

Question 1 (PT Awareness)

Based on the responses, it is evident that participants were largely unaware of the PT legislation: 64% (49/76) reported that they had not heard about it. Nevertheless, they believed it is important to know a service’s price before receiving it—response charts are provided in Multimedia Appendix 3 .

Question 2 (Human Usability of PT Information)

Based on the responses to the scenarios, the average number of correct responses was not equal between the 2 groups: the treatment group (mean 2.76, SD 0.58) found more correct answers than the control group (mean 1.23, SD 1.30; t65=6.46; P<.049; d=1.52). The t tests (2-tailed) for the other questions in the experiment are reported in Multimedia Appendix 4 .
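For readers who want to reproduce this style of analysis, the sketch below runs a 2-tailed independent-samples t test with SciPy on hypothetical per-participant scores (0 to 3 correct scenarios); the arrays are invented for illustration and do not reproduce the study’s statistics.

```python
from scipy import stats

# Hypothetical per-participant counts of correct scenario answers (0-3);
# the study's raw data are not public, so these arrays are illustrative.
treatment = [3, 3, 2, 3, 3, 2, 3, 3, 3, 2]   # summary Excel file
control   = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]   # hospitals' own files

t_stat, p_value = stats.ttest_ind(treatment, control)  # 2-tailed, pooled variance
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```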

These results suggest that the current files on hospitals’ websites are not consumer-friendly and that participants find it challenging to estimate out-of-pocket costs for a desired service. For this reason, in addition to making the files easier to use, the information should be accompanied by thorough documentation explaining what each column represents, up to what amount an insurer covers for a specific service, and how many days of a particular service the stated price covers, that is, the “contracting method.” For example, according to one of Florida Blue’s senior network analysts, some prices for a service like DRG 807 are presented as per diem costs, and given the information currently in these files, this cannot be recognized without comprehensive documentation.
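The following toy calculation illustrates why the contracting method matters; the $2,000 figure and the `total_price` helper are hypothetical and not drawn from any hospital’s file.

```python
def total_price(rate, contracting_method, length_of_stay_days=1):
    """Illustrate why the contracting method matters: the same listed
    rate implies very different totals under per diem vs case pricing.
    Numbers are hypothetical, not from any hospital's file."""
    if contracting_method == "per diem":
        return rate * length_of_stay_days
    return rate  # "case rate": one payment covers the whole stay

# A listed $2,000 for DRG 807 could mean $2,000 in total, or $6,000 for a 3-day stay.
print(total_price(2000, "case rate", 3))   # 2000
print(total_price(2000, "per diem", 3))    # 6000
```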

Value for Machines

After carefully reviewing all machine-readable file schemas, we created a master field name table listing the field names available in the machine-readable files ( Table 2 ). In Table 2 , the first column lists the master field names we devised, and each subsequent column represents the hospitals within one health system. A “✓” indicates that the hospitals within a health system use a field name identical to the master field name, while written cells show equivalent field names, meaning that the hospitals in that health system use different field names—we record the names they use—whose content is equivalent to the master field name. The “❋” mark means that although hospitals within health system #2 provide insurer names and plans in their field names, extra codes make those columns unusable for machines to recognize as equivalent to the master field names. We also include the data type of each field name in parentheses.

a As noted previously, since we focus on the health system level instead of the hospital level, our schema does not have hospital-level information; however, it would be beneficial to add hospital information to the table.

b ✓: the given master field name in that row appears in the given health system’s file in that column.

c str: “string” data type.

d int: “integer” data type.

e CPT: Current Procedural Terminology.

f HCPCS: Healthcare Common Procedure Coding System.

g Not applicable.

h APR: all patient refined.

i DRG: Diagnosis-Related Group.

j MS: Medicare severity.

k CDM: charge description master.

l UB: uniform billing.

m float: “float” data type.

n ❋: although hospitals within health system #2 provide insurer names and plans in their field names, extra codes make those columns unusable for machines to recognize as equivalent to the master field names.

We reverse engineered the files and drew entity-relationship diagrams (ERDs) for each hospital based on its data representation. However, because hospitals within the same health system have the same ERDs, we include only 1 ERD per health system ( Figure 1 ). As Figure 1 shows, although hospitals have tried to follow an intuitive structure, they still separate into three groups: (1) group I: hospitals in this group have a separate column for each insurer; we therefore modeled a separate “Insurance” entity for this group; (2) group II: hospitals in this group spread their data across many sheets, each belonging to a specific insurer and plan; we therefore created an “Insurance_Name” entity in this group’s ERD to reflect the difference in data representation; and (3) group III: hospitals in this group have a single “payer” column containing insurer names without plans; we therefore modeled this column as an attribute of the “Service” entity, with no “Insurance” entity in this group’s ERD.
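To show how the three representation groups could be reconciled programmatically, the pandas sketch below normalizes group I’s insurer-per-column layout and group III’s payer-column layout into one long format (group II’s per-insurer sheets are described in a comment); all frame contents are invented examples, not actual hospital data.

```python
import pandas as pd

# Illustrative frames for the representation groups described above.
group1 = pd.DataFrame({"code": ["45380"], "Aetna PPO": [900.0], "Cigna HMO": [850.0]})
group3 = pd.DataFrame({"code": ["45380", "45380"],
                       "payer": ["Aetna", "Cigna"], "price": [900.0, 850.0]})

# Group I: insurer-per-column -> melt into one row per (code, insurer).
long1 = group1.melt(id_vars="code", var_name="insurer_plan", value_name="price")

# Group II (one sheet per insurer/plan) would be read with
# pd.read_excel(path, sheet_name=None) and concatenated, storing the
# sheet name in the insurer_plan column.

# Group III already stores the payer in a column; only a rename is needed.
long3 = group3.rename(columns={"payer": "insurer_plan"})
print(pd.concat([long1, long3], ignore_index=True))
```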

In conclusion, although most hospitals have adopted the group I logic for data representation, full uniformity requires a standard representation with the same intuitive field names (such as the master field names we suggest in Table 2 ) that covers all systems’ data representations and can serve as the machine-readable file, at least for machine consumption. Above all, standardizing the format and semantics of the provided data would go a long way toward making the data more machine friendly.


Comparison With New CMS Guidelines

Recently, CMS published guidelines regarding the PT legislation [ 28 ]. The most recent CMS guideline is a step forward in ensuring standardization but remains a recommendation rather than a mandate. These guidelines overlap with our fields in Table 2 , with slight differences attributable to granularity. Because we observed that hospitals within the same health system adopt a uniform schema, our suggested schema operates at the granularity of health systems rather than individual hospitals.

The recent CMS guidelines allocate 24% (6/25) of field names specifically to hospital information, encompassing details such as “Hospital Name,” “Hospital File Date,” “Version,” “Hospital Location,” “Hospital Financial Aid Policy,” and “Hospital Licensure Information.” These details, absent in current hospital files, are crucial for informed decision-making. As noted previously, since we focus on the health system level instead of the hospital level, our schema does not have hospital-level information; however, it would be beneficial to add hospital information to the tables.

Our analysis reveals that the 11 field names in Table 2 align with the field names in the new CMS guidelines, demonstrating a substantial overlap of 58% (11/19). The corresponding CMS field names (compatible with our schema) include “Item or Service Description (Description or CDM Service Description),” “Code (Code),” “Code Type (Code Type),” “Setting (Patient Class),” “Gross Charge (Gross Charge),” “Discounted Cash Price (Discounted Cash Price),” “Payer Name (Insurer Name),” “Plan Name (Insurer Plan),” “Payer Specific Negotiated Charge: Dollar Amount (Price),” “De-identified Minimum Negotiated Charge (Min Negotiated Rate),” and “De-identified Maximum Negotiated Charge (Max Negotiated Rate).” Additionally, both our schema and the new CMS guidelines propose data types for each field name.
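The correspondence above can be captured as a simple rename mapping; the sketch below is our reading of the pairs listed in this section, with the master names on the left, and should be treated as illustrative rather than an official CMS artifact.

```python
# Mapping from our master field names (Table 2) to the corresponding
# field names in the voluntary CMS data dictionary [28], as listed above.
MASTER_TO_CMS = {
    "Description":           "Item or Service Description",
    "Code":                  "Code",
    "Code Type":             "Code Type",
    "Patient Class":         "Setting",
    "Gross Charge":          "Gross Charge",
    "Discounted Cash Price": "Discounted Cash Price",
    "Insurer Name":          "Payer Name",
    "Insurer Plan":          "Plan Name",
    "Price":                 "Payer Specific Negotiated Charge: Dollar Amount",
    "Min Negotiated Rate":   "De-identified Minimum Negotiated Charge",
    "Max Negotiated Rate":   "De-identified Maximum Negotiated Charge",
}

# Renaming a DataFrame built on the master schema to CMS names is then:
# df.rename(columns=MASTER_TO_CMS)
```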

Our schema, which reflects hospitals’ current files, contains 5 field names absent from the new CMS guidelines: “Revenue Description,” “Revenue Code,” “Package/Line Level,” “Procedure ID,” and “Self Pay.” Conversely, the new CMS guidelines introduce 8 additional field names: “Billing Class,” “Drug Unit of Measurement,” “Drug Type of Measurement,” “Modifiers,” “Payer Specific Negotiated Charge: Percentage,” “Contracting Method,” “Additional Generic Notes,” and “Additional Payer-Specific Notes.” We regard these new field names as providing further detail and enhancing consumer decision-making. If hospitals within a health system adopt consistent formats and clearly map those formats to the new CMS guidelines in an accompanying mapping document, the result would be more useful than the current optional guideline.

In summary, because our analysis is based on the data schemas hospitals currently have in place, we believe the schema we propose is easier to implement, requiring minimal change to what hospitals are already doing. However, given the recent CMS guidelines, we recommend adding the 8 additional fields as well as hospital-specific information.

Implications

The PT legislation aims to enable informed decision-making, reduce out-of-pocket expenses, and decrease overall health care expenditures. This study investigates how current files are used by individuals and machines. Our results, unfortunately, suggest that PT data—as currently reported—are useful neither to patients nor to machines, raising important questions about what these files achieve today. Moreover, the findings indicate that even individuals with basic computer knowledge struggle with the usability of these files, highlighting the need for significant revisions to make them consumer-friendly and accessible to individuals of all technical proficiency levels. Additionally, inconsistencies in data representation between hospitals affiliated with different health systems pose challenges for machines, necessitating schema design improvements and the implementation of a standardized data representation. By addressing these concerns, PT legislation can achieve consistency and enhance machine readability, thus improving its effectiveness in promoting informed decision-making and reducing health care costs.

Although the official announcement of PT legislation is recent, prior studies [ 15 - 17 ] have attempted to evaluate the usability of PT, while subsequent studies [ 19 - 22 ] have examined the effectiveness of PT tools following the announcement. However, despite the introduction of PT rules, it appears that the usability of these files has not undergone significant improvements, indicating the necessity for proactive measures from responsible executives to ensure the effectiveness of this legislation. Our analysis of this matter emphasizes 2 primary factors—a lack of awareness among stakeholders and the challenges associated with using files due to inconsistencies in their format and representation.

As of April 2023, the CMS has issued over 730 warning notices and 269 requests for Corrective Action Plans. A total of 4 hospitals have faced Civil Monetary Penalties for noncompliance, and these penalties are publicly disclosed on the CMS website. The remaining hospitals subjected to comprehensive compliance reviews have either rectified their deficiencies or are actively engaged in doing so. While we acknowledge these efforts to comply with PT rules, our research revealed a notable disparity in data representation among hospitals affiliated with different health systems. Consequently, we focused on schema design and proposed the implementation of a master field name that encompasses a comprehensive data representation derived from an analysis of 36 hospitals. Standardizing the data representation across all health systems’ machine-readable files will effectively address concerns about consistency. Therefore, significant modifications are required for the PT legislation to enhance machine readability and provide clearer guidance on the design and structure of the files’ schema. If the hospital-provided information is consistent and of high quality, PT tools provided by health insurers may be able to estimate an individual’s total expenses more accurately.

Limitations

Our objective was to have an equal number of participants in both groups. However, in the group tasked with obtaining information from the hospitals’ websites, most participants dropped out without completing the task. This occurred because retrieving costs from the hospitals’ websites in their current form is complex, as feedback from some participants indicated: only 19% (13/67) of that group (the control group) completed the task. Although this is a limitation of the study, it also highlights how difficult it currently is to obtain cost information from hospitals’ websites. In the treatment group, the completion rate was much higher, with 81% (54/67) of participants completing the task.

Conclusions

Due to the poor usability and inconsistent formats, we unfortunately did not find evidence that the PT rule as currently implemented is useful to consumers, researchers, or policy makers (despite the legislation’s goals that files be “consumer-friendly” and “machine-readable”). As one solution, we suggest master field names for the data representation of machine-readable files to make them consistent, at least for machines. Consistent machine-readable files across all health systems would facilitate building tools that enable customers to estimate out-of-pocket costs; we consider this future work for researchers and companies to help the PT rule reach its main goals of providing useful information for patients and reducing health care expenditures. In addition, another worthwhile approach to reducing some of the exorbitant health care costs in the United States would be to integrate clinical decision support tools into providers’ workflows, triggered by orders for medications, diagnostic testing, and other billable services. In this regard, Bouayad et al [ 29 ] conducted experiments with physicians demonstrating that PT, when included as part of the system physicians interact with (such as clinical decision support integrated into electronic health record systems), can significantly aid cost reduction. This is a promising direction for practice, but it needs to be implemented carefully to avoid unanticipated consequences, such as cost being incorrectly viewed as a proxy for quality or the information introducing new biases for physicians and patients.

Conflicts of Interest

None declared.

Example of Excel format of hospitals’ files and our created Excel file.

Survey questions and experiment scenarios.

Participants’ responses chart regarding price transparency awareness.

The t test analysis regarding human usability of price transparency information based on participants’ responses.

  • McGough M, Winger A, Rakshit S, Amin K. How has U.S. spending on healthcare changed over time? Health System Tracker. 2022. URL: https://www.healthsystemtracker.org/chart-collection/u-s-spending-healthcare-changed-time/ [accessed 2024-03-13]
  • Christensen HB, Floyd E, Maffett M. The only prescription is transparency: the effect of charge-price-transparency regulation on healthcare prices. Manag Sci. 2020;66(7):2861-2882. [ CrossRef ]
  • Muir MA, Alessi SA, King JS. Clarifying costs: can increased price transparency reduce healthcare spending? UC Hastings Research Paper No. 38. SSRN; Feb 26, 2013:319-367. [ FREE Full text ] [ CrossRef ]
  • Reinhardt UE. Health care price transparency and economic theory. JAMA. 2014;312(16):1642-1643. [ CrossRef ] [ Medline ]
  • CY 2020 hospital Outpatient Prospective Payment System (OPPS) policy changes: hospital price transparency requirements (CMS-1717-F2). CMS.gov. 2020. URL: https://tinyurl.com/mrafxtvd [accessed 2024-03-13]
  • Secretary Azar statement on proposed rule for hospital price transparency. HHS.gov. 2020. URL: https://tinyurl.com/yc4dx2vx [accessed 2024-03-13]
  • Mehrotra A, Brannen T, Sinaiko AD. Use patterns of a state health care price transparency web site: what do patients shop for? Inquiry. 2014;51:0046958014561496. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Desai S, Hatfield LA, Hicks AL, Sinaiko AD, Chernew ME, Cowling D, et al. Offering a price transparency tool did not reduce overall spending among California public employees and retirees. Health Aff (Millwood). 2017;36(8):1401-1407. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sinaiko AD, Joynt Maddox KE, Rosenthal MB. Association between viewing health care price information and choice of health care facility. JAMA Intern Med. 2016;176(12):1868-1870. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Desai S, Hatfield LA, Hicks AL, Chernew ME, Mehrotra A. Association between availability of a price transparency tool and outpatient spending. JAMA. 2016;315(17):1874-1881. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sinaiko AD, Rosenthal MB. Examining a health care price transparency tool: who uses it, and how they shop for care. Health Aff (Millwood). 2016;35(4):662-670. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Desai SM, Shambhu S, Mehrotra A. Online advertising increased New Hampshire residents' use of provider price tool but not use of lower-price providers. Health Aff (Millwood). 2021;40(3):521-528. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kona M, Corlette S. Hospital and insurer price transparency rules now in effect but compliance is still far away. Health Affairs Forefront. 2022. URL: https://tinyurl.com/3x6ymxf2 [accessed 2024-03-13]
  • Wheeler C, Taylor R. New year, new CMS price transparency rule for hospitals. Health Affairs Forefront. 2021. URL: https://www.healthaffairs.org/content/forefront/new-year-new-cms-price-transparency-rule-hospitals [accessed 2024-03-13]
  • Chernew M, Cooper Z, Larsen-Hallock E, Morton FS. Are health care services shoppable? Evidence from the consumption of lower-limb MRI scans. National Bureau of Economic Research. 2021. URL: https://www.nber.org/papers/w24869 [accessed 2024-03-13]
  • Mehrotra A, Dean KM, Sinaiko AD, Sood N. Americans support price shopping for health care, but few actually seek out price information. Health Aff (Millwood). 2017;36(8):1392-1400. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Brown ZY. Equilibrium effects of health care price information. Rev Econ Stat. 2019;101(4):699-712. [ FREE Full text ] [ CrossRef ]
  • Wu SJ, Sylwestrzak G, Shah C, DeVries A. Price transparency for MRIs increased use of less costly providers and triggered provider competition. Health Aff (Millwood). 2014;33(8):1391-1398. [ CrossRef ] [ Medline ]
  • Glover M, Whorms DS, Singh R, Almeida RR, Prabhakar AM, Saini S, et al. A radiology-focused analysis of transparency and usability of top U.S. hospitals' chargemasters. Acad Radiol. 2020;27(11):1603-1607. [ CrossRef ] [ Medline ]
  • Hayatghaibi SE, Alves VV, Ayyala RS, Dillman JR, Trout AT. Transparency and variability in pricing for pediatric outpatient imaging in US children's hospitals. JAMA Netw Open. 2022;5(3):e220736. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Arvisais-Anhalt S, McDonald S, Park JY, Kapinos K, Lehmann CU, Basit M. Survey of hospital chargemaster transparency. Appl Clin Inform. 2021;12(2):391-398. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Reddy S, Daly G, Baban S, Kadesh A, Block AE, Grimes CL. Accessibility and usability of hospital chargemasters in New York state. J Gen Intern Med. 2022;37(8):2130-2131. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Robinson JC. Hospital quality competition and the economics of imperfect information. Milbank Q. 1988;66(3):465-481. [ Medline ]
  • Patel KN, Mazurenko O, Ford E. Analysis of hospital quality measures and web-based chargemasters, 2019: cross-sectional study. JMIR Form Res. 2021;5(8):e26887. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Batty M, Ippolito B. Mystery of the chargemaster: examining the role of hospital list prices in what patients actually pay. Health Aff (Millwood). 2017;36(4):689-696. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Muhlestein DB, Wilks CEA, Richter JP. Limited use of price and quality advertising among American hospitals. J Med Internet Res. 2013;15(8):e185. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Batini C, Lenzerini M, Navathe SB. A comparative analysis of methodologies for database schema integration. ACM Comput Surv. 1986;18(4):323-364. [ FREE Full text ] [ CrossRef ]
  • Voluntary hospital price transparency machine-readable file sample format data dictionary (version 1.1). CMS.gov. URL: https:/​/www.​cms.gov/​files/​document/​hospital-price-transparency-machine-readable-data-dictionary-tall.​pdf [accessed 2024-03-13]
  • Bouayad L, Padmanabhan B, Chari K. Can recommender systems reduce healthcare costs? the role of time pressure and cost transparency in prescription choice. MIS Q. 2020;44(4):1859-1903. [ CrossRef ]

Abbreviations
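
CDM: charge description master
CMS: Centers for Medicare & Medicaid Services
CPT: Current Procedural Terminology
DRG: Diagnosis-Related Group
ERD: entity-relationship diagram
HCPCS: Healthcare Common Procedure Coding System
HMO: Health Maintenance Organization
MRI: magnetic resonance imaging
PPO: Preferred Provider Organization
PT: price transparency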

Edited by S He; submitted 07.07.23; peer-reviewed by KN Patel, R Marshall, G Deckard; comments to author 03.12.23; revised version received 21.01.24; accepted 26.02.24; published 29.03.24.

©Negar Maleki, Balaji Padmanabhan, Kaushik Dutta. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


What the data says about abortion in the U.S.

Pew Research Center has conducted many surveys about abortion over the years, providing a lens into Americans’ views on whether the procedure should be legal, among a host of other questions.

In a Center survey conducted nearly a year after the Supreme Court’s June 2022 decision that ended the constitutional right to abortion, 62% of U.S. adults said the practice should be legal in all or most cases, while 36% said it should be illegal in all or most cases. Another survey conducted a few months before the decision showed that relatively few Americans take an absolutist view on the issue.

Find answers to common questions about abortion in America, based on data from the Centers for Disease Control and Prevention (CDC) and the Guttmacher Institute, which have tracked these patterns for several decades:

  • How many abortions are there in the U.S. each year?
  • How has the number of abortions in the U.S. changed over time?
  • What is the abortion rate among women in the U.S.? How has it changed over time?
  • What are the most common types of abortion?
  • How many abortion providers are there in the U.S., and how has that number changed?
  • What percentage of abortions are for women who live in a different state from the abortion provider?
  • What are the demographics of women who have had abortions?
  • When during pregnancy do most abortions occur?
  • How often are there medical complications from abortion?

This compilation of data on abortion in the United States draws mainly from two sources: the Centers for Disease Control and Prevention (CDC) and the Guttmacher Institute, both of which have regularly compiled national abortion data for approximately half a century, and which collect their data in different ways.

The CDC data that is highlighted in this post comes from the agency’s “abortion surveillance” reports, which have been published annually since 1974 (and which have included data from 1969). Its figures from 1973 through 1996 include data from all 50 states, the District of Columbia and New York City – 52 “reporting areas” in all. Since 1997, the CDC’s totals have lacked data from some states (most notably California) for the years that those states did not report data to the agency. The four reporting areas that did not submit data to the CDC in 2021 – California, Maryland, New Hampshire and New Jersey – accounted for approximately 25% of all legal induced abortions in the U.S. in 2020, according to Guttmacher’s data. Most states, though, do have data in the reports, and the figures for the vast majority of them came from each state’s central health agency, while for some states, the figures came from hospitals and other medical facilities.

Discussion of CDC abortion data involving women’s state of residence, marital status, race, ethnicity, age, abortion history and the number of previous live births excludes the low share of abortions where that information was not supplied. Read the methodology for the CDC’s latest abortion surveillance report, which includes data from 2021, for more details. Previous reports can be found at stacks.cdc.gov by entering “abortion surveillance” into the search box.

For the numbers of deaths caused by induced abortions in 1963 and 1965, this analysis looks at reports by the then-U.S. Department of Health, Education and Welfare, a precursor to the Department of Health and Human Services. In computing those figures, we excluded abortions listed in the report under the categories “spontaneous or unspecified” or as “other.” (“Spontaneous abortion” is another way of referring to miscarriages.)

Guttmacher data in this post comes from national surveys of abortion providers that Guttmacher has conducted 19 times since 1973. Guttmacher compiles its figures after contacting every known provider of abortions – clinics, hospitals and physicians’ offices – in the country. It uses questionnaires and health department data, and it provides estimates for abortion providers that don’t respond to its inquiries. (In 2020, the last year for which it has released data on the number of abortions in the U.S., it used estimates for 12% of abortions.) For most of the 2000s, Guttmacher has conducted these national surveys every three years, each time getting abortion data for the prior two years. For each interim year, Guttmacher has calculated estimates based on trends from its own figures and from other data.

The latest full summary of Guttmacher data came in the institute’s report titled “Abortion Incidence and Service Availability in the United States, 2020.” It includes figures for 2020 and 2019 and estimates for 2018. The report includes a methods section.

In addition, this post uses data from StatPearls, an online health care resource, on complications from abortion.

An exact answer is hard to come by. The CDC and the Guttmacher Institute have each tried to measure this for around half a century, but they use different methods and publish different figures.

The last year for which the CDC reported a yearly national total for abortions is 2021. It found there were 625,978 abortions in the District of Columbia and the 46 states with available data that year, up from 597,355 in those states and D.C. in 2020. The corresponding figure for 2019 was 607,720.

The last year for which Guttmacher reported a yearly national total was 2020. It said there were 930,160 abortions that year in all 50 states and the District of Columbia, compared with 916,460 in 2019.

  • How the CDC gets its data: It compiles figures that are voluntarily reported by states’ central health agencies, including separate figures for New York City and the District of Columbia. Its latest totals do not include figures from California, Maryland, New Hampshire or New Jersey, which did not report data to the CDC. (Read the methodology from the latest CDC report.)
  • How Guttmacher gets its data: It compiles its figures after contacting every known abortion provider – clinics, hospitals and physicians’ offices – in the country. It uses questionnaires and health department data, then provides estimates for abortion providers that don’t respond. Guttmacher’s figures are higher than the CDC’s in part because they include data (and in some instances, estimates) from all 50 states. (Read the institute’s latest full report and methodology.)

While the Guttmacher Institute supports abortion rights, its empirical data on abortions in the U.S. has been widely cited by groups and publications across the political spectrum, including by a number of those that disagree with its positions.

These estimates from Guttmacher and the CDC are results of multiyear efforts to collect data on abortion across the U.S. Last year, Guttmacher also began publishing less precise estimates every few months, based on a much smaller sample of providers.

The figures reported by these organizations include only legal induced abortions conducted by clinics, hospitals or physicians’ offices, or those that make use of abortion pills dispensed from certified facilities such as clinics or physicians’ offices. They do not account for the use of abortion pills that were obtained outside of clinical settings.


A line chart showing the changing number of legal abortions in the U.S. since the 1970s.

The annual number of U.S. abortions rose for years after Roe v. Wade legalized the procedure in 1973, reaching its highest levels around the late 1980s and early 1990s, according to both the CDC and Guttmacher. Since then, abortions have generally decreased at what a CDC analysis called “a slow yet steady pace.”

Guttmacher says the number of abortions occurring in the U.S. in 2020 was 40% lower than it was in 1991. According to the CDC, the number was 36% lower in 2021 than in 1991, looking just at the District of Columbia and the 46 states that reported in both of those years.

(The corresponding line graph shows the long-term trend in the number of legal abortions reported by both organizations. To allow for consistent comparisons over time, the CDC figures in the chart have been adjusted to ensure that the same states are counted from one year to the next. Using that approach, the CDC figure for 2021 is 622,108 legal abortions.)

There have been occasional breaks in this long-term pattern of decline – during the middle of the first decade of the 2000s, and then again in the late 2010s. The CDC reported modest 1% and 2% increases in abortions in 2018 and 2019, and then, after a 2% decrease in 2020, a 5% increase in 2021. Guttmacher reported an 8% increase over the three-year period from 2017 to 2020.

As noted above, these figures do not include abortions that use pills obtained outside of clinical settings.

Guttmacher says that in 2020 there were 14.4 abortions in the U.S. per 1,000 women ages 15 to 44. Its data shows that the rate of abortions among women has generally been declining in the U.S. since 1981, when it reported there were 29.3 abortions per 1,000 women in that age range.

The CDC says that in 2021, there were 11.6 abortions in the U.S. per 1,000 women ages 15 to 44. (That figure excludes data from California, the District of Columbia, Maryland, New Hampshire and New Jersey.) Like Guttmacher’s data, the CDC’s figures also suggest a general decline in the abortion rate over time. In 1980, when the CDC reported on all 50 states and D.C., it said there were 25 abortions per 1,000 women ages 15 to 44.

That said, both Guttmacher and the CDC say there were slight increases in the rate of abortions during the late 2010s and early 2020s. Guttmacher says the abortion rate per 1,000 women ages 15 to 44 rose from 13.5 in 2017 to 14.4 in 2020. The CDC says it rose from 11.2 per 1,000 in 2017 to 11.4 in 2019, before falling back to 11.1 in 2020 and then rising again to 11.6 in 2021. (The CDC’s figures for those years exclude data from California, D.C., Maryland, New Hampshire and New Jersey.)
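Both organizations’ rates follow the same simple formula, events per 1,000 women ages 15 to 44; the sketch below, with a hypothetical `rate_per_k` helper, back-checks Guttmacher’s 2020 figure from the counts quoted above (the CDC’s case-fatality rates discussed later use the same formula with k=100,000).

```python
def rate_per_k(events, population, k=1000):
    """Events per k population: the form both Guttmacher and the CDC use
    (abortions per 1,000 women ages 15-44; deaths per 100,000 abortions)."""
    return events / population * k

# Illustrative check against the figures above: Guttmacher's 930,160
# abortions in 2020 and a rate of 14.4 imply roughly 64.6 million
# women ages 15 to 44 (930_160 / 14.4 * 1000).
implied_women = 930_160 / 14.4 * 1000
print(round(rate_per_k(930_160, implied_women), 1))  # 14.4
```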

The CDC broadly divides abortions into two categories: surgical abortions and medication abortions, which involve pills. Since the Food and Drug Administration first approved abortion pills in 2000, their use has increased over time as a share of abortions nationally, according to both the CDC and Guttmacher.

The majority of abortions in the U.S. now involve pills, according to both the CDC and Guttmacher. The CDC says 56% of U.S. abortions in 2021 involved pills, up from 53% in 2020 and 44% in 2019. Its figures for 2021 include the District of Columbia and 44 states that provided this data; its figures for 2020 include D.C. and 44 states (though not all of the same states as in 2021), and its figures for 2019 include D.C. and 45 states.

Guttmacher, which measures this every three years, says 53% of U.S. abortions involved pills in 2020, up from 39% in 2017.

Two pills commonly used together for medication abortions are mifepristone, which, taken first, blocks hormones that support a pregnancy, and misoprostol, which then causes the uterus to empty. According to the FDA, medication abortions are safe until 10 weeks into pregnancy.

Surgical abortions conducted during the first trimester of pregnancy typically use a suction process, while the relatively few surgical abortions that occur during the second trimester of a pregnancy typically use a process called dilation and evacuation, according to the UCLA School of Medicine.

In 2020, there were 1,603 facilities in the U.S. that provided abortions, according to Guttmacher. This included 807 clinics, 530 hospitals and 266 physicians’ offices.

A horizontal stacked bar chart showing the total number of abortion providers down since 1982.

While clinics make up half of the facilities that provide abortions, they are the sites where the vast majority (96%) of abortions are administered, either through procedures or the distribution of pills, according to Guttmacher’s 2020 data. (This includes 54% of abortions that are administered at specialized abortion clinics and 43% at nonspecialized clinics.) Hospitals made up 33% of the facilities that provided abortions in 2020 but accounted for only 3% of abortions that year, while just 1% of abortions were conducted by physicians’ offices.

Looking just at clinics – that is, the total number of specialized abortion clinics and nonspecialized clinics in the U.S. – Guttmacher found the total virtually unchanged between 2017 (808 clinics) and 2020 (807 clinics). However, there were regional differences. In the Midwest, the number of clinics that provide abortions increased by 11% during those years, and in the West by 6%. The number of clinics decreased during those years by 9% in the Northeast and 3% in the South.

The total number of abortion providers has declined dramatically since the 1980s. In 1982, according to Guttmacher, there were 2,908 facilities providing abortions in the U.S., including 789 clinics, 1,405 hospitals and 714 physicians’ offices.

The CDC does not track the number of abortion providers.

In the District of Columbia and the 46 states that provided abortion and residency information to the CDC in 2021, 10.9% of all abortions were performed on women known to live outside the state where the abortion occurred – slightly higher than the percentage in 2020 (9.7%). That year, D.C. and 46 states (though not the same ones as in 2021) reported abortion and residency data. (The total number of abortions used in these calculations included figures for women with both known and unknown residential status.)

The share of reported abortions performed on women outside their state of residence was much higher before the 1973 Roe decision that stopped states from banning abortion. In 1972, 41% of all abortions in D.C. and the 20 states that provided this information to the CDC that year were performed on women outside their state of residence. In 1973, the corresponding figure was 21% in the District of Columbia and the 41 states that provided this information, and in 1974 it was 11% in D.C. and the 43 states that provided data.

In the District of Columbia and the 46 states that reported age data to the CDC in 2021, the majority of women who had abortions (57%) were in their 20s, while about three-in-ten (31%) were in their 30s. Teens ages 13 to 19 accounted for 8% of those who had abortions, while women ages 40 to 44 accounted for about 4%.

The vast majority of women who had abortions in 2021 were unmarried (87%), while married women accounted for 13%, according to the CDC, which had data on this from 37 states.

A pie chart showing that, in 2021, a majority of abortions were for women who had never had one before.

In the District of Columbia, New York City (but not the rest of New York) and the 31 states that reported racial and ethnic data on abortion to the CDC, 42% of all women who had abortions in 2021 were non-Hispanic Black, while 30% were non-Hispanic White, 22% were Hispanic and 6% were of other races.

Looking at abortion rates among those ages 15 to 44, there were 28.6 abortions per 1,000 non-Hispanic Black women in 2021; 12.3 abortions per 1,000 Hispanic women; 6.4 abortions per 1,000 non-Hispanic White women; and 9.2 abortions per 1,000 women of other races, the CDC reported from those same 31 states, D.C. and New York City.

For 57% of U.S. women who had induced abortions in 2021, it was the first time they had ever had one, according to the CDC. For nearly a quarter (24%), it was their second abortion. For 11% of women who had an abortion that year, it was their third, and for 8% it was their fourth or more. These CDC figures include data from 41 states and New York City, but not the rest of New York.

A bar chart showing that most U.S. abortions in 2021 were for women who had previously given birth.

Nearly four-in-ten women who had abortions in 2021 (39%) had no previous live births at the time they had an abortion, according to the CDC. Almost a quarter (24%) of women who had abortions in 2021 had one previous live birth, 20% had two previous live births, 10% had three, and 7% had four or more previous live births. These CDC figures include data from 41 states and New York City, but not the rest of New York.

The vast majority of abortions occur during the first trimester of a pregnancy. In 2021, 93% of abortions occurred during the first trimester – that is, at or before 13 weeks of gestation, according to the CDC. An additional 6% occurred between 14 and 20 weeks of pregnancy, and about 1% were performed at 21 weeks or more of gestation. These CDC figures include data from 40 states and New York City, but not the rest of New York.

About 2% of all abortions in the U.S. involve some type of complication for the woman, according to an article in StatPearls, an online health care resource. “Most complications are considered minor such as pain, bleeding, infection and post-anesthesia complications,” according to the article.

The CDC calculates case-fatality rates for women from induced abortions – that is, how many women die from abortion-related complications for every 100,000 legal abortions that occur in the U.S. The rate was lowest during the most recent period examined by the agency (2013 to 2020), when there were 0.45 deaths to women per 100,000 legal induced abortions. The case-fatality rate reported by the CDC was highest during the first period examined by the agency (1973 to 1977), when it was 2.09 deaths to women per 100,000 legal induced abortions. During the five-year periods in between, the figure ranged from 0.52 (from 1993 to 1997) to 0.78 (from 1978 to 1982).

The CDC calculates death rates by five-year and seven-year periods because of year-to-year fluctuation in the numbers and due to the relatively low number of women who die from legal induced abortions.

In 2020, the last year for which the CDC has information, six women in the U.S. died due to complications from induced abortions. Four women died in this way in 2019, two in 2018, and three in 2017. (These deaths all followed legal abortions.) Since 1990, the annual number of deaths among women due to legal induced abortion has ranged from two to 12.

The annual number of reported deaths from induced abortions (legal and illegal) tended to be higher in the 1980s, when it ranged from nine to 16, and from 1972 to 1979, when it ranged from 13 to 63. One driver of the decline was the drop in deaths from illegal abortions. There were 39 deaths from illegal abortions in 1972, the last full year before Roe v. Wade. The total fell to 19 in 1973 and to single digits or zero every year after that. (The number of deaths from legal abortions has also declined since then, though with some slight variation over time.)

The number of deaths from induced abortions was considerably higher in the 1960s than afterward. For instance, there were 119 deaths from induced abortions in 1963 and 99 in 1965, according to reports by the then-U.S. Department of Health, Education and Welfare, a precursor to the Department of Health and Human Services. The CDC is a division of Health and Human Services.

Note: This is an update of a post originally published May 27, 2022, and first updated June 24, 2022.



Appl Clin Inform. 2015;6(1).

Data Collection Methods in Health Services Research

M.N. Sarkies

2 Monash Health, Allied Health, Melbourne, Victoria, Australia

3 Monash Health, Allied Health Research Unit, Melbourne, Victoria, Australia

K.-A. Bowles

7 Monash University, Physiotherapy Department, Allied Health Research Unit, Melbourne, Victoria, Australia

E.H. Skinner

4 Monash University, Allied Health Research Unit, Melbourne, Victoria, Australia

8 Western Health, Allied Health, Melbourne, Victoria, Australia

D. Mitchell

6 Monash University, Physiotherapy Department, Melbourne, Victoria, Australia

L. O’Brien

5 Monash University, Occupational Therapy Department, Melbourne, Victoria, Australia

1 Melbourne Health, Allied Health, Melbourne, Victoria, Australia

T.P. Haines

Background

Hospital length of stay and discharge destination are important outcome measures in evaluating the effectiveness and efficiency of health services. Although hospital administrative data are readily used as a data collection source in health services research, no research has assessed this data collection method against other commonly used methods.

Objectives

To determine whether administrative data from electronic patient management programs are an effective data collection method for key hospital outcome measures when compared with alternative hospital data collection methods.

Methods

Prospective observational study comparing the completeness of data capture and level of agreement between three data collection methods: manual data collection from ward-based sources, administrative data from an electronic patient management program (i.PM), and inpatient medical record review (gold standard), for hospital length of stay and discharge destination.

Results

Manual data collection from ward-based sources captured only 376 (69%) of the 542 inpatient episodes captured from the hospital administrative electronic patient management program. Administrative data from the electronic patient management program had the highest levels of agreement with inpatient medical record review for both length of stay (93.4%) and discharge destination (91%) data.

Conclusions

This is the first paper to demonstrate differences between data collection methods for hospital length of stay and discharge destination. Administrative data from an electronic patient management program showed the highest level of completeness of capture and the highest level of agreement with the gold standard of inpatient medical record review for both length of stay and discharge destination, and therefore may be an acceptable data collection method for these measures.

1. Background

Hospital length of stay and discharge destination are important outcome measures used in health services research. Length of stay is often used as a measure of healthcare efficiency by researchers, clinicians, administrators, and policy makers in planning the delivery of health services [ 1–4 ]. Hospital discharge destination is an influencing factor on length of stay [ 3 ] and provides a means of quantifying numerous measures, such as requirements for sub-acute inpatient care, changes in level of care, requirements for community services following discharge, and hospital death. Due to their importance, researchers use these measures as key indicators of effectiveness and efficiency when evaluating hospital service provision.

There are numerous methods by which data may be collected for research and hospital administrative purposes. Observational length of stay and discharge destination data can be manually collected from ward-based sources, including nursing handover records, paper-based ward discharge/transfer records, paper-based inpatient medical records, direct observation by experienced personnel, and 24-hour recall of key hospital personnel (e.g. the Nurse Unit Manager). However, this is a time-intensive data collection method that is difficult to fund in an environment where research funding is increasingly competitive. Retrospective data may be collected via review of scanned inpatient medical records after hospital discharge. While this approach has previously been used as a gold standard measure for multiple outcomes [ 5–11 ], transforming medical records into research data is resource intensive and requires exceptional knowledge and skill in the medical context and in research [ 12 ]. An alternative to these traditional methods of hospital data collection is to extract electronic administrative data. Retrospective hospital administrative data has become a commonly used source of inexpensive and readily available information. However, administrative data is not normally entered specifically for research purposes, and previous literature indicates that administrative data used for adverse event monitoring and billing-related coding may be inaccurate [ 12–23 ].

Interestingly, despite the importance and frequent reporting of hospital length of stay and discharge destination measures, we were unable to identify any published empirical research comparing methods of data collection for these outcome measures. Given this range of potential data sources and data collection approaches, it is important to consider the relative completeness of, and agreement between, different data extraction methods. This research is therefore essential to ensure the validity of data collection for research used to inform decision making around health policy and clinical care.

2. Objectives

The purpose of this study was therefore to determine the completeness of capture and level of agreement between three different data collection approaches in measuring length of stay and discharge destination. These were:

  • observational data manually collected from ward-based sources by a research assistant,
  • retrospective administrative data extraction from an electronic patient management program (i.PM), and
  • retrospective review of scanned inpatient medical records post discharge from hospital (gold standard).

3. Methods

3.1 Study setting

This study was performed in conjunction with a larger stepped-wedge randomised controlled trial examining the effectiveness of acute weekend allied health services [24] and was approved by the Monash Health Human Ethics Committee (Reference Number 13327B). The study was conducted at a major 520-bed public hospital providing acute and sub-acute services in urban South-East Melbourne, Australia, during the first two weeks of February 2014. The study period and wards were selected in accordance with the stepped-wedge randomised controlled trial and included the acute assessment unit, neurosciences, plastics, surgical, orthopaedic, and two general medical wards. As there were no exclusion criteria, the study cohort included the total sample of consecutive individuals discharged from the study wards during the study period.

3.2 Outcome measures

Two outcome measures from the stepped-wedge randomised controlled trial were used in this analysis. These were selected as they are outcome measures in the larger project that were extracted from multiple sources and are key indicators of inpatient hospital effectiveness and efficiency.

  • Hospital length of stay. Hospital length of stay was reported in days and was determined by subtracting the date of hospital admission from the date of hospital discharge (see the sketch after this list).
  • Discharge destination. Discharge destination is the location where the patient resides immediately after hospital discharge and can include: home, other hospital, rehabilitation facility, other supported residential facility (including retirement villages, supported residential services, respite and transitional care), low level care (hostel), high level care (nursing home), or death.
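To make the length of stay definition concrete, here is a minimal sketch. This is not code from the study, which performed its analyses in Stata; the function name and dates are invented for illustration:

```python
from datetime import date

def length_of_stay(admission: date, discharge: date) -> int:
    """Length of stay in days: discharge date minus admission date."""
    return (discharge - admission).days

# Hypothetical episode: admitted 3 Feb 2014, discharged 10 Feb 2014
print(length_of_stay(date(2014, 2, 3), date(2014, 2, 10)))  # -> 7
```

Note that under this definition a same-day admission and discharge yields a length of stay of zero days.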

3.3 Data collection approaches

Five research assistants from allied health backgrounds collected hospital admission date, hospital discharge date and discharge destination via three methods. All research assistants received on-site training from a hospital researcher prior to collecting data.

  • Observational data manually collected by four research assistants from ward-based sources, including nursing handover records, paper-based inpatient medical records, paper-based ward discharge/transfer records, and verbal handover from ward staff covering the previous 24 hours. This data collection method was a pragmatic approach intended to replicate how these data would be collected in a large clinical trial with limited resources. Nursing handover records were updated daily by the nurse in charge. Ward transfer records were updated continuously by ward administrative staff between 0730 and 2000 hours Monday to Friday and between 0730 and 1300 hours on Saturdays, and by nursing staff during all other hours.
  • Retrospective data extraction by one research assistant using administrative data from an electronic patient management program (i.Patient Manager, CSC, Falls Church, Virginia, USA). i.Patient Manager (i.PM) is the most widely used patient administration system in Australian and New Zealand public hospitals. i.PM allows all administrative aspects of an inpatient episode to be securely tracked within a centralised database accessible by healthcare staff from multiple sites within the same health service [25]. Admission and discharge information, including date and time of hospital admission and discharge as well as discharge destination, is entered into i.PM by administrative staff between 0730 and 2000 hours Monday to Friday and between 0730 and 1300 hours on Saturdays, and by nursing staff during all other hours.
  • Retrospective review of scanned inpatient medical records by two research assistants following patient discharge. All paper-based inpatient medical records are routinely scanned by health record administrative staff to form an integrated digital record within 48 hours of patient discharge; this record can then be reviewed electronically. For the purposes of this study, the retrospective review of scanned inpatient medical records was considered the gold standard data collection method. This choice reflects the status of the medico-legal record of the patient admission as the primary source of information [26]; medical record review has previously been used as a gold standard when assessing multiple other outcomes, including diagnostic accuracy and rates of adverse events, but not hospital length of stay or discharge destination [5–11, 27].

3.4 Procedure

Wards were attended daily by a research assistant between 0800 and 1200 hours. Observational data collected manually from ward-based sources were entered directly into a survey tool (SurveyMonkey Inc., Palo Alto, California, USA) via an electronic tablet device (iPad, Apple Inc., Cupertino, CA, USA) at the time of daily collection. Hospital admission and discharge data were collected from nursing handover records, paper-based inpatient medical records, and ward admission and discharge records. If any data were unavailable, or there were discrepancies between data sources, research assistants clarified the data through discussions with the nurse in charge or ward administrative staff. Data were exported from the survey tool into a Microsoft Office Excel spreadsheet (Microsoft, Redmond, WA, USA).

Retrospective administrative data extracted from i.PM were entered into a separate Excel spreadsheet after patients had been discharged from hospital, to ensure full availability and capture of inpatient episode data.

Similarly, research assistants independently extracted hospital admission date, hospital discharge date, and discharge destination from the scanned inpatient medical records and entered the data into a separate Excel spreadsheet. Inter-rater reliability analysis using Cohen’s Kappa coefficient was performed to determine consistency between the two research assistants, finding 93.8% agreement (Kappa = 0.92) for length of stay and 100% agreement (Kappa = 1.00) for discharge destination.
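For readers unfamiliar with the statistic, a minimal sketch of such an inter-rater check follows. The study performed its analyses in Stata; this Python version using scikit-learn, and the destination labels in it, are invented purely for illustration:

```python
# Compare two raters' discharge destination codes: raw percent agreement
# plus Cohen's kappa, which corrects for agreement expected by chance.
from sklearn.metrics import cohen_kappa_score

rater_a = ["home", "rehab", "home", "nursing home", "home", "other hospital"]
rater_b = ["home", "rehab", "home", "nursing home", "rehab", "other hospital"]

percent_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa_score(rater_a, rater_b)

print(f"Agreement: {percent_agreement:.1%}, Cohen's kappa: {kappa:.2f}")
```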

3.5 Analysis

3.5.1 Completeness of data capture

The computer program used to retrospectively review scanned medical records post discharge did not provide a method for calculating total hospital admissions and discharges between set dates. Therefore, for the purposes of this project, the researchers deemed a comparative analysis between retrospective administrative data collected from i.PM and prospective data collection from ward-based sources the most suitable assessment of the completeness of data capture. The number and percentage of records captured via each data collection method were presented using descriptive statistics.
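A small sketch of this completeness comparison, treating i.PM as the reference list of inpatient episodes. The episode identifiers below are invented; the study’s own extraction was done in spreadsheets, not Python:

```python
# Completeness of capture: how many i.PM episodes were also captured by
# ward-based collection? Here ward capture is a strict subset of i.PM,
# mirroring the study's finding that wards produced no unique episodes.
ipm_episodes = {f"ep{i:03d}" for i in range(542)}   # 542 episodes in i.PM
ward_episodes = {f"ep{i:03d}" for i in range(376)}  # 376 captured on wards

captured_by_both = ipm_episodes & ward_episodes
share = len(captured_by_both) / len(ipm_episodes)
print(f"{len(captured_by_both)} of {len(ipm_episodes)} episodes ({share:.0%})")
```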

3.5.2 Level of agreement

Investigators compared level of agreement for the 376 admission and discharge data sets captured completely by all three data collection methods. Level of agreement between scanned medical record review, the electronic patient management program (i.PM), and data collected from ward-based sources was analysed using a Bland-Altman comparison for length of stay and Cohen’s Kappa for discharge destination. To determine whether agreement between data collection methods was related to the day of week of discharge, day of week was entered into univariate logistic regression analyses as the independent variable, with agreement with inpatient medical record review as the dependent variable.
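A minimal sketch of the Bland-Altman calculation, assuming paired length-of-stay values from two methods; the arrays are invented, and the study itself used Stata:

```python
# Bland-Altman comparison: mean difference between paired measurements
# and the 95% limits of agreement (mean +/- 1.96 standard deviations).
import numpy as np

los_record_review = np.array([3, 7, 2, 10, 5, 14, 1, 6], dtype=float)
los_ward_sources = np.array([3, 8, 2, 9, 5, 12, 1, 6], dtype=float)

diff = los_record_review - los_ward_sources
mean_diff = diff.mean()
sd_diff = diff.std(ddof=1)  # sample standard deviation

lower, upper = mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff
print(f"Mean difference {mean_diff:.3f}, limits of agreement {lower:.3f} to {upper:.3f}")
```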

Statistical analyses were performed using Stata (Version 13, StataCorp, College Station, Texas, USA).
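The day-of-week regression described above might be sketched as follows, assuming a binary agreement indicator per episode; the data frame and column names are invented, and statsmodels stands in for the study’s Stata analysis:

```python
# Univariate logistic regression: does agreement with medical record review
# (1 = agree, 0 = disagree) vary with day of week of discharge?
import pandas as pd
import statsmodels.formula.api as smf

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "day_of_week": days * 3,
    "agree": [1, 1, 0, 1, 0, 1, 0,
              0, 1, 1, 0, 1, 0, 1,
              1, 0, 1, 0, 1, 1, 1],
})

model = smf.logit("agree ~ C(day_of_week)", data=df).fit(disp=False)
print(model.summary())  # per-day odds ratios are exp(coef)
```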

4. Results

4.1 Completeness of data capture

▶ Figure 1 outlines the data collection and analysis process, while ▶ Table 1 outlines the data cleaning process, which resulted in the identification of 376 complete hospital admission and discharge data sets captured by all three data collection methods. This data set contained 178 (47%) females and 198 (53%) males, with an average age of 59 ± 21 years.

[Figure 1: Data collection and analysis process flowchart]

[Table 1: Data cleaning process]

Of the 542 complete sets of hospital admission and discharge data captured retrospectively by i.PM, only 376 (69%) were also captured via manual data collection from ward-based sources by a research assistant. Notably, data collection from ward-based sources did not produce any data set that was not also captured via i.PM.

4.2 Level of agreement

The level of agreement between data collection methods for length of stay and discharge destination is displayed in ▶ Table 2.

[Table 2: Agreement between ward-based sources, electronic patient management program and scanned inpatient medical records]

4.2.1 Length of stay

Bland-Altman comparison between length of stay data collected via scanned inpatient medical records and via ward-based sources resulted in limits of agreement of –5.323 to 5.637, with a mean difference of 0.157 (–0.121 to 0.435). Pitman’s test of difference in variance resulted in r = 0.186 (p = 0.00).

Bland-Altman comparison between length of stay data collected via scanned inpatient medical records and via retrospective administrative data from the electronic patient management program (i.PM) resulted in limits of agreement of –1.564 to 1.415, with a mean difference of –0.074 (–0.150 to 0.001). Pitman’s test of difference in variance resulted in r = 0.026 (p = 0.613).
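For reference, the Pitman test of difference in variance for paired data can be sketched as follows: correlate the pairwise differences with the pairwise sums, then test that correlation with a t statistic on n − 2 degrees of freedom. The arrays are invented and this implementation is an assumption for exposition, not the study’s code:

```python
# Pitman(-Morgan) test: for paired samples x and y, corr(x - y, x + y)
# differs from zero exactly when the two variances differ.
import numpy as np
from scipy import stats

x = np.array([3, 7, 2, 10, 5, 14, 1, 6], dtype=float)  # e.g. record review
y = np.array([3, 8, 2, 9, 5, 12, 1, 6], dtype=float)   # e.g. ward sources

r, _ = stats.pearsonr(x - y, x + y)
n = len(x)
t = r * np.sqrt((n - 2) / (1 - r**2))
p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p value
print(f"r = {r:.3f}, p = {p:.3f}")
```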

Agreement between data collection methods for hospital length of stay is displayed as Bland-Altman plots (▶ Figure 2), with differences in agreement based on day of week of discharge displayed in ▶ Table 3.

[Figure 2: Bland-Altman plots comparing length of stay using different data collection methods]

[Table 3: Agreement between ward-based sources, electronic patient management program and medical record review for length of stay based on day of week of discharge]

4.2.2 Discharge destination

Cohen’s Kappa coefficient of agreement between discharge destination data collected via scanned inpatient medical records and via ward-based sources resulted in 83.1% agreement (Kappa = 0.63, p = 0.00).

Cohen’s Kappa coefficient of agreement between discharge destination data collected via scanned inpatient medical records and via retrospective administrative data from the electronic patient management program (i.PM) resulted in 92.0% agreement (Kappa = 0.81, p = 0.00).

Differences in agreement based on day of week of discharge are displayed in ▶ Table 4 .

[Table 4: Agreement between ward-based sources, electronic patient management program and medical record review for discharge destination based on day of week of discharge]

5. Discussion

Data accuracy has previously been examined for electronic patient medical records and hospital administrative data across multiple clinical outcome measures, including diagnostic accuracy and rates of adverse events [5–11, 27]. However, to the authors’ knowledge, this is the first investigation to compare and demonstrate differences in completeness of capture and agreement between hospital data collection methods for length of stay and discharge destination.

Our results demonstrate that the administrative data set extracted from i.PM captured a larger set of complete hospital admission and discharge data than the ward-based data set. Importantly, there were no unique data captured from ward-based sources that were not captured via i.PM. Researchers, hospital administrators, and clinicians may therefore be able to rely on electronic patient management programs as an effective method of capturing inpatient episodes of care.

There are many potential contributing factors to the discrepancy in completeness of capture between observational data collected from ward-based sources and retrospective administrative data extraction from i.PM. Research assistants cannot be present on all wards at all times of the day, so data may be missed for patients whose ward admission and discharge occurred while research assistants were not present, highlighting a limitation of this data collection method. There may also have been limited opportunity for research staff to access all ward resources for all patients; for example, if a patient was in the operating room, having a procedure, or in radiology, their inpatient medical record is with the patient rather than on the ward. Prospective observational data collection has been used in research examining hospital adverse events [29, 30], where data collectors were integrated into the ward environment and attended regular meetings; however, this method of data collection is difficult to fund in an environment where competition for research funding is increasingly fierce. With the increasing use of health information technology globally [31, 32], traditional research methodology must evolve to embrace opportunities to collect reliable retrospective data from electronic sources. Our results support the use of retrospective administrative data collection from electronic patient management programs to capture inpatient episodes of care.

Using a Bland-Altman comparison, strong levels of agreement in length of stay were observed between all three investigated data collection methods, with a slightly larger mean difference observed in data from ward-based sources. The less significant p values in Pitman’s test of variance when comparing electronic administrative data may have been due to the high levels of agreement in these data. There was no significant difference in agreement between different days of the week, with weekend data from ward-based sources demonstrating a slight trend towards improved agreement. Although this may appear to indicate that any method of data collection could be used in the hospital setting, when displayed in Bland-Altman plots, data from ward-based sources demonstrated lower levels of agreement as length of stay increased when compared with inpatient medical record review (Figure 2 A). As can be seen in Figure 2 (A), one patient had a recorded hospital length of stay in the ward-based source data that differed greatly from the other two methods. On further examination, this patient had been an inpatient in the health service’s “Hospital in the Home Program” prior to being admitted to the ward. The length of stay collected from the ward-based source represented only a portion of the patient’s total length of stay; however, this information was not apparent to research assistants when collecting this patient’s data from ward-based sources. This may indicate that for patients with long “outlier” hospital lengths of stay, data collected from ward-based sources are less likely to be correct.

These findings have further relevance to the funding of clinical trials, where resources may be sought for staffing to manually collect data. This study suggests that retrospective data collection via electronic patient management programs can improve data completeness and agreement compared with manual ward-based data collection when measuring hospital length of stay.

Using Cohen’s Kappa coefficient, the electronic patient management program (i.PM) demonstrated the highest level of agreement with the gold standard retrospective review of scanned medical records for discharge destination data. Discharge destination data collected from ward-based sources on a Monday were significantly less likely to agree with review of inpatient medical records. This may be due to the change from weekend to weekday staff, or to wards being busier on a Monday morning. The generally lower levels of agreement in data from ward-based sources may be due to inaccurate recording of discharge destination by ward staff: these data are not usually recorded for research purposes, and there appeared to be a perception among ward staff that accurate recording of discharge destination was of lower organisational importance.

Knowledge of agreement between different hospital data collection methods remains limited despite the importance of data accuracy for research validity and health services quality management. This study was performed in seven wards of a single acute urban Melbourne hospital, which limits the external generalisability of the findings. In addition, as the study measured only a narrow scope of outcome measures using a single program, conclusions should be drawn cautiously for other electronic patient management programs and outcome measures. While inter-rater reliability was tested between the research assistants reviewing inpatient medical records, this analysis was not performed on the research assistants collecting data from ward-based sources. Given the simplicity of transcribing data from one source to another, systematic bias between research assistants was deemed unlikely; however, the authors recognise this as a limitation of the study. Future research should continue to examine the accuracy of hospital data collection methods for the purposes of research and health services management. Given the limitations of this research, the authors recommend that future investigations examine the cost effectiveness of data collection methods, as well as a broader scope of outcome measures, for example unplanned hospital readmissions, rapid response team calls, and patient adverse events (falls, pressure injuries, and deep vein thrombosis). Including the cost effectiveness of data collection methods, using time-intensive comparisons, in future research could help identify the level of investment essential for the valid and accurate conduct of clinical trials in the hospital setting. Evaluation across multiple hospital sites using different electronic management programs will improve the ability to generalise results across health services.

6. Conclusion

Although previous research has sourced data from electronic patient management programs, this is the first paper to demonstrate differences in completeness of data capture and level of agreement between data collection methods for hospital length of stay and discharge destination. Administrative data from an electronic patient management program showed the highest completeness of capture and the highest level of agreement with the gold standard of inpatient medical record review for both length of stay and discharge destination, and may therefore be an acceptable data collection method for these measures.

Acknowledgments

The authors thank staff from Monash Health, Western Health, Melbourne Health, and Monash University, who contributed their time and effort to this investigation, particularly Monash Health ward staff, Allied Health WISER Unit, Health Information Services, and Monash University Allied Health Research Unit.

Clinical Relevance Statement

Electronic administrative data have become a key source of inexpensive and readily available information for hospital outcome measures such as hospital length of stay and discharge destination. Interestingly, while previous studies have shown that the use of administrative data for adverse events and coding for billing purposes may result in inaccurate data, no studies were identified examining the use of administrative data for key outcome measures such as length of stay and discharge destination. This is the first paper to demonstrate that administrative data from electronic patient management programs are an effective data collection method for these key hospital outcome measures. Clinicians and hospital administrators can utilise opportunities to collect reliable hospital length of stay and discharge destination data from electronic sources, while researchers can improve data collection methodology when using length of stay and discharge destination as research outcome measures.

Conflict Of Interest

This study was performed in conjunction with a larger stepped-wedge randomised controlled trial examining the effectiveness of acute weekend allied health services. The results reported in this manuscript helped shape the data collection methodology for that stepped-wedge trial. All investigators involved in this manuscript are involved in the stepped-wedge randomised controlled trial. This research was funded by a Partnership Grant from the Australian National Health and Medical Research Council (NHMRC). Professor Terry Haines is supported by a Career Development Fellowship (Level 2) from the NHMRC. The NHMRC has had no direct role in the writing of the manuscript or the decision to submit it for publication.

Protection Of Human Subjects

This was an observational study that collected only routinely available data from hospital systems and did not involve any direct intervention with participants. The study was approved by the Monash Health Human Ethics Committee (Reference number: 13327B).
