Impact of Missing Data on Data Quality in Social Research

DOI:

https://doi.org/10.29038/2306-3971-2024-02-31-31

Keywords:

Missing Data, Data Quality, Data Imputation, Multiple Imputation

Abstract

Missing data is a common issue in quantitative social research that negatively affects data quality. This article explores the consequences of missing data, outlining the problems it may cause and emphasizing the importance of addressing missingness properly. It describes the patterns of missing data, focusing on the need to distinguish data that are Missing at Random from data that are Missing Not at Random, and explains how these patterns may affect the choice of handling methods. The article illustrates approaches to managing missing data through a combination of hypothetical scenarios and case studies from actual research, showing how the methods are applied and how effective they are. It reviews the traditional methods of handling missing data, such as complete case analysis and simple imputation, together with their limitations. Emphasizing the importance of advanced statistical techniques, the article advocates multiple imputation as the main method of choice when dealing with missing data. By providing a methodological comparison and a strategic framework, this work offers social scientists a strategy for handling missing data in a way that safeguards data quality.

Published

30.12.2024

Section

METHODOLOGY AND METHODS OF SOCIOLOGICAL RESEARCH