Impact of Missing Data on Data Quality in Social Research
DOI:
https://doi.org/10.29038/2306-3971-2024-02-31-31
Keywords:
Missing Data, Data Quality, Data Imputation, Multiple Imputation
Abstract
Missing data is a common issue in quantitative social research that lowers data quality. This article examines the consequences of missing data, outlining the problems it can cause and stressing the importance of handling missingness properly. It describes the patterns of missing data, focusing on the need to distinguish data that is Missing at Random from data that is Missing Not at Random, and explains how these patterns affect the choice of handling method. Approaches to managing missing data are illustrated through a combination of hypothetical scenarios and case studies from actual research, showing how the methods are applied and how effective they are. The article reviews traditional methods, such as complete case analysis and simple imputation, together with their limitations. Emphasizing the importance of advanced statistical techniques, it advocates multiple imputation as the main method of choice. By comparing the methods and proposing a strategic framework, the article offers social scientists a strategy for dealing with missing data so as to ensure proper data quality.
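The contrast between complete case analysis, simple imputation, and multiple imputation can be illustrated with a small simulation. The sketch below is not taken from the article: the survey variables (`age`, `income`), the missingness rule, and all numeric values are hypothetical, and the third step is a simplified stand-in for multiple imputation (stochastic regression imputation repeated `m` times, ignoring uncertainty in the regression parameters and pooling only the point estimate).

```python
import random
import statistics

random.seed(42)

# Hypothetical survey: age is always observed, income is sometimes missing.
n = 5000
age = [random.uniform(20, 70) for _ in range(n)]
income = [1000 + 40 * a + random.gauss(0, 300) for a in age]

# Assumed Missing at Random mechanism: the chance of skipping the income
# question rises with age (an observed variable), not with income itself.
observed = [random.random() > 0.8 * (a - 20) / 50 for a in age]

true_mean = statistics.mean(income)

# 1. Complete case analysis: keep only respondents who reported income.
cc_age = [a for a, o in zip(age, observed) if o]
cc_income = [y for y, o in zip(income, observed) if o]
cc_mean = statistics.mean(cc_income)

# 2. Mean imputation: replace every missing income with the observed mean.
mean_imputed = [y if o else cc_mean for y, o in zip(income, observed)]

# 3. Simplified multiple imputation: regress income on age using the
#    complete cases, then impute prediction + random noise m times.
a_bar = statistics.mean(cc_age)
slope = (
    sum((a - a_bar) * (y - cc_mean) for a, y in zip(cc_age, cc_income))
    / sum((a - a_bar) ** 2 for a in cc_age)
)
intercept = cc_mean - slope * a_bar
residuals = [y - (intercept + slope * a) for a, y in zip(cc_age, cc_income)]
resid_sd = statistics.stdev(residuals)  # approximate residual spread

m = 20
estimates = []
for _ in range(m):
    completed = [
        y if o else intercept + slope * a + random.gauss(0, resid_sd)
        for a, y, o in zip(age, income, observed)
    ]
    estimates.append(statistics.mean(completed))
mi_mean = statistics.mean(estimates)  # pooled point estimate

print(f"true mean income:        {true_mean:.1f}")
print(f"complete case mean:      {cc_mean:.1f}")
print(f"pooled imputation mean:  {mi_mean:.1f}")
print(f"SD, complete cases:      {statistics.stdev(cc_income):.1f}")
print(f"SD, mean-imputed data:   {statistics.stdev(mean_imputed):.1f}")
```

In this setup the complete case mean is biased downward because older, higher-income respondents drop out more often, and mean imputation inherits that bias while also shrinking the spread of the variable. The regression-based imputations use the observed relationship with age and recover the mean more closely. A full multiple imputation would additionally draw the regression parameters for each imputed dataset and pool variances as well as point estimates.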
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.