Addressing missing data in health research: a narrative review of mechanisms, methods, and implications for healthcare quality and policy

Majed Al-Turbag

doi:10.21037/jhmhp-25-52

Review Article

School of Nursing & Midwifery, Trinity College Dublin, University of Dublin, Dublin, Ireland

Correspondence to: Majed Al-Turbag, PhD. School of Nursing & Midwifery, Trinity College Dublin, University of Dublin, College Green, Dublin 2, Ireland. Email: alturbam@tcd.ie.

Background and Objective: Missing data are pervasive in healthcare research and routinely affect hospital performance indicators, patient safety metrics, clinical outcomes, and health policy evaluations. Despite extensive methodological literature, applied healthcare studies continue to rely on suboptimal or poorly reported approaches for handling missing data. This narrative review aims to synthesise missing data mechanisms and statistical handling methods through a healthcare systems and policy lens, highlighting their implications for hospital management and decision-making.

Methods: A narrative review was conducted using PubMed, Scopus, and Web of Science to identify English-language literature on missing data mechanisms, prevention strategies, and analytical methods relevant to health research, hospital datasets, and clinical studies. Key methodological papers, reviews, and applied healthcare studies were synthesised narratively.

Key Content and Findings: Rubin’s framework of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) remains foundational for guiding analytical decisions. In healthcare settings, missing data frequently arise from patient non-response, loss to follow-up, electronic health record incompleteness, and administrative data linkage failures. Simple approaches such as case deletion and single imputation remain common but can distort estimates, reduce statistical power, and misinform quality assessments. Likelihood-based methods and multiple imputation (MI) generally provide more valid inference under MAR assumptions, while MNAR scenarios require explicit modelling or sensitivity analyses using pattern-mixture or selection models.

Conclusions: Effective management of missing data in health research requires integration of prevention strategies with analytically appropriate methods aligned to the assumed missingness mechanism. Transparent reporting and sensitivity analysis are essential to support reliable hospital management decisions and health policy formulation.

Keywords: Missing data; healthcare quality; health policy; multiple imputation (MI); missing not at random (MNAR)

Received: 24 May 2025; Accepted: 20 March 2026; Published online: 22 June 2026.

Introduction

Missing data are a persistent challenge in healthcare research, with direct consequences for hospital management, patient safety, quality measurement, and health policy evaluation (1). Incomplete data can bias estimates of clinical outcomes, distort hospital performance indicators, and misinform policy decisions related to resource allocation, service planning, and quality improvement initiatives (2). Common healthcare data sources, including electronic health records, patient-reported outcome measures, registries, and administrative databases, are particularly vulnerable to missingness due to fragmented care pathways, patient non-response, and system-level data integration issues (3,4).

The statistical literature on missing data is extensive, with foundational frameworks and analytical methods well established. However, applied healthcare research continues to demonstrate inconsistent handling and inadequate reporting of missing data, limiting the interpretability and reliability of findings used to guide clinical and managerial decisions. In hospital and health policy contexts, inappropriate handling of missing data may lead to underestimation of adverse outcomes, misclassification of service performance, and inequitable policy conclusions (5).

This narrative review synthesises missing data mechanisms, prevention strategies, and analytical methods with a specific focus on their implications for healthcare quality, hospital management, and health policy. By integrating methodological principles with applied healthcare considerations, this review aims to provide practical guidance for researchers, clinicians, and health system analysts. This article is presented in accordance with the Narrative Review reporting checklist (available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-25-52/rc).

Methods

This narrative review was conducted to synthesise key concepts and practices related to missing data handling in health research. Literature searches were performed in PubMed, Scopus, and Web of Science. English-language articles published up to 2024 were considered. Search terms included combinations of “missing data”, “health research”, “hospital data”, “multiple imputation”, “MNAR”, “patient-reported outcomes”, and “electronic health records”.

Eligible sources included methodological papers, narrative and systematic reviews, and applied healthcare studies addressing missing data mechanisms, prevention strategies, and analytical approaches. Articles were selected based on relevance to healthcare research and health system applications rather than formal quality scoring. Findings were synthesised narratively, with emphasis on implications for hospital management and health policy. The search strategy is summarised in Table 1, and a summary of databases, search terms, study selection, and synthesis is provided in Table S1. Sources were prioritised if they addressed missing data in healthcare datasets or provided widely cited methodological guidance with clear implications for applied health research.

Table 1

The search strategy summary

Items	Specification
Date of search	21 January 2025
Databases and other sources searched	PubMed, Scopus, Web of Science
Search terms used	“Missing data” OR “missing values” OR “incomplete data”
Timeframe	Database inception to December 2024
Inclusion and exclusion criteria	Study types included were methodological papers; narrative and systematic reviews; applied healthcare studies. Study types excluded were non-health-related methodological papers; editorials without methodological content; conference abstracts only
Selection process	Selection of studies was done independently

Key content and findings

Missing data: causes, consequences, and considerations

Missing information can arise in many contexts, and often for a variety of reasons. In surveys, for example, individuals may opt out of answering a question due to privacy concerns or confusion about what is being asked (6). Sometimes, the possible responses simply do not match the respondent’s actual situation, leading them to leave an item blank. In other cases, surveys are left incomplete when participants run out of time or lose interest before reaching the end. In each case, even a single item left blank by the respondent produces missing data (6).

Beyond surveys, experimental and research settings are also susceptible to data gaps. Measurements can be overlooked by the research team, or they might be lost due to accidents like misplacing a sample or breaking a test tube. Large databases can likewise contain missing data entries if different regions or departments track unique variables, meaning that certain fields will remain unfilled when data sources are combined (7).

Risks of missing data

Despite often going unnoticed, missing data can pose significant risks. One challenge is that many statistical tools automatically exclude rows or cases with incomplete records, shrinking your dataset (8). This reduction might leave you with too few observations to conduct the analyses you need, potentially rendering your results statistically non-significant. Even if you can still run an analysis, there is no guarantee that the remaining subset of responses is representative of the original sample. As a result, the outcomes you generate could be skewed, leading to misleading conclusions (9).

Identifying when missing data truly pose a risk can be challenging, because missingness does not always threaten your findings. Pinpointing the threshold at which a handful of blank entries becomes a significant stumbling block is not straightforward: each variable might have only a few missing responses, yet taken together these gaps can add up quickly. A careful, systematic evaluation of all missing data is essential to determine whether your results stand on solid ground, or whether further data collection or more sophisticated handling methods are required. Historically, this level of detailed scrutiny has been both labour-intensive and vulnerable to mistakes, making the handling of missing data all the riskier (5,10-12).

Understanding missing data mechanisms

One of the most influential frameworks for classifying missing data is the threefold scheme introduced by Rubin (13). This approach categorizes missingness based on the underlying reason for the absence of information. By determining which mechanism applies to your dataset, you can select appropriate methods to handle the missing values and produce more reliable statistical results (Table 2).

Table 2

Classification of missing data mechanisms with healthcare examples and implications

Missing data mechanism	Definition	Typical causes in healthcare settings	Examples from health research and hospital data	Implications for analysis and decision-making
MCAR	The probability of missingness is unrelated to observed or unobserved data	Random technical failures, accidental data loss, random sample mishandling	Laboratory sample lost during transport; random sensor malfunction; random EHR extraction failure (14,15)	Estimates remain unbiased, but reduced sample size lowers statistical power; case deletion may be acceptable
MAR	The probability of missingness depends on observed data but not on the missing value itself	Patient characteristics, disease severity recorded elsewhere, system-level data capture differences	Older patients less likely to complete PROMs; sicker patients missing follow-up surveys where baseline severity is recorded (16)	Requires model-based handling (e.g., multiple imputation or likelihood-based methods); deletion may introduce bias
MNAR	The probability of missingness depends on the unobserved value itself	Patient reluctance, symptom severity, stigma, adverse outcomes	Patients with severe pain not reporting pain scores; patients with poor quality-of-life outcomes dropping out (17,18)	Standard MAR-based methods may be biased; requires sensitivity analysis or explicit MNAR modelling

EHR, electronic health records; MAR, missing at random; MCAR, missing completely at random; MNAR, missing not at random; PROMs, Patient-Reported Outcome Measures.

Missing completely at random (MCAR)

Data are considered MCAR when the likelihood of a value being missing is unrelated to either the actual value itself or any other experimental variables (18). In practical terms, this situation might arise if a sensor malfunctions at random intervals, or if samples are accidentally misplaced during transport. Because MCAR occurs in a way that is truly random with respect to the measured variables, it does not bias parameter estimates, though it reduces overall sample size, potentially reducing statistical power.

Missing at random (MAR)

A more realistic (and common) scenario is MAR, where the chance of a missing value depends on information that is already observed, but not on the unobserved value itself (18,19). As an example, if a hospital patient tends to skip a questionnaire once their condition reaches a particular threshold already recorded by the medical team, missingness can be explained by existing data about the patient’s status. Although MAR does not introduce bias as directly as other forms of missingness might, it still requires systematic handling that incorporates the observed data, to yield correct statistical inferences.

Missing not at random (MNAR)

When the propensity for a value to be missing is directly linked to the actual, unobserved value, the data are said to be MNAR (20). This situation is the most challenging because standard approaches may no longer produce unbiased results. For instance, if patients with more severe pain are less likely to report their pain levels, the missingness itself is tied to the severity. Handling MNAR typically involves specialized modelling techniques that account for why data are missing, such as creating explicit models of the missingness process or collecting supplementary data that explain non-response patterns (21).
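The practical difference between the three mechanisms can be illustrated with a small simulation. The sketch below is illustrative only: the variables (`severity`, `pain`), the logistic missingness probabilities, and all coefficients are assumptions chosen for the example and are not taken from any study cited here. It compares a complete-case mean with a mean adjusted for the observed severity under each mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated, hypothetical data: baseline severity is always recorded;
# the pain score may be missing.
severity = rng.normal(0.0, 1.0, n)
pain = 5.0 + severity + rng.normal(0.0, 1.0, n)


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


# Probability that the pain score is missing under each mechanism.
p_missing = {
    "MCAR": np.full(n, 0.3),                      # unrelated to any data
    "MAR": logistic(-1.0 + 1.5 * severity),       # depends on observed severity
    "MNAR": logistic(-1.0 + 1.5 * (pain - 5.0)),  # depends on the pain value itself
}

results = {}
for name, p in p_missing.items():
    observed = rng.random(n) >= p
    # Complete-case estimate: average the pain scores that were recorded.
    complete_case = pain[observed].mean()
    # Severity-adjusted estimate: fit pain ~ severity on the observed cases,
    # then average the fitted values over every patient.
    slope, intercept = np.polyfit(severity[observed], pain[observed], 1)
    adjusted = (intercept + slope * severity).mean()
    results[name] = (complete_case, adjusted)

print(f"mean with no missing data: {pain.mean():.2f}")
for name, (cc, adj) in results.items():
    print(f"{name}: complete-case {cc:.2f}, severity-adjusted {adj:.2f}")
```

In this setup the complete-case mean should sit close to the full-data mean only under MCAR. Under MAR, using the recorded severity should recover it, whereas under MNAR both estimates should remain too low, which is why sensitivity analysis is needed in that case.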

Approaches for MNAR (non-ignorable) missingness

When MNAR is plausible, standard MAR-based methods (e.g., MI or likelihood estimation) may be insufficient unless paired with explicit MNAR modelling or sensitivity analyses. Two common MNAR frameworks are:

Selection models, which jointly model the outcome process and the missingness indicator (i.e., the probability of missingness is modelled as a function of outcomes and covariates).
Pattern-mixture models, which stratify analyses by missingness patterns and combine results across patterns, often using sensitivity parameters (e.g., “delta adjustments”) to encode departures from MAR.

A pragmatic recommendation in applied health research is to treat MNAR handling as a sensitivity analysis layer: estimate a primary model under MAR, then test how conclusions change under plausible MNAR departures using pattern-mixture or selection-model specifications.

Framing MNAR handling in this way aligns with current guidance emphasizing sensitivity analysis.
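A delta adjustment can be sketched in a few lines. The example below is a deliberately simplified illustration, not a full pattern-mixture analysis: it estimates only a mean, fills missing outcomes with a regression prediction that is valid under MAR, and then shifts those predictions by a sensitivity parameter `delta`. The function name, the toy data, and the grid of `delta` values are all hypothetical.

```python
import numpy as np


def delta_adjusted_mean(y, x, delta):
    """Mean of y after filling missing values with a MAR-based regression
    prediction shifted by delta (delta = 0 reproduces the MAR estimate)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    missing = np.isnan(y)
    # Fit y ~ x on the observed cases only.
    slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
    y_filled = y.copy()
    # Assumed departure from MAR: non-responders differ from the prediction
    # for comparable responders by delta units.
    y_filled[missing] = intercept + slope * x[missing] + delta
    return y_filled.mean()


# Toy data: the last outcome is missing; x is fully observed.
y = [1.0, 2.0, 3.0, np.nan]
x = [1.0, 2.0, 3.0, 4.0]

for delta in (0.0, 1.0, 2.0):
    print(delta, round(delta_adjusted_mean(y, x, delta), 3))
```

Reporting the estimate across a range of plausible `delta` values shows how strong a departure from MAR would have to be before the conclusion changes. A complete analysis would also carry the uncertainty of the imputed values into the standard errors, for example by combining the shift with MI.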

Recent advances in missing data handling

Recent methodological developments emphasise transparency, sensitivity analysis, and realistic assumptions over purely technical sophistication. Contemporary guidance highlights routine assessment of departures from MAR, use of sensitivity parameters, and explicit reporting of missing data handling decisions. Planned missingness designs and improved reporting standards are increasingly promoted to balance data quality, participant burden, and analytical validity in health research (22).

Across healthcare studies, transparent reporting is increasingly emphasised, including explicit description of the extent and pattern of missingness, justification of assumptions (MCAR/MAR/MNAR), the imputation/estimation model specification, and the conduct of sensitivity analyses where MNAR is plausible.
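As one way of documenting the extent and pattern of missingness, the sketch below tabulates missing values per variable, counts the distinct missingness patterns, and reports the share of complete cases. It assumes the data are held in a pandas DataFrame; the function name `missingness_report` and the example columns are hypothetical.

```python
import numpy as np
import pandas as pd


def missingness_report(df):
    """Summarise extent and pattern of missingness in a DataFrame."""
    is_missing = df.isna()
    per_variable = pd.DataFrame({
        "n_missing": is_missing.sum(),
        "pct_missing": (is_missing.mean() * 100).round(1),
    })
    # One string per record, column by column: "O" = observed, "M" = missing.
    patterns = is_missing.apply(
        lambda row: "".join("M" if m else "O" for m in row), axis=1
    ).value_counts()
    complete_case_pct = round(float((~is_missing.any(axis=1)).mean() * 100), 1)
    return {
        "per_variable": per_variable,
        "patterns": patterns,
        "complete_case_pct": complete_case_pct,
    }


# Hypothetical example records.
records = pd.DataFrame({
    "age": [70, 65, np.nan, 80],
    "prom_score": [np.nan, 12, np.nan, 15],
    "length_of_stay": [3, 4, 5, 6],
})
report = missingness_report(records)
print(report["per_variable"])
print(report["patterns"])
print("complete cases (%):", report["complete_case_pct"])
```

A table of this kind can be included alongside the stated missingness assumptions so that readers can judge how much of the analysis rests on handled rather than observed data.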

Key strategies for preventing and addressing missing data

Whether data are MCAR, MAR, or MNAR carries significant implications for how you analyse and interpret your study. Recognizing the mechanism of missingness guides you toward the appropriate strategies. A proactive approach to missing data often proves the most effective. By carefully planning a study and being attentive during data collection, you can significantly reduce the volume of unrecorded responses (23). The following practices help curtail missingness and maintain higher-quality datasets:

Enhancing data collection procedures: implement rigorous and standardised data collection protocols to minimise errors during data entry and reduce technical failures. Reliable and consistent methods ensure higher data integrity from the outset.
Ongoing data quality monitoring: conduct routine checks for missing or incomplete data and take timely corrective actions. By identifying trends or patterns in missingness, researchers can investigate underlying causes and prevent further data loss.

Utilizing data validation techniques: incorporate real-time validation rules during data entry to flag inconsistencies and prevent erroneous or incomplete inputs. This ensures that data quality is maintained at the point of capture (24).

Thorough staff training: conduct training sessions for everyone involved, such as researchers, data-entry personnel, and participants (as needed). Clarify the steps for enrolment, data collection, and intervention delivery. Well-informed team members can spot potential errors early and maintain consistency (23).

Run a pilot study: a small test run can help reveal hidden challenges in the study, such as unclear instructions or impractical schedules, before you invest deeply in a larger project. Addressing these issues early on reduces the chance of substantial missingness later (23).

Planned missingness designs intentionally assign subsets of items or measurement occasions to be missing by design (e.g., three-form or wave-missing designs) to reduce respondent burden while preserving inferential validity under MCAR-by-design assumptions. These designs are distinct from unplanned missingness and should be explicitly pre-specified with an analysis plan (e.g., MI or likelihood-based estimation).

Case deletion methods

Listwise (complete case) deletion

Listwise deletion, a straightforward solution and perhaps the most widespread in practice, removes any record containing at least one missing value (25). This approach is the default in many statistical software packages due to its simplicity. If the assumption of MCAR holds, listwise deletion yields unbiased parameter estimates. Moreover, the computations are straightforward, and there is no need for additional assumptions or modelling. However, if the data are not MCAR, the resulting estimates may be biased (25). Additionally, when sample sizes are modest, discarding data can seriously erode power. Under these conditions, relying on listwise deletion may lead to skewed conclusions or underpowered analyses.

When the dataset is large and truly MCAR, listwise deletion can be a reasonable strategy (26). In some settings, pairwise deletion can perform worse than listwise deletion, because each statistic is estimated from a different subset of cases, potentially amplifying inconsistencies and bias and yielding invalid covariance structures. In scenarios involving more complex missingness patterns or limited sample sizes, more sophisticated methods are generally preferred.

Pairwise deletion

Instead of dropping an entire record whenever any variable is missing, pairwise deletion excludes a record only from the calculations that involve its missing variables. That means if a record has data for Variable A but not Variable B, it can still be used when analysing Variable A. By retaining all available data for each calculation, pairwise deletion often preserves more information than listwise deletion would (25). This can be advantageous if you have strong reasons to believe the data are MCAR, or MAR with the relevant covariates included in your analysis. However, because each parameter might be estimated from a different subset of data, inconsistencies can emerge (e.g., varying sample sizes or standard errors across analyses). This fragmented approach may also produce correlation or covariance matrices that are not positive definite, preventing certain statistical procedures from running at all.

In practice, pairwise deletion is more complex to implement and interpret, and it can become unreliable if the amount of missing data is extensive. If missingness is widespread or not random, pairwise deletion often results in disorganised datasets that complicate analysis and interpretation.

Empirical studies have shown that pairwise deletion may perform worse than listwise deletion in some settings, as estimates are derived from different subsets of data, leading to inconsistent covariance structures and potentially biased results. In healthcare datasets with complex correlation structures, this limitation can compromise model validity.
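The two deletion strategies, and the covariance problem described above, can be reproduced on a deliberately extreme toy dataset. In the sketch below the data are invented for illustration: every pair of variables is observed together on a different set of records, so listwise deletion leaves nothing to analyse, while pairwise deletion yields a correlation matrix that is not positive semi-definite.

```python
import numpy as np

nan = np.nan
# Hypothetical toy data with columns A, B, C. No record is complete.
data = np.array([
    [1, 1, nan], [2, 2, nan], [3, 3, nan],   # C missing
    [nan, 1, 1], [nan, 2, 2], [nan, 3, 3],   # A missing
    [1, nan, 3], [2, nan, 2], [3, nan, 1],   # B missing
], dtype=float)

# Listwise deletion: keep only records with no missing value.
complete_rows = data[~np.isnan(data).any(axis=1)]


def pairwise_corr(data):
    """Correlation matrix where each entry uses the records that have
    both variables observed (pairwise deletion)."""
    k = data.shape[1]
    corr = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            r = np.corrcoef(data[both, i], data[both, j])[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


corr = pairwise_corr(data)
min_eigenvalue = np.linalg.eigvalsh(corr).min()

print("records left after listwise deletion:", len(complete_rows))
print(corr)
print("smallest eigenvalue:", min_eigenvalue)
```

Here A and B are perfectly positively correlated, as are B and C, yet A and C are perfectly negatively correlated, a combination that no single complete dataset could produce. The negative eigenvalue signals that the matrix is not a valid correlation matrix, so procedures that rely on it (such as regression or factor analysis based on the matrix) may fail or give meaningless results.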

Imputation techniques

Mean substitution

When preventive measures fail and gaps persist in a dataset, researchers often turn to imputation methods, procedures that substitute estimated values for missing observations. While these techniques can preserve sample size and allow full use of available data, each approach carries specific assumptions, advantages, and risks (27).

A simple solution replaces every missing entry with the overall average (mean) of that variable. The rationale is that the mean can stand in for a ‘typical’ value, especially if the variable is normally distributed (28). However, this approach comes with notable drawbacks: mean substitution does not introduce new information about the underlying data-generating process (Table 3). Moreover, if the missing values are not truly random or if multiple variables have unequal patterns of missingness, mean substitution can distort findings. Replacing distinct values with the same number consistently lowers the variance and leads to overly optimistic estimates of precision. Because it tends to oversimplify the actual data structure, mean substitution is rarely recommended.
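The variance shrinkage caused by mean substitution is easy to verify by hand. The short sketch below uses six invented scores, two of them missing, and compares the sample variance and the naive standard error of the mean before and after substitution.

```python
import numpy as np

# Hypothetical scores; two values are missing.
scores = np.array([2.0, 4.0, 6.0, 8.0, np.nan, np.nan])

observed = scores[~np.isnan(scores)]
# Mean substitution: every missing entry becomes the observed mean (5.0).
filled = np.where(np.isnan(scores), observed.mean(), scores)

var_observed = observed.var(ddof=1)
var_filled = filled.var(ddof=1)

# Naive standard errors of the mean, treating filled values as real data.
se_observed = np.sqrt(var_observed / observed.size)
se_filled = np.sqrt(var_filled / filled.size)

print(var_observed, var_filled)
print(se_observed, se_filled)
```

The mean is unchanged at 5.0, but the sample variance falls from about 6.67 to 4.0, and the naive standard error falls from about 1.29 to about 0.82 even though no new information has been collected.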

Table 3

Comparison of common missing data handling methods in healthcare research

Method	Key assumption	Strengths	Limitations	Appropriate use in healthcare research
Listwise (complete case) deletion	MCAR	Simple to implement; widely available in software	Loss of statistical power; biased estimates if MCAR violated	Large datasets with minimal and plausibly random missingness
Pairwise deletion	MCAR or weak MAR	Retains more data for some analyses	Inconsistent sample sizes; invalid covariance matrices; may perform worse than listwise deletion	Limited use; generally discouraged in complex healthcare datasets
Mean substitution	MCAR	Easy to apply	Distorts variance; underestimates uncertainty; biased estimates	Rarely appropriate; not recommended for health outcomes
Single imputation (e.g., regression, LOCF)	MAR	Maintains sample size; simple workflow	Underestimates variance; treats imputed values as observed	Limited use for exploratory analyses; discouraged for inference
Maximum likelihood (EM, FIML)	MAR	Uses all observed data; statistically efficient	Model-dependent; limited flexibility for complex missingness patterns	Longitudinal studies, structural equation models, clinical trials
MI	MAR	Accounts for uncertainty; flexible; widely recommended	Requires careful model specification; computational complexity	Preferred approach for most healthcare and policy analyses
MNAR models (pattern-mixture, selection models)	MNAR	Explicitly addresses non-ignorable missingness	Requires strong assumptions; complex interpretation	Sensitivity analysis layer when MNAR is plausible

EM, expectation-maximisation; FIML, full information maximum likelihood; LOCF, last observation carried forward; MAR, missing at random; MCAR, missing completely at random; MI, multiple imputation; MNAR, missing not at random.

Single imputation techniques

Single imputation involves replacing missing values with one filled-in value per missing entry (e.g., regression imputation, stochastic regression imputation, hot-deck imputation, or last observation carried forward for longitudinal datasets) (29-33). While these techniques yield a single completed dataset, they generally underestimate uncertainty because they treat imputed values as if observed.
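As a concrete instance of single imputation, last observation carried forward (LOCF) can be written in a few lines. The sketch below is a minimal illustration for one patient's repeated measurements, with `None` marking a missed visit; the function name and data are hypothetical, and the example is not an endorsement of LOCF for inference.

```python
def locf(values):
    """Last observation carried forward for one patient's visit sequence.

    Missing visits (None) take the most recent observed value; visits
    before the first observation stay missing.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical pain scores over five visits.
print(locf([3, None, None, 5, None]))
```

Each carried-forward value is then analysed as if it had been measured, which is the source of the understated uncertainty noted above. LOCF additionally assumes that the outcome stays constant after dropout, which is often implausible for progressive or fluctuating conditions.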

Maximum likelihood approaches

The expectation-maximisation (EM) algorithm is not an imputation technique but a computational method for obtaining maximum likelihood estimates in the presence of incomplete data. EM iteratively estimates sufficient statistics and model parameters without explicitly filling in missing values. Related likelihood-based approaches, such as full information maximum likelihood, directly estimate parameters from all available observed data under MAR assumptions and are widely used in longitudinal and structural equation modelling.
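To make the distinction between EM and imputation concrete, the sketch below runs EM for a bivariate normal model in which `x` is fully observed and `y` is missing for some records. It is a minimal illustration under those assumptions, with simulated data and hypothetical names throughout. The E-step computes expected sufficient statistics for the missing `y` values given `x`, and the M-step updates the parameters; no completed dataset is ever produced.

```python
import numpy as np


def em_bivariate_normal(x, y, n_iter=200, tol=1e-10):
    """ML estimates for a bivariate normal with x complete and y partly missing."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    # x is fully observed, so its ML mean and variance are available directly.
    mu_x, s_xx = x.mean(), x.var()
    # Start the y-related parameters from the complete cases.
    mu_y = y[~miss].mean()
    s_yy = y[~miss].var()
    s_xy = np.mean((x[~miss] - x[~miss].mean()) * (y[~miss] - mu_y))
    for _ in range(n_iter):
        # E-step: expected y and y^2 for missing records, given x.
        beta = s_xy / s_xx
        cond_var = s_yy - beta * s_xy
        ey = np.where(miss, mu_y + beta * (x - mu_x), y)
        ey2 = np.where(miss, ey**2 + cond_var, y**2)
        # M-step: update parameters from the expected sufficient statistics.
        new_mu_y = ey.mean()
        new_s_yy = ey2.mean() - new_mu_y**2
        new_s_xy = np.mean(x * ey) - mu_x * new_mu_y
        converged = abs(new_mu_y - mu_y) < tol
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if converged:
            break
    return {"mu_y": mu_y, "s_yy": s_yy, "s_xy": s_xy}


# Simulated MAR example: y is usually missing when the observed x exceeds 0.5.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(0.0, 1.0, n)
y = 2.0 + x + rng.normal(0.0, 1.0, n)       # true mean of y is 2.0
drop = (x > 0.5) & (rng.random(n) < 0.8)
y_obs = np.where(drop, np.nan, y)

estimates = em_bivariate_normal(x, y_obs)
complete_case_mean = np.nanmean(y_obs)
print(estimates["mu_y"], complete_case_mean)
```

In this simulation the EM estimate of the mean of `y` should be close to the true value of 2.0, while the complete-case mean should be noticeably lower because records with high `x` (and therefore high `y`) are under-represented.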

Multiple imputation (MI) approaches

MI addresses the limitations of single imputation by generating m plausible completed datasets, analysing each one separately, and combining the results so that uncertainty about the imputed values is reflected in the final estimates (34). The frequentist approach creates multiple complete datasets using repeated simulations, while Bayesian MI employs Markov Chain Monte Carlo (MCMC) methods with non-informative priors to sample from the posterior distribution. If the imputation model fails to converge, propensity score methods may also be applied. Although MI may be conceptually challenging for non-specialist users, it generally yields less biased estimates and more realistic standard errors than deletion or single imputation techniques, provided that the MAR assumption is plausible and the imputation model is appropriately specified.
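The results from the m imputed datasets are conventionally combined using Rubin's rules, which this review does not spell out. The sketch below implements the basic pooling step for a single parameter; the function name and the input numbers are hypothetical, and the degrees-of-freedom calculation used for confidence intervals is omitted for brevity.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from m = 3 imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total ** 0.5)  # pooled estimate and its standard error
```

The between-imputation term is what single imputation leaves out: when the imputed datasets disagree, the total variance grows, and the pooled standard error widens accordingly.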

In applied healthcare research, method selection should be driven by the most plausible missingness mechanism, the analytic objective, and the risk of biased decision-making. When missingness is minimal and plausibly MCAR, complete-case analysis may be acceptable, although power loss should be assessed. Under MAR, MI or likelihood-based methods are generally preferred as they use available information and preserve uncertainty. Where MNAR is plausible, such as non-response linked to symptom severity, stigma, or adverse outcomes, analyses should not rely solely on MAR assumptions; instead, sensitivity analyses using pattern-mixture or selection models should be routinely implemented to evaluate robustness of conclusions that may inform hospital performance evaluation or policy decisions.

Advantages of imputation

When the imputation model is appropriately specified, imputation can reduce the bias introduced by non-random missingness. It preserves valuable data that would otherwise be discarded. Imputation also maintains a rectangular dataset format compatible with standard statistical software, enabling routine analytical procedures.

Limitations of imputation

Imputed values are not actual observations and must be interpreted cautiously. Improperly specified imputation models may allow observed data to unduly influence imputed values. Single imputation tends to underestimate variance, leading to overconfident estimates that fail to capture the true uncertainty associated with missing data.

Recommendations for managing missing data

Missing data can significantly diminish a study’s statistical power and introduce bias, even when sample sizes are adjusted to compensate (Table 4). Effective management of missing data requires a dual strategy: prevention through robust study design, and appropriate statistical handling when gaps occur.

Maximize data collection at the outset: the most effective way to address missing data is to minimize its occurrence from the beginning. This can be achieved by streamlining study protocols and data collection forms to focus only on essential variables, training all personnel rigorously on data collection procedures, and employing pilot studies to identify and address potential issues before full-scale implementation.
Plan for missing data during study design: integrate strategies for handling potential missingness into the study design. This includes adjusting target sample sizes to account for anticipated dropouts; collecting all variables that might influence data loss, even if they are not part of the primary analysis, to help explain and manage the missingness later; and establishing real-time monitoring systems to detect and address missing data as it arises.
Comprehensive inclusion of relevant variables: ensure that any variables potentially linked to missingness are included in your imputation models. Even if these variables are not directly analysed in the final model, they can provide critical context and help produce more accurate estimates.

Table 4

Practical guidance for preventing and addressing missing data in healthcare research and policy analysis

Stage	Recommended actions	Relevance to hospital management and health policy
Study design	Minimise respondent burden; pilot data collection tools; anticipate attrition	Improves data completeness for quality indicators and performance metrics
Data collection	Standardised protocols; staff training; real-time data validation	Reduces systematic missingness across hospital units and facilities
Monitoring	Routine audits of missingness patterns; early corrective action	Prevents accumulation of biased or incomplete operational data
Analysis planning	Pre-specify missing data assumptions; include auxiliary variables	Supports transparent and reproducible policy-relevant analyses
Primary analysis	Use MI or likelihood-based methods under MAR	Preserves statistical validity for outcome and quality assessments
Sensitivity analysis	Explore MNAR scenarios using pattern-mixture or selection models	Evaluates robustness of conclusions used for policy decisions
Reporting	Explicitly report extent, handling method, and assumptions	Enhances interpretability and trust in healthcare evidence

MAR, missing at random; MCAR, missing completely at random; MI, multiple imputation; MNAR, missing not at random.

The key to effectively managing missing data lies in prevention and thoughtful, robust statistical analysis. While sophisticated imputation techniques offer a powerful way to handle unavoidable data gaps, they should always complement and never replace careful study planning and high-quality data collection.

Conclusions

Missing data remain a critical yet underappreciated threat to the validity of healthcare research, hospital performance evaluation, and health policy decision-making. This narrative review highlights that effective missing data management requires both preventive strategies during study design and analytically appropriate handling aligned with the assumed missingness mechanism. While advanced methods such as MI and likelihood-based estimation offer substantial advantages under MAR assumptions, MNAR scenarios necessitate sensitivity analyses and transparent reporting. Strengthening missing data practices is essential for improving the reliability of evidence used to guide healthcare management and policy. From a hospital management perspective, incomplete data can distort benchmarking exercises, bias evaluations of service effectiveness, and weaken quality improvement initiatives by masking true outcome distributions. For health policy, biased evidence arising from poorly handled missingness can misdirect resource allocation and exacerbate inequities if missingness differs systematically across patient groups or care settings.

Acknowledgments

None.

Footnote

Reporting Checklist: The author has completed the Narrative Review reporting checklist. Available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-25-52/rc

Peer Review File: Available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-25-52/prf

Funding: None.

Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-25-52/coif). The author has no conflicts of interest to declare.

Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

doi: 10.21037/jhmhp-25-52
Cite this article as: Al-Turbag M. Addressing missing data in health research: a narrative review of mechanisms, methods, and implications for healthcare quality and policy. J Hosp Manag Health Policy 2026;10:22.
