Prediction of hospital length of stay: leveraging ensemble tree models and intelligent feature selection
Original Article


Bo Peng1, Shan Gao2

1Institute of Human Virology, School of Medicine, University of Maryland Baltimore, Baltimore, MD, USA; 2Department of Radiology, Tianjin Hospital of ITCWM Nankai Hospital, Tianjin, China

Contributions: (I) Conception and design: B Peng; (II) Administrative support: None; (III) Provision of study materials or patients: B Peng; (IV) Collection and assembly of data: Both authors; (V) Data analysis and interpretation: B Peng; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Bo Peng, PhD. Institute of Human Virology, School of Medicine, University of Maryland Baltimore, 725 W. Lombard Street, Baltimore, MD 21201, USA. Email: bpeng@ihv.umaryland.edu.

Background: The prediction of hospitalization length of stay (LOS) is crucial for effective healthcare management, influencing resource allocation, patient flow, and overall hospital efficiency. Traditional LOS prediction methods often lack accuracy and adaptability to individual patient conditions, leading to potential overutilization or underutilization of hospital resources. This study aims to improve the accuracy of hospital LOS predictions by employing ensemble tree learning models and exploring the impact of information-based feature selection methods on model performance and computational efficiency.

Methods: We utilized a publicly available, comprehensive dataset comprising 26 features from 100,000 patient admission records to predict LOS in days. The data were collected from multiple sites within one hospital system in the United States. Advanced machine learning and deep learning models, including random forest, gradient boosting machine (GBM), XGBoost, and convolutional neural network (CNN), were fine-tuned and evaluated. Additionally, the mutual information feature selection technique was applied to optimize the feature set, reducing model complexity and computational time while maintaining high prediction accuracy. The analysis further explored the feature importance of different patient health conditions at the time of admission.

Results: The XGBoost model outperformed other models in predicting LOS with significant improvements in accuracy, as indicated by root mean square error (RMSE =0.3881) and mean absolute error (MAE =0.1553). The information-based feature selection method yielded comparably high performance while substantially reducing computational time and model complexity. The analysis also revealed differential features’ importance across patient health conditions, demonstrating the models’ potential capability to adapt to diverse scenarios.

Conclusions: Our findings underscore the effectiveness of ensemble tree learning models, particularly XGBoost, enhanced by information-based feature selection methods, in accurately predicting hospital LOS. The integration of such advanced predictive analytics in hospital management can lead to optimized resource allocation, improved patient care, and increased hospital operational efficiency. Additionally, the analysis of feature importance for different patients’ health conditions provided insights into the varying impact of clinical and demographic variables on LOS, highlighting the potential for personalized patient care strategies. Future research should focus on the practical implementation of these models in hospital settings and exploring their adaptability to real-time data and dynamic hospital environments.

Keywords: Length of stay prediction (LOS prediction); hospitalization management; ensemble tree learning models; information-based feature selection


Received: 25 October 2024; Accepted: 14 April 2025; Published online: 13 June 2025.

doi: 10.21037/jhmhp-24-129


Highlight box

Key findings

• This study demonstrates the effectiveness of using XGBoost, enhanced by mutual information (MI) feature selection, for precise hospital length of stay (LOS) predictions. XGBoost outperformed other models, achieving the lowest mean absolute error and root mean square error. MI feature selection reduced computational complexity, significantly improving model efficiency without sacrificing accuracy. The analysis of feature importance across patient risk levels highlighted readmission history, health complexity, and other clinical indicators as important predictors of LOS.

What is known and what is new?

• It is known that predicting LOS is crucial for hospital management, and machine learning models, such as random forest and gradient boosting machine, have been explored for this purpose.

• This study adds to the literature by demonstrating the superior performance of XGBoost, particularly when combined with MI feature selection, in predicting LOS. Additionally, it provides insights into the varying importance of clinical features across different patient risk levels.

What is the implication, and what should change now?

• The study’s findings suggest hospitals should adopt advanced predictive models like XGBoost to optimize resource allocation, improve patient flow, and enhance overall operational efficiency. Implementing real-time LOS prediction systems integrated with electronic health records could significantly reduce patient wait times, prevent overcrowding, enable more efficient use of hospital resources, and improve clinical staff workload management. This would also support more personalized patient care and a better overall patient experience.


Introduction

Background

Hospitalization management, encompassing patient flow, capacity planning and resource allocation, significantly influences the operational efficiency, financial outcomes, and clinical outcomes of healthcare institutions. According to Stone et al. (1), efficient patient flow management can significantly enhance operational efficiency, leading to reduced waiting times and increased patient throughput, which are vital for both patient satisfaction and hospital revenue. Moreover, strategically planning hospital capacity and resources is essential for improving resource utilization and reducing idle time, resulting in significant financial savings and increased revenue opportunities for hospitals (2).

Furthermore, effective hospitalization management profoundly impacts patients’ clinical outcomes and satisfaction. Efficient hospitalization processes, including streamlined admissions, timely transfers, and well-coordinated discharges, contribute to shorter hospital stays and reduced wait times, which are directly linked to improved patient outcomes and higher satisfaction levels. A study by Sonis et al. (3) found that hospitals with better-managed patient flows and shorter wait times had higher patient satisfaction scores. Efficient patient flow management can also reduce the risk of hospital-acquired infections and complications, leading to better clinical outcomes and enhanced patient safety (4).

The benefits of efficient hospitalization management extend beyond operational efficiencies and financial gains, impacting the work environment and well-being of healthcare staff, including doctors and nurses. Effective hospitalization management practices contribute to a more structured and predictable work environment, reducing the stress and uncertainty often associated with patient care. This predictability allows staff to work more cohesively as a team, fostering a sense of accomplishment and job satisfaction (5). Moreover, when hospitals manage patient flow effectively, staff are less likely to be overburdened by sudden surges in patient numbers, which can significantly decrease burnout rates and improve overall morale (6). The improvement in work conditions and the reduction of stress levels lead to higher job satisfaction among healthcare professionals, contributing to a more positive workplace culture (7).

Rationale and knowledge gap

In reality, the endeavor to achieve optimal hospitalization management is fraught with challenges. At the heart of these challenges lies the prediction of patient’s length of stay (LOS), a complex task influenced by a multitude of factors including patient’s physical health conditions, clinical conditions, and the variability of treatment responses (1,8,9). Traditional methods for LOS prediction, such as reliance on historical averages or subjective human estimations, are plagued by inaccuracies and a lack of adaptability to individual patient conditions. This often leads to the underutilization or overutilization of hospital resources, adversely affecting both hospital operations and patient care (8,10).

Given these challenges, there is an urgent need for more precise and adaptable methods to predict LOS. Previous studies in this field have explored various approaches, ranging from statistical techniques to machine learning models, to improve the accuracy of LOS predictions (8,11,12). These studies, while contributing valuable insights, often focus on specific contexts of care or utilize limited datasets, thereby highlighting a research gap in the development of more generalized, scalable and interpretable models capable of adapting to the diverse needs of healthcare facilities (9).

Related work

The prediction of hospital LOS has been a topic of extensive research, reflecting various methodological approaches that evolve with advancements in data analytics and machine learning. Several systematic reviews (9,13) categorize these approaches into three primary types: operational research-based approaches, statistical and arithmetic approaches, and machine learning and data mining approaches. The operational research-based approaches, such as simulation models, are valuable for operational planning but can sometimes oversimplify the complexities inherent in hospital environments, which can introduce limitations in accurately capturing real-world variability and uncertainty (14).

Statistical and arithmetic approaches have long been established in this field. These approaches primarily use regression models and other statistical methods (15). They are valuable for providing baseline predictions and are robust in simpler analytical settings. However, they often struggle with large datasets or with data exhibiting highly complex, non-linear interactions (16).

Machine learning and data mining approaches represent the most advanced techniques currently used in LOS prediction. Methods like decision trees, neural networks, support vector machines, and ensemble models [such as random forest and gradient boosting machine (GBM)] are noted for their ability to handle big data and extract nuanced insights from it. These models are particularly advantageous due to their improved accuracy and adaptability to new data (12).

The increased adoption of machine learning in healthcare has necessitated models that are not only accurate but also interpretable. Interpretability is crucial because healthcare providers and managers must be able to trust and understand the predictions made by models to integrate them effectively into clinical and operational workflows (17). Ensemble tree models, which include random forest, GBM, and XGBoost, have been highlighted for their robust performance in predicting LOS while also offering significant interpretability. These models generate an ensemble of decision trees, which can be examined to understand feature importance and decision paths, providing insights into the factors driving predictions (18). This transparency allows clinicians to assess the rationale behind the model’s predictions, fostering trust and facilitating the adoption of these technologies in clinical settings.
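As a brief illustration of this interpretability, a fitted tree ensemble exposes per-feature importance scores that can be ranked and inspected. The sketch below uses scikit-learn’s RandomForestRegressor on synthetic data (not the study’s dataset); the feature names and coefficients are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: feature 0 drives the target most strongly, feature 1 weakly,
# features 2 and 3 are pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Importance scores are normalized to sum to 1; higher means more influential.
importances = model.feature_importances_
ranked = np.argsort(importances)[::-1]  # indices of features, most important first
```

Examining `ranked` (here feature 0 comes out on top, matching its dominant coefficient) is the kind of decision-path inspection that lets clinicians assess the rationale behind a model’s predictions.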

The utilization of ensemble tree models in our study addresses the critical need for accurate and interpretable predictive tools in hospital LOS management. By analyzing feature importance and model performance across different health conditions, we not only enhance predictive accuracy but also contribute to personalized patient care, allowing for adjustments based on individual patient profiles.

Objective

Our study aims to bridge the identified gap by leveraging advanced ensemble tree learning and deep learning models, including random forest, GBM, XGBoost, and convolutional neural network (CNN), combined with innovative feature selection analysis. By integrating these advanced techniques, we aim to develop a more effective approach to LOS modeling, enhancing predictive accuracy and computational efficiency to enable precise and adaptable predictions. Furthermore, our approach emphasizes the analysis of feature importance, offering actionable insights into the key factors influencing LOS.

By addressing the limitations of traditional LOS prediction methods and exploring the potential of advanced machine learning techniques, our study not only contributes to the body of knowledge in hospitalization management but also holds promise for practical implementation in healthcare settings, ultimately leading to improved hospital efficiency and patient care. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-24-129/rc).


Methods

Data description and pre-processing

This study employs a publicly available dataset published by Microsoft, containing 100,000 data entries with 26 features, aimed at predicting the LOS in different hospital facilities within one healthcare system in the US (19). Eligibility criteria included hospital admissions with complete information on the selected features from January 2012 to January 2013. This dataset is critical for our analysis as it provides insights into the variations in LOS across different facilities and specialties (i.e., general medicine, pulmonary, geriatrics, behavioral) within the healthcare system, reflecting a broad spectrum of patient conditions and treatment complexities. The large sample size enhances the statistical power of the analysis, enabling the development and evaluation of advanced machine learning models with greater reliability and generalizability within the scope of the available data.

In this dataset (see Table 1), the variable “rcount” records the number of the patient’s readmissions within the last 180 days. There are 11 binary clinical indicators (‘0’ means absent, ‘1’ means present) that represent the presence or absence of certain conditions during the patient’s hospital encounter. These binary flags are vital for predicting LOS due to their direct impact on patient care complexity (20). The renal disease variable (“dialysisrenalendstage”) indicates whether the patient has renal disease, a condition that significantly impacts LOS (21). The respiratory variables (“asthma”, “pneum”) for asthma and pneumonia are important, as respiratory issues are common causes of extended hospital stays (22). Iron deficiency (“irondef”) can indicate anemia, which can prolong hospitalization (23). Substance dependence (“substancedependence”) may affect LOS through the need for additional support services. Major psychological disorders (“psychologicaldisordermajor”), depression (“depress”), and other psychological disorders (“psychother”) can lengthen LOS due to the need for psychiatric consultation and treatment adjustments (24). Flags for fibrosis (“fibrosisandother”), malnutrition (“malnutrition”), and blood disorders (“hemo”) each contribute differently to LOS, with malnutrition being particularly associated with delayed recovery (25).

Table 1

Summary of features available in the LOS dataset

Feature names n (%) or mean (SD) Data type Descriptions
eid NA Integer Unique Id of the hospital admission
vdate NA String Visit date
Gender
   Male 42,357 (42%) Categorical Gender of the patient: male or female
   Female 57,643 (58%)
rcount
   0 55,031 (55%) Categorical Number of readmissions within last 180 days
   1 15,007 (15%)
   2 9,987 (10%)
   3 8,047 (8%)
   4 6,941 (7%)
   5+ 4,987 (5%)
dialysisrenalendstage
   0 96,358 (96%) Categorical Flag for renal disease during encounter
   1 3,642 (4%)
asthma
   0 96,473 (96%) Categorical Flag for asthma during encounter
   1 3,527 (4%)
irondef
   0 90,506 (91%) Categorical Flag for iron deficiency during encounter
   1 9,494 (9%)
pneum
   0 96,055 (96%) Categorical Flag for pneumonia during encounter
   1 3,945 (4%)
substancedependence
   0 93,694 (94%) Categorical Flag for substance dependence during encounter
   1 6,306 (6%)
psychologicaldisordermajor
   0 76,094 (76%) Categorical Flag for major psychological disorder during encounter
   1 23,904 (24%)
depress
   0 94,834 (95%) Categorical Flag for depression during encounter
   1 5,166 (5%)
psychother
   0 95,061 (95%) Categorical Flag for other psychological disorder during encounter
   1 4,939 (5%)
fibrosisandother
   0 99,521 (99.5%) Categorical Flag for fibrosis during encounter
   1 479 (0.5%)
malnutrition
   0 95,052 (95%) Categorical Flag for malnutrition during encounter
   1 4,948 (5%)
hemo
   0 92,000 (92%) Categorical Flag for blood disorder during encounter
   1 8,000 (8%)
hematocrit 0 (1) Numerical Average hematocrit value during encounter
neutrophils 0 (1) Numerical Average neutrophils value during encounter
sodium 0 (1) Numerical Average sodium value during encounter
glucose 0 (1) Numerical Average glucose value during encounter
bloodureanitro 0 (1) Numerical Average blood urea nitrogen value during encounter
creatinine 0 (1) Numerical Average creatinine value during encounter
bmi 0 (1) Numerical Average BMI during encounter
pulse 0 (1) Numerical Average pulse during encounter
respiration 0 (1) Numerical Average respiration during encounter
secondarydiagnosisnonicd9 0 (1) Numerical Flag for whether a non-ICD 9 formatted diagnosis was coded as a secondary diagnosis
facid
   A 30,034 (30%) Categorical Facility ID at which the encounter occurred. A: General Medicine 3 West. B: Pulmonary 2 West. C: General Medicine 3 South. D: Geriatrics 2 East. E: Behavioral 1 East
   B 30,011 (30%)
   C 4,699 (5%)
   D 4,499 (4%)
   E 30,755 (31%)

‘0’ means absent, ‘1’ means present. BMI, body mass index; LOS, length of stay; SD, standard deviation.

In addition, there are nine continuous variables that provide quantitative data about the patient’s clinical status. The hematological measures, hematocrit value (“hematocrit”) and neutrophils value (“neutrophils”), give insights into the patient’s blood composition, which is fundamental for diagnosing and managing various patient conditions (26). Fluctuations in the electrolyte (“sodium”) and metabolic and renal function (“glucose”, “bloodureanitro”, “creatinine”) variables can signal acute medical conditions that could extend hospitalization (27). Body mass index (“bmi”), pulse rate (“pulse”), and respiratory rate (“respiration”) are basic health indicators that influence LOS due to their association with overall health and recovery potential (28). Facility ID (“facid”), which is a categorical variable, captures variations in LOS due to different hospital protocols, resource availability, and patient conditions. Each of these features requires careful consideration in the modeling process, as they contribute to the complexity of the patient’s condition and, consequently, the LOS. Understanding the interplay between these variables and the LOS is imperative in developing accurate predictive models that can be used to improve healthcare delivery and resource management.

Initially, the dataset is checked for missing values. In this instance, the dataset contains no missing entries. Secondly, the numerical variables (“hematocrit”, “neutrophils”, “sodium”, “glucose”, “bloodureanitro”, “creatinine”, “bmi”, “pulse”, “respiration”) are standardized. Standardization [mean =0, standard deviation (SD) =1] helps to enhance model accuracy and convergence speed (29). The process involves subtracting the mean and dividing by the SD for each numerical variable. Thirdly, we create a new feature (“number_of_issues”) by summing the binary health indicators (“dialysisrenalendstage”, “asthma”, “irondef”, “pneum”, “substancedependence”, “psychologicaldisordermajor”, “depress”, “psychother”, “fibrosisandother”, “malnutrition” and “hemo”) to capture the patient’s overall health condition. This composite metric provides a single, quantifiable measure of a patient’s health complexity. Certain redundant features like “eid” (patient’s ID) and “vdate” (patient visit date) are removed to reduce model complexity. These features do not contribute to the predictive power concerning future patients and are typically related to specific events rather than patient characteristics. Several non-numeric factors (“rcount”, “secondarydiagnosisnonicd9”, and “facid”) are transformed through one-hot encoding. In one-hot encoding, each category is represented as a binary vector, where a “1” indicates the presence of a specific category, and “0” indicates its absence. This transformation is vital for models that require numeric input and helps to capture the presence or absence of a category effectively (30).
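The preprocessing steps above can be sketched in Python with pandas. The toy DataFrame below is hypothetical and uses only a subset of the dataset’s columns (two of the 11 binary flags and one of the nine numerical measures) to keep the example short.

```python
import pandas as pd

# Hypothetical mini-sample mirroring the dataset's schema.
df = pd.DataFrame({
    "eid": [1, 2, 3, 4],
    "vdate": ["1/1/2012", "1/2/2012", "1/3/2012", "1/4/2012"],
    "rcount": ["0", "1", "5+", "2"],
    "dialysisrenalendstage": [0, 1, 0, 0],
    "asthma": [0, 0, 1, 0],
    "hematocrit": [11.9, 14.1, 12.5, 13.0],
    "facid": ["A", "B", "E", "C"],
})

binary_flags = ["dialysisrenalendstage", "asthma"]  # full dataset has 11 flags
numeric_cols = ["hematocrit"]                       # full dataset has 9 measures

# 1) Standardize numerical variables to mean 0, SD 1.
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()

# 2) Engineer the composite health-complexity feature.
df["number_of_issues"] = df[binary_flags].sum(axis=1)

# 3) Drop identifiers with no predictive value for future patients.
df = df.drop(columns=["eid", "vdate"])

# 4) One-hot encode the categorical variables.
df = pd.get_dummies(df, columns=["rcount", "facid"])
```

On the full dataset, the same pipeline (with all 11 flags, 9 numerical measures, and “secondarydiagnosisnonicd9” included in the encoding) yields the 42-feature table used for modeling.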

After preprocessing, the dataset consists of 100,000 entries and 42 features, including the newly engineered features and one-hot encoded variables. This refined dataset is poised for use in advanced machine learning models to predict LOS accurately. The meticulous data preparation outlined supports the integrity and potential of the subsequent analytical phases, ensuring that the predictive models developed are both robust and reliable. This foundational work is crucial for leveraging machine learning in healthcare, providing a clear pathway from raw clinical data to actionable insights.

Before modeling, we plot the LOS for all cases to explore the underlying patterns and variability in hospital stay durations. This preliminary analysis serves to inform the subsequent development of predictive models by highlighting key trends and distribution characteristics in the data. The plot (see Figure 1) illustrates the distribution of the LOS for patients within the dataset, providing a comprehensive view of hospital stay durations across a sample of 100,000 observations. The x-axis represents the LOS in days, and the y-axis represents the count of occurrences. The distribution shows that shorter stays are more common, with a peak at 1 day (17.979% of the sample) and a general decline as the LOS increases. Stays of 2, 3, and 4 days are also notably frequent, capturing 12.825%, 16.068% and 14.822% of the dataset, respectively. There is a marked decline in frequency as the LOS extends beyond 6 days, indicating that longer hospital stays are less common.

Figure 1 Actual length of stay bar chart. SD, standard deviation.

A significant observation is the long tail of the distribution, where stays longer than 10 days are increasingly rare (less than 1% of the observations), with the 17-day stay occurring in only 0.004% of cases. This skewed distribution, with a high frequency of short stays and a rapid decrease as duration increases, suggests that most patients require only brief hospitalization interventions. The mean LOS is 4 days with an SD of 2.36, indicating moderate variability around the average stay duration. This skewed distribution is typical in healthcare settings, where a large proportion of admissions are for minor treatments or conditions that require short-term care (31).

The prevalence of short hospital stays highlighted by the data underscores the necessity of more efficient and accurate prediction models for LOS. The marked decline in frequency beyond 6 days, coupled with the extreme rarity of stays exceeding 10 days, delineates a skewed distribution that poses unique challenges for resource allocation and operational planning in healthcare settings. Such a distribution, characterized by a significant concentration of short-term stays, necessitates predictive models that are not only precise but also nuanced enough to differentiate between the factors contributing to short versus extended hospitalizations.

Given the complexity and variability of factors influencing LOS, as evidenced by the initial data analysis, the following section will delve into the development of machine learning models designed to tackle these challenges. Through a detailed exploration of ensemble tree methods and neural network approaches, we aim to enhance the predictive accuracy and operational utility of LOS forecasts, paving the way for more informed decision-making in hospital settings.

Model development and evaluation

In this study, we employed three types of ensemble tree learning methods—GBM, random forest, and XGBoost—alongside a deep learning approach using a CNN. These models are well-suited for the prediction task due to their ability to handle complex, non-linear relationships inherent in healthcare data (32).

GBM is an additive model that works by sequentially introducing new models to correct the errors made by the existing ensemble. The model benefits from strong predictive performance but can be sensitive to overfitting if not tuned properly (33). The second model we considered is random forest. This model constructs multiple decision trees during training and makes predictions by averaging the results for regression tasks or selecting the majority class for classification tasks. Random forest is known for its robustness, as it averages multiple deep decision trees, trained on different parts of the same training set, to reduce variance (34).

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. It is highly flexible and has been shown to provide better performance on a variety of machine learning benchmarks. Moreover, XGBoost excels in managing the nonlinear and complex nature of healthcare data, making it particularly effective in capturing intricate patterns and interactions that are critical for accurate predictions in medical settings (35).

Although traditionally used for image processing, CNNs can also be adapted for sequential data, such as time series and regression tasks. CNNs can effectively capture complex relationships, making them well-suited for healthcare datasets where important features may not be immediately apparent (36). The CNN model employed consists of a one-dimensional convolutional layer using ReLU activation, followed by batch normalization, a dropout layer, and flattening. This is connected to a dense layer, culminating in a single output neuron that predicts the LOS.

In addition, we conducted parameter tuning for the ensemble tree models (GBM tuning parameters: ‘ntrees’, ‘min_rows’, ‘max_depth’, ‘col_sample_rate’ and ‘learn_rate’; random forest tuning parameters: ‘mtry’, ‘ntree’, ‘nodesize’ and ‘sampsize’; XGBoost tuning parameters: ‘nrounds’, ‘max_depth’, ‘min_child_weight’, ‘subsample’, ‘eta’ and ‘gamma’).
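This kind of tuning can be sketched as a grid search. The snippet below is a minimal stand-in using scikit-learn’s GradientBoostingRegressor on synthetic data, not the exact implementations tuned in the study; the paper’s ‘ntrees’ and ‘learn_rate’ correspond to ‘n_estimators’ and ‘learning_rate’ here, and the grid values are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the preprocessed feature matrix and LOS target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Small illustrative grid over the kinds of parameters tuned in the paper.
param_grid = {
    "n_estimators": [50, 100],   # analogous to 'ntrees' / 'nrounds'
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],  # analogous to 'learn_rate' / 'eta'
}

# 3-fold cross-validated search minimizing RMSE.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y)
best_params = search.best_params_  # parameter set with the best CV score
```

The selected `best_params` would then be fixed before the repeated train/test evaluation described below.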

After selecting the best parameter set for each model, the dataset was separated into 70% training and 30% testing subsets. To ensure the reliability and robustness of our findings, we conducted 30 independent experimental trials, each using a unique random seed for the data split. The experiments were conducted on an MSI laptop equipped with a 12th Gen Intel Core i9-12900HX processor and 32 GB of RAM. Model performance was evaluated based on several metrics, which are critical in determining predictive accuracy and computational efficiency. The first metric we chose is the mean absolute error (MAE), which provides a straightforward average of the error magnitudes across all predictions, offering a clear measure of predictive accuracy (37). Mathematically, MAE is defined as:

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \]

where \(y_i\) represents the true value, \(\hat{y}_i\) is the predicted value, and \(n\) is the total number of predictions.

The second metric, root mean square error (RMSE), gives a sense of the average magnitude of errors by squaring the errors before averaging, which emphasizes larger errors more than MAE (38). It is defined as:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \]

These metrics provide complementary perspectives on predictive accuracy, with RMSE penalizing larger errors more heavily than MAE. Relative absolute error (RAE) and relative squared error (RSE) are normalized error metrics that provide a contextual measure of model accuracy by comparing the prediction errors to those of a baseline approach, which in our study is the mean prediction baseline (39). Moreover, we recorded the training and testing time to assess the computational demand of each model, ensuring the chosen approaches are not only accurate but also efficient and scalable within real-world hospital settings.
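Under the mean-prediction-baseline definition of RAE and RSE described above, the four metrics can be computed as follows. This is a minimal NumPy sketch, not the study’s evaluation code.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and relative errors (RAE, RSE) vs. a mean-prediction baseline."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    base = y_true - y_true.mean()  # errors of the mean-prediction baseline
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "RAE": np.sum(np.abs(err)) / np.sum(np.abs(base)),
        "RSE": np.sum(err ** 2) / np.sum(base ** 2),
    }

# Perfect predictions give zero for all four metrics.
m = regression_metrics([1, 2, 3, 4], [1, 2, 3, 4])
```

Because RAE and RSE are normalized by the baseline’s errors, values well below 1 indicate the model comfortably outperforms simply predicting the mean LOS.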

To further enhance the efficiency and effectiveness of our predictive models, we applied information-based feature selection methods, specifically mutual information (MI). MI quantifies the amount of information obtained about one random variable through another random variable (40). In this study, MI was used to identify and retain the most influential features for predicting hospital LOS, aiming to reduce model complexity and computational demands without significantly compromising prediction accuracy.

The MI feature selection method was chosen for its ability to capture non-linear relationships and quantify the dependence between features and the target variable (LOS) (41). Unlike other methods, such as Boruta or recursive feature elimination, MI directly measures the shared information between features and the target, providing a robust basis for selecting the most predictive features (42). This capability is particularly advantageous for healthcare datasets, where complex interactions between variables often exist. Mathematically, the MI \(I(X;Y)\) between a feature \(X\) and the target variable \(Y\) (LOS) is defined as:

\[ I(X;Y) = \sum_{x \in X}\sum_{y \in Y} p(x,y)\,\log\!\left( \frac{p(x,y)}{p(x)\,p(y)} \right) \]

where \(p(x,y)\) is the joint probability distribution of \(X\) and \(Y\), and \(p(x)\) and \(p(y)\) are the marginal probability distributions of \(X\) and \(Y\), respectively. This equation quantifies how much knowing the value of a feature reduces the uncertainty about the target variable.
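A minimal sketch of MI-based feature scoring, using scikit-learn’s mutual_info_regression on synthetic data (the features and target are illustrative, not from the study’s dataset). The example deliberately uses a quadratic dependence, which a linear correlation would miss but MI detects.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(42)
n = 2000
informative = rng.normal(size=n)  # strongly (non-linearly) drives the target
noise = rng.normal(size=n)        # unrelated to the target
X = np.column_stack([informative, noise])
y = informative ** 2 + 0.1 * rng.normal(size=n)  # quadratic dependence

# Estimate MI between each feature and the target.
scores = mutual_info_regression(X, y, random_state=0)

# Keep the top-k features by MI score (k=1 here for illustration).
k = 1
selected = np.argsort(scores)[::-1][:k]
```

On the study’s data, the same ranking over all 42 preprocessed features would yield the reduced feature set used for retraining.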

Using the reduced feature set, we retrained the model to evaluate the impact of feature selection on model performance and computational efficiency. The training and testing phases were repeated, with careful monitoring of the same key performance indicators. The retrained model with a reduced number of features was expected to maintain an acceptable level of accuracy while reducing the computational burden. This balance is critical in practical healthcare settings where both accuracy and efficiency are essential for real-time decision-making (43). By rigorously analyzing these metrics, the study aims to not only assess the effectiveness of the predictive models but also their practicality for implementation in healthcare environments.


Results

To evaluate the predictive capabilities and efficiency of the models, various performance metrics were analyzed. Tables 2,3 present the details of the predictive performance of four models: CNN, XGBoost, GBM, and Random Forest. The evaluation metrics included MAE, RMSE, RAE, and RSE. Additionally, computational efficiency was assessed through the mean training and testing times for each model.

Table 2

Performance of different ML models

Models Mean MAE (SD) Mean RMSE (SD) Mean RAE (SD) Mean RSE (SD)
CNN 0.3002 (0.0093) 0.4477 (0.0086) 0.1570 (0.0049) 0.0362 (0.0014)
XGBoost 0.2974 (0.0011) 0.3881 (0.0018) 0.1553 (0.0008) 0.0270 (0.0003)
GBM 0.3057 (0.0013) 0.4026 (0.0021) 0.1596 (0.0009) 0.0291 (0.0003)
Random forest 0.3641 (0.0030) 0.5959 (0.0067) 0.1900 (0.0017) 0.0637 (0.0013)

CNN, convolutional neural network; GBM, gradient boosting machine; MAE, mean absolute error; ML, machine learning; RMSE, root mean square error; RAE, relative absolute error; RSE, relative squared error; SD, standard deviation.

Table 3

Computational time of different models

Models Mean training time (SD), s Mean testing time (SD), s
CNN 3,168.18 (2.65) 16.12 (0.19)
XGBoost 441.63 (30.20) 1.26 (0.31)
GBM 402.40 (70.90) 0.97 (0.33)
Random forest 3,989.25 (171.00) 4.83 (0.43)

CNN, convolutional neural network; GBM, gradient boosting machine; SD, standard deviation.

Predictive performance of different ML models

Among the models tested, XGBoost demonstrated superior performance. It achieved the lowest mean MAE of 0.2974 (SD: 0.0011), indicating the highest accuracy in predicting LOS. The mean RMSE was 0.3881 (SD: 0.0018), further showcasing its robustness in minimizing prediction errors. Furthermore, XGBoost recorded the lowest mean RAE at 0.1553 (SD: 0.0008) and mean RSE at 0.0270 (SD: 0.0003), underscoring its efficiency in providing consistent and reliable predictions (44). The CNN model was the second-best performer, with a mean MAE of 0.3002 (SD: 0.0093) and mean RMSE of 0.4477 (SD: 0.0086). Although slightly less accurate than XGBoost, CNN demonstrated strong predictive capabilities with a mean RAE of 0.1570 (SD: 0.0049) and mean RSE of 0.0362 (SD: 0.0014). However, the relatively higher SDs suggest some variability in its predictions (45).

GBM exhibited commendable performance with a mean MAE of 0.3057 (SD: 0.0013) and mean RMSE of 0.4026 (SD: 0.0021). Its mean RAE was 0.1596 (SD: 0.0009) and mean RSE was 0.0291 (SD: 0.0003). Although GBM was slightly less accurate than XGBoost and CNN, it maintained a good balance between accuracy and variability (46). Random forest had the least favorable results among the models tested. It recorded a mean MAE of 0.3641 (SD: 0.0030) and mean RMSE of 0.5959 (SD: 0.0067). Additionally, its mean RAE was 0.1900 (SD: 0.0017) and mean RSE was 0.0637 (SD: 0.0013). These metrics indicate that random forest struggled to match the predictive accuracy of the other models (47).

Computational efficiency

In terms of computational efficiency (see Table 3), GBM demonstrated the best performance with an average training time of 402.40 seconds (SD: 70.90) and an average testing time of 0.97 seconds (SD: 0.33). However, XGBoost was a close contender with a training time of 441.63 seconds (SD: 30.20) and a testing time of 1.26 seconds (SD: 0.31). The lower SD in XGBoost’s training time indicates more consistent performance. CNN had significantly longer training times, averaging 3,168.18 seconds (SD: 2.65), and a testing time of 16.12 seconds (SD: 0.19). While its accuracy was high, the substantial computational resources required make it less practical for real-time applications (36). Random forest, on the other hand, had the highest training time of 3,989.25 seconds (SD: 171.00) and a testing time of 4.83 seconds (SD: 0.43), further reducing its feasibility for time-sensitive predictions (48).

Despite GBM’s superior computational efficiency, XGBoost was ultimately selected as the preferred model due to its exceptional balance between accuracy and efficiency. XGBoost not only achieved the highest predictive accuracy but also demonstrated relatively low training and testing times with minimal variability. These attributes make XGBoost an ideal choice for practical implementation in hospital settings where both precision and computational speed are critical (49). The XGBoost model’s ability to handle non-linear relationships and its robust performance in managing the complexities of healthcare data make it highly suitable for predicting hospital LOS. Its scalability and efficiency ensure that it can be integrated seamlessly into existing hospital management systems, facilitating optimized resource allocation and improved patient care (44).

Model performance with MI-based feature selection

The performance and computational efficiency of the XGBoost models with different sets of features selected using the MI method have been comprehensively evaluated. The results, as shown in Tables 4,5, indicate that the model with 30 features (“rcount0”, “number_of_issues”, “facidE”, “psychologicaldisordermajor”, “rcount5”, “rcount4”, “rcount3”, “rcount2”, “rcount1”, “bloodureanitro”, “hematocrit”, “respiration”, “neutrophils”, “irondef”, “hemo”, “facidB”, “substancedependence”, “psychother”, “malnutrition”, “depress”, “dialysisrenalendstage”, “pneum”, “sodium”, “bmi”, “glucose”, “creatinine”, “pulse”, “asthma”, “facidC”, “facidD”) is the optimal choice for predicting hospital LOS. This section delves into a detailed discussion of these findings, focusing on the trade-offs between accuracy and computational efficiency.

Table 4

Performance of XGBoost models with different number of selected features

XGBoost models Mean MAE (SD) Mean RMSE (SD) Mean RAE (SD) Mean RSE (SD)
40 features 0.2974 (0.0011) 0.3881 (0.0018) 0.1553 (0.0008) 0.0270 (0.0003)
35 features 0.2974 (0.0010) 0.3882 (0.0017) 0.1553 (0.0007) 0.0271 (0.0002)
30 features 0.2973 (0.0011) 0.3879 (0.0017) 0.1552 (0.0008) 0.0270 (0.0003)
25 features 0.4663 (0.0022) 0.6518 (0.0036) 0.2434 (0.0017) 0.0763 (0.0011)
20 features 0.6755 (0.0024) 0.9014 (0.0050) 0.3527 (0.0021) 0.1459 (0.0019)

MAE, mean absolute error; RMSE, root mean square error; RAE, relative absolute error; RSE, relative squared error; SD, standard deviation.

Table 5

Computational time of XGBoost models with different number of selected features

XGBoost models Mean training time (SD), s Mean testing time (SD), s
40 features 310.54 (21.89) 0.80 (0.12)
35 features 99.71 (2.83) 0.28 (0.01)
30 features 99.00 (1.85) 0.28 (0.00)
25 features 92.84 (2.56) 0.27 (0.01)
20 features 61.27 (1.17) 0.28 (0.01)

SD, standard deviation.

The XGBoost model built with the top 30 features selected by the MI method exhibited the best performance across all evaluated metrics (see Table 4). Specifically, it achieved a mean MAE of 0.2973 (SD: 0.0011), mean RMSE of 0.3879 (SD: 0.0017), mean RAE of 0.1552 (SD: 0.0008), and mean RSE of 0.0270 (SD: 0.0003). These values indicate high predictive accuracy and low variability, making it a reliable model for practical applications.

In comparison, models with 35 and 40 features also performed well but did not significantly surpass the 30-feature model. The 35-feature model had a mean MAE of 0.2974 (SD: 0.0010) and mean RMSE of 0.3882 (SD: 0.0017), while the 40-feature model showed similar results. This suggests that additional features beyond the top 30 do not substantially enhance predictive accuracy, potentially introducing redundancy and increasing model complexity without meaningful gains. On the other hand, models with 25 and 20 features showed a marked decline in performance. The 25-feature model had a mean MAE of 0.4663 (SD: 0.0022) and mean RMSE of 0.6518 (SD: 0.0036), while the 20-feature model performed even worse, with a mean MAE of 0.6755 (SD: 0.0024) and mean RMSE of 0.9014 (SD: 0.0050). This sharp increase in error metrics highlights the loss of critical predictive information when fewer features are used, underscoring the importance of an optimal feature set.

The analysis of mean training and testing times provides insights into the feasibility of these models in real-time scenarios. The model with 40 features had the highest mean training time of 310.54 seconds (SD: 21.89) and a mean testing time of 0.80 seconds (SD: 0.12), indicating substantial computational demand. Conversely, the model with 30 features demonstrated a significantly lower mean training time of 99.00 seconds (SD: 1.85) and a mean testing time of 0.28 seconds (SD: 0.00), striking an optimal balance between accuracy and efficiency.

Models with 35 features also showed reasonable computational efficiency, with mean training and testing times of 99.71 seconds (SD: 2.83) and 0.28 seconds (SD: 0.01), respectively. However, the slight increase in SD for training time suggests marginally less consistency compared to the 30-feature model. While models with 25 and 20 features demonstrated reduced computational times (92.84 and 61.27 seconds for training, respectively), the significant drop in predictive accuracy makes them less suitable for reliable LOS prediction.

Based on the combined analysis of predictive performance and computational efficiency, the XGBoost model with 30 features selected by the MI method stands out as the optimal solution. This model not only provides the highest accuracy with the lowest mean MAE and RMSE but also maintains efficient training and testing times, making it highly suitable for integration into hospital management systems. The scatter plots of mean MAE versus mean training time and mean MAE versus mean testing time (Figures 2,3) visually reinforce this conclusion. The green dots, representing the 30-feature model, are positioned at the lowest values for both MAE and computational times, clearly indicating the optimal trade-off between accuracy and efficiency.

Figure 2 Mean MAE and mean training time for models with different number of features. MAE, mean absolute error.
Figure 3 Mean MAE and mean testing time for models with different number of features. MAE, mean absolute error.

Analysis of feature importance

Patients were later classified into different risk levels based on the feature “number_of_issues”, which sums several binary health indicators. This composite metric provides a single, quantifiable measure of a patient’s overall health complexity. Low-risk patients are defined as those with a “number_of_issues” between 0 and 2; medium-risk patients have a “number_of_issues” between 3 and 5; and high-risk patients have a “number_of_issues” of 6 or more. This approach aligns with risk stratification methodologies commonly used in healthcare to categorize patients based on the severity and complexity of their conditions, which aids in tailoring healthcare management and resource allocation effectively.
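This stratification rule can be sketched directly; the indicator names below are illustrative stand-ins for the dataset’s binary condition columns:

```python
# Sketch of the risk stratification based on "number_of_issues".
def risk_level(number_of_issues: int) -> str:
    """Map the summed binary health indicators to a risk stratum."""
    if number_of_issues <= 2:
        return "low"
    if number_of_issues <= 5:
        return "medium"
    return "high"

# "number_of_issues" is the sum of binary condition flags for a patient;
# the specific flags shown here are illustrative examples.
indicators = {"pneum": 1, "asthma": 0, "irondef": 1, "depress": 1, "malnutrition": 0}
print(risk_level(sum(indicators.values())))  # 3 issues -> "medium"
```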

As discussed above, we used MI to select the top 30 features from the dataset, which were then utilized to train the XGBoost model. Following this, we analyzed the top 5 features in terms of their information gain within the XGBoost model for different patient risk groups. Information gain in the context of the XGBoost model refers to the reduction in entropy or uncertainty about the target variable (in this case, LOS) achieved by partitioning the data based on a particular feature. It quantifies the importance of a feature in making accurate predictions within the model (50). The analysis, illustrated in the Figure 4, demonstrates how these predictors vary across all patients, low-risk patients, medium-risk patients, and high-risk patients.
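A minimal sketch of extracting such a top-5 importance ranking per model. Here scikit-learn’s GradientBoostingRegressor (impurity-based importances) stands in for XGBoost, whose trained booster exposes the analogous gain ranking via get_score(importance_type="gain"); the synthetic data is illustrative only:

```python
# Sketch: ranking features by gain-style importance in a boosted-tree model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=8, n_informative=3,
                       random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# feature_importances_ is normalized to sum to 1; keep the top 5 as in Figure 4.
top5 = np.argsort(model.feature_importances_)[::-1][:5]
print(top5)
```

Fitting one such model per risk group (all, low, medium, high) and comparing the resulting top-5 lists yields the panel-by-panel comparison shown in Figure 4.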

Figure 4 Top 5 features in information gain. (A) Model of all patients; (B) model of low-risk patients; (C) model of medium-risk patients; (D) model of high-risk patients.

For the entire patient cohort (see Figure 4A), the top five features influencing LOS prediction are: “rcount0” (binary, 0 readmissions within the last 180 days), “number_of_issues”, “rcount1” (binary, 1 readmission within the last 180 days), “rcount2” (binary, 2 readmissions within the last 180 days), “hematocrit” (average hematocrit value during the encounter). This indicates that readmission history and overall health complexity are critical predictors of LOS for the general patient population. The finding of readmission history as a critical predictor of LOS is also consistent with previous research highlighting frequent readmissions as an indicator of underlying chronic conditions and the need for more intensive care management (51). Patients with high readmission rates may benefit from targeted interventions such as enhanced discharge planning and post-hospitalization follow-ups to reduce prolonged hospitalizations.

For low-risk patients (see Figure 4B), the top features remain similar to those for all patients, with “rcount0” being the most significant. Medium-risk patients (see Figure 4C) have a slightly different set of significant features, emphasizing the importance of blood parameters and readmission history. It is interesting to note that high-risk patients (see Figure 4D) show a broader range of critical clinical indicators. The top features include “rcount0”, “sodium”, “pulse”, “creatinine” and “bmi”, reflecting their complex health conditions.

Understanding which features significantly impact LOS for different patient groups enables more customized and tailored interventions. For instance, based on the above analysis, managing patients with a high number of readmissions differently could potentially improve hospitalization operation efficiency. Further, the ability to interpret the model’s predictions through feature importance helps build trust among clinicians. By seeing which factors are influencing the predictions, healthcare providers can adjust care plans accordingly, ensuring that the prediction results are not only accurate but also actionable. This interpretability is essential for integrating machine learning models into clinical practice, where transparency and reliability are paramount (52).


Discussion

Key findings and contributions

This study demonstrates the significant potential of advanced machine learning techniques, specifically the XGBoost model with MI-based feature selection, in accurately predicting hospital LOS. The XGBoost model outperformed other models—including CNN, GBM, and random forest—in both predictive accuracy and computational efficiency. By leveraging MI for feature selection, the model effectively identified the most relevant predictors of LOS, achieving high accuracy while reducing computational demands. The analysis revealed that readmission history, overall health complexity (“number_of_issues”), and hematocrit are critical factors influencing LOS for all the patients in this dataset. The identification of these key factors aligns with clinical practices, underscoring the necessity for proactive patient management. For example, targeted interventions for patients at high risk of readmission, such as personalized discharge planning and follow-up care, have been shown to reduce hospital stays (53).

The contributions of this study are threefold: firstly, it enhances the modeling of hospital LOS predictions through the application of advanced ensemble tree models, which have demonstrated superior performance in predicting LOS with significant improvements in accuracy and computational efficiency. Secondly, our analysis of the feature selection method, specifically the MI-based technique, demonstrates its efficacy in optimizing the feature set while reducing model complexity. Lastly, the analysis of feature importance across different patient health conditions offers invaluable insights into the impact of clinical and physiological variables on LOS, potentially paving the way for personalized patient care strategies.

Strengths and limitations

One of the primary strengths of this study is the use of advanced modeling techniques. The XGBoost model, combined with MI-based feature selection, represents a cutting-edge approach in predictive analytics capable of handling the complex, non-linear relationships inherent in healthcare data. The model achieved superior performance metrics, such as the lowest MAE and RMSE, indicating high accuracy in LOS predictions. Additionally, XGBoost demonstrated efficient training and testing times, making it practical for real-time applications in hospital settings. The analysis of feature importance enhances the model’s transparency, allowing clinicians to understand the factors influencing predictions, which is crucial for trust and adoption in clinical practice.

Despite these strengths, the study has several limitations. First, the dataset lacks a wider representation in terms of demographic and clinical variables. To enhance the model’s applicability in real clinical settings for predicting LOS, it is crucial to include additional variables, particularly some key clinical and sociodemographic factors such as age and race, during model training. Future datasets should also consider encompassing a broader range of hospital systems, reflecting variations across geographical locations and patient demographics to enhance the model’s generalizability. Developing consistent data collection and reporting frameworks across healthcare systems will play an important role in improving the reliability, comparability, and real-world applicability of hospital LOS prediction models. The absence of comprehensive demographic variables also limits the ability to fully characterize the patient population and may affect the external validity of the findings.

While ensemble tree models offer more interpretability than some other machine learning techniques, there remains a need for even more transparent models to ensure clinician trust and facilitate integration into clinical workflows. The models also do not account for real-time changes in hospital environments, such as sudden surges in patient admissions due to epidemics or emergencies, which may affect their adaptability and accuracy in dynamic scenarios. The use of static historical data may not reflect current trends or future shifts in healthcare practices, limiting the model’s long-term applicability without regular updates. The actual impact of implementing LOS predictions in real-world hospital settings requires validation through prospective studies. Future research should focus on pilot implementations to assess the practical effectiveness of LOS predictions in hospital workflows. These studies should evaluate staff adoption, integration with electronic health records (EHR), and the influence on decision-making processes. In addition, a more rigorous classification of patient risk can be achieved by incorporating additional variables related to patients’ admitted conditions, such as other malignancy, chronic disease, or heart failure, along with relevant domain knowledge.

While our study demonstrates the potential of machine learning-based LOS prediction models to optimize hospital resource allocation and patient care, it does not incorporate a formal cost-effectiveness analysis. One of the primary challenges in assessing the cost-effectiveness of predictive models is the complexity of hospital cost structures, which include fixed costs (e.g., infrastructure, staffing) and variable costs (e.g., medications, consumables, diagnostics). Future research should explore health economic models, such as cost-benefit or cost-utility analyses, to systematically evaluate the economic viability of LOS prediction models. Additionally, prospective studies should measure the real-world impact on hospital expenditures, including operational efficiencies, reduction in avoidable hospital stays, and overall healthcare costs.

Comparison with similar research

Previous studies have explored various machine learning approaches for predicting hospital LOS, including regression models, decision trees, neural networks, and support vector machines (9,13,19). These studies have shown that machine learning models can outperform traditional statistical methods in accuracy and adaptability. However, many prior studies focused on specific patient populations or utilized limited datasets, reducing their applicability across different healthcare facilities (1,12). Moreover, many previous studies tend to focus on LOS prediction accuracy and overlook other aspects of the LOS problem, such as computational efficiency and model interpretability (1,8).

This study extends the existing literature by employing a comprehensive dataset containing 100,000 patient admission records from different hospital departments and 26 features, allowing for a nuanced understanding of LOS across diverse patient conditions. The use of the XGBoost model with MI-based feature selection resulted in higher predictive accuracy and computational efficiency compared to other models. Additionally, the emphasis on feature importance and model interpretability addresses the gaps in earlier research.

Explanations of findings

The superior performance of the XGBoost model can be attributed to several factors. Firstly, XGBoost’s ability to manage non-linear interactions and complex patterns within the data makes it well-suited for healthcare applications where such complexities are common. Effective feature selection through MI efficiently identified the most relevant predictors of LOS, maintaining model accuracy while reducing computational load. Additionally, unlike some machine learning methods that require extensive preprocessing or imputation strategies, XGBoost’s robustness allows for more streamlined data handling, reducing the burden on users during input data preparation (44).

Moreover, understanding which features significantly impact LOS enables more targeted interventions (54). For example, in our study, implementing more detailed and tailored approaches for patients with a high frequency of readmissions could enhance the management of hospitalization and LOS. This targeted approach is also important for patients, who may require more intensive monitoring and personalized care plans based on their specific health metrics. These findings align with clinical knowledge, as factors like previous readmissions and the number of health issues are known to influence hospital stay durations (55,56). The model’s effectiveness in capturing these relationships demonstrates its potential utility in clinical settings. The ability to interpret the model’s predictions through feature importance helps build trust among clinicians, as they can adjust care plans accordingly, ensuring that the predictions are not only accurate but also actionable.

Implications and actions needed

The implications of this study are significant for hospital operations, healthcare professionals, and patient care. Accurate LOS predictions enable better bed management, reduced overcrowding, and optimized scheduling of surgeries and procedures, leading to improved operational efficiency. This enhancement extends to the scheduling of surgeries and procedures, resulting in more efficient use of hospital beds, operating rooms, and other critical facilities. For instance, a study examining the influence of prolonged hospitalizations found that 0.6% of admission episodes accounted for 11% of overall bed occupancy over a five-year period (57). This indicates that extended stays by a small fraction of patients can disproportionately affect bed availability, underscoring the importance of administrators’ strategies to accurately predict and reduce LOS to optimize bed utilization. Another study demonstrated that a relatively small decrease (half-day) in LOS for patients with community-acquired pneumonia could lead to substantial cost savings, estimated between $457 and $846 per episode, amounting to $500–$900 million annually (58). Such savings can be redirected to other critical areas within the hospital, thereby improving overall healthcare delivery.

For healthcare professionals, the automated predictions and alerts generated by the model streamline workflow processes, allowing providers to focus more on patient care rather than administrative duties, contributing to improved job satisfaction among healthcare staff. Optimizing staffing levels based on accurate LOS predictions can reduce the risk of overwork and burnout among healthcare professionals, maintaining a balanced and effective workforce. Furthermore, the ability to predict LOS with high accuracy enhances the timeliness and efficiency of patient care, enabling healthcare professionals to provide more responsive and effective treatment plans.

Patients also stand to benefit significantly from the deployment of this predictive model. The reduction in wait times and improved access to necessary treatments, facilitated by better resource allocation and scheduling, leads to enhanced patient satisfaction (59,60). Optimizing LOS predictions helps mitigate the risk of hospital-acquired infections, as patients are less likely to experience prolonged hospital stays unnecessarily. This efficiency in patient throughput contributes to a better overall patient experience, characterized by timely care and well-coordinated discharge planning, ultimately leading to better medical outcomes (61). Early identification of patients’ expected discharge dates facilitates better coordination with outpatient services, rehabilitation centers, and home care providers. This ensures a seamless transition from hospital to home or another care facility (62).

To realize these benefits, several actions are needed. Developing a web or cloud-based platform that integrates the trained XGBoost model for real-time LOS predictions can significantly enhance its accessibility and usability (63,64). Such a platform can be integrated with electronic health records (EHR) systems to automatically retrieve relevant patient data and provide immediate predictions. Incorporating real-time data streams into the predictive model can improve its adaptability to dynamic hospital environments, allowing it to adjust predictions based on current hospital capacity and patient influx. Establishing standardized practices for data collection and sharing will further enhance the potential for actionable and comprehensive findings in LOS prediction research, aligning with the recommendations highlighted in a recent study (1). Utilizing a more diverse dataset that includes records across different hospital systems and locations can improve the model’s generalizability, helping to capture a broader spectrum of patient conditions and healthcare practices. The time horizon for predictions and the availability of variables at that specific point are also critical factors to consider when translating models into real-world settings. Exploring and integrating more interpretable machine learning models, such as explainable AI (XAI) techniques, may further improve clinicians’ trust and acceptance of predictive analytics (65). This would involve developing tools that provide clearer and more understandable explanations for model predictions.
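As a hypothetical illustration, the core request handler such a platform might expose can be sketched without committing to a specific web framework; the feature names and the dummy model below are assumptions for illustration, not part of the study:

```python
# Sketch of a LOS-prediction request handler for an EHR-integrated service.
def predict_los(payload: dict, model, feature_order: list) -> dict:
    """Validate an EHR-derived payload and return a LOS prediction in days."""
    missing = [f for f in feature_order if f not in payload]
    if missing:
        return {"error": f"missing features: {missing}"}
    row = [[payload[f] for f in feature_order]]  # one patient, ordered features
    return {"predicted_los_days": float(model.predict(row)[0])}

class DummyModel:
    """Stand-in for the trained XGBoost regressor."""
    def predict(self, rows):
        return [3.2 for _ in rows]

features = ["rcount0", "number_of_issues", "hematocrit"]
print(predict_los({"rcount0": 1, "number_of_issues": 2, "hematocrit": 41.5},
                  DummyModel(), features))  # -> {'predicted_los_days': 3.2}
```

In a deployed system, this handler would sit behind a web endpoint, the dummy model would be the serialized XGBoost model, and the payload would be populated automatically from the EHR.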

By addressing these actions, the predictive models developed in this study can be refined and effectively integrated into healthcare systems. This integration would lead to optimized resource allocation, reducing costs and improving patient flow. Tailoring care strategies based on individual LOS predictions and risk profiles enhances patient outcomes, contributing to personalized patient care. Clinicians can utilize model insights to make more informed decisions regarding treatment plans and discharge timing, enhancing clinical decision-making.


Conclusions

Through rigorous evaluation, XGBoost emerged as the superior model, demonstrating significant improvements in accuracy, as evidenced by its lowest MAE, RMSE, RAE, and RSE compared to the other models (i.e., GBM, random forest, and CNN). Our analysis also highlighted the importance of feature selection in optimizing model performance. The identified top features also improve the model’s interpretability and practicality for clinicians, potentially facilitating more tailored treatment plans for patients. Furthermore, the streamlined computational requirements and robust performance of the model pave the way for integration into hospital workflows, promising improvements in resource allocation, operational efficiency, and overall patient care.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-24-129/rc

Peer Review File: Available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-24-129/prf

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-24-129/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Stone K, Zwiggelaar R, Jones P, et al. A systematic review of the prediction of hospital length of stay: Towards a unified framework. PLOS Digit Health 2022;1:e0000017. [Crossref] [PubMed]
  2. Humphreys P, Spratt B, Tariverdi M, et al. An Overview of Hospital Capacity Planning and Optimisation. Healthcare (Basel) 2022;10:826. [Crossref] [PubMed]
  3. Sonis JD, Aaronson EL, Lee RY, et al. Emergency Department Patient Experience: A Systematic Review of the Literature. J Patient Exp 2018;5:101-6. [Crossref] [PubMed]
  4. Åhlin P, Almström P, Wänström C. Solutions for improved hospital-wide patient flows - a qualitative interview study of leading healthcare providers. BMC Health Serv Res 2023;23:17. [Crossref] [PubMed]
  5. Ellis LA, Tran Y, Pomare C, et al. Hospital organizational change: The importance of teamwork culture, communication, and change readiness. Front Public Health 2023;11:1089252. [Crossref] [PubMed]
  6. Kelly LA, Gee PM, Butler RJ. Impact of nurse burnout on organizational and position turnover. Nurs Outlook 2021;69:96-102. [Crossref] [PubMed]
  7. Lu L, Ko YM, Chen HY, et al. Patient Safety and Staff Well-Being: Organizational Culture as a Resource. Int J Environ Res Public Health 2022;19:3722. [Crossref] [PubMed]
  8. Awad A, Bader-El-Den M, McNicholas J. Patient length of stay and mortality prediction: A survey. Health Serv Manage Res 2017;30:105-20. [Crossref] [PubMed]
  9. Lequertier V, Wang T, Fondrevelle J, et al. Hospital Length of Stay Prediction Methods: A Systematic Review. Med Care 2021;59:929-38. [Crossref] [PubMed]
  10. Kruk ME, Gage AD, Arsenault C, et al. High-quality health systems in the Sustainable Development Goals era: time for a revolution. Lancet Glob Health 2018;6:e1196-252. [Crossref] [PubMed]
  11. Mansoori A, Zeinalnezhad M, Nazarimanesh L. Optimization of Tree-Based Machine Learning Models to Predict the Length of Hospital Stay Using Genetic Algorithm. J Healthc Eng 2023;2023:9673395. [Crossref] [PubMed]
  12. Zeleke AJ, Palumbo P, Tubertini P, et al. Machine learning-based prediction of hospital prolonged length of stay admission at emergency department: a Gradient Boosting algorithm analysis. Front Artif Intell 2023;6:1179226. [Crossref] [PubMed]
  13. Jaotombo F, Pauly V, Fond G, et al. Machine-learning prediction for hospital length of stay using a French medico-administrative database. J Mark Access Health Policy 2022;11:2149318. [Crossref] [PubMed]
  14. Vázquez-Serrano JI, Peimbert-García RE, Cárdenas-Barrón LE. Discrete-Event Simulation Modeling in Healthcare: A Comprehensive Review. Int J Environ Res Public Health 2021;18:12262. [Crossref] [PubMed]
  15. Urach C, Zauner G, Wahlbeck K, et al. Statistical methods and modelling techniques for analysing hospital readmission of discharged psychiatric patients: a systematic literature review. BMC Psychiatry 2016;16:413. [Crossref] [PubMed]
  16. Wang Y, Liu L, Wang C. Trends in using deep learning algorithms in biomedical prediction systems. Front Neurosci 2023;17:1256351. [Crossref] [PubMed]
  17. Kolyshkina I, Simoff S. Interpretability of Machine Learning Solutions in Public Healthcare: The CRISP-ML Approach. Front Big Data 2021;4:660206. [Crossref] [PubMed]
  18. Nambiar A. S H, S S. Model-agnostic explainable artificial intelligence tools for severity prediction and symptom analysis on Indian COVID-19 data. Front Artif Intell 2023;6:1272506. [Crossref] [PubMed]
  19. Microsoft. Predicting Hospital Length of Stay [Internet]. 2017 [cited 2024 Apr 11]. Available online: https://github.com/Microsoft/r-server-hospital-length-of-stay
  20. Hyland SL, Faltys M, Hüser M, et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat Med 2020;26:364-73. [Crossref] [PubMed]
  21. Gupta R, Woo K, Yi JA. Epidemiology of end-stage kidney disease. Semin Vasc Surg 2021;34:71-8. [Crossref] [PubMed]
  22. Yeh JJ, Lin HC, Yang YC, et al. Asthma Therapies on Pulmonary Tuberculosis Pneumonia in Predominant Bronchiectasis-Asthma Combination. Front Pharmacol 2022;13:790031. [Crossref] [PubMed]
  23. Kumar A, Sharma E, Marley A, et al. Iron deficiency anaemia: pathophysiology, assessment, practical management. BMJ Open Gastroenterol 2022;9:e000759. [Crossref] [PubMed]
  24. Karrouri R, Hammani Z, Benjelloun R, et al. Major depressive disorder: Validated treatments and future challenges. World J Clin Cases 2021;9:9350-67. [Crossref] [PubMed]
  25. Ahmed S. Long-term health after Severe Acute Malnutrition in children and adults- the role of the Pancreas (SAMPA): Protocol. F1000Res 2022;11:777. [Crossref] [PubMed]
  26. Seo IH, Lee YJ. Usefulness of Complete Blood Count (CBC) to Assess Cardiovascular and Metabolic Diseases in Clinical Settings: A Comprehensive Literature Review. Biomedicines 2022;10:2697. [Crossref] [PubMed]
  27. Basile DP, Anderson MD, Sutton TA. Pathophysiology of acute kidney injury. Compr Physiol 2012;2:1303-53. [Crossref] [PubMed]
  28. Pojednic R, D'Arpino E, Halliday I, et al. The Benefits of Physical Activity for People with Obesity, Independent of Weight Loss: A Systematic Review. Int J Environ Res Public Health 2022;19:4981. [Crossref] [PubMed]
  29. Xin Z, Lv R, Liu W, et al. An ensemble learning-based feature selection algorithm for identification of biomarkers of renal cell carcinoma. PeerJ Comput Sci 2024;10:e1768. [Crossref] [PubMed]
  30. Sarker IH. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput Sci 2021;2:420. [Crossref] [PubMed]
  31. Shi L. The impact of primary care: a focused review. Scientifica (Cairo) 2012;2012:432892. [Crossref] [PubMed]
  32. Lu SC, Swisher CL, Chung C, et al. On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol 2023;13:1129380. [Crossref] [PubMed]
  33. Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot 2013;7:21. [Crossref] [PubMed]
  34. Lindner C. Automated Image Interpretation Using Statistical Shape Models. In: Zheng G, Li S, Székely G, editors. Statistical Shape and Deformation Analysis: Methods, Implementation and Applications. Academic Press, 2017;3-32.
  35. Xiao Y, Chen Y, Huang R, et al. Interpretable machine learning in predicting drug-induced liver injury among tuberculosis patients: model development and validation study. BMC Med Res Methodol 2024;24:92. [Crossref] [PubMed]
  36. Alzubaidi L, Zhang J, Humaidi AJ, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 2021;8:53. [Crossref] [PubMed]
  37. Li J. Assessing the accuracy of predictive models for numerical data: Not r nor r2, why not? Then what? PLoS One 2017;12:e0183250. [Crossref] [PubMed]
  38. Robeson SM, Willmott CJ. Decomposition of the mean absolute error (MAE) into systematic and unsystematic components. PLoS One 2023;18:e0279774. [Crossref] [PubMed]
  39. Canbek G. BenchMetrics Prob: benchmarking of probabilistic error/loss performance evaluation instruments for binary classification problems. Int J Mach Learn Cybern 2023; Epub ahead of print. [Crossref] [PubMed]
  40. Benish WA. Mutual information as an index of diagnostic test performance. Methods Inf Med 2003;42:260-4. [Crossref] [PubMed]
  41. Beraha M, Metelli AM, Papini M, et al. Feature Selection via Mutual Information: New Theoretical Insights. In: 2019 International Joint Conference on Neural Networks (IJCNN). IEEE; 2019. p. 1-9.
  42. Chen Y, Ma L, Yu D, et al. Comparison of feature selection methods for mapping soil organic matter in subtropical restored forests. Ecol Indic 2022;135:108545. [Crossref]
  43. Hossain T, Shamrat FMJM, Zhou X, et al. Development of a multi-fusion convolutional neural network (MF-CNN) for enhanced gastrointestinal disease diagnosis in endoscopy image analysis. PeerJ Comput Sci 2024;10:e1950. [Crossref] [PubMed]
  44. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM; 2016. p. 785-94.
  45. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44. [Crossref] [PubMed]
  46. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat 2001;29:1189-232. [Crossref]
  47. Breiman L. Random Forests. Mach Learn 2001;45:5-32. [Crossref]
  48. Liaw A, Wiener M. Classification and Regression by randomForest. R News 2002;2:18-22. Available online: https://api.semanticscholar.org/CorpusID:3093707
  49. Bohr A, Memarzadeh K. The rise of artificial intelligence in healthcare applications. In: Bohr A, Memarzadeh K, editors. Artificial Intelligence in Healthcare. Elsevier; 2020. p. 25-60.
  50. Singer G, Cohen I. An Objective-Based Entropy Approach for Interpretable Decision Tree Models in Support of Human Resource Management: The Case of Absenteeism at Work. Entropy (Basel) 2020;22:821. [Crossref] [PubMed]
  51. Brunner-La Rocca HP, Peden CJ, Soong J, et al. Reasons for readmission after hospital discharge in patients with chronic diseases-Information from an international dataset. PLoS One 2020;15:e0233457. [Crossref] [PubMed]
  52. Metta C, Beretta A, Pellungrini R, et al. Towards Transparent Healthcare: Advancing Local Explanation Methods in Explainable Artificial Intelligence. Bioengineering (Basel) 2024;11:369. [Crossref] [PubMed]
  53. Kripalani S, Theobald CN, Anctil B, et al. Reducing hospital readmission rates: current strategies and future directions. Annu Rev Med 2014;65:471-85. [Crossref] [PubMed]
  54. Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ 2023;23:689. [Crossref] [PubMed]
  55. Benbassat J, Taragin M. Hospital readmissions as a measure of quality of health care: advantages and limitations. Arch Intern Med 2000;160:1074-81. [Crossref] [PubMed]
  56. Shalchi Z, Saso S, Li HK, et al. Factors influencing hospital readmission rates after acute medical treatment. Clin Med (Lond) 2009;9:426-30. [Crossref] [PubMed]
  57. Quinn MP, Courtney AE, Fogarty DG, et al. Influence of prolonged hospitalization on overall bed occupancy: a five-year single-centre study. QJM 2007;100:561-6. [Crossref] [PubMed]
  58. Raut M, Schein J, Mody S, et al. Estimating the economic impact of a half-day reduction in length of hospital stay among patients with community-acquired pneumonia in the US. Curr Med Res Opin 2009;25:2151-7. [Crossref] [PubMed]
  59. Al Harbi S, Aljohani B, Elmasry L, et al. Streamlining patient flow and enhancing operational efficiency through case management implementation. BMJ Open Qual 2024;13:e002484. [Crossref] [PubMed]
  60. Robinson J, Porter M, Montalvo Y, et al. Losing the wait: improving patient cycle time in primary care. BMJ Open Qual 2020;9:e000910. [Crossref] [PubMed]
  61. Bhati D, Deogade MS, Kanyal D. Improving Patient Outcomes Through Effective Hospital Administration: A Comprehensive Review. Cureus 2023;15:e47731. [Crossref] [PubMed]
  62. Gonçalves-Bradley DC, Lannin NA, Clemson L, et al. Discharge planning from hospital. Cochrane Database Syst Rev 2022;2:CD000313. [PubMed]
  63. Mahyoub MA, Dougherty K, Yadav RR, et al. Development and validation of a machine learning model integrated with the clinical workflow for inpatient discharge date prediction. Front Digit Health 2024;6:1455446. [Crossref] [PubMed]
  64. Guan Z, Li H, Liu R, et al. Artificial intelligence in diabetes management: Advancements, opportunities, and challenges. Cell Rep Med 2023;4:101213. [Crossref] [PubMed]
  65. Ali S, Abuhmed T, El-Sappagh S, et al. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Information Fusion 2023;99:101805. [Crossref]
doi: 10.21037/jhmhp-24-129
Cite this article as: Peng B, Gao S. Prediction of hospital length of stay: leveraging ensemble tree models and intelligent feature selection. J Hosp Manag Health Policy 2025;9:13.
