Background: The association between the glucose-to-lymphocyte ratio (GLR) and adverse outcomes in intensive care unit patients receiving mechanical ventilation (MV) has not been clearly established.
Aims: To examine the link between GLR and 28-day mortality in MV patients and to develop an interpretable machine learning model to predict mortality risk.
Study Design: A retrospective study.
Methods: Data were obtained from the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 3.1) database. Receiver operating characteristic (ROC) and restricted cubic spline (RCS) curves were employed to assess the relationship between GLR and mortality. Patients were categorized into high and low GLR groups for Kaplan-Meier survival analysis. Subgroup analyses were performed to evaluate the association across different patient populations. Selected variables were used to construct eXtreme Gradient Boosting (XGBoost), support vector machine, Naive Bayes, and k-nearest neighbors models. Model interpretability was assessed using SHapley Additive exPlanations (SHAP) values.
Results: A total of 5,738 patients met the inclusion criteria. RCS analysis indicated a nonlinear relationship between GLR and 28-day mortality. Patients with elevated GLR had significantly higher 28-day mortality rates (hazard ratio > 1, p < 0.05). Among the models, XGBoost demonstrated the best performance, achieving an area under the ROC curve of 0.969 and an F1-score of 0.963. SHAP analysis identified Acute Physiology Score III, GLR, and lactate as the three most important predictors.
Conclusion: GLR is nonlinearly associated with 28-day mortality in patients undergoing MV and may serve as a valuable prognostic marker. The interpretable XGBoost model confirmed the significant association between GLR and short-term mortality.
Mechanical ventilation (MV) is a widely used therapeutic approach in the intensive care unit (ICU) that is vital for supporting respiratory function and enhancing oxygenation in critically ill patients.1 Despite its benefits, MV carries several risks, including complications such as ventilator-associated pneumonia and airway injury. These complications are significantly associated with patient outcomes and may lead to increased mortality.2 Consequently, the early identification of high-risk patients receiving MV and the application of appropriate interventions are critical for improving survival.
In recent years, there has been growing interest in the relationship between blood-based biomarkers and clinical outcomes. Among these, the glucose-to-lymphocyte ratio (GLR) has gained attention as a simple and accessible biomarker. GLR reflects both the inflammatory state and immune function of the body, with elevated values potentially indicating systemic inflammation, immune suppression, or serious infections, which may negatively impact prognosis.3 Prior research has shown that GLR is linked to outcomes in several conditions, including acute respiratory distress syndrome,4 chronic obstructive pulmonary disease,5 and acute pancreatitis.6 However, comprehensive studies examining the prognostic value of GLR in patients undergoing MV, particularly in relation to mortality, are still lacking.
This study presents an innovative approach by integrating the GLR with machine learning (ML) techniques to analyze the dose-response relationship between GLR and 28-day in-hospital mortality and to develop an early-warning predictive model for stratifying mortality risk in critically ill patients receiving MV. ML algorithms are particularly effective at capturing complex nonlinear interactions,7 thereby supporting the development of interpretable decision-making tools tailored for the personalized management of MV patients.8
Data source
The patient data for this study were derived from the Medical Information Mart for Intensive Care IV, version 3.1 (MIMIC-IV v3.1), a comprehensive and publicly available database comprising clinical data from critically ill patients. This dataset includes detailed information on individuals admitted to the ICU at Beth Israel Deaconess Medical Center in Boston, United States of America. Data extraction complied with the ethical and privacy standards governing the MIMIC-IV database, with all personal identifiers removed to protect patient confidentiality. Access to the MIMIC-IV database was obtained with the appropriate certification (certification number, 57838517).
Inclusion criteria
Exclusion criteria
Data extraction and handling
We extracted patient-related information including vital signs, height, weight, laboratory test results, clinical scoring systems, renal replacement therapy (RRT) status, administration of epinephrine, and comorbid conditions such as hypertension and diabetes. All variables were obtained from the initial monitoring data recorded within the first 24 hours after ICU admission.
The GLR was calculated by dividing the glucose concentration (mg/dL) by the lymphocyte count (K/µL).6 Using the median GLR value6 as a cut-off, patients with a GLR > 83.333 were assigned to the high GLR group, while those with a GLR ≤ 83.333 were assigned to the low GLR group. The primary outcome of interest in this study was 28-day mortality.
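The ratio and the median-based grouping can be illustrated with a short sketch. The analyses in this study were run in R; the Python below is only an illustration, and the function names and the four glucose/lymphocyte pairs are hypothetical values chosen for the example, not study data.

```python
from statistics import median


def glucose_to_lymphocyte_ratio(glucose_mg_dl, lymphocytes_k_per_ul):
    """GLR = glucose concentration (mg/dL) / lymphocyte count (K/uL)."""
    if lymphocytes_k_per_ul <= 0:
        raise ValueError("lymphocyte count must be positive")
    return glucose_mg_dl / lymphocytes_k_per_ul


def split_by_median(glr_values):
    """Label each value 'high' if it exceeds the cohort median, else 'low'."""
    cutoff = median(glr_values)
    groups = ["high" if v > cutoff else "low" for v in glr_values]
    return cutoff, groups


# Hypothetical (glucose, lymphocyte) pairs, for illustration only.
patients = [(100, 2.0), (150, 1.0), (200, 0.5), (90, 1.5)]
glr = [glucose_to_lymphocyte_ratio(g, l) for g, l in patients]
cutoff, groups = split_by_median(glr)
```

In this toy cohort the cut-off is the median of the four ratios; in the study cohort the corresponding value was 83.333.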
Variables with more than 10% missing data were excluded from further analysis. For the remaining missing values, multiple imputation was performed using the “mice” package in R. Random forest methods (method = “rf”) were applied for imputing continuous variables. A total of 5 imputed datasets were created following 10 iterations, and convergence was assessed using diagnostic plots showing trends in means and variances. A Pareto chart illustrating the distribution of variables with missing data is provided in Supplementary Figure 1.
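The 10% missingness rule applied before imputation can be sketched as follows. This is a minimal illustration, assuming a column-oriented table in which missing entries are stored as None; the helper names and the toy table are hypothetical, and the random-forest imputation itself (performed with the R "mice" package) is not reproduced here.

```python
def missing_fraction(column):
    """Share of entries in a column that are missing (None)."""
    return sum(v is None for v in column) / len(column)


def drop_sparse_variables(table, max_missing=0.10):
    """Keep only variables whose missing fraction does not exceed max_missing."""
    return {
        name: column
        for name, column in table.items()
        if missing_fraction(column) <= max_missing
    }


# Hypothetical table with 10 patients per variable.
table = {
    "lactate": [1.2, None, 2.0, 1.1, 3.4, 0.9, 1.8, 2.2, 1.5, 1.0],     # 10% missing
    "albumin": [None, None, 3.1, 2.9, 3.5, 3.0, 2.8, 3.3, 3.2, 3.6],    # 20% missing
}
kept = drop_sparse_variables(table)
```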
Statistical analysis
All statistical analyses were conducted using R software version 4.4.1. Continuous variables were expressed as mean ± standard deviation (SD) or as medians with interquartile ranges, depending on data distribution. The Shapiro-Wilk test was applied to assess normality. For data with a normal distribution, the t-test was used; for non-normally distributed data, the Mann-Whitney U test was applied. Categorical variables were summarized as frequencies and percentages, and group comparisons were carried out using either the chi-squared test or Fisher’s exact test. A two-sided p < 0.05 was considered statistically significant.
GLR and mortality
To evaluate the predictive value of GLR for adverse outcomes, a receiver operating characteristic (ROC) curve was constructed. For comparison, an ROC curve was also generated to assess the predictive ability of the neutrophil-to-lymphocyte ratio (NLR) with respect to 28-day mortality. Additionally, Kaplan-Meier survival curves were used to compare all-cause mortality between patients with high and low GLR values. A restricted cubic spline (RCS) curve was plotted to examine the potential non-linear association between GLR and mortality. Subgroup analyses were performed using Cox proportional hazards regression, stratifying by age (< 65 and ≥ 65 years), gender, presence of hypertension or diabetes, use of RRT, and epinephrine administration. Hazard ratios (HRs) with 95% confidence intervals (CIs) were displayed using forest plots.
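The Kaplan-Meier comparison rests on the product-limit estimator, which can be sketched in a few lines. This is an illustrative Python version, assuming follow-up times in days and an event flag (1 = death, 0 = censored); the function name and the toy data are hypothetical and are not drawn from the study, which was analyzed in R.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each death time.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier(times=[5, 10, 10, 20, 28], events=[1, 1, 0, 1, 0])
```

Computing one such curve per GLR group and comparing them (for example with a log-rank test) corresponds to the high versus low GLR comparison described above.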
Feature selection
Continuous variables were standardized using z-score normalization, converting each to have a mean of 0 and SD of 1. Categorical variables were transformed through one-hot encoding to produce binary, mutually exclusive representations. Initially, univariate logistic regression (LR) analysis was performed to identify significant variables. These were then subjected to Least Absolute Shrinkage and Selection Operator (LASSO) regression to reduce model complexity and minimize overfitting. Multivariate LR analysis was subsequently applied for further refinement of the variable set.9 To check for multicollinearity among the retained predictors, variance inflation factors (VIF) were calculated, with a threshold of VIF < 5 to ensure acceptable levels.
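The two preprocessing steps can be illustrated with a minimal sketch. It assumes the population SD for the z-score (the source does not state whether the population or sample SD was used), and the function names and example values are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to mean 0 and SD 1 (population SD assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode categories as mutually exclusive binary indicator columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical inputs, for illustration only.
scaled = z_score([60.0, 70.0, 80.0])
categories, encoded = one_hot(["F", "M", "F"])
```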
Interpretable model development and validation
The dataset was randomly divided into training and testing subsets in a 7:3 ratio. Stratified sampling was used based on the target outcome to maintain consistent proportions of mortality events across both subsets. Predictive models for estimating the 28-day mortality risk in MV patients were developed and compared using four ML algorithms: eXtreme Gradient Boosting (XGBoost), support vector machines (SVM), Naive Bayes (NB), and k-nearest neighbors (KNN). Model tuning was performed through hyperparameter optimization and fivefold cross-validation.10
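A stratified 7:3 split can be sketched as below. This is an illustration only: the function name, the fixed random seed, and the toy outcome vector (20 deaths among 100 patients) are assumptions made for the example, not details reported by the study.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train/test sets, preserving the outcome proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of each outcome class separately.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Hypothetical outcome vector: 1 = died within 28 days, 0 = survived.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
```

Because each class is split separately, the event rate is the same (20%) in both subsets of this toy example.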
Model performance was evaluated using ROC curves and the area under the curve (AUC). Decision curve analysis (DCA) was employed to assess clinical applicability, while calibration curves were used to evaluate the accuracy of predicted absolute risk values.
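Decision curve analysis summarizes clinical utility as the net benefit at a given threshold probability, commonly defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold. A minimal sketch of that calculation follows; the function name and the four example predictions are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical outcomes and predicted risks, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.1]
nb_low = net_benefit(y_true, y_prob, threshold=0.2)
nb_mid = net_benefit(y_true, y_prob, threshold=0.5)
```

Plotting the net benefit over a range of thresholds, alongside the "treat all" and "treat none" strategies, yields the decision curve.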
To enhance interpretability, SHapley Additive exPlanations (SHAP) values, which are grounded in cooperative game theory, were used to quantify the individual contribution of each feature to the model’s predictions,11,12 thereby supporting model transparency and trustworthiness.
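The game-theoretic idea behind SHAP can be shown with an exact Shapley computation on a toy model. This is a simplified sketch under stated assumptions: features that are "absent" are replaced by a single baseline value, all feature orderings are enumerated (feasible only for a handful of features), and the toy prediction function is hypothetical. SHAP implementations for tree models such as XGBoost use more efficient algorithms and a background dataset rather than this brute-force approach.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over orderings."""
    n = len(instance)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            # Switch feature j from its baseline value to its actual value.
            current[j] = instance[j]
            value = predict(current)
            totals[j] += value - previous
            previous = value
    return [t / len(orders) for t in totals]


def toy_model(x):
    """Hypothetical risk score with one interaction term (x0 * x2)."""
    return 2 * x[0] + x[1] + x[0] * x[2]


phi = shapley_values(toy_model, instance=[1, 1, 1], baseline=[0, 0, 0])
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that SHAP summary and force plots rely on.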
Baseline characteristics
As illustrated in Figure 1, a total of 5,738 patients met the inclusion criteria. The baseline characteristics of the study cohort are summarized in Table 1. The median age was 66.7 years, and 67.3% of the patients were male. The group with a high GLR exhibited a significantly higher 28-day all-cause mortality rate compared to the low GLR group (22.2% vs. 5.06%, p < 0.001). This group also had elevated values for heart rate (HR), respiratory rate, systolic and diastolic blood pressure, MV duration, white blood cell (WBC) count, lactate concentration, red blood cell count, hemoglobin (Hb), red blood cell distribution width, blood urea nitrogen, creatinine, Simplified Acute Physiology Score II, and Acute Physiology Score III (APS III), along with a higher incidence of RRT. In contrast, the high GLR group had lower levels of pH, PaO2/FiO2 ratio, international normalized ratio, and prothrombin time.
Statistical results
The predictive value of GLR for 28-day mortality in MV patients was evaluated using the ROC curve (Figure 2a). The GLR yielded an AUC of 0.75 (95% CI, 0.73-0.76), compared with 0.73 (95% CI, 0.710-0.750) for the NLR, indicating slightly better discrimination for GLR in predicting 28-day mortality (Supplementary Figure 2). The optimal GLR threshold determined from the ROC analysis was 103.865, based on Youden’s index, achieving a sensitivity of 0.73 and a specificity of 0.66. Additionally, the Kaplan-Meier survival analysis (Figure 2b) demonstrated a statistically significant association between elevated GLR levels and reduced survival probability (p < 0.0001).
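The AUC and the Youden-based threshold can be computed as sketched below. The sketch assumes that a higher marker value indicates higher risk and that a patient is classified as positive when the marker is at or above the threshold; the function names and the eight marker values are hypothetical and unrelated to the study data.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        tn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best


# Hypothetical outcomes (1 = died) and marker values, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
scores = [40, 60, 70, 80, 120, 200, 100, 90]
area = auc(y_true, scores)
threshold, sensitivity, specificity = youden_threshold(y_true, scores)
```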
RCS analysis was performed both with and without covariate adjustment, as shown in Supplementary Figure 3. The RCS curve revealed a nonlinear relationship between the GLR and 28-day all-cause mortality. When GLR was below 83.48 [odds ratio (OR) < 1], the curve rose gradually with increasing GLR, indicating that higher GLR was associated with a higher risk of mortality. At approximately 83.48 (OR = 1), the risk of mortality reached the reference level. As GLR continued to increase, the curve ascended further, indicating a further increase in mortality risk. However, at very high GLR values, nearing the hundreds, the curve began to plateau and slightly decline, indicating that the impact of GLR on mortality risk may diminish at very high levels.
The subgroup analysis (Figure 3) revealed significant interactions of GLR with age (p for interaction = 0.033) and RRT (p for interaction < 0.001). For older adults (> 65 years, HR = 3.90, p < 0.001), targeted interventions are recommended. GLR remained strongly associated with mortality across different subgroups, including gender (age > 65, HR = 3.90, p < 0.001), hypertension status (HR > 1, p < 0.001, p for interaction = 0.194), and diabetes status (HR > 1, p < 0.001, p for interaction = 0.252).
Establishment and validation of the prediction model
Univariate LR analysis was conducted to remove nonsignificant variables, including magnesium, Glasgow Coma Scale scores, body mass index, epinephrine use, and diabetes status. Next, LASSO regression was applied for feature selection, which resulted in the identification of 22 variables (Supplementary Figure 4). These variables were then subjected to multivariate LR analysis (Supplementary Table 1), and after excluding those without statistical significance, 17 key variables were selected as the basis for model development. All chosen features had VIF values below 2, confirming that multicollinearity was controlled (Supplementary Table 2).
Four ML models (XGBoost, SVM, NB, and KNN) were trained on 70% of the dataset (n = 4,017). In the training set, XGBoost exhibited the best performance, with an AUC of 0.969 (95% CI, 0.963-0.975) and an F1-score of 0.963, followed by SVM, KNN, and NB (Table 2). Similar outcomes were observed in the testing set, where XGBoost remained the top performer, achieving an AUC of 0.899 (95% CI, 0.883-0.916) and an F1-score of 0.935 (Table 3).
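For reference, the F1-score is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The sketch below is illustrative: the function name and the five example labels are hypothetical, and because the value depends on which class is treated as positive (not stated in the text above), the positive class is exposed as a parameter.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == positive and p == positive)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y != positive and p == positive)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # Define F1 as 0 when the positive class is never present or predicted.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
f1_positive = f1_score(y_true, y_pred, positive=1)
f1_negative = f1_score(y_true, y_pred, positive=0)
```

The two values differ in this toy example, which is why the choice of positive class matters when the outcome is imbalanced.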
The model’s discrimination ability and clinical utility were further confirmed through ROC (Figures 4a, c) and DCA (Figures 4b, d). Given its strong performance in both sets, XGBoost was selected for subsequent interpretation. In addition, the calibration curve for the XGBoost model in the testing set is shown in Supplementary Figure 5.
Model explanation
To interpret the model, SHAP values were used to assess the contribution of each feature to the predictions at both the global and local levels. In the global analysis, the SHAP summary plot (Figure 5a) was created, where purple represents lower feature values and yellow represents higher feature values. This plot provides detailed insights. For instance, it shows that the APS III score is a significant feature, with its magnitude (from low to high) indicating whether it has a positive or negative effect on the model’s predictions. Moreover, the GLR impact region is relatively broad (with a wide horizontal spread), suggesting that it has a substantial influence on the model’s predictions.
Additionally, the variable importance ranking plot (Figure 5b) illustrates that features with higher mean absolute SHAP values have a stronger influence on the model’s predictions. The features are ranked in descending order of importance. APS III, GLR, and lactate exhibited the strongest predictive impacts, likely being central drivers of mortality risk, whereas partial thromboplastin time and Hb showed lesser contributions, potentially reflecting auxiliary indicators or weaker associations with the outcome.
In the local analysis, the SHAP force plot for a randomly selected patient (Figure 5c) confirmed these findings, quantifying the individual contributions of each feature to the prediction and identifying the key factors driving the model’s decision.
The findings of this study suggest a strong association between GLR and mortality in critically ill patients receiving MV. Regardless of gender or the presence of hypertension and diabetes, patients with higher GLR levels (GLR > 83.333) are more likely to experience negative outcomes, establishing GLR as an independent risk factor for 28-day mortality.
A growing body of evidence has highlighted the relationship between GLR and various disease outcomes. Cai et al.13 demonstrated that elevated GLR in ICU septic patients was linked to in-hospital mortality, with a nonlinear association. Additionally, the research by Liu and Hu14 proposed that GLR could serve as a potential prognostic indicator for acute myocardial infarction, independent of diabetic status. These studies reinforce the clinical relevance of GLR in guiding decision-making, although the mechanisms linking GLR to mortality in MV critically ill patients remain unclear, with several possible pathways to consider.
First, GLR is a biomarker that reflects the combination of glucose and lymphocyte levels. Patients undergoing MV typically have higher levels of inflammatory cytokines such as tumor necrosis factor-α (TNF-α), interleukin-6 (IL-6), and C-reactive protein,15 all of which are associated with inflammation and impaired insulin signaling.16,17 For instance, IL-6 promotes gluconeogenesis in the liver,18,19 while TNF-α hinders insulin release and suppresses the tyrosine phosphorylation of insulin receptor substrate 1, which reduces glucose uptake and increases systemic insulin resistance.20 Furthermore, critical illness triggers activation of the adrenal axis, which can lead to stress-induced hyperglycemia,21 as glucocorticoids decrease the expression of glucose transporter type 4, worsening hyperglycemia.22,23 Additionally, previous studies have shown that hyperglycemia itself stimulates the production and release of IL-6,24,25 thus creating a feedback loop of “inflammation-hyperglycemia.”
Second, insulin is essential for regulating the metabolism and function of immune cells by influencing their energy use.26,27 Inflammation-induced insulin resistance encourages the polarization of T lymphocytes toward pro-inflammatory subsets (Th1, Th17),28 while also impairing B-cell responsiveness by reducing glucose uptake and antibody production, leading to widespread immune dysfunction.29 This immunotolerant state, in combination with hyperglycemia, contributes to significantly higher 28-day mortality in patients with elevated GLR levels.
The results suggest that XGBoost, the best-performing interpretable ML model, underscores GLR as a key factor affecting mortality. Through the use of SHAP values to interpret the model, the top three factors influencing mortality were identified as APS III, GLR, and lactate. When GLR was excluded from the model, similar to findings in other studies, APS III was still found to have the most considerable impact on adverse outcomes. For example, a retrospective study by Fan and Ma30 comparing five scoring systems found that APS III had the highest predictive value for in-hospital mortality in patients with sepsis-related acute respiratory failure. Another study that developed a predictive model to assess the impact of noninvasive ventilation on intubation rates also identified both APS III and age as risk factors that increased the likelihood of intubation and mortality.31 Other studies have similarly shown that elevated WBC levels and lower PaO2/FiO2 ratios are linked to an increased risk of in-hospital mortality in various diseases.32-34
This study is the first to examine the association between GLR and adverse outcomes in MV patients using the most recent version of the MIMIC database. However, there are several limitations. First, as a retrospective study, there is a risk of information bias. Second, the study did not account for dynamic changes in GLR; incorporating them could improve predictive accuracy. Finally, because the MIMIC database is based on data from a single medical institution, the lack of external validation restricts the generalizability of the results. Additional multicenter, large-scale, and dynamic studies are required to validate these findings.
This retrospective analysis, based on the MIMIC-IV database, demonstrates a significant association between GLR and 28-day mortality in critically ill patients undergoing MV. By utilizing interpretable ML techniques, we developed a practical mortality risk prediction model, offering new tools and insights for improving the management and prognosis of MV patients.
Ethics Committee Approval: Not applicable.
Informed Consent: Not applicable.
Data Sharing Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.
Authorship Contributions: Concept- M.Z., D.W., J.P.; Design- J.H., J.P.; Supervision- D.W., J.H., J.P.; Funding- D.W., J.H., J.P.; Materials- J.P.; Data Collection or Processing- J.P.; Analysis and/or Interpretation- M.Z., J.P.; Literature Review- M.Z., D.W., J.P.; Writing- M.Z., D.W., J.H., J.P.; Critical Review- Y D.W., J.H., J.P.
Conflict of Interest: The authors declare that they have no conflict of interest.
Funding: This article was supported by the National Natural Science Foundation of China (Grant no.82270091) and the First batch of key Disciplines on Public Health in Chongqing (NO.2022-71).
Supplementary Tables 1, 2: https://balkanmedicaljournal.org/img/files/49-SUPPLEMENTARY-TABLE-1-2..pdf
Supplementary Figures 1-5: https://balkanmedicaljournal.org/img/files/49-SUPPLEMENTARY-F%C4%B0GURE-1-5..pdf