Comparative Analysis of Machine Learning Models for LDL Cholesterol Estimation
DOI:
https://doi.org/10.24002/konstelasi.v5i2.13305
Keywords:
Lipids, LDL cholesterol, Machine learning, Triglyceride
Abstract
Accurate estimation of low-density lipoprotein cholesterol (LDL-C) is essential for cardiovascular risk assessment and treatment decisions. Traditional formula-based estimates, such as the Friedewald, Sampson, and Martin equations, lose accuracy as triglyceride (TG) levels rise. This study compares nine machine learning (ML) models against these conventional formulas using a large dataset of 120,174 subjects. After data preprocessing and feature selection, four predictors (total cholesterol (TC), TG, high-density lipoprotein cholesterol (HDL-C), and age) were used to train the ML models with 5-fold cross-validation. Among all models, Light Gradient Boosting Machine (LightGBM) performed best, achieving R² = 0.8749, mean squared error (MSE) = 204.53 mg²/dL², and Pearson correlation coefficient (PCC) = 0.935 on the internal test set. LightGBM was similarly superior in the external validation cohort (n = 10,183), particularly in the hypertriglyceridemic range (TG ≥ 200 mg/dL), where the classical equations degraded substantially. Machine learning models, especially ensemble-based approaches, maintain robust predictive ability across TG strata and significantly reduce error around the clinically relevant LDL-C thresholds of 70, 100, and 130 mg/dL. These findings support integrating ML-assisted LDL-C estimation into routine laboratory workflows and highlight its potential contribution to clinical decision support.
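For orientation, two of the conventional formulas named in the abstract can be sketched as follows. This is a minimal illustration based on the published Friedewald and Sampson equations, not the study's own code. The function names are hypothetical, and all inputs are assumed to be in mg/dL. The Martin equation is omitted because it depends on a lookup table of adjustable factors that is not reproduced here.

```python
def friedewald_ldl(tc: float, hdl: float, tg: float) -> float:
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5.

    Conventionally not applied when TG >= 400 mg/dL.
    """
    return tc - hdl - tg / 5.0


def sampson_ldl(tc: float, hdl: float, tg: float) -> float:
    """Sampson (NIH) estimate of LDL-C in mg/dL.

    Coefficients are those of the published equation; inputs in mg/dL.
    """
    non_hdl = tc - hdl
    return (
        tc / 0.948
        - hdl / 0.971
        - (tg / 8.56 + (tg * non_hdl) / 2140.0 - tg ** 2 / 16100.0)
        - 9.44
    )


if __name__ == "__main__":
    # Illustrative values only, not taken from the study's dataset.
    tc, hdl, tg = 200.0, 50.0, 150.0
    print(f"Friedewald: {friedewald_ldl(tc, hdl, tg):.1f} mg/dL")  # 120.0
    print(f"Sampson:    {sampson_ldl(tc, hdl, tg):.1f} mg/dL")
```

Both formulas subtract a TG-derived term as a proxy for very-low-density lipoprotein cholesterol, which is why their error grows as TG rises. The ML models in the study instead learn the mapping from TC, TG, HDL-C, and age directly.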








