Researchers from Peking University have conducted a comprehensive systematic review on the integration of machine learning into statistical methods for disease risk prediction models, shedding light on the potential of such integrated models in clinical diagnosis and screening practices. The study, led by Professor Feng Sun from the Department of Epidemiology and Biostatistics, School of Public Health, Peking University, has been published in Health Data Science.
Disease risk prediction is crucial for early diagnosis and effective clinical decision-making. However, traditional statistical models, such as logistic regression and Cox proportional hazards regression, rest on underlying assumptions that do not always hold in practice. Machine learning methods, despite their flexibility and ability to handle complex and unstructured data, have not consistently outperformed traditional models. Integrating machine learning with traditional statistical methods may address both limitations and yield more robust and accurate prediction models.
The systematic review analyzed various integration strategies for classification and regression models, including majority voting, weighted voting, stacking, and model selection based on whether predictions from statistical methods and machine learning disagreed. The study found that integration models generally outperformed both statistical and machine learning methods when used alone. For example, stacking was particularly effective for models with more than 100 predictors, because it combines the strengths of different models while minimizing their weaknesses.
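To make two of these strategies concrete, the following is a minimal sketch of majority voting and weighted voting for a binary risk classification. It is an illustration only, not the review's implementation: the model names, predicted risks, weights, and the 0.5 threshold are all hypothetical assumptions.

```python
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return the binary class (0 or 1) chosen by most models; ties go to 1."""
    positives = sum(labels)
    return 1 if 2 * positives >= len(labels) else 0


def weighted_vote(
    probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> int:
    """Combine predicted risks as a weighted average, then apply a threshold."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(p * w for p, w in zip(probs, weights)) / total
    return 1 if combined >= threshold else 0


# Hypothetical predicted risks for one patient from three models
# (one statistical model and two machine learning models).
risks = {
    "logistic_regression": 0.62,
    "random_forest": 0.41,
    "gradient_boosting": 0.45,
}

# Majority voting: each model casts one vote after thresholding at 0.5.
labels = [1 if r >= 0.5 else 0 for r in risks.values()]
majority = majority_vote(labels)

# Weighted voting: assumed weights, e.g. derived from validation performance.
weights = [0.6, 0.2, 0.2]
weighted = weighted_vote(list(risks.values()), weights)

print("majority vote:", majority)
print("weighted vote:", weighted)
```

With these assumed numbers the two strategies disagree: two of the three models vote low risk, while the heavier weight on the logistic regression pushes the weighted average above the threshold. Stacking goes one step further by training a second-level model on the base models' predictions instead of fixing the weights by hand.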
"Our findings suggest that integrating machine learning into traditional statistical methods can provide more accurate and generalizable models for disease risk prediction," said Professor Feng Sun, the lead researcher. "This approach has the potential to enhance clinical decision-making and improve patient outcomes."
Looking ahead, the research team plans to further validate and improve existing integration methods and to develop comprehensive tools for evaluating these models in various clinical settings. The ultimate goal is to establish more efficient and generalizable integration models tailored to different scenarios, thereby advancing clinical diagnosis and screening practices.