Detecting cancer during its early stages, that is, before it spreads to other parts of the body, almost always leads to better treatment outcomes and lower mortality rates. However, for people without good access to healthcare, such as those with limited resources or who live in rural areas, timely diagnosis is rare. This is in large part due to the lack of simple, quick, and cost-effective diagnostic techniques for many types of cancer.
One promising approach to detecting malignancies early is using diffuse reflectance spectroscopy (DRS) for "optical biopsy" of suspicious tissue. In general, DRS-based measurements can be performed quickly using relatively inexpensive equipment. The idea is to analyze the target tissue's response to light across a range of wavelengths and estimate key optical parameters. These include the absorption coefficient (μa) and the reduced scattering coefficient (μ′s), both of which tend to differ between tumors and healthy tissue.
Today, inverse Monte Carlo (MCI) simulations are considered the gold standard for analyzing DRS data and estimating the optical properties of tissue. Such numerical approaches are, unfortunately, computationally intensive. Machine learning (ML)-based methods are a competitive alternative, but their main drawback is that they require a lot of training data, so simulated datasets are often used to simplify data collection. However, simulated datasets do not accurately reflect all types of errors caused by improper use of medical instruments (or "use-errors"), and thus ML-based DRS analysis techniques have low accuracy when applied to real measurements.
To address these issues, a research team led by Associate Professor Bing Yu of Marquette University and Medical College of Wisconsin, USA, developed a more robust ML model to analyze DRS data and predict μa and μ′s. Their work, published in the SPIE Journal of Biomedical Optics (JBO), could ease the way for more accessible tools for cancer diagnosis in resource-constrained settings.
The proposed model is a "wavelength-independent regressor" (WIR), which uses a novel set of features from the DRS data to achieve higher accuracies in the face of use-errors. To train this model, the researchers developed a comprehensive dataset comprising both simulated data and experimental measurements taken from 170 tissue phantoms.
More specifically, the simulated dataset was subdivided into seven smaller datasets. The first included "perfect data," without any artifacts caused by noise or use-error. In contrast, the second dataset included Gaussian noise, while the third and fourth represented effects of wavelength miscalibration, and the fifth and sixth contained intensity fluctuations similar to those caused by improper thermal management (or overheating). Finally, the seventh dataset contained all the errors at the same time. "This study includes the largest and most diverse experimental DRS dataset we are aware of, which has been used to train and validate a model for DRS optical property prediction," highlights Yu.
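As a rough illustration of how the seven simulated subsets described above could be assembled, the following Python sketch perturbs a single toy spectrum with each type of use-error. It is not the authors' code. The function names, the noise level, the wavelength shifts of ±2 nm, the intensity factors of 0.95 and 1.05, and the toy spectrum itself are all hypothetical. It also assumes that the paired datasets (third/fourth and fifth/sixth) differ in the direction of the error, which the article does not specify.

```python
import numpy as np

def add_gaussian_noise(spectrum, rel_sigma, rng):
    """Add zero-mean Gaussian noise whose std is a fraction of the peak reflectance."""
    sigma = rel_sigma * np.max(spectrum)
    return spectrum + rng.normal(0.0, sigma, size=spectrum.shape)

def miscalibrate_wavelength(wavelengths, spectrum, shift_nm):
    """Simulate a wavelength axis that is off by shift_nm.

    The value reported at wavelength w is the true reflectance at w + shift_nm.
    np.interp clamps to the edge values outside the sampled range.
    """
    return np.interp(wavelengths + shift_nm, wavelengths, spectrum)

def scale_intensity(spectrum, factor):
    """Simulate a source-intensity fluctuation as a uniform scale factor."""
    return factor * spectrum

def build_variants(wavelengths, spectrum, rng):
    """Return seven versions of one spectrum (all parameter values are hypothetical)."""
    variants = {
        "perfect": spectrum.copy(),
        "noise": add_gaussian_noise(spectrum, 0.01, rng),
        "shift_neg": miscalibrate_wavelength(wavelengths, spectrum, -2.0),
        "shift_pos": miscalibrate_wavelength(wavelengths, spectrum, 2.0),
        "intensity_low": scale_intensity(spectrum, 0.95),
        "intensity_high": scale_intensity(spectrum, 1.05),
    }
    # Seventh variant: compound all three error types on the same spectrum.
    combined = miscalibrate_wavelength(wavelengths, spectrum, 2.0)
    combined = scale_intensity(combined, 1.05)
    variants["all_errors"] = add_gaussian_noise(combined, 0.01, rng)
    return variants

# Toy reflectance spectrum, not a physical tissue model.
wavelengths = np.linspace(450.0, 650.0, 101)
spectrum = 0.5 + 0.1 * np.sin(wavelengths / 30.0)
rng = np.random.default_rng(0)
variants = build_variants(wavelengths, spectrum, rng)
```

In a real pipeline, each perturbation would be applied to every simulated spectrum, and the resulting subsets would be used for training and for testing robustness.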
Using this comprehensive dataset, the researchers validated the proposed WIR model and compared it to the MCI approach. The WIR model came out well ahead on both simulated and experimental data. "When compounding all use-errors on simulated data, the WIR model balanced accuracy and speed best, yielding errors of only 1.75 percent for μa and 1.53 percent for μ′s, compared to the MCI's 50.9 percent for μa and 24.6 percent for μ′s. Regarding experimental data, the WIR model had mean errors of 13.2 percent and 6.1 percent for μa and μ′s, respectively, and the errors for MCI were about eight times higher," adds Yu. "Thus, the WIR model offers reliable optical property predictions from DRS data that are robust to use-errors."
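To put the quoted figures in perspective, the short Python sketch below computes a percent error and the ratio between the MCI and WIR errors on simulated data. The article does not say exactly how the errors were calculated. The mean absolute percentage error used here is an assumption, and the function and variable names are illustrative only.

```python
import numpy as np

def mean_percent_error(predicted, true):
    """Mean absolute percentage error between predicted and true values (assumed metric)."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(predicted - true) / np.abs(true)) * 100.0)

# Errors (in percent) quoted in the article for simulated data with all use-errors.
reported = {
    "mu_a": {"WIR": 1.75, "MCI": 50.9},
    "mu_s_prime": {"WIR": 1.53, "MCI": 24.6},
}

# How many times larger the MCI error is than the WIR error for each parameter.
ratios = {name: vals["MCI"] / vals["WIR"] for name, vals in reported.items()}

for name, ratio in ratios.items():
    print(f"{name}: MCI error is about {ratio:.1f} times the WIR error")
```

With the quoted numbers, the MCI error on simulated data comes out at roughly 29 times the WIR error for μa and roughly 16 times for μ′s.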
The proposed approach is less computationally intensive than other ML-based techniques and MCI models, making it an attractive option for clinical settings. Moreover, by taking use-errors into account, the WIR model can partially offset a lack of extensive training on medical equipment, which is commonplace among clinicians in resource-constrained regions.
Hopefully, the method developed in this study will help with early diagnosis of cancer and other diseases, saving lives and reducing healthcare costs.
For details, read the original Gold Open Access article by Scarbrough, Chen, and Yu, "Designing a use-error robust machine learning model for quantitative analysis of diffuse reflectance spectra," J. Biomed. Opt. 29(1), 015001 (2024), doi 10.1117/1.JBO.29.1.015001.