Medical big data holds immense potential for improving healthcare quality and advancing medical research. However, the cross-center sharing of medical data, essential for constructing large and diverse datasets, raises concerns about privacy and the misuse of personal information. Several approaches have been developed to address this problem, each with limitations: de-identification is prone to re-identification attacks, and differential privacy often compromises data utility by introducing noise. In regions with strict data-sharing regulations, federated learning has been proposed as a potential solution, enabling collaborative model training without sharing raw data; however, it remains vulnerable to privacy leakage through model updates or the final model. Safe and efficient medical data sharing therefore remains an urgent problem.
To address these challenges, Professor Zhou's team developed CoLDiT, a conditional latent diffusion model with a diffusion transformer (DiT) backbone that generates high-resolution breast ultrasound images conditioned on BI-RADS categories (BI-RADS 3, 4a, 4b, 4c, and 5) (see Figure 1). The training set comprised 9,705 breast ultrasound images from 5,243 patients across 202 hospitals, acquired with ultrasound systems from multiple vendors to ensure data diversity and comprehensiveness.
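The conditioning mechanism can be illustrated with a minimal sketch: a class-embedding table maps the five BI-RADS categories to vectors that are injected into a transformer denoiser operating on latent tokens. The additive conditioning below is a simplification of DiT's adaLN modulation, and all module names, dimensions, and the toy noise schedule are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of BI-RADS-conditioned latent diffusion with a
# transformer denoiser. Names, sizes, and the additive conditioning
# (standing in for DiT's adaLN modulation) are illustrative assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 5  # BI-RADS 3, 4a, 4b, 4c, 5

class ConditionalDiT(nn.Module):
    def __init__(self, latent_dim=4, num_tokens=256, width=384, depth=6, heads=6):
        super().__init__()
        self.token_embed = nn.Linear(latent_dim, width)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, width))
        self.t_embed = nn.Sequential(nn.Linear(1, width), nn.SiLU(), nn.Linear(width, width))
        self.c_embed = nn.Embedding(NUM_CLASSES, width)  # BI-RADS condition
        block = nn.TransformerEncoderLayer(width, heads, 4 * width,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, depth)
        self.out = nn.Linear(width, latent_dim)  # predict noise per latent token

    def forward(self, z_t, t, birads):
        # z_t: noisy latent tokens (B, num_tokens, latent_dim); t, birads: (B,)
        h = self.token_embed(z_t) + self.pos_embed
        cond = self.t_embed(t[:, None].float()) + self.c_embed(birads)
        h = h + cond[:, None, :]  # broadcast conditioning over all tokens
        return self.out(self.blocks(h))

# One DDPM-style training step (epsilon prediction) on stand-in latents.
model = ConditionalDiT()
z0 = torch.randn(8, 256, 4)                   # stand-in for VAE-encoded images
birads = torch.randint(0, NUM_CLASSES, (8,))  # BI-RADS condition labels
t = torch.randint(0, 1000, (8,))
alpha_bar = torch.cos(t / 1000 * torch.pi / 2) ** 2  # toy cosine noise schedule
eps = torch.randn_like(z0)
z_t = alpha_bar.sqrt()[:, None, None] * z0 + (1 - alpha_bar).sqrt()[:, None, None] * eps
loss = ((model(z_t, t, birads) - eps) ** 2).mean()
loss.backward()
```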
To validate privacy protection during image generation, the team conducted a nearest-neighbor analysis, confirming that CoLDiT-generated images did not replicate any image in the training set and thus safeguarded patient privacy. For quality assessment, radiologists were invited to evaluate the realism and BI-RADS classification of CoLDiT-generated images. In the realism evaluation, only one senior radiologist achieved an AUC above 0.7; the other five achieved AUCs between 0.53 and 0.63, close to chance level, indicating that the synthetic images were difficult to distinguish from real ones (see Figure 2B). Figure 3 presents examples of real and CoLDiT-generated breast ultrasound images that were labeled incorrectly by at least four of the six readers. Furthermore, all three radiologists classified BI-RADS categories on synthetic images with performance comparable to that on real images, and two even surpassed their performance on real images (see Figure 2C).
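A memorization check of this kind can be sketched as follows: each synthetic image is matched to its closest training image in a feature space, and unusually high similarities are flagged. The feature extractor, embedding size, and threshold are illustrative assumptions; the paper's exact distance metric may differ.

```python
# Hedged sketch of a nearest-neighbor memorization check by cosine
# similarity in a feature space; extractor, embedding size, and
# threshold are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

def nearest_neighbor_check(synth_feats, train_feats, threshold=0.95):
    s = F.normalize(synth_feats, dim=1)   # unit-norm rows -> dot = cosine
    t = F.normalize(train_feats, dim=1)
    sims = s @ t.T                        # (n_synth, n_train) similarities
    best_sim, best_idx = sims.max(dim=1)  # closest training image per synthetic
    flagged = best_sim > threshold        # potential near-copies of training data
    return best_sim, best_idx, flagged

# Toy usage with random embeddings standing in for image features.
synth = torch.randn(100, 512)
train = torch.randn(10_000, 512)
best_sim, best_idx, flagged = nearest_neighbor_check(synth, train)
print(f"{int(flagged.sum())} synthetic images exceed the similarity threshold")
```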
Additionally, the study used the synthetic breast ultrasound images for data augmentation in a BI-RADS classification model. After half of the real images in the training set were replaced with synthetic ones, the model's performance remained comparable to that of a model trained exclusively on real data (P = 0.81) (see Figure 4).
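Under the stated 50/50 split, the mixing step might look like the sketch below, which keeps the training set size fixed while swapping half of the real images for synthetic ones. The helper function and dataset handling are illustrative assumptions rather than the study's actual pipeline.

```python
# Sketch of the augmentation experiment as described: replace half of
# the real training images with synthetic ones at a fixed set size,
# then retrain the same BI-RADS classifier on the mixed set.
import torch
from torch.utils.data import ConcatDataset, Subset

def mixed_training_set(real_ds, synthetic_ds, synth_fraction=0.5, seed=0):
    g = torch.Generator().manual_seed(seed)
    n_total = len(real_ds)
    n_synth = int(n_total * synth_fraction)
    # Keep a random (1 - synth_fraction) share of the real images...
    real_keep = torch.randperm(n_total, generator=g)[: n_total - n_synth].tolist()
    # ...and fill the remainder with randomly chosen synthetic images.
    synth_keep = torch.randperm(len(synthetic_ds), generator=g)[:n_synth].tolist()
    return ConcatDataset([Subset(real_ds, real_keep), Subset(synthetic_ds, synth_keep)])
```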
This study offers several advantages over prior work. First, the large, multicenter dataset drew on diverse data sources from 202 hospitals, encompassing different vendors and device grades; this allowed the model to capture the full range of variation inherent in real-world breast ultrasound images and to generate more realistic and precise synthetic images. Second, employing a pure transformer backbone instead of the traditional U-Net capitalizes on the transformer's ability to capture long-range dependencies, enabling the model to generate more coherent and detailed images. Third, conditioning image synthesis on BI-RADS labels allows ultrasound images to be generated for specific BI-RADS categories. This is particularly valuable in medical contexts, where generating images tailored to specific clinical scenarios is crucial for accurate diagnosis and treatment planning.
Professor Zhou's team believes that synthetic data, as a privacy-preserving solution, will play a key role in the secure utilization of medical big data, accelerating progress in medical research and clinical applications and ultimately improving the quality of medical services and patient health. In the future, the team plans to integrate generative artificial intelligence with additional types of medical imaging data to verify its applicability across different medical scenarios.