Innumerable medical images are generated in routine diagnostics every day, but only a tiny fraction of them is accessible for training machine learning models. Privacy concerns are often cited as a hurdle to sharing medical data. Nevertheless, robust diagnostic AI has to be trained on large and diverse data sets.
Synthetic data is designed to mimic the properties of real data without jeopardizing patient privacy. In our new paper, we generated synthetic microscopy images of bone marrow smears from patients with acute leukemias.
Our synthetic images were indistinguishable from real images even for the trained eyes of eight hematologists.
We trained image-based leukemia detection models with varying proportions of real and synthetic data and found that the two can be used interchangeably, yielding an area under the curve (AUC) above 0.95 even with minimal real data or fully synthetic training sets.
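To make the experimental idea concrete, here is a minimal Python sketch of the two ingredients described above: assembling a training set with a chosen proportion of synthetic samples, and scoring a detector by its AUC. This is not the code from the paper. The function names `mix_training_set` and `roc_auc`, the tuple-based sample format, and the toy numbers are assumptions made purely for illustration.

```python
import random


def mix_training_set(real, synthetic, synthetic_fraction, size, seed=0):
    """Draw `size` samples without replacement, roughly `synthetic_fraction` of them synthetic.

    Hypothetical helper for illustration; the paper's actual sampling
    procedure may differ.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_synthetic = round(size * synthetic_fraction)
    n_real = size - n_synthetic
    mixed = rng.sample(synthetic, n_synthetic) + rng.sample(real, n_real)
    rng.shuffle(mixed)
    return mixed


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy stand-ins for image records: (source, index) tuples.
real_images = [("real", i) for i in range(100)]
synthetic_images = [("synthetic", i) for i in range(100)]

# A training set of 40 samples, one quarter of them synthetic.
train = mix_training_set(real_images, synthetic_images, 0.25, 40)

# AUC for four made-up predictions (label 1 = leukemia).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Sweeping `synthetic_fraction` from 0.0 to 1.0, training a model on each mix, and comparing the resulting AUC values on a held-out set of real images is one way to probe whether the two data sources are interchangeable.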
Our findings support the use of synthetic data to overcome data sharing hurdles in medicine and make the wealth of medical data accessible for machine learning without compromising patient privacy.
Check out our new paper in Nature Portfolio’s npj Digital Medicine:
https://www.nature.com/articles/s41746-025-01563-9
Dresden, Germany
Synthetic Microscopy Image Data Augments Leukemia Detection Models
Contact Us
contact@ai-in-cancer.org