October 30, 2020 Conference


Deep transfer learning classification algorithms built on fundus photos generalize with variable accuracy across devices

Dr. John Miller, Mass Eye and Ear Infirmary (Presenter)
Kun-Hsing Yu, Harvard Medical School
Ashley Kras
Purpose: As deep learning applications in ophthalmic imaging grow in clinical relevance and deployment, little is known about how well algorithm performance generalizes across different platforms. This study examined whether the accuracy of a fundus photo classifier trained on images from one device can be replicated on a dataset captured with a different device.

Method: 25,000 high-quality fundus photos were manually selected from the UK Biobank (UKBB) (Topcon 3D OCT-1000, field angle 45°). A simple deep transfer learning model based on the VGG architecture was built to classify images as right vs. left eyes. This unmodified algorithm was then validated on two smaller samples (n=430) of fundus photos (Optos® California, field angle 200°) from Mass. Eye and Ear Infirmary (MEEI); the first sample was cropped to the posterior pole (MEEI-a) to approximate the region captured by the UKBB sample, and the second sample (the same images) was cropped to the circular fundus edge (MEEI-b). The same process was then repeated in reverse: a model constructed on MEEI images was deployed on UKBB images.

Results: The UKBB laterality classification model (LCM) achieved AUROC 0.997. When evaluated on datasets MEEI-a and MEEI-b, the resulting AUROCs were 0.944 and 0.778, respectively. The LCM subsequently built on MEEI-a achieved AUROC 0.991. When evaluated on the MEEI-b and UKBB datasets, performance dropped to AUROCs of 0.545 and 0.713, respectively.

Conclusion: Simple and accurate algorithms generalize variably across devices and scanning protocols. We expect to see similar limitations in other forms of multimodal imaging, including OCT, AF, and OCT-A. This finding highlights the importance of validation studies prior to clinical deployment.
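All results above are reported as AUROC, which equals the probability that a randomly chosen positive example (say, a right eye) receives a higher classifier score than a randomly chosen negative one (a left eye). The abstract does not include code; as a minimal illustrative sketch (not the authors' implementation), AUROC can be computed directly from that pairwise-ranking definition:

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: iterable of 0/1 ground-truth classes (e.g. left=0, right=1).
    scores: iterable of classifier scores, higher = more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sp in positives:          # compare every positive score...
        for sn in negatives:      # ...against every negative score
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUROC of 0.997, as reported for the UKBB model on its own data, means a correct ranking in essentially every positive/negative pair, while 0.545 (MEEI-a model on MEEI-b) is barely better than the 0.5 of a random classifier.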
Presentation Video