Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists


Haenssle, H A; Fink, C; Schneiderbauer, R; Toberer, F; Buhl, T; Blum, A; Kalloo, A; Hassen, A Ben Hadj; Thomas, L; Enk, A; Uhlmann, L; Reader study level-I and level-II Groups (2018). Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology, 29(8):1836-1842.

Abstract

Background
Deep learning convolutional neural networks (CNN) may facilitate melanoma detection, but data comparing a CNN's diagnostic performance to larger groups of dermatologists are lacking.

Methods
Google's Inception v4 CNN architecture was trained and validated using dermoscopic images and corresponding diagnoses. In a comparative cross-sectional reader study a 100-image test-set was used (level-I: dermoscopy only; level-II: dermoscopy plus clinical information and images). Main outcome measures were sensitivity, specificity and area under the curve (AUC) of receiver operating characteristics (ROC) for diagnostic classification (dichotomous) of lesions by the CNN versus an international group of 58 dermatologists during level-I or -II of the reader study. Secondary end points included the dermatologists' diagnostic performance in their management decisions and differences in the diagnostic performance of dermatologists during level-I and -II of the reader study. Additionally, the CNN's performance was compared with the top-five algorithms of the 2016 International Symposium on Biomedical Imaging (ISBI) challenge.

Results
In level-I dermatologists achieved a mean (±standard deviation) sensitivity and specificity for lesion classification of 86.6% (±9.3%) and 71.3% (±11.2%), respectively. More clinical information (level-II) improved the sensitivity to 88.9% (±9.6%, P = 0.19) and specificity to 75.7% (±11.7%, P < 0.05). The CNN ROC curve revealed a higher specificity of 82.5% when compared with dermatologists in level-I (71.3%, P < 0.01) and level-II (75.7%, P < 0.01) at their sensitivities of 86.6% and 88.9%, respectively. The CNN ROC AUC was greater than the mean ROC area of dermatologists (0.86 versus 0.79, P < 0.01). The CNN scored results close to the top three algorithms of the ISBI 2016 challenge.
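
The outcome measures reported above (sensitivity, specificity, ROC AUC, and the specificity read off the ROC curve at a given sensitivity) can be illustrated with a minimal sketch. The abstract does not describe the study's actual statistical code, so the function names and the toy labels and scores below are purely hypothetical and are not study data; the sketch only shows how such quantities are commonly computed for a dichotomous classification.

```python
from typing import List, Tuple


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = melanoma, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return ROC points as (1 - specificity, sensitivity), one per distinct threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # A lesion is called positive when its score reaches the threshold.
        y_pred = [1 if s >= thr else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        points.append((1.0 - spec, sens))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def specificity_at_sensitivity(points: List[Tuple[float, float]], target: float) -> float:
    """Best specificity among operating points whose sensitivity is at least `target`."""
    return max(1.0 - fpr for fpr, tpr in points if tpr >= target)


# Hypothetical toy data (NOT from the study): 4 melanomas, 6 benign lesions.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

toy_roc = roc_points(toy_labels, toy_scores)
print("AUC:", auc(toy_roc))
print("Specificity at sensitivity >= 0.75:", specificity_at_sensitivity(toy_roc, 0.75))
```

Reading the specificity off the classifier's ROC curve at a reader's sensitivity, as in `specificity_at_sensitivity`, is one way to compare a score-producing model with readers who each supply only a single operating point.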

Conclusions
For the first time we compared a CNN's diagnostic performance with that of a large international group of 58 dermatologists, including 30 experts. Most dermatologists were outperformed by the CNN. Physicians, irrespective of their experience, may benefit from assistance by a CNN's image classification.

Clinical trial number
This study was registered at the German Clinical Trial Register (DRKS-Study-ID: DRKS00013570; https://www.drks.de/drks_web/).

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > University Hospital Zurich > Dermatology Clinic
Dewey Decimal Classification: 610 Medicine & health
Scopus Subject Areas: Health Sciences > Hematology; Health Sciences > Oncology
Language: English
Date: 1 August 2018
Deposited On: 04 Jan 2019 11:41
Last Modified: 29 Jul 2020 08:51
Publisher: Oxford University Press
ISSN: 0923-7534
OA Status: Closed
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/annonc/mdy166
PubMed ID: 29846502
