Supervised deep learning embeddings for the prediction of cervical cancer diagnosis

PeerJ Comput Sci. 2018 May 14;4:e154. doi: 10.7717/peerj-cs.154. eCollection 2018.

Abstract

Cervical cancer remains a significant cause of mortality worldwide, even though it can be prevented and cured by removing affected tissue at an early stage. Providing universal and efficient access to cervical screening programs is a challenge that requires, among other steps, identifying vulnerable individuals in the population. In this work, we present a computationally automated strategy for predicting the outcome of a patient's biopsy from risk patterns in individual medical records. We propose a machine learning technique that jointly optimizes dimensionality reduction and classification models in a fully supervised manner. We also build a model that highlights relevant properties in the low-dimensional space to ease the classification of patients. We instantiated the proposed approach with deep learning architectures and achieved accurate prediction results (top area under the curve, AUC = 0.6875) that outperform previously developed methods, such as denoising autoencoders. Additionally, we explored some clinical findings from the embedding spaces and validated them against the medical literature, making them reliable for physicians and biomedical researchers.
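The joint optimization mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-layer encoder, the tanh activation, the weighting factor `alpha`, and all function and parameter names are assumptions made for illustration, since the abstract does not specify the architecture or loss. The idea shown is only that one embedding feeds both a reconstruction term and a classification term, so a single scalar objective covers both.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_features, n_latent, seed=0):
    """Small random weights for a one-layer encoder, decoder, and classifier.

    The layer sizes and initialization scale are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, (n_features, n_latent)),
        "b_enc": np.zeros(n_latent),
        "W_dec": rng.normal(0.0, 0.1, (n_latent, n_features)),
        "b_dec": np.zeros(n_features),
        "w_clf": rng.normal(0.0, 0.1, n_latent),
        "b_clf": 0.0,
    }


def joint_loss(params, x, y, alpha=0.5):
    """Weighted sum of reconstruction error and classification error.

    alpha is a hypothetical trade-off weight: alpha=1 gives a plain
    autoencoder objective, alpha=0 gives a plain classifier objective.
    """
    # Low-dimensional embedding shared by both tasks.
    z = np.tanh(x @ params["W_enc"] + params["b_enc"])
    # Reconstruction of the input from the embedding.
    x_hat = z @ params["W_dec"] + params["b_dec"]
    # Predicted probability of a positive outcome from the embedding.
    p = sigmoid(z @ params["w_clf"] + params["b_clf"])

    recon = np.mean((x - x_hat) ** 2)
    eps = 1e-12  # avoids log(0)
    bce = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    return alpha * recon + (1.0 - alpha) * bce, recon, bce


# Synthetic data for demonstration only; not the dataset used in the paper.
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 5))
y = (x[:, 0] > 0).astype(float)

params = init_params(n_features=5, n_latent=2)
total, recon, bce = joint_loss(params, x, y, alpha=0.5)
print(f"total={total:.4f} reconstruction={recon:.4f} classification={bce:.4f}")
```

In a full training loop, the gradient of `total` with respect to every parameter would be followed at once, so the embedding is shaped by the labels as well as by the reconstruction error.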

Keywords: Artificial neural networks; Autoencoder; Binary classification; Biomedical informatics; Cervical cancer; Deep learning; Denoising autoencoder; Dimensionality reduction; Health informatics; Health-care informatics.

Grants and funding

This work was funded by the Project “NanoSTIMA: Macro-to-Nano Human Sensing: Towards Integrated Multimodal Health Monitoring and Analytics/NORTE-01-0145-FEDER-000016” financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF), and also by Fundação para a Ciência e a Tecnologia (FCT) within the PhD grant number SFRH/BD/93012/2013. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.