ABSTRACT

BACKGROUND: Cancer remains one of the most prevalent and deadly diseases, accounting for more than 9 million deaths in 2018. This has motivated researchers to study machine learning-based solutions for cancer detection, with the aim of accelerating diagnosis and aiding prevention. Among several approaches, one is to automatically classify tumor samples through the analysis of their gene expression.

METHODS: In this work, we aim to distinguish five different types of cancer from RNA-Seq datasets: thyroid, skin, stomach, breast, and lung. To do so, we adopt a previously described methodology, with which we compare the performance of three different autoencoders (AEs) used as a deep neural network weight initialization technique. Our experiments consist of assessing two different approaches to training the classification model (fixing the weights after pre-training the AEs, or allowing fine-tuning of the entire network) and two different strategies for embedding the AEs into the classification network (importing only the encoding layers, or inserting the complete AE). We then study how the network's overall classification performance is affected by varying the number of layers in the first strategy, the dimension of the AEs' latent vector, and the imputation technique used in the data preprocessing step. Finally, to assess how well this pipeline generalizes, we apply the same methodology to two additional datasets containing features extracted from images of malaria thin blood smears and of breast mass cell nuclei. We also rule out overfitting by using held-out test sets for the image datasets.

RESULTS: The methodology attained good overall results for both the RNA-Seq and the image-extracted data. We outperformed the established baseline for all the considered datasets, achieving an average F1 score of 99.03, 89.95, and 98.84 and an MCC of 0.99, 0.84, and 0.98 for the RNA-Seq (when detecting thyroid cancer), Malaria, and Wisconsin Breast Cancer data, respectively.

CONCLUSIONS: We observed that fine-tuning the weights of the top layers imported from the AE achieved higher results across all the presented experiments and all the considered datasets. We outperformed all previously reported results when comparing against the established baselines.
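To make the weight-transfer setup concrete, the sketch below illustrates the kind of workflow the METHODS section describes: pre-training an autoencoder, importing its encoding layers into a classification network, and switching between the fixed-weights and fine-tuning regimes. This is a minimal sketch assuming a Keras/TensorFlow implementation; the layer sizes, latent dimension, optimizer, and five-class softmax head are illustrative placeholders, not the paper's exact configuration.

```python
# Sketch of AE-based weight initialization for a classifier.
# Assumes a Keras/TensorFlow workflow; sizes and hyperparameters are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

n_features = 2000   # e.g. number of gene-expression features (placeholder)
latent_dim = 128    # AE latent vector dimension (varied in the experiments)

# 1) Pre-train a plain autoencoder on the expression matrix (reconstruction loss).
inputs = keras.Input(shape=(n_features,))
encoded = layers.Dense(512, activation="relu")(inputs)
encoded = layers.Dense(latent_dim, activation="relu")(encoded)
decoded = layers.Dense(512, activation="relu")(encoded)
decoded = layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, epochs=50, batch_size=32)

# 2) Import only the encoding layers (one of the two embedding strategies)
#    and stack a softmax classification head on top.
def build_classifier(encoder, n_classes, fine_tune=True):
    # fine_tune=False reproduces the "fixed weights" setting;
    # fine_tune=True lets the imported layers be updated with the rest of the network.
    encoder.trainable = fine_tune
    clf_inputs = keras.Input(shape=(n_features,))
    z = encoder(clf_inputs)
    outputs = layers.Dense(n_classes, activation="softmax")(z)
    model = keras.Model(clf_inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

classifier = build_classifier(encoder, n_classes=5, fine_tune=True)
# classifier.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=30)
```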