Dataset Information

Creating artificial human genomes using generative neural networks.

ABSTRACT: Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although they would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrate that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we show that imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.
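The abstract names allele frequencies and linkage disequilibrium among the summary statistics that artificial genomes should reproduce. As a rough illustration only, the following Python sketch compares those two statistics between a "real" and an "artificial" haplotype panel. Everything in it is an assumption made for the example: the function names, the 0/1 haplotype coding, and the toy data. In particular, the artificial panel is drawn independently from the same site frequencies as a stand-in for generator output; it is not a GAN or an RBM and does not reflect the authors' code.

```python
import random


def allele_frequencies(haplotypes):
    """Frequency of the allele coded 1 at each site (rows are haplotypes)."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    return [sum(h[j] for h in haplotypes) / n for j in range(n_sites)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ld_r2(haplotypes, i, j):
    """Linkage disequilibrium between sites i and j, measured as r^2."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ij - p_i * p_j  # covariance between the two sites
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0


def sample_panel(site_freqs, n_haplotypes, rng):
    """Toy sampler (hypothetical): draws each site independently."""
    return [[1 if rng.random() < p else 0 for p in site_freqs]
            for _ in range(n_haplotypes)]


rng = random.Random(0)
site_freqs = [rng.uniform(0.05, 0.95) for _ in range(50)]
real_panel = sample_panel(site_freqs, 200, rng)
artificial_panel = sample_panel(site_freqs, 200, rng)  # stand-in for AGs

freq_corr = pearson(allele_frequencies(real_panel),
                    allele_frequencies(artificial_panel))
print("allele frequency correlation:", round(freq_corr, 3))
print("r^2 between sites 0 and 1 (real):", round(ld_r2(real_panel, 0, 1), 4))
print("r^2 between sites 0 and 1 (artificial):",
      round(ld_r2(artificial_panel, 0, 1), 4))
```

A high allele frequency correlation and similar pairwise r^2 values between the two panels would suggest that the artificial panel preserves these statistics. Because the toy sampler treats sites as independent, r^2 is expected to be close to zero in both panels here, whereas real genomic data would show structured linkage disequilibrium.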

SUBMITTER: Yelmen B

PROVIDER: S-EPMC7861435 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

Creating artificial human genomes using generative neural networks.

Burak Yelmen, Aurélien Decelle, Linda Ongaro, Davide Marnetto, Corentin Tallec, Francesco Montinaro, Cyril Furtlehner, Luca Pagani, Flora Jay

PLoS Genetics, 2021-02-04, issue 2

PMID: 33539374

Similar Datasets

Project description:
Background: Craniosynostosis, a congenital condition characterized by the premature fusion of cranial sutures, necessitates objective methods for evaluating cranial morphology to enhance patient treatment. Current subjective assessments often lead to inconsistent outcomes. This study introduces a novel, quantitative approach to classify craniosynostosis and measure its severity.
Methods: An artificial neural network was trained to classify normocephalic, trigonocephalic, and scaphocephalic head shapes based on a publicly available dataset of synthetic 3D head models. Each 3D model was converted into a low-dimensional shape representation based on the distribution of normal vectors, which served as the input for the neural network, ensuring complete patient anonymity and invariance to geometric size and orientation. Explainable AI methods were utilized to highlight significant features when making predictions. Additionally, the Feature Prominence (FP) score was introduced, a novel metric that captures the prominence of distinct shape characteristics associated with a given class. Its relationship with clinical severity scores was examined using the Spearman Rank Correlation Coefficient.
Results: The final model achieved excellent test accuracy in classifying the different cranial shapes from their low-dimensional representation. Attention maps indicated that the network's attention was predominantly directed toward the parietal and temporal regions, as well as toward the region signifying vertex depression in scaphocephaly. In trigonocephaly, features around the temples were most pronounced. The FP score showed a strong positive monotonic relationship with clinical severity scores in both scaphocephalic (ρ = 0.83, p < 0.001) and trigonocephalic (ρ = 0.64, p < 0.001) models. Visual assessments further confirmed that as FP values rose, phenotypic severity became increasingly evident.
Conclusion: This study presents an innovative and accessible AI-based method for quantifying cranial shape that mitigates the need for adjustments due to age-specific size variations or differences in the spatial orientation of the 3D images, while ensuring complete patient privacy. The proposed FP score strongly correlates with clinical severity scores and has the potential to aid in clinical decision-making and facilitate multi-center collaborations. Future work will focus on validating the model with larger patient datasets and exploring the potential of the FP score for broader applications. The publicly available source code facilitates easy implementation, aiming to advance craniofacial care and research.
