Dataset Information

A Deep Learning Approach to Population Structure Inference in Inbred Lines of Maize.

ABSTRACT: Analysis of population genetic variation and structure is a common practice in genome-wide studies, including association mapping, ecology, and evolution studies in several crop species. In this study, two machine learning (ML) clustering methods, K-means (KM) and hierarchical clustering (HC), were combined with a non-linear and a linear dimensionality reduction technique, deep autoencoder (DeepAE) and principal component analysis (PCA), to infer population structure and assign individuals among maize inbred lines, i.e., dent field corn (n = 97) and popcorn (n = 86). HC combined with DeepAE-based data preprocessing (DeepAE-HC) was the most effective method for assigning individuals to clusters (96% correct assignments), whereas DeepAE-KM, PCA-HC, and PCA-KM correctly assigned 92, 89, and 81% of the lines, respectively. These findings were consistent with both the Silhouette Coefficient (SC) and the Davies-Bouldin validation index. Notably, DeepAE-HC was also more accurate than the Bayesian clustering method implemented in InStruct. These results show that deep learning (DL)-based dimensionality reduction combined with ML clustering is a useful tool for identifying genetically differentiated groups and assigning individuals to subpopulations in genome-wide studies without relying on prior genetic assumptions.
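
The clustering and validation steps named in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: it assumes each inbred line has already been reduced to a low-dimensional point (the PCA or DeepAE step itself is not shown), uses naive average-linkage agglomerative clustering as a stand-in for HC (the linkage used in the study is not stated here), and scores the partition with the Silhouette Coefficient, the Davies-Bouldin index, and the share of correct assignments under the best cluster-to-group matching. The function names and toy coordinates are hypothetical.

```python
import math
from itertools import permutations


def average_linkage(points, k):
    """Naive agglomerative clustering with average linkage (an assumed linkage)."""
    clusters = [[i] for i in range(len(points))]

    def mean_dist(a, b):
        # Average Euclidean distance between all members of clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        x, y = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: mean_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] += clusters[y]
        del clusters[y]  # y > x, so index x stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs >= 2 clusters."""
    groups = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, points[j]) for j in range(len(points))
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, points[j]) for j in range(len(points)) if labels[j] == g)
            / labels.count(g)
            for g in groups if g != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better."""
    groups = sorted(set(labels))
    centroids, scatter = {}, {}
    for g in groups:
        members = [p for p, lab in zip(points, labels) if lab == g]
        centroids[g] = tuple(sum(col) / len(members) for col in zip(*members))
        scatter[g] = sum(math.dist(m, centroids[g]) for m in members) / len(members)
    worst = [
        max((scatter[g] + scatter[h]) / math.dist(centroids[g], centroids[h])
            for h in groups if h != g)
        for g in groups
    ]
    return sum(worst) / len(groups)


def assignment_accuracy(true_groups, labels):
    """Share of correct assignments under the best cluster-to-group mapping."""
    groups = sorted(set(true_groups))
    clusters = sorted(set(labels))
    best = 0
    for perm in permutations(groups, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[c] == t for c, t in zip(labels, true_groups)))
    return best / len(true_groups)


if __name__ == "__main__":
    # Hypothetical 2-D coordinates standing in for reduced genotype data.
    points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    truth = ["dent", "dent", "dent", "popcorn", "popcorn", "popcorn"]
    labels = average_linkage(points, k=2)
    print(labels, assignment_accuracy(truth, labels))
    print(silhouette(points, labels), davies_bouldin(points, labels))
```

A higher silhouette and a lower Davies-Bouldin value both indicate tighter, better-separated clusters, which is why the two indexes can be read together as a check on the assignment accuracy.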

SUBMITTER: Lopez-Cortes XA

PROVIDER: S-EPMC7732446 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

A Deep Learning Approach to Population Structure Inference in Inbred Lines of Maize.

López-Cortés XA, Matamala F, Maldonado C, Mora-Poblete F, Scapim CA

Frontiers in Genetics, 2020 Nov 24


PMID: 33329691

Similar Datasets

Project description: Analyses of the genetic distance and composition of inbred lines are a prerequisite for parental selection and for exploiting heterosis in plant breeding programs. The study aimed to assess the genetic diversity and population structure of a maize germplasm panel comprising 182 founder lines and 866 derived inbred lines using Single Nucleotide Polymorphism (SNP) markers, in order to identify genetically unique lines for hybrid breeding. The founder lines were genotyped with 1201 SNPs and the derived lines with 1484 SNPs. Moderate genetic variation was recorded for the founder lines, with genetic diversity ranging from 0.004 to 0.44 (mean 0.25), while the derived lines ranged from 0.004 to 0.34 (mean 0.13). Heterozygosity ranged from 0.00 to 0.24, with a mean of 0.08, in both sets of lines. Of the SNP markers used, 82% of the 1201 markers and 84% of the 1484 markers had polymorphism information content between 0.25 and 0.50. Analysis of molecular variance revealed significant genetic differences (P ≤ 0.001) among and within populations in both the founder and the derived lines. Most of the detected variation (97% in the founder lines and 88.38% in the derived lines) was attributed to differences within populations. Population structure analysis identified three distinct subpopulations among the founder lines and two among the derived lines. Cluster analysis supported the population structure. The following genetically distant founder and derived inbred lines were selected: G15NL337 and G15NL312 (Cluster 1), 15ARG152 and RGS-PL44 (Cluster 2), RGS-PL44 and 15ARG149 (Cluster 2), and RGS-PL33 and RGS-PL44 (Cluster 2), respectively. The selected lines are genetically distinct and are recommended for marker-assisted hybrid maize breeding to exploit the frequency of beneficial alleles. 
This study provides valuable insights for maize breeding programs, enabling the exploitation of beneficial alleles and contributing to improved crop yields and food security through hybrid breeding.
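
The per-marker diversity statistics reported above (genetic diversity, heterozygosity, and polymorphism information content) can be illustrated with a short sketch. It is not the study's software: it assumes biallelic SNPs coded as 0/1/2 copies of the alternate allele with None for missing calls, uses gene diversity 1 - (p² + q²) and the Botstein et al. PIC formula for two alleles, and the function names are hypothetical.

```python
def allele_frequency(calls):
    """Alternate-allele frequency p from 0/1/2 dosage calls, skipping missing (None)."""
    observed = [c for c in calls if c is not None]
    return sum(observed) / (2 * len(observed))


def gene_diversity(p):
    """Expected heterozygosity for a biallelic locus: 1 - (p^2 + q^2)."""
    q = 1 - p
    return 1 - (p ** 2 + q ** 2)


def pic(p):
    """Botstein et al. PIC for two alleles: 1 - (p^2 + q^2) - 2 p^2 q^2."""
    q = 1 - p
    return 1 - (p ** 2 + q ** 2) - 2 * p ** 2 * q ** 2


def observed_heterozygosity(calls):
    """Share of non-missing calls that are heterozygous (coded 1)."""
    observed = [c for c in calls if c is not None]
    return sum(1 for c in observed if c == 1) / len(observed)


def summarize(markers):
    """Mean gene diversity, heterozygosity and PIC over a list of per-SNP call vectors."""
    n = len(markers)
    freqs = [allele_frequency(m) for m in markers]
    return {
        "gene_diversity": sum(gene_diversity(p) for p in freqs) / n,
        "heterozygosity": sum(observed_heterozygosity(m) for m in markers) / n,
        "pic": sum(pic(p) for p in freqs) / n,
    }


if __name__ == "__main__":
    # Hypothetical calls for two SNPs across six lines (None = missing).
    markers = [[0, 0, 2, 2, 1, None], [0, 0, 0, 0, 2, None]]
    print(summarize(markers))
```

Under these formulas a biallelic SNP reaches its maximum gene diversity (0.5) and PIC (0.375) when both alleles have frequency 0.5, which helps put the reported ranges in context.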

Project description: Background: Molecular characterization is important for the efficient utilization of germplasm and the development of improved varieties. In the present study, we investigated the genetic purity, relatedness, and population structure of 265 maize inbred lines from the Ethiopian Institute of Agricultural Research (EIAR), the International Maize and Wheat Improvement Centre (CIMMYT), and the International Institute of Tropical Agriculture (IITA) using 220,878 single nucleotide polymorphism (SNP) markers obtained by genotyping by sequencing (GBS). Results: Only 22% of the inbred lines were considered pure, with <5% heterogeneity, while the remaining 78% had heterogeneity ranging from 5.1 to 31.5%. Pairwise genetic distances among the 265 inbred lines varied from 0.011 to 0.345, with 89% of the pairs falling between 0.301 and 0.345. Fewer than 1% of the pairs had a genetic distance below 0.200; these included 14 pairs of nearly identical sister lines. Relative kinship analysis showed that the kinship coefficients for 59% of the pairs of lines were close to zero, which agrees with the genetic distance estimates. Principal coordinate analysis, discriminant analysis of principal components (DAPC), and model-based population structure analysis consistently suggested the presence of three groups, which generally agreed with pedigree information (genetic background). Although the separation was not sharp, the SNP markers showed some level of separation between the two CIMMYT heterotic groups, A and B, established on the basis of pedigree and combining ability information. Conclusions: The high level of heterogeneity detected in most of the inbred lines suggests that they require purification or further inbreeding, except for lines deliberately maintained at an early inbreeding stage. 
The genetic distance and relative kinship analyses clearly indicated the uniqueness of most of the inbred lines in the maize germplasm available to breeders in the mid-altitude maize breeding program of Ethiopia. Results from the present study facilitate maize breeding work in Ethiopia and germplasm exchange among breeding programs in Africa. We suggest incorporating high-density molecular marker information into future heterotic group assignments.
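
The purity screen, pairwise genetic distance, and near-identical sister-line check described above can be illustrated with a sketch. The exact estimators are not given in this description, so the following rests on stated assumptions: heterogeneity is taken as the share of heterozygous calls per line, distance as the identity-by-state allele-mismatch proportion over SNPs called in both lines (not necessarily the distance the authors used), the 5% and 0.200 cutoffs are borrowed from the text, and all names and genotypes are hypothetical.

```python
from itertools import combinations


def heterogeneity(calls):
    """Assumed heterogeneity measure: share of heterozygous (1) calls among non-missing SNPs."""
    observed = [c for c in calls if c is not None]
    return sum(1 for c in observed if c == 1) / len(observed)


def ibs_distance(a, b):
    """Allele-mismatch (identity-by-state) distance over SNPs called in both lines."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    # Each SNP contributes 0, 1 or 2 mismatched alleles out of 2.
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))


def close_pairs(lines, threshold=0.200):
    """Pairs of lines whose distance falls below the threshold (candidate sister lines)."""
    hits = []
    for (name_a, calls_a), (name_b, calls_b) in combinations(lines.items(), 2):
        d = ibs_distance(calls_a, calls_b)
        if d < threshold:
            hits.append((name_a, name_b, d))
    return hits


if __name__ == "__main__":
    # Hypothetical genotypes for three lines at ten SNPs (0/1/2 alternate-allele dosage).
    lines = {
        "line_A": [0, 0, 2, 2, 0, 2, 1, 0, 0, 2],
        "line_B": [0, 0, 2, 2, 0, 2, 0, 0, 0, 2],
        "line_C": [2, 2, 0, 0, 2, 0, 2, 2, 2, 0],
    }
    for name, calls in lines.items():
        status = "pure" if heterogeneity(calls) < 0.05 else "needs purification"
        print(name, heterogeneity(calls), status)
    print(close_pairs(lines))
```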
