Dataset Information

Sampling Variation of RAD-Seq Data from Diploid and Tetraploid Potato (Solanum tuberosum L.).

ABSTRACT: New sequencing technology enables identification of genome-wide sequence-based variants at a population level and at a competitively low cost. Sequence variant-based molecular markers have motivated enormous interest in population and quantitative genetic analyses. Generation of the sequence data involves a sophisticated experimental process embedded with rich non-biological variation. Statistically, the sequencing process involves sampling DNA fragments from an individual sequence. Adequate knowledge of the sampling variation in sequence data generation is one of the key statistical requirements for any downstream analysis of the data and for implementing statistically appropriate methods. This paper reports a thorough investigation into modeling the sampling variation of sequence data from optimized RAD-seq (restriction site-associated DNA sequencing) experiments with two parents and their offspring in diploid and autotetraploid potato (Solanum tuberosum L.). The analysis shows significant dispersion in the sampling variation of the sequence data over that expected under the multinomial distribution widely assumed in the literature, and provides statistical methods for modeling the variation and calculating the model parameters, which may be easily implemented in real sequence datasets. The optimized design of RAD-seq experiments enabled effective control of presentation of undesirable chloroplast DNA and RNA genes in the sequence data generated.
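The abstract's central claim, that read counts vary more than a multinomial model predicts, can be illustrated with a generic dispersion check. The following is a minimal sketch and not the method used in the paper: it assumes two alleles per site (so the multinomial reduces to a binomial), a shared allele proportion across sites, and simulated counts. The function names, the beta-binomial simulation, and the parameter `concentration` are all hypothetical choices made for illustration.

```python
import random


def pearson_dispersion(ref_counts, depths):
    """Pearson chi-square statistic divided by its degrees of freedom.

    Values near 1 are consistent with binomial sampling at a shared
    proportion; values well above 1 indicate overdispersion.
    """
    p_hat = sum(ref_counts) / sum(depths)  # pooled allele proportion
    chi2 = sum(
        (x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
        for x, n in zip(ref_counts, depths)
    )
    return chi2 / (len(depths) - 1)


def simulate_ref_counts(n_sites, depth, p, concentration=None, seed=0):
    """Simulate reference-allele read counts at n_sites sites (illustrative only).

    With concentration=None the counts are binomial(depth, p). Otherwise each
    site draws its own proportion from a beta distribution with mean p, which
    gives beta-binomial (overdispersed) counts.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        if concentration is None:
            site_p = p
        else:
            site_p = rng.betavariate(p * concentration, (1 - p) * concentration)
        # Each read carries the reference allele with probability site_p.
        counts.append(sum(rng.random() < site_p for _ in range(depth)))
    return counts


if __name__ == "__main__":
    depths = [50] * 500
    binomial = simulate_ref_counts(500, 50, 0.5, seed=1)
    overdispersed = simulate_ref_counts(500, 50, 0.5, concentration=10, seed=1)
    print("binomial counts:     ", round(pearson_dispersion(binomial, depths), 2))
    print("beta-binomial counts:", round(pearson_dispersion(overdispersed, depths), 2))
```

Under these assumptions the statistic stays close to 1 for the binomial counts and rises well above 1 for the beta-binomial counts, which is the kind of excess dispersion the abstract describes.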

SUBMITTER: Dang Z

PROVIDER: S-EPMC7915145 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature


Publications

Sampling Variation of RAD-Seq Data from Diploid and Tetraploid Potato (Solanum tuberosum L.).

Dang Z, Yang J, Wang L, Tao Q, Zhang F, Zhang Y, Luo Z

Plants (Basel, Switzerland), 2021 Feb 7, issue 2


PMID: 33562246

Similar Datasets

Project description: Potato breeding must improve its efficiency by increasing the reliability of selection as well as identifying a promising germplasm for crossing. This study shows the prediction accuracy of genomic-estimated breeding values for several potato (Solanum tuberosum L.) breeding clones and the released cultivars that were evaluated at three locations in northern and southern Sweden for various traits. Three dosages of marker alleles [pseudo-diploid (A), additive tetrasomic polyploidy (B), and additive-non-additive tetrasomic polyploidy (C)] were considered in the genome-based prediction models, for single environments and multiple environments (accounting for the genotype-by-environment interaction or G × E), and for comparing two kernels, the conventional linear, Genomic Best Linear Unbiased Prediction (GBLUP) (GB), and the non-linear Gaussian kernel (GK), when used with the single-kernel genetic matrices of A, B, C, or when employing two-kernel genetic matrices in the model using the kernels from B and C for a single environment (models 1 and 2, respectively), and for multi-environments (models 3 and 4, respectively). Concerning the single site analyses, the trait with the highest prediction accuracy for all sites under A, B, C for model 1, model 2, and for GB and GK methods was tuber starch percentage. Another trait with relatively high prediction accuracy was the total tuber weight. Results show an increase in prediction accuracy of model 2 over model 1. Non-linear Gaussian kernel (GK) did not show any clear advantage over the linear kernel GBLUP (GB). Results from the multi-environments had prediction accuracy estimates (models 3 and 4) higher than those obtained from the single-environment analyses. Model 4 with GB was the best method in combination with the marker structure B for predicting most of the tuber traits.
Most of the traits gave relatively high prediction accuracy under this combination of marker structure (A, B, C, and B-C), and methods GB and GK combined with the multi-environment with G × E model.

Project description: Background: Invertases are ubiquitous enzymes that irreversibly cleave sucrose into fructose and glucose. Plant invertases play important roles in carbohydrate metabolism, plant development, and biotic and abiotic stress responses. In potato (Solanum tuberosum), invertases are involved in 'cold-induced sweetening' of tubers, an adaptive response to cold stress, which negatively affects the quality of potato chips and French fries. Linkage and association studies have identified quantitative trait loci (QTL) for tuber sugar content and chip quality that colocalize with three independent potato invertase loci, which together encode five invertase genes. The role of natural allelic variation of these genes in controlling the variation of tuber sugar content in different genotypes is unknown. Results: For functional studies on natural variants of five potato invertase genes we cloned and sequenced 193 full-length cDNAs from six heterozygous individuals (three tetraploid and three diploid). Eleven, thirteen, ten, twelve and nine different cDNA alleles were obtained for the genes Pain-1, InvGE, InvGF, InvCD141 and InvCD111, respectively. Allelic cDNA sequences differed from each other by 4 to 9%, and most were genotype specific. Additional variation was identified by single nucleotide polymorphism (SNP) analysis in an association-mapping population of 219 tetraploid individuals. Haplotype modeling revealed two to three major haplotypes besides a larger number of minor frequency haplotypes. cDNA alleles associated with chip quality, tuber starch content and starch yield were identified. Conclusions: Very high natural allelic variation was uncovered in a set of five potato invertase genes. This variability is a consequence of the cultivated potato's reproductive biology. Some of the structural variation found might underlie functional variation that influences important agronomic traits such as tuber sugar content.
The associations found between specific invertase alleles and chip quality, tuber starch content and starch yield will facilitate the selection of superior potato genotypes in breeding programs.
