Dataset Information

An application of kernel methods to variety identification based on SSR markers genetic fingerprinting.

ABSTRACT:

Background

In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data because these data are neither continuous, nor nominal, nor ordinal, but only partially ordered. Therefore, a strategy is needed to encode the polymorphism between samples such that known supervised approaches can be applied. Moreover, finding a minimal set of molecular markers with optimal ability to discriminate, for example, between given groups of varieties is important, as the genotyping process can be costly in terms of laboratory consumables, labor, and time. This feature selection problem also needs special care because of the specific nature of the data.

Results

An approach encoding SSR polymorphisms in a positive definite kernel is presented, which then allows the use of any supervised kernel method. The polymorphism between the samples is encoded through the Nei-Li genetic distance, which is shown to define a positive definite kernel between the genotyped samples. Additionally, a greedy feature selection algorithm for selecting SSR marker kits is presented to build economical and efficient prediction models for discrimination. The algorithm is a filter method and outperforms other filter methods adapted to this setting. When combined with kernel linear discriminant analysis, or with kernel principal component analysis followed by linear discriminant analysis, the approach leads to very satisfactory prediction models.
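
The kernel idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each sample is scored as a binary presence/absence vector over SSR alleles, and it uses the standard Nei-Li (Dice) similarity, 2 * shared / (n_a + n_b), as the kernel value, so that the Nei-Li distance is one minus that similarity. The function name `nei_li_kernel` and the toy genotype matrix are hypothetical.

```python
import numpy as np

def nei_li_kernel(X):
    """Gram matrix of Nei-Li (Dice) similarities between samples.

    X is an (n_samples, n_alleles) 0/1 matrix (assumed encoding: 1 if the
    allele is present in the sample). Every sample must carry at least
    one allele, otherwise the denominator is zero.
    """
    X = np.asarray(X, dtype=float)
    shared = X @ X.T                      # number of alleles shared by each pair
    counts = X.sum(axis=1)                # number of alleles carried by each sample
    if np.any(counts == 0):
        raise ValueError("each sample needs at least one scored allele")
    return 2.0 * shared / (counts[:, None] + counts[None, :])

# Hypothetical toy data: 4 samples scored on 6 alleles.
X_toy = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 1],
])

K = nei_li_kernel(X_toy)
D = 1.0 - K                               # Nei-Li distance under this assumption
min_eig = np.linalg.eigvalsh(K).min()     # non-negative when K is positive semidefinite
```

A Gram matrix of this kind can be passed to kernel methods that accept precomputed kernels, which is the general route the abstract describes. The eigenvalue check only confirms positive semidefiniteness numerically for this toy matrix and is not a proof.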

Conclusions

The main advantage of the approach is that it offers a flexible way to encode polymorphisms in a kernel. When combined with a feature selection algorithm that yields a few specific markers, it leads to accurate and economical identification models based on SSR genotyping.

SUBMITTER: Martin F

PROVIDER: S-EPMC3128031 | biostudies-literature | 2011 May

REPOSITORIES: biostudies-literature

Publications

An application of kernel methods to variety identification based on SSR markers genetic fingerprinting.

Martin Florian F

BMC Bioinformatics, 2011-05-20

PMID: 21595989

Similar Datasets

Project description: Melon is an important horticultural crop with a pleasant aromatic flavor and an abundance of health-promoting substances. Numerous melon varieties have been cultivated worldwide in recent years, but the high number of varieties and the high similarity between them pose a major challenge for variety evaluation, discrimination, and innovation in breeding. Recently, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs), two robust molecular markers, have been utilized as a rapid and reliable method for variety identification. To elucidate the genetic structure and diversity of melon varieties, we screened out 136 perfect SSRs and 164 perfect SNPs from the resequencing data of 149 accessions, including the most representative lines worldwide. This study established the DNA fingerprint of 259 widely cultivated melon varieties in China using Target-seq technology. All melon varieties were classified into five subgroups, including ssp. agrestis, ssp. melo, muskmelon, and two subgroups of foreign individuals. Compared with ssp. melo, the ssp. agrestis varieties might be exposed to a high risk of genetic erosion due to their extremely narrow genetic background. Increasing the gene exchange between ssp. melo and ssp. agrestis is therefore necessary in the breeding procedure. In addition, analysis of the DNA fingerprints of the 259 melon varieties showed a good linear correlation (R2 = 0.9722) between the SSR genotyping and SNP genotyping methods in variety identification. The pedigree analysis based on the DNA fingerprint of 'Jingyu' and 'Jingmi' series melon varieties was consistent with their breeding history. Based on the SNP index analysis, ssp. agrestis had low gene exchange with ssp. melo on chromosomes 4, 7, 10, 11, and 12, and two specific SNP loci were verified to distinguish ssp. agrestis and ssp. melo varieties. Finally, 23 SSRs and 40 SNPs were selected as the core sets of markers for variety identification, which could be efficiently applied to variety authentication, variety monitoring, and the protection of intellectual property rights in melon.

Project description: Olive is an ancient oil-producing tree, widely cultivated in Mediterranean countries, and now spread to other areas of the world, including China. Recently, several molecular databases were constructed in different countries and on different platforms for olive identification using simple sequence repeats (SSRs) or single-nucleotide polymorphisms (SNPs). However, comparing their results across laboratories was difficult. Herein, hundreds of polymorphic single-copy nuclear sequence markers were developed from the olive genome. Taking advantage of multiplex PCR amplification and high-throughput sequencing, a fingerprint database was constructed for the majority of olives cultivated in China. We used 100 high-quality sequence loci and estimated the genetic diversity and structure among all these varieties. We found that, compared with a database based on SSRs, the fingerprint database constructed from these 100 sequences, or a few of them, could provide a reliable olive variety identification platform in China, with high discrimination among different varieties using the principle of the BLAST algorithm. An example of such an identification platform based on this study is displayed on the web for the olive database in China (http://olivedb.cn/jianding). After resolving redundant genotypes, we identified 126 olive varieties with distinct genotypes in China. These varieties could be divided into two clusters, and the grouping of the varieties was found to have a certain relationship with their origin. It is concluded that these single-copy orthologous nuclear sequences could be used to construct a universal fingerprint database of olives across different laboratories and platforms inexpensively. Based on such a database, variety identification can be performed easily by any laboratory, which would further facilitate olive breeding and variety exchange globally. Supplementary information: The online version contains supplementary material available at 10.1007/s11032-023-01434-9.

