
Dataset Information


Recombination spot identification Based on gapped k-mers.


ABSTRACT: Recombination is crucial for biological evolution because it provides many new combinations of genetic diversity. Accurate identification of recombination spots is useful for studying DNA function. To improve prediction accuracy, researchers have proposed several computational methods for recombination spot identification. The k-mer feature is one of the most useful features for modeling the properties and function of DNA sequences. However, it suffers from an inherent limitation: if the word length k is large, the occurrence count of each k-mer is close to a binary variable, with a few k-mers present once and most k-mers absent. This usually causes a sparsity problem and reduces classification accuracy. To solve this problem, we add gaps into k-mers and introduce a new feature called the gapped k-mer (GKM) for identification of recombination spots. Using this feature, we present a new predictor called SVM-GKM, which combines gapped k-mers with a Support Vector Machine (SVM) for recombination spot identification. Experimental results on a widely used benchmark dataset show that SVM-GKM outperforms other closely related predictors. Therefore, SVM-GKM would be a powerful predictor for computational genomics.
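The contrast between contiguous and gapped k-mers can be illustrated with a minimal sketch. It is not the authors' SVM-GKM implementation: the function names, the `(l, k)` parameterization (a window of length `l` with `k` informative positions), and the use of `-` as the gap symbol are assumptions made here for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every contiguous k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers (hypothetical parameterization).

    Each length-l window contributes one feature per way of keeping
    k informative positions; the other l - k positions are masked with '-'.
    """
    masks = [set(keep) for keep in combinations(range(l), k)]
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        for kept in masks:
            feature = "".join(c if j in kept else "-" for j, c in enumerate(window))
            counts[feature] += 1
    return counts


if __name__ == "__main__":
    # Two toy sequences that differ at a single position.
    a, b = "ACGT", "ACCT"
    # As contiguous 4-mers they share no feature at all ...
    print(set(kmer_counts(a, 4)) & set(kmer_counts(b, 4)))
    # ... but with one gap allowed they share a gapped feature.
    print(set(gapped_kmer_counts(a, 4, 3)) & set(gapped_kmer_counts(b, 4, 3)))
```

In this sketch, near-identical sequences that have no long contiguous k-mer in common can still share gapped features, which is the intuition behind using gaps to reduce sparsity. The resulting count vectors could then be passed to any SVM implementation.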

SUBMITTER: Wang R 

PROVIDER: S-EPMC4814916 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature


Publications

Recombination spot identification Based on gapped k-mers.

Wang Rong, Xu Yong, Liu Bin

Scientific Reports, 2016-03-31



Similar Datasets

| S-EPMC3895138 | biostudies-literature
| S-EPMC5587640 | biostudies-literature
| S-EPMC3495599 | biostudies-other
| S-EPMC4855236 | biostudies-literature
| S-EPMC4989901 | biostudies-literature
| S-EPMC3022572 | biostudies-literature
| S-EPMC2678279 | biostudies-literature
| PRJEB18734 | ENA
| S-EPMC8200320 | biostudies-literature
| S-EPMC11008012 | biostudies-literature