Dataset Information

CRF: detection of CRISPR arrays using random forest.


ABSTRACT: CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, thereby enhancing detection accuracy. In particular, triplet elements that combine sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available in other CRISPR detection tools. After detection, the query sequence, the CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be displayed for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
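
The abstract mentions triplet elements that combine sequence content and structure information but does not define them. As a rough illustration only, the sketch below assumes one common encoding: for each interior position of a repeat, the middle nucleotide is paired with the paired/unpaired state of three adjacent positions taken from a dot-bracket secondary structure, giving 4 x 8 = 32 frequencies. This encoding, the function name `triplet_features`, and the example repeat and structure are assumptions for illustration, not CRF's confirmed feature set.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"
# Pairing patterns over three adjacent positions: "(" = paired, "." = unpaired.
PATTERNS = ["".join(p) for p in product("(.", repeat=3)]
# 32 feature names such as "A(((" or "T.(." (assumed encoding, not from the paper).
FEATURE_NAMES = [n + p for n in NUCLEOTIDES for p in PATTERNS]


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 assumed triplet elements."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("U", "T")
    # Collapse both bracket directions into a single "paired" symbol.
    paired = structure.replace(")", "(")
    counts = Counter()
    for i in range(1, len(seq) - 1):
        window = paired[i - 1:i + 2]
        # Skip ambiguous bases and unexpected structure symbols.
        if seq[i] in NUCLEOTIDES and set(window) <= set("(."):
            counts[seq[i] + window] += 1
    total = sum(counts.values())
    return [counts[name] / total if total else 0.0 for name in FEATURE_NAMES]


# Hypothetical repeat fragment and dot-bracket structure, for illustration only.
vector = triplet_features("GTTCCAC", "((...))")
```

A vector like this, computed for each candidate repeat, is the kind of input a random forest classifier could be trained on to separate valid from invalid arrays; the classifier itself is left out here to keep the sketch dependency-free.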

SUBMITTER: Wang K 

PROVIDER: S-EPMC5407274 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

CRF: detection of CRISPR arrays using random forest.

Wang Kai, Liang Chun

PeerJ, 2017-04-25


Similar Datasets

| S-EPMC6194674 | biostudies-literature
| S-EPMC9022175 | biostudies-literature
| S-EPMC7055778 | biostudies-literature
| S-EPMC2387219 | biostudies-literature
| S-EPMC11351278 | biostudies-literature
| S-EPMC7711527 | biostudies-literature
| S-EPMC6823902 | biostudies-literature
| S-EPMC2916923 | biostudies-literature
| S-EPMC8012581 | biostudies-literature
| S-EPMC8543977 | biostudies-literature