Dataset Information

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.


ABSTRACT:

BACKGROUND: High-throughput transcriptome sequencing (RNA-seq) technology promises to reveal novel protein-coding and non-coding transcripts, in particular long non-coding RNAs (lncRNAs) in de novo sequencing data. This requires tools that do not depend on prior gene annotations, genomic sequences, or high-quality sequencing.

RESULTS: We present an alignment-free tool called PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), which uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs) in the absence of genomic sequences or annotations. The performance of PLEK was evaluated on well-annotated mRNA and lncRNA transcripts. Ten-fold cross-validation on human RefSeq mRNAs and GENCODE lncRNAs indicated that the tool can achieve an accuracy of up to 95.6%. We demonstrated the utility of PLEK on transcripts from other vertebrates using the model built from human datasets; PLEK attained >90% accuracy on most of these datasets. PLEK also performed well on a simulated dataset and on two real de novo assembled transcriptome datasets (sequenced on PacBio and 454 platforms) with relatively high rates of indel sequencing errors. In addition, when run on a single thread, PLEK is approximately eightfold faster than a newly developed alignment-free tool, Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, Coding Potential Calculator (CPC).

CONCLUSIONS: PLEK is an efficient alignment-free computational tool for distinguishing lncRNAs from mRNAs in RNA-seq transcriptomes of species that lack reference genomes. PLEK is especially suitable for PacBio or 454 sequencing data and for large-scale transcriptome data. The open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.
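To make the idea of alignment-free k-mer features concrete, the sketch below computes sliding-window k-mer frequencies for a transcript, the general kind of feature vector that a classifier such as an SVM could consume. It is an illustration only and is not PLEK's implementation: the abstract does not specify the "improved" k-mer scheme, so the function name `kmer_features`, the `k_max` default, and the optional per-k `weights` parameter are all assumptions introduced here.

```python
from itertools import product


def kmer_features(seq, k_max=3, weights=None):
    """Return k-mer frequencies for k = 1..k_max as one flat feature list.

    Hypothetical helper for illustration; not PLEK's actual scheme.
    `weights` is an assumed optional list with one scaling factor per k.
    """
    # Normalise case and treat RNA (U) like DNA (T).
    seq = seq.upper().replace("U", "T")
    features = []
    for k in range(1, k_max + 1):
        w = 1.0 if weights is None else weights[k - 1]
        # Number of sliding windows of length k with step 1.
        windows = max(len(seq) - k + 1, 0)
        counts = {}
        for i in range(windows):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        # Emit frequencies in a fixed order over the ACGT alphabet.
        # Windows containing other symbols (e.g. N) still count toward
        # the denominator but produce no feature of their own.
        for kmer in map("".join, product("ACGT", repeat=k)):
            freq = counts.get(kmer, 0) / windows if windows else 0.0
            features.append(w * freq)
    return features
```

For example, `kmer_features("ACGT", k_max=1)` returns `[0.25, 0.25, 0.25, 0.25]`. With `k_max=3` the vector has 4 + 16 + 64 = 84 entries; because the features depend only on the sequence itself, no genome or annotation is needed to compute them.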

SUBMITTER: Li A 

PROVIDER: S-EPMC4177586 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.

Li Aimin, Zhang Junying, Zhou Zhongyin

BMC Bioinformatics, 2014-09-19


Similar Datasets

| S-EPMC6262761 | biostudies-literature
| S-EPMC4593643 | biostudies-literature
| S-EPMC6779387 | biostudies-literature
| S-EPMC6535665 | biostudies-literature
| S-EPMC6459964 | biostudies-literature
| S-EPMC10792800 | biostudies-literature
| S-EPMC4113672 | biostudies-literature
| S-EPMC5802165 | biostudies-literature
| S-EPMC4094766 | biostudies-literature
| S-EPMC5048419 | biostudies-literature