
Dataset Information


ExUTR: a novel pipeline for large-scale prediction of 3'-UTR sequences from NGS data.


ABSTRACT: The three prime untranslated region (3'-UTR) is known to play a pivotal role in modulating gene expression by determining the fate of mRNA. Many crucial developmental events, such as mammalian spermatogenesis, tissue patterning, sex determination and neurogenesis, rely heavily on post-transcriptional regulation by the 3'-UTR. However, 3'-UTR biology remains a relatively untapped field, with only limited tools and 3'-UTR resources available. To elucidate the regulatory mechanisms of the 3'-UTR on gene expression, the 3'-UTR sequences must first be identified. Current 3'-UTR mining tools, such as GETUTR, 3USS and UTRscan, all depend on a well-annotated reference genome or curated 3'-UTR sequences, which hinders their application to the many non-model organisms whose genomes are not available. To address these issues, an NGS-based, automated pipeline is urgently needed for genome-wide 3'-UTR prediction in the absence of reference genomes.

Here, we propose ExUTR, a novel NGS-based pipeline to predict and retrieve 3'-UTR sequences from RNA-Seq experiments, designed particularly for non-model species lacking well-annotated genomes. This pipeline integrates cutting-edge bioinformatics tools, databases (UniProt and UTRdb) and novel in-house Perl scripts, implementing a fully automated workflow. Taking transcriptome assemblies as input, the pipeline identifies 3'-UTR signals based primarily on the intrinsic features of transcripts, and outputs predicted 3'-UTR candidates together with associated annotations. In addition, ExUTR requires only minimal computational resources, so it can run on a standard desktop computer with reasonable runtime, making it affordable for most laboratories. We also demonstrate the functionality and extensibility of this pipeline using publicly available RNA-Seq data from both model and non-model species, and further validate the accuracy of the predicted 3'-UTRs using both well-characterized 3'-UTR resources and 3P-Seq data.

ExUTR is a practical and powerful workflow that enables rapid genome-wide 3'-UTR discovery from NGS data. The candidates predicted through this pipeline will further advance the study of miRNA target prediction, cis elements in 3'-UTRs, and the evolution and biology of 3'-UTRs. Being independent of a well-annotated reference genome will dramatically expand its application to a much broader research area, encompassing all species for which RNA-Seq data are available.
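
The abstract says that 3'-UTR signals are identified from intrinsic features of assembled transcripts, but it does not spell out the algorithm. As a rough illustration of the general idea only, the Python sketch below assumes that the coding region is the longest forward-strand ORF (ATG to the first in-frame stop codon) and treats everything downstream of that stop codon as a 3'-UTR candidate. The function names, the `min_utr_len` parameter and the longest-ORF heuristic are all hypothetical; this is not ExUTR's code, which is implemented with Perl scripts and external databases.

```python
# Illustrative sketch only: NOT the ExUTR implementation.
# Assumption: the CDS is the longest forward-strand ORF (ATG .. in-frame stop).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ORF, end just past the stop codon.

    Returns None if no complete ATG..stop ORF exists on the forward strand.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the ORF
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def three_prime_utr_candidate(seq, min_utr_len=1):
    """Return the sequence downstream of the longest ORF, or None."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    utr = seq.upper()[orf[1]:]
    return utr if len(utr) >= min_utr_len else None


if __name__ == "__main__":
    # Toy transcript: 2 nt of 5' sequence, a 12 nt ORF, then 12 nt downstream.
    transcript = "GG" + "ATGAAACCCTAA" + "GCTTAATAAAGC"
    print(longest_orf(transcript))
    print(three_prime_utr_candidate(transcript))
```

A real pipeline would also need to handle strand orientation, truncated ORFs and assembly errors, which this sketch deliberately ignores.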

SUBMITTER: Huang Z 

PROVIDER: S-EPMC5674806 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature


Publications

ExUTR: a novel pipeline for large-scale prediction of 3'-UTR sequences from NGS data.

Huang Zixia, Teeling Emma C

BMC Genomics, 2017-11-06, issue 1



Similar Datasets

| S-EPMC4525226 | biostudies-literature
| S-EPMC5870678 | biostudies-literature
| S-EPMC7707597 | biostudies-literature
2014-09-17 | GSE59408 | GEO
| S-EPMC5587802 | biostudies-literature
2014-09-17 | E-GEOD-59408 | biostudies-arrayexpress
| S-EPMC4262678 | biostudies-literature
| S-EPMC8425578 | biostudies-literature
| S-EPMC5799033 | biostudies-literature
| S-EPMC8314304 | biostudies-literature