Dataset Information

OrfM: a fast open reading frame predictor for metagenomic data.

ABSTRACT:

Finding and translating stretches of DNA lacking stop codons is a common task in the analysis of sequence data. However, the computational tools for finding open reading frames are sufficiently slow that they are becoming a bottleneck as the volume of sequence data grows. This computational bottleneck is especially problematic in metagenomics when searching unassembled reads, or screening assembled contigs for genes of interest. Here, we present OrfM, a tool to rapidly identify open reading frames (ORFs) in sequence data by applying the Aho-Corasick algorithm to find regions uninterrupted by stop codons. Benchmarking revealed that OrfM finds identical ORFs to similar tools ('GetOrf' and 'Translate') but is four to five times faster. While OrfM is sequencing platform-agnostic, it is best suited to large, high-quality datasets such as those produced by Illumina sequencers.
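
To make the task concrete, the following is a minimal Python sketch of finding regions uninterrupted by stop codons in all six reading frames. It is an illustration only: it uses a plain codon-by-codon scan rather than the Aho-Corasick algorithm, it is not OrfM's C implementation, and the function names, the `min_codons` parameter, and the use of the standard genetic code are assumptions made for this example.

```python
# Minimal sketch: report stop-codon-free stretches in all six reading frames.
# This is a simple codon-by-codon scan, NOT OrfM's Aho-Corasick implementation.
# All names and the min_codons parameter are hypothetical, chosen for illustration.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) of stop-free codon runs in one frame of seq."""
    run_start = offset
    pos = offset
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            # A stop codon closes the current run; keep it if long enough.
            if (pos - run_start) // 3 >= min_codons:
                yield run_start, pos
            run_start = pos + 3
        pos += 3
    # The final run ends at the last complete codon of the sequence.
    if (pos - run_start) // 3 >= min_codons:
        yield run_start, pos


def find_orfs(seq, min_codons=3):
    """Return (strand, frame offset, nucleotide sequence) for each ORF found.

    min_codons is expected to be at least 1.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                results.append((strand, offset, s[start:end]))
    return results


if __name__ == "__main__":
    for strand, frame, orf in find_orfs("ATGAAATAGCCCGGGTTT", min_codons=2):
        print(strand, frame, orf)
```

Here an ORF is simply any run of codons without an in-frame stop, with no start codon required, which matches the definition used in the abstract and suits short reads where a gene's start may lie outside the read.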

Availability and implementation

Source code and binaries are freely available for download at http://github.com/wwood/OrfM or through GNU Guix under the LGPL 3+ license. OrfM is implemented in C and supported on GNU/Linux and OSX.

Contacts

b.woodcroft@uq.edu.au

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Woodcroft BJ

PROVIDER: S-EPMC5013905 | biostudies-literature | 2016 Sep

REPOSITORIES: biostudies-literature

Publications

OrfM: a fast open reading frame predictor for metagenomic data.

Woodcroft BJ, Boyd JA, Tyson GW

Bioinformatics (Oxford, England), 2016-05-03, issue 17

PMID: 27153669

Similar Datasets

Project description:Human astrovirus (HAstV) strains exhibit high levels of genetic diversity, and many recombinant strains with different recombination patterns have been reported. The aims of the present study were to investigate the emergence of HAstV recombinant strains and to characterize the recombination patterns of the strains detected in pediatric patients admitted to the hospital with acute gastroenteritis in Chiang Mai, Thailand. A total of 92 archival HAstV strains detected in 2011 to 2020 were characterized regarding their open reading frame 1a (ORF1a) genotypes in comparison with their ORF1b genotypes to identify recombinant strains. The recombination breakpoints of the putative recombinant strains were determined by whole-genome sequencing and were analyzed by SimPlot and RDP software. Three HAstV strains (CMH-N178-12, CMH-S059-15, and CMH-S062-15) were found to be recombinant strains of three different HAstV genotypes, i.e., HAstV5, HAstV8, and HAstV1 within the ORF1a, ORF1b, and ORF2 regions, respectively. The CMH-N178-12 strain displayed recombination breakpoints at nucleotide positions 2681 and 4357 of ORF1a and ORF1b, respectively, whereas the other two recombinant strains, CMH-S059-15 and CMH-S062-15, displayed recombination breakpoints at nucleotide positions 2612 and 4357 of ORF1a and ORF1b, respectively. This is the first study to reveal nearly full-length genome sequences of HAstV recombinant strains with a novel recombination pattern of ORF1a-ORF1b-ORF2 genotypes. This finding may be useful as a guideline for identifying other recombinant HAstV strains in other geographical regions and may provide a better understanding of their genetic diversity, as well as basic knowledge regarding virus evolution.
IMPORTANCE Recombination is one of the mechanisms that plays a crucial role in the genetic diversity and evolution of HAstV. We wished to investigate the emergence of HAstV recombinant strains and to analyze the whole-genome sequences of the putative HAstV recombinant strains detected in pediatric patients with acute gastroenteritis in 2011 to 2020. We reported 3 novel intergenotype recombinant strains of HAstV5-HAstV8-HAstV1 at the ORF1a-ORF1b-ORF2 regions of the HAstV genome. The hot spots of recombination occur frequently near the ORF1a-ORF1b and ORF1b-ORF2 junctions of the HAstV genome. The findings indicate that intergenotype recombination of HAstV occurs frequently in nature. The emergence of a novel recombinant strain allows the new virus to adapt and successfully escape from the host immune system, eventually emerging as the predominant genotype to infect human populations that lack herd immunity against novel recombinant strains. The virus may cause an outbreak and needs to be monitored continually.

Project description:Translation of an mRNA in eukaryotes starts at AUG in most cases. Near-cognate codons (NCCs) such as UUG, ACG and AUU are also used as start sites at low levels in S. cerevisiae. Initiation from NCCs or AUGs in the 5’-untranslated regions (UTRs) of mRNAs can lead to translation of upstream open reading frames (uORFs) that might regulate expression of the main ORF (mORF). Although there is some circumstantial evidence that the translation of uORFs can be affected by environmental conditions, little is known about how it is affected by changes in growth temperature. Using reporter assays, we found that changes in growth temperature can affect translation from NCC start sites in yeast cells, suggesting the possibility that gene expression could be regulated by temperature by altering use of different uORF start codons. Using ribosome profiling, we provide evidence that growth temperature regulates the efficiency of translation of nearly 200 uORFs in S. cerevisiae. Of these uORFs, most that start with an AUG codon have increased translational efficiency at 37 ˚C relative to 30 ˚C and decreased efficiency at 20 ˚C. For translationally regulated uORFs starting with NCCs, we did not observe a general trend for the direction of regulation as a function of temperature, suggesting mRNA-specific features can determine the mode of temperature-dependent regulation. Consistent with this conclusion, the position of the uORFs in the 5’-leader relative to the 5’-cap and the start codon of the main ORF correlates with the direction of temperature-dependent regulation of uORF translation. We have identified several novel cases in which changes in uORF translation are inversely correlated with changes in the translational efficiency of the downstream main ORF. Our data suggest that translation of these mRNAs is subject to temperature-dependent, uORF-mediated regulation. 
Overall, our data suggest that alterations in the translation of specific uORFs by temperature can regulate gene expression in S. cerevisiae.
