Dataset Information

A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank.


ABSTRACT: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated. Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative. These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate.
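
The final redundancy-reduction step can be illustrated with a minimal sketch. This is not the authors' program: it assumes the sequences are already aligned to equal length, measures identity as the fraction of matching columns, and uses a simple greedy rule in which the first sequence seen in each group becomes its representative. The function names `percent_identity` and `cluster_by_identity` are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned sequences.

    Assumption: both sequences come from the same alignment, so they
    have equal length and column i of one corresponds to column i of the other.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def cluster_by_identity(
    seqs: Dict[str, str], threshold: float = 0.95
) -> Dict[str, List[str]]:
    """Greedily group sequences whose identity to a representative is >= threshold.

    Returns a mapping from each representative's name to the names of all
    members of its group (the representative is listed first).
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if percent_identity(seqs[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new group.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    base = "ACGT" * 10
    toy = {
        "intron_a": base,
        "intron_b": "T" + base[1:],  # one substitution in 40 columns
        "intron_c": "T" * 40,        # clearly divergent
    }
    for rep, members in cluster_by_identity(toy).items():
        print(rep, members)
```

With this greedy rule the result depends on input order, so a real implementation would likely need a deliberate policy for choosing representatives, for example preferring full-length copies.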

SUBMITTER: Abebe M 

PROVIDER: S-EPMC4028801 | biostudies-other | 2013 Dec

REPOSITORIES: biostudies-other

Publications

A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank.

Abebe M, Candales MA, Duong A, Hood KS, Li T, Neufeld RAE, Shakenov A, Sun R, Wu L, Jarding AM, Semper C, Zimmerly S

Mobile DNA, 2013-12-20, issue 1

