Dataset Information

Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species.

ABSTRACT: A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs using experimental data from closely related model organisms. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. We first establish the rationale of our method by applying it within a single species, and then extend it to use data available in closely related species. Finally, we generalise the method to the case in which a number of experiments are available from several species close to the target species on which inference is made. To show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies relate to the G2/M phase of the Ascomycota cell cycle; the third relates to morphogenesis. We also compare the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, in which the gene network size is conserved, demonstrate the method's usefulness for annotating new species sequences using all the available replicates from model species. The third case, in which the gene network size varies among species, shows that combining information is less powerful but still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.
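The abstract does not give the model equations, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It assumes that expression is regressed linearly on motif (k-mer) counts in promoters, that the prior on the motif effects is Gaussian with a known noise variance, and that motif effects are shared across species. Each experiment's posterior becomes the prior for the next, and the final posterior is used to score promoters of a new species. The function names (`count_kmer`, `design_matrix`, `bayes_update`), the motifs, the sequences and the coefficient values are all hypothetical and for illustration only.

```python
import numpy as np

def count_kmer(seq: str, kmer: str) -> int:
    """Number of (possibly overlapping) occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)

def design_matrix(promoters, kmers):
    """Rows = genes; columns = intercept + one count per candidate motif."""
    counts = np.array([[count_kmer(s, m) for m in kmers] for s in promoters], dtype=float)
    return np.hstack([np.ones((len(promoters), 1)), counts])

def bayes_update(X, y, prior_mean, prior_prec, noise_var=1.0):
    """Conjugate Gaussian update for y = X @ beta + eps, eps ~ N(0, noise_var * I),
    beta ~ N(prior_mean, inv(prior_prec)). Returns posterior (mean, precision)."""
    post_prec = prior_prec + X.T @ X / noise_var
    rhs = prior_prec @ prior_mean + X.T @ y / noise_var
    post_mean = np.linalg.solve(post_prec, rhs)
    return post_mean, post_prec

# Hypothetical candidate motifs and model-species promoters (illustrative only).
kmers = ["ACGCGT", "CCCGGG"]
promoters = ["ACGCGTACGCGT", "CCCGGGAAAA", "ACGCGTCCCGGG", "TTTTTTTT"]
X = design_matrix(promoters, kmers)

# Synthetic motif effects and two noise-free replicate experiments (not real data).
beta_true = np.array([0.1, 1.5, -0.5])
replicates = [X @ beta_true, X @ beta_true]

# Start from a vague prior and absorb each replicate in turn.
mean, prec = np.zeros(3), 1e-3 * np.eye(3)
for y in replicates:
    mean, prec = bayes_update(X, y, mean, prec)

# Use the model-species posterior to score promoters of a (hypothetical) new species.
new_promoters = ["ACGCGTTTTT", "GGGCCCGGG"]
scores = design_matrix(new_promoters, kmers) @ mean
print(scores)
```

With a conjugate prior, absorbing replicates one at a time gives the same posterior as a single update on all the stacked data. This is one simple way to pool replicates from several model species. A realistic method would also have to handle orthology mapping and species-specific effects, which this sketch ignores.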

SUBMITTER: Lio P

PROVIDER: S-EPMC3439465 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species.

Liò P, Angelini C, De Feis I, Nguyen VA

PLoS ONE, 2012 Sep 11; issue 9

PMID: 22984403

Similar Datasets

Project description: To accumulate evidence on the phylogeny of Mileewinae and the relationships among the genera Mileewa, Processina, and Ujna, we sequenced the complete mitochondrial genomes of four Mileewa species, namely Mileewa mira, Mileewa lamellata, Mileewa sharpa, and Mileewa amplimacula. The first complete mitogenome of the genus Processina (P. sexmaculata), a genus established by Yang, Deitz & Li from China and comprising five species, was also sequenced in this study. Annotation showed that the five mitogenomes ranged from 14,787 to 15,436 bp in length and all harbored the 37 typical genes. The AT content of the five mitogenomes ranged from 78.3% to 80.2%, similar to that of other sequenced Mileewinae species. ATN was the start codon for most protein-coding genes (PCGs), whereas atp8 and nad5 were initiated with TTG; the great majority of PCGs used TAA or TAG as stop codons, whereas cox2 and nad1 ended with an incomplete codon T-. All tRNAs had a typical cloverleaf secondary structure, except for trnS1, which had a reduced dihydrouridine arm. We further used 59 Membracoidea species and two outgroups to reconstruct phylogenetic trees based on 13 PCGs under an independent partition model with Bayesian inference (BI) and maximum-likelihood (ML) methods. In both trees, each of the subfamilies Cicadellinae, Typhlocybinae, and Mileewinae was recovered as a monophyletic group with high support values, suggesting that Typhlocybinae was more ancient than Mileewinae and Cicadellinae. Within Mileewinae, all species showed the same relationships and topology in both the BI and ML analyses (PP > 0.8, BS > 83): (M. sharpa + (U. puerana + ((M. ponta + (M. mira + M. lamellata)) + ((M. albovittata + (M. margheritae + M. amplimacula)) + (M. rufivena + (P. sexmaculata + M. alara)))))), and the monophyly of the genera Processina, Mileewa, and Ujna was not supported.
This study further enriches the Mileewinae mitogenome database and will contribute to future research on the systematics, evolution, and classification of this group.
