Dataset Information

Characterization of 954 bovine full-CDS cDNA sequences.

ABSTRACT: BACKGROUND: Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Functional genomics studies also rely on transcript sequence to create expression microarrays or interpret digital tag data produced by methods such as Serial Analysis of Gene Expression (SAGE). Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. The most useful transcript sequences are derived by complete insert sequencing of clones containing the entire length, or at least the full protein coding sequence (CDS) portion, of the source mRNA. While the bovine genome sequencing initiative is nearing completion, there is currently a paucity of bovine full-CDS mRNA and protein sequence data to support bovine genome assembly and functional genomics studies. Consequently, the production of high-quality bovine full-CDS cDNA sequences will enhance the bovine genome assembly and functional studies of bovine genes and gene products. The goal of this investigation was to identify and characterize the full-CDS sequences of bovine transcripts from clones identified in non-full-length enriched cDNA libraries. In contrast to several recent full-length cDNA investigations, these full-CDS cDNAs were selected, sequenced, and annotated without the benefit of the target organism's genomic sequence, by comparing bovine EST sequences with existing human mRNA sequences to identify likely full-CDS clones for full-length insert cDNA (FLIC) sequencing.
RESULTS: The predicted bovine protein lengths, 5' UTR lengths, and Kozak consensus sequences from 954 bovine FLIC sequences (bFLICs; average length 1713 nt, representing 762 distinct loci) are all consistent with previously sequenced mammalian full-length transcripts. CONCLUSION: In most cases, the bFLICs span the entire CDS of the genes, providing the basis for creating predicted bovine protein sequences to support proteomics and comparative evolutionary research as well as functional genomics and genome annotation. The results demonstrate the utility of the comparative approach in obtaining predicted protein sequences in other species.
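The RESULTS characterize each bFLIC by predicted protein length, 5' UTR length and Kozak context. The Python sketch below shows one simple way such per-clone summaries could be computed; it is not the authors' pipeline. It assumes that the CDS is the longest ATG-initiated ORF on the sense strand and grades Kozak context only at the two positions usually considered most important (a purine at -3 and G at +4, counting the A of ATG as +1). The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG..stop ORF on the given strand.

    `end` is exclusive and includes the stop codon. ORFs that run off the
    3' end without a stop codon are ignored in this sketch.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # first ATG seen since the last stop codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def kozak_context(seq: str, atg: int) -> str:
    """Grade the start codon at index `atg` by the -3 purine and +4 G rules."""
    seq = seq.upper()
    purine_at_minus3 = atg >= 3 and seq[atg - 3] in "AG"
    g_at_plus4 = atg + 3 < len(seq) and seq[atg + 3] == "G"
    return ("weak", "adequate", "strong")[purine_at_minus3 + g_at_plus4]


def summarize_clone(seq: str) -> Optional[Dict[str, object]]:
    """5' UTR length, protein length (stop excluded), 3' UTR length and Kozak grade."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    start, end = orf
    return {
        "utr5_len": start,
        "protein_len": (end - start) // 3 - 1,
        "utr3_len": len(seq) - end,
        "kozak": kozak_context(seq, start),
    }
```

For example, `summarize_clone("GCCACCATGGCTTAAGG")` reports a 6-nt 5' UTR, a 2-residue protein, a 2-nt 3' UTR and a "strong" Kozak context. The longest-ORF rule is only a heuristic; checking the predicted protein against a homologous (e.g. human) protein, in the spirit of the comparative approach described above, is a more reliable test of whether a clone is full-CDS.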

SUBMITTER: Harhay GP

PROVIDER: S-EPMC1314900 | biostudies-literature | 2005

REPOSITORIES: biostudies-literature

Publications

Characterization of 954 bovine full-CDS cDNA sequences.

Gregory P. Harhay, Tad S. Sonstegard, John W. Keele, Michael P. Heaton, Michael L. Clawson, Warren M. Snelling, Ralph T. Wiedmann, Curt P. Van Tassell, Timothy P. L. Smith

BMC Genomics, 2005-11-23

PMID: 16305752

Similar Datasets

Project description:Historically part of the coronavirus (CoV) family, torovirus (ToV) was recently classified in the new family Tobaniviridae. While reverse genetics systems have been established for various CoVs, none exist for ToVs. Here, we developed a reverse genetics system using an infectious full-length cDNA clone of bovine ToV (BToV) in a bacterial artificial chromosome (BAC). Recombinant BToV harboring genetic markers had the same phenotype as wild-type (wt) BToV. To generate two types of recombinant virus, the hemagglutinin-esterase (HE) gene was edited, as cell-adapted wtBToV generally loses full-length HE (HEf), resulting in soluble HE (HEs). First, recombinant viruses with HEf and hemagglutinin (HA)-tagged HEf or HEs genes were rescued. These exhibited no significant differences in their effect on virus growth in HRT18 cells, suggesting that HE is not essential for viral replication in these cells. Thereafter, we generated a recombinant virus (rEGFP) in which HE was replaced by the enhanced green fluorescent protein (EGFP) gene. rEGFP expressed EGFP in infected cells but showed significantly lower levels of viral growth than wtBToV. Moreover, rEGFP readily deleted the EGFP gene after one passage. Interestingly, rEGFP variants with two mutations (C1442F and I3562T) in nonstructural proteins (NSPs) that emerged during passage exhibited improved EGFP expression, EGFP gene retention, and viral replication. An rEGFP into which both mutations were introduced displayed a phenotype similar to that of these variants, suggesting that the mutations contributed to EGFP gene acceptance. The current findings provide new insights into BToV, and reverse genetics will help advance the current understanding of this neglected pathogen. IMPORTANCE: ToVs are diarrhea-causing pathogens detected in various species, including humans. Through the development of a BAC-based BToV, we introduced the first reverse genetics system for Tobaniviridae.
Utilizing this system, recombinant BToVs with a full-length HE gene were generated. Remarkably, although clinical BToVs generally lose the HE gene after a few passages, some recombinant viruses generated in the current study retained the HE gene for up to 20 passages while accumulating mutations in NSPs, which suggested that these mutations may be involved in HE gene retention. The EGFP gene of recombinant viruses was unstable, but rEGFP into which two NSP mutations were introduced exhibited improved EGFP expression, gene retention, and viral replication. These data suggested the existence of an NSP-based acceptance or retention mechanism for exogenous RNA or HE genes. Recombinant BToVs and reverse genetics are powerful tools for understanding fundamental viral processes, pathogenesis, and BToV vaccine development.

Project description:Eimeria tenella is an apicomplexan parasite that causes coccidiosis in the domestic fowl. Infection with this parasite is diagnosed frequently in intensively reared poultry and its control is usually accorded a high priority, especially in chickens raised for meat. Prophylactic chemotherapy has been the primary method used for the control of coccidiosis. However, drug efficacy can be compromised by drug-resistant parasites, and the lack of new drugs highlights the need for alternative control strategies, including vaccination. In the long term, sustainable control of coccidiosis will most likely be achieved through integrated drug and vaccination programmes. Characterisation of the E. tenella transcriptome may provide a better understanding of the biology of the parasite and aid in the development of more effective control of coccidiosis. More than 15,000 partial sequences were generated from the 5' and 3' ends of clones randomly selected from an E. tenella second-generation merozoite full-length cDNA library. Clustering of these sequences produced 1,529 unique transcripts (UTs). Based on the transcript assembly and subsequent primer walking, 433 full-length cDNA sequences were successfully generated. These sequences ranged in length from 441 bp to 3,083 bp, with an average size of 1,647 bp. Simple sequence repeat (SSR) analysis identified CAG as the most abundant trinucleotide motif, while codon usage analysis revealed that the ten least frequently used codons in E. tenella are UAU, UGU, GUA, CAU, AUA, CGA, UUA, CUA, CGU and AGU. Subsequent analysis of the E. tenella complete coding sequences identified 25 putative secretory and 60 putative surface proteins, all of which are now rational candidates for development as recombinant vaccines or drug targets in the effort to control avian coccidiosis. This paper describes the generation and characterisation of full-length cDNA sequences from E. tenella second-generation merozoites and provides new insights into the E. tenella transcriptome. The data generated will be useful for the development and validation of diagnostic and control strategies for coccidiosis and will be of value in annotation of the E. tenella genome sequence.
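The E. tenella description mentions a codon usage analysis and a simple sequence repeat (SSR) analysis. The Python sketch below shows one straightforward way to tally both; it is an illustration, not the study's method. It assumes that codons are ranked by raw pooled counts (rather than a normalized measure such as RSCU) and that a trinucleotide SSR is a 3-nt unit repeated at least four times; these thresholds and the helper names are hypothetical. Input may use RNA letters, but results are reported in the DNA alphabet.

```python
import re
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
# The 61 sense codons, in alphabetical order.
SENSE_CODONS = ["".join(c) for c in product("ACGT", repeat=3) if "".join(c) not in STOPS]


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons of one CDS; a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if set(c) <= set("ACGT"))


def rarest_codons(cds_list, n=10):
    """Return the n sense codons with the lowest pooled counts (ties broken alphabetically)."""
    pooled = Counter()
    for cds in cds_list:
        pooled.update(codon_usage(cds))
    return sorted(SENSE_CODONS, key=lambda c: (pooled[c], c))[:n]


# A 3-nt unit followed by at least three more copies of itself (>= 4 units in total).
TRI_SSR = re.compile(r"([ACGT]{3})\1{3,}")


def trinucleotide_ssrs(seq: str) -> Counter:
    """Count trinucleotide SSR tracts by repeat unit, skipping homopolymer runs."""
    hits = Counter()
    for m in TRI_SSR.finditer(seq.upper().replace("U", "T")):
        unit = m.group(1)
        if len(set(unit)) > 1:  # e.g. AAAAAAAAAAAA is a mononucleotide run, not a CAG-type SSR
            hits[unit] += 1
    return hits
```

For example, `trinucleotide_ssrs("TTCAGCAGCAGCAGTT")` finds one CAG tract. The unit is reported in whatever phase the regex first matches, so the same tract could appear as AGC or GCA in another context; dedicated SSR tools usually collapse such rotations (and reverse complements) into a single motif class before counting.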

Project description:BACKGROUND: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large number of highly similar duplicated sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicated genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue-specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences for which the full length of the cDNA insert has been determined is small. RESULTS: High-quality full-length insert sequences from 560 pre-smolt white muscle tissue-specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, some of which were identical to miRNA target sequences. CONCLUSION: This paper describes the first Atlantic salmon FLIcs from a tissue- and developmental-stage-specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS).
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well.
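The salmon study reports 7-mers that recur in 3'UTRs, some identical to miRNA target sequences. A minimal Python sketch of that idea follows, under stated assumptions: "recurrence" is approximated here by counting how many UTRs in one set contain each 7-mer (the study's own definition of conservation may differ), and a miRNA target is represented by the canonical 7mer-m8 site, i.e. the reverse complement of miRNA nucleotides 2-8. All function names are hypothetical.

```python
from collections import Counter

# RNA complement table used to reverse-complement a miRNA seed.
_COMP = str.maketrans("ACGU", "UGCA")


def kmer_doc_freq(utrs, k=7):
    """For each k-mer, count how many UTRs contain it at least once (DNA alphabet)."""
    df = Counter()
    for utr in utrs:
        s = utr.upper().replace("U", "T")
        df.update({s[i:i + k] for i in range(len(s) - k + 1)})  # a set: count once per UTR
    return df


def recurrent_kmers(utrs, min_utrs, k=7):
    """Sorted k-mers present in at least `min_utrs` different UTRs."""
    df = kmer_doc_freq(utrs, k)
    return sorted(kmer for kmer, n in df.items() if n >= min_utrs)


def seed_site_7mer_m8(mirna: str) -> str:
    """Target-site 7-mer (DNA letters) pairing with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(_COMP)[::-1].replace("U", "T")
```

For example, the let-7a sequence `UGAGGUAGUAGGUUGUAUAGUU` gives the site `CTACCTC`, and one can then test whether that site is among `recurrent_kmers(utrs, min_utrs)` for a set of 3'UTRs. A real analysis would also need a background model, since short k-mers recur by chance in large UTR sets.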
