Dataset Information

EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome.


ABSTRACT: Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12,063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15,857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.

SUBMITTER: Jain M 

PROVIDER: S-EPMC1874618 | biostudies-other | 2007

REPOSITORIES: biostudies-other

Publications

EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome.

Monica Jain, Jeff Shrager, Elizabeth H. Harris, Renee Halbrook, Arthur R. Grossman, Charles Hauser, Olivier Vallon

Nucleic Acids Research, 2007-03-13, issue 6


Similar Datasets

Date | Accession | Repository
- | S-EPMC7484070 | biostudies-literature
- | S-EPMC2830987 | biostudies-literature
- | S-EPMC4090777 | biostudies-literature
- | S-EPMC4549274 | biostudies-literature
2017-11-27 | GSE101944 | GEO
- | S-EPMC6274750 | biostudies-literature
- | S-EPMC2390635 | biostudies-literature
2013-05-09 | GSE43004 | GEO
- | S-EPMC154841 | biostudies-literature
- | S-EPMC7887394 | biostudies-literature