Dataset Information

Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence.


ABSTRACT: During the determination of a DNA sequence, the introduction of artifactual frameshifts and/or in-frame stop codons in putative genes can lead to misprediction of gene products. Detection of such errors with a method based on protein similarity matching is only possible when related sequences are available in databases. Here, we present a method to detect frameshift errors in DNA sequences that is based on the intrinsic properties of the coding sequences. It combines the results of two analyses, the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze in detail the nature of the errors. Interestingly, in several cases in-frame termination codons or frameshifts were not sequencing errors but confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifts. The method can be used for checking the quality of the sequences produced by any prokaryotic genome sequencing project.
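As a rough illustration of one ingredient named in the abstract (the search for termination sites in each reading frame), the Python sketch below lists the stop codons seen in a given frame of a DNA string. It is not the authors' method, which also uses initiation sites and coding-region prediction; the function name `in_frame_stops`, the use of the standard stop codons TAA, TAG, and TGA, and the toy sequences are assumptions made for this example only.

```python
# Minimal sketch (hypothetical helper, not the published method):
# locate stop codons in one reading frame of a DNA sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons, assumed here


def in_frame_stops(seq, frame=0):
    """Return 0-based start positions of stop codons read in `frame` (0, 1, or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = seq.upper()
    # Step through the sequence codon by codon, starting at the frame offset.
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Toy gene: ATG AAA CCC GGG TTT TAA (stop codon at position 15 in frame 0).
gene = "ATGAAACCCGGGTTTTAA"

# Same toy gene with one extra C inserted, mimicking a frameshift error.
shifted = "ATGAAACCCCGGGTTTTAA"

print(in_frame_stops(gene, 0))     # stop found in the original frame
print(in_frame_stops(shifted, 0))  # frame 0 no longer ends in a stop
print(in_frame_stops(shifted, 1))  # the terminal stop now appears in frame 1
```

In the toy case, the single inserted base moves the terminal stop codon from frame 0 to frame 1. A real screen would combine this kind of signal with coding-potential evidence before flagging a region for resequencing, as the abstract describes.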

SUBMITTER: Medigue C 

PROVIDER: S-EPMC310837 | biostudies-literature | 1999 Nov

REPOSITORIES: biostudies-literature

Publications

Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence.

Médigue C, Rose M, Viari A, Danchin A

Genome Research, 1999-11-01, issue 11


Similar Datasets

| S-EPMC3735065 | biostudies-literature
| S-EPMC4357751 | biostudies-literature
| S-EPMC4966479 | biostudies-literature
2015-01-26 | GSE65272 | GEO
| S-EPMC7409865 | biostudies-literature
| S-EPMC7953295 | biostudies-literature
| S-EPMC4424299 | biostudies-literature
| S-EPMC3868870 | biostudies-literature
2021-12-31 | GSE166082 | GEO
| S-EPMC113120 | biostudies-literature