Dataset Information

SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence.


ABSTRACT: SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored for whole-genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative that already makes it possible to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from a single individual. Using pooled human RNA-seq data, we assess the precision and recall of our method and discuss them with respect to other methods that use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. It also makes it possible to test for association between the identified SNPs and a phenotype of interest.
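
The abstract mentions quantifying SNPs in pooled samples and testing their association with a phenotype. As a rough illustration only, assuming each SNP is summarised by the number of reads supporting each allele in each of two conditions, the sketch below computes an allele frequency and compares two conditions with a two-sided Fisher's exact test. This is a swapped-in, simplified test (it ignores biological replicates and overdispersion) and is not the statistical model used in the paper; the function names are hypothetical.

```python
from math import comb


def allele_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads supporting the alternative allele at one SNP."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return alt_count / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. two phenotypes), columns are read counts
    for the reference and alternative alleles (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x reference-allele reads in condition 1,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Keep every table at least as extreme (no more probable) as the observed one;
        # the small relative tolerance absorbs floating-point ties.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical read counts for one SNP in two pooled conditions.
    print(allele_frequency(40, 10), allele_frequency(15, 35))
    print(fisher_exact_two_sided(40, 10, 15, 35))
```

A count-based test is used here only because it needs nothing beyond the standard library; a replicate-aware model would be preferable for real pooled RNA-seq data, where read counts also reflect expression levels.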

SUBMITTER: Lopez-Maestre H 

PROVIDER: S-EPMC5100560 | biostudies-literature | 2016 Nov

REPOSITORIES: biostudies-literature

Publications

SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence.

Hélène Lopez-Maestre, Lilia Brinza, Camille Marchet, Janice Kielbassa, Sylvère Bastien, Mathilde Boutigny, David Monnin, Adil El Filali, Claudia Marcia Carareto, Cristina Vieira, Franck Picard, Natacha Kremer, Fabrice Vavre, Marie-France Sagot, Vincent Lacroix

Nucleic Acids Research (2016-07-25), issue 19

Similar Datasets

S-EPMC3163565 | biostudies-literature
S-EPMC3891352 | biostudies-literature
S-EPMC4673977 | biostudies-literature
S-EPMC4071332 | biostudies-literature
S-EPMC3491382 | biostudies-literature
S-EPMC4103496 | biostudies-literature
S-EPMC7059966 | biostudies-literature
S-EPMC6523396 | biostudies-literature
S-EPMC3571712 | biostudies-literature
S-EPMC4985018 | biostudies-literature