Dataset Information

Sequence database for the single-copy, nuclear-encoded, core photosynthetic gene psbO


ABSTRACT: We have compiled a psbO sequence database for use as a phytoplankton marker gene (see Pierella Karlusich et al. 2023, Molecular Ecology Resources, doi:10.1111/1755-0998.13592). psbO is present only in photosynthetic organisms (both cyanobacteria and eukaryotic phototrophs, in which it is nuclear-encoded), mainly in one copy per genome. The database contains >18,000 unique psbO sequences covering cyanobacteria, photosynthetic protists, macroalgae and land plants. It includes sequences retrieved from IMG, NCBI, MMETSP and other sequenced genomes and transcriptomes, as well as from the environmental sequence catalogs of Global Ocean Sampling and Tara Oceans. The taxonomic assignment of environmental psbO sequences was determined by placing their translated sequences on a PsbO protein reference phylogeny.
The reference phylogeny was built as follows. Sequences were retrieved by searching the translated sequenced genomes and transcriptomes from the literature and from the PhycoCosm, MMETSP and IMG databases with HMMER version 3.2.1 (http://hmmer.org/), using the gathering threshold option, for the corresponding Pfam domain (MSP; PF01716). The translated Pfam region of each sequence was extracted, and the redundancy of the dataset was reduced using CD-HIT version 4.6.4 (W. Li & Godzik, 2006) at an 80% identity cut-off. These translated sequences were then aligned with MAFFT version 6 using the G-INS-i strategy (Katoh & Toh, 2008). The reference phylogenetic tree was generated with PhyML version 3.0 using the LG substitution model plus gamma-distributed rates and four substitution rate categories (Guindon et al., 2010). The starting tree was a BIONJ tree and the type of tree improvement was subtree pruning and regrafting. Branch support was calculated using the approximate likelihood ratio test (aLRT) with a Shimodaira–Hasegawa-like (SH-like) procedure. Contaminant sequences were identified by phylogenetic incongruence and removed.
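The reference-phylogeny steps described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the submitters' actual commands: the file names, the `reference_tree_commands` helper and the working directory are hypothetical, the flags are the ones commonly documented for current releases of each tool and should be checked against the versions cited in the abstract, and estimating the gamma shape parameter (`-a e`) is an assumption the abstract does not state. The function only assembles argument lists and runs nothing.

```python
from pathlib import Path


def reference_tree_commands(proteins, hmm="PF01716.hmm", workdir="psbo_ref"):
    """Return one argument list per pipeline step; nothing is executed.

    All paths are hypothetical placeholders. Extracting the Pfam region
    from the hmmsearch hits and converting the MAFFT output to PHYLIP
    are separate steps that are not shown here.
    """
    out = Path(workdir)
    region = out / "msp_region.faa"      # translated Pfam region of each hit
    nr80 = out / "msp_region.nr80.faa"   # after redundancy reduction
    phylip = out / "msp_region.nr80.phy" # alignment converted for PhyML
    return {
        # Pfam gathering threshold instead of an E-value cut-off
        "hmmsearch": ["hmmsearch", "--cut_ga", "--domtblout",
                      str(out / "hits.domtbl"), hmm, proteins],
        # 80% identity; word size 5 is the documented choice for >= 0.7
        "cd-hit": ["cd-hit", "-i", str(region), "-o", str(nr80),
                   "-c", "0.8", "-n", "5"],
        # G-INS-i = global pairwise alignment + iterative refinement;
        # MAFFT writes the alignment to standard output
        "mafft": ["mafft", "--globalpair", "--maxiterate", "1000", str(nr80)],
        # LG model, 4 gamma categories, SPR search, SH-like aLRT support;
        # BIONJ is PhyML's default starting tree
        "phyml": ["phyml", "-i", str(phylip), "-d", "aa", "-m", "LG",
                  "-c", "4", "-a", "e", "-s", "SPR", "-b", "-4"],
    }


if __name__ == "__main__":
    for step, argv in reference_tree_commands("proteins.faa").items():
        print(step, " ".join(argv))
```

Keeping the commands as data makes it easy to inspect or log them before passing each list to `subprocess.run`.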
The corresponding curated final alignment was used as the reference. To parallelize the taxonomic annotation task, environmental sequences were processed in sets of 50: each set was translated and the PsbO-specific Pfam region (PF01716) was retrieved for the following analysis. First, the sequences were aligned against the reference alignment using the --add option of MAFFT version 6 with the G-INS-i strategy (Katoh and Toh 2008 Brief Bioinformatics 9:286-298). Second, the resulting alignment was used to build a phylogeny as described above. Finally, each sequence was classified according to its grouping in a monophyletic branch, with statistical support >0.7, together with reference sequences of the same taxonomic group.
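The batching and classification rule described above can be illustrated with a small, self-contained sketch. It is not the submitters' code: the `Node` class, the `batches` and `classify` helpers and the sequence names are hypothetical, and the tree is treated as rooted, which simplifies how monophyly would be assessed on a real PhyML tree. A query sequence receives a taxon only if it sits in a clade with support above 0.7 whose reference sequences all belong to one taxonomic group.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


def batches(items: List[str], size: int = 50) -> Iterator[List[str]]:
    """Split sequence identifiers into consecutive sets of `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


@dataclass
class Node:
    """Minimal rooted tree node; leaves have a name and no children."""
    name: Optional[str] = None
    support: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def classify(root: Node, ref_taxon: Dict[str, str],
             min_support: float = 0.7) -> Dict[str, str]:
    """Map each query leaf to the taxon of the smallest qualifying clade."""
    assigned: Dict[str, str] = {}

    def visit(node: Node) -> None:
        # Post-order traversal, so the smallest clades are examined first.
        for child in node.children:
            visit(child)
        if not node.children or node.support <= min_support:
            return
        members = node.leaves()
        taxa = {ref_taxon[m] for m in members if m in ref_taxon}
        if len(taxa) != 1:
            return  # no references, or references from several groups
        taxon = next(iter(taxa))
        for m in members:
            if m not in ref_taxon:
                assigned.setdefault(m, taxon)

    visit(root)
    return assigned


if __name__ == "__main__":
    tree = Node(children=[
        Node(support=0.9, children=[Node("ref_diatom"), Node("env_1")]),
        Node(support=0.5, children=[Node("ref_cyano"), Node("env_2")]),
    ])
    refs = {"ref_diatom": "diatoms", "ref_cyano": "cyanobacteria"}
    print(classify(tree, refs))
    print([len(b) for b in batches([f"env_{i}" for i in range(120)])])
```

In the toy tree, `env_2` stays unclassified because its clade has support 0.5, below the 0.7 threshold.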

ORGANISM(S): cyanobacteria phytoplankton algae land plants

SUBMITTER:  

PROVIDER: S-BSST659 | biostudies-other

REPOSITORIES: biostudies-other

Similar Datasets

| S-EPMC29379 | biostudies-literature
| S-EPMC3351400 | biostudies-literature
| S-EPMC360649 | biostudies-other
| S-EPMC3416576 | biostudies-literature
| S-EPMC1688541 | biostudies-other
| S-EPMC3792871 | biostudies-literature
| S-EPMC6858290 | biostudies-literature
| S-EPMC3486802 | biostudies-literature
| S-EPMC5991577 | biostudies-literature
| S-EPMC6283280 | biostudies-literature