Dataset Information

KmerGO: A Tool to Identify Group-Specific Sequences With k-mers.

ABSTRACT: Capturing group-specific sequences between two groups of genomic/metagenomic sequences is critical for the follow-up identification of single nucleotide variants (SNVs), gene families, microbial species, or other elements associated with each group. In our study, a sequence that is present (or abundant) in one group but absent (or scarce) in the other group is considered a "group-specific" sequence. We developed a user-friendly tool, KmerGO, to identify group-specific sequences between two groups of genomic/metagenomic long sequences or high-throughput sequencing datasets. Compared with other tools, KmerGO captures group-specific k-mers (k up to 40 bp) with much lower computing-resource requirements and much shorter running times. For a 1.05 TB dataset (.fasta), KmerGO takes about 21.5 h (including k-mer counting) to return assembled group-specific sequences on a regular stand-alone workstation with no more than 1 GB of memory. KmerGO can also be applied to capture sequences associated with continuous traits. KmerGO uses multi-process parallel computing and provides both a graphical user interface and a command-line interface on Linux and Windows, without requiring any pre-installed supporting environments, packages, or complex configuration. The group-specific k-mers or sequences output by KmerGO can serve as inputs to other tools for the downstream discovery of biomarkers, such as genetic variants, species, or genes. KmerGO is available at https://github.com/ChnMasterOG/KmerGO.
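To make the "present in one group, absent in the other" definition concrete, the following is a minimal Python sketch of presence-based group-specific k-mer selection. It is not KmerGO's implementation and does not reproduce its k-mer counting, statistical filtering, or sequence assembly. The function names and the thresholds `min_in` and `max_out` are hypothetical, k-mers are counted on the forward strand only (no canonical/reverse-complement merging), and only presence/absence is considered, not the "rich vs. scarce" abundance comparison.

```python
from collections import Counter
from typing import List, Set, Tuple

VALID_BASES = frozenset("ACGT")


def count_kmers(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq (forward strand only),
    skipping k-mers that contain non-ACGT characters such as N."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def presence_fraction(samples: List[Counter], kmer: str, min_count: int = 1) -> float:
    """Fraction of samples in which kmer occurs at least min_count times.
    Counter returns 0 for missing keys, so absent k-mers count as 0."""
    return sum(1 for c in samples if c[kmer] >= min_count) / len(samples)


def group_specific_kmers(
    group_a: List[Counter],
    group_b: List[Counter],
    min_in: float = 0.8,   # hypothetical: must be present in >= 80% of one group
    max_out: float = 0.2,  # hypothetical: and present in <= 20% of the other
) -> Tuple[Set[str], Set[str]]:
    """Return (A-specific, B-specific) k-mers under simple presence thresholds."""
    all_kmers: Set[str] = set()
    for counts in group_a + group_b:
        all_kmers.update(counts)
    specific_a: Set[str] = set()
    specific_b: Set[str] = set()
    for kmer in all_kmers:
        frac_a = presence_fraction(group_a, kmer)
        frac_b = presence_fraction(group_b, kmer)
        if frac_a >= min_in and frac_b <= max_out:
            specific_a.add(kmer)
        elif frac_b >= min_in and frac_a <= max_out:
            specific_b.add(kmer)
    return specific_a, specific_b


# Toy usage: two short "samples" per group, k = 4.
group_a = [count_kmers(s, 4) for s in ["ACGTACGTTT", "ACGTACGTTA"]]
group_b = [count_kmers(s, 4) for s in ["GGGCCCGGGC", "GGGCCCAAAC"]]
specific_a, specific_b = group_specific_kmers(group_a, group_b)
print(sorted(specific_a))
print(sorted(specific_b))
```

In this toy input, k-mers such as `GTTT` appear in only one of the two group-A samples, so they fail the `min_in` threshold and are not reported. A real tool would also need abundance-aware statistics, canonical k-mers, and disk-based counting to scale to terabyte datasets.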

SUBMITTER: Wang Y

PROVIDER: S-EPMC7477287 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

KmerGO: A Tool to Identify Group-Specific Sequences With k-mers.

Ying Wang, Qi Chen, Chao Deng, Yiluan Zheng, Fengzhu Sun

Frontiers in Microbiology, 25 August 2020

PMID: 32983048

Similar Datasets

VIRALpro: a tool to identify viral capsid and tail sequences.
| S-EPMC5860506 | biostudies-literature

Efficient association mapping from k-mers-An application in finding sex-specific sequences.
| S-EPMC7790365 | biostudies-literature

Gene Unprediction with Spurio: A tool to identify spurious protein sequences.
| S-EPMC5897793 | biostudies-literature

K-CLASP: A Tool to Identify Phosphosite Specific Kinases and Interacting Proteins.
| S-EPMC5481203 | biostudies-literature

Group-specific archaeological signatures of stone tool use in wild macaques.
| S-EPMC6805154 | biostudies-literature

An efficient strategy using k-mers to analyse 16S rRNA sequences.
| S-EPMC5537200 | biostudies-other

Kinase inhibition profiles as a tool to identify kinases for specific phosphorylation sites.
| S-EPMC7125195 | biostudies-literature

Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences.
| S-EPMC3831467 | biostudies-literature

MERS-CoV‒Specific T-Cell Responses in Camels after Single MVA-MERS-S Vaccination.
| S-EPMC10202854 | biostudies-literature