Dataset Information

GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography.


ABSTRACT:

Summary

We present GeoBoost2, a natural language processing pipeline that extracts the location of infected hosts to enrich metadata in nucleotide sequence repositories such as the National Center for Biotechnology Information's GenBank, supporting downstream analyses including phylogeography and genomic epidemiology. The increasing number of pathogen sequences requires complementary information extraction methods for focused research, including surveillance within countries and across borders. In this article, we describe the enhancements over our earlier release, including improved end-to-end extraction performance and speed, a fully functional web interface, and state-of-the-art deep learning methods for location extraction.

Availability and implementation

The application is freely available on the web at https://zodo.asu.edu/geoboost2. Source code, usage examples and annotated data for GeoBoost2 are freely available at https://github.com/ZooPhy/geoboost2.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Magge A 

PROVIDER: S-EPMC7755405 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature

Publications

GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography.

Arjun Magge, Davy Weissenbacher, Karen O'Connor, Tasnia Tahsin, Graciela Gonzalez-Hernandez, Matthew Scotch

Bioinformatics (Oxford, England), 2020-12-01, issue 20


Similar Datasets

| S-EPMC5925778 | biostudies-literature
| S-EPMC6225896 | biostudies-literature
| S-EPMC4997033 | biostudies-literature
| S-EPMC2275786 | biostudies-literature
| S-EPMC6343335 | biostudies-literature
| S-EPMC6027284 | biostudies-literature
| S-EPMC8108552 | biostudies-literature
| S-EPMC4028801 | biostudies-literature
| S-EPMC10558085 | biostudies-literature
| S-EPMC10845048 | biostudies-literature