Dataset Information

Predicting the host of influenza viruses based on the word vector.

ABSTRACT: Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using the Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the word within the word vector, the sequence type (DNA or protein) and the species from which the sequences were derived for generating the word vector all influence the performance of models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than those built on internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene using word vectors (words three letters long) generated from DNA sequences of the influenza virus. This model achieved accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared with sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), the word vector-based models still need further improvement in predicting the host of influenza A viruses.
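The word-vector representation starts by splitting each sequence into short words (three letters long in the best-performing model). The sketch below is illustrative only and assumes overlapping 3-letter words and a sequence-level feature computed as the mean of its word vectors; the abstract does not state how word vectors are combined. The embedding table `toy_vectors` and the helper names `split_into_words` and `sequence_vector` are hypothetical. In practice the vectors would be learned (for example with a word2vec-style model) and the resulting features passed to an SVM.

```python
from typing import Dict, List, Sequence


def split_into_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping words (k-mers) of length k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq: str, vectors: Dict[str, Sequence[float]], k: int = 3) -> List[float]:
    """Average the vectors of all words found in the embedding table.

    Assumption (not from the source): words missing from the table are
    skipped, and a zero vector is returned if no word is found.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    count = 0
    for word in split_into_words(seq, k):
        vec = vectors.get(word)
        if vec is None:
            continue  # unknown word: ignore it
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    if count == 0:
        return total
    return [x / count for x in total]


# Hypothetical 2-dimensional word vectors, for illustration only.
toy_vectors = {
    "ATG": [1.0, 0.0],
    "TGA": [0.0, 1.0],
    "GAC": [1.0, 1.0],
}

print(split_into_words("ATGAC"))              # ['ATG', 'TGA', 'GAC']
print(sequence_vector("ATGAC", toy_vectors))  # [0.6666666666666666, 0.6666666666666666]
```

Averaging gives every sequence a fixed-length feature vector regardless of sequence length, which is what a standard SVM expects as input.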

SUBMITTER: Xu B

PROVIDER: S-EPMC5518728 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

Predicting the host of influenza viruses based on the word vector.

Xu Beibei, Tan Zhiying, Li Kenli, Jiang Taijiao, Peng Yousong

PeerJ, 2017-07-18

PMID: 28729956

Similar Datasets

Project description: Influenza B virus (IBV) undergoes seasonal antigenic drift more slowly than influenza A virus, but the reasons for this difference are unclear. While the evolutionary dynamics of influenza viruses play out globally, they are fundamentally driven by mutation, reassortment, drift, and selection at the level of individual hosts. These processes have recently been described for influenza A virus, but little is known about the evolutionary dynamics of IBV during individual infections and transmission events. Here, we define the within-host evolutionary dynamics of IBV by sequencing virus populations from naturally infected individuals enrolled in a prospective, community-based cohort over 8,176 person-seasons of observation. Through analysis of high depth-of-coverage sequencing data from samples from 91 individuals with influenza B, we find that IBV accumulates lower genetic diversity than previously observed for influenza A virus during acute infections. Consistent with studies of influenza A viruses, the within-host evolution of IBVs is characterized by purifying selection and the general absence of widespread positive selection of within-host variants. Analysis of shared genetic diversity across 15 sequence-validated transmission pairs suggests that IBV experiences a tight transmission bottleneck similar to that of influenza A virus. These patterns of local-scale evolution are consistent with the lower global evolutionary rate of IBV.

IMPORTANCE: The evolution of influenza virus is a significant public health problem and necessitates the annual evaluation of influenza vaccine formulation to keep pace with viral escape from herd immunity. Influenza B virus is a serious health concern, particularly for children, yet remains understudied compared to influenza A virus. Influenza B virus evolves more slowly than influenza A virus, but the factors underlying this are not completely understood. We studied how the within-host diversity of influenza B virus relates to its global evolution by sequencing viruses from a community-based cohort. We found that influenza B virus populations have lower within-host genetic diversity than influenza A virus and experience a tight genetic bottleneck during transmission. Our work provides insights into the varying dynamics of influenza viruses in human infection.

