Dataset Information

GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity Database.

ABSTRACT:

Background

The scientific biodiversity community increasingly perceives the need to build a bridge between molecular and traditional biodiversity studies. We believe that information technology could play a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work aims primarily to build a bioinformatics infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes the sequence and annotation data contained in the GenBank primary database according to an ontology and stores them locally.

Methods

The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema (the Molecular Biodiversity Database) is designed to manage biodiversity information and is organized into four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by the Chado relational schema, an established standard in Generic Model Organism Databases. The distinguishing feature of Chado, and also its strength, is the adoption of an ontological schema that makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software that parses data, discovers hidden information in GenBank entries and populates the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, which parses GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. GIDL adopts Semantic Web technologies for their advantages in data representation, integration and processing.
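The three-module flow described above (Parser, Reasoner, DBFiller) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy record, the function names `parse`, `reason` and `fill`, the containment rule, and the two-table schema are hypothetical and are not taken from GIDL, which builds CLIPS facts rather than Python tuples.

```python
import re
import sqlite3

# Hypothetical, heavily trimmed GenBank-style record (not a real entry).
RECORD = """\
LOCUS       TOY0001                 120 bp    DNA     linear   PLN 01-JAN-2000
FEATURES             Location/Qualifiers
     gene            1..120
                     /gene="toyA"
     CDS             10..99
                     /gene="toyA"
ORIGIN
//
"""

def parse(record):
    """Parser step: return the locus name and (key, start, end) features."""
    locus = record.splitlines()[0].split()[1]
    features = [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in re.finditer(r"^ {5}(\w+)\s+(\d+)\.\.(\d+)$", record, re.MULTILINE)
    ]
    return locus, features

# Assumed mapping from GenBank feature keys to ontology term names.
SO_TERM = {"gene": "gene", "CDS": "CDS"}

def reason(locus, features):
    """Reasoner step: emit facts as tuples (a stand-in for CLIPS facts)."""
    facts = [
        ("feature", f"{locus}.f{i}", SO_TERM[key], start, end)
        for i, (key, start, end) in enumerate(features, 1)
    ]
    genes = [f for f in facts if f[2] == "gene"]
    coding = [f for f in facts if f[2] == "CDS"]
    # Toy rule: a CDS whose span lies inside a gene is linked to that gene.
    for c in coding:
        for g in genes:
            if g[3] <= c[3] and c[4] <= g[4]:
                facts.append(("relationship", c[1], "part_of", g[1]))
    return facts

def fill(facts):
    """DBFiller step: order statements so referenced rows are inserted first."""
    order = {"feature": 0, "relationship": 1}
    sql = {
        "feature": "INSERT INTO feature VALUES (?, ?, ?, ?)",
        "relationship": "INSERT INTO feature_relationship VALUES (?, ?, ?)",
    }
    return [(sql[f[0]], f[1:]) for f in sorted(facts, key=lambda f: order[f[0]])]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE feature (id TEXT PRIMARY KEY, type TEXT, fmin INTEGER, fmax INTEGER)"
)
conn.execute(
    "CREATE TABLE feature_relationship ("
    "subject_id TEXT REFERENCES feature(id), rel TEXT, "
    "object_id TEXT REFERENCES feature(id))"
)
for statement, params in fill(reason(*parse(RECORD))):
    conn.execute(statement, params)

rows = conn.execute(
    "SELECT subject_id, rel, object_id FROM feature_relationship"
).fetchall()
print(rows)
```

The ordering in `fill` matters because the relationship rows carry foreign keys to feature rows; inserting them first would fail once foreign-key enforcement is on. This is one plausible reading of why the abstract speaks of ordered SQL statements.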

Results and conclusions

Entries from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank release 180 have been loaded into the Molecular Biodiversity Database by GIDL. By combining the Sequence Ontology and the Chado schema, our system allows more expressive queries than the most commonly used sequence retrieval systems, such as Entrez or SRS.
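As an illustration of what ontology-aware querying can look like, the following sketch retrieves every feature whose type is a given term or any of its subtypes. It is a toy under stated assumptions: the tables are simplified stand-ins loosely modeled on Chado's `cvterm`, `cvterm_relationship` and `feature` tables (the relationship table here omits the relationship-type column and treats every row as is_a), the data are invented, and the helper `features_of_type` is hypothetical. The actual Molecular Biodiversity Database schema and queries are not shown in this record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
-- Each row reads: subject is_a object (simplified, single relationship type).
CREATE TABLE cvterm_relationship (subject_id INTEGER, object_id INTEGER);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type_id INTEGER);
INSERT INTO cvterm VALUES
    (1, 'transcript'), (2, 'mature_transcript'), (3, 'mRNA'), (4, 'gene');
INSERT INTO cvterm_relationship VALUES (2, 1), (3, 2);
INSERT INTO feature VALUES
    (10, 'toy_gene', 4), (11, 'toy_mrna', 3), (12, 'toy_transcript', 1);
""")

# Recursive query: start from the named term, then walk down the is_a links.
QUERY = """
WITH RECURSIVE subtype(cvterm_id) AS (
    SELECT cvterm_id FROM cvterm WHERE name = ?
    UNION
    SELECT r.subject_id
    FROM cvterm_relationship r
    JOIN subtype s ON r.object_id = s.cvterm_id
)
SELECT f.uniquename
FROM feature f
JOIN subtype s ON f.type_id = s.cvterm_id
ORDER BY f.uniquename
"""

def features_of_type(term):
    """Return names of features typed as `term` or any of its descendants."""
    return [row[0] for row in conn.execute(QUERY, (term,))]

print(features_of_type("transcript"))
```

A query for `transcript` also returns the feature typed as `mRNA`, because the term hierarchy is traversed at query time. A keyword search over flat annotation fields would need each subtype to be listed explicitly.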

SUBMITTER: Pannarale P

PROVIDER: S-EPMC3303717 | biostudies-literature | 2012 Mar

REPOSITORIES: biostudies-literature


Publications

GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity Database.

Pannarale Paolo, Catalano Domenico, De Caro Giorgio, Grillo Giorgio, Leo Pietro, Pappadà Graziano, Rubino Francesco, Scioscia Gaetano, Licciulli Flavio

BMC Bioinformatics, 2012-03-28


PMID: 22536971

Similar Datasets

Project description:

Introduction: HIV-1 genotypic resistance test (GRT) interpretation systems (IS) require updates as new studies on HIV-1 drug resistance are published and as treatment guidelines evolve.

Methods: An expert panel was created to provide recommendations for the update of the Stanford HIV Drug Resistance Database (HIVDB) GRT-IS. The panel was polled on the ARVs to be included in a GRT report, and the drug-resistance interpretations associated with 160 drug-resistance mutation (DRM) pattern-ARV combinations. The DRM pattern-ARV combinations included 52 nucleoside RT inhibitor (NRTI) DRM pattern-ARV combinations (13 patterns x 4 NRTIs), 27 nonnucleoside RT inhibitor (NNRTI) DRM pattern-ARV combinations (9 patterns x 3 NNRTIs), 39 protease inhibitor (PI) DRM pattern-ARV combinations (13 patterns x 3 PIs) and 42 integrase strand transfer inhibitor (INSTI) DRM pattern-ARV combinations (14 patterns x 3 INSTIs).

Results: There was universal agreement that a GRT report should include the NRTIs lamivudine, abacavir, zidovudine, emtricitabine, and tenofovir disoproxil fumarate; the NNRTIs efavirenz, etravirine, nevirapine, and rilpivirine; the PIs atazanavir/r, darunavir/r, and lopinavir/r (with "/r" indicating pharmacological boosting with ritonavir or cobicistat); and the INSTIs dolutegravir, elvitegravir, and raltegravir. There was a range of opinion as to whether the NRTIs stavudine and didanosine and the PIs nelfinavir, indinavir/r, saquinavir/r, fosamprenavir/r, and tipranavir/r should be included. The expert panel members provided highly concordant DRM pattern-ARV interpretations with only 6% of NRTI, 6% of NNRTI, 5% of PI, and 3% of INSTI individual expert interpretations differing from the expert panel median by more than one resistance level. The expert panel median differed from the HIVDB 7.0 GRT-IS for 20 (12.5%) of the 160 DRM pattern-ARV combinations including 12 NRTI, two NNRTI, and six INSTI pattern-ARV combinations. Eighteen of these differences were updated in HIVDB 8.1 GRT-IS to reflect the expert panel median. Additionally, HIVDB users are now provided with the option to exclude those ARVs not considered to be universally required.

Conclusions: The HIVDB GRT-IS was updated through a collaborative process to reflect changes in HIV drug resistance knowledge, treatment guidelines, and expert opinion. Such a process broadens consensus among experts and identifies areas requiring further study.
