Dataset Information

VarioML framework for comprehensive variation data representation and exchange.

ABSTRACT: BACKGROUND: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. RESULTS: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs), e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. CONCLUSIONS: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.

SUBMITTER: Byrne M

PROVIDER: S-EPMC3507772 | biostudies-literature | 2012 Oct

REPOSITORIES: biostudies-literature

Publications

VarioML framework for comprehensive variation data representation and exchange.

Byrne M, Fokkema IF, Lancaster O, Adamusiak T, Ahonen-Bishopp A, Atlan D, Béroud C, Cornell M, Dalgleish R, Devereau A, Patrinos GP, Swertz MA, Taschner PE, Thorisson GA, Vihinen M, Brookes AJ, Muilu J

BMC Bioinformatics, 2012-10-03

PMID: 23031277

Similar Datasets

Project description: Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making. Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to impute the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, class balancing parameter α, and Synthetic Minority Oversampling Technique (SMOTE). Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extremely imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% for AUC and 15% for SEN compared to the baseline GCN+αFL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% in SEN compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814). It also showed a classification improvement of 3.86% for AUC and 15% for SEN compared to the baseline GCN+αBCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in SEN in comparison to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754). Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating the potential for several healthcare applications.

Project description: Background: Increased digitalization of healthcare comes at the cost of cybercrime proliferation. This makes patients and healthcare providers skeptical about adopting Health Information Technologies (HIT). In Europe, this shortcoming hampers efficient cross-border health data exchange, which requires a holistic, secure and interoperable framework. This study, conducted in the scope of the KONFIDO project, aimed to provide the foundations for designing a secure and interoperable toolkit for cross-border health data exchange within the European Union (EU). In particular, we present our user requirements engineering methodology and the obtained results, driving the technical design of the KONFIDO toolkit. Methods: Our methodology relied on four pillars: (a) a gap analysis study, reviewing a range of relevant projects/initiatives, technologies as well as cybersecurity strategies for HIT interoperability and cybersecurity; (b) the definition of user scenarios with major focus on cross-border health data exchange in the three pilot countries of the project; (c) a user requirements elicitation phase containing a threat analysis of the business processes entailed in the user scenarios, and (d) surveying and discussing with key stakeholders, aiming to validate the obtained outcomes and identify barriers and facilitators for HIT adoption linked with cybersecurity and interoperability. Results: According to the gap analysis outcomes, full adherence to information security standards is currently not universal. Sustainability plans shall be defined for adapting existing/evolving frameworks to the state-of-the-art. Overall, lack of integration in a holistic security approach was clearly identified. For each user scenario, we concluded with a comprehensive workflow, highlighting challenges and open issues for their application in our pilot sites. The threat analysis resulted in a set of 30 user goals in total, documented in detail. Finally, indicative barriers to HIT acceptance include lack of awareness regarding HIT risks and legislation, lack of a security-oriented culture and management commitment, as well as usability constraints, while important facilitators concern the adoption of standards and current efforts for a common EU legislation framework. Conclusions: Our study provides important insights to address secure and interoperable health data exchange, while our methodological framework constitutes a paradigm for investigating diverse cybersecurity-related risks in the health sector.
