Dataset Information

A PostgreSQL Tripal solution for large-scale genotypic and phenotypic data.

ABSTRACT: Researchers are seeking cost-effective solutions for the management and analysis of large-scale genotypic and phenotypic data. Open-source software is uniquely positioned to fill this need through user-focused, crowd-sourced development. Tripal, an open-source toolkit for developing biological data web portals, uses the GMOD Chado database schema to achieve flexible, ontology-driven storage in PostgreSQL. Tripal also helps research-focused web portals provide data according to the findable, accessible, interoperable and reusable (FAIR) principles. We describe here a fully relational PostgreSQL solution for large-scale genotypic and phenotypic data, implemented as a collection of freely available, open-source modules. These Tripal extension modules provide a holistic approach to importing, storing, displaying and analyzing data within a relational database schema. Furthermore, they embody the Tripal approach to FAIR data by providing multiple search tools and ensuring that metadata is fully described and interoperable. Our solution focuses on data integrity and on optimized performance, providing a fully functional system that is currently used in production Tripal portals for crop species. We fully describe the implementation of our solution and discuss why a PostgreSQL-powered web portal provides an efficient environment for researcher-driven genotypic and phenotypic data analysis.

SUBMITTER: Sanderson LA

PROVIDER: S-EPMC8363843 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: There is a growing need for flexible methods for the analysis of large-scale functional magnetic resonance imaging (fMRI) data to estimate global signatures that summarize the population while preserving individual-specific traits. Independent vector analysis (IVA) is a data-driven method that jointly estimates global spatio-temporal patterns from multi-subject fMRI data and effectively preserves subject variability. However, as we show, IVA performance degrades as the number of datasets and components increases, especially when component correlation across the datasets is low. We study the problem and its relationship to correlation across the datasets, and propose an effective method for addressing it by incorporating reference information about the estimated patterns into the formulation as guidance in high-dimensional scenarios. Constrained IVA (cIVA) provides an efficient framework for incorporating references; however, its performance depends on a user-defined constraint parameter that enforces the association between the reference signals and the estimated patterns at a fixed level. We propose adaptive cIVA (acIVA), which tunes the constraint parameter to allow flexible associations between the references and the estimated patterns, and which can incorporate multiple reference signals without enforcing inaccurate conditions. Our results indicate that acIVA reliably estimates high-dimensional multivariate sources from large-scale simulated datasets, compared with standard IVA. It also successfully extracts meaningful functional networks from a large-scale fMRI dataset on which standard IVA did not converge.
The method also efficiently captures subject-specific information, as demonstrated by observed gender differences in spectral power within the motor, attention, visual and default mode networks: higher spectral power in males at low frequencies and in females at high frequencies.
