Project description:
Background: Policy makers, clinicians and researchers are demonstrating increasing interest in using data linked from multiple sources to support measurement of clinical performance and patient health outcomes. However, the utility of data linkage may be compromised by sub-optimal or incomplete linkage, leading to systematic bias. In this study, we synthesize the evidence identifying participant or population characteristics that can influence the validity and completeness of data linkage and may be associated with systematic bias in reported outcomes.
Methods: A narrative review, using structured search methods, was undertaken. The keywords "data linkage" and the MeSH term "medical record linkage" were applied to the Medline, EMBASE and CINAHL databases for the period 1991 to 2007. Abstract inclusion criteria were: the article attempted an empirical evaluation of methodological issues relating to data linkage and reported on patient characteristics; the study design included analysis of matched versus unmatched records; and the report was in English. Included articles were grouped thematically according to the patient characteristics that were compared between matched and unmatched records.
Results: The search identified 1810 articles, of which 33 (1.8%) met the inclusion criteria. There was marked heterogeneity in study methods and in the factors investigated. Characteristics that were unevenly distributed between matched and unmatched records were: age (72% of studies), sex (50% of studies), race (64% of studies), geographical/hospital site (93% of studies), socio-economic status (82% of studies) and health status (72% of studies).
Conclusion: A number of relevant patient or population factors may be associated with incomplete data linkage, resulting in systematic bias in reported clinical outcomes. Readers should consider these factors when interpreting the reported results of data linkage studies.
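The matched-versus-unmatched design shared by the included studies can be illustrated with a simple contingency-table test; the sketch below uses invented counts, not data from the review, and only shows the shape of the analysis:

```python
# Test whether a characteristic (here, sex) is evenly distributed
# between records that linked and records that failed to link.
# Counts are invented for illustration.
from scipy.stats import chi2_contingency

#         female  male
table = [[4200, 3900],   # matched (successfully linked) records
         [310,   510]]   # unmatched records

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.3g}")
# A small p-value suggests linkage failure is not random with respect
# to sex, i.e. a potential source of systematic bias in linked outcomes.
```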
Project description: A human cell is a precisely regulated system that relies on the complex interaction of molecules. Structural insights into the cellular machinery at the atomic level allow us to understand the underlying regulatory mechanisms and provide us with a roadmap for the development of novel drugs to fight diseases. Facilitated by recent technological breakthroughs, the Nobel prize-winning technique electron cryomicroscopy (cryo-EM) has become a versatile and extremely powerful tool that routinely solves three-dimensional protein structures at near-atomic resolution. Consequently, it has become a focus of attention for structure-based drug design. In this review, we describe the basics of cryo-EM and highlight its growing role in biomedical research. Furthermore, we discuss the latest developments as well as future perspectives.
Project description:
Background: The use of real-world data has become increasingly popular, also in the field of infectious disease (ID), particularly since the COVID-19 pandemic emerged. While much useful data for research is being collected, these data are generally stored across different sources. Privacy concerns limit the possibility of storing the data centrally, thereby also limiting the possibility of fully leveraging the potential power of combined data. Federated learning (FL) has been suggested as a way to overcome privacy issues by making it possible to perform research on data from various sources without those data leaving local servers. In this review, we discuss existing applications of FL in ID research, as well as the most relevant opportunities and challenges of this method.
Methods: References for this review were identified through searches of MEDLINE/PubMed, Google Scholar, Embase and Scopus until July 2023. We searched for studies using FL in different applications related to ID.
Results: Thirty references were included and divided into four sub-topics: disease screening, prediction of clinical outcomes, infection epidemiology, and vaccine research. Most research was related to COVID-19. In all studies, FL achieved good accuracy when predicting diseases and outcomes, also in comparison to non-federated methods. However, most studies did not make use of real-world federated data, but rather showed the potential of FL by using data that were manually partitioned.
Conclusions: FL is a promising methodology which allows using data from several sources, potentially generating stronger and more generalisable results. However, further exploration of FL application possibilities in ID research is needed.
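The privacy-preserving mechanism at the heart of FL can be illustrated with federated averaging (FedAvg), one of the standard FL algorithms: each site trains on its own data and shares only model parameters, which a central server averages. Below is a minimal self-contained sketch with simulated sites; it is not code from any of the reviewed studies:

```python
# Minimal federated-averaging (FedAvg) sketch: each site fits a model on
# its own data and only the model parameters leave the site; a central
# server averages them. Sites and data here are simulated placeholders.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's logistic-regression update; raw X, y never leave the site."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient step
    return w

rng = np.random.default_rng(0)
sites = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200)) for _ in range(3)]

global_w = np.zeros(5)
for _round in range(20):                       # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    global_w = np.mean(local_ws, axis=0)       # server-side averaging
print(global_w)
```

Production systems typically weight the average by each site's sample size and add protections such as secure aggregation.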
Project description:
Objective: The study sought to design, pilot, and evaluate a federated data completeness tracking system (CTX) for assessing completeness in research data extracted from electronic health record data across the Accessible Research Commons for Health (ARCH) Clinical Data Research Network.
Materials and methods: The CTX applies a systems-based approach to design workflow and technology for assessing completeness across distributed electronic health record data repositories participating in a queryable, federated network. The CTX invokes 2 positive feedback loops that utilize open source tools (DQe-c and Vue) to integrate technology and human actors in a system geared toward increasing capacity and taking action. A pilot implementation of the system involved 6 ARCH partner sites between January 2017 and May 2018.
Results: The ARCH CTX has enabled the network to monitor and, if needed, adjust its data management processes to maintain complete datasets for secondary use. The system allows the network and its partner sites to profile data completeness at both the network and partner-site levels. Interactive visualizations presenting the current state of completeness in the context of the entire network, as well as changes in completeness across time, were valued among the CTX user base.
Discussion: Distributed clinical data networks are complex systems. Top-down approaches that rely solely on technology to report data completeness may be necessary but not sufficient for improving the completeness (and quality) of data in large-scale clinical data networks. Improving and maintaining complete (high-quality) data in such complex environments entails sociotechnical systems that exploit technology and empower human actors to engage in the process of high-quality data curation.
Conclusions: The CTX has increased the network's capacity to rapidly identify data completeness issues and empowered ARCH partner sites to get involved in improving the completeness of the data in their respective repositories.
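Completeness profiling of the kind the CTX reports reduces, at its simplest, to per-field missingness rates computed locally at each site and then compared across the network. The sketch below is illustrative only; the table, column and site names are invented and do not reflect the ARCH schema or DQe-c's actual output:

```python
# Per-field completeness profile of the kind a federated tracker
# aggregates: each site computes it locally, only the summary travels.
# Column and site names are illustrative, not the ARCH/DQe-c schema.
import pandas as pd

def completeness_profile(df: pd.DataFrame, site: str) -> pd.DataFrame:
    """Fraction of non-missing values per column for one site."""
    prof = df.notna().mean().rename("completeness").to_frame().reset_index()
    prof = prof.rename(columns={"index": "field"})
    prof["site"] = site
    return prof

site_a = pd.DataFrame({"birth_date": ["1980-01-02", None, "1975-07-30"],
                       "sex": ["F", "M", None]})
site_b = pd.DataFrame({"birth_date": ["1990-03-14", "1961-11-09"],
                       "sex": ["M", "M"]})

network = pd.concat([completeness_profile(site_a, "A"),
                     completeness_profile(site_b, "B")])
print(network.pivot(index="field", columns="site", values="completeness"))
```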
Project description: Since the first great oxygenation event, photosynthetic microorganisms have continuously shaped the Earth's atmosphere. Studying the biological mechanisms involved in the interaction of microalgae and cyanobacteria with the Earth's atmosphere requires the monitoring of gas exchange. Membrane inlet mass spectrometry (MIMS) was developed in the early 1960s to study gas exchange mechanisms of photosynthetic cells. It has since played an important role in investigating various cellular processes that involve gaseous compounds (O2, CO2, NO, or H2) and in characterizing enzymatic activities in vitro or in vivo. With the development of affordable mass spectrometers, MIMS is gaining wide popularity and is now used by an increasing number of laboratories. However, its use still demands a solid grasp of the underlying theory and of practical considerations. Here, we provide a practical guide describing the current technical basis of a MIMS setup and the general principles of data processing. We further review how MIMS can be used to study various aspects of algal research and discuss how MIMS will be useful in addressing future scientific challenges.
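A recurring data-processing step in MIMS experiments is converting a calibrated ion-current trace into a gas exchange rate, which reduces to a linear fit over a measurement window. The sketch below is schematic: the one-point air-saturation calibration, the m/z assignment, and all numbers are assumptions for illustration, not values from the guide:

```python
# Schematic MIMS processing: calibrate the m/z = 32 ion current against
# a known O2 concentration, then estimate the O2 exchange rate as the
# slope of concentration versus time. All values are invented.
import numpy as np

t = np.linspace(0, 300, 60)                  # time (s)
signal = 2.0e-11 + 1.5e-14 * t               # mock raw ion current (A)

air_sat_signal = 2.1e-11                     # current in air-saturated medium (A)
air_sat_o2 = 250.0                           # O2 at air saturation (µM)
conc = signal * air_sat_o2 / air_sat_signal  # calibrated concentration (µM)

slope, _ = np.polyfit(t, conc, 1)            # µM per second
print(f"net O2 evolution rate: {slope * 60:.2f} µM/min")
# Real analyses also correct for gas consumption by the inlet membrane.
```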
Project description:
Background: DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which is available in public databases such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data using gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures in a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to help researchers utilize public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating co-expression networks from a publicly available database.
Results: GEM-TREND, a web tool for searching gene expression data, allows users to query GEO with gene-expression signatures or gene expression ratio data and retrieve gene expression data by comparing gene-expression patterns between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern-matching approach of Lamb et al. (Science 2006), with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from GEO and with in-house microarray data. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories.
Conclusion: GEM-TREND was developed to retrieve gene expression data by comparing a query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.
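The rank-based matching that GEM-TREND builds on can be outlined as follows: rank all genes in a reference profile by expression change, then score how strongly the query's up-regulated genes cluster near the top and its down-regulated genes near the bottom. The sketch below is a simplified Kolmogorov-Smirnov-style score in the spirit of Lamb et al., not GEM-TREND's exact implementation:

```python
# Simplified connectivity score in the spirit of Lamb et al. (2006):
# rank genes in a reference profile, then measure whether the query's
# up-genes cluster near the top and down-genes near the bottom.
# Illustrative reimplementation, not GEM-TREND's actual code.
import numpy as np

def ks_score(tag_positions, n_genes):
    """Signed Kolmogorov-Smirnov-style enrichment of tag ranks."""
    pos = np.sort(np.asarray(tag_positions))   # ranks, 1 = most up-regulated
    t = len(pos)
    a = np.max(np.arange(1, t + 1) / t - pos / n_genes)
    b = np.max(pos / n_genes - (np.arange(1, t + 1) - 1) / t)
    return a if a > b else -b

def connectivity(ranked_genes, up_tags, down_tags):
    rank = {g: i + 1 for i, g in enumerate(ranked_genes)}
    n = len(ranked_genes)
    ks_up = ks_score([rank[g] for g in up_tags], n)
    ks_down = ks_score([rank[g] for g in down_tags], n)
    # Score is zero when both tag sets fall on the same side of the profile.
    return 0.0 if np.sign(ks_up) == np.sign(ks_down) else ks_up - ks_down

# Toy reference profile ordered from most up- to most down-regulated.
ref = [f"g{i}" for i in range(1, 101)]
print(connectivity(ref, up_tags=["g2", "g5", "g9"], down_tags=["g95", "g98"]))
```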
Project description:
Background: Commonly, several traits are assessed in agronomic experiments to better understand the factors under study. However, it is also common to see that even when several traits are available, researchers opt for the easiest route, applying univariate analyses and post-hoc tests for mean comparison to each trait, which raises the hypothesis that the benefits of a multi-trait analysis framework may not have been fully exploited in this area.
Results: In this paper, we extended the theoretical foundations of the multi-trait genotype-ideotype distance index (MGIDI) to analyze multivariate data from either simple experiments (e.g., a one-way layout with few treatments and traits) or complex experiments (e.g., with a factorial treatment structure). We proposed an optional weighting process that makes treatments that stand out in traits with higher weights more likely to be ranked first. Its application is illustrated using (1) simulated data and (2) real data from a strawberry experiment that aims to select better factor combinations (namely, cultivar, transplant origin, and substrate mixture) based on the desired performance of 22 phenological, productive, physiological, and qualitative traits. Our results show that most of the strawberry traits are influenced by cultivar, transplant origin and cultivation substrate, as well as by the interaction between cultivar and transplant origin. The MGIDI ranked the Albion cultivar originating from Imported transplants and the Camarosa cultivar originating from National transplants as the better factor combinations. The substrates with burned rice husk as the main component (70%) showed satisfactory physical properties, providing higher water use efficiency. The strengths-and-weaknesses view provided by the MGIDI revealed that the search for an ideal treatment should direct efforts toward increasing the fruit production of Albion transplants of Imported origin. On the other hand, this treatment has strengths related to productive precocity, total soluble solids, and flesh firmness.
Conclusions: Overall, this study opens the door to the use of the MGIDI beyond the plant breeding context, providing a unique, practical, robust, and easy-to-handle multi-trait framework to analyze multivariate data. There is an exciting possibility for this to open up new avenues of research, mainly because using the MGIDI in future studies will dramatically reduce the number of tables/figures needed, serving as a powerful tool to guide researchers toward better treatment recommendations.
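The distance idea behind the index can be sketched as follows. The published MGIDI rescales each trait to 0-100 (100 = desired direction), performs factor analysis on the rescaled traits, and measures each treatment's Euclidean distance to the ideotype that scores 100 on every trait; the authors' implementation is available in the R package metan. The simplified Python sketch below skips the factor-analysis step and works directly on rescaled, optionally weighted traits, so it illustrates the geometry rather than reproducing the index:

```python
# Simplified genotype-ideotype distance in the spirit of the MGIDI.
# The published index applies factor analysis to the rescaled traits;
# this sketch works on the rescaled traits directly.
import numpy as np

def rescale(X, goal):
    """Rescale each trait to 0-100 so that 100 is always the desired direction.
    goal[j] = +1 if higher values of trait j are better, -1 if lower are better."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    R = 100 * (X - lo) / (hi - lo)
    return np.where(np.asarray(goal) > 0, R, 100 - R)

def gidi(X, goal, weights=None):
    """Distance of each treatment to the ideotype (100 on every rescaled trait).
    Smaller = closer to ideal; weights emphasize chosen traits."""
    R = rescale(np.asarray(X, float), goal)
    w = np.ones(R.shape[1]) if weights is None else np.asarray(weights, float)
    return np.sqrt(((w * (R - 100)) ** 2).sum(axis=1))

# Toy data: 4 treatments x 3 traits (yield up, firmness up, disease score down).
X = [[30, 8, 2], [45, 6, 5], [40, 9, 1], [25, 7, 4]]
d = gidi(X, goal=[+1, +1, -1], weights=[2, 1, 1])  # extra weight on yield
print(np.argsort(d))  # treatments ranked from closest to farthest from ideal
```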
Project description: Among the many benefits of the Human Genome Project are new and powerful tools such as the genome-wide hybridization devices referred to as microarrays. Initially designed to measure gene transcription levels, microarray technologies are now used for comparing other genome features among individuals and their tissues and cells. The results provide valuable information on disease subcategories, disease prognosis, and treatment outcome. Likewise, they reveal differences in genetic makeup, regulatory mechanisms, and subtle variations, and move us closer to the era of personalized medicine. To explain this powerful tool, its versatility, and how dramatically it is changing the molecular approach to biomedical and clinical research, this review describes the technology, its applications, a didactic step-by-step review of a typical microarray protocol, and a real experiment. Finally, it calls the attention of the medical community to the importance of integrating multidisciplinary teams to take advantage of this technology and its expanding applications, which, on a single slide, reveal our genetic inheritance and destiny.
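One step that a didactic microarray protocol typically walks through, turning raw channel intensities into a list of differentially expressed genes, can be sketched as follows; the data are simulated and the thresholds are arbitrary choices, not values from the review:

```python
# Toy sketch of a basic microarray analysis step: log2 expression ratios
# (e.g., disease vs. control channels) followed by a per-gene t-test
# across replicate arrays. Values are simulated, not from the review.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes, n_reps = 1000, 4
control = rng.lognormal(mean=7, sigma=1, size=(n_genes, n_reps))
disease = control * rng.lognormal(mean=0, sigma=0.2, size=(n_genes, n_reps))
disease[:20] *= 4                                 # spike in 20 up-regulated genes

log_ratio = np.log2(disease) - np.log2(control)   # per-array log2 fold change
t, p = stats.ttest_1samp(log_ratio, 0.0, axis=1)  # does the ratio differ from 0?
hits = np.where((p < 0.001) & (np.abs(log_ratio.mean(axis=1)) > 1))[0]
print(f"{len(hits)} candidate differentially expressed genes")
```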
Project description: Protein biomarkers offer major benefits for the diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood, with the goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database, combined with standard informatic algorithms such as BLAST and the Statistical Analysis System (SAS), allowed rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review show that, in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins, with good agreement on transcription factors and DNA remodelling factors as well as on cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, of which 3,858 have 3 or more peptide correlations. The FDBP, with annotations, should facilitate testing blood for specific disease biomarkers.
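The SQL-centred analysis described, a single peptide-evidence table queried with standard SQL rather than bespoke programs, might look like the following minimal sqlite sketch; the schema, accessions and data are invented for illustration and are not the FDBP's own:

```python
# Minimal sketch of a peptide-evidence table and the kind of SQL query
# that yields counts such as "proteins with 3 or more unique peptides".
# Schema and data are invented, not the FDBP's actual design.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE peptide_hit (
                   protein_acc  TEXT,   -- protein accession
                   peptide_seq  TEXT,   -- identified peptide sequence
                   source_study TEXT)""")
con.executemany("INSERT INTO peptide_hit VALUES (?, ?, ?)", [
    ("P02768", "LVNEVTEFAK", "study1"), ("P02768", "YLYEIAR", "study1"),
    ("P02768", "AEFAEVSK", "study2"),   ("Q9Y6K9", "TLSDYNIQK", "study1"),
])

rows = con.execute("""SELECT protein_acc, COUNT(DISTINCT peptide_seq) AS n_pep
                      FROM peptide_hit
                      GROUP BY protein_acc
                      HAVING COUNT(DISTINCT peptide_seq) >= 3""").fetchall()
print(rows)   # proteins supported by >= 3 unique peptides
```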
Project description:
Background: New approaches and tools were needed to support the strategic planning, implementation and management of a Program launched by the Brazilian Government to fund research, development and capacity building on neglected tropical diseases, with a strong focus on the North, Northeast and Center-West regions of the country, where these diseases are prevalent.
Methodology/principal findings: Based on demographic, epidemiological and burden of disease data, seven diseases were selected by the Ministry of Health as targets of the initiative. Publications on these diseases by Brazilian researchers were retrieved from international databases, then analyzed and processed with text-mining tools in order to standardize author and institution names and addresses. Co-authorship networks based on these publications were assembled, visualized and analyzed with social network analysis software packages. Network visualization and analysis generated new information, allowing better design and strategic planning of the Program and enabling decision makers to characterize network components by area of work, identify institutions and authors playing major roles as central hubs or located at critical network cut-points, and readily detect authors or institutions participating in large international scientific collaboration networks.
Conclusions/significance: Traditional criteria used to monitor and evaluate research proposals or R&D programs, such as researchers' productivity and the impact factor of scientific publications, are of limited value when addressing research areas of low productivity or involving institutions from endemic regions where human resources are limited. Network analysis was found to generate new and valuable information relevant to the strategic planning, implementation and monitoring of the Program. It afforded the funding agencies a more proactive role with respect to public health and equity goals and scientific capacity-building objectives, and enabled a more consistent engagement of institutions and authors from endemic regions, based on innovative criteria and parameters anchored in objective scientific data.
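The co-authorship analysis described maps directly onto standard graph operations: authors are nodes, co-authoring a paper creates edges, hubs are high-degree nodes, and cut-points are articulation points. A small sketch with invented author names:

```python
# Build a co-authorship network from per-paper author lists, then find
# hubs (high degree) and cut-points (articulation points). Author lists
# are invented placeholders, not data from the Program's databases.
from itertools import combinations
import networkx as nx

papers = [["Silva", "Souza", "Lima"],
          ["Silva", "Pereira"],
          ["Pereira", "Costa", "Oliveira"],
          ["Costa", "Oliveira"]]

G = nx.Graph()
for authors in papers:
    for a, b in combinations(authors, 2):   # every co-author pair is an edge
        G.add_edge(a, b)

hubs = sorted(G.degree, key=lambda x: -x[1])[:3]
print("top hubs:", hubs)
print("cut-points:", list(nx.articulation_points(G)))
# Removing a cut-point author disconnects part of the collaboration network.
```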