Dataset Information

Phrase mining of textual data to analyze extracellular matrix protein patterns across cardiovascular disease.


ABSTRACT: Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. Using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely, ischemic heart disease, cardiomyopathies, cerebrovascular accident, congenital heart disease, arrhythmias, and valve disease, anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the 6 groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-Aware Semantic Online Analytical Processing was applied to semantically rank the association of proteins to each CVD and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein was visualized as a six-dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins, whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all 6 CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely, insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters predominantly associated with a targeted CVD; analyses of these proteins revealed unexpected insights underlying the key ECM-related molecular pathogenesis of each CVD, including virus assembly and release in arrhythmias. 
NEW & NOTEWORTHY The present study is the first application of a text-mining algorithm to characterize the relationships of 709 extracellular matrix-related proteins with 6 categories of cardiovascular disease described in 1,099,254 abstracts. Our analysis informed unexpected extracellular matrix functions, pathways, and molecular relationships implicated in the six cardiovascular diseases.
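
The abstract describes representing each protein as a six-dimensional vector of association scores, one per CVD, and then asking which proteins are associated with all six CVDs and which are predominantly associated with one. A minimal sketch of that idea follows. Everything in it is an assumption for illustration: the protein names are placeholders, the scores are made up, the CVD abbreviations are ours, and the `min_share` cutoff is a hypothetical stand-in for the hierarchical clustering the study actually used. It does not reproduce the Context-Aware Semantic Online Analytical Processing scoring.

```python
# Illustrative sketch only: placeholder proteins, made-up scores.
# Column order (our own abbreviations): ischemic heart disease, cardiomyopathies,
# cerebrovascular accident, congenital heart disease, arrhythmias, valve disease.
CVDS = ["IHD", "CM", "CVA", "CHD", "ARR", "VD"]

# Each protein is a six-dimensional vector of association scores (hypothetical values).
scores = {
    "PROTEIN_A": [0.30, 0.25, 0.20, 0.10, 0.05, 0.10],
    "PROTEIN_B": [0.00, 0.60, 0.00, 0.00, 0.05, 0.00],
    "PROTEIN_C": [0.02, 0.01, 0.03, 0.02, 0.70, 0.01],
    "PROTEIN_D": [0.40, 0.00, 0.35, 0.00, 0.00, 0.00],
}


def shared_across_all(score_table):
    """Return proteins with a non-zero score for every CVD, sorted by name."""
    return sorted(
        protein
        for protein, vector in score_table.items()
        if all(value > 0 for value in vector)
    )


def dominant_cvd(vector, min_share=0.5):
    """Return the CVD holding at least `min_share` of a protein's total score.

    Returns None when no single CVD dominates. `min_share` is a hypothetical
    threshold, not a value taken from the study.
    """
    total = sum(vector)
    if total == 0:
        return None
    best = max(range(len(vector)), key=lambda i: vector[i])
    return CVDS[best] if vector[best] / total >= min_share else None


shared = shared_across_all(scores)
dominant = {protein: dominant_cvd(vector) for protein, vector in scores.items()}

print("Associated with all six CVDs:", shared)
for protein, cvd in dominant.items():
    print(protein, "->", cvd)
```

With real scores, the same table could be passed to a principal component analysis or hierarchical clustering routine instead of the simple share threshold used here.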

SUBMITTER: Liem DA 

PROVIDER: S-EPMC6230912 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature

Publications

Phrase mining of textual data to analyze extracellular matrix protein patterns across cardiovascular disease.

David A. Liem, Sanjana Murali, Dibakar Sigdel, Yu Shi, Xuan Wang, Jiaming Shen, Howard Choi, John H. Caufield, Wei Wang, Peipei Ping, JiaWei Han

American Journal of Physiology. Heart and Circulatory Physiology, 2018-05-18, issue 4


Similar Datasets

S-EPMC7525263 | biostudies-literature
S-EPMC1555615 | biostudies-literature
S-EPMC3128586 | biostudies-other
S-EPMC7332573 | biostudies-literature
S-EPMC5784921 | biostudies-literature
S-EPMC6954439 | biostudies-literature
S-EPMC5039192 | biostudies-literature
S-EPMC7318947 | biostudies-literature
S-EPMC5203946 | biostudies-other
S-EPMC6698742 | biostudies-literature