Dataset Information

Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis.


ABSTRACT: Motivation: Gene Expression Omnibus (GEO) and other publicly available data sources store their metadata as unstructured English text, which is very difficult to reuse automatically. Results: We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series' metadata: the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type, and associated PubMed ID, as well as general keywords in the study's description. Our approach greatly expands the usability of GEO data, demonstrating a credible approach to improving the utility of the vast amount of publicly available data in the era of Big Data research.

SUBMITTER: Chen G 

PROVIDER: S-EPMC6333964 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

Publications

Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis.

Chen G, Ramírez JC, Deng N, Qiu X, Wu C, Zheng WJ, Wu H

Database: the journal of biological databases and curation, 2019-01-01

Similar Datasets

| S-EPMC5643580 | biostudies-literature
| S-EPMC1619899 | biostudies-other
| S-EPMC5868185 | biostudies-other
| S-EPMC7874475 | biostudies-literature
| S-EPMC6438035 | biostudies-literature
| S-EPMC5448611 | biostudies-literature
| S-EPMC6839211 | biostudies-literature
| S-EPMC5751806 | biostudies-literature
| S-EPMC4944384 | biostudies-literature
| S-EPMC1619900 | biostudies-literature