Dataset Information

A bioinformatics analysis of the cell line nomenclature.


ABSTRACT: Cell lines are used extensively in biomedical research, but the nomenclature describing cell lines has not been standardized. The problems are both linguistic and experimental. Many ambiguous cell line names appear in the published literature. Users of the same cell line may refer to it in different ways, and cell lines may mutate or become contaminated without the knowledge of the user. As a first step towards rationalizing this nomenclature, we created a cell line knowledgebase (CLKB) with a well-structured collection of names and descriptive data for cell lines cultured in vitro. The objectives of this work are: (i) to assist users in extracting useful information from biomedical text and (ii) to highlight the importance of standardizing cell line names in biomedical research. The CLKB contains a broad collection of cell line names compiled from ATCC, Hyper CLDB and MeSH. In addition to names, the knowledgebase specifies relationships between cell lines. We analyze the use of cell line names in biomedical text. Issues include ambiguous names, polymorphisms in the use of names and the fact that some cell line names are also common English words. Linguistic patterns associated with the occurrence of cell line names are analyzed. Applying these patterns to find additional cell line names in the literature identifies only a small number of additional names. Annotation of microarray gene expression studies is used as a test case. The CLKB facilitates data exploration and comparison of different cell lines in support of clinical and experimental research. The web ontology file for this cell line collection can be downloaded at http://www.stateslab.org/data/celllineOntology/cellline.zip.
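
The abstract names two practical obstacles for text mining: spelling variants ("polymorphisms") of the same cell line name, and names that double as common English words. As a minimal sketch, assuming a small hand-built synonym dictionary and a single "followed by cell(s)" context rule, the Python below shows one way a dictionary lookup could handle both. The entries, the AMBIGUOUS set, the CONTEXT rule and the function names are all hypothetical illustrations; they are not the CLKB's actual contents, schema or linguistic patterns.

```python
import re

# Hypothetical mini-dictionary, for illustration only (NOT the CLKB's data or schema):
# canonical cell line name -> surface variants seen in text.
SYNONYMS = {
    "HeLa": ["HeLa", "Hela", "HELA", "He La"],
    "MCF-7": ["MCF-7", "MCF7", "MCF 7"],
    "JAR": ["JAR"],
}
# Assumption: canonical names that are also common English words ("jar").
AMBIGUOUS = {"JAR"}


def normalize(name: str) -> str:
    """Drop whitespace, hyphens and periods and lowercase, so spelling variants compare equal."""
    return re.sub(r"[\s\-.]", "", name).lower()


# Map every normalized variant back to its canonical name.
LOOKUP = {normalize(v): canon for canon, variants in SYNONYMS.items() for v in variants}

# One case-insensitive alternation of all variants, longest first so "He La" wins over shorter forms.
_variants = sorted((re.escape(v) for vs in SYNONYMS.values() for v in vs), key=len, reverse=True)
PATTERN = re.compile(r"\b(" + "|".join(_variants) + r")\b", re.IGNORECASE)

# Hypothetical context rule: an ambiguous name counts only if "cell" or "cells" follows it.
CONTEXT = re.compile(r"\s*cells?\b", re.IGNORECASE)


def find_cell_lines(text: str) -> list[str]:
    """Return canonical names of dictionary mentions in text, in order of appearance."""
    hits = []
    for m in PATTERN.finditer(text):
        canon = LOOKUP[normalize(m.group(1))]
        # Skip ambiguous names (e.g. "a jar") unless the context rule matches right after them.
        if canon in AMBIGUOUS and not CONTEXT.match(text, m.end()):
            continue
        hits.append(canon)
    return hits


if __name__ == "__main__":
    print(find_cell_lines("HeLa and MCF7 cells were grown; JAR cells were stored in a jar."))
```

Requiring a context cue only for flagged names keeps every match for unambiguous names while suppressing everyday uses such as "a jar". This fits the abstract's observation that pattern-based searching turned up only a few additional names, so in a setup like this the dictionary, not the patterns, does most of the work.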

SUBMITTER: Sarntivijai S 

PROVIDER: S-EPMC2639272 | biostudies-literature | 2008 Dec

REPOSITORIES: biostudies-literature

Publications

A bioinformatics analysis of the cell line nomenclature.

Sarntivijai S, Ade AS, Athey BD, States DJ

Bioinformatics (Oxford, England), 2008-10-10, issue 23

Similar Datasets

S-EPMC6391799 | biostudies-literature
S-EPMC3888431 | biostudies-literature
S-EPMC6875409 | biostudies-literature
S-EPMC4753510 | biostudies-literature
S-EPMC2706371 | biostudies-literature
S-EPMC9734853 | biostudies-literature
S-EPMC2486548 | biostudies-other
PXD020267 | 2020-07-09
S-EPMC7790494 | biostudies-literature
S-ECPF-GEOD-52105 | biostudies-other