Dataset Information

Characterizing Human Cell Types and Tissue Origin Using the Benford Law.

ABSTRACT: Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as its value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. Then, we used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data, a capability that could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes.
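
As a rough illustration of the idea in the abstract, the sketch below compares the first-digit distribution of a set of values with the Benford expectation, P(d) = log10(1 + 1/d) for d = 1..9. The abstract does not say how the authors compute their "Benford adherence score", so the mean absolute deviation used here is only an assumed stand-in, and all function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under the Benford law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero finite number."""
    # Scientific notation puts the first significant digit first,
    # e.g. 0.0032 -> '3.200000000000000e-03'.
    return int(f"{abs(x):.15e}"[0])


def first_digit_distribution(values):
    """Observed leading-digit frequencies; zeros and non-finite values are skipped."""
    digits = [leading_digit(v) for v in values if v != 0 and math.isfinite(v)]
    if not digits:
        raise ValueError("no nonzero finite values to analyze")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation from Benford (assumed score; lower = closer)."""
    expected = benford_expected()
    observed = first_digit_distribution(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


# Hypothetical usage: powers of 2 are a classic Benford-like sequence,
# whereas digits spread evenly over 1..9 are not.
powers_of_two = [2 ** k for k in range(100)]
uniform_digits = list(range(1, 10))
print(benford_mad(powers_of_two) < benford_mad(uniform_digits))
```

In a setting like the one the abstract describes, such a score would presumably be computed per gene across samples and then passed to a classifier as a feature; that pipeline is not specified in this record and is not reproduced here.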

SUBMITTER: Morag S

PROVIDER: S-EPMC6770594 | biostudies-literature | 2019 Aug

REPOSITORIES: biostudies-literature

Publications

Characterizing Human Cell Types and Tissue Origin Using the Benford Law.

Sne Morag, Mali Salmon-Divon

Cells, 2019-08-29, issue 9

PMID: 31470662

Similar Datasets

Elucidating tissue specific genes using the Benford distribution.
| S-EPMC4979126 | biostudies-literature

Newcomb-Benford law and the detection of frauds in international trade.
| S-EPMC6320519 | biostudies-literature

Jointly characterizing epigenetic dynamics across multiple human cell types.
| S-EPMC5772166 | biostudies-literature

Using the Newcomb-Benford law to study the association between a country's COVID-19 reporting accuracy and its development.
| S-EPMC8617306 | biostudies-literature

In Situ Classification of Cell Types in Human Kidney Tissue Using 3D Nuclear Staining.
| S-EPMC8382162 | biostudies-literature

Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor.
| S-EPMC5830442 | biostudies-literature

Unraveling the origin of exponential law in intra-urban human mobility.
| S-EPMC3798880 | biostudies-literature

Characterizing human pluripotent-stem-cell-derived vascular cells for tissue engineering applications.
| S-EPMC4313392 | biostudies-literature

Rules of tissue packing involving different cell types: human muscle organization.
| S-EPMC5223128 | biostudies-literature

Characterizing flexible and intrinsically unstructured biological macromolecules by SAS using the Porod-Debye law.
| S-EPMC3103662 | biostudies-literature