Dataset Information

A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data.


ABSTRACT: Accurate identification of cancer types is essential to cancer diagnosis and treatment. Because cancer tissue and normal tissue differ in gene expression, gene expression data can serve as an efficient feature source for cancer classification. However, accurate cancer classification directly from the original gene expression profiles remains challenging because of the intrinsically high dimensionality of the features and the small number of samples. We propose a new self-training subspace clustering algorithm under low-rank representation, called SSC-LRR, for cancer classification on gene expression data. Low-rank representation (LRR) is first applied to extract discriminative features from the high-dimensional gene expression data; the self-training subspace clustering (SSC) method is then used to generate the cancer classification predictions. SSC-LRR was tested on two separate benchmark datasets against four state-of-the-art classification methods as controls. It generated cancer classification predictions with an overall accuracy of 89.7 percent and a general correlation of 0.920, which are 18.9 and 24.4 percent higher, respectively, than those of the best control method. In addition, several genes (RNF114, HLA-DRB5, USP9Y, and PTPN20) were identified by SSC-LRR as new cancer identifiers that deserve further clinical investigation. Overall, the study demonstrates a new, sensitive approach to recognizing cancer classes from large-scale gene expression data.
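
The two-stage idea in the abstract (a low-rank representation of the samples, followed by self-training on that representation) can be illustrated with a minimal NumPy sketch. This is not the authors' SSC-LRR implementation, whose optimization and self-training rule are not described in this record. It assumes the noise-free LRR closed form Z = V V^T taken from a truncated SVD, and it substitutes a simple margin-based label propagation for the self-training step. The function names, the `rank` and `batch` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def lrr_affinity(X, rank):
    """Affinity from the noise-free LRR closed form Z = V_r V_r^T.

    X holds one sample per column (genes x samples). `rank` is a
    hypothetical tuning parameter, not a value taken from the paper.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:rank].T                  # (n_samples, rank)
    Z = V @ V.T                      # low-rank self-representation of the samples
    return np.abs(Z) + np.abs(Z.T)   # symmetric, non-negative affinity


def self_train(W, labels, batch=5):
    """Propagate labels over W; -1 marks unlabeled samples.

    Assumed stand-in for the self-training step; needs >= 2 labeled classes.
    """
    labels = labels.copy()
    classes = np.unique(labels[labels >= 0])
    while (labels == -1).any():
        unl = np.flatnonzero(labels == -1)
        # Mean affinity of each unlabeled sample to the labeled members of each class.
        scores = np.stack(
            [W[np.ix_(unl, np.flatnonzero(labels == c))].mean(axis=1) for c in classes],
            axis=1,
        )
        # Pseudo-label only the most confident samples in this round.
        top2 = np.sort(scores, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]
        pick = np.argsort(-margin)[:batch]
        labels[unl[pick]] = classes[scores[pick].argmax(axis=1)]
    return labels


# Synthetic stand-in for expression data: two classes, each drawn from its
# own 3-dimensional subspace of a 50-gene space, plus small noise.
rng = np.random.default_rng(0)
n_genes, n_per_class, dim = 50, 20, 3
blocks = [
    rng.normal(size=(n_genes, dim)) @ rng.normal(size=(dim, n_per_class))
    for _ in range(2)
]
X = np.hstack(blocks) + 0.01 * rng.normal(size=(n_genes, 2 * n_per_class))
truth = np.repeat([0, 1], n_per_class)

# Only two samples per class start out labeled.
labels = np.full(truth.shape, -1)
labels[[0, 1, 20, 21]] = truth[[0, 1, 20, 21]]

W = lrr_affinity(X, rank=2 * dim)
pred = self_train(W, labels)
accuracy = (pred == truth).mean()
```

Labeling only the highest-margin samples in each round reflects the usual motivation for self-training: confident pseudo-labels are added first so that later, harder samples are scored against a larger labeled set.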

SUBMITTER: Xia CQ 

PROVIDER: S-EPMC5986621 | biostudies-literature | 2018 Jul-Aug

REPOSITORIES: biostudies-literature

Publications

A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data.

Xia Chun-Qiu, Han Ke, Qi Yong, Zhang Yang, Yu Dong-Jun

IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017-06-06 4


Similar Datasets

| S-EPMC6509871 | biostudies-literature
| S-EPMC10070828 | biostudies-literature
| S-EPMC3602020 | biostudies-literature
| S-EPMC6504107 | biostudies-literature
| S-EPMC5441581 | biostudies-literature
| S-EPMC7611820 | biostudies-literature
| S-EPMC5953310 | biostudies-literature
| S-EPMC2383906 | biostudies-literature
| S-EPMC5641478 | biostudies-literature
| S-EPMC5539778 | biostudies-other