Dataset Information

Bipartite tight spectral clustering (BiTSC) algorithm for identifying conserved gene co-clusters in two species.


ABSTRACT:

Motivation

Gene clustering is a widely used technique that has enabled computational prediction of unknown gene functions within a species. However, it remains a challenge to refine gene function prediction by leveraging evolutionarily conserved genes in another species. This challenge calls for a new computational algorithm to identify gene co-clusters in two species, so that genes in each co-cluster exhibit similar expression levels in each species and strong conservation between the species.

Results

Here, we develop the bipartite tight spectral clustering (BiTSC) algorithm, which identifies gene co-clusters in two species based on gene orthology information and gene expression data. BiTSC introduces a formulation that encodes gene orthology as a bipartite network and gene expression data as node covariates. This formulation allows BiTSC to combine the advantages of multiple unsupervised learning techniques: kernel enhancement, bipartite spectral clustering, consensus clustering, tight clustering and hierarchical clustering. As a result, BiTSC is a flexible and robust algorithm capable of identifying informative gene co-clusters without forcing all genes into co-clusters. Another advantage of BiTSC is that it does not rely on any distributional assumptions. Beyond cross-species gene co-clustering, BiTSC is also applicable as a general algorithm for identifying tight node co-clusters in any bipartite network with node covariates. We demonstrate the accuracy and robustness of BiTSC through comprehensive simulation studies. In a real data example, we use BiTSC to identify conserved gene co-clusters of Drosophila melanogaster and Caenorhabditis elegans, and we perform a series of downstream analyses to both validate BiTSC and verify the biological significance of the identified co-clusters.
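
To make the bipartite spectral clustering ingredient more concrete, the following is a minimal sketch of a generic spectral bipartition of a bipartite network, in the style of classical spectral co-clustering. It is not the BiTSC implementation: it ignores node covariates and omits the kernel enhancement, consensus, tight clustering and hierarchical clustering steps named above. The function name `bipartite_spectral_bipartition` and the toy matrix `A` (rows standing in for genes of one species, columns for genes of the other, entries for orthology edges) are hypothetical and chosen only for illustration.

```python
import numpy as np


def bipartite_spectral_bipartition(A):
    """Split both sides of a bipartite graph into two co-clusters.

    A is an (m x n) nonnegative biadjacency matrix. This is a generic
    illustration, NOT the BiTSC algorithm (no covariates, no tightness).
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # degrees of the row-side nodes
    d2 = A.sum(axis=0)  # degrees of the column-side nodes
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every node needs at least one edge")

    # Degree-normalized biadjacency matrix D1^{-1/2} A D2^{-1/2}.
    A_n = A / np.sqrt(np.outer(d1, d2))

    # The leading singular pair is trivial (singular value 1); the second
    # pair carries the two-way cluster structure.
    U, _, Vt = np.linalg.svd(A_n, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)

    # Cut at zero. Which side is called 0 or 1 is arbitrary, because the
    # overall sign of a singular pair is arbitrary.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


# Hypothetical toy network: two dense blocks joined by a single cross edge.
A = np.zeros((6, 6))
A[:3, :3] = 1
A[3:, 3:] = 1
A[2, 3] = 1

row_labels, col_labels = bipartite_spectral_bipartition(A)
```

In this toy network, row and column nodes of the same dense block receive the same label, so each label defines one co-cluster spanning both sides of the network. Unlike this sketch, which assigns every node to a cluster, BiTSC is described above as leaving some genes unclustered.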

Availability and implementation

The Python package BiTSC is open-access and available at https://github.com/edensunyidan/BiTSC.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Sun YE 

PROVIDER: S-EPMC8599197 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC4908362 | biostudies-literature
| S-EPMC7299311 | biostudies-literature
| S-EPMC8336801 | biostudies-literature
| S-EPMC4807765 | biostudies-literature
| S-EPMC3244763 | biostudies-literature
| S-EPMC6409843 | biostudies-other
| S-EPMC7275956 | biostudies-literature
| S-EPMC5793492 | biostudies-literature
2022-03-16 | PXD022124 | Pride
| S-EPMC8187398 | biostudies-literature