Dataset Information

Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data.


ABSTRACT: With the increased use of gene expression profiling for personalized oncology, optimized RNA sequencing (RNA-seq) protocols and algorithms are necessary to provide comparable expression measurements between exome capture (EC)-based and poly-A RNA-seq. Here, we developed and optimized an EC-based protocol for processing formalin-fixed, paraffin-embedded samples and a machine-learning algorithm, Procrustes, to overcome batch effects across RNA-seq data obtained using different sample preparation protocols like EC-based or poly-A RNA-seq protocols. Applying Procrustes to samples processed using EC and poly-A RNA-seq protocols showed the expression of 61% of genes (N = 20,062) to correlate across both protocols (concordance correlation coefficient > 0.8, versus 26% before transformation by Procrustes), including 84% of cancer-specific and cancer microenvironment-related genes (versus 36% before applying Procrustes; N = 1,438). Benchmarking analyses also showed Procrustes to outperform other batch correction methods. Finally, we showed that Procrustes can project RNA-seq data for a single sample to a larger cohort of RNA-seq data. Future application of Procrustes will enable direct gene expression analysis for single tumor samples to support gene expression-based treatment decisions.
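The abstract reports agreement between protocols as the share of genes whose concordance correlation coefficient (CCC) exceeds 0.8. The record does not say how the CCC was computed, so the following is only a minimal sketch assuming Lin's standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), applied gene by gene to paired samples. The function names `concordance_correlation` and `fraction_concordant`, and the genes-by-samples matrix layout, are illustrative assumptions and are not taken from the Procrustes software.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired vectors.

    Uses population (biased) variance and covariance, as in Lin's original
    definition. Undefined (division by zero) when both vectors are constant
    and have equal means.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)


def fraction_concordant(expr_a, expr_b, threshold=0.8):
    """Share of genes whose CCC between two protocols exceeds `threshold`.

    `expr_a` and `expr_b` are hypothetical genes-by-samples matrices holding
    the same genes (rows) and the same samples (columns), one per protocol.
    """
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    ccc = np.array([
        concordance_correlation(row_a, row_b)
        for row_a, row_b in zip(expr_a, expr_b)
    ])
    return float((ccc > threshold).mean())
```

Unlike Pearson correlation, the CCC penalizes a systematic shift or scale difference between the two protocols: a gene measured as `[1, 2, 3]` by one protocol and `[2, 3, 4]` by the other has a Pearson correlation of 1 but a CCC of only 4/7. That is why a batch-correction step can raise the CCC even when the ranking of samples is already preserved.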

SUBMITTER: Kotlov N 

PROVIDER: S-EPMC10981711 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC4736986 | biostudies-literature
| S-EPMC9968332 | biostudies-literature
| S-EPMC4101981 | biostudies-literature
| S-EPMC7763177 | biostudies-literature
| S-EPMC7854649 | biostudies-literature
| S-EPMC5905914 | biostudies-other
| S-EPMC9284682 | biostudies-literature
| S-EPMC7442834 | biostudies-literature
| S-EPMC10187255 | biostudies-literature
| S-EPMC10827746 | biostudies-literature