Dataset Information

Fast analysis of scATAC-seq data using a predefined set of genomic regions.

ABSTRACT:

Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data has been accelerated by alignment-free techniques, available scATAC-seq pipelines still require large computational resources. We propose an approach based on pseudoalignment, which reduces execution times and hardware requirements at little cost in precision.

Methods: Public data for 10k PBMCs were downloaded from the 10x Genomics website. Reads were pseudoaligned with kallisto to several references derived from DNase I hypersensitive sites (DHS) and quantified with bustools. We compared our results with the publicly available results derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data from the K562 cell line.

Results: We found that kallisto does not introduce biases in the quantification of known peaks, and the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when the analysis uses a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows efficient labelling of cell groups based on marker genes.

Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS as the reference does not affect the ability to characterize cell populations.
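
To illustrate what pseudoalignment means here, the following is a toy Python sketch of the general idea: a read is assigned to the set of reference regions (hypothetical DHS sequences named `dhs1`, `dhs2`) that contain all of its indexed k-mers, without computing a base-level alignment. It is a simplified illustration under stated assumptions, not kallisto's implementation: kallisto indexes the reference as a colored de Bruijn graph (default k = 31) and, in bus mode, records barcode, UMI and equivalence class for bustools to count, whereas this sketch uses a plain dictionary, a tiny k, and ignores barcodes and UMIs.

```python
from __future__ import annotations

# Toy illustration of pseudoalignment (simplified; not kallisto's data structures).

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that reads from either strand (ATAC reads are unstranded) hit the same key."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq, in canonical form."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def build_index(regions: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each canonical k-mer to the names of the regions that contain it."""
    index: dict[str, set[str]] = {}
    for name, seq in regions.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def pseudoalign(read: str, index: dict[str, set[str]], k: int) -> set[str]:
    """Intersect the region sets of the read's indexed k-mers.

    k-mers missing from the index are skipped (a simplification of this sketch);
    an empty result means the read is not assigned to any region.
    """
    compatible: set[str] | None = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break
    return compatible or set()


if __name__ == "__main__":
    # Hypothetical DHS-derived reference sequences.
    regions = {"dhs1": "ACGTACGTTAGC", "dhs2": "GGATCCTTAGGCAAT"}
    k = 5
    index = build_index(regions, k)
    print(pseudoalign("ACGTACGT", index, k))   # {'dhs1'}
    print(pseudoalign("TGCCTAAGG", index, k))  # {'dhs2'} (reverse-strand read)
```

The returned set plays the role of an equivalence class: a read compatible with several regions stays ambiguous instead of being forced onto one, and no per-base scoring is needed, which is where the speed-up over alignment comes from.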

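The abstract does not spell out how gene activity is derived from the region counts. One common convention, assumed here and not necessarily the one used by the authors, sums a cell's counts over all regions that overlap the gene body extended by a fixed window upstream of the transcription start site (2 kb in this sketch). Coordinates are 0-based, half-open BED-style intervals; the region IDs, gene names and the `upstream` default are hypothetical.

```python
from __future__ import annotations

# Hypothetical gene-activity score: sum of region counts overlapping the gene
# body extended upstream of the TSS (a common convention, assumed here).


def activity_window(start: int, end: int, strand: str, upstream: int = 2000) -> tuple[int, int]:
    """Gene body plus `upstream` bases before the TSS (0-based, half-open)."""
    if strand == "+":
        return max(0, start - upstream), end
    # On the minus strand the TSS is at `end`, so extend to the right.
    return start, end + upstream


def gene_activity(
    cell_counts: dict[str, int],
    regions: dict[str, tuple[str, int, int]],
    genes: dict[str, tuple[str, int, int, str]],
    upstream: int = 2000,
) -> dict[str, int]:
    """Sum one cell's region counts over each gene's activity window."""
    activity: dict[str, int] = {}
    for gene, (chrom, start, end, strand) in genes.items():
        lo, hi = activity_window(start, end, strand, upstream)
        total = 0
        for region_id, count in cell_counts.items():
            r_chrom, r_start, r_end = regions[region_id]
            # Half-open intervals overlap when each starts before the other ends.
            if r_chrom == chrom and r_start < hi and r_end > lo:
                total += count
        activity[gene] = total
    return activity


if __name__ == "__main__":
    regions = {"r1": ("chr1", 900, 1100), "r2": ("chr1", 5000, 5200), "r3": ("chr2", 100, 300)}
    genes = {"GENE_A": ("chr1", 2500, 4000, "+"), "GENE_B": ("chr1", 3000, 4900, "-")}
    cell_counts = {"r1": 3, "r2": 5, "r3": 7}
    print(gene_activity(cell_counts, regions, genes))  # {'GENE_A': 3, 'GENE_B': 5}
```

For a full cell-by-region matrix the function would be called once per cell barcode; with hundreds of thousands of DHS regions, an interval index (for example sorted region starts searched with `bisect`) would replace the nested loop.
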
SUBMITTER: Giansanti V

PROVIDER: S-EPMC7308914 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

A scATAC-seq atlas of chromatin accessibility in axolotl brain regions.

Project description: Axolotl (Ambystoma mexicanum) is an excellent model for investigating regeneration, the interaction between regenerative and developmental processes, comparative genomics, and evolution. The brain, which serves as the material basis of consciousness, learning, memory, and behavior, is the most complex and advanced organ in axolotl. The modulation of transcription factors is a crucial aspect in determining the function of diverse regions within the brain. There is, however, no comprehensive understanding of the gene regulatory network of axolotl brain regions. Here, we utilized single-cell ATAC sequencing to generate the chromatin accessibility landscapes of 81,199 cells from the olfactory bulb, telencephalon, diencephalon and mesencephalon, hypothalamus and pituitary, and the rhombencephalon. Based on these data, we identified key transcription factors specific to distinct cell types and compared cell type functions across brain regions. Our results provide a foundation for comprehensive analysis of gene regulatory programs, which are valuable for future studies of axolotl brain development, regeneration, and evolution, as well as on the mechanisms underlying cell-type diversity in vertebrate brains.

S-EPMC10502032 | biostudies-literature

Multiplexed Analysis of Retinal Gene Expression and Chromatin Accessibility using scRNA-Seq and scATAC-Seq.

Project description: Powerful next-generation sequencing techniques offer robust and comprehensive analysis to investigate how retinal gene regulatory networks function during development and in disease states. Single-cell RNA sequencing allows us to comprehensively profile gene expression changes observed in retinal development and disease at a cellular level, while single-cell ATAC-Seq allows analysis of chromatin accessibility and transcription factor binding to be profiled at similar resolution. Here the use of these techniques in the developing retina is described, and MULTI-Seq is demonstrated, where individual samples are labeled with a modified oligonucleotide-lipid complex, enabling researchers to both increase the scope of individual experiments and substantially reduce costs.

S-EPMC8356148 | biostudies-literature

MOCHA's advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts.

Project description: Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is increasingly used to study gene regulation. However, major analytical gaps limit its utility for studying gene regulatory programs in complex diseases. In response, MOCHA (Model-based single cell Open CHromatin Analysis) offers major advances over existing analysis tools, including: 1) improved identification of sample-specific open chromatin, 2) statistical modeling of technical drop-out with zero-inflated methods, 3) mitigation of false positives in single-cell analysis, 4) identification of alternative transcription-start-site regulation, and 5) modules for inferring temporal gene regulatory networks from longitudinal data. These advances, in addition to open chromatin analyses, provide a robust framework, applied after quality control and cell labeling, for studying gene regulatory programs in human disease. We benchmark MOCHA against four state-of-the-art tools to demonstrate its advances. We also construct cross-sectional and longitudinal gene regulatory networks, identifying potential mechanisms of COVID-19 response. MOCHA provides researchers with a robust analytical tool for functional genomic inference from scATAC-seq data.

S-EPMC11316085 | biostudies-literature

scNCL: transferring labels from scRNA-seq to scATAC-seq data with neighborhood contrastive regularization.

Project description: Motivation: scATAC-seq has enabled profiling of the chromatin accessibility landscape at the single-cell level, providing opportunities for determining cell-type-specific regulatory codes. However, the high dimensionality, extreme sparsity, and large scale of scATAC-seq data pose great challenges to cell-type identification. Thus, there has been growing interest in leveraging well-annotated scRNA-seq data to help annotate scATAC-seq data. However, substantial computational obstacles remain in transferring information from scRNA-seq to scATAC-seq, especially because the two modalities have heterogeneous features. Results: We propose a new transfer learning method, scNCL, which uses prior knowledge and contrastive learning to tackle the problem of heterogeneous features. Briefly, scNCL transforms scATAC-seq features into a gene activity matrix based on prior knowledge. Since feature transformation can cause information loss, scNCL introduces neighborhood contrastive learning to preserve the neighborhood structure of scATAC-seq cells in the raw feature space. To learn transferable latent features, scNCL uses a feature projection loss and an alignment loss to harmonize embeddings between scRNA-seq and scATAC-seq. Experiments on various datasets demonstrated that scNCL not only achieves accurate and robust label transfer for common cell types, but also reliably detects novel types. scNCL is also computationally efficient and scalable to million-scale datasets. Moreover, we show that scNCL can help refine cell-type annotations in existing scATAC-seq atlases. Availability and implementation: The source code and data used in this paper can be found at https://github.com/CSUBioGroup/scNCL-release.

S-EPMC10457667 | biostudies-literature

scDART: integrating unmatched scRNA-seq and scATAC-seq data and learning cross-modality relationship simultaneously.

Project description: Integrating scRNA-seq and scATAC-seq data obtained from different batches is a challenging task. Existing methods tend to use a predefined gene activity matrix to convert scATAC-seq data into scRNA-seq data. The predefined gene activity matrix is often of low quality and does not reflect the dataset-specific relationship between the two data modalities. We propose scDART, a deep learning framework that integrates scRNA-seq and scATAC-seq data and simultaneously learns the cross-modality relationships. Specifically, scDART is designed to preserve cell trajectories in continuous cell populations and can be applied to trajectory inference on integrated data.

S-EPMC9238247 | biostudies-literature

scAWMV: an adaptively weighted multi-view learning framework for the integrative analysis of parallel scRNA-seq and scATAC-seq data.

Project description: Motivation: Technological advances have enabled us to profile single-cell multi-omics data from the same cells, providing an unprecedented opportunity to understand the cellular phenotype and its links to genotype. The available protocols and multi-omics datasets [including parallel single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data profiled from the same cell] are growing rapidly. However, such data are highly sparse and tend to have a high level of noise, making data analysis challenging. Methods that integrate multi-omics data can potentially improve the capacity to reveal cellular heterogeneity. Results: We propose an adaptively weighted multi-view learning (scAWMV) method for the integrative analysis of parallel scRNA-seq and scATAC-seq data profiled from the same cell. scAWMV considers both the difference in importance across modalities in multi-omics data and the biological connection between the features in the scRNA-seq and scATAC-seq data. It generates biologically meaningful low-dimensional representations of the transcriptomic and epigenomic profiles via unsupervised learning. Application to four real datasets demonstrates that scAWMV is an efficient method for dissecting cellular heterogeneity in single-cell multi-omics data. Availability and implementation: The software and datasets are available at https://github.com/pengchengzeng/scAWMV. Supplementary information: Supplementary data are available at Bioinformatics online.

S-EPMC9805575 | biostudies-literature

Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self Organizing Maps.

Project description: Rapid advances in single-cell assays have outpaced methods for analysis of those data types. Different single-cell assays show extensive variation in sensitivity and signal to noise levels. In particular, scATAC-seq generates extremely sparse and noisy datasets. Existing methods developed to analyze this data require cells amenable to pseudo-time analysis or require datasets with drastically different cell-types. We describe a novel approach using self-organizing maps (SOM) to link scATAC-seq regions with scRNA-seq genes that overcomes these challenges and can generate draft regulatory networks. Our SOMatic package generates chromatin and gene expression SOMs separately and combines them using a linking function. We applied SOMatic on a mouse pre-B cell differentiation time-course using controlled Ikaros over-expression to recover gene ontology enrichments, identify motifs in genomic regions showing similar single-cell profiles, and generate a gene regulatory network that both recovers known interactions and predicts new Ikaros targets during the differentiation process. The ability of linked SOMs to detect emergent properties from multiple types of highly-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of heterogeneous data.

S-EPMC6855564 | biostudies-literature

Fast and interpretable genomic data analysis using multiple approximate kernel learning.

Project description: Motivation: Dataset sizes in computational biology have increased drastically thanks to improved data collection tools and growing patient cohorts. Previous kernel-based machine learning algorithms proposed for increased interpretability started to fail with large sample sizes, owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating, instead of calculating, distinct kernel matrices. Results: To test our computational framework, Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL can outperform the baseline algorithm while using only a small fraction of the input features. We also reported selection frequencies of the approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which help to assess their relevance for the given classification task. Our fast and interpretable MKL algorithm, which produces sparse solutions, is promising for computational biology applications given its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines. Availability and implementation: MAKL is available at https://github.com/begumbektas/makl together with the scripts that replicate the reported experiments. MAKL is also available as an R package at https://cran.r-project.org/web/packages/MAKL. Supplementary information: Supplementary data are available at Bioinformatics online.

S-EPMC9235505 | biostudies-literature

Multidimensional gene set analysis of genomic data.

Project description: Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). Despite the wealth of information provided by functional profiling methods, a common limitation of all of them is their inherently unidimensional nature. To overcome this restriction, we present a multidimensional logistic model that allows the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) to be studied simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from conventional unidimensional functional profiling methods. We report sound gene set associations that remained undetected by conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for discovering new cell functionalities with complex dependences on more than one variable.

S-EPMC2860497 | biostudies-literature

Integrative analysis of scRNA-seq and scATAC-seq revealed transit-amplifying thymic epithelial cells expressing autoimmune regulator.

Project description: Medullary thymic epithelial cells (mTECs) are critical for self-tolerance induction in T cells via promiscuous expression of tissue-specific antigens (TSAs), which are controlled by the transcriptional regulator, AIRE. Whereas AIRE-expressing (Aire+) mTECs undergo constant turnover in the adult thymus, mechanisms underlying differentiation of postnatal mTECs remain to be discovered. Integrative analysis of single-cell assays for transposase-accessible chromatin (scATAC-seq) and single-cell RNA sequencing (scRNA-seq) suggested the presence of proliferating mTECs with a specific chromatin structure, which express high levels of Aire and co-stimulatory molecules, CD80 (Aire+CD80hi). Proliferating Aire+CD80hi mTECs detected using Fucci technology express a minimal number of Aire-dependent TSAs and are converted into quiescent Aire+CD80hi mTECs expressing high levels of TSAs after a transit amplification. These data provide evidence for the existence of transit-amplifying Aire+mTEC precursors during the Aire+mTEC differentiation process of the postnatal thymus.

S-EPMC9113748 | biostudies-literature
