Dataset Information

Single-cell spatial transcriptomic data of synovial tissue from Rheumatoid Arthritis (RA) patients in active disease and sustained clinical remission.


ABSTRACT: We used the Nanostring CosMx Spatial Molecular Imaging platform to measure expression of 960 genes, discriminating the transcriptional profiles and spatial localization of 127,199 cells across 69 fields of view (FOVs) in paraffin-embedded synovial biopsies from 3 active and 3 remission RA patients (~11 FOVs per donor).

Cell Segmentation. Initial image segmentation was performed with Mesmer [61, 89] using the following parameters: mesmer_mode = "both", scale = pixel size of the images. We used the cell boundaries estimated by Mesmer as a prior for refining the segmentation with Baysor [62, 90] based on transcript densities, using the R wrapper (https://github.com/korsunskylab/baysorrr). Following successful cell assignment, we generated a gene-cell expression matrix and performed quality control, removing any cells with fewer than 30 counts and/or fewer than 20 expressed genes. Cells with a radius of less than 2 µm were also removed. Cells that passed QC filtering were then annotated using the pipeline for cell type labelling described in Chen et al. (2024) [91].

Coarse cell type annotation. Briefly, read counts were normalized to the median total counts of all cells remaining after filtering and log-transformed. PCA was performed and the embeddings were corrected by integration with Harmony [85, 83] (0.1.0), specifying a sigma value of 0.25 and theta values of 0 for both the sample run and FOV batch variables. Harmony-corrected PC embeddings were used to generate a two-dimensional UMAP [92] (uwot 0.1.16), and cell clusters were identified by shared nearest neighbour (SNN) modularity clustering. Clusters of coarse cell types were annotated based on marker genes identified by differential expression analysis, performed using the presto wrapper (1.0.0, https://github.com/immunogenomics/presto/tree/glmm/) for Generalised Linear Mixed Model (GLMM) estimation with lme4 (1.1-34), as described in Chen et al. (2024) [91].
Genes were considered significant when the adjusted p-value was less than 0.01 and the average logFC was greater than 0.5.

Preparation of scRNAseq for reference annotation of spatial data. Following coarse cell type annotation, CosMx data were integrated with our synovial tissue scRNAseq dataset for reference annotation of subclusters. In preparation for this, our synovial tissue scRNAseq dataset was refined by removing genes that are not present in the CosMx SMI gene panel (n=922). The data were then re-filtered to remove cells that now had a low number of counts/features because of the reduced gene panel. Variable features were identified, and the data were renormalized, adjusting the scale factor to account for the reduced number of counts (median = 1255). We then followed the standard pipeline for Seurat pre-processing and clustering of scRNAseq data, as described above, for coarse cell type annotation. Coarse cell type populations of interest were isolated and, as before, the data were re-integrated, a new UMAP was generated, and the data were re-clustered with the reduced gene panel. Any clusters that were indistinguishable with the CosMx gene panel were removed. Relevant genes for each population of interest were ranked by z-score (presto, 1.0.0), and we ran a sensitivity analysis by running the integration and clustering pipeline with the top 50 to 900 variable genes and selecting the minimum number of genes necessary to distinguish our described DC and T-cell subsets. We then harmonized and clustered with the minimum set of relevant genes selected from the sensitivity analysis and annotated clusters based on the correlation of gene expression with the original annotations. Cells with clashing labels were removed, and the number of cells per cluster was downsampled to the median number of cells per cluster.

Reference annotation of CosMx spatial data with refined scRNAseq data. Each population of interest (Myeloid, Stromal, Endothelial and T Lymphocyte) from the spatial data was isolated based on coarse cell type annotation for integration with the appropriate scRNAseq reference. CosMx data were reduced to the genes selected from the sensitivity analysis performed when preparing the scRNAseq reference for that cell type. This allowed us to minimize noise and focus only on the minimum set of genes necessary to define clusters. The data were then merged and renormalized, adjusting the scale factor to account for the reduced number of counts in both the CosMx and scRNAseq datasets, before following the standard Harmony pipeline for integration across modalities, with the source of the data (spatial/scRNAseq, theta=2) and sample ID (donor/sequencing run, theta=0) as batch variables. The integrated dataset was then re-clustered and new integrated clusters were identified. To do so, we visualized a heatmap of the correlation matrix comparing the marker genes of the new clusters with the marker genes of the original single-cell clusters. We also generated a confusion matrix, a heatmap illustrating the frequency of cells from the original single-cell reference clusters within each of the new integrated clusters. Where a new integrated cross-modality cluster contained multiple scRNAseq reference clusters, we performed subclustering and re-visualized the gene correlation matrix and confusion matrix. The fraction of cells in each cluster from each source was also visualized as a stacked bar plot to identify any populations unique to the CosMx spatial technology. The new integrated clusters were automatically reference-annotated using the gene correlation matrix, annotating each new cluster with the name of the scRNAseq reference cluster with the highest correlation of gene expression. This reference annotation was also performed manually, and the results were compared to finalize annotations before transferring the new cell labels.
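The automatic reference annotation step above (labelling each integrated cluster with its best-correlated scRNAseq reference cluster) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of Pearson correlation on mean expression profiles, and the toy cluster names and values are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)

def annotate_by_correlation(new_profiles, ref_profiles):
    """Label each new cluster with the best-correlated reference cluster."""
    labels = {}
    for cluster, profile in new_profiles.items():
        labels[cluster] = max(
            ref_profiles, key=lambda ref: pearson(profile, ref_profiles[ref])
        )
    return labels

# Hypothetical mean expression over the same four marker genes.
ref_profiles = {"ref_A": [5.0, 1.0, 0.0, 0.0], "ref_B": [0.0, 0.0, 4.0, 6.0]}
new_profiles = {"c0": [4.0, 2.0, 0.0, 1.0], "c1": [1.0, 0.0, 3.0, 5.0]}
labels = annotate_by_correlation(new_profiles, ref_profiles)
```

In practice the automatic labels would still be compared against manual annotation, as described above, before any labels are transferred.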
Once all coarse cell populations of interest in the spatial transcriptomic data had been isolated, integrated with the scRNAseq reference, re-clustered and annotated, the new fine cell type annotations were transferred to the original CosMx spatial dataset containing all cell types. The spatial localization of coarse and fine cell type annotations was plotted using ggplot2 (geom_sf), allowing visualization of the cell geometries identified by segmentation (described above) and manipulated using the sf package (1.0.16).

Niche and colocalization analysis of CosMx spatial transcriptomic data. To perform spatial segmentation, we first identified low-quality regions within the tissue in three steps: (1) FOV region annotation and gridding, (2) spatial smoothing, and (3) dimensional reduction and clustering.

FOV region annotation and gridding. We gridded the cellular region of each FOV by performing Voronoi tessellation on the cell centroids with the FOV boundary as the bounding box. Voronoi tessellation divides the space such that:

$\mathrm{Distance}(P_{V_k}, C_{i=k}) \le \mathrm{Distance}(P_{V_k}, C_{i}), \quad i \ne k$

where $P_{V_k}$ is any point P(x,y) in the Voronoi region $V_k$, and $C_i$ is the centroid of the Voronoi region $V_i$. Because Voronoi tessellation grids the whole FOV irrespective of empty spaces within the tissue, we performed Voronoi tessellation only between cells that are less than 50 µm from at least one other cell. Cells that are more than 50 µm from other cells were included in the analysis, but with their original cell polygons instead of Voronoi regions. For most of the FOVs, we observed a gap between the last layer of cells and the FOV boundary. This led to edge effects in which the cells closer to the edges had elongated shapes. To correct this, we replaced the Voronoi region of each edge cell with the intersection between that Voronoi region and a circular buffer of 15 µm around the cell centroid. This marked the end of gridding of the cellular region of the tissue.
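The gridding rules above can be made concrete with a small point-membership sketch. It is an illustration only, not the authors' implementation (which operates on polygon geometries): the function names are hypothetical, cell-to-cell distance is assumed to mean centroid-to-centroid distance, and the 15 µm clipping is applied to any cell passed in rather than only to detected edge cells.

```python
from math import hypot

def nearest_centroid(point, centroids):
    """Index k of the Voronoi region containing `point` (nearest centroid)."""
    return min(
        range(len(centroids)),
        key=lambda i: hypot(point[0] - centroids[i][0], point[1] - centroids[i][1]),
    )

def has_close_neighbour(idx, centroids, max_gap=50.0):
    """True if cell `idx` lies within `max_gap` µm of at least one other cell."""
    cx, cy = centroids[idx]
    return any(
        j != idx and hypot(cx - x, cy - y) < max_gap
        for j, (x, y) in enumerate(centroids)
    )

def in_clipped_region(point, idx, centroids, radius=15.0):
    """True if `point` is in Voronoi region `idx` AND within `radius` µm of
    its centroid, i.e. inside the buffer-clipped region used for edge cells."""
    cx, cy = centroids[idx]
    inside_voronoi = nearest_centroid(point, centroids) == idx
    return inside_voronoi and hypot(point[0] - cx, point[1] - cy) <= radius

# Toy centroids in µm: two neighbouring cells and one isolated cell.
centroids = [(0.0, 0.0), (20.0, 0.0), (200.0, 200.0)]
tessellated = [has_close_neighbour(i, centroids) for i in range(len(centroids))]
```

Here the isolated third cell fails the 50 µm test, so under the scheme described above it would keep its original cell polygon instead of receiving a Voronoi region.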
We merged all the Voronoi regions in each FOV and annotated the merged region as "tissue". We determined "glass" regions in each FOV by finding the non-intersecting region between the bounding box of the FOV and a 30 µm buffered tissue region of the FOV. We buffered the tissue region to ensure that we did not capture probes in the boundary regions between glass and tissue. Our rationale for ignoring boundary transcripts is that these probes could belong to cells but were not assigned to cells because of segmentation errors. Grouping these into "glass" regions could skew our background identification. We then tiled the glass region of the FOV into 4-sided polygons that each contain the same number of transcripts as the mean number of transcripts per Voronoi region in that FOV.

Spatial smoothing. To construct the gene expression matrix of the tissue region, we mapped only the transcripts (both positive and negative probes) assigned to cells during segmentation to Voronoi regions. Because negative probes are excluded during cell segmentation, we assigned a cell ID to a negative probe if it was within a cell boundary and 0 otherwise. To construct the gene expression matrix of the glass region, we used the "st_intersect" function to map transcripts to the glass tiles. We then combined both expression matrices to build a gene-polygon matrix for each FOV. From this point on, we refer to both the Voronoi regions and the glass tiles as "polygons" and to the original cell shapes as "cell polygons". To perform spatial smoothing, we let each polygon capture a fraction of the transcripts of its neighbors (in addition to all of its own transcripts) using a diffusion-based method that controls how aggressively transcripts are borrowed from neighbors (l) and from how many degrees of neighbors transcripts are borrowed (k). The first step of spatial smoothing is to construct an adjacency matrix. We did that by constructing an unweighted Delaunay graph on the polygon centroids and pruning the edges between tissue and glass polygons. Pruning is important because our goal was to identify regions in the tissue that have gene expression profiles similar to glass, and borrowing transcripts from glass would make some tissue regions look like glass because of smoothing and not because they are low quality. After calculating the adjacency matrix, we smoothed it by a diffusion process in which the smoothed matrix M is calculated as:

$M = (I + lA)^{k}$

where I is the identity matrix, l is the rate of diffusion, A is the adjacency matrix, and k is the number of diffusion steps. We row-normalized the smoothed matrix and built the smoothed gene expression matrix (G) as:

$G = G_{\mathrm{raw}} M^{\top}$

Dimension reduction and clustering. We then performed log-normalization, scaling, weighted PCA, Harmony to correct for batch effects (sigma = 0.2, batch variables = SampleID, SampleFOV, nPCs = 20), UMAP, and clustering as described in the cell type labeling section to identify the tissue regions clustering with glass regions. These regions were labeled "low-quality" regions and removed from the analysis.

Region annotation. To identify regions, we performed spatial smoothing, dimension reduction and clustering as described above on high-quality tissue regions. Clusters were annotated based on their cell composition. Furthermore, we performed colocalization analysis to define the organization of cell subsets within the described tissue niches. Applying a permutation approach, as in Chen et al. [91], we identified nearest neighbours, then randomized the positions of the cells surrounding the defined cell type of interest and determined whether the colocalization of two subsets was unlikely to be expected by chance. Significant colocalizations (adjusted p-value < 0.05) were plotted as Z-scores.
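The diffusion-based smoothing from the spatial smoothing step, M = (I + lA)^k followed by row normalization and G = G_raw * t(M), can be sketched on a toy graph. This is a minimal illustration in plain Python under stated assumptions: the function names, the dense list-of-lists matrices, and the toy values (four polygons on a path, l = 0.5, k = 1) are hypothetical and not taken from the original analysis.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def prune_edges(adj, labels):
    """Zero out edges whose endpoints have different labels (tissue vs glass)."""
    n = len(adj)
    return [[adj[i][j] if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]

def diffusion_matrix(adj, rate, steps):
    """Row-normalized (I + rate * A)^steps."""
    n = len(adj)
    step = [[(1.0 if i == j else 0.0) + rate * adj[i][j] for j in range(n)]
            for i in range(n)]
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        m = matmul(m, step)
    return [[value / sum(row) for value in row] for row in m]

def smooth_expression(g_raw, m):
    """G = G_raw * t(M), with g_raw shaped genes x polygons."""
    m_transposed = [list(col) for col in zip(*m)]
    return matmul(g_raw, m_transposed)

# Toy graph: polygons 0-1-2-3 on a path; polygon 3 is a glass tile.
adjacency = [[0.0, 1.0, 0.0, 0.0],
             [1.0, 0.0, 1.0, 0.0],
             [0.0, 1.0, 0.0, 1.0],
             [0.0, 0.0, 1.0, 0.0]]
labels = ["tissue", "tissue", "tissue", "glass"]
pruned = prune_edges(adjacency, labels)
M = diffusion_matrix(pruned, rate=0.5, steps=1)
G_raw = [[3.0, 0.0, 0.0, 8.0]]  # one gene, four polygons
G = smooth_expression(G_raw, M)  # [[2.0, 0.75, 0.0, 8.0]]
```

Because the tissue-glass edge is pruned before diffusion, the counts in the glass tile stay in the glass tile, which is the behaviour the pruning rationale above asks for.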

ORGANISM(S): Homo sapiens (human)

SUBMITTER:  

PROVIDER: S-BSST1483 | biostudies-other |

REPOSITORIES: biostudies-other

Similar Datasets

2024-12-06 | E-MTAB-14213 | biostudies-arrayexpress
2020-04-18 | E-MTAB-8322 | biostudies-arrayexpress
| S-EPMC6474583 | biostudies-literature
| S-EPMC5880849 | biostudies-literature
2012-12-19 | GSE37425 | GEO
| S-EPMC8251940 | biostudies-literature
2012-12-19 | E-GEOD-37425 | biostudies-arrayexpress
| S-EPMC7313155 | biostudies-literature
| S-EPMC7615888 | biostudies-literature
| S-EPMC3773654 | biostudies-literature