Dataset Information

NIMBus: a negative binomial regression based Integrative Method for mutation Burden Analysis.

ABSTRACT:

Background

Identifying frequently mutated regions is a key approach to discover DNA elements influencing cancer progression. However, it is challenging to identify these burdened regions due to mutation rate heterogeneity across the genome and across different individuals. Moreover, it is known that this heterogeneity partially stems from genomic confounding factors, such as replication timing and chromatin organization. The increasing availability of cancer whole genome sequences and functional genomics data from the Encyclopedia of DNA Elements (ENCODE) may help address these issues.

Results

We developed a negative binomial regression-based Integrative Method for mutation Burden analysiS (NIMBus). Our approach addresses the over-dispersion of mutation count statistics by (1) using a Gamma-Poisson mixture model to capture the mutation-rate heterogeneity across different individuals and (2) estimating regional background mutation rates by regressing the varying local mutation counts against genomic features extracted from ENCODE. We applied NIMBus to whole-genome cancer sequences from the PanCancer Analysis of Whole Genomes project (PCAWG) and other cohorts. It successfully identified well-known coding and noncoding drivers, such as TP53 and the TERT promoter. To further characterize the burdening of non-coding regions, we used NIMBus to screen transcription factor binding sites in promoter regions that intersect DNase I hypersensitive sites (DHSs). This analysis identified mutational hotspots that potentially disrupt gene regulatory networks in cancer. We also compared NIMBus with other mutation burden analysis methods.
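The over-dispersion argument above can be illustrated with a minimal sketch. It is not the NIMBus implementation: the function names, the mean `mu_region`, and the dispersion `size` are hypothetical values chosen for illustration. In the method described above, the regional mean would come from the regression on ENCODE features; here it is simply assumed. The sketch compares the upper-tail probability of an observed mutation count under a negative binomial (a Gamma-Poisson mixture) with the same tail under a plain Poisson model.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log P(X = k) for a negative binomial with mean mu > 0 and dispersion size > 0.

    This is the marginal of a Poisson whose rate is Gamma-distributed.
    Variance is mu + mu**2 / size, so a smaller size means more over-dispersion.
    """
    return (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))


def poisson_log_pmf(k, mu):
    """Log P(X = k) for a Poisson with mean mu > 0 (variance equals the mean)."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)


def upper_tail(log_pmf, k_obs, *params):
    """P(X >= k_obs), computed as 1 minus the sum of the pmf over 0..k_obs-1."""
    below = sum(math.exp(log_pmf(k, *params)) for k in range(k_obs))
    return min(1.0, max(0.0, 1.0 - below))


# Hypothetical region: 5 mutations expected from the background model, 15 observed.
mu_region, size, observed = 5.0, 2.0, 15
p_nb = upper_tail(nb_log_pmf, observed, mu_region, size)
p_poisson = upper_tail(poisson_log_pmf, observed, mu_region)
print(f"negative binomial p-value: {p_nb:.3g}")
print(f"Poisson p-value:           {p_poisson:.3g}")
```

With the same expected count, the Poisson tail is much smaller than the negative binomial tail, which is why ignoring over-dispersion tends to flag too many regions as burdened.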

Conclusion

NIMBus is a powerful tool to identify mutational hotspots. The NIMBus software and results are available as an online resource at github.gersteinlab.org/nimbus.

SUBMITTER: Zhang J

PROVIDER: S-EPMC7580035 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

NIMBus: a negative binomial regression based Integrative Method for mutation Burden Analysis.

Jing Zhang, Jason Liu, Patrick McGillivray, Caroline Yi, Lucas Lochovsky, Donghoon Lee, Mark Gerstein

BMC Bioinformatics, 2020 Oct 22, issue 1


PMID: 33092526

Similar Datasets

Project description:

Background: The odds ratio (OR) is used as an important metric of comparison of two or more groups in many biomedical applications when the data measure the presence or absence of an event or represent the frequency of its occurrence. In the latter case, researchers often dichotomize the count data into binary form and apply the well-known logistic regression technique to estimate the OR. In the process of dichotomizing the data, however, information is lost about the underlying counts, which can reduce the precision of inferences on the OR.

Methods: We propose analyzing the count data directly using regression models with the log odds link function. With this approach, the parameter estimates in the model have the exact same interpretation as in a logistic regression of the dichotomized data, yielding comparable estimates of the OR. We prove analytically, using the Fisher information matrix, that our approach produces more precise estimates of the OR than logistic regression of the dichotomized data. We also show the gains in precision using simulation studies and real-world datasets. We focus on three related distributions for count data: geometric, Poisson, and negative binomial.

Results: In simulation studies, confidence intervals for the OR were 56-65% as wide (geometric model), 75-79% as wide (Poisson model), and 61-69% as wide (negative binomial model) as the corresponding interval from a logistic regression produced by dichotomizing the data. When we analyzed existing datasets using our approach, we found that confidence intervals for the OR could be up to 64% shorter (36% as wide) than if the data had been dichotomized and analyzed using logistic regression.

Conclusions: More precise estimates of the OR can be obtained directly from the count data by using the log odds link function. This analytic approach is easy to implement in software packages that are capable of fitting generalized linear models or of maximizing user-defined likelihood functions.
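As a rough illustration of the log odds link mentioned in the project description above, the sketch below maps the mean of a count distribution to the log odds of observing at least one event. It assumes the dichotomization is "count > 0 versus count = 0", which the description does not state explicitly, and the function names are hypothetical.

```python
import math


def log_odds_nonzero_poisson(mu):
    """Log odds of Y > 0 when Y ~ Poisson(mu): odds = exp(mu) - 1."""
    return math.log(math.expm1(mu))


def log_odds_nonzero_geometric(mu):
    """Log odds of Y > 0 when Y is geometric on 0, 1, 2, ... with mean mu: odds = mu."""
    return math.log(mu)


def log_odds_nonzero_negbin(mu, size):
    """Log odds of Y > 0 for a negative binomial with mean mu and dispersion size.

    P(Y = 0) = (size / (size + mu)) ** size, so odds = (1 + mu / size) ** size - 1.
    """
    return math.log(math.expm1(size * math.log1p(mu / size)))


# Under this assumption, the log odds ratio between two groups is the difference
# of their linked means, as it would be in a logistic regression of the
# dichotomized data. The group means below are made-up illustration values.
log_or = log_odds_nonzero_poisson(1.2) - log_odds_nonzero_poisson(0.7)
print(f"odds ratio (Poisson, means 1.2 vs 0.7): {math.exp(log_or):.3f}")
```

Setting `size` to 1 recovers the geometric case, and letting `size` grow large approaches the Poisson case, which matches the description of the three distributions as related.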
