Project description:The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error rates for the fMRI software packages SPM, FSL, and AFNI, as well as a nonparametric permutation method. For a nominal familywise error rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for voxelwise as well as clusterwise inference. These findings speak to the need of validating the statistical methods being used in the field of neuroimaging.
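As a rough illustration of how an empirical familywise error rate can be estimated, the sketch below runs many two-group analyses on data with no true effect and counts how often a max-statistic permutation test declares any "voxel" significant. It is a toy under stated assumptions, not the authors' pipeline: the data are independent Gaussian noise rather than resting-state fMRI, inference is voxelwise only, and all function names and parameters are hypothetical.

```python
import random


def _mean_var(xs):
    """Sample mean and unbiased sample variance of a list of floats."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def max_abs_t(data, labels):
    """Largest |t| over all voxels (columns of data) for a 0/1 labelling."""
    best = 0.0
    for v in range(len(data[0])):
        a = [row[v] for row, g in zip(data, labels) if g == 0]
        b = [row[v] for row, g in zip(data, labels) if g == 1]
        (ma, va), (mb, vb) = _mean_var(a), _mean_var(b)
        t = (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5
        best = max(best, abs(t))
    return best


def permutation_rejects(data, labels, n_perm, alpha, rng):
    """True if the FWE-corrected permutation p-value is <= alpha."""
    observed = max_abs_t(data, labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # random reassignment of subjects to groups
        exceed += max_abs_t(data, shuffled) >= observed
    # The observed labelling counts as one member of the null distribution.
    return (1 + exceed) / (n_perm + 1) <= alpha


def empirical_fwer(n_analyses, n_subjects, n_voxels, n_perm, alpha, seed):
    """Fraction of null group analyses with at least one significant voxel."""
    rng = random.Random(seed)
    half = n_subjects // 2
    labels = [0] * half + [1] * (n_subjects - half)
    hits = 0
    for _ in range(n_analyses):
        # Assumption: pure noise stands in for real null (resting-state) data.
        data = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
                for _ in range(n_subjects)]
        hits += permutation_rejects(data, labels, n_perm, alpha, rng)
    return hits / n_analyses


if __name__ == "__main__":
    print(empirical_fwer(n_analyses=200, n_subjects=20, n_voxels=50,
                         n_perm=199, alpha=0.05, seed=0))
```

With a valid method the returned fraction should sit near the nominal level; a parametric method would be evaluated the same way by swapping out the decision function.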
Project description:Recent reports of inflated false-positive rates (FPRs) in FMRI group analysis tools by Eklund and associates in 2016 have become a large topic within (and outside) neuroimaging. They concluded that existing parametric methods for determining statistically significant clusters had greatly inflated FPRs ("up to 70%," mainly due to the faulty assumption that the noise spatial autocorrelation function is Gaussian shaped and stationary), calling into question potentially "countless" previous results; in contrast, nonparametric methods, such as their approach, accurately reflected nominal 5% FPRs. They also stated that AFNI showed "particularly high" FPRs compared to other software, largely due to a bug in 3dClustSim. We comment on these points using their own results and figures and by repeating some of their simulations. Briefly, while parametric methods show some FPR inflation in those tests (and assumptions of Gaussian-shaped spatial smoothness also appear to be generally incorrect), their emphasis on reporting the single worst result from thousands of simulation cases greatly exaggerated the scale of the problem. Importantly, FPR statistics depend on "task" paradigm and voxelwise p value threshold; as such, we show how results of their study provide useful suggestions for FMRI study design and analysis, rather than simply a catastrophic downgrading of the field's earlier results. Regarding AFNI (which we maintain), the effect of 3dClustSim's bug was greatly overstated; their own results show that AFNI results were not "particularly" worse than others. We describe further updates in AFNI for characterizing spatial smoothness more appropriately (greatly reducing FPRs, although some remain >5%); in addition, we outline two newly implemented permutation/randomization-based approaches producing FPRs clustered much more tightly about 5% for voxelwise p ≤ 0.01.
Project description:The false positive rates (FPR) for surface-based group analysis of cortical thickness, surface area, and volume were evaluated for parametric and non-parametric clusterwise correction for multiple comparisons for a range of smoothing levels and cluster-forming thresholds (CFT) using real data under group assignments that should not yield significant results. For whole cortical surface analysis, thickness showed modest inflation in parametric FPRs above the nominal level (10% versus 5%). Surface area and volume FPRs were much higher (20-30%). In the analysis of interhemispheric thickness asymmetries, FPRs were well controlled by parametric correction, but FPRs for surface area and volume asymmetries were still inflated. In all cases, non-parametric permutation adequately controlled the FPRs. It was found that inflated parametric FPRs were caused by violations in the parametric assumptions, namely a heavier-than-Gaussian spatial correlation. The non-Gaussian spatial correlation originates from anatomical features unique to individuals (e.g., a patch of cortex slightly thicker or thinner than average) and is not a by-product of scanning or processing. Thickness performed better than surface area and volume because thickness does not require a Jacobian correction.
Project description:Two-point linkage analyses of whole genome sequence data are a promising approach to identify rare variants that segregate with complex diseases in large pedigrees because, in theory, the causal variants have been genotyped. We used whole genome sequence data and simulated traits provided by Genetic Analysis Workshop 18 to evaluate the proportion of false-positive findings in a binary trait using classic two-point parametric linkage analysis. False-positive genome-wide significant log of odds (LOD) scores were identified in more than 80% of 50 replicates for a binary phenotype generated by dichotomizing a quantitative trait that was simulated with a polygenic component (that was not based on any of the provided whole genome sequence genotypes). In contrast, when the trait was truly nongenetic (created by randomly assigning affected-unaffected status), the number of false-positive results was well controlled. These results suggest that when using two-point linkage analyses on whole genome sequence data, one should carefully examine regions yielding significant two-point LOD scores with multipoint analysis and that a more stringent significance threshold may be needed.
Project description:Background: Prior genome-wide association studies have identified numerous lung cancer risk loci and revealed substantial etiologic heterogeneity across histologic subtypes. Analyzing the shared genetic architecture underlying variation in complex traits can elucidate common genetic etiologies across phenotypes. Exploring pairwise genetic correlations between lung cancer and other polygenic traits can reveal the common genetic etiology of correlated phenotypes. Methods: Using cross-trait linkage disequilibrium score regression, we estimated the pairwise genetic correlation and heritability between lung cancer and multiple traits using publicly available summary statistics. Identified genetic relationships were also examined after excluding genomic regions known to be associated with smoking behaviors, a major risk factor for lung cancer. Results: We observed several traits showing moderate single nucleotide polymorphism-based heritability and significant genetic correlations with lung cancer. We observed highly significant correlations between the genetic architectures of lung cancer and emphysema/chronic bronchitis across all histologic subtypes, as well as among lung cancer occurring among smokers. Our analyses revealed highly significant positive correlations between lung cancer and paternal history of lung cancer. We also observed a strong negative correlation with parental longevity. We observed consistent directions in genetic patterns after excluding genomic regions associated with smoking behaviors. Conclusions: This study identifies numerous phenotypic traits that share genomic architecture with lung carcinogenesis and are not fully accounted for by known smoking-associated genomic loci. Impact: These findings provide new insights into the etiology of lung cancer by identifying traits that are genetically correlated with increased risk of lung cancer.
Project description:Issues with inflated false positive rates (FPRs) in brain imaging have recently received significant attention. However, to what extent FPRs present a problem for voxelwise analyses of Positron Emission Tomography (PET) data remains unknown. In this work, we evaluate the FPR using real PET data under group assignments that should yield no significant results after correcting for multiple comparisons. We used data from 159 healthy participants, imaged with the serotonin transporter ([11C]DASB; N = 100) or the 5-HT4 receptor ([11C]SB207145; N = 59). Using this null data, we estimated the FPR by performing 1,000 group analyses with randomly assigned groups of either 10 or 20, for each tracer, and corrected for multiple comparisons using parametric Monte Carlo simulations (MCZ) or non-parametric permutation testing. Our analyses show that for group sizes of 10 or 20, the FPR for both tracers was 5-99% using MCZ, much higher than the expected 5%. This was caused by a heavier-than-Gaussian spatial autocorrelation, violating the parametric assumptions. Permutation correctly controlled the FPR in all cases. In conclusion, either a conservative cluster forming threshold and high smoothing levels, or a non-parametric correction for multiple comparisons should be performed in voxelwise analyses of brain PET data.
Project description:Background: A growing number of studies have demonstrated potentially disproportionate functional and ecological contributions of rare taxa in a microbial community. However, the study of the microbial rare biosphere is hampered by the inherent scarcity of these taxa and the limitations of currently available techniques. Sample-wise cross contamination might be introduced by sample index misassignment in the most widely used metabarcoding amplicon sequencing approach. Although downstream bioinformatic quality control and clustering or denoising algorithms can remove sequencing errors and non-biological artifact reads, no algorithm can eliminate high-quality reads arising from sample-wise cross contamination introduced by index misassignment, making it difficult to distinguish between bona fide rare taxa and potential false positives in metabarcoding studies. Results: We thoroughly evaluated the rate of index misassignment of the widely used NovaSeq 6000 and DNBSEQ-G400 sequencing platforms using both commercial and customized mock communities, and observed a significantly lower fraction of potential false positive reads for DNBSEQ-G400 than for NovaSeq 6000 (0.08% vs. 5.68%). Significant batch effects could be caused by stochastically introduced false positive or false negative rare taxa. These false detections could also lead to inflated alpha diversity for relatively simple microbial communities and underestimated alpha diversity for complex ones. A further test using a set of cow rumen samples showed that the two sequencing platforms reported different rare taxa. Correlation analysis of the rare taxa detected by each sequencing platform demonstrated that the rare taxa identified by the DNBSEQ-G400 platform were much more likely to be correlated with the physicochemical properties of rumen fluid than those identified by the NovaSeq 6000 platform.
Community assembly mechanism and microbial network correlation analysis indicated that false positive or false negative rare taxa detection could lead to biased inference of community assembly mechanisms and to the identification of spurious keystone species in the community. Conclusions: We strongly recommend proper positive/negative/blank controls, technical replicates, and careful sequencing platform selection in future amplicon studies, especially when the microbial rare biosphere is the focus.
Project description:Retention is the most common complication of capsule endoscopy (CE), and is reported to occur in 0-13% of cases. To avoid retention, a PillCam patency capsule (PC) is used in patients with suspected intestinal stenosis. However, a relatively low positive predictive value of the PC examination has been reported previously. The aims of this study were to clarify the accuracy of PC examination and to evaluate clinical factors related to cases of false-positive detection. We performed a retrospective single-center study of 282 consecutive patients referred for PC examination. Patients in whom the PC could not pass through the small bowel within 33 h were classified into the 'no patency' group. The 'no patency' group was investigated for evidence of significant stenosis upon further examinations, including CE, double-balloon endoscopy, and small bowel follow-through after PC examination. Clinical factors related to small bowel patency and false-positive cases were evaluated. We included 161 male (57.1%) and 121 female (42.9%) patients with a mean age of 47.5 ± 17.7 years. Of the 282 patients enrolled, 27 patients exhibited 'no patency' upon PC examination. Multivariate analysis showed that clinical factors related to 'no patency' included Crohn's disease, abdominal symptoms, stenosis upon imaging, and previous abdominal surgery. Upon further examination, nine cases in the 'no patency' group had significant stenosis. Sensitivity, specificity, and negative and positive predictive values of PC examination for detecting small bowel stenosis were 93.8%, 96.6%, 99.6%, and 62.5%, respectively, and the only clinical factor related to false-positive cases was constipation (p < 0.05). We found a relatively low positive predictive value of PC examination and that constipation was related to false-positive results. Clinical studies focusing on these results are needed to extend the indications for CE.
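For readers less familiar with the four accuracy measures quoted above, the following minimal sketch shows how they are derived from a 2x2 table. The counts in the usage example are hypothetical: they were chosen only because they sum to 282 and round to the reported percentages, they are not given in the abstract, and they need not match the study's actual table. The function name is likewise an illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from the cells of a 2x2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among those with stenosis
        "specificity": tn / (tn + fp),  # cleared among those without stenosis
        "ppv": tp / (tp + fp),          # true stenosis among positive tests
        "npv": tn / (tn + fn),          # no stenosis among negative tests
    }


if __name__ == "__main__":
    # Hypothetical counts (assumption, not taken from the source).
    for name, value in diagnostic_metrics(tp=15, fp=9, fn=1, tn=257).items():
        print(f"{name}: {value:.1%}")
```

The example makes the asymmetry visible: when the condition is uncommon, even a handful of false positives pulls the positive predictive value well below the sensitivity and specificity.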
Project description:Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances in imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can cause bias in the downstream analysis. Many studies have compared the performance of popular imputation approaches, but few have investigated bias characteristics of downstream association analyses. Herein, we showed that imputation accuracy is diminished if the real genotypes contain minor alleles. Although these genotypes are less common, particularly at loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large portion of uncertain SNPs. Significant discordance of P-values occurred as the P-value approached 0 or when the imputation quality was poor. Although elimination of poorly imputed SNPs can remove false positive (FP) SNPs, it sometimes sacrificed more than 80% of true positive (TP) SNPs. For top-ranked SNPs, removing variants with moderate imputation quality did not reduce the proportion of FP SNPs, and increasing the sample size of reference panels did not greatly benefit the results either. Additionally, samples with a balanced ratio between cases and controls can dramatically improve the number of TP SNPs observed in imputation-based GWAS. These results raise concerns about results from association studies when rare variants are studied, particularly when case-control studies are unbalanced.
Project description:Background: Systematic technical effects, also called batch effects, are a considerable challenge when analyzing DNA methylation (DNAm) microarray data, because they can lead to false results when confounded with the variable of interest. Methods to correct these batch effects are error-prone, as previous findings have shown. Results: Here, we demonstrate how using the R function ComBat to correct simulated Infinium HumanMethylation450 BeadChip (450 K) and Infinium MethylationEPIC BeadChip Kit (EPIC) DNAm data can lead to a large number of false positive results under certain conditions. We further provide a detailed assessment of the consequences for the highly relevant problem of p-value inflation with subsequent false positive findings after application of the frequently used ComBat method. Using ComBat to correct for batch effects in randomly generated samples produced alarming numbers of false positive results after false discovery rate (FDR) and Bonferroni (BF) correction, in unbalanced as well as in balanced sample distributions with respect to the relation between the outcome variable of interest and the technical position of the sample during the probe measurement. Both sample size and number of batch factors (e.g. number of chips) were systematically simulated to assess the probability of false positive findings. The effect of sample size was simulated using n = 48 up to n = 768 randomly generated samples. Increasing the number of corrected factors led to an exponential increase in the number of false positive signals. Increasing the number of samples reduced, but did not completely prevent, this effect. Conclusions: Using the approach described, we demonstrate that using ComBat for batch correction in DNAm data can lead to false positive results under certain conditions and sample distributions. Our results thus contradict previous publications that considered a balanced sample distribution unproblematic when using ComBat.
We do not claim completeness in terms of reporting all technical conditions and possible solutions to the problems that occur, as we approach the problem from a clinician's perspective and not from that of a computer scientist. With our approach of simulating data, we provide readers with a simple method to assess the probability of false positive findings in DNAm microarray data analysis pipelines.
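The counting step behind such a simulation can be sketched as follows: given p-values from data with no true effect, apply Bonferroni and Benjamini-Hochberg (FDR) correction and count how many tests remain significant. This is only an assumed illustration of that step and does not involve ComBat or methylation data; the p-values here are uniform random draws standing in for a well-calibrated pipeline, and the function names are hypothetical. A reader assessing a real pipeline would substitute the p-values it produces on randomly generated samples.

```python
import random


def bonferroni(pvals, alpha):
    """Reject each test whose p-value is <= alpha / (number of tests)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q):
    """Benjamini-Hochberg step-up procedure; returns one flag per test."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank k with p_(k) <= q * k / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


def null_false_positives(n_tests, alpha, seed):
    """Count significant results when every null hypothesis is true."""
    rng = random.Random(seed)
    # Assumption: uniform p-values, i.e. no inflation from preprocessing.
    pvals = [rng.random() for _ in range(n_tests)]
    return {
        "bonferroni": sum(bonferroni(pvals, alpha)),
        "fdr_bh": sum(benjamini_hochberg(pvals, alpha)),
    }


if __name__ == "__main__":
    print(null_false_positives(n_tests=10_000, alpha=0.05, seed=0))
```

On calibrated null p-values both counts should usually be zero or very small, so large counts from a pipeline run on random samples point to p-value inflation introduced somewhere upstream.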