Dataset Information

Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data.

ABSTRACT:

Background

Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons, are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may go undetected. Finally, different patterns may have different probabilities of occurring by chance, which can strongly influence researchers' conclusions about a pattern and its constituent genes.

Results

We developed a four-step post hoc pattern matching (PPM) algorithm to automate the identification of single channel gene expression patterns and the assessment of their significance. First, a one-way analysis of variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for every gene. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is that pattern's 'actual' frequency. Third, Monte Carlo simulations are used to estimate each pattern's frequency in random data (the 'random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated (the 'actual' against the 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure and constructs an Excel workbook containing standardized graphs of overrepresented patterns and lists of the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material.
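
The four steps can be illustrated with a minimal Python sketch. It is not the StatiGen code (which is Visual Basic) and every name in it is hypothetical. Two assumptions are made because the abstract does not specify them: each pairwise comparison is reduced to a ternary call (-1 lower, 0 not different, +1 higher) and the calls are packed into a base-3 pattern ID; and the Z-score is taken as (actual - mean of random) / standard deviation of random. The function toy_classify is a placeholder for the real ANOVA and post hoc tests, using a simple mean-difference threshold instead of a statistical test.

```python
import random
import statistics
from collections import Counter
from itertools import combinations


def encode_pattern(calls):
    """Pack ternary pairwise calls (-1, 0, +1) into one base-3 pattern ID.

    ASSUMPTION: the abstract does not give the actual encoding; base 3 is illustrative.
    """
    pattern_id = 0
    for call in calls:
        pattern_id = pattern_id * 3 + (call + 1)
    return pattern_id


def toy_classify(groups, threshold=1.0):
    """Hypothetical stand-in for step 1 (ANOVA plus all pairwise post hoc tests).

    Calls each pair of groups by the difference of group means. Returns None
    when no pair differs, which this sketch treats as "not ANOVA-significant".
    """
    means = [statistics.mean(g) for g in groups]
    calls = []
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        calls.append(0 if abs(diff) < threshold else (1 if diff > 0 else -1))
    return calls if any(calls) else None


def pattern_counts(genes, classify):
    """Step 2: count how many significant genes fall into each pattern ID."""
    counts = Counter()
    for groups in genes:
        calls = classify(groups)
        if calls is not None:
            counts[encode_pattern(calls)] += 1
    return counts


def random_pattern_counts(classify, n_genes, n_groups, n_reps, n_sims, seed=0):
    """Step 3: pattern counts in simulated null data, one Counter per simulation."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        genes = [[[rng.gauss(0.0, 1.0) for _ in range(n_reps)]
                  for _ in range(n_groups)]
                 for _ in range(n_genes)]
        sims.append(pattern_counts(genes, classify))
    return sims


def z_score(actual, random_counts):
    """Step 4: Z-score of the actual count against the simulated counts.

    Needs at least two simulated counts; returns None if they have no spread.
    """
    mean = statistics.mean(random_counts)
    sd = statistics.stdev(random_counts)
    return (actual - mean) / sd if sd > 0 else None
```

Given real data as a list of genes, each holding one list of replicate values per treatment group, one would call pattern_counts on the data, call random_pattern_counts with matching dimensions, and then compute z_score(actual[pid], [sim[pid] for sim in sims]) for each pattern ID pid. A Counter returns 0 for patterns that never occur in a simulation, so absent patterns need no special handling.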

Conclusion

The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using from three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit its utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, it preserves the data's ability to show the researcher interesting, yet unanticipated gene expression patterns, and it assigns the majority of ANOVA-significant genes to non-overlapping patterns.
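
To give a sense of why the number of theoretical patterns grows so quickly, the short sketch below counts an upper bound under one assumption that the abstract does not confirm: that each pairwise comparison has three possible outcomes (lower, not different, higher). With k treatment groups there are k(k-1)/2 pairs, so at most 3^(k(k-1)/2) patterns. The function name is hypothetical, and the true count of patterns tracked by StatiGen may differ.

```python
from math import comb


def max_patterns(n_groups, outcomes_per_pair=3):
    """Upper bound on distinct patterns: outcomes ** (number of group pairs).

    ASSUMPTION: three outcomes per pairwise comparison; not stated in the source.
    """
    return outcomes_per_pair ** comb(n_groups, 2)


for k in range(3, 8):
    print(k, "groups:", comb(k, 2), "pairs, at most", max_patterns(k), "patterns")
```

Under this assumption the bound rises from 27 patterns for three groups to more than 14 million for six, which is consistent with the practical limit described above.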

SUBMITTER: Hulshizer R

PROVIDER: S-EPMC1934919 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: Digital anthropomorphic breast phantoms have emerged in the past decade because of recent advances in 3D breast x-ray imaging techniques. Computer phantoms in the literature have incorporated power-law noise to represent glandular tissue and branching structures to represent linear components such as ducts. When power-law noise is added to those phantoms in one piece, the simulated fibroglandular tissue is distributed randomly throughout the breast, resulting in dense tissue placement that may not be observed in a real breast. The authors describe a method for enhancing an existing digital anthropomorphic breast phantom by adding binarized power-law noise to a limited area of the breast. Phantoms with (0.5 mm)³ voxel size were generated using software developed by Bakic et al. Between 0% and 40% of adipose compartments in each phantom were replaced with binarized power-law noise (β = 3.0) ranging from 0.1 to 0.6 volumetric glandular fraction. The phantoms were compressed to 7.5 cm thickness, then blurred using a 3 × 3 boxcar kernel and up-sampled to (0.1 mm)³ voxel size using trilinear interpolation. Following interpolation, the phantoms were adjusted for volumetric glandular fraction using global thresholding. Monoenergetic phantom projections were created, including quantum noise and simulated detector blur. Texture was quantified in the simulated projections using power-spectrum analysis to estimate the power-law exponent β from 25.6 × 25.6 mm² regions of interest. Phantoms were generated with total volumetric glandular fraction ranging from 3% to 24%. Values for β (averaged per projection view) were found to be between 2.67 and 3.73. Thus, the range of textures of the simulated breasts covers the textures observed in clinical images. Using these new techniques, digital anthropomorphic breast phantoms can be generated with a variety of glandular fractions and patterns. β values for this new phantom are comparable with published values for breast tissue in x-ray projection modalities. The combination of conspicuous linear structures and binarized power-law noise added to a limited area of the phantom qualitatively improves its realism.

Project description: Background: Tuberculosis is an important risk factor for chronic respiratory disease in resource-poor settings. The persistence of abnormal spirometry and symptoms after treatment is well described, but the structural abnormalities underlying these changes remain poorly defined, limiting our ability to phenotype post-TB lung disease into meaningful categories for clinical management, prognostication, and ongoing research. The relationship between post-TB lung damage and patient-centred outcomes including functional impairment, respiratory symptoms, and health-related quality of life also remains unclear. Methods: We performed a systematic literature review to determine the prevalence and pattern of imaging-defined lung pathology in adults after medical treatment for pleural, miliary, or pulmonary TB disease. Data were collected on study characteristics, and the modality, timing, and findings of thoracic imaging. The proportion of studies relating imaging findings to spirometry results and patient morbidity was recorded. Study quality was assessed using a modified Newcastle-Ottawa score. (PROSPERO registration number CRD42015027958.) Results: We identified 37 eligible studies. The principal features seen on CXR were cavitation (8.3-83.7%), bronchiectasis (4.3-11.2%), and fibrosis (25.0-70.4%), but prevalence was highly variable. CT imaging identified a wider range of residual abnormalities than CXR, including nodules (25.0-55.8%), consolidation (3.7-19.2%), and emphysema (15.0-45.0%). The prevalence of cavitation was generally lower (7.4-34.6%) and bronchiectasis higher (35.0-86.0%) on CT vs. CXR imaging. A paucity of prospective data, and of data from HIV-infected adults and sub-Saharan Africa (sSA), was noted. Few studies related structural damage to physiological impairment, respiratory symptoms, or patient morbidity. Conclusions: Post-TB structural lung pathology is common. Prospective data are required to determine the evolution of this lung damage and its associated morbidity over time. Further data are required from HIV-infected groups and those living in sSA.

Project description: Background: Serum antibody-based target identification has been used to identify tumor-associated antigens (TAAs) for development of anti-cancer vaccines. A similar approach can be helpful to identify biologically relevant and clinically meaningful targets in M. tuberculosis (MTB) infection for diagnosis or TB vaccine development in clinically well-defined populations. Method: We constructed a high-content peptide microarray with 61 M. tuberculosis proteins as linear 15 aa peptide stretches with 12 aa overlaps, resulting in 7446 individual peptide epitopes. Antibody profiling was carried out with serum from 34 individuals with active pulmonary TB and 35 healthy individuals in order to obtain an unbiased view of the MTB epitope recognition pattern. Quality data extraction was performed, and data sets were analyzed for significant differences and patterns predictive of TB+/-. Findings: Three distinct patterns of IgG reactivity were identified: 89/7446 peptides were differentially recognized (in 34/34 TB+ patients and in 35/35 healthy individuals) and are highly predictive of the division into TB+ and TB-; other targets were exclusively recognized in all patients with TB (e.g. sigmaF) but not in any of the healthy individuals; and a third peptide set was recognized exclusively in healthy individuals (35/35) but not in TB+ patients. The segregation between TB+ and TB- does not cluster into specific recognition of distinct MTB proteins, but into specific peptide epitope 'hotspots' at different locations within the same protein. Antigen recognition pattern profiles in serum from TB+ patients from Armenia vs. patients recruited in Sweden showed that IgG-defined MTB epitopes are very similar in individuals with different genetic backgrounds. Conclusions: A uniform target MTB IgG-epitope recognition pattern exists in pulmonary tuberculosis. Unbiased, high-content peptide microarray chip-based testing of clinically well-defined populations allows visualization of biologically relevant targets useful for development of novel TB diagnostics and vaccines.

Project description: A method for improving crystallographic phases is presented that is based on the preferential occurrence of certain local patterns of electron density in macromolecular electron-density maps. The method focuses on the relationship between the value of electron density at a point in the map and the pattern of density surrounding this point. Patterns of density that can be superimposed by rotation about the central point are considered equivalent. Standard templates are created from experimental or model electron-density maps by clustering and averaging local patterns of electron density. The clustering is based on correlation coefficients after rotation to maximize the correlation. Experimental or model maps are also used to create histograms relating the value of electron density at the central point to the correlation coefficient of the density surrounding this point with each member of the set of standard patterns. These histograms are then used to estimate the electron density at each point in a new experimental electron-density map using the pattern of electron density at points surrounding that point and the correlation coefficient of this density to each of the set of standard templates, again after rotation to maximize the correlation. The method is strengthened by excluding any information from the point in question from both the templates and the local pattern of density in the calculation. A function based on the origin of the Patterson function is used to remove information about the electron density at the point in question from nearby electron density. This allows an estimation of the electron density at each point in a map, using only information from other points in the process. The resulting estimates of electron density are shown to have errors that are nearly independent of the errors in the original map using model data and templates calculated at a resolution of 2.6 Å. Owing to this independence of errors, information from the new map can be combined in a simple fashion with information from the original map to create an improved map. An iterative phase-improvement process using this approach and other applications of the image-reconstruction method are described and applied to experimental data at resolutions ranging from 2.4 to 2.8 Å.