Dataset Information

Significance analysis of lexical bias in microarray data.

ABSTRACT:

Background

Genes that are determined to be significantly differentially regulated in microarray analyses often appear to have functional commonalities, such as being components of the same biochemical pathway. This results in certain words being under- or overrepresented in the list of genes. Distinguishing between biologically meaningful trends and artifacts of annotation and analysis procedures is of the utmost importance, as only true biological trends are of interest for further experimentation. A number of sophisticated methods for identification of significant lexical trends are currently available, but these methods are generally too cumbersome for practical use by most microarray users.

Results

We have developed a tool, LACK, for calculating the statistical significance of apparent lexical bias in microarray datasets. The frequency of a user-specified list of search terms in a list of genes which are differentially regulated is assessed for statistical significance by comparison to randomly generated datasets. The simplicity of the input files and user interface targets the average microarray user who wishes to have a statistical measure of apparent lexical trends in analyzed datasets without the need for bioinformatics skills. The software is available as Perl source or a Windows executable.
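
The comparison to randomly generated datasets described above can be illustrated with a minimal Python sketch. This is not LACK's actual implementation (which is distributed as Perl source). The function names, the case-insensitive substring matching, the sampling without replacement, and the add-one p-value correction are all assumptions made for illustration.

```python
import random


def count_hits(annotations, terms):
    """Count annotations containing at least one search term (case-insensitive substring match)."""
    lowered_terms = [t.lower() for t in terms]
    return sum(any(t in a.lower() for t in lowered_terms) for a in annotations)


def lexical_bias_p_value(all_annotations, selected_annotations, terms,
                         n_iterations=1000, seed=0):
    """Estimate how often a random gene list matches the terms at least as often as the real list.

    Hypothetical helper: draws random gene lists of the same size as the
    selected list (without replacement) from the full annotation list.
    """
    rng = random.Random(seed)
    observed = count_hits(selected_annotations, terms)
    list_size = len(selected_annotations)

    at_least_as_many = 0
    for _ in range(n_iterations):
        random_list = rng.sample(all_annotations, list_size)
        if count_hits(random_list, terms) >= observed:
            at_least_as_many += 1

    # Add-one correction (an assumption) so the estimate is never exactly zero.
    p_over = (at_least_as_many + 1) / (n_iterations + 1)
    return observed, p_over


# Toy data: 100 annotated genes, 10 of which mention "SPI-2".
genome = ["SPI-2 secretion system protein"] * 10 + ["hypothetical protein"] * 90
regulated = ["SPI-2 secretion system protein"] * 8 + ["hypothetical protein"] * 2
observed, p_over = lexical_bias_p_value(genome, regulated, ["spi-2"])
```

A small `p_over` suggests that the search terms occur in the regulated list more often than expected by chance. Testing for underrepresentation would count random lists with at most the observed number of matches instead.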

Conclusion

We have used LACK in our laboratory to generate biological hypotheses based on our microarray data. We demonstrate the program's utility using an example in which we confirm significant upregulation of the SPI-2 pathogenicity island of Salmonella enterica serovar Typhimurium by the cation chelator dipyridyl.

SUBMITTER: Kim CC

PROVIDER: S-EPMC153504 | biostudies-literature | 2003 Apr

REPOSITORIES: biostudies-literature

Publications

Significance analysis of lexical bias in microarray data.

Kim CC, Falkow S

BMC Bioinformatics, 2003 Apr 3

PMID: 12697067

Similar Datasets

Project description:

Background: Although fold change is a commonly used criterion in quantitative proteomics for differentiating regulated proteins, it does not provide the estimates of false positive and false negative rates that are often desirable in a large-scale quantitative proteomic analysis. We explore the possibility of applying the Significance Analysis of Microarray (SAM) method (PNAS 98:5116-5121) to a differential proteomics problem of two samples with replicates. The quantitative proteomic analysis was carried out with nanoliquid chromatography/linear ion trap-Fourier transform mass spectrometry. The biological sample model included two unlabeled Mycobacterium smegmatis cell cultures grown at pH 5 and pH 7. The objective was to compare the relative protein abundance between the two unlabeled cell cultures, with an emphasis on significance analysis of protein differential expression using the SAM method. Results using the SAM method are compared with those obtained by fold change and the conventional t-test.

Results: We have applied the SAM method to solve the two-sample significance analysis problem in liquid chromatography/mass spectrometry (LC/MS) based quantitative proteomics. We grew the pH 5 and pH 7 unlabeled cell cultures in triplicate, resulting in 6 biological replicates. Each biological replicate was mixed with cells from a common 15N-labeled reference culture for normalization prior to SDS/PAGE fractionation and LC/MS analysis. For each biological replicate, one center SDS/PAGE gel fraction was selected for triplicate LC/MS analysis. There were 121 proteins quantified in at least 5 of the 6 biological replicates. Of these 121 proteins, 106 were significant in differential expression by the t-test (p < 0.05) based on peptide-level replicates, 54 were significant in differential expression by SAM with a Delta = 0.68 cutoff and a false positive rate of 5%, and 29 were significant in differential expression by the t-test (p < 0.05) based on protein-level replicates. The results indicate that SAM appears to overcome the false positives one encounters using the peptide-based t-test while allowing for identification of a greater number of differentially expressed proteins than the protein-based t-test.

Conclusion: We demonstrate that the SAM method can be adapted for effective significance analysis of proteomic data. It provides much richer information about the protein differential expression profiles and is particularly useful in the estimation of false discovery rates and miss rates.
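
As a rough illustration of how SAM differs from a plain t-test, the Python sketch below computes the SAM relative difference for one protein: the difference in group means divided by the pooled standard error plus a small positive constant s0. It covers only the per-protein statistic, not the permutation-based Delta cutoff or false discovery rate estimate. The function name and the example values are hypothetical, and s0 is passed in by hand here, whereas SAM chooses it from the data.

```python
import math


def sam_relative_difference(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one protein.

    With s0 = 0 this reduces to the two-sample t statistic with pooled
    variance; a positive s0 damps d for proteins with very small variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b

    # Pooled within-group sum of squares.
    sum_sq = (sum((x - mean_a) ** 2 for x in group_a)
              + sum((x - mean_b) ** 2 for x in group_b))
    scale = (1.0 / n_a + 1.0 / n_b) / (n_a + n_b - 2)
    s = math.sqrt(scale * sum_sq)

    return (mean_a - mean_b) / (s + s0)


# Hypothetical abundance ratios for one protein, three replicates per condition.
d_plain = sam_relative_difference([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
d_damped = sam_relative_difference([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], s0=1.0)
```

Because s0 is added to the denominator, `d_damped` has a smaller magnitude than `d_plain`. This damping is what keeps low-variance measurements from dominating the list of significant proteins.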

