Dataset Information

An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.

ABSTRACT: BACKGROUND: Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze correlations between SNPs and disease, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource that broadly supports gene-, SNP-, and disease-related information and captures the relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. RESULTS: To implement the pipeline system for the integrated database, we first unified complex and redundant disease terms using the Unified Medical Language System (UMLS) for classification and noun modification, and unified gene names using the HUGO Gene Nomenclature Committee (HGNC) and NCBI Gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track, and MitoDat databases. For genetic variants, we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page http://diseasome.kobic.re.kr/, and a genome browser is also available to highlight findings and to permit convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. CONCLUSION: Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes.
The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then semi-automatically select the data set for association studies while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical and facilitate an understanding of the processes that cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases, and it is updated at regular intervals.
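The abstract describes two steps: first normalizing disease terms and gene names to shared vocabularies, then linking SNPs, genes, and diseases. A minimal Python sketch of that idea follows. It assumes toy, hypothetical alias tables standing in for HGNC/NCBI Gene (gene names) and UMLS (disease terms), plus hypothetical SNP-gene and gene-disease records. None of the identifiers, function names, or data come from the Diseasome system, whose internals are not described at this level of detail.

```python
from collections import defaultdict

# Hypothetical lookup tables standing in for HGNC/NCBI Gene (gene symbols)
# and UMLS (disease concepts). Every identifier here is a toy placeholder.
GENE_ALIASES = {"GENE_A": "GENE_A", "GA1": "GENE_A", "GENE_B": "GENE_B"}
DISEASE_CONCEPTS = {"disease x": "CUI_X", "x disease": "CUI_X", "disease y": "CUI_Y"}

# Hypothetical source records, spelled the way different databases might spell them.
GENE_DISEASE = [("GA1", "Disease X"), ("GENE_A", "X  disease"), ("GENE_B", "Disease Y")]
SNP_GENE = [("rs0001", "gene_a"), ("rs0002", "GENE_B"), ("rs0003", "UNKNOWN")]


def normalize_gene(name):
    """Map a gene name or alias to one canonical symbol; None if unrecognized."""
    return GENE_ALIASES.get(name.strip().upper())


def normalize_disease(term):
    """Map a free-text disease term to one concept ID, ignoring case and extra spaces."""
    return DISEASE_CONCEPTS.get(" ".join(term.lower().split()))


def candidate_snps(gene_disease, snp_gene):
    """Return {disease concept: sorted SNP IDs in genes linked to that disease}."""
    genes_by_disease = defaultdict(set)
    for gene, disease in gene_disease:
        g, d = normalize_gene(gene), normalize_disease(disease)
        if g and d:  # drop records that cannot be mapped to both vocabularies
            genes_by_disease[d].add(g)

    snps_by_gene = defaultdict(set)
    for snp, gene in snp_gene:
        g = normalize_gene(gene)
        if g:
            snps_by_gene[g].add(snp)

    return {
        d: sorted({snp for g in genes for snp in snps_by_gene.get(g, set())})
        for d, genes in genes_by_disease.items()
    }


if __name__ == "__main__":
    print(candidate_snps(GENE_DISEASE, SNP_GENE))
```

With these toy inputs, the two spellings of "disease X" collapse into one concept, so their gene records merge. rs0003 is dropped because its gene name cannot be normalized, and the script prints {'CUI_X': ['rs0001'], 'CUI_Y': ['rs0002']}.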

SUBMITTER: Yang JO

PROVIDER: S-EPMC2638159 | biostudies-literature | 2008

REPOSITORIES: biostudies-literature


Publications

An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.

Yang JO, Hwang S, Oh J, Bhak J, Sohn TK

BMC Bioinformatics, 12 December 2008


PMID: 19091018

Similar Datasets

Project description: Ovarian cancer (OC) remains a leading cause of cancer-related mortality among women owing to its aggressive and invasive nature. Identifying prognostic markers linked to the molecular pathogenesis of OC is a critical avenue for improving patient outcomes and survival. In this study, we performed a bioinformatics analysis of single nucleotide polymorphism (SNP) data from OC patients in the TCGA database, with the goal of uncovering the genetic underpinnings of OC and identifying prognostic markers that could inform clinical decision-making and patient care. Our analysis identified five mutated genes involved in lipid metabolism: APOB, BRCA1, COL6A3, LRP1, and LRP1B. These genes, previously unexplored in the context of OC, emerged as prominent candidates with potential roles in OC progression. The interplay between lipid metabolism and cancer development has attracted considerable attention in recent years, and our findings underscore the relevance of these genes in OC. To support these findings, we performed survival analysis, which showed significant correlations between patient survival and the expression levels of these genes, suggesting their potential utility as prognostic markers for more personalized and effective patient care. By applying high-throughput data mining, we uncovered genetic insights that may reshape our understanding of this disease.
We complemented these findings with experimental techniques such as RT-qPCR and Western blot to further characterize the molecular landscape of OC. This combined approach deepens our understanding of OC and provides bioinformatics information that may help in assessing patient prognosis. In summary, our findings highlight the potential prognostic significance of APOB, BRCA1, COL6A3, LRP1, and LRP1B, and invite further exploration of their roles in OC progression, with the aim of more personalized and effective management of OC.
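The project description does not say which survival method was used. As one common approach, assumed here rather than taken from the study, a Kaplan-Meier estimator can compare survival between patients whose expression of a gene is above or at/below the median. The Python sketch below uses hypothetical function names and toy inputs only.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up times; events: 1 if death observed, 0 if censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # subjects censored at t still count as at risk at t
    return curve


def split_by_median_expression(expression, times, events):
    """Hypothetical helper: KM curves for patients above vs. at/below median expression."""
    cut = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cut]
    return {
        "high": kaplan_meier([t for t, _ in high], [e for _, e in high]),
        "low": kaplan_meier([t for t, _ in low], [e for _, e in low]),
    }
```

The curves alone only describe each group. Claiming a significant difference would typically require a formal test such as the log-rank test, which this sketch does not include.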

Project description: Aquaporins and aquaglyceroporins belong to the superfamily of major intrinsic proteins (MIPs) and transport water and other neutral solutes such as glycerol. These channel proteins are involved in vital physiological processes and are implicated in several human diseases. Experimentally determined structures of MIPs from diverse organisms reveal a unique hourglass fold with six transmembrane helices and two half-helices. MIP channels have two constrictions formed by Asn-Pro-Ala (NPA) motifs and aromatic/arginine selectivity filters (Ar/R SFs). Several reports have found associations between single-nucleotide polymorphisms (SNPs) in human aquaporins (AQPs) and diseases in specific populations. In this study, we compiled 2798 SNPs that give rise to missense mutations in 13 human AQPs. To understand the nature of these missense substitutions, we systematically analyzed their patterns. We found several substitutions that can be considered non-conservative, such as small-to-big or hydrophobic-to-charged residue changes. We also analyzed these substitutions in the context of structure. We identified SNPs that occur in NPA motifs or Ar/R SFs, and these will most certainly disrupt the structure and/or transport properties of human AQPs. We found 22 examples in the Online Mendelian Inheritance in Man (OMIM) database in which missense SNP substitutions, mostly non-conservative in nature, have given rise to pathogenic conditions. Not all missense SNPs in human AQPs are likely to result in disease; however, understanding their effect on the structure and function of human AQPs is important. To this end, we developed dbAQP-SNP, a database that contains information about all 2798 SNPs.
The database offers several features and search options that help users find SNPs at specific positions in human AQPs, including functionally and/or structurally important regions. dbAQP-SNP (http://bioinfo.iitk.ac.in/dbAQP-SNP) is freely available to the academic community.
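The substitution-pattern analysis described above can be illustrated with a short Python sketch. The residue groups (hydrophobic, charged, small, big) are simplified assumptions, not the criteria used by dbAQP-SNP. The NPA check only scans the sequence text for the motif and ignores structural context such as the Ar/R SF. The function names and the sequence are hypothetical placeholders; the sequence is not a real AQP.

```python
# Simplified residue classes (one-letter codes); assumed for illustration only.
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")
SMALL = set("GAS")
BIG = set("FWYRKH")


def substitution_flags(ref, alt):
    """Flag a missense change ref->alt that matches the two non-conservative patterns."""
    ref, alt = ref.upper(), alt.upper()
    flags = []
    if ref in SMALL and alt in BIG:
        flags.append("small-to-big")
    if ref in HYDROPHOBIC and alt in CHARGED:
        flags.append("hydrophobic-to-charged")
    return flags


def npa_positions(seq):
    """1-based positions covered by every occurrence of the NPA motif in seq."""
    positions = set()
    start = seq.find("NPA")
    while start != -1:
        positions.update(range(start + 1, start + 4))
        start = seq.find("NPA", start + 1)
    return positions


def annotate_missense(seq, pos, alt):
    """Flags for replacing the residue at 1-based position pos with alt."""
    ref = seq[pos - 1]
    flags = substitution_flags(ref, alt)
    if pos in npa_positions(seq):
        flags.append("in NPA motif")
    return flags


if __name__ == "__main__":
    toy_seq = "MKNPAGLNPAV"  # placeholder sequence with two NPA motifs
    print(annotate_missense(toy_seq, 5, "K"))
```

For the toy sequence, replacing the alanine at position 5 with lysine triggers all three flags. An empty list means only that neither simplified rule fired, not that the change is benign.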

Project description: Background: The mapping of quantitative trait loci in rat and mouse has been extremely successful in identifying chromosomal regions associated with human disease-related phenotypes. However, identifying the specific phenotype-causing DNA sequence variations within a quantitative trait locus has been much more difficult. The recent availability of genomic sequence from several mouse inbred strains (including C57BL/6J, 129X1/SvJ, 129S1/SvImJ, A/J, and DBA/2J) has made it possible to catalog DNA sequence differences within a quantitative trait locus derived from crosses between these strains. However, even for well-defined quantitative trait loci (<10 Mb), the identification of candidate functional DNA sequence changes remains challenging due to the high density of sequence variation between strains. Description: To help identify functional DNA sequence variations within quantitative trait loci, we used the Ensembl annotated genome sequence to compile a database of mouse single nucleotide polymorphisms (SNPs) that are predicted to cause missense, nonsense, frameshift, or splice-site mutations (available at http://bioinfo.embl.it/SnpApplet/). For missense mutations, we used the PolyPhen and PANTHER algorithms to predict whether amino acid changes are likely to disrupt protein function. Conclusion: We have developed a database of mouse SNPs predicted to cause missense, nonsense, frameshift, and splice-site mutations. Our analysis revealed that 20% and 14% of missense SNPs are likely to be deleterious according to PolyPhen and PANTHER, respectively, and 6% are considered deleterious by both algorithms. The database also provides gene expression and functional annotations from the SymAtlas, Gene Ontology, and OMIM databases to further assess candidate phenotype-causing mutations.
To demonstrate its utility, we show that Mouse SNP Miner successfully finds a previously identified candidate SNP in the taste receptor Tas1r3 that underlies sucrose preference in the C57BL/6J strain. We also use Mouse SNP Miner to derive a list of candidate phenotype-causing mutations within a previously uncharacterized QTL for response to morphine in the 129/Sv strain.
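The per-tool deleterious fractions and the QTL candidate search described above can be sketched in Python. The record layout, the function names, and the candidate rule are assumptions for illustration, not Mouse SNP Miner's actual schema or logic. The rule keeps nonsense, frameshift, and splice-site SNPs, plus missense SNPs called deleterious by both PolyPhen and PANTHER. Tool predictions are represented only as precomputed booleans, and the toy records are invented, so their fractions do not reproduce the 20%, 14%, and 6% reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingSnp:
    snp_id: str
    chrom: str
    pos: int
    consequence: str  # "missense", "nonsense", "frameshift", or "splice_site"
    polyphen_deleterious: bool = False  # precomputed call, assumed available
    panther_deleterious: bool = False   # precomputed call, assumed available


def deleterious_fractions(snps):
    """Fraction of missense SNPs called deleterious by each tool and by both."""
    missense = [s for s in snps if s.consequence == "missense"]
    if not missense:
        return {"polyphen": 0.0, "panther": 0.0, "both": 0.0}
    n = len(missense)
    return {
        "polyphen": sum(s.polyphen_deleterious for s in missense) / n,
        "panther": sum(s.panther_deleterious for s in missense) / n,
        "both": sum(s.polyphen_deleterious and s.panther_deleterious for s in missense) / n,
    }


def qtl_candidates(snps, chrom, start, end):
    """IDs, ordered by position, of candidate SNPs inside chrom:start-end (inclusive)."""
    hits = []
    for s in snps:
        if s.chrom != chrom or not start <= s.pos <= end:
            continue
        # Assumed rule: keep truncating/splice variants; keep missense only if both tools agree.
        if s.consequence != "missense" or (s.polyphen_deleterious and s.panther_deleterious):
            hits.append(s)
    return [s.snp_id for s in sorted(hits, key=lambda s: s.pos)]


# Toy records with invented positions and calls.
TOY_SNPS = [
    CodingSnp("snp1", "4", 100, "missense", True, True),
    CodingSnp("snp2", "4", 200, "missense", True, False),
    CodingSnp("snp3", "4", 300, "missense", False, True),
    CodingSnp("snp4", "4", 400, "missense"),
    CodingSnp("snp5", "4", 250, "nonsense"),
    CodingSnp("snp6", "7", 150, "missense", True, True),
]

if __name__ == "__main__":
    print(deleterious_fractions(TOY_SNPS))
    print(qtl_candidates(TOY_SNPS, "4", 1, 1000))
```

Requiring agreement from both predictors trades sensitivity for a shorter, higher-confidence candidate list. This mirrors the way the abstract reports the overlap of the two algorithms separately from their individual rates.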


OmicsDI is part of the ELIXIR infrastructure
