Dataset Information

MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets.

ABSTRACT: Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of such methods by several orders of magnitude. These advances have helped reveal the presence of distinct binding specificity classes that co-exist within a set of ligands interacting with the same target. Here, we introduce a software system called MUSI that can rapidly analyze very large data sets of binding sequences to determine the relevant binding specificity patterns. Our pipeline provides two major advances. First, it can detect previously unrecognized multiple specificity patterns in any data set. Second, it offers integrated processing of very large data sets from next-generation sequencing machines. The results are visualized as multiple sequence logos describing the different binding preferences of the protein under investigation. We demonstrate the performance of MUSI by analyzing recent phage display data for human SH3 domains as well as microarray data for mouse transcription factors.

SUBMITTER: Kim T

PROVIDER: S-EPMC3315295 | biostudies-literature | 2012 Mar

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets.

Kim Taehyung T Tyndel Marc S MS Huang Haiming H Sidhu Sachdev S SS Bader Gary D GD Gfeller David D Kim Philip M PM

Nucleic acids research 20111230 6

Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of ...[more]

PMID: 22210894

Similar Datasets

Project description:BackgroundSurveillance of ectopic pregnancy (EP) using electronic databases is important. To our knowledge, no published study has assessed the validity of EP case ascertainment using electronic health records.ObjectiveWe aimed to assess the validity of an enhanced version of a previously validated algorithm, which used a combination of encounters with EP-related diagnostic/procedure codes and methotrexate injections.MethodsMedical records of 500 women aged 15-44 years with membership at Kaiser Permanente Southern and Northern California between 2009 and 2018 and a potential EP were randomly selected for chart review, and true cases were identified. The enhanced algorithm included diagnostic/procedure codes from the International Classification of Diseases, Tenth Revision, used telephone appointment visits, and excluded cases with only abdominal EP diagnosis codes. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall performance (Youden index and F-score) of the algorithm were evaluated and compared to the validated algorithm.ResultsThere were 334 true positive and 166 true negative EP cases with available records. True positive and true negative EP cases did not differ significantly according to maternal age, race/ethnicity, and smoking status. EP cases with only one encounter and non-tubal EPs were more likely to be misclassified. The sensitivity, specificity, PPV, and NPV of the enhanced algorithm for EP were 97.6%, 84.9%, 92.9%, and 94.6%, respectively. The Youden index and F-score were 82.5% and 95.2%, respectively. The sensitivity and NPV were lower for the previously published algorithm at 94.3% and 88.1%, respectively. The sensitivity of surgical procedure codes from electronic chart abstraction to correctly identify surgical management was 91.9%. The overall accuracy, defined as the percentage of EP cases with correct management (surgical, medical, and unclassified) identified by electronic chart abstraction, was 92.3%.ConclusionsThe performance of the enhanced algorithm for EP case ascertainment in integrated health care databases is adequate to allow for use in future epidemiological studies. Use of this algorithm will likely result in better capture of true EP cases than the previously validated algorithm.

Project description:BackgroundWith the current focus on personalized medicine, patient/subject level inference is often of key interest in translational research. As a result, random effects models (REM) are becoming popular for patient level inference. However, for very large data sets that are characterized by large sample size, it can be difficult to fit REM using commonly available statistical software such as SAS since they require inordinate amounts of computer time and memory allocations beyond what are available preventing model convergence. For example, in a retrospective cohort study of over 800,000 Veterans with type 2 diabetes with longitudinal data over 5 years, fitting REM via generalized linear mixed modeling using currently available standard procedures in SAS (e.g. PROC GLIMMIX) was very difficult and same problems exist in Stata's gllamm or R's lme packages. Thus, this study proposes and assesses the performance of a meta regression approach and makes comparison with methods based on sampling of the full data.DataWe use both simulated and real data from a national cohort of Veterans with type 2 diabetes (n=890,394) which was created by linking multiple patient and administrative files resulting in a cohort with longitudinal data collected over 5 years.Methods and resultsThe outcome of interest was mean annual HbA1c measured over a 5 years period. Using this outcome, we compared parameter estimates from the proposed random effects meta regression (REMR) with estimates based on simple random sampling and VISN (Veterans Integrated Service Networks) based stratified sampling of the full data. Our results indicate that REMR provides parameter estimates that are less likely to be biased with tighter confidence intervals when the VISN level estimates are homogenous.ConclusionWhen the interest is to fit REM in repeated measures data with very large sample size, REMR can be used as a good alternative. It leads to reasonable inference for both Gaussian and non-Gaussian responses if parameter estimates are homogeneous across VISNs.

Dataset Information

MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets.

Publications

MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets